Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] demonstrate using NVTXW to export internal CUPTI events to nsys-rep #2450

Draft
wants to merge 2 commits into
base: branch-24.10
Choose a base branch
from

Conversation

tcourtneynv
Copy link

This branch adds support for exporting internally collected CUPTI events using NVTXW. NVTXW is an API that allows applications to record their own profiling events and then insert them into an Nsight Systems report (nsys-rep file) in a post-processing step.

The README-nvtxw.txt file outlines the steps to use this feature. The "libNvtxwBackend.so" library mentioned in the readme is included in the Nsight Systems release. The latest release is available here: https://developer.nvidia.com/nsight-systems/get-started. Nsight Systems 2024.6.1 should be posted in the next few days, which is the version I used for testing. In the mean time, 2024.5.1 may also work, or I can provide a link to an internal build server with pre-release build of 2024.6.1, if that would be useful.

Marking this feature as WIP and draft. It works as is but is intended to be a demonstration of how to use NVTXW. Feel free to refine, refactor and improve this code as appropriate.

add nvtxw API headers and implementation code
to export CUPTI data to an nys-rep using nvtxw
Copy link
Contributor

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tcourtneynv! This is exciting!

@@ -0,0 +1,413 @@
/*
* SPDX-FileCopyrightText: Copyright (c) <year> NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Year is not filled in. Also curious why this is using a slightly different license than the rest of the repository (i.e.: ASLv2 with LLVM exception instead of vanilla ASLv2).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooof. will fix the year. thanks.

We just got this updated header from OSRB for use with NVTX. NvtxwEvents.{h,cpp} are needed inside the NVTXW backend library and also needed by the client. I thought I would use the same header both places but think it is fine to use the normal Spark header too.

For NVTX, we add the "LLVM exceptions" because NVTX is compiled into 3rd party libraries. The exception grants this:

"As an exception, if, as a result of your compiling your source code, portions of this Software are embedded into an Object form of such source code, you may redistribute such embedded portions in such Object form without complying with the conditions of Sections 4(a), 4(b) and 4(d) of the License."

@@ -0,0 +1,188 @@
/*
* SPDX-FileCopyrightText: Copyright (c) <year> NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copyright year not updated from template, same license diff question here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the files under nvtx3 part of an NVTX3 release? If so, we're already pulling in NVTX3 explicitly, and I'm wondering if we can avoid adding all these files under nvtx3 by changing the version of NVTX3 being fetched.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these files are pre-release. They will become part of the public NVTX release, hopefully in early 2025.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this file originate from somewhere else and needs to be kept in sync (or ideally fetched during the build)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is part of the pre-released version of NVTXW. It will become part of the public NVTX release, hopefully in early 2025.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions on origin of this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing license and copyrights

@@ -740,6 +1134,24 @@ int main(int argc, char* argv[])
} else {
convert_to_nvtxt(in, std::cout, opts);
}
} else if (opts.nvtxw) {
if (opts.output_path) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be code to handle when an output path is not provided, e.g.: automatically generating an output path derived from the input path by appending .nsys-rep or emitting an error stating an output path is required. As it is now, it silently does nothing which is confusing to the user.

nvtxwInterfaceCore_t *nvtxwInterface = nullptr;
nvtxwSessionHandle_t session;
nvtxwStreamHandle_t stream;
errorCode = initialize_nvtxw(in, opts.output_path.value().stem(), nvtxwModuleHandle, nvtxwInterface, session, stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling .stem() here silently truncates the user's intended path and, surprisingly, emits the file in the current directory rather than where the user asked it to be placed.

@jlowe
Copy link
Contributor

jlowe commented Oct 3, 2024

CI signoff check is failing because at least one commit must have a signoff. See https://github.com/NVIDIA/spark-rapids-jni/blob/branch-24.10/CONTRIBUTING.md#sign-your-work for details.

Comment on lines +3 to +4
2. Need to set the NVTXW_BACKEND environment variable for the libNvtxwBackend.so library in the host directory a current build of Nsight Systems. For example:
> export NVTXW_BACKEND=/opt/nvidia/nsight-systems/2024.6.0/host-linux-x64/libNvtxwBackend.so
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an cmdline option of the converter? That makes it much more obvious to the user how to set it. Either way, we should update the error when the backend library is not found to point to how the user can override the path to find it.

Copy link
Collaborator

@abellina abellina Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a shared dependency with nsight systems like this, I assume we run the risk of having an older report generated by spark-rapids-jni with an older set of NvtxEvents.h/cpp that are not compatible with the installed nsys at profile conversion time? @tcourtneynv question for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants