-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] demonstrate using NVTXW to export internal CUPTI events to nsys-rep #2450
base: branch-24.10
Are you sure you want to change the base?
[WIP] demonstrate using NVTXW to export internal CUPTI events to nsys-rep #2450
Conversation
add nvtxw API headers and implementation code to export CUPTI data to an nys-rep using nvtxw
09a4358
to
0f1a40e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @tcourtneynv! This is exciting!
@@ -0,0 +1,413 @@ | |||
/* | |||
* SPDX-FileCopyrightText: Copyright (c) <year> NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Year is not filled in. Also curious why this is using a slightly different license than the rest of the repository (i.e.: ASLv2 with LLVM exception instead of vanilla ASLv2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooof. will fix the year. thanks.
We just got this updated header from OSRB for use with NVTX. NvtxwEvents.{h,cpp} are needed inside the NVTXW backend library and also needed by the client. I thought I would use the same header both places but think it is fine to use the normal Spark header too.
For NVTX, we add the "LLVM exceptions" because NVTX is compiled into 3rd party libraries. The exception grants this:
"As an exception, if, as a result of your compiling your source code, portions of this Software are embedded into an Object form of such source code, you may redistribute such embedded portions in such Object form without complying with the conditions of Sections 4(a), 4(b) and 4(d) of the License."
@@ -0,0 +1,188 @@ | |||
/* | |||
* SPDX-FileCopyrightText: Copyright (c) <year> NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copyright year not updated from template, same license diff question here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the files under nvtx3 part of an NVTX3 release? If so, we're already pulling in NVTX3 explicitly, and I'm wondering if we can avoid adding all these files under nvtx3 by changing the version of NVTX3 being fetched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these files are pre-release. They will become part of the public NVTX release, hopefully in early 2025.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this file originate from somewhere else and needs to be kept in sync (or ideally fetched during the build)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is part of the pre-released version of NVTXW. It will become part of the public NVTX release, hopefully in early 2025.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same questions on origin of this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing license and copyrights
@@ -740,6 +1134,24 @@ int main(int argc, char* argv[]) | |||
} else { | |||
convert_to_nvtxt(in, std::cout, opts); | |||
} | |||
} else if (opts.nvtxw) { | |||
if (opts.output_path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be code to handle when an output path is not provided, e.g.: automatically generating an output path derived from the input path by appending .nsys-rep or emitting an error stating an output path is required. As it is now, it silently does nothing which is confusing to the user.
nvtxwInterfaceCore_t *nvtxwInterface = nullptr; | ||
nvtxwSessionHandle_t session; | ||
nvtxwStreamHandle_t stream; | ||
errorCode = initialize_nvtxw(in, opts.output_path.value().stem(), nvtxwModuleHandle, nvtxwInterface, session, stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling .stem()
here silently truncates the user's intended path and, surprisingly, emits the file in the current directory rather than where the user asked it to be placed.
CI signoff check is failing because at least one commit must have a signoff. See https://github.com/NVIDIA/spark-rapids-jni/blob/branch-24.10/CONTRIBUTING.md#sign-your-work for details. |
2. Need to set the NVTXW_BACKEND environment variable for the libNvtxwBackend.so library in the host directory a current build of Nsight Systems. For example: | ||
> export NVTXW_BACKEND=/opt/nvidia/nsight-systems/2024.6.0/host-linux-x64/libNvtxwBackend.so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an cmdline option of the converter? That makes it much more obvious to the user how to set it. Either way, we should update the error when the backend library is not found to point to how the user can override the path to find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a shared dependency with nsight systems like this, I assume we run the risk of having an older report generated by spark-rapids-jni with an older set of NvtxEvents.h/cpp
that are not compatible with the installed nsys at profile conversion time? @tcourtneynv question for you.
This branch adds support for exporting internally collected CUPTI events using NVTXW. NVTXW is an API that allows applications to record their own profiling events and then insert them into an Nsight Systems report (nsys-rep file) in a post-processing step.
The README-nvtxw.txt file outlines the steps to use this feature. The "libNvtxwBackend.so" library mentioned in the readme is included in the Nsight Systems release. The latest release is available here: https://developer.nvidia.com/nsight-systems/get-started. Nsight Systems 2024.6.1 should be posted in the next few days, which is the version I used for testing. In the mean time, 2024.5.1 may also work, or I can provide a link to an internal build server with pre-release build of 2024.6.1, if that would be useful.
Marking this feature as WIP and draft. It works as is but is intended to be a demonstration of how to use NVTXW. Feel free to refine, refactor and improve this code as appropriate.