Allow cdi mode to work with --gpus flag #894

Merged

@elezar elezar commented Feb 5, 2025

This change ensures that the cdi modifier also removes the NVIDIA
Container Runtime Hook from the incoming spec. This aligns with what is
done for CSV modifications and prevents an error when starting the
container.
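
As a rough illustration of the behavior described above, here is a minimal sketch, not the toolkit's actual implementation. The `Hook` and `Spec` types, the function names, and the matching on the executable's base name are all assumptions made for this example; only the two hook names and the csv/cdi/legacy modes come from this PR's test output.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Hook and Spec are simplified, hypothetical stand-ins for the OCI
// runtime-spec types; they are not the toolkit's real types.
type Hook struct {
	Path string
	Args []string
}

type Spec struct {
	PrestartHooks []Hook
}

// isNVIDIAHook reports whether the hook executable has one of the two
// names that appear in the TestNewSpecModifier subtests. Matching on the
// base name of the path is an assumption of this sketch.
func isNVIDIAHook(h Hook) bool {
	switch filepath.Base(h.Path) {
	case "nvidia-container-runtime-hook", "nvidia-container-toolkit":
		return true
	}
	return false
}

// removeNVIDIAHooks drops matching prestart hooks from the spec in place.
func removeNVIDIAHooks(spec *Spec) {
	var kept []Hook
	for _, h := range spec.PrestartHooks {
		if !isNVIDIAHook(h) {
			kept = append(kept, h)
		}
	}
	spec.PrestartHooks = kept
}

// applyMode mirrors the test matrix: csv and cdi modes remove the hook,
// any other mode (for example legacy) keeps it.
func applyMode(mode string, spec *Spec) {
	switch mode {
	case "csv", "cdi":
		removeNVIDIAHooks(spec)
	}
}

func main() {
	spec := &Spec{PrestartHooks: []Hook{
		{Path: "/usr/bin/nvidia-container-runtime-hook", Args: []string{"nvidia-container-runtime-hook", "prestart"}},
		{Path: "/usr/local/bin/other-hook"},
	}}
	applyMode("cdi", spec)
	for _, h := range spec.PrestartHooks {
		fmt.Println(h.Path) // prints "/usr/local/bin/other-hook"
	}
}
```

Hooks that are not NVIDIA hooks are left untouched, so the incoming spec keeps any unrelated prestart hooks.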

@elezar elezar added the must-backport The changes in PR need to be backported to at least one stable release branch. label Feb 5, 2025
@elezar elezar requested a review from jgehrcke February 5, 2025 10:15
@elezar elezar self-assigned this Feb 5, 2025
elezar commented Feb 5, 2025

cc @tmonty12

@elezar elezar force-pushed the remove-nvidia-container-runtime-hook-in-cdi-mode branch from d7ffa9d to b7762b1 Compare February 5, 2025 10:17
@elezar elezar force-pushed the remove-nvidia-container-runtime-hook-in-cdi-mode branch 3 times, most recently from 0ea4964 to cb7c1cf Compare February 5, 2025 17:03
This change ensures that the cdi modifier also removes the NVIDIA
Container Runtime Hook from the incoming spec. This aligns with what is
done for CSV modifications and prevents an error when starting the
container.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the remove-nvidia-container-runtime-hook-in-cdi-mode branch from cb7c1cf to 03152db Compare February 5, 2025 18:01
@@ -74,6 +74,12 @@ var _ = Describe("docker", Ordered, func() {
Expect(containerOutput).To(Equal(hostOutput))
})

It("should support automatic CDI spec generation with the --gpus flag", func(ctx context.Context) {
@jgehrcke jgehrcke Feb 6, 2025

Judging by the names of the CI checks that ran on this PR, this test wasn't executed by any of them, right? (I was looking for some kind of e2e name.)

If this test ran on this PR -- can you point me to a log?

If this test did not run on this PR -- can you briefly explain where/when we run this test? Also, did you run this manually?

(I trust that you did all the right things here; this is just a good opportunity for me to learn what we do/have)

Member Author

I don't think that this triggered the e2e tests, and if it did, I think it probably failed. We definitely need to improve the visibility / traceability of the tests.

One issue is that we trigger the tests once the images are built, but this does not seem to trigger a "checks" entry at a PR level.

@@ -165,3 +166,181 @@ func TestFactoryMethod(t *testing.T) {
})
}
}

func TestNewSpecModifier(t *testing.T) {
@jgehrcke jgehrcke Feb 6, 2025

Did this run as part of the CI checks on this PR?

Maybe in https://github.com/NVIDIA/nvidia-container-toolkit/actions/runs/13163645114/job/36738412538?pr=894#step:5:80?

I still need to get used to interpreting this kind of test runner log. I see ok github.com/NVIDIA/nvidia-container-toolkit/internal/runtime 0.059s coverage: 39.6% of statements, but I see no mention of TestNewSpecModifier, or of any of the strings below, such as "csv mode removes nvidia-container-runtime-hook".

Member Author

We don't currently have verbose output enabled for the tests. Running locally with the diff:

diff --git a/Makefile b/Makefile
index be9da4bc..d2cf9b18 100644
--- a/Makefile
+++ b/Makefile
@@ -103,7 +103,7 @@ licenses:
 
 COVERAGE_FILE := coverage.out
 test: build cmds
-	go test -coverprofile=$(COVERAGE_FILE) $(MODULE)/...
+	go test -v -coverprofile=$(COVERAGE_FILE) $(MODULE)/...
 
 coverage: test
 	cat $(COVERAGE_FILE) | grep -v "_mock.go" > $(COVERAGE_FILE).no-mocks
$ make test > test-output.txt
$ grep TestNewSpecModifier test-output.txt
=== RUN   TestNewSpecModifier
=== RUN   TestNewSpecModifier/csv_mode_removes_nvidia-container-runtime-hook
=== RUN   TestNewSpecModifier/csv_mode_removes_nvidia-container-toolkit
=== RUN   TestNewSpecModifier/cdi_mode_removes_nvidia-container-runtime-hook
=== RUN   TestNewSpecModifier/cdi_mode_removes_nvidia-container-toolkit
=== RUN   TestNewSpecModifier/legacy_mode_keeps_nvidia-container-runtime-hook
=== RUN   TestNewSpecModifier/legacy_mode_keeps_nvidia-container-toolkit
--- PASS: TestNewSpecModifier (0.00s)
    --- PASS: TestNewSpecModifier/csv_mode_removes_nvidia-container-runtime-hook (0.00s)
    --- PASS: TestNewSpecModifier/csv_mode_removes_nvidia-container-toolkit (0.00s)
    --- PASS: TestNewSpecModifier/cdi_mode_removes_nvidia-container-runtime-hook (0.00s)
    --- PASS: TestNewSpecModifier/cdi_mode_removes_nvidia-container-toolkit (0.00s)
    --- PASS: TestNewSpecModifier/legacy_mode_keeps_nvidia-container-runtime-hook (0.00s)
    --- PASS: TestNewSpecModifier/legacy_mode_keeps_nvidia-container-toolkit (0.00s)
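
Note that the subtest descriptions contain spaces, while the verbose log shows underscores, so searching a log for the original string finds nothing. The following is a minimal sketch of that mapping, assuming only whitespace needs rewriting; the helper name `subtestLogName` is hypothetical, and the real `testing` package additionally escapes non-printable characters and de-duplicates repeated names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// subtestLogName approximates how a subtest description appears in
// `go test -v` output: whitespace becomes '_' and the result is joined
// to the parent test name with '/'. This is a simplification of what the
// testing package does.
func subtestLogName(parent, description string) string {
	rewritten := strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return '_'
		}
		return r
	}, description)
	return parent + "/" + rewritten
}

func main() {
	name := subtestLogName("TestNewSpecModifier", "csv mode removes nvidia-container-runtime-hook")
	fmt.Println(name)
	// prints "TestNewSpecModifier/csv_mode_removes_nvidia-container-runtime-hook"
}
```

The resulting name is the form to grep for in the verbose output.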

@elezar elezar merged commit 1f2232f into NVIDIA:main Feb 10, 2025
10 checks passed