[codex] fix gpu toolkit cdi preflight detection#5529
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughFixes preflight skipping ChangesGPU Detection and CDI Opt-Out Fix
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
c36dfb5 to
f246231
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR improves host GPU preflight detection and aligns the CDI guard behavior with “explicit opt-out” semantics, including a new test case for PCI-based NVIDIA detection.
Changes:
- Extend NVIDIA GPU detection to fall back to
lspcion Linux whennvidia-smiis unavailable. - Move WSL detection earlier in
assessHostand pass WSL/platform context into the GPU probe. - Update preflight handler logic + tests to keep the CDI guard active on resume unless GPU was explicitly disabled.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/lib/onboard/preflight.ts | Adds PCI fallback for NVIDIA detection and computes isWslHost earlier for downstream checks. |
| src/lib/onboard/preflight-cdi.test.ts | Adds coverage for toolkit bootstrap planning when PCI detects NVIDIA but nvidia-smi/nvidia-ctk are missing. |
| src/lib/onboard/machine/handlers/preflight.ts | Refines “opted out” logic on resume to reflect explicit disable/flags. |
| src/lib/onboard/machine/handlers/preflight.test.ts | Updates expectations + adds regression coverage for resume behavior with auto GPU mode. |
| src/lib/onboard.ts | Renames and refines the “explicit opt-out” passthrough flag used by the CDI guard. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f246231 to
8c0cc12
Compare
|
@coderabbitai resume |
|
✅ Action performedReview finished.
|
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/onboard.ts (1)
1626-1654: Run the recommended onboard E2E matrix for this behavioral guard change.Given this touches preflight/CDI gating logic in
src/lib/onboard.ts, please run the listed onboarding E2E jobs (especiallycloud-e2e,sandbox-operations-e2e,channels-stop-start-e2e, andissue-3600-gpu-proof-optional-e2e) before merge.As per coding guidelines, changes in
src/lib/onboard.tsshould be validated with the specified nightly E2E job set.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` around lines 1626 - 1654, Before merging the changes to the preflight function in src/lib/onboard.ts which modify the CDI/GPU gating logic, run the complete E2E test matrix including the cloud-e2e, sandbox-operations-e2e, channels-stop-start-e2e, and issue-3600-gpu-proof-optional-e2e jobs to validate the behavioral guard changes to the onboarding flow. This testing is required per coding guidelines for any modifications to src/lib/onboard.ts that touch preflight validation logic.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/onboard.ts`:
- Around line 1626-1654: Before merging the changes to the preflight function in
src/lib/onboard.ts which modify the CDI/GPU gating logic, run the complete E2E
test matrix including the cloud-e2e, sandbox-operations-e2e,
channels-stop-start-e2e, and issue-3600-gpu-proof-optional-e2e jobs to validate
the behavioral guard changes to the onboarding flow. This testing is required
per coding guidelines for any modifications to src/lib/onboard.ts that touch
preflight validation logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f97d1410-4b17-4826-b58c-b6a81cf5cc33
📒 Files selected for processing (5)
src/lib/onboard.tssrc/lib/onboard/machine/handlers/preflight.test.tssrc/lib/onboard/machine/handlers/preflight.tssrc/lib/onboard/preflight-cdi.test.tssrc/lib/onboard/preflight.ts
|
@coderabbitai summary |
✅ Action performedSummary regeneration triggered. |
bea9b95 to
40b779c
Compare
|
@coderabbitai full review |
✅ Action performedFull review finished. |
Summary
lspcifallback GPU detection whennvidia-smiis unavailable, so preflight still recognizes NVIDIA PCI GPU hardware.install_nvidia_container_toolkitremediation active for missing toolkit plus missing/invalid NVIDIA CDI spec.Validation
NODE_PATH=/Users/holim/code/NemoClaw/node_modules /Users/holim/code/NemoClaw/node_modules/.bin/vitest run src/lib/onboard/preflight-cdi.test.ts src/lib/onboard/machine/handlers/preflight.test.tsNODE_PATH=/Users/holim/code/NemoClaw/node_modules /Users/holim/code/NemoClaw/node_modules/.bin/tsc -p tsconfig.src.json --noEmitgit diff --checkNotes: #5489 currently has an assignee, so this PR is intentionally limited to the missing preflight detection/remediation path.
Fixes #5489
Summary by CodeRabbit
Release Notes
Bug Fixes
nvidia-smifirst and falling back tolspciwhen needed.noGpuand sandbox GPU mode0) and updated CDI GPU spec guarding accordingly.dockerwhen it was previously unknown.Tests
nvidia-ctkis unavailable and expected blocking remediation/actions are verified.