

hemildesai (Contributor) commented May 21, 2025

What does this PR do?

  • Create one venv per node for the worker group using a Ray task
  • Set CUDA_VISIBLE_DEVICES based on the bundle index
  • Always reuse external clusters
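The first two bullets can be illustrated with a small, Ray-free sketch. It assumes a hypothetical `Bundle` record per placement-group bundle and assumes bundles are packed contiguously per node, so bundle `i` maps to local GPU `i % gpus_per_node`; the PR itself may compute this differently. The sketch shows deduplicating venv creation to one task per node and deriving a `CUDA_VISIBLE_DEVICES` value from each bundle index.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    """Hypothetical record for one placement-group bundle (one GPU worker)."""

    index: int    # global bundle index within the placement group
    node_id: str  # ID of the node the bundle was scheduled on


def plan_worker_group(
    bundles: list[Bundle], gpus_per_node: int
) -> tuple[list[str], dict[int, str]]:
    """Return (nodes that need a venv, bundle index -> CUDA_VISIBLE_DEVICES)."""
    # Deduplicate nodes while preserving first-seen order: one venv-creation
    # task per node instead of one per worker.
    nodes = list(dict.fromkeys(b.node_id for b in bundles))
    # Assumption: bundles are packed contiguously, so bundle i sits on
    # local GPU i % gpus_per_node of its node.
    visible = {b.index: str(b.index % gpus_per_node) for b in bundles}
    return nodes, visible


# Example: 8 single-GPU bundles spread over 2 nodes with 4 GPUs each.
bundles = [Bundle(i, f"node-{i // 4}") for i in range(8)]
nodes, visible = plan_worker_group(bundles, gpus_per_node=4)
print(nodes)       # ['node-0', 'node-1']
print(visible[5])  # 1
```

With Ray, the per-node venv step would presumably run as one remote task per entry in `nodes`, each pinned to its node (for instance with `ray.util.scheduling_strategies.NodeAffinitySchedulingStrategy`) before the workers start, so every worker on a node reuses the same prefetched environment.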


The 8B run matches expectations.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
```python
# Add a code snippet demonstrating how to use this
```

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

hemildesai force-pushed the hemil/k8s-changes branch from b249973 to 8e155f9 May 22, 2025 03:05

gwarmstrong left a comment

Tested with the updated NeMo-Run, and it works well.

terrykong linked an issue May 29, 2025 that may be closed by this pull request
hemildesai force-pushed the hemil/k8s-changes branch from 01286a2 to a9e3f0e June 4, 2025 01:59
github-actions bot added the documentation label Jun 4, 2025
hemildesai force-pushed the hemil/k8s-changes branch from f0ce5db to ed09205 June 4, 2025 02:01
hemildesai marked this pull request as ready for review June 4, 2025 02:04
hemildesai added the CI:L1 label Jun 4, 2025
hemildesai force-pushed the hemil/k8s-changes branch 2 times, most recently from 4e37337 to 2cd3f4d June 5, 2025 18:11
hemildesai added and removed the CI:L1 label Jun 5, 2025
hemildesai added the CI:L0 label Jun 5, 2025
terrykong previously approved these changes Jun 5, 2025
hemildesai added and removed the CI:L1 label Jun 5, 2025
hemildesai force-pushed the hemil/k8s-changes branch from 3f8de44 to 32ed15a June 6, 2025 01:19
hemildesai added and removed the CI:L1 label Jun 6, 2025
hemildesai removed the CI:L1 label Jun 9, 2025
terrykong added the CI:L1 label and removed the CI:L0 and CI:L1 labels Jun 10, 2025
hemildesai changed the title "fix: Changes to support ray job submit" to "fix: Changes to support ray job submit, prefetch venvs" Jun 10, 2025
hemildesai changed the title "fix: Changes to support ray job submit, prefetch venvs" to "fix: Changes to support ray job submit and prefetch venvs" Jun 10, 2025
hemildesai added and removed the CI:L1 label Jun 10, 2025
terrykong previously approved these changes Jun 10, 2025
terrykong enabled auto-merge June 10, 2025 21:29
hemildesai disabled auto-merge June 10, 2025 21:51
Signed-off-by: Hemil Desai <[email protected]>
hemildesai added and removed the CI:L1 label Jun 10, 2025
Signed-off-by: Hemil Desai <[email protected]>
terrykong added this pull request to the merge queue Jun 10, 2025
Merged via the queue into main with commit 51f8b26 Jun 11, 2025
13 of 14 checks passed
terrykong deleted the hemil/k8s-changes branch June 11, 2025 01:58
terrykong mentioned this pull request Jun 19, 2025
YzjiaoNvd pushed a commit to YzjiaoNvd/NeMo-RL that referenced this pull request Jul 14, 2025
KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025

Labels

  • CI:L1: Run doctests, unit tests, and functional tests
  • documentation: Improvements or additions to documentation


Development

Successfully merging this pull request may close these issues.

NeMo-Run Integration

4 participants