Changes from 7 commits
29 changes: 29 additions & 0 deletions .github/workflows/claude.yaml
@@ -0,0 +1,29 @@
+name: Claude
+
+on:
+  issue_comment:
+    types: [created]
+  pull_request_review_comment:
+    types: [created]
+  issues:
+    types: [opened]
+  pull_request_review:
+    types: [submitted]
+
+jobs:
+  claude:
+    if: |
+      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
+      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
+      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
+      (github.event_name == 'issues' && contains(github.event.issue.body, '@claude'))
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+      pull-requests: write
+      issues: write
+      id-token: write
+    steps:
+      - uses: anthropics/claude-code-action@v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
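
Reading aid for the job's `if:` guard in this workflow (illustrative only, not part of the change): the job runs only when the text attached to the triggering event mentions `@claude`. The Python sketch below restates that condition. `should_run`, `BODY_FIELD`, and the dict-shaped `event` payload are hypothetical names chosen for the example; the lowercase comparison mirrors the documented case-insensitivity of GitHub's `contains()` expression function.

```python
# Hypothetical restatement of the workflow's `if:` guard; not part of the PR.
MENTION = "@claude"

# Where each event type's text lives in the payload, as listed in the workflow.
BODY_FIELD = {
    "issue_comment": ("comment", "body"),
    "pull_request_review_comment": ("comment", "body"),
    "pull_request_review": ("review", "body"),
    "issues": ("issue", "body"),
}


def should_run(event_name: str, event: dict) -> bool:
    """Return True when the event's text mentions @claude."""
    path = BODY_FIELD.get(event_name)
    if path is None:
        # Event types the workflow does not list never start the job.
        return False
    obj, field = path
    body = (event.get(obj) or {}).get(field) or ""
    # GitHub's contains() ignores case, so compare lowercased text.
    return MENTION in body.lower()
```

For `issues` events only the issue body is inspected, so a mention that appears solely in the issue title would not start the job.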
4 changes: 2 additions & 2 deletions .github/workflows/config/.secrets.baseline
@@ -131,7 +131,7 @@
         "filename": "docs/testing.md",
         "hashed_secret": "3f3b8ce7c4fec509b2b74ee3e1d98170278ffe4b",
         "is_verified": false,
-        "line_number": 116
+        "line_number": 113
       }
     ],
     "tests/unit/test_version_check.py": [
@@ -144,5 +144,5 @@
       }
     ]
   },
-  "generated_at": "2026-02-24T15:55:12Z"
+  "generated_at": "2026-04-02T18:53:27Z"
 }
2 changes: 1 addition & 1 deletion 3rdparty/Megatron-LM-workspace/setup.py
@@ -51,7 +51,7 @@
     # TODO(https://github.com/NVIDIA-NeMo/RL/issues/2111): upgrade to core_cu13 when we move to CUDA 13 base container
     "transformer-engine[pytorch,core_cu12]",
     # VCS dependency - must match pyproject.toml [tool.uv.sources]
-    "nvidia-resiliency-ext @ git+https://github.com/NVIDIA/nvidia-resiliency-ext.git@63154570cea17f8805a7fd15cc3b8cc2919ba575",
+    "nvidia-resiliency-ext @ git+https://github.com/NVIDIA/nvidia-resiliency-ext.git@15a851565f06e279f18c3ac5e1296b1bcb63be24",
     "tqdm",
     "einops~=0.8",
     "tensorstore~=0.1,!=0.1.46,!=0.1.72",
2 changes: 1 addition & 1 deletion docs/about/algorithms/index.md
@@ -7,12 +7,12 @@ NeMo RL supports multiple training algorithms for post-training large language m
 | Algorithms | Single Node | Multi-node |
 |------------|-------------|------------|
 | [GRPO](grpo.md) | [GRPO Single Node](grpo.md#grpo-single-node) | [GRPO Multi-node](grpo.md#grpo-multi-node): [GRPO Qwen2.5-32B](grpo.md#grpo-qwen25-32b), [GRPO Multi-Turn](grpo.md#grpo-multi-turn) |
-|DAPO (dapo.md)| similar to GRPO example| similar to GRPO example|
+| [DAPO](dapo.md) | [DAPO Single Node](dapo.md#dapo-single-node) | [DAPO Multi-node](dapo.md#dapo-multi-node) |
 | [On-policy Distillation](on-policy-distillation.md) | [Distillation Single Node](on-policy-distillation.md#on-policy-distillation-single-node) | [Distillation Multi-node](on-policy-distillation.md#on-policy-distillation-multi-node) |
 | [Supervised Fine-Tuning (SFT)](sft.md) | [SFT Single Node](sft.md#sft-single-node) | [SFT Multi-node](sft.md#sft-multi-node) |
 | [DPO](dpo.md) | [DPO Single Node](dpo.md#dpo-single-node) | [DPO Multi-node](dpo.md#dpo-multi-node) |
 | [RM](rm.md) | [RM Single Node](rm.md#rm-single-node) | [RM Multi-node](rm.md#rm-multi-node) |

 On-policy distillation is also supported in the PyTorch DTensor path.
 ```{toctree}
 :maxdepth: 2
2 changes: 1 addition & 1 deletion docs/about/model-support.md
@@ -2,7 +2,7 @@

 ## Broad coverage for 🤗Hugging Face models via [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel)

-NeMo-RL support 🤗Hugging Face models from the following classes
+NeMo-RL supports 🤗Hugging Face models from the following classes
 - LLMs ([AutoModelForCausalLM](https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModelForCausalLM))
 - VLMs ([AutoModelForImageTextToText](https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModelForImageTextToText))

8 changes: 4 additions & 4 deletions docs/about/performance-summary.md
@@ -1,7 +1,7 @@

 # Performance

-As part of the NVIDIA NeMo Framework, NeMo RL, provides optimal performance for reinforcement learning on generative AI models by incorporating the latest optimizations - such as refit optimizations, mixed-precision training, and off-policy training.
+As part of the NVIDIA NeMo Framework, NeMo RL provides optimal performance for reinforcement learning on generative AI models by incorporating the latest optimizations - such as refit optimizations, mixed-precision training, and off-policy training.

 This page provides performance benchmarks for LLMs and VLMs using NeMo RL across different GPU systems and configurations. The recipes to reproduce these runs, in yaml file form, can be found under [this folder](https://github.com/NVIDIA-NeMo/RL/tree/r0.5.0/examples/configs/recipes/llm/performance).

@@ -16,13 +16,13 @@ This page provides performance benchmarks for LLMs and VLMs using NeMo RL across
 - **EP**: Expert Parallel Size
 - **T-**: Training related
 - **G-**: Generation related
-- **Training backend**: NeMo RL have two training backends: Megatron and PyTorch DTensor. This performance summary currently only shows number from Megatron backend.
+- **Training backend**: NeMo RL has two training backends: Megatron and PyTorch DTensor. This performance summary currently only shows numbers from the Megatron backend.

 ## Performance Metrics

 Since reinforcement learning consists of training, generation and transition between the two, performance measurement also reflects this. Specifically, we track the following metrics:
 - **Step time**: Time for each step, which includes training, generation, policy logprobs, and refit time.
-- **Tokens/sec/GPU**: The rate at the tokens are processed by a stage (such as training, generation, or refitting) on a single GPU:
+- **Tokens/sec/GPU**: The rate at which the tokens are processed by a stage (such as training, generation, or refitting) on a single GPU:

 $$
 \text{Tokens/sec/GPU} = \frac{\text{Total Tokens Processed}}{\text{Time for Stage} \times \text{Number of GPUs}}
@@ -98,4 +98,4 @@ The performance data includes:
 Note:

 * All Mixture-of-expert (MoE) model training uses token drop-less.
-* The following metrics are extracted from the average of 5 steps: G-Average Seq len, Tokens/sec/gpu, Total Step time(s). Because of the averaging, the numbers in table does not completely match the equation stated in Performance Metrics above but the difference is small.
+* The following metrics are extracted from the average of 5 steps: G-Average Seq len, Tokens/sec/gpu, Total Step time(s). Because of the averaging, the numbers in the table do not completely match the equation stated in Performance Metrics above but the difference is small.
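
Reading aid for the Tokens/sec/GPU definition in the Performance Metrics hunk (illustrative only, not part of the change): the sketch below evaluates the formula and then shows one way that averaging over steps can make a table value differ from the formula applied to averaged inputs. All numbers are made up, `tokens_per_sec_per_gpu` is a hypothetical helper, and how the benchmark tables actually aggregate their 5 steps is an assumption here, not something the docs spell out.

```python
def tokens_per_sec_per_gpu(total_tokens: float, stage_seconds: float, num_gpus: int) -> float:
    """Tokens/sec/GPU = total tokens / (time for stage * number of GPUs)."""
    if stage_seconds <= 0 or num_gpus <= 0:
        raise ValueError("stage_seconds and num_gpus must be positive")
    return total_tokens / (stage_seconds * num_gpus)


# Made-up single-stage numbers, not taken from the benchmark tables.
rate = tokens_per_sec_per_gpu(total_tokens=1_600_000, stage_seconds=50.0, num_gpus=16)

# Two made-up steps as (tokens, seconds) on 8 GPUs.
steps = [(800_000, 40.0), (1_200_000, 80.0)]
num_gpus = 8

# Option A (assumed): average the per-step rates.
per_step = [tokens_per_sec_per_gpu(t, s, num_gpus) for t, s in steps]
mean_of_rates = sum(per_step) / len(per_step)

# Option B (assumed): apply the formula to the averaged tokens and time.
mean_tokens = sum(t for t, _ in steps) / len(steps)
mean_seconds = sum(s for _, s in steps) / len(steps)
rate_of_means = tokens_per_sec_per_gpu(mean_tokens, mean_seconds, num_gpus)

print(rate, mean_of_rates, round(rate_of_means, 2))
```

The two aggregation orders agree only when every step takes the same time, which is one plausible source of the small mismatch the note describes.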
2 changes: 1 addition & 1 deletion docs/debugging.md
@@ -33,7 +33,7 @@ The first node is always the head node, so we need to port forward the dashboard
 # on the login node is likely taken by someone else.
 ssh -L $LOCAL_PORT:localhost:$DASHBOARD_PORT -N node-12

-# Example chosing a port other than 8265 for the LOCAL_PORT
+# Example choosing a port other than 8265 for the LOCAL_PORT
 ssh -L 52640:localhost:8265 -N node-12
 ```

2 changes: 1 addition & 1 deletion docs/fp8.md
@@ -93,4 +93,4 @@ The above results are from Llama-3.1-8B-Instruct GRPO experiments. You can run t
 * For FP8: `examples/configs/grpo_math_8B_megatron_fp8.yaml`

 In the experiment in this figure, enabling FP8 rollout and training gives 15%-25% decrease in step time, and the validation accuracy curves match up to 1000 steps.
-Efforts are ongoing to performs longer runs and further optimize performance.
+Efforts are ongoing to perform longer runs and further optimize performance.
2 changes: 1 addition & 1 deletion docs/nsys-profiling.md
@@ -22,7 +22,7 @@ export NRL_NSYS_WORKER_PATTERNS="*policy*,*vllm*"

 Set the `NRL_NSYS_PROFILE_STEP_RANGE` environment variable to control which training steps the profiler captures. Its
 format is colon separated integers representing `start:stop`, where `start` is inclusive and `stop` is exclusive
-(same as slice syntax `arr[start:stop]`). Note that the `start` is 1-index, so `NRL_NSYS_PROFILE_STEP_RANGE=0:10` would error.
+(same as slice syntax `arr[start:stop]`). Note that the `start` is 1-indexed, so `NRL_NSYS_PROFILE_STEP_RANGE=0:10` would error.
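
Reading aid for the `start:stop` semantics described in this paragraph (illustrative only, not part of the change): the sketch below parses a value in that format and lists the steps it would cover. `parse_step_range` is a hypothetical helper, not NeMo RL's actual parser, and the rule that `stop` must exceed `start` is an assumption added for the example.

```python
def parse_step_range(value: str) -> range:
    """Parse 'start:stop' where start is inclusive, stop is exclusive, steps are 1-indexed."""
    start_text, sep, stop_text = value.partition(":")
    if not sep:
        raise ValueError(f"expected 'start:stop', got {value!r}")
    start, stop = int(start_text), int(stop_text)
    if start < 1:
        # Steps are 1-indexed, so a value such as '0:10' is rejected.
        raise ValueError("start is 1-indexed, so it must be >= 1")
    if stop <= start:
        # Assumption for this sketch: an empty range is treated as a mistake.
        raise ValueError("stop must be greater than start")
    return range(start, stop)


steps = list(parse_step_range("3:5"))
print(steps)
```

With the value `3:5`, the covered steps are 3 and 4, because `stop` is exclusive.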

 ```bash
 export NRL_NSYS_PROFILE_STEP_RANGE=3:5
8 changes: 1 addition & 7 deletions docs/testing.md
@@ -55,7 +55,6 @@ Limitations and tips:
 - The remote-aware selection uses a conservative static import map (no dynamic import resolution). If a test loads code dynamically that isn’t visible via imports, you may need to run it explicitly once to seed the map.
 - The helper is test-only and does not alter library behavior. It activates automatically when you pass `--testmon`.

-Refreshing remote-selection artifacts
 ### Refreshing Remote-Selection Artifacts
 If you change test layout or significantly refactor imports, the remote-selection artifacts may become stale.
 To rebuild them, delete the following files at the repo root and re-run with `--testmon` to seed again:
@@ -68,9 +67,7 @@ rm .nrl_remote_map.json .nrl_remote_state.json

 ### Run Unit Tests in a Hermetic Environment

-For environments lacking necessary dependencies (e.g., `gcc`, `nvcc`)
-or where environmental configuration may be problematic, tests can be run
-in Docker with this script:
+For environments lacking necessary dependencies (e.g., `gcc`, `nvcc`) or where environmental configuration may be problematic, tests can be run in Docker with this script:

 ```sh
 CONTAINER=... bash tests/run_unit_in_docker.sh
@@ -155,7 +152,6 @@ Functional tests are located under `tests/functional/`.
 uv run bash tests/functional/sft.sh
 ```

-At the end of each functional test, the metric checks will be printed as well as
 At the end of each functional test, the metric checks will be printed as well as whether they pass or fail. Here is an example:

 ```text
@@ -169,8 +165,6 @@ At the end of each functional test, the metric checks will be printed as well as

 ### Run Functional Tests in a Hermetic Environment

-For environments lacking necessary dependencies (e.g., `gcc`, `nvcc`)
-or where environmental configuration may be problematic, tests can be run
 For environments lacking necessary dependencies (e.g., `gcc`, `nvcc`) or where environmental configuration may be problematic, tests can be run in Docker with this script:

 ```sh
14 changes: 13 additions & 1 deletion nemo_rl/algorithms/grpo.py
@@ -2515,9 +2515,21 @@ def async_grpo_train(
         },
     }

+    # Register trajectory collector as a named Ray actor so the rlix pipeline can
+    # look it up for set_weight_version calls (spec: nemorl-port-plan.md lines 490, 538, 603).
+    _rlix_pipeline_id = os.environ.get("PIPELINE_ID", "")
+    _rlix_ray_namespace = os.environ.get("ROLL_RAY_NAMESPACE", "")
+    _tc_name = (
+        f"rlix:trajectory_collector:{_rlix_pipeline_id}"
+        if _rlix_pipeline_id
+        else None
+    )
+    _tc_namespace = _rlix_ray_namespace if _rlix_ray_namespace else None
+
     # Initialize trajectory collector with synchronized collection
     trajectory_collector = AsyncTrajectoryCollector.options(
-        runtime_env=_tc_runtime_env
+        runtime_env=_tc_runtime_env,
+        **({"name": _tc_name, "namespace": _tc_namespace} if _tc_name else {}),
     ).remote(
         policy_generation=policy_generation,
         tokenizer=tokenizer,
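
Reading aid for the naming logic added in this hunk (illustrative only, not part of the change): the sketch below isolates how the extra `.options(...)` keyword arguments are derived from the environment, so the behaviour can be checked without starting Ray. `collector_actor_options` and its `env` parameter are hypothetical names; the environment variable names and the actor name format are taken from the diff.

```python
import os
from typing import Mapping, Optional


def collector_actor_options(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return extra kwargs for `.options(...)`; empty when PIPELINE_ID is unset."""
    env = os.environ if env is None else env
    pipeline_id = env.get("PIPELINE_ID", "")
    if not pipeline_id:
        # No pipeline id: the actor stays anonymous, as before the change.
        return {}
    # An empty namespace string is normalised to None, matching the diff.
    namespace = env.get("ROLL_RAY_NAMESPACE", "") or None
    return {
        "name": f"rlix:trajectory_collector:{pipeline_id}",
        "namespace": namespace,
    }


print(collector_actor_options({"PIPELINE_ID": "p1", "ROLL_RAY_NAMESPACE": "roll"}))
```

On the consuming side, a named actor can be retrieved with `ray.get_actor(name, namespace=...)`; whether the rlix pipeline does the lookup exactly this way is an assumption, since that code is not shown in this diff.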