rlops · zhenyulincs · Apr 12, 2026 · Apr 13, 2026 · Apr 19, 2026 · Apr 23, 2026
diff --git a/.github/workflows/claude.yaml b/.github/workflows/claude.yaml
@@ -0,0 +1,29 @@
+name: Claude
+
+on:
+  issue_comment:
+    types: [created]
+  pull_request_review_comment:
+    types: [created]
+  issues:
+    types: [opened]
+  pull_request_review:
+    types: [submitted]
+
+jobs:
+  claude:
+    if: |
+      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
+      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
+      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
+      (github.event_name == 'issues' && contains(github.event.issue.body, '@claude'))
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+      pull-requests: write
+      issues: write
+      id-token: write
+    steps:
+      - uses: anthropics/claude-code-action@v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
diff --git a/.github/workflows/config/.secrets.baseline b/.github/workflows/config/.secrets.baseline
@@ -131,7 +131,7 @@
         "filename": "docs/testing.md",
         "hashed_secret": "3f3b8ce7c4fec509b2b74ee3e1d98170278ffe4b",
         "is_verified": false,
-        "line_number": 116
+        "line_number": 113
       }
     ],
     "tests/unit/test_version_check.py": [
@@ -144,5 +144,5 @@
       }
     ]
   },
-  "generated_at": "2026-02-24T15:55:12Z"
+  "generated_at": "2026-04-02T18:53:27Z"
 }
diff --git a/3rdparty/Megatron-LM-workspace/setup.py b/3rdparty/Megatron-LM-workspace/setup.py
@@ -51,7 +51,7 @@
     # TODO(https://github.com/NVIDIA-NeMo/RL/issues/2111): upgrade to core_cu13 when we move to CUDA 13 base container
     "transformer-engine[pytorch,core_cu12]",
     # VCS dependency - must match pyproject.toml [tool.uv.sources]
-    "nvidia-resiliency-ext @ git+https://github.com/NVIDIA/nvidia-resiliency-ext.git@63154570cea17f8805a7fd15cc3b8cc2919ba575",
+    "nvidia-resiliency-ext @ git+https://github.com/NVIDIA/nvidia-resiliency-ext.git@15a851565f06e279f18c3ac5e1296b1bcb63be24",
     "tqdm",
     "einops~=0.8",
     "tensorstore~=0.1,!=0.1.46,!=0.1.72",

diff --git a/docs/about/algorithms/index.md b/docs/about/algorithms/index.md
@@ -7,12 +7,12 @@ NeMo RL supports multiple training algorithms for post-training large language m
 | Algorithms | Single Node | Multi-node |
 |------------|-------------|------------|
 | [GRPO](grpo.md) | [GRPO Single Node](grpo.md#grpo-single-node) | [GRPO Multi-node](grpo.md#grpo-multi-node): [GRPO Qwen2.5-32B](grpo.md#grpo-qwen25-32b), [GRPO Multi-Turn](grpo.md#grpo-multi-turn) |
-|DAPO (dapo.md)| similar to GRPO example| similar to GRPO example|
 | [DAPO](dapo.md) | [DAPO Single Node](dapo.md#dapo-single-node) | [DAPO Multi-node](dapo.md#dapo-multi-node) |
 | [On-policy Distillation](on-policy-distillation.md) | [Distillation Single Node](on-policy-distillation.md#on-policy-distillation-single-node) | [Distillation Multi-node](on-policy-distillation.md#on-policy-distillation-multi-node) |
 | [Supervised Fine-Tuning (SFT)](sft.md) | [SFT Single Node](sft.md#sft-single-node) | [SFT Multi-node](sft.md#sft-multi-node) |
 | [DPO](dpo.md) | [DPO Single Node](dpo.md#dpo-single-node) | [DPO Multi-node](dpo.md#dpo-multi-node) |
 | [RM](rm.md) | [RM Single Node](rm.md#rm-single-node) | [RM Multi-node](rm.md#rm-multi-node) |
+
 On-policy distillation is also supported in the PyTorch DTensor path.
 ```{toctree}
 :maxdepth: 2

diff --git a/docs/about/model-support.md b/docs/about/model-support.md
@@ -2,7 +2,7 @@
 
 ## Broad coverage for 🤗Hugging Face models via [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel)
 
-NeMo-RL support 🤗Hugging Face models from the following classes
+NeMo RL supports 🤗Hugging Face models from the following classes:
 - LLMs ([AutoModelForCausalLM](https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModelForCausalLM))
 - VLMs ([AutoModelForImageTextToText](https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModelForImageTextToText))
 

diff --git a/docs/about/performance-summary.md b/docs/about/performance-summary.md
@@ -1,7 +1,7 @@
 
 # Performance
 
-As part of the NVIDIA NeMo Framework, NeMo RL, provides optimal performance for reinforcement learning on generative AI models by incorporating the latest optimizations - such as refit optimizations, mixed-precision training, and off-policy training.
+As part of the NVIDIA NeMo Framework, NeMo RL provides optimal performance for reinforcement learning on generative AI models by incorporating the latest optimizations, such as refit optimizations, mixed-precision training, and off-policy training.
 
 This page provides performance benchmarks for LLMs and VLMs using NeMo RL across different GPU systems and configurations. The recipes to reproduce these runs, in yaml file form, can be found under [this folder](https://github.com/NVIDIA-NeMo/RL/tree/r0.5.0/examples/configs/recipes/llm/performance).
 
@@ -16,13 +16,13 @@ This page provides performance benchmarks for LLMs and VLMs using NeMo RL across
 - **EP**: Expert Parallel Size
 - **T-**: Training related
 - **G-**: Generation related
-- **Training backend**: NeMo RL have two training backends: Megatron and PyTorch DTensor. This performance summary currently only shows number from Megatron backend.
+- **Training backend**: NeMo RL has two training backends: Megatron and PyTorch DTensor. This performance summary currently only shows numbers from the Megatron backend.
 
 ## Performance Metrics
 
 Since reinforcement learning consists of training, generation and transition between the two, performance measurement also reflects this. Specifically, we track the following metrics:
 - **Step time**: Time for each step, which includes training, generation, policy logprobs, and refit time.
-- **Tokens/sec/GPU**: The rate at the tokens are processed by a stage (such as training, generation, or refitting) on a single GPU:
+- **Tokens/sec/GPU**: The rate at which tokens are processed by a stage (such as training, generation, or refitting) on a single GPU:
 
     $$
     \text{Tokens/sec/GPU} = \frac{\text{Total Tokens Processed}}{\text{Time for Stage} \times \text{Number of GPUs}}
@@ -98,4 +98,4 @@ The performance data includes:
 Note:
 
 * All Mixture-of-expert (MoE) model training uses token drop-less. 
-* The following metrics are extracted from the average of 5 steps: G-Average Seq len, Tokens/sec/gpu, Total Step time(s). Because of the averaging, the numbers in table does not completely match the equation stated in Performance Metrics above but the difference is small.
+* The following metrics are extracted from the average of 5 steps: G-Average Seq len, Tokens/sec/GPU, Total Step time(s). Because of the averaging, the numbers in the table do not completely match the equation stated in Performance Metrics above, but the difference is small.
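Reviewer note on the Tokens/sec/GPU definition in docs/about/performance-summary.md: the formula can be checked with a few lines of Python. This is a minimal sketch only; the function name and the sample numbers are hypothetical and are not taken from NeMo RL or from the benchmark tables.

```python
def tokens_per_sec_per_gpu(total_tokens: int, stage_time_s: float, num_gpus: int) -> float:
    """Tokens/sec/GPU = total tokens / (time for stage * number of GPUs)."""
    if stage_time_s <= 0 or num_gpus <= 0:
        raise ValueError("stage_time_s and num_gpus must be positive")
    return total_tokens / (stage_time_s * num_gpus)


# Made-up numbers: 1,000,000 tokens processed in 50 s on 8 GPUs.
rate = tokens_per_sec_per_gpu(1_000_000, 50.0, 8)
print(rate)
```

Because the published tables average each metric over 5 steps, recomputing the rate from the averaged step time and token count will generally differ slightly from the averaged per-step rate, which is the small mismatch the note in the doc describes.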
diff --git a/docs/debugging.md b/docs/debugging.md
@@ -33,7 +33,7 @@ The first node is always the head node, so we need to port forward the dashboard
 #   on the login node is likely taken by someone else.
 ssh -L $LOCAL_PORT:localhost:$DASHBOARD_PORT -N node-12
 
-# Example chosing a port other than 8265 for the LOCAL_PORT
+# Example choosing a port other than 8265 for the LOCAL_PORT
 ssh -L 52640:localhost:8265 -N node-12
 ```
 

diff --git a/docs/fp8.md b/docs/fp8.md
@@ -93,4 +93,4 @@ The above results are from Llama-3.1-8B-Instruct GRPO experiments. You can run t
 * For FP8: `examples/configs/grpo_math_8B_megatron_fp8.yaml`
 
 In the experiment in this figure, enabling FP8 rollout and training gives 15%-25% decrease in step time, and the validation accuracy curves match up to 1000 steps.
-Efforts are ongoing to performs longer runs and further optimize performance.
+Efforts are ongoing to perform longer runs and further optimize performance.
diff --git a/docs/nsys-profiling.md b/docs/nsys-profiling.md
@@ -22,7 +22,7 @@ export NRL_NSYS_WORKER_PATTERNS="*policy*,*vllm*"
 
 Set the `NRL_NSYS_PROFILE_STEP_RANGE` environment variable to control which training steps the profiler captures. Its
 format is colon separated integers representing `start:stop`, where `start` is inclusive and `stop` is exclusive
-(same as slice syntax `arr[start:stop]`). Note that the `start` is 1-index, so `NRL_NSYS_PROFILE_STEP_RANGE=0:10` would error.
+(same as slice syntax `arr[start:stop]`). Note that the `start` is 1-indexed, so `NRL_NSYS_PROFILE_STEP_RANGE=0:10` would error.
 
 ```bash
 export NRL_NSYS_PROFILE_STEP_RANGE=3:5

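Reviewer note on `NRL_NSYS_PROFILE_STEP_RANGE` in docs/nsys-profiling.md: the `start:stop` semantics (inclusive start, exclusive stop, 1-indexed start) can be sketched as below. `parse_step_range` is a hypothetical helper written for illustration; it is not NeMo RL's actual parser, and the real error type and message may differ.

```python
def parse_step_range(value: str) -> range:
    """Parse 'start:stop' into the range of profiled steps (hypothetical helper)."""
    parts = value.split(":")
    if len(parts) != 2:
        raise ValueError(f"expected 'start:stop', got {value!r}")
    start, stop = int(parts[0]), int(parts[1])
    if start < 1:
        # Steps are 1-indexed, so a start of 0 is rejected.
        raise ValueError("start is 1-indexed and must be >= 1")
    # start is inclusive and stop is exclusive, like arr[start:stop].
    return range(start, stop)


steps = list(parse_step_range("3:5"))
print(steps)
```

With `3:5`, the profiled steps under this reading are 3 and 4; step 5 is excluded.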
diff --git a/docs/testing.md b/docs/testing.md
@@ -55,7 +55,6 @@ Limitations and tips:
 - The remote-aware selection uses a conservative static import map (no dynamic import resolution). If a test loads code dynamically that isn’t visible via imports, you may need to run it explicitly once to seed the map.
 - The helper is test-only and does not alter library behavior. It activates automatically when you pass `--testmon`.
 
-Refreshing remote-selection artifacts
 ### Refreshing Remote-Selection Artifacts
 If you change test layout or significantly refactor imports, the remote-selection artifacts may become stale.
 To rebuild them, delete the following files at the repo root and re-run with `--testmon` to seed again:
@@ -68,9 +67,7 @@ rm .nrl_remote_map.json .nrl_remote_state.json
 
 ### Run Unit Tests in a Hermetic Environment
 
-For environments lacking necessary dependencies (e.g., `gcc`, `nvcc`)
-or where environmental configuration may be problematic, tests can be run
-in Docker with this script:
+For environments lacking necessary dependencies (e.g., `gcc`, `nvcc`) or where environmental configuration may be problematic, tests can be run in Docker with this script:
 
 ```sh
 CONTAINER=... bash tests/run_unit_in_docker.sh
@@ -155,7 +152,6 @@ Functional tests are located under `tests/functional/`.
 uv run bash tests/functional/sft.sh
 ```
 
-At the end of each functional test, the metric checks will be printed as well as
 At the end of each functional test, the metric checks will be printed as well as whether they pass or fail. Here is an example:
 
 ```text
@@ -169,8 +165,6 @@ At the end of each functional test, the metric checks will be printed as well as
 
 ### Run Functional Tests in a Hermetic Environment
 
-For environments lacking necessary dependencies (e.g., `gcc`, `nvcc`)
-or where environmental configuration may be problematic, tests can be run
 For environments lacking necessary dependencies (e.g., `gcc`, `nvcc`) or where environmental configuration may be problematic, tests can be run in Docker with this script:
 
 ```sh

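Reviewer note on the nemo_rl/algorithms/grpo.py hunk that follows: it registers the trajectory collector as a named Ray actor when `PIPELINE_ID` is set. A consumer could rebuild the same name and fetch the handle roughly as sketched here. The helper names `collector_actor_name` and `lookup_collector` are hypothetical; only the environment variables and the `rlix:trajectory_collector:` prefix come from the diff, and `ray.get_actor` is the standard Ray call for named-actor lookup. The lookup assumes the caller is already connected to the same Ray cluster.

```python
import os
from typing import Mapping, Optional, Tuple


def collector_actor_name(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Rebuild the (name, namespace) pair the same way the grpo.py hunk does."""
    env = os.environ if env is None else env
    pipeline_id = env.get("PIPELINE_ID", "")
    namespace = env.get("ROLL_RAY_NAMESPACE", "") or None
    name = f"rlix:trajectory_collector:{pipeline_id}" if pipeline_id else None
    return name, namespace


def lookup_collector(env: Optional[Mapping[str, str]] = None):
    """Return the actor handle; raises if the collector was not registered by name."""
    import ray  # imported lazily so the name helper works without Ray installed

    name, namespace = collector_actor_name(env)
    if name is None:
        raise RuntimeError("PIPELINE_ID is not set, so the collector has no name")
    # namespace=None makes Ray search the caller's current namespace.
    return ray.get_actor(name, namespace=namespace)
```

Keeping the name construction in one helper avoids the producer and consumer drifting apart on the string format.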
diff --git a/nemo_rl/algorithms/grpo.py b/nemo_rl/algorithms/grpo.py
@@ -2515,9 +2515,21 @@ def async_grpo_train(
         },
     }
 
+    # Register trajectory collector as a named Ray actor so the rlix pipeline can
+    # look it up for set_weight_version calls (spec: nemorl-port-plan.md lines 490, 538, 603).
+    _rlix_pipeline_id = os.environ.get("PIPELINE_ID", "")
+    _rlix_ray_namespace = os.environ.get("ROLL_RAY_NAMESPACE", "")
+    _tc_name = (
+        f"rlix:trajectory_collector:{_rlix_pipeline_id}"
+        if _rlix_pipeline_id
+        else None
+    )
+    _tc_namespace = _rlix_ray_namespace or None
+
     # Initialize trajectory collector with synchronized collection
     trajectory_collector = AsyncTrajectoryCollector.options(
-        runtime_env=_tc_runtime_env
+        runtime_env=_tc_runtime_env,
+        **({"name": _tc_name, "namespace": _tc_namespace} if _tc_name else {}),
     ).remote(
         policy_generation=policy_generation,
         tokenizer=tokenizer,