Merged
78 commits
c6bde5d
init migrate
billxbf Mar 18, 2026
4ec8c9d
nit
billxbf Mar 18, 2026
3f1b513
nit
billxbf Mar 18, 2026
5b5147b
nit
billxbf Mar 18, 2026
8039701
integration and runtime
billxbf Mar 26, 2026
c856d70
bug fix
billxbf Apr 1, 2026
4295ece
packaging cleanup
billxbf Apr 3, 2026
d9df831
nit
billxbf Apr 3, 2026
aef25d6
swebench, sglang+slime patches, runtime fix
billxbf Apr 14, 2026
3414a56
nit
billxbf Apr 15, 2026
dd2015e
simplify calculator example
billxbf Apr 15, 2026
561ea17
add runtime timeout to prevent orphant processes
billxbf Apr 15, 2026
705f80c
nit
billxbf Apr 16, 2026
c4e6408
pin harness versions for reproduction
billxbf Apr 16, 2026
c56ab90
unify rollout to async-only path
billxbf Apr 16, 2026
45d116a
decouple eval prewarm pool and prepare commands
billxbf Apr 16, 2026
2b38683
split multi-turn trajectories at first user turn
billxbf Apr 16, 2026
3abf2dd
minimize SGLang patch compute overhead
billxbf Apr 16, 2026
bbbfa6c
push rollout completion from server to trainer
billxbf Apr 16, 2026
fbdf596
bundle small fixes from feature review
billxbf Apr 17, 2026
28af4a5
cleanup tests
billxbf Apr 17, 2026
f3e5dc0
drop failed traces to slime
billxbf Apr 17, 2026
5a23b96
packaging cleanups
billxbf Apr 17, 2026
006edf0
share group adv estimator
billxbf Apr 17, 2026
0b2b5ed
nit
billxbf Apr 17, 2026
470419f
nit
billxbf Apr 17, 2026
ac490fb
nit
billxbf Apr 18, 2026
38dd467
dynamic bs + tis, remove adv estimato
billxbf Apr 19, 2026
5818f87
expand instance
billxbf Apr 19, 2026
865cb8e
preflight check
billxbf Apr 20, 2026
c25662e
nit
billxbf Apr 20, 2026
d30c1f7
qwen3.5 support
billxbf Apr 21, 2026
19dad51
prefix merging refactor
billxbf Apr 22, 2026
53bbcde
sglang trace toolcall fix
billxbf Apr 23, 2026
d98ae45
nit
billxbf Apr 23, 2026
9efea91
nit
billxbf Apr 23, 2026
e12f7a0
nit
billxbf Apr 23, 2026
28fa4b7
append metadata to trace
billxbf Apr 24, 2026
f17b7a9
swegym grpo exp
billxbf Apr 26, 2026
11da026
tool & mask
billxbf May 4, 2026
c75f202
nit
billxbf May 4, 2026
726ddaa
nit
billxbf May 4, 2026
4526377
apptainer native grpo example
billxbf May 4, 2026
8c737c5
timeout decouples queuing
billxbf May 4, 2026
03e71cd
 nit
billxbf May 5, 2026
7cdf31f
documentations
billxbf May 6, 2026
c23d868
Revise README content and formatting
billxbf May 6, 2026
4ccdb70
Update README links and descriptions for clarity
billxbf May 6, 2026
902cfaf
Update README.md
billxbf May 6, 2026
98b3583
Update README.md
billxbf May 6, 2026
6eac835
nit
billxbf May 6, 2026
e7c5b84
low fraction filter
billxbf May 7, 2026
c7eac76
qwencode support on sweb-v
billxbf May 11, 2026
b42f69e
nit
billxbf May 14, 2026
2b407ba
nit
billxbf May 14, 2026
bc21c80
nit
billxbf May 14, 2026
118c0d4
prepare for refactor
billxbf May 14, 2026
cea6a5d
add training curve
billxbf May 18, 2026
4945ec0
update training curves
billxbf May 19, 2026
f4bda0c
resolve conflict
billxbf May 19, 2026
f7d7741
nit
billxbf May 19, 2026
14d859f
VLM support and count_star example
billxbf May 20, 2026
4203c7a
add polar dashboard
billxbf May 21, 2026
01830b1
Merge branch 'stable' into polar
billxbf May 21, 2026
88bcad4
nit
billxbf May 21, 2026
c4f8607
Merge branch 'polar' of github.com:NVIDIA-NeMo/ProRL-Agent-Server int…
billxbf May 21, 2026
692c8fa
nit
billxbf May 22, 2026
92c2c75
Merge branch 'stable' into polar
billxbf May 22, 2026
2f21e4d
fix reasoning content mapping
billxbf May 28, 2026
657e015
patching transform edge cases
billxbf May 28, 2026
ce6c5f2
vllm dual inference
billxbf May 28, 2026
4e132df
nit
billxbf May 28, 2026
0960180
cleanup examples
billxbf May 29, 2026
5f7f370
clarify harness presets, add openclaw & hermes
billxbf May 29, 2026
5ab54a6
update quickstart examples and docs, upgrade prefix merging robustness
billxbf May 31, 2026
08f6e5c
nit
billxbf May 31, 2026
f6d851f
nit
billxbf Jun 1, 2026
854b703
Potential fix for pull request finding
billxbf Jun 1, 2026
5 changes: 3 additions & 2 deletions .gitignore
@@ -91,13 +91,14 @@ ipython_config.py
# PyPI config
.pypirc

# Project-local outputs
# Project local
*.log
*.out
*.pkl
batches/
wandb/
checkpoints/
models/
experiments/
rollout_results/
outputs/
@@ -114,7 +115,7 @@ Megatron-LM/
glm/

# Generated documentation
docs/_build/
docs/
/site

# Dashboard frontend build artifacts
30 changes: 21 additions & 9 deletions README.md
@@ -30,29 +30,41 @@

#### 🟩 Install the **Rollout Server** (Polar):
```bash
uv venv
uv venv --python 3.13
uv pip install -e .
source .venv/bin/activate
```

#### 🟩 Install the **Inference Server** (SGLang):
### 🟩 Install the **Inference Server** (SGLang or vLLM):

Pick one (that your trainer supports). Avoid installing both under the same environment given dependency conflicts.

**vLLM**
```bash
uv pip install vllm --torch-backend=auto
```

**SGLang**
```bash
uv pip install --prerelease=allow sglang==0.5.10
uv pip install --prerelease=allow sglang==0.5.10 torch==2.9.1+cu128
bash scripts/patch/patch_sglang.sh
```
The patch applies necessary TITO and prompt token id emission on pinned `sglang` version. We'll remove this once upstream supports go through. `vllm` integration is on the way.
The patch applies necessary TITO and prompt token id emission on the pinned `sglang` version. We'll remove this once upstream support goes through.

### 🟩 Install your favorite **Training Framework**:

#### 🟩 Polar is trainer agnostic. So choice of **Trainer** and **Training Backend** are highly flexible given Polar's server boundaries.
Polar is trainer-agnostic, so the choice of **Trainer** and **Training Backend** is highly flexible given Polar's HTTP server boundaries.

Currently, we provide a demo-purpose [Slime](https://github.com/THUDM/slime) integration in [Slime bridge installation guide](src/slime_bridge/README.md#slime-installation).


#### 🟩 (Optional) For SWE-bench official evaluation harness:
#### (Optional) For SWE-bench official evaluation harness:

```bash
uv pip install -e ".[swebench]"
```

#### 🟩 (Optional) To enable **polar dashboard** UI, build the frontend once.
#### (Optional) To enable **polar dashboard** UI, build the frontend once.

```bash
cd web && npm install && npm run build
@@ -62,7 +74,7 @@ cd web && npm install && npm run build

## Usage Guide

- ⭐ [Choose your Agent Harness](src/polar/agent/README.md): pick a built-in harness, or use the generic shell harness with wrapped agents.
- ⭐ [Choose your Agent Harness](src/polar/agent/README.md): Express your agent using the generic `shell` harness, or pick a preset shortcut.
- 🚀 [Trajectory Construction and Eval](src/polar/trajectory/README.md): See [builder](src/polar/trajectory/builder/README.md) and
[evaluator](src/polar/trajectory/evaluator/README.md) guides for registered strategies.
- 🔧 [Deployment Topology](src/polar/config/README.md): configure the Polar service.
@@ -110,7 +122,7 @@ Our development goal for **Polar** is low-intrusion and neutral, finding the low
- [x] Slime bridge & RL example.
- [x] CUA (VLM / VLA) Support.
- [ ] More built-in evaluators (eg. self distillation with textual feedback).
- [ ] vLLM dual inference support.
- [x] vLLM dual inference support.
- [ ] More trainer bridges (NemoRL, VERL, etc.).

</td>
Binary file added assets/dashboard_calculator.png
Binary file added assets/dashboard_trajectory.png
105 changes: 30 additions & 75 deletions examples/calculator/README.md
@@ -1,110 +1,65 @@
# Calculator Example

This is a small end-to-end Polar rollout example. Each agent gets a tiny
`calculator.py` file with parser stubs, edits it, and the evaluator runs
`python3 test_calculator.py`.
The smallest end-to-end Polar run. Each harness gets a tiny `calculator.py`
with parser stubs, edits it, and the evaluator runs `python3 test_calculator.py`.
Use it as a quick smoke test that rollout, gateway, runtime, harness execution,
and evaluation all work together.

Use this example when you want a quick local check that rollout, gateway,
runtime setup, harness execution, and evaluation still work together.
## Prerequisites

The topology setup is used on 4 x B200 GPUs. Adjust based on your hardware.
Install **Polar** and **vLLM** as described in the [top-level README](../../README.md#installation).
This example uses 1 node 8×B200 — two vLLM servers (tensor-parallel 4 each).
Adjust the setup and topology for your hardware.

## What It Runs
## Quick Start

- rollout server on `:8080`
- two gateway nodes on `:8100` and `:8101`
- two local SGLang backends on `:8000` and `:8001`
- one shared runtime image: `polar-localhost-calculator:latest`
- six harnesses: `claude_code`, `codex`, `gemini_cli`, `opencode`, `pi`,
`qwen_code`

The default scripts use Docker. Apptainer is also supported with
`--backend apptainer`.

## Setup

From the repo root:

```bash
uv venv
uv pip install -e .
uv pip install --prerelease=allow sglang==0.5.10
bash scripts/patch/patch_sglang.sh
```

Build the runtime image once:
### 1. Build the runtime image (once)

```bash
uv run python examples/calculator/build_image.py
```

## Start Services

Start two SGLang servers, one per GPU group:
### 2. Start two vLLM servers

```bash
CUDA_VISIBLE_DEVICES=0 uv run python -m sglang.launch_server \
--model-path Qwen/Qwen3.5-4B \
--host 0.0.0.0 \
--port 8000 \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 \
--mem-fraction-static 0.7 \
--context-length 262144 \
--trust-remote-code
```
CUDA_VISIBLE_DEVICES=0,1,2,3 uv run vllm serve Qwen/Qwen3.6-27B --port 8000 \
--tensor-parallel-size 4 --max-model-len 262144 \
--reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder

```bash
CUDA_VISIBLE_DEVICES=1 uv run python -m sglang.launch_server \
--model-path Qwen/Qwen3.5-4B \
--host 0.0.0.0 \
--port 8001 \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 \
--mem-fraction-static 0.7 \
--context-length 262144 \
--trust-remote-code
CUDA_VISIBLE_DEVICES=4,5,6,7 uv run vllm serve Qwen/Qwen3.6-27B --port 8001 \
--tensor-parallel-size 4 --max-model-len 262144 \
--reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder
```
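
The two launch commands above differ only in their GPU group and port. For hardware other than 8×B200, the same pattern can be generated rather than hand-edited. The following is a minimal sketch, not part of Polar: the helper name `vllm_launch_commands` is hypothetical, and it assumes contiguous GPU groups and consecutive ports starting at 8000, mirroring the commands shown here.

```python
import shlex


def vllm_launch_commands(model, num_gpus, tp_size, base_port=8000, max_model_len=262144):
    """Return one `vllm serve` command string per tensor-parallel GPU group.

    Hypothetical helper for illustration only. It assumes GPUs are split into
    contiguous groups of `tp_size` and ports are assigned consecutively.
    """
    if num_gpus <= 0 or tp_size <= 0 or num_gpus % tp_size != 0:
        raise ValueError("num_gpus must be a positive multiple of tp_size")

    commands = []
    for group in range(num_gpus // tp_size):
        first = group * tp_size
        gpus = ",".join(str(g) for g in range(first, first + tp_size))
        args = [
            "uv", "run", "vllm", "serve", model,
            "--port", str(base_port + group),
            "--tensor-parallel-size", str(tp_size),
            "--max-model-len", str(max_model_len),
            # Parser flags copied from the commands in this README.
            "--reasoning-parser", "qwen3",
            "--enable-auto-tool-choice",
            "--tool-call-parser", "qwen3_coder",
        ]
        # The environment assignment is a shell prefix, so it stays outside shlex.join.
        commands.append(f"CUDA_VISIBLE_DEVICES={gpus} " + shlex.join(args))
    return commands


if __name__ == "__main__":
    for cmd in vllm_launch_commands("Qwen/Qwen3.6-27B", num_gpus=8, tp_size=4):
        print(cmd)
```

With `num_gpus=8` and `tp_size=4` this yields two commands, on GPUs `0,1,2,3` (port 8000) and `4,5,6,7` (port 8001). The gateway entries in `topology.yaml` would still need to point at whichever ports are chosen.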

Start Polar:
### 3. Start Polar Servers

```bash
uv run polar serve_rollout -c examples/calculator/topology.yaml
```

```bash
uv run polar serve_gateway -c examples/calculator/topology.yaml --node-id localhost-node-01
```

```bash
uv run polar serve_gateway -c examples/calculator/topology.yaml --node-id localhost-node-02
```
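
Large models can take a while to load, so it may help to confirm that both vLLM backends are accepting requests before submitting tasks. The sketch below polls vLLM's `/health` endpoint on the two ports used in this example. The function names `http_ok` and `wait_until_ready` are hypothetical and not part of Polar. Whether the Polar rollout and gateway servers expose a comparable endpoint is not stated here, so they are left out.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(urls, probe=http_ok, deadline_s=300.0, interval_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `urls` until every probe succeeds or the deadline passes.

    Returns the list of URLs that are still not ready (empty means all ready).
    `probe`, `sleep`, and `clock` are injectable so the loop can be tested offline.
    """
    pending = list(urls)
    start = clock()
    while pending:
        # Keep only the URLs whose probe still fails.
        pending = [u for u in pending if not probe(u)]
        if not pending:
            break
        if clock() - start >= deadline_s:
            return pending
        sleep(interval_s)
    return []


if __name__ == "__main__":
    # Ports match the two `vllm serve` commands in this example.
    not_ready = wait_until_ready(
        ["http://127.0.0.1:8000/health", "http://127.0.0.1:8001/health"]
    )
    print("all backends ready" if not not_ready else f"not ready: {not_ready}")
```

Returning the unready URLs, rather than raising, leaves it to the caller to decide whether to abort or to proceed with a single backend.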

## Run
### 4. Run

Run every harness:
Submits all example harnesses at once and prints a reward comparison:

```bash
uv run python examples/calculator/submit_all.py
uv run python examples/calculator/run.py
```

Run one harness:
Use Apptainer instead of Docker with `--backend apptainer`.

```bash
uv run python examples/calculator/submit_calculator_task.py claude_code
```

Use Apptainer instead of Docker:
### 5. (Optional) Watch in the dashboard

```bash
uv run python examples/calculator/submit_all.py --backend apptainer
uv run polar dashboard -c examples/calculator/topology.yaml
```

Results are written under:
Open <http://127.0.0.1:8090> to inspect live tasks, sessions, trajectories,
and evaluations.

```text
examples/calculator/batches/<timestamp>/
```
<p align="center">
<img src="../../assets/dashboard_calculator.png" alt="Calculator dashboard" width="400">
<img src="../../assets/dashboard_trajectory.png" alt="Trajectory view" width="400">
</p>

Each harness directory contains `request.json`, `response.json`, and
`summary.json`.