Releases · NVIDIA-NeMo/Gym

Release list

v0.4.0 Latest

Latest

nemo-automation-bot released this 02 Jul 19:28

v0.4.0

d67ad66

Release Summary

NeMo Gym v0.4.0 expands evaluation tooling and agent integrations. It establishes a new monthly release cadence; we will continue to provide day-zero support for Nemotron models, datasets, and environments.

Highlights:

Unified gym CLI: find agents and benchmarks by name with gym list, and catch config mistakes early with gym env validate
Diagnose evaluations with BLADE, an analysis skill for agents that reads your evaluation results and produces an evidence-backed report of which tasks failed, why, and the highest-impact fix (e.g. to the agent harness, training, verifier, or prompt)
Measure the impact of agent skills: run the same tasks with different skill sets and compare how each changes agent performance
Run agents in isolated sandboxes through a new pluggable provider framework
More agent harnesses out of the box, including OpenClaw, Pi, and OpenCode
Connect to hosted inference providers: Fireworks, Together.ai, OpenRouter, and more
New benchmarks across science, long-context, and interactive tasks

First-Time Contributors

We welcomed 20+ new contributors to this release! A few highlights:

@marta-sd and @wprazuch led the CLI refactor and clearer config errors
@hemildesai added the pluggable sandbox provider infrastructure and OpenSandbox as the first built-in
@adil-a laid the groundwork for Gym-owned MCP resources servers, letting a server expose its tools over MCP
@eric-tramel added the BunsenChem chemistry benchmark
@jeffwillette added the long machine translation datasets and servers

Thank you to all the new contributors for helping make NeMo Gym better!

Command Line Interface

One gym command for the full workflow, with gym env, gym eval, gym list, and gym dataset subcommands
Reference agents, benchmarks, and environments by name: use gym list to see what is available
gym env validate checks your config for missing, malformed, or empty values before a run and reports actionable errors

Evaluation & Diagnostics

Skill evaluation: measure how agent skills affect performance by running the same tasks with different skill sets. Skills apply at rollout time as a run-level knob, so one dataset works across all skill variants and every rollout is tagged for comparison
BLADE (Benchmark Level Analysis and Diagnostics Engine): a built-in analysis skill that reads an agent run's rollouts, metrics, and configs and produces an evidence-backed report of which tasks failed, why, and the highest-impact fix (e.g. harness, training, verifier, or prompt)

Sandboxing

Run tool-using and coding agents in isolated sandboxes through a pluggable provider framework
Built-in OpenSandbox and Apptainer providers, with third-party providers discoverable via entry points

Configure Agent Harnesses

New harnesses join the existing built-in set (Claude Code, Hermes, OpenHands, and more):

Added OpenCode, OpenClaw, and Pi agents for evaluation
Claude Code runtime capabilities (tool access, MCP servers, and bare vs. native auto-discovery mode) are now easily set via the server config

Configure Models

New inference_provider model server connects to any OpenAI-compatible hosted provider (Fireworks, Together.ai, OpenRouter, DeepInfra, Gemini, and more) with ready-made configs
Every Gym model server now speaks the Anthropic Messages API, so Anthropic-native harnesses like the Claude Code CLI can run against any model you serve with Gym

New Benchmarks

Science: CritPt (research-level physics), SciCode (scientific coding), BunsenChem (chemistry multiple-choice), and FrontierScience Research (rubric-scored science)
Long context: Graphwalks (long-context graph reasoning) and Long Machine Translation (PG19, WMT24++)
Interactive: TALES, a text-adventure game suite

See the Available Environments table for the full list.

Deprecation Notices

The legacy ng_* and nemo_gym_* CLI commands (such as ng_run and ng_collect_rollouts) are deprecated in favor of the unified gym CLI. They still work for now but will be removed in a future release.

Bug Fixes

Fixed intermittent connection errors during high-concurrency rollout collection
Clear error messages instead of crashes when a config file contains invalid YAML

Documentation

New Build Verifiers section with verification patterns and multi-reward verification
New Evaluate section covering benchmarks, evaluation metrics, and a guide to agent-native results diagnostics
New page for configuring and evaluating agent skills

Full Changelog

ci: bump _release_library.yml to v1.4.3 (#1508) by @ko3n1g
fix(vllm_model): use reasoning parser option in converter (#1511) by @cmunley1
fix: Compatibility with vllm 0.20 tool-calling (#1432) by @tdene
ci: require SHA for release-ref, fix duplicate changelog, add release docs (#1536) by @kajalj22
added long machine translation datasets and servers (#1458) by @jeffwillette
feat(genrm_compare): add style density penalty for formatting control (#1543) by @macandro96
fix: add example data + metrics for longmt_eval (#1559) by @ananthsub
Add LC benchmarks (#1437) by @hsiehjackson
removing questions/expected_answers type formating in ns_tools (#1581) by @OliviaViessmann
ci: parallelize the server test suite (in-process concurrency, ~17min → faster locally + CI) (#1577) by @wprazuch
ci: pin uv to 0.11.19 (0.11.20 resolver regression breaks the test suite) (#1576) by @wprazuch
fix: graphwalks data validation (#1587) by @cmunley1
Support RDKit chemistry answer formats (#1327) by @danecor
feat: multi-reward tool-call environment and reward_components for GDPO (#1525) by @anjalibshah
Add FrontierScience Research benchmark (#1553) by @jiacheng-xu
fix(config): actionable error for unknown server cross-references (#1561) by @wprazuch
docs: add NGC authentication step to GRPO setup tutorial (Fern) (#1552) by @lbliii
feat(cli): document ng_init_resources_server generated config inline (#1205 friction #7) (#1597) by @wprazuch
fix: Fern preview build (#1610) by @chtruong814
docs: document the Gym to RL framework token-ID data interface (#1554) by @ananthsub
Add ArXiv MCP tool config (#1419) by @tamohannes
Add Wikipedia MCP tool config (#1420) by @tamohannes
Add periodictable MCP tool config (#1422) by @tamohannes
ci: add Claude Code review workflow (#1622) by @kajalj22
docs: document use_absolute_ip config option (supersedes #595) (#1621) by @lbliii
Add CoolProp MCP tool config (#1421) by @tamohannes
Add particle MCP tool config (#1423) by @tamohannes
Add radioactive decay MCP tool config (#1424) by @tamohannes
fix: make logprobs capture robust to top_logprobs=null in vllm model (#1612) by @ananthsub
Add SciCode Benchmark (#1592) by @fsiino-nvidia
Add CritPt Benchmark (#1588) by @fsiino-nvidia
fix: resolve CritPt benchmark config interpolation and add critpt_agent README (#1642) by @linj-glitch
docs: describe local_vllm_model and extend docs for vllm_model (#1430) by @marta-sd
Add BLADE analysis skill (#1591) by @jmabry
feat: make claude code agent runtime capabilities configurable (#1603) by @cwing-nvidia
Add sandbox API and mini swe agent 2 resource agent (#1377) by @hemildesai
feat: abstention environment (#1459) by @cmunley1
feat: reasoning gym environment (#1378) by @cmunley1
fix(security): upgrade mlflow, grpcio, torch (longmt_eval) for CVE remediation (#1657) by @kajalj22
feat(docs): add GitHub link to docs navbar (#1654) by @abhay-codes07
chore: vendor gh-stack agent skill (#1616) by @ananthsub
feat: arc agi environment (#1460) by @cmunley1
chore (local_vllm_model): bump vllm 0.17.0 -> 0.20.0 (#1674) by @ananthsub
Add sandbox coverage unit tests (#1684) by @hemildesai
fix: refresh blackjack example rollouts (#1683) by @cmunley1
feat: blackjack environment (#1464) by @cmunley1
feat: instruction following environment (#1403) by @cmunley1
feat: circle vlm environments (#1465) by @cmunley1
feat: calendar environments (#1468) by @cmunley1
feat: code gen environment (#1467) by @cmunley1
fix: ensure client keepalive < server keepalive to avoid client keepalive desync errors (#1555) by @ananthsub
feat: ether0 environment (#1472) by @cmunley1
feat: [GDPval-AA v2 Updates 1 / n] - GDPval Multi-Reference Model Support (#1663) by @vadam5
docs: document stacked pull requests in development setup (#1617) by @ananthsub
docs(config): document the Domain enum (#1205 friction #9 / FEP-1023) (#1633) by @wprazuch
docs: define Resources/Agent/Model Server in the glossary (#1205 friction #9, #395) (#1634) by @wprazuch
fix(config): aggregated error for unset '???' config values (#1575) by @wprazuch
feat: Add a default /v1/messages (Anthropic Messages) route to the base Gym… (#1627) by @ffrujeri
feat: [GDPval-AA v2 Updates 2 / n] - Task Execution Only Mode (#1722) by @vadam5
feat: [GDPval-AA v2 Updates 3 / n] - Judge Only Mode (#1725) by @vadam5
Gym CLI refactor (#1630) by @marta-sd
feat(config): unify dataset source via discriminated source: block (FEP-1025) (#1637) by @wprazuch
feat(config): unified clean errors for bad/malformed/empty config_paths (#1205 #8/#12; #1488/#1489/#1490) (#1609) by @wprazuch
feat: environment registry + 'gym list environments' (#1205 friction #8 / M2) (#1635) by @wprazuch
feat: agent registry — name-based agent discovery + composability (M3 core) (#1671) by @wprazuch
feat(cli): add 'gym env validate' pre-flight config check (#1205 friction #12) (#1599) by @wprazuch
ci: fail notify job when Slack webhook returns an error (#1739) by @kajalj22
Support agent-specific num_repeats in ng_collect_rollouts (#1356) by @gwarmstrong
docs(fern): adding an evaluation section an...

Contributors

gchlebus, yfw, and 37 other contributors

Assets 2

v0.3.0

nemo-automation-bot released this 04 Jun 15:53

v0.3.0

4c44cf9

Release Summary

NeMo Gym v0.3.0 ships alongside the NVIDIA Nemotron 3 Ultra model release, open sourcing the environments and corresponding datasets used during training.

Highlights:

70+ new environments, including benchmarks such as Tau2 and Nemotron RL training environments
Popular harness available out-of-the-box such as Claude Code and Hermes
Integrations with OpenEnv and Harbor - use environments from these libraries directly with NeMo Gym
Integration with VeRL - train with VeRL and scale rollout collection with NeMo Gym

First-Time Contributors

We welcomed 30+ new contributors to this release! Here are a few highlights:

@grace-lam added the integration to run Harbor environments with NeMo Gym
@aleksficek — added Competitive Coding Challenges environment
@jthomson04 improved rollout resilience when models emit malformed tool-call arguments or missing message content

Thank you to all the new contributors for helping make NeMo Gym better!

New Environments & Benchmarks

Added 70+ new environments including novel datasets and integrations of popular benchmarks. New coverage spans:

Coding — competitive programming, code infilling, SQL generation, and software-engineering benchmarks with execution-based verification
Math & proofs — olympiad-style problems, proof grading and validation, and formal verification (including Lean)
Knowledge & science — graduate-level QA, chemistry and physics tasks, and lab-style reasoning (including multimodal figure, table, and protocol tasks)
Agentic — multi-turn tool use, search, sandboxed execution, finance workflows, and tau-bench-style conversational agents
Instruction following — format constraints, citation compliance, and IFBench-style rule verification
Safety & RLHF — jailbreak detection, abstention calibration, prompt-injection resistance, and generative reward modeling
Multimodal, speech & translation — VLM benchmarks, visual grounding, ASR evaluation, and machine-translation quality metrics
Chat & broad knowledge — arena-style preference evaluation and MMLU-family benchmarks
Interactive RL — Gymnasium-style multi-step environments for spatial and game-based training

See the Available Environments table for the full list.

Configure Agent Harnesses

Claude Code — available out of the box in NeMo Gym
Hermes — available out of the box in NeMo Gym
LangGraph agent — an adapter that lets you build custom agents using LangGraph patterns (reflection, subagent orchestration, parallel thinking, rewoo)
Gymnasium agent — generic multi-turn harness for use with OpenAI Gym-style environments

Configure Models

Optional max_concurrent_requests on the OpenAI model server to cap in-flight API calls — useful for rate-limited external endpoints when rollout concurrency is high

Rollout Collection & Profiling

New ng_aggregate_rollouts command to merge rollout shards collected independently across multiple nodes, enabling distributed eval without requiring a single coordinated collection job

Environment Library Integrations

OpenEnv — combine OpenEnv environments with NeMo Gym environments
Harbor — combine Harbor environments with NeMo Gym environments

Deprecation Notices

Documentation has moved from Sphinx to Fern. Old Sphinx URLs redirect to the new site at docs.nvidia.com/nemo/gym. The docs/ directory is no longer used for publishing.

Bug Fixes

Fixed aiohttp connection limit exhaustion under FastAPI/Uvicorn with multiple workers
Fixed session cookie propagation for Starlette >= 1.0.0
Fixed duplicated usage counting and errors on empty usage in subsequent model calls
Improved rollout resilience when models emit malformed tool-call arguments or missing message content
Fixed prompt-key hashing when inputs contain Pydantic BaseModel objects

Documentation

New concepts pages for environments, evaluation, and training
Improved Architecture page to clarify how environments map to NeMo Gym components
Consolidated detailed setup and quickstart into a single improved quickstart with clearer descriptions
Expanded Ecosystem page with environment library, training framework, and agent harness integrations

Changelog Details

feat: VLM circle click environment (#837) by @cmunley1
feat: LocalVLLMModel bump to vLLM 0.17.0 (#839) by @bxyu-nvidia
feat: Status updates for agent refs during rollout collection (#843) by @bxyu-nvidia
feat: ether0 chemistry benchmark environment (#838) by @cmunley1
docs: prime intellect verifiers dataset generation instruction update (#851) by @cmunley1
Finance Agent Environment (#742) by @ushnish-de
feat: Add XSTest safety benchmark resource server (#764) by @dcfarris
Create a guide to build environments in NeMo Gym (#711) by @shashank3959
Add multi-step tool-calling data generation example (#778) by @shashank3959
docs: Fix TRL docs link (#857) by @bxyu-nvidia
Swap readme table columns (to main) (#856) by @fsiino-nvidia
Introduce Benchmarks directory (#858) by @gwarmstrong
add gpqa diamond dataset (#845) by @azkalot1
docs: rl <> gym compatibility table (#803) by @lbliii
Updated contributing guide message (#862) by @cwing-nvidia
docs: Nemotron 3 Super recipe link (#863) by @bxyu-nvidia
Gym 0.2.0 huggingface dataset pointers (#859) by @fsiino-nvidia
Add support for SWE-Multilingual benchmark (#822) by @roclark
chore: Bump python package version to 0.3.0.rc0 and descriptions (#883) by @chtruong814
feat: add Harbor integration (#751) by @grace-lam
docs: Fix MultiChallenge train dataset description (#885) by @bxyu-nvidia
docs: update GPQA-D readme (#888) by @cmunley1
feat: add spider2_lite resource server (#864) by @ryan-lempka
Add prompt config for templating (#861) by @gwarmstrong
Compute aggregate metrics (#890) by @gwarmstrong
Streamline Benchmark rollouts and add aime24/math_with_judge metrics (#891) by @gwarmstrong
added bbh-train support to gym (#894) by @arnavkomaragiri
updated README with license info (#895) by @arnavkomaragiri
feat: VLMEvalKit (#872) by @vadam5
bug: Fix README table display (#897) by @bxyu-nvidia
feat: Initial integration with OpenEnv (#898) by @ahmadki
feat: add aime25 benchmark (#899) by @gwarmstrong
GPQA benchmark (#903) by @gwarmstrong
Structured Outputs update with YAML and XML (#865) by @jkyi-nvidia
feat: langgraph integration (#877) by @vadam5
Add proof environments (#907) by @smahdavi4
feat: Benchmark infra refactors (#906) by @bxyu-nvidia
[Fix] use venv Python for swerl_gen Ray workers instead of hardcoded PYTHONPATH (#920) by @spacegoing
[Fix] guard nltk download with local find() to avoid unnecessary remote fetch (#919) by @spacegoing
[fix] (code_gen): use runtime_env py_executable for Ray workers (#913) by @spacegoing
docs: version bump, CTA link changes (#880) by @vadam5
Add zero reward group option for proof judge environment (#923) by @smahdavi4
fix: always send session cookie for starlette >= 1.0.0 (#942) by @cmunley1
feat: Fix duplicated usage counting and errors on empty usage in subsequent model calls (#939) by @bxyu-nvidia
benchmark: LiveCodeBench v5 and v6 (#933) by @bxyu-nvidia
fix: reasoning gym duplicate license (#947) by @cmunley1
SWE agent refactor (#934) by @sdevare-nv
feat: tee gym server subprocess logs to a configurable directory (#950) by @ananthsub
feat: Browsecomp benchmark exposure (#944) by @bxyu-nvidia
ci: upgrade GitHub Actions for Node.js 24 compatibility (#932) by @ko3n1g
docs: add aiohttp-over-httpx guidance and multi-turn agent patterns (#957) by @cwing-nvidia
feat: add dataset preparation script for spider2_lite (#959) by @ryan-lempka
feat: Start Nemotron 3 Ultra benchmarks config; expose Spider 2 lite and XSTest benchmarks (#958) by @bxyu-nvidia
docs: dataset availability (#962) by @cmunley1
fix: Match torch backend auto in genrm model (#963) by @bxyu-nvidia
Support for multiple gold choices in swerl_llm_judge (#956) by @atefehsz
feat(ether0): Add boxed and Answer: LETTER extraction fallbacks (#925) by @jubick1337
fix: RMtree ignores errors (#964) by @bxyu-nvidia
feat: AALCR and Ruler benchmarks; Misc infra (#966) by @bxyu-nvidia
terminus judge improvement for sim only mode (#968) by @jialeiwang
Abstention Environment (HotpotQA) (#954) by @MahanFathi
chore: bump _code_freeze workflow to v0.86.0 (#978) by @ko3n1g
SWE: update OH version (#979) by @sdevare-nv
fix: Handle BaseModel inputs in prompt-key hashing. (#991) by @ffrujeri
docs: llm-as-a-judge (#926) by @fsiino-nvidia
Add the RDKit-Chemistry RL Environment (#984) by @danecor
feat: mmlu_pro and mmlu_prox benchmarks (#988) by @fsiino-nvidia
feat: Misc infra (#970) by @bxyu-nvidia
feat: Introduce NVARC Resource Server with inductive and transductive modes (#1003) by @cmunley1
Add CVDP benchmark resource server with apptainer instead of docker (#928) by @arti4nvj
feat: add ifbench (#999) by @fsiino-nvidia
Upstream 20260408 (#1039) by @bxyu-nvidia
fix: GenRM lock in order to properly handle concurrent requests. (#1041) by @ffrujeri
Tau2 benchmark (#1049) by @bxyu-nvidia
Add tau2 to Nemotron 3 Ultra benchmarks (#1052) by @bxyu-nvidia
feat: Fix sequential reasoning allowed (#1053) by @bxyu-nvidia
Fix aiohttp connection limit under FastAPI/Uvicorn workers > 1 (#1054) by @bxyu-nvidia
fix: pypi (#1056) by @cmunley1
Additional Tau2 metrics (#1064) by @bxyu-nvidia
Bump version to 0.2.1 and make wheel test mandatory (#1065) by @kajalj22
renamed simple_agent to cvdp_agent for consistency (#1024) by @arti4nvj
feat: VLM counting environment (#930) by @cmunley1
fix: add value field...

Contributors

odelalleau, marta-sd, and 51 other contributors

Assets 2

v0.2.1

chtruong814 released this 15 Apr 22:52

v0.2.1

27e9211

pypi fixes for 0.2.1 patch release by @cmunley1 @kajalj22 :: PR: #1081

Contributors

cmunley1 and kajalj22

Assets 2

v0.2.0

bxyu-nvidia released this 11 Mar 15:03

v0.2.0

3e587db

Release Summary

NeMo Gym v0.2.0 ships alongside the NVIDIA Nemotron 3 Super model release, open sourcing the RL environments and corresponding datasets used during training. Highlights:

17 new training environments across coding, math, science, reasoning, agentic tasks, and safety.
Integrations with Future House Aviary, Open-Thought Reasoning Gym, and Prime Intellect Verifiers let you use environments from these libraries directly within NeMo Gym
End-to-end rollout collection with a locally managed vLLM server
Install directly from PyPI with pip install nemo-gym

First-Time Contributors

We welcomed 15 new contributors to this release! Here are a few highlights:

@sidnarayanan added the Aviary integration to enable training on any Aviary environment, a library of interactive RL environments spanning math, science, biology, and more
@3mei added the text-to-SQL environment to generate SQL queries from natural language across multiple SQL dialects
@Kelvin0110 added the NewtonBench environment to discover scientific laws through interactive experimentation

Thank you to all the new contributors for helping make NeMo Gym better!

Major Features & Improvements

New Environments

Added 17 new resources servers spanning:
- Coding: Text to SQL (#648), SWE RL Gen (#561), SWE RL LLM Judge (#561)
- Math: Lean4 Mathematical Proofs (#563)
- Science: Aviary (#55), NewtonBench (#650)
- Reasoning: MultiChallenge (#654), ARC-AGI (#105), Reasoning Gym (#113)
- Agent tasks: xLAM Function Calling (#262), Tavily Search (#825), Single Step Tool Use with Argument Comparison (#825), Terminus Judge (#594), NeMo Skills Tools (#571)
- Safety: Jailbreak Detection (#825), Over Refusal Detection (#825)
- RLHF: Generative Reward Model Compare (#674)
Added 5 new agent servers: Aviary agent (#55), proof refinement agent (#563), SWE agents (#343), tool simulation agent (#826), and verifiers agent (#573)

Environment Library Integrations
Combine environments from other libraries with NeMo Gym environments

Future House Aviary (#55, #590)
Open-Thought Reasoning Gym (#113)
Prime Intellect Verifiers (#573)

Model Serving

Local vLLM model server with end-to-end rollout collection without an external API (#558, #762)
vLLM 0.16+ support for the reasoning field in responses (#816)
VLLMModel chat template kwargs support (#538, #636)
Per-task chat template and extra body args, enabling per-task control of reasoning mode and thinking budget (#672)

Rollout Collection & Profiling

New ng_reward_profile command to compute per-task pass rates and aggregate metrics (#83, #621)
CPU profiling for rollout performance analysis (#763)
Add option for seeding on num_repeats for rollouts (#740)

Infrastructure & Developer Experience

PyPI compatibility: install via pip install nemo-gym (#649)
Dry run mode: ng_run +dryrun=true to validate configs and install environments without starting servers (#743)
ng_status command to list running servers and their health (#290)
Server stdout/stderr redirection with server name prefixes (#703)
FastAPI worker support for higher throughput across multiple workers (#566)

Model Recipes

Nemotron 3 Nano training recipe (#699)
Nemotron 3 Super training recipe (#863)

Deprecation Notices

Deprecated ng_viewer due to a Gradio security vulnerability. We plan to revisit rollout viewing with a more robust solution in a future release.

Bug Fixes

Fixed 0.1.1 environments to work correctly with RL training pipelines (#768)
Fixed crash when server receives malformed JSON during rollout collection (#770)
Fixed dry run mode failing (#746)
Fixed nested responses_create_params overrides not merging correctly from CLI (#827)
Fixed ng_prepare_data failing when multiple environments define overlapping metrics (#738)
Fixed reward profiling failing when model response doesn't include usage stats (#824)
Fixed NeMo-Skills python tool to use HTTP calls instead of subprocess execution (#606)
Bumped Pillow and other packages to address security vulnerabilities (#667, #739)
ng_dump_config now redacts API key values from output (#567)

Documentation

New training tutorials: Unsloth training with NeMo Gym, multi-environment training
New environment tutorials: creating a training environment, custom data preparation, integrating external environment libraries, environment best practices
Model recipes: reproduce the training for Nemotron 3 Nano and Nemotron 3 Super
Concepts & architecture overhaul: rewrote concepts docs, added architecture diagrams, added agent server and resources server docs
Training approaches: added training approaches docs page covering SFT, RL (GRPO), and RLVR
Ecosystem page: revamped ecosystem page with training framework integrations and environment library integrations
Infrastructure: added SWE RL infrastructure case study, deployment topology docs
Quality pass: redirect sweep, style guide sweep, consistent naming, FAQ additions, broken link fixes

Looking Ahead

VLM support: add support for VLM models and environments with images, e.g. browser environments and computer use agent (CUA) environments
Benchmark environments: add popular OSS environments such as OSWorld, Tau Bench, BrowseComp
Integrate existing agents: integrate popular existing agents, e.g. coding harnesses, as well as agents developed via popular agent frameworks, e.g. LangGraph
Environment tutorials: incorporate more complex agentic loops during training such as multi-turn conversation and user modeling

Release Assets

GitHub Release: https://github.com/NVIDIA-NeMo/Gym/releases/tag/v0.2.0
Container: nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super

What's Changed

Bump to v0.2.0 by @bxyu-nvidia in #510
reasoning-gym resource server by @cmunley1 in #113
docs: redirect setup by @lbliii in #513
docs: Miscellaneous GRPO tutorial fixes by @bxyu-nvidia in #512
docs settings update by @lbliii in #525
Debug server package versions by @fsiino-nvidia in #406
List running server health and status by @fsiino-nvidia in #290
VLLMModel supports chat template kwargs by @pjin-nvidia in #538
Salesforce xlam-function-calling-60k resources server by @cmunley1 in #262
python flag for colab venv installation by @cmunley1 in #526
add unsloth and trl to docs by @cmunley1 in #536
docs: remove trl docs by @cmunley1 in #543
Remove PlainTextResponse response_class by @fsiino-nvidia in https://github.com/NVIDIA-N...

Contributors

Kipok, ananthsub, and 23 other contributors

Assets 2

v0.1.1

bxyu-nvidia released this 15 Dec 00:32

v0.1.1

414127f

What's Changed

Bump package info for v0.2.0 by @bxyu-nvidia in #337
fix: Update incorrect path in docs: library_judge_math -> math_with_j… by @shashank3959 in #355
Update secret detector to work with forks by @chtruong814 in #358
Removed reference to gitlab master by @hwolff99 in #377
Mark experimental tutorials by @bxyu-nvidia in #386
docs: experimental label by @lbliii in #391
Fixed typos by @hwolff99 in #400
Readme dataset discoverability cont by @fsiino-nvidia in #344
Add absolute ip for multi node by @sdevare-nv in #286
docs: removed "How to Navigate" section from concepts by @ahmadki in #414
docs: Fixed image embedding in core abstractions page by @ahmadki in #410
docs: Fixed Licensing information in structured outputs by @ahmadki in #412
docs: Added hyperlinks to github repo in docs by @ahmadki in #413
docs: Add software / hardware requirements to README and docs. by @ffrujeri in #401
docs: Cleaned the "Quick Start" section in the README by @ahmadki in #411
Display system and version info by @fsiino-nvidia in #347
docs: Improve language around resources servers. by @ffrujeri in #408
docs: Add Create Resource Server Tutorial by @ffrujeri in #407
miniswe w/ offline uv by @sdevare-nv in #357
update vllm model comments by @cmunley1 in #423
docs: linked several terms to their defenition in glossary by @ahmadki in #424
docs: Explain why GPT-4 is used and clarify support for other models by @ahmadki in #425
Removed internal section by @hwolff99 in #430
docs: various improvements and fixes by @ahmadki in #415
docs: Relate sections Get Started and Rollout Collection by @fsiino-nvidia in #426
Guide user on next steps after finishing get started by @cwing-nvidia in #435
Add placeholder author by @jkyi-nvidia in #440
Clarify training environment framing and align docs messaging by @cwing-nvidia in #438
docs: Added CLI documentation by @ahmadki in #444
Change NeMo Gym from framework to library by @cwing-nvidia in #456
Add Data Designer and links to ecosystem page by @cwing-nvidia in #462
docs: Moved configuration system under about by @ahmadki in #420
Add benefits to About page aligned with README by @cwing-nvidia in #452
Explain where the name Gym comes from; Gym Key Terminology doc is missing some of the old material by @bxyu-nvidia in #470
add calendar env for multi-turn IF by @sanjaykariyappa in #297
docs(readme): fix Example Resource Servers table - correct Multi Step… by @lbliii in #464
Remove penguin references by @ahmadki in #469
docs: Training framework integration by @bxyu-nvidia in #439
Bug: inconsistent documentation around servers running by @bxyu-nvidia in #472
docs: Improve server reference info by @bxyu-nvidia in #474
pyproject typos and grammar fixes by @ahmadki in #473
Miscellaneous infra improvements/fixes by @pjin-nvidia in #317
Expose server host and port in dataset viewer CLI by @ahmadki in #476
Rename examples simple_weather and stateful_counter by @fsiino-nvidia in #479
More single tool call filename updates by @fsiino-nvidia in #480
docs: Fix wrong count vs actual by @fsiino-nvidia in #482
Fix duplicate reference sections by @bxyu-nvidia in #483
docs: home pg, quickstart move, gh icon by @lbliii in #463
More single tool call filename updates cont by @fsiino-nvidia in #484
Fix NeMo Gym Pyproject links by @bxyu-nvidia in #486
docs: move FAQ by @lbliii in #489
docs: contribute section by @lbliii in #490
Misc rollout fixes by @pjin-nvidia in #447
improve framing of training framework integration guide for contributing by @cwing-nvidia in #493
Docs: Contribution Home & Dev Setup by @cwing-nvidia in #494
Add environment contribution docs by @cwing-nvidia in #498
FAQ cleanup by @cwing-nvidia in #499
Simplify contributing.md by @cwing-nvidia in #500
Reorder README structure by @cwing-nvidia in #501
docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] by @bxyu-nvidia in #481
Update dataset configs with HuggingFace links by @bxyu-nvidia in #508
Change to v0.1.1 release version by @bxyu-nvidia in #509

New Contributors

@shashank3959 made their first contribution in #355
@hwolff99 made their first contribution in #377
@ahmadki made their first contribution in #414
@ffrujeri made their first contribution in #401
@sanjaykariyappa made their first contribution in #297

Full Changelog: v0.1.0...v0.1.1

Contributors

ffrujeri, sanjaykariyappa, and 12 other contributors

Assets 2

v0.1.0

bxyu-nvidia released this 15 Nov 01:06

v0.1.0

3ac9f35

What's Changed

Add copy-pr-bot by @chtruong814 in #1
Add initial repo template by @chtruong814 in #2
Update GitHub with Gitlab main by @bxyu-nvidia in #3
Alias as Penguin by @bxyu-nvidia in #4
Add Copyright docs README FAQ by @bxyu-nvidia in #7
Dapo17k by @bxyu-nvidia in #6
Fix docs build failures by @bxyu-nvidia in #8
Fix docs by @bxyu-nvidia in #10
Improve Github SSH Key setup docs by @bxyu-nvidia in #12
Comp-Coding Verifier by @kbhardwaj-nvidia in #5
Dataset viewer simple aggregations by @fsiino-nvidia in #9
VLLMModel docs in main Readme by @bxyu-nvidia in #13
Fix agent name in docs by @bxyu-nvidia in #15
VLLMModel propogates token IDs by @bxyu-nvidia in #11
VLLMModel tokenize params cleanup by @bxyu-nvidia in #21
Update Comp-Coding README.md by @kbhardwaj-nvidia in #26
Docs improvements - remove Why NeMo Gym section and add CI/CD tests info by @bxyu-nvidia in #27
update server logging format to be more consistent by @cmunley1 in #22
update readmes from ng_collect_traj to ng_collect_rollouts by @cmunley1 in #25
Simple agent stop criteria requires no tool calls AND output message item to be present by @bxyu-nvidia in #19
Server spinup polling by @bxyu-nvidia in #31
Rename top-level config key 'openai_model' => 'policy_model' by @pjin-nvidia in #33
Simple agent allows non-json tool responses by @bxyu-nvidia in #35
Multi-verifier docs by @bxyu-nvidia in #36
Servers have easy hooks into individual instances via session by @bxyu-nvidia in #24
Add Math Stack Overflow dataset by @damon-mosk-aoyama-nvidia in #42
Add Workbench validation dataset by @bxyu-nvidia in #46
Docs update by @bxyu-nvidia in #47
Implements LLM-as-Judge for Response Equivalence by @soares-f in #16
Configure global httpx client by @pjin-nvidia in #50
Fix OpenAI ResponseReasoningItem.status property by @bxyu-nvidia in #54
VLLMModel data parallel; explicit RunHelper shutdown handle by @bxyu-nvidia in #52
removed simple_agent_stateful, uses fastapi to keep track of session by @RahulSChand in #44
Migrate text_based_game: sudoku and game agent features by @RahulSChand in #30
Revert "Migrate text_based_game: sudoku and game agent features" by @bxyu-nvidia in #65
Add data aggregations to data preparation by @fsiino-nvidia in #49
Instantiate one httpx async client per unique connection / base url by @bxyu-nvidia in #75
Swap async http backend from httpx to aiohttp; various server infra improvements by @bxyu-nvidia in #77
Remove unnecessary GHA CI and add uv config to enable dependency scanning by @chtruong814 in #66
VLLMModel fix whitespace stripping and unwarranted spaces by @bxyu-nvidia in #70
Fix aggregation rounding in ng_prepare_data by @fsiino-nvidia in #76
Add profiling; improve rollout collection usability and efficiency; add uvicorn logging filtering by @bxyu-nvidia in #79
Delete .github/ISSUE_TEMPLATE directory by @pablo-garay in #87
Add support for num_repeats by @MahanFathi in #99
Comp coding fixes; lots of misc infra items by @bxyu-nvidia in #90
chore: Update cherry-pick workflow to use v0.63.0 by @pablo-garay in #108
Make Workbench stateful and sign commits by @abhibha-nvidia in #110
Clean deprecated Comp coding by @bxyu-nvidia in #106
Bxyu/misc infra 20251001 by @bxyu-nvidia in #116
Resource Server Organization by @fsiino-nvidia in #80
Add metrics conflict error FAQ to Readme by @fsiino-nvidia in #93
Azure OpenAI model support by @bxyu-nvidia in #112
Use python env for precommit hook; alter files trigger by @fsiino-nvidia in #125
Update issue templates by @bxyu-nvidia in #152
Add back Nemo Framework templates by @bxyu-nvidia in #153
Fix Workbench invalid function name by @bxyu-nvidia in #167
VLLMModel enable reasoning parsing by @bxyu-nvidia in #129
Add Attributions for Third Party Softwares by @banghuaz-nvidia in #154
Fix infinite OpenAI endpoint query; misc improvements by @bxyu-nvidia in #171
docs: Add Tutorial 00 - Key Terminology by @cwing-nvidia in #180
docs: Add tutorial README with learning path structure by @cwing-nvidia in #177
Redirect main Gym readme to Tutorials by @bxyu-nvidia in #201
docs: Add Tutorial 01 - Understanding Core Concepts by @cwing-nvidia in #181
docs: Add Tutorial 09 - Configuration Management by @cwing-nvidia in #183
Add CODE_OF_CONDUCT.md for community guidelines by @cwing-nvidia in #148
Add SECURITY.md with NVIDIA security policy by @cwing-nvidia in #149
Make metrics conflict criteria less strict by @fsiino-nvidia in #150
Move tutorials to docs by @bxyu-nvidia in #205
docs: Replace README with improved version by @cwing-nvidia in #192
Large docs improvement PR from @cwing-nvidia by @bxyu-nvidia in #208
Add back How-To's and FAQs by @bxyu-nvidia in #209
Docs fixes by @bxyu-nvidia in #210
Improve CONTRIBUTING.md by @cwing-nvidia in #151
feat (OpenQA): Add OpenQA support with per-record regex and rescue features by @psgundecha-nv in #155
feat(mcqa): Add custom answer extraction via template_metadata to support STEM MCQA dataset by @psgundecha-nv in #128
Add README to docs folder by @bxyu-nvidia in #216
Ray comp coding infra by @sdevare-nv in #195
Misc docs fixes by @bxyu-nvidia in #218
CLI help and command help; misc improvements by @bxyu-nvidia in #229
Misc infra 20251024 by @bxyu-nvidia in #234
Fix ray version mismatch by @sdevare-nv in #231
Misc fixes 20251027 by @bxyu-nvidia in #243
Validate server port selection by @fsiino-nvidia in #233
bxyu/misc-infra-20251027-001 by @bxyu-nvidia in #247
Fix input assistant messages by @bxyu-nvidia in #248
Misc infra 20251028 002 by @bxyu-nvidia in #253
Structured Outputs JSON Environment by @jkyi-nvidia in #251
Bump OpenAI version to 2.6.1; improve dependency constrain resolution by @bxyu-nvidia in #255
Update missing header and attributions by @banghuaz-nvidia in #237
Misc infra 20251031 by @bxyu-nvidia in #263
Update math dataset examples and metrics by @damon-mosk-aoyama-nvidia in #265
Misc infra 20251101 by @bxyu-nvidia in #267
Almost-server detection and reporting by @fsiino-nvidia in #249
Miniswe env by @sdevare-nv in #241
Differentiate Example-only and Training Resource Servers...

Contributors

pablo-garay, soares-f, and 16 other contributors

Assets 2

Uh oh!

Releases: NVIDIA-NeMo/Gym

Release list

v0.4.0

Release Summary

First-Time Contributors

Command Line Interface

Evaluation & Diagnostics

Sandboxing

Configure Agent Harnesses

Configure Models

New Benchmarks

Deprecation Notices

Bug Fixes

Documentation

Contributors

Uh oh!

v0.3.0

Release Summary

First-Time Contributors

New Environments & Benchmarks

Configure Agent Harnesses

Configure Models

Rollout Collection & Profiling

Environment Library Integrations

Deprecation Notices

Bug Fixes

Documentation

Contributors

Uh oh!

v0.2.1

Contributors

Uh oh!

v0.2.0

Release Summary

First-Time Contributors

Major Features & Improvements

Deprecation Notices

Bug Fixes

Documentation

Looking Ahead

Release Assets

What's Changed

Contributors

Uh oh!

v0.1.1

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.0

What's Changed

Contributors

Uh oh!