Releases: NVIDIA-NeMo/Gym
Release list
v0.4.0
Release Summary
NeMo Gym v0.4.0 expands evaluation tooling and agent integrations. It establishes a new monthly release cadence; we will continue to provide day-zero support for Nemotron models, datasets, and environments.
Highlights:
- Unified
gymCLI: find agents and benchmarks by name withgym list, and catch config mistakes early withgym env validate - Diagnose evaluations with BLADE, an analysis skill for agents that reads your evaluation results and produces an evidence-backed report of which tasks failed, why, and the highest-impact fix (e.g. to the agent harness, training, verifier, or prompt)
- Measure the impact of agent skills: run the same tasks with different skill sets and compare how each changes agent performance
- Run agents in isolated sandboxes through a new pluggable provider framework
- More agent harnesses out of the box, including OpenClaw, Pi, and OpenCode
- Connect to hosted inference providers: Fireworks, Together.ai, OpenRouter, and more
- New benchmarks across science, long-context, and interactive tasks
First-Time Contributors
We welcomed 20+ new contributors to this release! A few highlights:
- @marta-sd and @wprazuch led the CLI refactor and clearer config errors
- @hemildesai added the pluggable sandbox provider infrastructure and OpenSandbox as the first built-in
- @adil-a laid the groundwork for Gym-owned MCP resources servers, letting a server expose its tools over MCP
- @eric-tramel added the BunsenChem chemistry benchmark
- @jeffwillette added the long machine translation datasets and servers
Thank you to all the new contributors for helping make NeMo Gym better!
Command Line Interface
- One
gymcommand for the full workflow, withgym env,gym eval,gym list, andgym datasetsubcommands - Reference agents, benchmarks, and environments by name: use
gym listto see what is available gym env validatechecks your config for missing, malformed, or empty values before a run and reports actionable errors
Evaluation & Diagnostics
- Skill evaluation: measure how agent skills affect performance by running the same tasks with different skill sets. Skills apply at rollout time as a run-level knob, so one dataset works across all skill variants and every rollout is tagged for comparison
- BLADE (Benchmark Level Analysis and Diagnostics Engine): a built-in analysis skill that reads an agent run's rollouts, metrics, and configs and produces an evidence-backed report of which tasks failed, why, and the highest-impact fix (e.g. harness, training, verifier, or prompt)
Sandboxing
- Run tool-using and coding agents in isolated sandboxes through a pluggable provider framework
- Built-in OpenSandbox and Apptainer providers, with third-party providers discoverable via entry points
Configure Agent Harnesses
New harnesses join the existing built-in set (Claude Code, Hermes, OpenHands, and more):
- Added OpenCode, OpenClaw, and Pi agents for evaluation
- Claude Code runtime capabilities (tool access, MCP servers, and bare vs. native auto-discovery mode) are now easily set via the server config
Configure Models
- New
inference_providermodel server connects to any OpenAI-compatible hosted provider (Fireworks, Together.ai, OpenRouter, DeepInfra, Gemini, and more) with ready-made configs - Every Gym model server now speaks the Anthropic Messages API, so Anthropic-native harnesses like the Claude Code CLI can run against any model you serve with Gym
New Benchmarks
- Science: CritPt (research-level physics), SciCode (scientific coding), BunsenChem (chemistry multiple-choice), and FrontierScience Research (rubric-scored science)
- Long context: Graphwalks (long-context graph reasoning) and Long Machine Translation (PG19, WMT24++)
- Interactive: TALES, a text-adventure game suite
See the Available Environments table for the full list.
Deprecation Notices
- The legacy
ng_*andnemo_gym_*CLI commands (such asng_runandng_collect_rollouts) are deprecated in favor of the unifiedgymCLI. They still work for now but will be removed in a future release.
Bug Fixes
- Fixed intermittent connection errors during high-concurrency rollout collection
- Clear error messages instead of crashes when a config file contains invalid YAML
Documentation
- New Build Verifiers section with verification patterns and multi-reward verification
- New Evaluate section covering benchmarks, evaluation metrics, and a guide to agent-native results diagnostics
- New page for configuring and evaluating agent skills
Full Changelog
- ci: bump _release_library.yml to v1.4.3 (#1508) by @ko3n1g
- fix(vllm_model): use reasoning parser option in converter (#1511) by @cmunley1
- fix: Compatibility with vllm 0.20 tool-calling (#1432) by @tdene
- ci: require SHA for release-ref, fix duplicate changelog, add release docs (#1536) by @kajalj22
- added long machine translation datasets and servers (#1458) by @jeffwillette
- feat(genrm_compare): add style density penalty for formatting control (#1543) by @macandro96
- fix: add example data + metrics for longmt_eval (#1559) by @ananthsub
- Add LC benchmarks (#1437) by @hsiehjackson
- removing questions/expected_answers type formating in ns_tools (#1581) by @OliviaViessmann
- ci: parallelize the server test suite (in-process concurrency, ~17min → faster locally + CI) (#1577) by @wprazuch
- ci: pin uv to 0.11.19 (0.11.20 resolver regression breaks the test suite) (#1576) by @wprazuch
- fix: graphwalks data validation (#1587) by @cmunley1
- Support RDKit chemistry answer formats (#1327) by @danecor
- feat: multi-reward tool-call environment and reward_components for GDPO (#1525) by @anjalibshah
- Add FrontierScience Research benchmark (#1553) by @jiacheng-xu
- fix(config): actionable error for unknown server cross-references (#1561) by @wprazuch
- docs: add NGC authentication step to GRPO setup tutorial (Fern) (#1552) by @lbliii
- feat(cli): document ng_init_resources_server generated config inline (#1205 friction #7) (#1597) by @wprazuch
- fix: Fern preview build (#1610) by @chtruong814
- docs: document the Gym to RL framework token-ID data interface (#1554) by @ananthsub
- Add ArXiv MCP tool config (#1419) by @tamohannes
- Add Wikipedia MCP tool config (#1420) by @tamohannes
- Add periodictable MCP tool config (#1422) by @tamohannes
- ci: add Claude Code review workflow (#1622) by @kajalj22
- docs: document use_absolute_ip config option (supersedes #595) (#1621) by @lbliii
- Add CoolProp MCP tool config (#1421) by @tamohannes
- Add particle MCP tool config (#1423) by @tamohannes
- Add radioactive decay MCP tool config (#1424) by @tamohannes
- fix: make logprobs capture robust to top_logprobs=null in vllm model (#1612) by @ananthsub
- Add SciCode Benchmark (#1592) by @fsiino-nvidia
- Add CritPt Benchmark (#1588) by @fsiino-nvidia
- fix: resolve CritPt benchmark config interpolation and add critpt_agent README (#1642) by @linj-glitch
- docs: describe local_vllm_model and extend docs for vllm_model (#1430) by @marta-sd
- Add BLADE analysis skill (#1591) by @jmabry
- feat: make claude code agent runtime capabilities configurable (#1603) by @cwing-nvidia
- Add sandbox API and mini swe agent 2 resource agent (#1377) by @hemildesai
- feat: abstention environment (#1459) by @cmunley1
- feat: reasoning gym environment (#1378) by @cmunley1
- fix(security): upgrade mlflow, grpcio, torch (longmt_eval) for CVE remediation (#1657) by @kajalj22
- feat(docs): add GitHub link to docs navbar (#1654) by @abhay-codes07
- chore: vendor gh-stack agent skill (#1616) by @ananthsub
- feat: arc agi environment (#1460) by @cmunley1
- chore (local_vllm_model): bump vllm 0.17.0 -> 0.20.0 (#1674) by @ananthsub
- Add sandbox coverage unit tests (#1684) by @hemildesai
- fix: refresh blackjack example rollouts (#1683) by @cmunley1
- feat: blackjack environment (#1464) by @cmunley1
- feat: instruction following environment (#1403) by @cmunley1
- feat: circle vlm environments (#1465) by @cmunley1
- feat: calendar environments (#1468) by @cmunley1
- feat: code gen environment (#1467) by @cmunley1
- fix: ensure client keepalive < server keepalive to avoid client keepalive desync errors (#1555) by @ananthsub
- feat: ether0 environment (#1472) by @cmunley1
- feat: [GDPval-AA v2 Updates 1 / n] - GDPval Multi-Reference Model Support (#1663) by @vadam5
- docs: document stacked pull requests in development setup (#1617) by @ananthsub
- docs(config): document the Domain enum (#1205 friction #9 / FEP-1023) (#1633) by @wprazuch
- docs: define Resources/Agent/Model Server in the glossary (#1205 friction #9, #395) (#1634) by @wprazuch
- fix(config): aggregated error for unset '???' config values (#1575) by @wprazuch
- feat: Add a default /v1/messages (Anthropic Messages) route to the base Gym… (#1627) by @ffrujeri
- feat: [GDPval-AA v2 Updates 2 / n] - Task Execution Only Mode (#1722) by @vadam5
- feat: [GDPval-AA v2 Updates 3 / n] - Judge Only Mode (#1725) by @vadam5
- Gym CLI refactor (#1630) by @marta-sd
- feat(config): unify dataset source via discriminated
source:block (FEP-1025) (#1637) by @wprazuch - feat(config): unified clean errors for bad/malformed/empty config_paths (#1205 #8/#12; #1488/#1489/#1490) (#1609) by @wprazuch
- feat: environment registry + 'gym list environments' (#1205 friction #8 / M2) (#1635) by @wprazuch
- feat: agent registry — name-based agent discovery + composability (M3 core) (#1671) by @wprazuch
- feat(cli): add 'gym env validate' pre-flight config check (#1205 friction #12) (#1599) by @wprazuch
- ci: fail notify job when Slack webhook returns an error (#1739) by @kajalj22
- Support agent-specific num_repeats in ng_collect_rollouts (#1356) by @gwarmstrong
- docs(fern): adding an evaluation section an...
v0.3.0
Release Summary
NeMo Gym v0.3.0 ships alongside the NVIDIA Nemotron 3 Ultra model release, open sourcing the environments and corresponding datasets used during training.
Highlights:
- 70+ new environments, including benchmarks such as Tau2 and Nemotron RL training environments
- Popular harness available out-of-the-box such as Claude Code and Hermes
- Integrations with OpenEnv and Harbor - use environments from these libraries directly with NeMo Gym
- Integration with VeRL - train with VeRL and scale rollout collection with NeMo Gym
First-Time Contributors
We welcomed 30+ new contributors to this release! Here are a few highlights:
- @grace-lam added the integration to run Harbor environments with NeMo Gym
- @aleksficek — added Competitive Coding Challenges environment
- @jthomson04 improved rollout resilience when models emit malformed tool-call arguments or missing message content
Thank you to all the new contributors for helping make NeMo Gym better!
New Environments & Benchmarks
Added 70+ new environments including novel datasets and integrations of popular benchmarks. New coverage spans:
- Coding — competitive programming, code infilling, SQL generation, and software-engineering benchmarks with execution-based verification
- Math & proofs — olympiad-style problems, proof grading and validation, and formal verification (including Lean)
- Knowledge & science — graduate-level QA, chemistry and physics tasks, and lab-style reasoning (including multimodal figure, table, and protocol tasks)
- Agentic — multi-turn tool use, search, sandboxed execution, finance workflows, and tau-bench-style conversational agents
- Instruction following — format constraints, citation compliance, and IFBench-style rule verification
- Safety & RLHF — jailbreak detection, abstention calibration, prompt-injection resistance, and generative reward modeling
- Multimodal, speech & translation — VLM benchmarks, visual grounding, ASR evaluation, and machine-translation quality metrics
- Chat & broad knowledge — arena-style preference evaluation and MMLU-family benchmarks
- Interactive RL — Gymnasium-style multi-step environments for spatial and game-based training
See the Available Environments table for the full list.
Configure Agent Harnesses
- Claude Code — available out of the box in NeMo Gym
- Hermes — available out of the box in NeMo Gym
- LangGraph agent — an adapter that lets you build custom agents using LangGraph patterns (reflection, subagent orchestration, parallel thinking, rewoo)
- Gymnasium agent — generic multi-turn harness for use with OpenAI Gym-style environments
Configure Models
- Optional
max_concurrent_requestson the OpenAI model server to cap in-flight API calls — useful for rate-limited external endpoints when rollout concurrency is high
Rollout Collection & Profiling
- New
ng_aggregate_rolloutscommand to merge rollout shards collected independently across multiple nodes, enabling distributed eval without requiring a single coordinated collection job
Environment Library Integrations
- OpenEnv — combine OpenEnv environments with NeMo Gym environments
- Harbor — combine Harbor environments with NeMo Gym environments
Deprecation Notices
- Documentation has moved from Sphinx to Fern. Old Sphinx URLs redirect to the new site at docs.nvidia.com/nemo/gym. The
docs/directory is no longer used for publishing.
Bug Fixes
- Fixed aiohttp connection limit exhaustion under FastAPI/Uvicorn with multiple workers
- Fixed session cookie propagation for Starlette >= 1.0.0
- Fixed duplicated usage counting and errors on empty usage in subsequent model calls
- Improved rollout resilience when models emit malformed tool-call arguments or missing message content
- Fixed prompt-key hashing when inputs contain Pydantic BaseModel objects
Documentation
- New concepts pages for environments, evaluation, and training
- Improved Architecture page to clarify how environments map to NeMo Gym components
- Consolidated detailed setup and quickstart into a single improved quickstart with clearer descriptions
- Expanded Ecosystem page with environment library, training framework, and agent harness integrations
Changelog Details
- feat: VLM circle click environment (#837) by @cmunley1
- feat: LocalVLLMModel bump to vLLM 0.17.0 (#839) by @bxyu-nvidia
- feat: Status updates for agent refs during rollout collection (#843) by @bxyu-nvidia
- feat: ether0 chemistry benchmark environment (#838) by @cmunley1
- docs: prime intellect verifiers dataset generation instruction update (#851) by @cmunley1
- Finance Agent Environment (#742) by @ushnish-de
- feat: Add XSTest safety benchmark resource server (#764) by @dcfarris
- Create a guide to build environments in NeMo Gym (#711) by @shashank3959
- Add multi-step tool-calling data generation example (#778) by @shashank3959
- docs: Fix TRL docs link (#857) by @bxyu-nvidia
- Swap readme table columns (to main) (#856) by @fsiino-nvidia
- Introduce Benchmarks directory (#858) by @gwarmstrong
- add gpqa diamond dataset (#845) by @azkalot1
- docs: rl <> gym compatibility table (#803) by @lbliii
- Updated contributing guide message (#862) by @cwing-nvidia
- docs: Nemotron 3 Super recipe link (#863) by @bxyu-nvidia
- Gym 0.2.0 huggingface dataset pointers (#859) by @fsiino-nvidia
- Add support for SWE-Multilingual benchmark (#822) by @roclark
- chore: Bump python package version to 0.3.0.rc0 and descriptions (#883) by @chtruong814
- feat: add Harbor integration (#751) by @grace-lam
- docs: Fix MultiChallenge train dataset description (#885) by @bxyu-nvidia
- docs: update GPQA-D readme (#888) by @cmunley1
- feat: add spider2_lite resource server (#864) by @ryan-lempka
- Add prompt config for templating (#861) by @gwarmstrong
- Compute aggregate metrics (#890) by @gwarmstrong
- Streamline Benchmark rollouts and add aime24/math_with_judge metrics (#891) by @gwarmstrong
- added bbh-train support to gym (#894) by @arnavkomaragiri
- updated README with license info (#895) by @arnavkomaragiri
- feat: VLMEvalKit (#872) by @vadam5
- bug: Fix README table display (#897) by @bxyu-nvidia
- feat: Initial integration with OpenEnv (#898) by @ahmadki
- feat: add aime25 benchmark (#899) by @gwarmstrong
- GPQA benchmark (#903) by @gwarmstrong
- Structured Outputs update with YAML and XML (#865) by @jkyi-nvidia
- feat: langgraph integration (#877) by @vadam5
- Add proof environments (#907) by @smahdavi4
- feat: Benchmark infra refactors (#906) by @bxyu-nvidia
- [Fix] use venv Python for swerl_gen Ray workers instead of hardcoded PYTHONPATH (#920) by @spacegoing
- [Fix] guard nltk download with local find() to avoid unnecessary remote fetch (#919) by @spacegoing
- [fix] (code_gen): use runtime_env py_executable for Ray workers (#913) by @spacegoing
- docs: version bump, CTA link changes (#880) by @vadam5
- Add zero reward group option for proof judge environment (#923) by @smahdavi4
- fix: always send session cookie for starlette >= 1.0.0 (#942) by @cmunley1
- feat: Fix duplicated usage counting and errors on empty usage in subsequent model calls (#939) by @bxyu-nvidia
- benchmark: LiveCodeBench v5 and v6 (#933) by @bxyu-nvidia
- fix: reasoning gym duplicate license (#947) by @cmunley1
- SWE agent refactor (#934) by @sdevare-nv
- feat: tee gym server subprocess logs to a configurable directory (#950) by @ananthsub
- feat: Browsecomp benchmark exposure (#944) by @bxyu-nvidia
- ci: upgrade GitHub Actions for Node.js 24 compatibility (#932) by @ko3n1g
- docs: add aiohttp-over-httpx guidance and multi-turn agent patterns (#957) by @cwing-nvidia
- feat: add dataset preparation script for spider2_lite (#959) by @ryan-lempka
- feat: Start Nemotron 3 Ultra benchmarks config; expose Spider 2 lite and XSTest benchmarks (#958) by @bxyu-nvidia
- docs: dataset availability (#962) by @cmunley1
- fix: Match torch backend auto in genrm model (#963) by @bxyu-nvidia
- Support for multiple gold choices in swerl_llm_judge (#956) by @atefehsz
- feat(ether0): Add boxed and Answer: LETTER extraction fallbacks (#925) by @jubick1337
- fix: RMtree ignores errors (#964) by @bxyu-nvidia
- feat: AALCR and Ruler benchmarks; Misc infra (#966) by @bxyu-nvidia
- terminus judge improvement for sim only mode (#968) by @jialeiwang
- Abstention Environment (HotpotQA) (#954) by @MahanFathi
- chore: bump
_code_freezeworkflow tov0.86.0(#978) by @ko3n1g - SWE: update OH version (#979) by @sdevare-nv
- fix: Handle BaseModel inputs in prompt-key hashing. (#991) by @ffrujeri
- docs: llm-as-a-judge (#926) by @fsiino-nvidia
- Add the RDKit-Chemistry RL Environment (#984) by @danecor
- feat: mmlu_pro and mmlu_prox benchmarks (#988) by @fsiino-nvidia
- feat: Misc infra (#970) by @bxyu-nvidia
- feat: Introduce NVARC Resource Server with inductive and transductive modes (#1003) by @cmunley1
- Add CVDP benchmark resource server with apptainer instead of docker (#928) by @arti4nvj
- feat: add ifbench (#999) by @fsiino-nvidia
- Upstream 20260408 (#1039) by @bxyu-nvidia
- fix: GenRM lock in order to properly handle concurrent requests. (#1041) by @ffrujeri
- Tau2 benchmark (#1049) by @bxyu-nvidia
- Add tau2 to Nemotron 3 Ultra benchmarks (#1052) by @bxyu-nvidia
- feat: Fix sequential reasoning allowed (#1053) by @bxyu-nvidia
- Fix aiohttp connection limit under FastAPI/Uvicorn workers > 1 (#1054) by @bxyu-nvidia
- fix: pypi (#1056) by @cmunley1
- Additional Tau2 metrics (#1064) by @bxyu-nvidia
- Bump version to 0.2.1 and make wheel test mandatory (#1065) by @kajalj22
- renamed simple_agent to cvdp_agent for consistency (#1024) by @arti4nvj
- feat: VLM counting environment (#930) by @cmunley1
- fix: add value field...
v0.2.1
v0.2.0
Release Summary
NeMo Gym v0.2.0 ships alongside the NVIDIA Nemotron 3 Super model release, open sourcing the RL environments and corresponding datasets used during training. Highlights:
- 17 new training environments across coding, math, science, reasoning, agentic tasks, and safety.
- Integrations with Future House Aviary, Open-Thought Reasoning Gym, and Prime Intellect Verifiers let you use environments from these libraries directly within NeMo Gym
- End-to-end rollout collection with a locally managed vLLM server
- Install directly from PyPI with pip install nemo-gym
First-Time Contributors
We welcomed 15 new contributors to this release! Here are a few highlights:
- @sidnarayanan added the Aviary integration to enable training on any Aviary environment, a library of interactive RL environments spanning math, science, biology, and more
- @3mei added the text-to-SQL environment to generate SQL queries from natural language across multiple SQL dialects
- @Kelvin0110 added the NewtonBench environment to discover scientific laws through interactive experimentation
Thank you to all the new contributors for helping make NeMo Gym better!
Major Features & Improvements
New Environments
- Added 17 new resources servers spanning:
- Coding: Text to SQL (#648), SWE RL Gen (#561), SWE RL LLM Judge (#561)
- Math: Lean4 Mathematical Proofs (#563)
- Science: Aviary (#55), NewtonBench (#650)
- Reasoning: MultiChallenge (#654), ARC-AGI (#105), Reasoning Gym (#113)
- Agent tasks: xLAM Function Calling (#262), Tavily Search (#825), Single Step Tool Use with Argument Comparison (#825), Terminus Judge (#594), NeMo Skills Tools (#571)
- Safety: Jailbreak Detection (#825), Over Refusal Detection (#825)
- RLHF: Generative Reward Model Compare (#674)
- Added 5 new agent servers: Aviary agent (#55), proof refinement agent (#563), SWE agents (#343), tool simulation agent (#826), and verifiers agent (#573)
Environment Library Integrations
Combine environments from other libraries with NeMo Gym environments
Model Serving
- Local vLLM model server with end-to-end rollout collection without an external API (#558, #762)
- vLLM 0.16+ support for the reasoning field in responses (#816)
- VLLMModel chat template kwargs support (#538, #636)
- Per-task chat template and extra body args, enabling per-task control of reasoning mode and thinking budget (#672)
Rollout Collection & Profiling
- New ng_reward_profile command to compute per-task pass rates and aggregate metrics (#83, #621)
- CPU profiling for rollout performance analysis (#763)
- Add option for seeding on num_repeats for rollouts (#740)
Infrastructure & Developer Experience
- PyPI compatibility: install via
pip install nemo-gym(#649) - Dry run mode:
ng_run +dryrun=trueto validate configs and install environments without starting servers (#743) ng_statuscommand to list running servers and their health (#290)- Server stdout/stderr redirection with server name prefixes (#703)
- FastAPI worker support for higher throughput across multiple workers (#566)
Model Recipes
Deprecation Notices
- Deprecated ng_viewer due to a Gradio security vulnerability. We plan to revisit rollout viewing with a more robust solution in a future release.
Bug Fixes
- Fixed 0.1.1 environments to work correctly with RL training pipelines (#768)
- Fixed crash when server receives malformed JSON during rollout collection (#770)
- Fixed dry run mode failing (#746)
- Fixed nested responses_create_params overrides not merging correctly from CLI (#827)
- Fixed ng_prepare_data failing when multiple environments define overlapping metrics (#738)
- Fixed reward profiling failing when model response doesn't include usage stats (#824)
- Fixed NeMo-Skills python tool to use HTTP calls instead of subprocess execution (#606)
- Bumped Pillow and other packages to address security vulnerabilities (#667, #739)
- ng_dump_config now redacts API key values from output (#567)
Documentation
- New training tutorials: Unsloth training with NeMo Gym, multi-environment training
- New environment tutorials: creating a training environment, custom data preparation, integrating external environment libraries, environment best practices
- Model recipes: reproduce the training for Nemotron 3 Nano and Nemotron 3 Super
- Concepts & architecture overhaul: rewrote concepts docs, added architecture diagrams, added agent server and resources server docs
- Training approaches: added training approaches docs page covering SFT, RL (GRPO), and RLVR
- Ecosystem page: revamped ecosystem page with training framework integrations and environment library integrations
- Infrastructure: added SWE RL infrastructure case study, deployment topology docs
- Quality pass: redirect sweep, style guide sweep, consistent naming, FAQ additions, broken link fixes
Looking Ahead
- VLM support: add support for VLM models and environments with images, e.g. browser environments and computer use agent (CUA) environments
- Benchmark environments: add popular OSS environments such as OSWorld, Tau Bench, BrowseComp
- Integrate existing agents: integrate popular existing agents, e.g. coding harnesses, as well as agents developed via popular agent frameworks, e.g. LangGraph
- Environment tutorials: incorporate more complex agentic loops during training such as multi-turn conversation and user modeling
Release Assets
GitHub Release: https://github.com/NVIDIA-NeMo/Gym/releases/tag/v0.2.0
Container: nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super
What's Changed
- Bump to v0.2.0 by @bxyu-nvidia in #510
- reasoning-gym resource server by @cmunley1 in #113
- docs: redirect setup by @lbliii in #513
- docs: Miscellaneous GRPO tutorial fixes by @bxyu-nvidia in #512
- docs settings update by @lbliii in #525
- Debug server package versions by @fsiino-nvidia in #406
- List running server health and status by @fsiino-nvidia in #290
- VLLMModel supports chat template kwargs by @pjin-nvidia in #538
- Salesforce xlam-function-calling-60k resources server by @cmunley1 in #262
- python flag for colab venv installation by @cmunley1 in #526
- add unsloth and trl to docs by @cmunley1 in #536
- docs: remove trl docs by @cmunley1 in #543
- Remove PlainTextResponse response_class by @fsiino-nvidia in https://github.com/NVIDIA-N...
v0.1.1
What's Changed
- Bump package info for v0.2.0 by @bxyu-nvidia in #337
- fix: Update incorrect path in docs: library_judge_math -> math_with_j… by @shashank3959 in #355
- Update secret detector to work with forks by @chtruong814 in #358
- Removed reference to gitlab master by @hwolff99 in #377
- Mark experimental tutorials by @bxyu-nvidia in #386
- docs: experimental label by @lbliii in #391
- Fixed typos by @hwolff99 in #400
- Readme dataset discoverability cont by @fsiino-nvidia in #344
- Add absolute ip for multi node by @sdevare-nv in #286
- docs: removed "How to Navigate" section from concepts by @ahmadki in #414
- docs: Fixed image embedding in core abstractions page by @ahmadki in #410
- docs: Fixed Licensing information in structured outputs by @ahmadki in #412
- docs: Added hyperlinks to github repo in docs by @ahmadki in #413
- docs: Add software / hardware requirements to README and docs. by @ffrujeri in #401
- docs: Cleaned the "Quick Start" section in the README by @ahmadki in #411
- Display system and version info by @fsiino-nvidia in #347
- docs: Improve language around resources servers. by @ffrujeri in #408
- docs: Add Create Resource Server Tutorial by @ffrujeri in #407
- miniswe w/ offline uv by @sdevare-nv in #357
- update vllm model comments by @cmunley1 in #423
- docs: linked several terms to their defenition in glossary by @ahmadki in #424
- docs: Explain why GPT-4 is used and clarify support for other models by @ahmadki in #425
- Removed internal section by @hwolff99 in #430
- docs: various improvements and fixes by @ahmadki in #415
- docs: Relate sections Get Started and Rollout Collection by @fsiino-nvidia in #426
- Guide user on next steps after finishing get started by @cwing-nvidia in #435
- Add placeholder author by @jkyi-nvidia in #440
- Clarify training environment framing and align docs messaging by @cwing-nvidia in #438
- docs: Added CLI documentation by @ahmadki in #444
- Change NeMo Gym from framework to library by @cwing-nvidia in #456
- Add Data Designer and links to ecosystem page by @cwing-nvidia in #462
- docs: Moved configuration system under about by @ahmadki in #420
- Add benefits to About page aligned with README by @cwing-nvidia in #452
- Explain where the name Gym comes from; Gym Key Terminology doc is missing some of the old material by @bxyu-nvidia in #470
- add calendar env for multi-turn IF by @sanjaykariyappa in #297
- docs(readme): fix Example Resource Servers table - correct Multi Step… by @lbliii in #464
- Remove penguin references by @ahmadki in #469
- docs: Training framework integration by @bxyu-nvidia in #439
- Bug: inconsistent documentation around servers running by @bxyu-nvidia in #472
- docs: Improve server reference info by @bxyu-nvidia in #474
- pyproject typos and grammar fixes by @ahmadki in #473
- Miscellaneous infra improvements/fixes by @pjin-nvidia in #317
- Expose server host and port in dataset viewer CLI by @ahmadki in #476
- Rename examples simple_weather and stateful_counter by @fsiino-nvidia in #479
- More single tool call filename updates by @fsiino-nvidia in #480
- docs: Fix wrong count vs actual by @fsiino-nvidia in #482
- Fix duplicate reference sections by @bxyu-nvidia in #483
- docs: home pg, quickstart move, gh icon by @lbliii in #463
- More single tool call filename updates cont by @fsiino-nvidia in #484
- Fix NeMo Gym Pyproject links by @bxyu-nvidia in #486
- docs: move FAQ by @lbliii in #489
- docs: contribute section by @lbliii in #490
- Misc rollout fixes by @pjin-nvidia in #447
- improve framing of training framework integration guide for contributing by @cwing-nvidia in #493
- Docs: Contribution Home & Dev Setup by @cwing-nvidia in #494
- Add environment contribution docs by @cwing-nvidia in #498
- FAQ cleanup by @cwing-nvidia in #499
- Simplify contributing.md by @cwing-nvidia in #500
- Reorder README structure by @cwing-nvidia in #501
- docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] by @bxyu-nvidia in #481
- Update dataset configs with HuggingFace links by @bxyu-nvidia in #508
- Change to v0.1.1 release version by @bxyu-nvidia in #509
New Contributors
- @shashank3959 made their first contribution in #355
- @hwolff99 made their first contribution in #377
- @ahmadki made their first contribution in #414
- @ffrujeri made their first contribution in #401
- @sanjaykariyappa made their first contribution in #297
Full Changelog: v0.1.0...v0.1.1
v0.1.0
What's Changed
- Add copy-pr-bot by @chtruong814 in #1
- Add initial repo template by @chtruong814 in #2
- Update GitHub with Gitlab main by @bxyu-nvidia in #3
- Alias as Penguin by @bxyu-nvidia in #4
- Add Copyright docs README FAQ by @bxyu-nvidia in #7
- Dapo17k by @bxyu-nvidia in #6
- Fix docs build failures by @bxyu-nvidia in #8
- Fix docs by @bxyu-nvidia in #10
- Improve Github SSH Key setup docs by @bxyu-nvidia in #12
- Comp-Coding Verifier by @kbhardwaj-nvidia in #5
- Dataset viewer simple aggregations by @fsiino-nvidia in #9
- VLLMModel docs in main Readme by @bxyu-nvidia in #13
- Fix agent name in docs by @bxyu-nvidia in #15
- VLLMModel propogates token IDs by @bxyu-nvidia in #11
- VLLMModel tokenize params cleanup by @bxyu-nvidia in #21
- Update Comp-Coding README.md by @kbhardwaj-nvidia in #26
- Docs improvements - remove Why NeMo Gym section and add CI/CD tests info by @bxyu-nvidia in #27
- update server logging format to be more consistent by @cmunley1 in #22
- update readmes from ng_collect_traj to ng_collect_rollouts by @cmunley1 in #25
- Simple agent stop criteria requires no tool calls AND output message item to be present by @bxyu-nvidia in #19
- Server spinup polling by @bxyu-nvidia in #31
- Rename top-level config key 'openai_model' => 'policy_model' by @pjin-nvidia in #33
- Simple agent allows non-json tool responses by @bxyu-nvidia in #35
- Multi-verifier docs by @bxyu-nvidia in #36
- Servers have easy hooks into individual instances via session by @bxyu-nvidia in #24
- Add Math Stack Overflow dataset by @damon-mosk-aoyama-nvidia in #42
- Add Workbench validation dataset by @bxyu-nvidia in #46
- Docs update by @bxyu-nvidia in #47
- Implements LLM-as-Judge for Response Equivalence by @soares-f in #16
- Configure global httpx client by @pjin-nvidia in #50
- Fix OpenAI ResponseReasoningItem.status property by @bxyu-nvidia in #54
- VLLMModel data parallel; explicit RunHelper shutdown handle by @bxyu-nvidia in #52
- removed simple_agent_stateful, uses fastapi to keep track of session by @RahulSChand in #44
- Migrate text_based_game: sudoku and game agent features by @RahulSChand in #30
- Revert "Migrate text_based_game: sudoku and game agent features" by @bxyu-nvidia in #65
- Add data aggregations to data preparation by @fsiino-nvidia in #49
- Instantiate one httpx async client per unique connection / base url by @bxyu-nvidia in #75
- Swap async http backend from httpx to aiohttp; various server infra improvements by @bxyu-nvidia in #77
- Remove unnecessary GHA CI and add uv config to enable dependency scanning by @chtruong814 in #66
- VLLMModel fix whitespace stripping and unwarranted spaces by @bxyu-nvidia in #70
- Fix aggregation rounding in ng_prepare_data by @fsiino-nvidia in #76
- Add profiling; improve rollout collection usability and efficiency; add uvicorn logging filtering by @bxyu-nvidia in #79
- Delete .github/ISSUE_TEMPLATE directory by @pablo-garay in #87
- Add support for
num_repeatsby @MahanFathi in #99 - Comp coding fixes; lots of misc infra items by @bxyu-nvidia in #90
- chore: Update cherry-pick workflow to use v0.63.0 by @pablo-garay in #108
- Make Workbench stateful and sign commits by @abhibha-nvidia in #110
- Clean deprecated Comp coding by @bxyu-nvidia in #106
- Bxyu/misc infra 20251001 by @bxyu-nvidia in #116
- Resource Server Organization by @fsiino-nvidia in #80
- Add metrics conflict error FAQ to Readme by @fsiino-nvidia in #93
- Azure OpenAI model support by @bxyu-nvidia in #112
- Use python env for precommit hook; alter files trigger by @fsiino-nvidia in #125
- Update issue templates by @bxyu-nvidia in #152
- Add back Nemo Framework templates by @bxyu-nvidia in #153
- Fix Workbench invalid function name by @bxyu-nvidia in #167
- VLLMModel enable reasoning parsing by @bxyu-nvidia in #129
- Add Attributions for Third Party Softwares by @banghuaz-nvidia in #154
- Fix infinite OpenAI endpoint query; misc improvements by @bxyu-nvidia in #171
- docs: Add Tutorial 00 - Key Terminology by @cwing-nvidia in #180
- docs: Add tutorial README with learning path structure by @cwing-nvidia in #177
- Redirect main Gym readme to Tutorials by @bxyu-nvidia in #201
- docs: Add Tutorial 01 - Understanding Core Concepts by @cwing-nvidia in #181
- docs: Add Tutorial 09 - Configuration Management by @cwing-nvidia in #183
- Add CODE_OF_CONDUCT.md for community guidelines by @cwing-nvidia in #148
- Add SECURITY.md with NVIDIA security policy by @cwing-nvidia in #149
- Make metrics conflict criteria less strict by @fsiino-nvidia in #150
- Move tutorials to docs by @bxyu-nvidia in #205
- docs: Replace README with improved version by @cwing-nvidia in #192
- Large docs improvement PR from @cwing-nvidia by @bxyu-nvidia in #208
- Add back How-To's and FAQs by @bxyu-nvidia in #209
- Docs fixes by @bxyu-nvidia in #210
- Improve CONTRIBUTING.md by @cwing-nvidia in #151
- feat (OpenQA): Add OpenQA support with per-record regex and rescue features by @psgundecha-nv in #155
- feat(mcqa): Add custom answer extraction via template_metadata to support STEM MCQA dataset by @psgundecha-nv in #128
- Add README to docs folder by @bxyu-nvidia in #216
- Ray comp coding infra by @sdevare-nv in #195
- Misc docs fixes by @bxyu-nvidia in #218
- CLI help and command help; misc improvements by @bxyu-nvidia in #229
- Misc infra 20251024 by @bxyu-nvidia in #234
- Fix ray version mismatch by @sdevare-nv in #231
- Misc fixes 20251027 by @bxyu-nvidia in #243
- Validate server port selection by @fsiino-nvidia in #233
- bxyu/misc-infra-20251027-001 by @bxyu-nvidia in #247
- Fix input assistant messages by @bxyu-nvidia in #248
- Misc infra 20251028 002 by @bxyu-nvidia in #253
- Structured Outputs JSON Environment by @jkyi-nvidia in #251
- Bump OpenAI version to 2.6.1; improve dependency constrain resolution by @bxyu-nvidia in #255
- Update missing header and attributions by @banghuaz-nvidia in #237
- Misc infra 20251031 by @bxyu-nvidia in #263
- Update math dataset examples and metrics by @damon-mosk-aoyama-nvidia in #265
- Misc infra 20251101 by @bxyu-nvidia in #267
- Almost-server detection and reporting by @fsiino-nvidia in #249
- Miniswe env by @sdevare-nv in #241
- Differentiate Example-only and Training Resource Servers...