2 changes: 1 addition & 1 deletion docs/get-started/_snippets/arc_challenge.py
@@ -42,7 +42,7 @@
top_p=0,
parallelism=1,
extra={
"tokenizer": "/checkpoints/llama-3_2-1b-instruct_v2.0/context/nemo_tokenizer",
"tokenizer": "/checkpoint/tokenizer",
"tokenizer_backend": "huggingface",
},
),
39 changes: 39 additions & 0 deletions docs/get-started/_snippets/nemo_fw_basic.py
@@ -0,0 +1,39 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#!/usr/bin/env python3

# [snippet-start]
# Start Python in a new terminal inside the container, then launch
# the evaluation:

from nemo_evaluator.api import evaluate
from nemo_evaluator.api.api_dataclasses import (
ApiEndpoint,
EvaluationConfig,
EvaluationTarget,
)

# Configure evaluation
api_endpoint = ApiEndpoint(
url="http://0.0.0.0:8080/v1/completions/",
type="completions",
model_id="megatron_model",
)
target = EvaluationTarget(api_endpoint=api_endpoint)
config = EvaluationConfig(type="gsm8k", output_dir="results")

# Run evaluation
results = evaluate(target_cfg=target, eval_cfg=config)
print(results)
# [snippet-end]
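
With the model deployed as in the quickstart, this snippet can be run directly. A usage sketch, assuming the file path shown above and a server already listening on port 8080:

```bash
python docs/get-started/_snippets/nemo_fw_basic.py
```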
9 changes: 9 additions & 0 deletions docs/get-started/quickstart/index.md
@@ -46,6 +46,14 @@ Unified CLI experience with automated container management, built-in orchestrati
Programmatic control with full adapter features, custom configurations, and direct API access for integration into existing workflows.
:::

:::{grid-item-card} {octicon}`gear;1.5em;sd-mr-1` NeMo Framework Container
:link: gs-quickstart-nemo-fw
:link-type: ref
**For NeMo Framework Users**

End-to-end training and evaluation of large language models (LLMs).
:::

:::{grid-item-card} {octicon}`container;1.5em;sd-mr-1` Container Direct
:link: gs-quickstart-container
:link-type: ref
@@ -272,5 +280,6 @@ nemo-evaluator-launcher run --config-dir packages/nemo-evaluator-launcher/exampl

NeMo Evaluator Launcher <launcher>
NeMo Evaluator Core <core>
NeMo Framework Container <nemo-fw>
Container Direct <container>
```
60 changes: 27 additions & 33 deletions docs/get-started/quickstart/nemo-fw.md
@@ -1,7 +1,3 @@
---
orphan: true
---

(gs-quickstart-nemo-fw)=
# Evaluate checkpoints trained by NeMo Framework

@@ -12,52 +8,42 @@ The NeMo Evaluator is integrated within NeMo Framework, offering streamlined dep

## Prerequisites

- Docker with GPU support
- NeMo Framework docker container
- Docker installed
- CUDA-compatible GPU
- [NeMo Framework docker container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags)
- Your model checkpoint (or use [Llama 3.2 1B Instruct](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/llama-3_2-1b-instruct) for testing)
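
If you need the test checkpoint, one way to fetch it is with the NGC CLI. A sketch, assuming the CLI is installed and configured and using the model path from the catalog link above; replace `<version>` with a tag listed on the catalog page:

```bash
# Download the Llama 3.2 1B Instruct checkpoint from NGC
ngc registry model download-version "nvidia/nemo/llama-3_2-1b-instruct:<version>"
```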

## Quick Start

### 1. Start NeMo Framework Container

For optimal performance and user experience, use the latest version of the [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags). Please fetch the most recent `$TAG` and run the following command to start a container:

```bash
docker run --rm -it -w /workdir -v $(pwd):/workdir \
# 1. Start NeMo Framework Container

TAG=...
CHECKPOINT_PATH="/path/to/checkpoint/llama-3_2-1b-instruct_v2.0/iter_0000000" # use absolute path

docker run --rm -it -w /workdir -v $(pwd):/workdir -v $CHECKPOINT_PATH:/checkpoint/ \
--entrypoint bash \
--gpus all \
nvcr.io/nvidia/nemo:${TAG}
```

### 2. Deploy a Model

```bash
# Deploy a NeMo checkpoint
# Run inside the container:

# 2. Deploy a Model
python \
/opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
--nemo_checkpoint "/path/to/your/checkpoint" \
--megatron_checkpoint /checkpoint \
--model_id megatron_model \
--port 8080 \
--host 0.0.0.0
```
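
Before moving on, you can sanity-check that the server answers requests. A minimal check, assuming the OpenAI-style completions route that the evaluation below targets:

```bash
# Expect a JSON completion back from the deployed model
curl -s http://0.0.0.0:8080/v1/completions/ \
  -H "Content-Type: application/json" \
  -d '{"model": "megatron_model", "prompt": "Hello", "max_tokens": 8}'
```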

### 3. Evaluate the Model

```python
from nemo_evaluator.api import evaluate
from nemo_evaluator.api.api_dataclasses import ApiEndpoint, EvaluationConfig, EvaluationTarget

# Configure evaluation
api_endpoint = ApiEndpoint(
url="http://0.0.0.0:8080/v1/completions/",
type="completions",
model_id="megatron_model"
)
target = EvaluationTarget(api_endpoint=api_endpoint)
config = EvaluationConfig(type="gsm8k", output_dir="results")

# Run evaluation
results = evaluate(target_cfg=target, eval_cfg=config)
print(results)
```{literalinclude} ../_snippets/nemo_fw_basic.py
:language: python
:start-after: "# [snippet-start]"
:end-before: "# [snippet-end]"
```
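
Switching benchmarks only requires a different `EvaluationConfig`. A sketch, assuming the `mmlu` task name is available in the installed harness:

```python
# Evaluate MMLU instead of GSM8K; the target configuration is unchanged
config = EvaluationConfig(type="mmlu", output_dir="results")
results = evaluate(target_cfg=target, eval_cfg=config)
```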


@@ -86,7 +72,7 @@ Deploy multiple instances of your model:
```shell
python \
/opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
--nemo_checkpoint "meta-llama/Llama-3.1-8B" \
--megatron_checkpoint /checkpoint \
--model_id "megatron_model" \
--port 8080 \
--num_gpus 4 \
@@ -120,3 +106,11 @@ if __name__ == "__main__":
)
evaluate(target_cfg=eval_target, eval_cfg=eval_config)
```

## Next Steps

- Explore {ref}`deployment-nemo-fw` for other deployment options
- Integrate evaluation into your training pipeline
- Run deployment and evaluation with NeMo Run
- Configure adapters and interceptors for advanced evaluation scenarios
- Explore {ref}`tutorials-overview`
5 changes: 3 additions & 2 deletions docs/index.md
@@ -347,12 +347,13 @@ Install SDK <get-started/install>
Quickstart <get-started/quickstart/index>
:::

<!-- :::{toctree}
:::{toctree}
:caption: Tutorials
:hidden:

About Tutorials <tutorials/index>
::: -->
Tutorials for NeMo Framework <tutorials/nemo-fw/index>
:::

<!-- :::{toctree}
:caption: Evaluation
28 changes: 0 additions & 28 deletions docs/nemo-fw/evaluation-doc.md
@@ -145,32 +145,4 @@ if __name__ == "__main__":

> **Tip:** If you encounter a `TimeoutError` on the evaluation client side, increase the `request_timeout` parameter in the `ConfigParams` class to a larger value such as `1000` or `1200` seconds (the default is 300).
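
A sketch of that adjustment, assuming `ConfigParams` is importable alongside the other API dataclasses:

```python
from nemo_evaluator.api.api_dataclasses import ConfigParams, EvaluationConfig

# Raise the per-request timeout from the 300-second default
config = EvaluationConfig(
    type="gsm8k",
    output_dir="results",
    params=ConfigParams(request_timeout=1000),
)
```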

## Run Evaluations with NeMo Run

This section explains how to run evaluations with NeMo Run. For detailed information about [NeMo Run](https://github.com/NVIDIA/NeMo-Run), please refer to its documentation. Below is a concise guide focused on using NeMo Run to perform evaluations in NeMo.

The [evaluation_with_nemo_run.py](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/scripts/evaluation_with_nemo_run.py) script serves as a reference for launching evaluations with NeMo Run. It demonstrates how to use NeMo Run with both local executors (your local workstation) and Slurm-based executors (clusters). In this setup, the deploy and evaluate processes are launched as two separate jobs with NeMo Run. The evaluate method waits until the PyTriton server is accessible and the model is deployed before starting the evaluations.

> **Note:** Make sure to set `HF_TOKEN` in the NeMo Run script's [local_executor env_vars](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/scripts/evaluation_with_nemo_run.py#L267) if using the local executor, or in the [slurm_executor env_vars](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/scripts/evaluation_with_nemo_run.py#L232) if using the Slurm executor.

### Run Locally with NeMo Run

To run evaluations on your local workstation, use the following command:

```bash
cd scripts
python evaluation_with_nemo_run.py --nemo_checkpoint '/workspace/llama3_8b_nemo2/' --eval_task 'gsm8k' --devices 2
```

> **Note:** When running locally with NeMo Run, you will need to manually terminate the deploy process once evaluations are complete.

### Run on Slurm-based Clusters

To run evaluations on Slurm-based clusters, add the `--slurm` flag to your command and specify any custom parameters such as user, host, remote_job_dir, account, and mounts. Refer to the `evaluation_with_nemo_run.py` script for further details. Below is an example command:

```bash
cd scripts
python evaluation_with_nemo_run.py --nemo_checkpoint='/workspace/llama3_8b_nemo2' --slurm --nodes 1 \
--devices 8 --container_image "nvcr.io/nvidia/nemo:25.07" --tensor_parallelism_size 8
```
By following these commands, you can run evaluations with NeMo Run in both local and Slurm-based environments.
58 changes: 6 additions & 52 deletions docs/tutorials/index.md
@@ -1,69 +1,23 @@
---
orphan: true
---

(tutorials-overview)=

# Tutorials

Master {{ product_name_short }} with hands-on tutorials and practical examples.

## Before You Start

Before starting the tutorials, ensure you have:

- **NeMo Framework Container**: Running the latest [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
- **Model Checkpoint**: Access to a NeMo 2.0 checkpoint (tutorials use Llama 3.2 1B Instruct)
- **GPU Resources**: CUDA-compatible GPU with sufficient memory
- **Jupyter Environment**: Ability to run Jupyter notebooks

---

## Available Tutorials

Build your expertise with these progressive tutorials:

::::{grid} 1 2 2 2
:gutter: 1 1 1 2

:::{grid-item-card} {octicon}`play;1.5em;sd-mr-1` 1. MMLU Evaluation
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/mmlu.ipynb
:link-type: url
Deploy models and run evaluations with the MMLU benchmark for both completions and chat endpoints.
:::

:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` 2. Simple Evals Framework
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/simple-evals.ipynb
:link-type: url
Discover how to extend evaluation capabilities by installing additional frameworks and running HumanEval coding assessments.
:::

:::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` 3. Custom Tasks
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/wikitext.ipynb
:link-type: url
Master custom evaluation workflows by running WikiText benchmark with advanced configuration and log-probability analysis.
:::{grid-item-card} {octicon}`play;1.5em;sd-mr-1` Evaluation with NeMo Framework
:link: nemo-fw/index
:link-type: doc
Deploy models and run evaluations using NeMo Framework container.
:::

:::{grid-item-card} {octicon}`package-dependents;1.5em;sd-mr-1` 4. Create a Framework Definition File
:::{grid-item-card} {octicon}`package-dependents;1.5em;sd-mr-1` Create a Framework Definition File
:link: create-framework-definition-file
:link-type: ref
Integrate your custom evaluation framework with {{ product_name_short }} by creating a Framework Definition File (FDF).
:::

::::

## Run the Tutorials

1. Start NeMo Framework Container:
```bash
docker run --rm -it -w /workdir -v $(pwd):/workdir \
--entrypoint bash --gpus all \
nvcr.io/nvidia/nemo:${TAG}
```

2. Launch Jupyter:
```bash
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root
```

3. Navigate to the `tutorials/` directory and open the desired notebook
::::
63 changes: 63 additions & 0 deletions docs/tutorials/nemo-fw/index.md
@@ -0,0 +1,63 @@

# Tutorials for NeMo Framework

## Before You Start

Before starting the tutorials, ensure you have:

- **NeMo Framework Container**: Running the latest [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
- **Model Checkpoint**: Access to a NeMo 2.0 checkpoint (tutorials use Llama 3.2 1B Instruct)
- **GPU Resources**: CUDA-compatible GPU with sufficient memory
- **Jupyter Environment**: Ability to run Jupyter notebooks

---

## Available Tutorials

Build your expertise with these progressive tutorials:

::::{grid} 1 2 2 2
:gutter: 1 1 1 2

:::{grid-item-card} {octicon}`rocket;1.5em;sd-mr-1` Orchestrating evaluations with NeMo Run
:link: nemo-run
:link-type: doc
Launch deployment and evaluation jobs using NeMo Run.
:::

:::{grid-item-card} {octicon}`play;1.5em;sd-mr-1` Basic evaluation with MMLU
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/mmlu.ipynb
:link-type: url
Deploy models and run evaluations with the MMLU benchmark for both completions and chat endpoints.
:::

:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Enable additional evaluation harnesses
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/simple-evals.ipynb
:link-type: url
Discover how to extend evaluation capabilities by installing additional harnesses and running HumanEval coding assessments.
:::

:::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` Configure custom tasks
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/wikitext.ipynb
:link-type: url
Master custom evaluation workflows by running the WikiText benchmark with advanced configuration and log-probability analysis.
:::


::::

## Run the Notebook Tutorials

1. Start NeMo Framework Container:
```bash
docker run --rm -it -w /workdir -v $(pwd):/workdir \
--entrypoint bash --gpus all \
nvcr.io/nvidia/nemo:${TAG}
```

2. Launch Jupyter:
```bash
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root
```

3. Navigate to the `tutorials/` directory and open the desired notebook