2 changes: 1 addition & 1 deletion docs/get-started/_snippets/arc_challenge.py
@@ -42,7 +42,7 @@
top_p=0,
parallelism=1,
extra={
"tokenizer": "/checkpoints/llama-3_2-1b-instruct_v2.0/context/nemo_tokenizer",
"tokenizer": "/checkpoint/tokenizer",
"tokenizer_backend": "huggingface",
},
),
39 changes: 39 additions & 0 deletions docs/get-started/_snippets/nemo_fw_basic.py
@@ -0,0 +1,39 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#!/usr/bin/env python3

# [snippet-start]
# Start Python in a new terminal inside the container, then launch
# the evaluation:

from nemo_evaluator.api import evaluate
from nemo_evaluator.api.api_dataclasses import (
ApiEndpoint,
EvaluationConfig,
EvaluationTarget,
)

# Configure evaluation
api_endpoint = ApiEndpoint(
url="http://0.0.0.0:8080/v1/completions/",
type="completions",
model_id="megatron_model",
)
target = EvaluationTarget(api_endpoint=api_endpoint)
config = EvaluationConfig(type="gsm8k", output_dir="results")

# Run evaluation
results = evaluate(target_cfg=target, eval_cfg=config)
print(results)
# [snippet-end]
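
With the model deployed as in the quickstart, this snippet can be run directly. A usage sketch, assuming the file path shown above and a server already listening on port 8080:

```bash
python docs/get-started/_snippets/nemo_fw_basic.py
```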
9 changes: 9 additions & 0 deletions docs/get-started/quickstart/index.md
@@ -46,6 +46,14 @@ Unified CLI experience with automated container management, built-in orchestrati
Programmatic control with full adapter features, custom configurations, and direct API access for integration into existing workflows.
:::

:::{grid-item-card} {octicon}`gear;1.5em;sd-mr-1` NeMo Framework Container
:link: gs-quickstart-nemo-fw
:link-type: ref
**For NeMo Framework Users**

End-to-end training and evaluation of large language models (LLMs).
:::

:::{grid-item-card} {octicon}`container;1.5em;sd-mr-1` Container Direct
:link: gs-quickstart-container
:link-type: ref
@@ -272,5 +280,6 @@ nemo-evaluator-launcher run --config-dir packages/nemo-evaluator-launcher/exampl

NeMo Evaluator Launcher <launcher>
NeMo Evaluator Core <core>
NeMo Framework Container <nemo-fw>
Container Direct <container>
```
60 changes: 27 additions & 33 deletions docs/get-started/quickstart/nemo-fw.md
@@ -1,7 +1,3 @@
---
orphan: true
---

(gs-quickstart-nemo-fw)=
# Evaluate checkpoints trained by NeMo Framework

@@ -12,52 +8,42 @@ The NeMo Evaluator is integrated within NeMo Framework, offering streamlined dep

## Prerequisites

- Docker with GPU support
- NeMo Framework docker container
- Docker installed
- CUDA-compatible GPU
- [NeMo Framework docker container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags)
- Your model checkpoint (or use [Llama 3.2 1B Instruct](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/llama-3_2-1b-instruct) for testing)
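
If you need the test checkpoint, one way to fetch it is with the NGC CLI. A sketch, assuming the CLI is installed and configured and using the model path from the catalog link above; replace `<version>` with a tag listed on the catalog page:

```bash
# Download the Llama 3.2 1B Instruct checkpoint from NGC
ngc registry model download-version "nvidia/nemo/llama-3_2-1b-instruct:<version>"
```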

## Quick Start

### 1. Start NeMo Framework Container

For optimal performance and user experience, use the latest version of the [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags). Please fetch the most recent `$TAG` and run the following command to start a container:

```bash
docker run --rm -it -w /workdir -v $(pwd):/workdir \
# 1. Start NeMo Framework Container

TAG=...
CHECKPOINT_PATH="/path/to/checkpoint/llama-3_2-1b-instruct_v2.0/iter_0000000" # use absolute path

docker run --rm -it -w /workdir -v $(pwd):/workdir -v $CHECKPOINT_PATH:/checkpoint/ \
--entrypoint bash \
--gpus all \
nvcr.io/nvidia/nemo:${TAG}
```

### 2. Deploy a Model

```bash
# Deploy a NeMo checkpoint
# Run inside the container:

# 2. Deploy a Model
python \
/opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
--nemo_checkpoint "/path/to/your/checkpoint" \
--megatron_checkpoint /checkpoint \
--model_id megatron_model \
--port 8080 \
--host 0.0.0.0
```
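
Before moving on, you can sanity-check that the server answers requests. A minimal check, assuming the OpenAI-style completions route that the evaluation below targets:

```bash
# Expect a JSON completion back from the deployed model
curl -s http://0.0.0.0:8080/v1/completions/ \
  -H "Content-Type: application/json" \
  -d '{"model": "megatron_model", "prompt": "Hello", "max_tokens": 8}'
```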

### 3. Evaluate the Model

```python
from nemo_evaluator.api import evaluate
from nemo_evaluator.api.api_dataclasses import ApiEndpoint, EvaluationConfig, EvaluationTarget

# Configure evaluation
api_endpoint = ApiEndpoint(
url="http://0.0.0.0:8080/v1/completions/",
type="completions",
model_id="megatron_model"
)
target = EvaluationTarget(api_endpoint=api_endpoint)
config = EvaluationConfig(type="gsm8k", output_dir="results")

# Run evaluation
results = evaluate(target_cfg=target, eval_cfg=config)
print(results)
```{literalinclude} ../_snippets/nemo_fw_basic.py
:language: python
:start-after: "# [snippet-start]"
:end-before: "# [snippet-end]"
```
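
Switching benchmarks only requires a different `EvaluationConfig`. A sketch, assuming the `mmlu` task name is available in the installed harness:

```python
# Evaluate MMLU instead of GSM8K; the target configuration is unchanged
config = EvaluationConfig(type="mmlu", output_dir="results")
results = evaluate(target_cfg=target, eval_cfg=config)
```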


@@ -86,7 +72,7 @@ Deploy multiple instances of your model:
```shell
python \
/opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
--nemo_checkpoint "meta-llama/Llama-3.1-8B" \
--megatron_checkpoint /checkpoint \
--model_id "megatron_model" \
--port 8080 \
--num_gpus 4 \
@@ -120,3 +106,11 @@ if __name__ == "__main__":
)
evaluate(target_cfg=eval_target, eval_cfg=eval_config)
```

## Next Steps

- Explore {ref}`deployment-nemo-fw` for other deployment options
- Integrate evaluation into your training pipeline
- Run deployment and evaluation with NeMo Run
- Configure adapters and interceptors for advanced evaluation scenarios
- Explore {ref}`tutorials-overview`
5 changes: 3 additions & 2 deletions docs/index.md
@@ -347,12 +347,13 @@ Install SDK <get-started/install>
Quickstart <get-started/quickstart/index>
:::

<!-- :::{toctree}
:::{toctree}
:caption: Tutorials
:hidden:

About Tutorials <tutorials/index>
::: -->
Tutorials for NeMo Framework <tutorials/nemo-fw/index>
:::

<!-- :::{toctree}
:caption: Evaluation
28 changes: 0 additions & 28 deletions docs/nemo-fw/evaluation-doc.md
@@ -145,32 +145,4 @@ if __name__ == "__main__":

> **Tip:** If you encounter a `TimeoutError` on the evaluation client side, increase the `request_timeout` parameter in the `ConfigParams` class to a larger value such as `1000` or `1200` seconds (the default is 300).
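
A sketch of that adjustment, assuming `ConfigParams` is importable alongside the other API dataclasses:

```python
from nemo_evaluator.api.api_dataclasses import ConfigParams, EvaluationConfig

# Raise the per-request timeout from the 300-second default
config = EvaluationConfig(
    type="gsm8k",
    output_dir="results",
    params=ConfigParams(request_timeout=1000),
)
```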

## Run Evaluations with NeMo Run

This section explains how to run evaluations with NeMo Run. For detailed information about [NeMo Run](https://github.com/NVIDIA/NeMo-Run), please refer to its documentation. Below is a concise guide focused on using NeMo Run to perform evaluations in NeMo.

The [evaluation_with_nemo_run.py](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/scripts/evaluation_with_nemo_run.py) script serves as a reference for launching evaluations with NeMo Run. It demonstrates how to use NeMo Run with both local executors (your local workstation) and Slurm-based executors (clusters). In this setup, the deploy and evaluate processes are launched as two separate jobs with NeMo Run. The evaluate method waits until the PyTriton server is accessible and the model is deployed before starting the evaluations.

> **Note:** Make sure to set `HF_TOKEN` in the NeMo Run script's [local_executor env_vars](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/scripts/evaluation_with_nemo_run.py#L267) if using the local executor, or in the [slurm_executor env_vars](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/scripts/evaluation_with_nemo_run.py#L232) if using the Slurm executor.

### Run Locally with NeMo Run

To run evaluations on your local workstation, use the following command:

```bash
cd scripts
python evaluation_with_nemo_run.py --nemo_checkpoint '/workspace/llama3_8b_nemo2/' --eval_task 'gsm8k' --devices 2
```

> **Note:** When running locally with NeMo Run, you will need to manually terminate the deploy process once evaluations are complete.

### Run on Slurm-based Clusters

To run evaluations on Slurm-based clusters, add the `--slurm` flag to your command and specify any custom parameters such as user, host, remote_job_dir, account, and mounts. Refer to the `evaluation_with_nemo_run.py` script for further details. Below is an example command:

```bash
cd scripts
python evaluation_with_nemo_run.py --nemo_checkpoint='/workspace/llama3_8b_nemo2' --slurm --nodes 1 \
--devices 8 --container_image "nvcr.io/nvidia/nemo:25.07" --tensor_parallelism_size 8
```
By following these commands, you can run evaluations with NeMo Run in both local and Slurm-based environments.
58 changes: 6 additions & 52 deletions docs/tutorials/index.md
@@ -1,69 +1,23 @@
---
orphan: true
---

(tutorials-overview)=

# Tutorials

Master {{ product_name_short }} with hands-on tutorials and practical examples.

## Before You Start

Before starting the tutorials, ensure you have:

- **NeMo Framework Container**: Running the latest [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
- **Model Checkpoint**: Access to a NeMo 2.0 checkpoint (tutorials use Llama 3.2 1B Instruct)
- **GPU Resources**: CUDA-compatible GPU with sufficient memory
- **Jupyter Environment**: Ability to run Jupyter notebooks

---

## Available Tutorials

Build your expertise with these progressive tutorials:

::::{grid} 1 2 2 2
:gutter: 1 1 1 2

:::{grid-item-card} {octicon}`play;1.5em;sd-mr-1` 1. MMLU Evaluation
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/mmlu.ipynb
:link-type: url
Deploy models and run evaluations with the MMLU benchmark for both completions and chat endpoints.
:::

:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` 2. Simple Evals Framework
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/simple-evals.ipynb
:link-type: url
Discover how to extend evaluation capabilities by installing additional frameworks and running HumanEval coding assessments.
:::

:::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` 3. Custom Tasks
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/wikitext.ipynb
:link-type: url
Master custom evaluation workflows by running WikiText benchmark with advanced configuration and log-probability analysis.
:::{grid-item-card} {octicon}`play;1.5em;sd-mr-1` Evaluation with NeMo Framework
:link: nemo-fw/index
:link-type: doc
Deploy models and run evaluations using NeMo Framework container.
:::

:::{grid-item-card} {octicon}`package-dependents;1.5em;sd-mr-1` 4. Create a Framework Definition File
:::{grid-item-card} {octicon}`package-dependents;1.5em;sd-mr-1` Create a Framework Definition File
:link: create-framework-definition-file
:link-type: ref
Integrate your custom evaluation framework with {{ product_name_short }} by creating a Framework Definition File (FDF).
:::

::::

## Run the Tutorials

1. Start NeMo Framework Container:
```bash
docker run --rm -it -w /workdir -v $(pwd):/workdir \
--entrypoint bash --gpus all \
nvcr.io/nvidia/nemo:${TAG}
```

2. Launch Jupyter:
```bash
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root
```

3. Navigate to the `tutorials/` directory and open the desired notebook
::::
63 changes: 63 additions & 0 deletions docs/tutorials/nemo-fw/index.md
@@ -0,0 +1,63 @@

# Tutorials for NeMo Framework

## Before You Start

Before starting the tutorials, ensure you have:

- **NeMo Framework Container**: Running the latest [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo)
- **Model Checkpoint**: Access to a NeMo 2.0 checkpoint (tutorials use Llama 3.2 1B Instruct)
- **GPU Resources**: CUDA-compatible GPU with sufficient memory
- **Jupyter Environment**: Ability to run Jupyter notebooks

---

## Available Tutorials

Build your expertise with these progressive tutorials:

::::{grid} 1 2 2 2
:gutter: 1 1 1 2

:::{grid-item-card} {octicon}`rocket;1.5em;sd-mr-1` Orchestrating evaluations with NeMo Run
:link: nemo-run
:link-type: doc
Launch deployment and evaluation jobs using NeMo Run.
:::

:::{grid-item-card} {octicon}`play;1.5em;sd-mr-1` Basic evaluation with MMLU
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/mmlu.ipynb
:link-type: url
Deploy models and run evaluations with the MMLU benchmark for both completions and chat endpoints.
:::

:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Enable additional evaluation harnesses
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/simple-evals.ipynb
:link-type: url
Discover how to extend evaluation capabilities by installing additional harnesses and running HumanEval coding assessments.
:::

:::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` Configure custom tasks
:link: https://github.com/NVIDIA-NeMo/Eval/tree/main/tutorials/wikitext.ipynb
:link-type: url
Master custom evaluation workflows by running the WikiText benchmark with advanced configuration and log-probability analysis.
:::


::::

## Run the Notebook Tutorials

1. Start NeMo Framework Container:
```bash
docker run --rm -it -w /workdir -v $(pwd):/workdir \
--entrypoint bash --gpus all \
nvcr.io/nvidia/nemo:${TAG}
```

2. Launch Jupyter:
```bash
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root
```

3. Navigate to the `tutorials/` directory and open the desired notebook