NeMo Evaluator is an open-source platform for robust, reproducible, and scalable evaluation of Large Language Models. It enables you to run hundreds of benchmarks across popular evaluation harnesses against any OpenAI-compatible model API. Evaluations execute in open-source Docker containers for auditable and trustworthy results. The platform's containerized architecture allows for the rapid integration of public benchmarks and private datasets.
NeMo Evaluator is built on four core principles to provide a reliable and versatile evaluation experience:
- **State-of-the-Art Benchmarking**: Access a comprehensive suite of over 100 benchmarks from 18 popular open-source evaluation harnesses. See the full list of [Supported benchmarks and evaluation harnesses](#supported-benchmarks-and-evaluation-harnesses).
- **Extensible and Customizable**: Integrate new evaluation harnesses, add custom benchmarks with proprietary data, and define custom result exporters for existing MLOps tooling.
## How It Works: Launcher and Core Engine
The platform consists of two main components: the launcher (`nemo-evaluator-launcher`), which orchestrates evaluation jobs across execution backends such as your local machine, Slurm, or Lepton, and the core engine (`nemo-evaluator`), which runs the benchmarks inside the evaluation containers.
## 🚀 Quickstart
Get your first evaluation result in minutes. This guide uses your local machine to run a small benchmark against an OpenAI API-compatible endpoint.
### 1. Install the Launcher

The launcher is the only package required to get started.

```bash
pip install nemo-evaluator-launcher
```
### 2. Set Up Your Model Endpoint
NeMo Evaluator works with any model that exposes an OpenAI-compatible endpoint. For this quickstart, we will use the OpenAI API.
To use out-of-the-box build.nvidia.com APIs, you need an API key:

1. Sign in at [build.nvidia.com](https://build.nvidia.com).
2. In the Setup menu under Keys/Secrets, generate an API key.
3. Set the environment variable by executing `export NGC_API_KEY=<YOUR_API_KEY>`.
### 3. Run Your First Evaluation
Run a small evaluation on your local machine. The launcher automatically pulls the correct container and executes the benchmark. The list of benchmarks is directly configured in the YAML file.
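A sketch of the full run command; the example config directory comes from the launcher repository, while the config name and the `-o` override syntax are assumptions (check `nemo-evaluator-launcher run --help` for the exact flags):

```bash
nemo-evaluator-launcher run \
  --config-dir packages/nemo-evaluator-launcher/examples \
  --config-name <YOUR_CONFIG> \
  -o execution.output_dir=<YOUR_OUTPUT_LOCAL_DIR>
```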
After running this command, you will see a `job_id`, which can be used to track the job and its results. All logs will be available in your `<YOUR_OUTPUT_LOCAL_DIR>`.
### 4. Check Your Results
Results, logs, and run configurations are saved locally. Inspect the status of the evaluation job by using the corresponding `job_id`:
```bash
nemo-evaluator-launcher status <job_id_or_invocation_id>
```
### Next Steps
- List all supported benchmarks:
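  For example (the exact subcommand is an assumption; check `nemo-evaluator-launcher --help` for the current form):

  ```bash
  nemo-evaluator-launcher ls tasks
  ```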
- Learn to evaluate self-hosted models in the extended [Tutorial guide](./docs/nemo-evaluator-launcher/tutorial.md) for nemo-evaluator-launcher.
- Customize your workflow with [Custom Exporters](./docs/nemo-evaluator-launcher/exporters/overview.md) or by evaluating with [proprietary data](./docs/nemo-evaluator/extending/framework-definition-file.md).
## 🧩 Evaluate Checkpoints Trained by NeMo Framework
The NeMo Framework is NVIDIA’s GPU-accelerated, end-to-end training platform for large language models (LLMs), multimodal models, and speech models. It enables seamless scaling of both pretraining and post-training workloads, from a single GPU to clusters with thousands of nodes, supporting Hugging Face/PyTorch and Megatron models. NeMo includes a suite of libraries and curated training recipes to help users build models from start to finish.
NeMo Evaluator is integrated within the NeMo Framework, offering streamlined deployment and advanced evaluation capabilities for models trained with NeMo, backed by state-of-the-art evaluation harnesses.
### Features
- **Multi-Backend Deployment**: Supports PyTriton and multi-instance evaluations using the Ray Serve deployment backend
- **Production-Ready**: Supports high-performance inference with CUDA graphs and flash decoding
- **Multi-GPU and Multi-Node Support**: Enables distributed inference across multiple GPUs and compute nodes
- **OpenAI-Compatible API**: Provides RESTful endpoints aligned with the OpenAI API specification
### 1. Start NeMo Framework Container
For optimal performance and user experience, use the latest version of the [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags). Please fetch the most recent `$TAG` and run the following command to start a container:
```bash
# The rest of this command was truncated in the source; the GPU flag and image
# path below are reconstructed assumptions. Substitute the latest $TAG from the NGC catalog.
docker run --rm -it -w /workdir -v $(pwd):/workdir \
  --gpus all nvcr.io/nvidia/nemo:$TAG bash
```
## 📊 Supported Benchmarks and Evaluation Harnesses
NeMo Evaluator Launcher provides pre-built evaluation containers for different evaluation harnesses through the NVIDIA NGC catalog. Each harness supports a variety of benchmarks, which can then be called via `nemo-evaluator`. This table provides a list of benchmark names per harness. A more detailed list of task names can be found in the [list of NGC containers](./docs/nemo-evaluator/index.md#ngc-containers).
| Harness | Description | Container | Latest Tag | Benchmarks |
|---|---|---|---|---|
| **vlmevalkit** | Vision-language model evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/vlmevalkit) | `25.08.1` | AI2D, ChartQA, OCRBench, SlideVQA |
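As an example, a harness container can be pulled ahead of a run; the `nvcr.io` path below is inferred from the catalog link and tag above, and is an assumption rather than a documented pull path:

```bash
# Assumed pull path derived from the NGC catalog org/team/name and the tag column.
docker pull nvcr.io/nvidia/eval-factory/vlmevalkit:25.08.1
```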
## 🤝 Contribution Guide
We welcome community contributions. Please see our [Contribution Guide](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/CONTRIBUTING.md) for instructions on submitting pull requests, reporting issues, and suggesting features.
## 📄 License
This project is licensed under the Apache License 2.0. See the [LICENSE](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/LICENSE) file for details.
---

**`docs/index.md`**
The execution flow from the launcher's backend selection to the evaluation container:

```mermaid
graph TD
    B -- " " --> D{Slurm};
    B -- " " --> E{Lepton};
    subgraph Execution Environment
        C -- "Launches Container" --> F[Evaluation Container];
        D -- "Launches Container" --> F;
        E -- "Launches Container" --> F;
    end
```
Results, logs, and run configurations are saved locally. Inspect the status of the evaluation job:

```bash
nemo-evaluator-launcher status <job_id_or_invocation_id>
```
/// note | About invocation and job IDs
It is possible to use a short version of an ID in the `status` command, for example `abcd` instead of the full `abcdef0123456`, or `ab.0` instead of `abcdef0123456.0`, as long as the short form is unambiguous. This is syntactic sugar that makes the command slightly easier to use.
///
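For instance, with the hypothetical IDs from the note above, these calls are equivalent:

```bash
nemo-evaluator-launcher status abcdef0123456.0  # full job ID
nemo-evaluator-launcher status ab.0             # short form, valid while unambiguous
```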
---

**`generic.md`**

Generic deployment provides flexible configuration for deploying any custom server that isn't covered by the built-in deployment configurations.
## Configuration
See [Generic Config](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/generic.yaml) for all available parameters.
Key arguments (illustrated in the example config after this list):
- **`image`**: Docker image to use for deployment (required)
- **`command`**: Command to run the server, with template variables (required)
- **`served_model_name`**: Name of the served model (required)
- **`endpoints`**: API endpoint paths (chat, completions, health)
- **`checkpoint_path`**: Path to the model checkpoint to mount (default: null)
- **`extra_args`**: Additional command-line arguments
- **`env_vars`**: Environment variables as a `{name: value}` dict
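A minimal sketch of a generic deployment config: only the key names come from the documented arguments above, while every value (image, command template, paths) is a hypothetical example for a vLLM-style OpenAI-compatible server. See the linked `generic.yaml` for the authoritative schema:

```yaml
deployment:
  type: generic                       # assumed discriminator for the generic template
  image: vllm/vllm-openai:latest      # any Docker image exposing an OpenAI-compatible API
  command: >-                         # template variables are assumptions
    vllm serve {checkpoint_path}
    --served-model-name {served_model_name}
  served_model_name: my-model
  checkpoint_path: /models/my-model   # mounted into the container (default: null)
  endpoints:
    chat: /v1/chat/completions
    completions: /v1/completions
    health: /health
  extra_args: "--max-model-len 8192"  # appended to the server command
  env_vars:
    HF_TOKEN: ${HF_TOKEN}
```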
## Best Practices
- Ensure the server responds on its health check endpoint (and that the `health` endpoint path is correctly parametrized)
- Test your configuration with `--dry_run` (see the example below)
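A sketch of a dry run; the `run` invocation mirrors the quickstart, and placing `--dry_run` on it is an assumption (check `nemo-evaluator-launcher run --help`):

```bash
nemo-evaluator-launcher run --config-dir <YOUR_CONFIG_DIR> --config-name <YOUR_CONFIG> --dry_run
```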
## Contributing Permanent Configurations
If you've successfully applied the generic deployment to serve a specific model or framework, contributions are welcome! We'll turn your working configuration into a permanent config file for the community.
---

**Deployment options overview**

- **[Generic](generic.md)**: Custom server deployment with flexible configuration
- **[None](none.md)**: Use existing endpoints (no deployment)
## Quick Reference
- **vLLM**: General-purpose LLM serving
- **SGLang**: General-purpose LLM serving
- **NIM**: NVIDIA hardware-optimized deployments
- **Generic**: Custom servers not covered by built-in configs
- **None**: Existing endpoints
## Custom Server Integration
**Need to deploy a server not covered by built-in configs?**
**Quick integration**: Use [Generic deployment](generic.md) for any Docker-based server with an OpenAI-compatible API.
**Advanced integration**: Create a custom deployment template in `configs/deployment/` for reusable configurations.
## Configuration Files
See all available deployment configurations: [Deployment Configs](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment)