Commit 042dfd9

(tutorial) local evaluation

Signed-off-by: Ewa Dobrowolska <[email protected]>

1 parent 96a0a25 · commit 042dfd9

2 files changed: +66 -26 lines

docs/tutorials/local-evaluation-of-existing-endpoint.md

Lines changed: 65 additions & 25 deletions
First, install the NeMo Evaluator Launcher. Refer to {ref}`gs-install` for details.

## Step-by-Step Guide

### 1. Select a Model

You have the following options:

#### Option I: Use the NVIDIA Build API

- **URL**: `https://integrate.api.nvidia.com/v1/chat/completions`
- **Models**: Choose any endpoint from NVIDIA Build's extensive catalog
- **API Key**: Get one from [build.nvidia.com](https://build.nvidia.com/meta/llama-3_1-8b-instruct). See [Setting up API Keys](https://docs.omniverse.nvidia.com/guide-sdg/latest/setup.html#preview-and-set-up-an-api-key).

Make sure to export the API key:

```bash
export NGC_API_KEY=nvapi-...
```
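
You can optionally sanity-check the key before launching an evaluation. The snippet below is a minimal sketch against the Build API's OpenAI-compatible chat endpoint, reusing the tutorial's model; `max_tokens` is kept small because the goal is only to confirm connectivity and authentication:

```bash
# Minimal chat-completions request; a successful call returns a short JSON response
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NGC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 16
      }'
```

A `401` or `403` response here usually means the key is missing or exported under the wrong name.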

#### Option II: Use Another Hosted Endpoint

- **URL**: Your model's endpoint URL
- **Models**: Any OpenAI-compatible endpoint
- **API Key**: If your endpoint is gated, get an API key from your provider and export it:

```bash
export API_KEY=...
```

#### Option III: Deploy Your Own Endpoint

Deploy an OpenAI-compatible endpoint using frameworks such as vLLM, SGLang, TRT-LLM, or NIM. Refer to {ref}`bring-your-own-endpoint-manual` for deployment guidance.

:::{note}
For this tutorial, we will use `meta/llama-3.1-8b-instruct` from [build.nvidia.com](https://build.nvidia.com/meta/llama-3_1-8b-instruct). You will need to export your `NGC_API_KEY` to access this endpoint.
:::

### 2. Select Tasks

Choose which benchmarks to evaluate. You can list all available tasks with the following command:

```bash
nemo-evaluator-launcher ls tasks
```
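
If you only want to confirm that the two benchmarks used in this tutorial are available, you can filter the listing (assuming it prints task names as plain text):

```bash
nemo-evaluator-launcher ls tasks | grep -E "ifeval|humaneval_instruct"
```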

For a comprehensive list of supported tasks and descriptions, see {ref}`nemo-eva…`.

**Important**: Each task has a dedicated endpoint type (e.g., `/v1/chat/completions`, `/v1/completions`). Ensure that your model provides the correct endpoint type for the tasks you want to evaluate; the sketch after the note below shows the difference.

:::{note}
For this tutorial we will pick `ifeval` and `humaneval_instruct`, as these are fast. They both use the chat endpoint.
:::
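
To make the endpoint-type distinction concrete, here is how the two request shapes differ for a generic OpenAI-compatible server. The URL, key, and model name below are placeholders, not values from this tutorial:

```bash
# /v1/chat/completions expects a structured list of messages
curl -s https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'

# /v1/completions expects a raw text prompt instead
curl -s https://api.example.com/v1/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "prompt": "Hello"}'
```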

### 3. Create a Configuration File

Create a `configs` directory:

```bash
mkdir configs
```

Create a configuration file with a descriptive name (e.g., `configs/local_endpoint.yaml`):

```bash
touch configs/local_endpoint.yaml
```

This configuration will create evaluations for two tasks, `ifeval` and `humaneval_instruct`:

```yaml
defaults:
  - execution: local # The evaluation will run locally on your machine using Docker
  - deployment: none # Since we are evaluating an existing endpoint, we don't need to deploy the model
  - _self_

execution:
  output_dir: results/${target.api_endpoint.model_id} # Logs and artifacts will be saved here
  mode: sequential # Default: run tasks sequentially. You can also use the mode 'parallel'

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct # TODO: update to the model you want to evaluate
    url: https://integrate.api.nvidia.com/v1/chat/completions # TODO: update to the endpoint you want to evaluate
    api_key_name: NGC_API_KEY # Name of the env variable that stores the API key with access to build.nvidia.com (or the model of your choice)

# specify the benchmarks to evaluate
evaluation:
  # Optional: global evaluation overrides - these apply to all benchmarks below
  nemo_evaluator_config:
    config:
      params:
        max_new_tokens: 32768
        temperature: 0.6
        top_p: 0.95
        request_timeout: 1600
  tasks:
    - name: ifeval # use the default benchmark configuration
    - name: humaneval_instruct
      # Optional: task overrides - these apply only to the task `humaneval_instruct`
      nemo_evaluator_config:
        config:
          params:
            max_new_tokens: 1024
            temperature: 0.3
            parallelism: 2
```

You can display the full configuration and the scripts that will be executed using `--dry-run`:

```bash
nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint --dry-run
```

### 4. Run the Evaluation

Once your configuration file is complete, you can run the evaluations:

```bash
nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint
```
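
When the jobs finish, logs and artifacts land under the `output_dir` set in the configuration. With the `${target.api_endpoint.model_id}` interpolation used in this tutorial, that resolves to a path like the following (a sketch; the exact directory layout inside it may differ):

```bash
ls results/meta/llama-3.1-8b-instruct
```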

### 5. Run the Same Evaluation for a Different Model (Using CLI Overrides)

You can override the values from your configuration file using CLI overrides:

```bash
export API_KEY=<YOUR_MODEL_API_KEY>
MODEL_NAME=<YOUR_MODEL_NAME>
URL=<YOUR_ENDPOINT_URL> # Note: the endpoint URL needs to be FULL (e.g., https://api.example.com/v1/chat/completions)

nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint \
  -o target.api_endpoint.model_id=$MODEL_NAME \
  -o target.api_endpoint.url=$URL \
  -o target.api_endpoint.api_key_name=API_KEY
```

After launching, you can view logs and job status. When jobs finish, you can display results and export them using the available exporters. Refer to {ref}`exporters-overview` for available export options.

packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/resources/mapping.toml

Lines changed: 1 addition & 1 deletion

```diff
 [bigcode-evaluation-harness.tasks.completions.humaneval]
 required_env_vars = []

-[bigcode-evaluation-harness.tasks.completions.humaneval_instruct]
+[bigcode-evaluation-harness.tasks.chat.humaneval_instruct]


 ###############################################################################
```

This moves `humaneval_instruct` from the `completions` task group to the `chat` task group, matching the chat endpoint it uses in the tutorial above.
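
If you are working from a source checkout, a quick way to confirm which endpoint group `humaneval_instruct` registers under after this change is to search the mapping file directly:

```bash
grep -n "humaneval_instruct" packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/resources/mapping.toml
```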
