- **Models**: Choose any endpoint from NVIDIA Build's extensive catalog
- **API Key**: Get from [build.nvidia.com](https://build.nvidia.com/meta/llama-3_1-8b-instruct). See [Setting up API Keys](https://docs.omniverse.nvidia.com/guide-sdg/latest/setup.html#preview-and-set-up-an-api-key).

Make sure to export the API key:

```
export NGC_API_KEY=nvapi-...
```

#### Option II: Another Hosted Endpoint

- **URL**: Your model's endpoint URL
- **Models**: Any OpenAI-compatible endpoint
- **API_KEY**: If your endpoint is gated, get an API key from your provider and export it:

```
export API_KEY=...
```

#### Option III: Deploy Your Own Endpoint
Deploy an OpenAI-compatible endpoint using frameworks like vLLM, SGLang, TRT-LLM, or NIM. Refer to {ref}`bring-your-own-endpoint-manual` for deployment guidance.
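
For example, a minimal sketch using vLLM's OpenAI-compatible server (the model name, port, and commands here are illustrative assumptions, not prescribed by this tutorial):

```bash
# Illustrative sketch only: serve an instruct model with vLLM's OpenAI-compatible server.
# The model and port are assumptions; adapt them to your hardware and model of choice.
pip install vllm
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
# The chat endpoint is then served at http://localhost:8000/v1/chat/completions
```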

:::{note}
For this tutorial, we will use `meta/llama-3.1-8b-instruct` from [build.nvidia.com](https://build.nvidia.com/meta/llama-3_1-8b-instruct). You will need to export your `NGC_API_KEY` to access this endpoint.
:::
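
Before launching a full run, you can sanity-check the endpoint and your key with a single OpenAI-style chat request (a minimal sketch; the prompt and `max_tokens` value are arbitrary):

```bash
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NGC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "Say hello."}], "max_tokens": 16}'
```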

### 2. Select Tasks

Choose which benchmarks to evaluate. You can list all available tasks with the following command:

```bash
nemo-evaluator-launcher ls tasks
```

For a comprehensive list of supported tasks and descriptions, see {ref}`nemo-eva`.
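
Since the listing is plain text, you can filter it with standard shell tools, for example:

```bash
nemo-evaluator-launcher ls tasks | grep ifeval
```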
**Important**: Each task has a dedicated endpoint type (e.g., `/v1/chat/completions`, `/v1/completions`). Ensure that your model provides the correct endpoint type for the tasks you want to evaluate.
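
For the hosted endpoint used in this tutorial, the two endpoint types differ only in the URL suffix (both tutorial tasks use the chat endpoint):

```
https://integrate.api.nvidia.com/v1/chat/completions   # chat endpoint
https://integrate.api.nvidia.com/v1/completions        # completions endpoint
```
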
:::{note}
For this tutorial, we will pick `ifeval` and `humaneval_instruct`, as both are fast and use the chat endpoint.
:::

### 3. Create a Configuration File

Create a `configs` directory:

```bash
mkdir configs
```

Create a configuration file with a descriptive name (e.g., `configs/local_endpoint.yaml`):

```bash
touch configs/local_endpoint.yaml
```

This configuration will create evaluations for 2 tasks: `ifeval` and `humaneval_instruct`:

```yaml
defaults:
  - execution: local # The evaluation will run locally on your machine using Docker
  - deployment: none # Since we are evaluating an existing endpoint, we don't need to deploy the model

output_dir: results/${target.api_endpoint.model_id} # Logs and artifacts will be saved here
mode: sequential # Default: run tasks sequentially. You can also use the mode 'parallel'

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct # TODO: update to the model you want to evaluate
    url: https://integrate.api.nvidia.com/v1/chat/completions # TODO: update to the endpoint you want to evaluate
    api_key_name: NGC_API_KEY # Name of the env variable that stores the API key with access to build.nvidia.com (or the model of your choice)

# specify the benchmarks to evaluate
evaluation:
  # Optional: Global evaluation overrides - these apply to all benchmarks below
  nemo_evaluator_config:
    config:
      params:
        max_new_tokens: 32768
        temperature: 0.6
        top_p: 0.95
        request_timeout: 1600
  tasks:
    - name: ifeval # use the default benchmark configuration
    - name: humaneval_instruct
      # Optional: Task overrides - here they apply only to the task `humaneval_instruct`
      nemo_evaluator_config:
        config:
          params:
            max_new_tokens: 1024
            temperature: 0.3
            parallelism: 2
```

You can display the whole configuration and the scripts that will be executed using `--dry-run`:

```bash
nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint --dry-run
```

### 4. Run the Evaluation

Once your configuration file is complete, you can run the evaluations:

```bash
nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint
```
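
When the run completes, logs and artifacts are written to the `output_dir` set in the config. A sketch, assuming the default `results/${target.api_endpoint.model_id}` path from the configuration above:

```bash
# Inspect the artifacts for the tutorial model
ls results/meta/llama-3.1-8b-instruct
```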

### 5. Run the Same Evaluation for a Different Model (Using CLI Overrides)

You can override the values from your configuration file using CLI overrides:

```bash
export API_KEY=<YOUR MODEL API KEY>
MODEL_NAME=<YOUR_MODEL_NAME>
URL=<YOUR_ENDPOINT_URL> # Note: the endpoint URL needs to be FULL (e.g., https://api.example.com/v1/chat/completions)

nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint \
  -o target.api_endpoint.model_id=$MODEL_NAME \
  -o target.api_endpoint.url=$URL \
  -o target.api_endpoint.api_key_name=API_KEY
```

After launching, you can view logs and job status. When jobs finish, you can display results and export them using the available exporters. Refer to {ref}`exporters-overview` for available export options.
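
For example, to check job status (a sketch; the invocation ID is a placeholder printed by the `run` command, and subcommand arguments may vary by version, so consult `nemo-evaluator-launcher --help`):

```bash
# Show the status of a previous run; <invocation_id> is a placeholder
nemo-evaluator-launcher status <invocation_id>
```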