`docs/tutorials/local-evaluation-of-existing-endpoint.md`

This tutorial shows how to evaluate an existing API endpoint using the Local executor.

## Prerequisites

- Docker
- Python environment with the NeMo Evaluator Launcher CLI available (install the launcher by following {ref}`gs-install`)
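
A minimal install sketch for the second prerequisite (assuming a pip-based setup; the exact steps in {ref}`gs-install` remain authoritative):

```bash
# Assumed package name; confirm against the install guide
pip install nemo-evaluator-launcher
nemo-evaluator-launcher --help  # verify the CLI is on your PATH
```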
## Step-by-Step Guide
#### Option III: Deploy Your Own Endpoint
Deploy an OpenAI-compatible endpoint using frameworks like vLLM, SGLang, TRT-LLM, or NIM.
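
As an illustrative sketch only (not part of the original guide): with vLLM installed, a single command serves an OpenAI-compatible endpoint. The model name and port below are placeholder assumptions.

```bash
# Hypothetical example: serve an OpenAI-compatible endpoint locally with vLLM
# (model and port are placeholders; consult each framework's own docs)
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
# The chat endpoint is then available at http://localhost:8000/v1/chat/completions
```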

<!-- TODO: uncomment ref once the guide is ready -->
<!-- Refer to {ref}`bring-your-own-endpoint-manual` for deployment guidance -->
:::{note}
For this tutorial, we will use `meta/llama-3.1-8b-instruct` from [build.nvidia.com](https://build.nvidia.com/meta/llama-3_1-8b-instruct). You will need to export your `NGC_API_KEY` to access this endpoint.
:::
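
For example, export the key in the shell you will launch from (the value is your own key from build.nvidia.com):

```bash
export NGC_API_KEY=...
```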
### 2. Select Tasks

Create a `configs` directory:

```bash
mkdir configs
```

Create a configuration file with a descriptive name (e.g., `configs/local_endpoint.yaml`) and populate it with the following content:

```yaml
defaults:
  # ...
evaluation:
  # ...
  parallelism: 2
```

This configuration will create evaluations for 2 tasks: `ifeval` and `humaneval_instruct`.

You can display the whole configuration and the scripts that will be executed by passing `--dry-run`.
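
A sketch of the invocation, assuming the `configs/local_endpoint.yaml` file created above (the Hydra-style `--config-dir`/`--config-name` flags are the launcher's usual interface; this exact command is not from the original page):

```bash
# Preview the resolved configuration and generated scripts without launching
nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint --dry-run
```

Drop `--dry-run` to actually launch both evaluation jobs.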
### 6. Check the Job Status and Results

List the runs from the last 2 hours to see the invocation IDs of the two evaluation jobs:

```bash
nemo-evaluator-launcher ls runs --since 2h  # list runs from the last 2 hours
```

Use the IDs to check the job statuses:

```bash
nemo-evaluator-launcher status <invocation_id1> <invocation_id2> --json
```

When jobs finish, you can display results and export them using the available exporters:

```bash
# Check the results
cat results/*/artifacts/results.yml
# Check the running logs
tail -f results/*/*/logs/stdout.log # use the output_dir printed by the run command
# Export metrics and metadata from both runs to json
nemo-evaluator-launcher export <invocation_id1> <invocation_id2> --dest local --format json
cat processed_results.json
```
Refer to {ref}`exporters-overview` for available export options.
## Next Steps
- **{ref}`evaluation-configuration`**: Customize evaluation parameters and prompts
- **{ref}`executors-overview`**: Try Slurm or Lepton for different environments
<!-- TODO: uncomment once ready -->
<!-- - **{ref}`bring-your-own-endpoint-manual`**: Deploy your own endpoints with various frameworks -->
- **{ref}`exporters-overview`**: Send results to W&B, MLFlow, or other platforms