
Commit 89faadb

add explicit instructions for checking output + minor fixes
Signed-off-by: Marta Stepniewska-Dziubinska <[email protected]>
1 parent 042dfd9 commit 89faadb


docs/tutorials/local-evaluation-of-existing-endpoint.md

Lines changed: 40 additions & 18 deletions
@@ -4,14 +4,8 @@ This tutorial shows how to evaluate an existing API endpoint using the Local exe
 
 ## Prerequisites
 
-### Installation
-
-First, install the NeMo Evaluator Launcher. Refer to {ref}`gs-install` for detailed setup instructions.
-
-### Requirements
-
 - Docker
-- Python environment with the NeMo Evaluator Launcher CLI available
+- Python environment with the NeMo Evaluator Launcher CLI available (install the launcher by following {ref}`gs-install`)
 
 ## Step-by-Step Guide
 
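
For readers trying the revised prerequisites, a quick sanity check could look like the sketch below. It assumes a POSIX shell; the `--help` flag is an assumption (most Python CLIs provide one), and {ref}`gs-install` remains the authoritative install path.

```bash
# Confirm the Docker daemon is reachable
docker info > /dev/null && echo "Docker: OK"

# Confirm the launcher CLI is on PATH (install it per the gs-install
# guide if this fails; --help is assumed to exist on this CLI)
nemo-evaluator-launcher --help
```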

@@ -42,10 +36,12 @@ export API_KEY=...
 
 #### Option III: Deploy Your Own Endpoint
 
-Deploy an OpenAI-compatible endpoint using frameworks like vLLM, SGLang, TRT-LLM, or NIM. Refer to {ref}`bring-your-own-endpoint-manual` for deployment guidance
+Deploy an OpenAI-compatible endpoint using frameworks like vLLM, SGLang, TRT-LLM, or NIM.
+<!-- TODO: uncomment ref once the guide is ready -->
+<!-- Refer to {ref}`bring-your-own-endpoint-manual` for deployment guidance -->
 
 :::{note}
-For this tutorial, we will use `meta/llama-3.1-8b-instruct` from [build.nvidia.com](https://build.nvidia.com/meta/llama-3_1-8b-instruct). You will need to export your NGC_API_KEY to access this endpoint.
+For this tutorial, we will use `meta/llama-3.1-8b-instruct` from [build.nvidia.com](https://build.nvidia.com/meta/llama-3_1-8b-instruct). You will need to export your `NGC_API_KEY` to access this endpoint.
 :::
 
 ### 2. Select Tasks
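
The note's `NGC_API_KEY` requirement can be verified before running any evaluation. A minimal smoke test is sketched below; the `integrate.api.nvidia.com` URL and the OpenAI-style payload are assumptions about build.nvidia.com's hosted endpoint, not part of this commit.

```bash
# Export the key the tutorial asks for
export NGC_API_KEY=...

# Assumed build.nvidia.com chat-completions URL; adjust if your
# endpoint differs
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NGC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.1-8b-instruct",
       "messages": [{"role": "user", "content": "Hello"}],
       "max_tokens": 16}'
```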
@@ -72,13 +68,8 @@ Create a `configs` directory:
 mkdir configs
 ```
 
-Create a configuration file with a descriptive name (e.g., `configs/local_endpoint.yaml`):
-
-```bash
-touch configs/local_endpoint.yaml
-```
-
-This configuration will create evaluations for 2 tasks: `ifeval` and `humaneval_instruct`:
+Create a configuration file with a descriptive name (e.g., `configs/local_endpoint.yaml`)
+and populate it with the following content:
 
 ```yaml
 defaults:
@@ -118,6 +109,8 @@ evaluation:
 parallelism: 2
 ```
 
+This configuration will create evaluations for 2 tasks: `ifeval` and `humaneval_instruct`.
+
 You can display the whole configuration and scripts which will be executed using `--dry-run`:
 
 ```
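
Combining the `--dry-run` sentence above with the run command shown in the next hunk's context, the preview invocation would plausibly be:

```bash
# Print the resolved config and generated scripts without launching jobs
nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint \
    --dry-run
```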
@@ -146,11 +139,40 @@ nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint \
 -o target.api_endpoint.api_key_name=API_KEY
 ```
 
-After launching, you can view logs and job status. When jobs finish, you can display results and export them using the available exporters. Refer to {ref}`exporters-overview` for available export options.
+### 6. Check the Job Status and Results
+
+List the runs from the last 2 hours to see the invocation IDs of the two evaluation jobs:
+
+```bash
+nemo-evaluator-launcher ls runs --since 2h # list runs from the last 2 hours
+```
+
+Use the IDs to check the job statuses:
+
+```bash
+nemo-evaluator-launcher status <invocation_id1> <invocation_id2> --json
+```
+
+When jobs finish, you can display results and export them using the available exporters:
+
+```bash
+# Check the results
+cat results/*/artifacts/results.yml
+
+# Check the running logs
+tail -f results/*/*/logs/stdout.log # use the output_dir printed by the run command
+
+# Export metrics and metadata from both runs to JSON
+nemo-evaluator-launcher export <invocation_id1> <invocation_id2> --dest local --format json
+cat processed_results.json
+```
+
+Refer to {ref}`exporters-overview` for available export options.
 
 ## Next Steps
 
 - **{ref}`evaluation-configuration`**: Customize evaluation parameters and prompts
 - **{ref}`executors-overview`**: Try Slurm or Lepton for different environments
-- **{ref}`bring-your-own-endpoint-manual`**: Deploy your own endpoints with various frameworks
+<!-- TODO: uncomment once ready -->
+<!-- - **{ref}`bring-your-own-endpoint-manual`**: Deploy your own endpoints with various frameworks -->
 - **{ref}`exporters-overview`**: Send results to W&B, MLFlow, or other platforms
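
Once the new export step has produced `processed_results.json`, a quick way to skim it is sketched below; the file's exact schema is not shown in this commit, so start from the top-level keys rather than assuming paths.

```bash
# List top-level keys first, then pretty-print the whole file
jq 'keys' processed_results.json
jq '.' processed_results.json
```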
