Commit 042dfd9

(tutorial) local evaluation

Signed-off-by: Ewa Dobrowolska <[email protected]>

1 parent 96a0a25 · commit 042dfd9

2 files changed: +66 -26 lines

docs/tutorials/local-evaluation-of-existing-endpoint.md

Lines changed: 65 additions & 25 deletions
First, install the NeMo Evaluator Launcher. Refer to {ref}`gs-install` for details.

## Step-by-Step Guide

### 1. Select a Model

You have the following options:

#### Option I: Use the NVIDIA Build API

- **URL**: `https://integrate.api.nvidia.com/v1/chat/completions`
- **Models**: Choose any endpoint from NVIDIA Build's extensive catalog
- **API Key**: Get one from [build.nvidia.com](https://build.nvidia.com/meta/llama-3_1-8b-instruct). See [Setting up API Keys](https://docs.omniverse.nvidia.com/guide-sdg/latest/setup.html#preview-and-set-up-an-api-key).

Make sure to export the API key:

```bash
export NGC_API_KEY=nvapi-...
```
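
You can optionally sanity-check the key before launching an evaluation. The snippet below is a minimal sketch against the Build API's OpenAI-compatible chat endpoint, reusing the tutorial's model; `max_tokens` is kept small because the goal is only to confirm connectivity and authentication:

```bash
# Minimal chat-completions request; a successful call returns a short JSON response
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NGC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 16
      }'
```

A `401` or `403` response here usually means the key is missing or exported under the wrong name.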

#### Option II: Use Another Hosted Endpoint

- **URL**: Your model's endpoint URL
- **Models**: Any OpenAI-compatible endpoint
- **API Key**: If your endpoint is gated, get an API key from your provider and export it:

```bash
export API_KEY=...
```

#### Option III: Deploy Your Own Endpoint

Deploy an OpenAI-compatible endpoint using frameworks such as vLLM, SGLang, TRT-LLM, or NIM. Refer to {ref}`bring-your-own-endpoint-manual` for deployment guidance.

:::{note}
For this tutorial, we will use `meta/llama-3.1-8b-instruct` from [build.nvidia.com](https://build.nvidia.com/meta/llama-3_1-8b-instruct). You will need to export your `NGC_API_KEY` to access this endpoint.
:::

### 2. Select Tasks

Choose which benchmarks to evaluate. You can list all available tasks with the following command:

```bash
nemo-evaluator-launcher ls tasks
```
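
If you only want to confirm that the two benchmarks used in this tutorial are available, you can filter the listing (assuming it prints task names as plain text):

```bash
nemo-evaluator-launcher ls tasks | grep -E "ifeval|humaneval_instruct"
```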

For a comprehensive list of supported tasks and descriptions, see {ref}`nemo-eva…`.

**Important**: Each task has a dedicated endpoint type (e.g., `/v1/chat/completions`, `/v1/completions`). Ensure that your model provides the correct endpoint type for the tasks you want to evaluate; the sketch after the note below shows the difference.

:::{note}
For this tutorial we will pick `ifeval` and `humaneval_instruct`, as these are fast. They both use the chat endpoint.
:::
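
To make the endpoint-type distinction concrete, here is how the two request shapes differ for a generic OpenAI-compatible server. The URL, key, and model name below are placeholders, not values from this tutorial:

```bash
# /v1/chat/completions expects a structured list of messages
curl -s https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'

# /v1/completions expects a raw text prompt instead
curl -s https://api.example.com/v1/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "prompt": "Hello"}'
```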

### 3. Create a Configuration File

Create a `configs` directory:

```bash
mkdir configs
```

Create a configuration file with a descriptive name (e.g., `configs/local_endpoint.yaml`):

```bash
touch configs/local_endpoint.yaml
```

This configuration will create evaluations for two tasks, `ifeval` and `humaneval_instruct`:

```yaml
defaults:
  - execution: local # The evaluation will run locally on your machine using Docker
  - deployment: none # Since we are evaluating an existing endpoint, we don't need to deploy the model
  - _self_

execution:
  output_dir: results/${target.api_endpoint.model_id} # Logs and artifacts will be saved here
  mode: sequential # Default: run tasks sequentially. You can also use the mode 'parallel'

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct # TODO: update to the model you want to evaluate
    url: https://integrate.api.nvidia.com/v1/chat/completions # TODO: update to the endpoint you want to evaluate
    api_key_name: NGC_API_KEY # Name of the env variable that stores the API key with access to build.nvidia.com (or the model of your choice)

# specify the benchmarks to evaluate
evaluation:
  # Optional: global evaluation overrides - these apply to all benchmarks below
  nemo_evaluator_config:
    config:
      params:
        max_new_tokens: 32768
        temperature: 0.6
        top_p: 0.95
        request_timeout: 1600
  tasks:
    - name: ifeval # use the default benchmark configuration
    - name: humaneval_instruct
      # Optional: task overrides - these apply only to the task `humaneval_instruct`
      nemo_evaluator_config:
        config:
          params:
            max_new_tokens: 1024
            temperature: 0.3
            parallelism: 2
```

You can display the full configuration and the scripts that will be executed using `--dry-run`:

```bash
nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint --dry-run
```

### 4. Run the Evaluation

Once your configuration file is complete, you can run the evaluations:

```bash
nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint
```
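
When the jobs finish, logs and artifacts land under the `output_dir` set in the configuration. With the `${target.api_endpoint.model_id}` interpolation used in this tutorial, that resolves to a path like the following (a sketch; the exact directory layout inside it may differ):

```bash
ls results/meta/llama-3.1-8b-instruct
```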

### 5. Run the Same Evaluation for a Different Model (Using CLI Overrides)

You can override the values from your configuration file using CLI overrides:

```bash
export API_KEY=<YOUR_MODEL_API_KEY>
MODEL_NAME=<YOUR_MODEL_NAME>
URL=<YOUR_ENDPOINT_URL> # Note: the endpoint URL needs to be FULL (e.g., https://api.example.com/v1/chat/completions)

nemo-evaluator-launcher run --config-dir configs --config-name local_endpoint \
  -o target.api_endpoint.model_id=$MODEL_NAME \
  -o target.api_endpoint.url=$URL \
  -o target.api_endpoint.api_key_name=API_KEY
```

After launching, you can view logs and job status. When jobs finish, you can display results and export them using the available exporters. Refer to {ref}`exporters-overview` for available export options.

packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/resources/mapping.toml

Lines changed: 1 addition & 1 deletion

```diff
 [bigcode-evaluation-harness.tasks.completions.humaneval]
 required_env_vars = []

-[bigcode-evaluation-harness.tasks.completions.humaneval_instruct]
+[bigcode-evaluation-harness.tasks.chat.humaneval_instruct]


 ###############################################################################
```

This moves `humaneval_instruct` from the `completions` task group to the `chat` task group, matching the chat endpoint it uses in the tutorial above.
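
If you are working from a source checkout, a quick way to confirm which endpoint group `humaneval_instruct` registers under after this change is to search the mapping file directly:

```bash
grep -n "humaneval_instruct" packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/resources/mapping.toml
```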
