
Commit 35d9836

Update documentation (#2279)
* Update MLCFlow commands for v5.1 (#2237)
* Updating for 5.1-dev (inference doc)
1 parent 9a1990e commit 35d9836

13 files changed: +309 -27 lines changed

compliance/nvidia/TEST06/run_verification.py

Lines changed: 6 additions & 1 deletion
@@ -53,7 +53,12 @@ def get_args():
         "--scenario",
         "-s",
         required=True,
-        choices=["Offline", "Server", "Interactive", "SingleStream", "MultiStream"],
+        choices=[
+            "Offline",
+            "Server",
+            "Interactive",
+            "SingleStream",
+            "MultiStream"],
     )
     args = parser.parse_args()
     return args
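
For context, the hunk above only reflows the `choices` list for the `--scenario` flag; the accepted values, including `Interactive`, are unchanged. A minimal, self-contained sketch of how argparse enforces these choices (the rest of the parser setup in `run_verification.py` is not shown in the diff and is omitted here):

```python
import argparse

def get_args():
    # Sketch of the --scenario flag touched by this hunk; other flags omitted.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--scenario",
        "-s",
        required=True,
        choices=[
            "Offline",
            "Server",
            "Interactive",
            "SingleStream",
            "MultiStream"],
    )
    return parser.parse_args()

if __name__ == "__main__":
    # e.g. `-s Interactive` parses fine, while an unknown value such as
    # `-s Batch` exits with "invalid choice".
    print(get_args().scenario)
```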
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
hide:
  - toc
---

# Reasoning using DeepSeek-R1

=== "MLCommons-Python"
    ## MLPerf Reference Implementation in Python

    {{ mlperf_inference_implementation_readme (4, "deepseek-r1", "reference", devices=["CUDA"]) }}
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
hide:
  - toc
---

# Reasoning using DeepSeek R1

## Dataset

The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. If you want to download only the datasets, you can use the commands below.

=== "Validation"

    ### Get Validation Dataset
    ```
    mlcr get,preprocessed,dataset,deepseek-r1,_validation,_mlc,_rclone --outdirname=<path to download> -j
    ```

=== "Calibration"

    ### Get Calibration Dataset
    ```
    mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname=<path to download> -j
    ```
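
If you prefer to script these downloads rather than run them by hand, a minimal sketch using Python's subprocess module is shown below. The helper name and the output path are illustrative only; it assumes `mlcr` (installed via `pip install mlc-scripts`) is already on your PATH, and the tag strings are copied verbatim from the commands above.

```python
import subprocess

# Illustrative wrapper around the mlcr dataset-download commands shown above.
def fetch_deepseek_r1_dataset(split: str, outdir: str) -> None:
    tags = f"get,preprocessed,dataset,deepseek-r1,_{split},_mlc,_rclone"
    subprocess.run(["mlcr", tags, f"--outdirname={outdir}", "-j"], check=True)

if __name__ == "__main__":
    fetch_deepseek_r1_dataset("validation", "/data/deepseek-r1")   # example path
    fetch_deepseek_r1_dataset("calibration", "/data/deepseek-r1")  # example path
```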
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
---
hide:
  - toc
---

# Text Summarization using LLAMA3.1-8b

## Dataset

The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. If you want to download only the datasets, you can use the commands below.

=== "Validation"

    === "Full dataset (Datacenter)"

        ### Get Validation Dataset
        ```
        mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_rclone --outdirname=<path to download> -j
        ```

    === "5000 samples (Edge)"

        ### Get Validation Dataset
        ```
        mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_rclone --outdirname=<path to download> -j
        ```

=== "Calibration"

    ### Get Calibration Dataset
    ```
    mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_rclone --outdirname=<path to download> -j
    ```

- `--outdirname=<PATH_TO_DOWNLOAD_LLAMA3_8B_DATASET>` can be provided to download the dataset to a specific location.

## Model

The benchmark implementation run command will automatically download the required model and do the necessary conversions. If you want to download only the official model, you can use the commands below.

=== "Pytorch"

    === "From MLCOMMONS Google Drive"

        > **Note:** One has to accept the [MLCommons Llama 3.1 License Confidentiality Notice](http://llama3-1.mlcommons.org/) to access the model files in MLCOMMONS Google Drive.

        ### Get the Official MLPerf LLAMA3.1-8B model from MLCOMMONS Cloudflare R2
        ```
        TBD
        ```

    === "From Hugging Face repo"

        > **Note:** Access to the HuggingFace model can be requested [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

        ### Get model from HuggingFace repo
        ```
        mlcr get,ml-model,llama3,_hf,_meta-llama/Llama-3.1-8B-Instruct --hf_token=<huggingface access token> -j
        ```

- `--outdirname=<PATH_TO_DOWNLOAD_LLAMA3_8B_MODEL>` can be provided to download the model to a specific location.
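
As an aside, the Hugging Face route above can also be scripted directly with the `huggingface_hub` library rather than through `mlcr`. This is only an illustrative sketch, not part of the MLCFlow tooling; it assumes `huggingface_hub` is installed and that your token has been granted Llama 3.1 access, and the local directory is an example.

```python
from huggingface_hub import snapshot_download

# Pulls meta-llama/Llama-3.1-8B-Instruct into a local folder; the token must
# belong to an account that has accepted the Llama 3.1 license on Hugging Face.
snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",
    token="<huggingface access token>",          # same token the mlcr command uses
    local_dir="/models/Llama-3.1-8B-Instruct",   # example path
)
```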
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
hide:
  - toc
---

# Text Summarization using LLAMA3_1-8b

=== "MLCommons-Python"
    ## MLPerf Reference Implementation in Python

    {{ mlperf_inference_implementation_readme (4, "llama3_1-8b", "reference", devices=["CPU","CUDA"]) }}
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
---
hide:
  - toc
---

# Speech to Text using Whisper

## Dataset

The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. If you want to download only the datasets, you can use the commands below.

=== "Validation"

    === "Preprocessed"

        ### Get Preprocessed Validation Dataset
        ```
        mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname=<path to download> -j
        ```

    === "Unprocessed"

        ### Get Unprocessed Validation Dataset
        ```
        mlcr get,dataset,whisper,_unprocessed --outdirname=<path to download> -j
        ```

## Model

The benchmark implementation run command will automatically download the required model and do the necessary conversions, if any. If you want to download only the official model, you can use the commands below.

=== "Pytorch"

    === "From MLCOMMONS"

        ### Get the Official MLPerf Whisper model from MLCOMMONS Cloudflare R2
        ```
        mlcr get,ml-model,whisper,_rclone,_mlc -j
        ```

- `--outdirname=<PATH_TO_DOWNLOAD_WHISPER_MODEL>` can be provided to download the model to a specific location.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
hide:
  - toc
---

# Speech to Text using Whisper

=== "MLCommons-Python"
    ## MLPerf Reference Implementation in Python

    {{ mlperf_inference_implementation_readme (4, "whisper", "reference", devices=["CPU","CUDA"]) }}

language/deepseek-r1/README.md

Lines changed: 30 additions & 0 deletions
@@ -1,5 +1,11 @@
 # MLPerf Inference DeepSeek Reference Implementation
 
+## Automated command to run the benchmark via MLCFlow
+
+Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/deepseek-r1/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.
+
+You can also do `pip install mlc-scripts` and then use `mlcr` commands for downloading the model and datasets using the commands given in the later sections.
+
 ## Model & Dataset Download
 
 > **Model**: [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) (revision: `56d4cbbb4d29f4355bab4b9a39ccb717a14ad5ad`)

@@ -19,6 +25,14 @@ The dataset is an ensemble of the datasets: AIME, MATH500, gpqa, MMLU-Pro, livec
 
 ### Preprocessed
 
+**Using MLCFlow Automation**
+
+```
+mlcr get,preprocessed,dataset,deepseek-r1,_validation,_mlc,_rclone --outdirname=<path to download> -j
+```
+
+**Using Native method**
+
 You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
 
 To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).

@@ -38,6 +52,14 @@ rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/datasets/mlp
 
 ### Calibration
 
+**Using MLCFlow Automation**
+
+```
+mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname=<path to download> -j
+```
+
+**Using Native method**
+
 Download and install Rclone as described in the previous section.
 
 Then navigate in the terminal to your desired download directory and run the following command to download the dataset:

@@ -179,6 +201,14 @@ The following table shows which backends support different evaluation and MLPerf
 
 ## Accuracy Evaluation
 
+**Using MLCFlow Automation**
+
+```
+TBD
+```
+
+**Using Native method**
+
 Accuracy evaluation is handled uniformly across all backends:
 
 ```bash

language/llama3.1-8b/README.md

Lines changed: 40 additions & 12 deletions
@@ -9,7 +9,7 @@
 
 ## Automated command to run the benchmark via MLCFlow
 
-Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/llama3_1-8b/) (TBD) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.
+Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/llama3_1-8b/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.
 
 You can also do `pip install mlc-scripts` and then use `mlcr` commands for downloading the model and datasets using the commands given in the later sections.
 

@@ -99,7 +99,10 @@ pip install -e ../../loadgen
 ## Get Model
 ### MLCommons Members Download (Recommended for official submission)
 
-You need to request for access to [MLCommons](http://llama3-1.mlcommons.org/) and you'll receive an email with the download instructions. You can download the model automatically via the below command
+You need to request for access to [MLCommons](http://llama3-1.mlcommons.org/) and you'll receive an email with the download instructions.
+
+**Official Model download using MLCFlow Automation**
+You can download the model automatically via the command below:
 ```
 TBD
 ```

@@ -115,6 +118,12 @@ git clone https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct ${CHECKPOINT_PATH}
 cd ${CHECKPOINT_PATH} && git checkout be673f326cab4cd22ccfef76109faf68e41aa5f1
 ```
 
+**External Model download using MLCFlow Automation**
+You can download the model automatically via the command below:
+```
+mlcr get,ml-model,llama3,_hf,_meta-llama/Llama-3.1-8B-Instruct --hf_token=<huggingface access token> -j
+```
+
 ### Download huggingface model through MLC
 
 ```

@@ -142,24 +151,39 @@ rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5ee
 You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
 
 #### Full dataset (datacenter)
+
+**Using MLCFlow Automation**
+```
+mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_rclone --outdirname=<path to download> -j
+```
+
+**Native method**
 ```
 rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_eval.json ./ -P
 ```
 
 #### 5000 samples (edge)
+
+**Using MLCFlow Automation**
+```
+mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_rclone --outdirname=<path to download> -j
+```
+
+**Native method**
 ```
 rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/sample_cnn_eval_5000.json ./ -P
 ```
 
 #### Calibration
+
+**Using MLCFlow Automation**
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_dailymail_calibration.json ./ -P
+mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_rclone --outdirname=<path to download> -j
 ```
 
-**MLC Command**
-
+**Native method**
 ```
-TBD
+rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_dailymail_calibration.json ./ -P
 ```
 
 You can also download the calibration dataset from the Cloudflare R2 bucket by running the following command:

@@ -168,11 +192,6 @@ You can also download the calibration dataset from the Cloudflare R2 bucket by running the following command:
 rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/cnn_eval.json ./ -P
 ```
 
-**MLC Command**
-```
-TBD
-```
-
 
 ## Run Performance Benchmarks
 

@@ -265,8 +284,17 @@ The ServerSUT was not tested for GPU runs.
 
 ### Evaluate the accuracy using MLCFlow
 You can also evaluate the accuracy from the generated accuracy log by using the following MLC command
+
+**Full dataset (datacenter)**
+
 ```
-TBD
+mlcr run,accuracy,mlperf,_cnndm_llama_3,_datacenter --result_dir=<Path to directory where files are generated after the benchmark run>
 ```
+
+**5000 samples (edge)**
+
+```
+mlcr run,accuracy,mlperf,_cnndm_llama_3,_edge --result_dir=<Path to directory where files are generated after the benchmark run>
 ```
 
 ## Accuracy Target
