From 85d0021d19d8ca19ffea500c42d54b6eca5be19b Mon Sep 17 00:00:00 2001
From: anandhu-eng
Date: Wed, 15 Oct 2025 21:26:36 +0530
Subject: [PATCH 1/3] Update documentation page: migration to R2

---
 .../get-pointpainting-data.md                 | 10 +++---
 docs/benchmarks/graph/get-rgat-data.md        |  2 +-
 .../language/get-deepseek-r1-data.md          | 14 +++++++++++++-
 docs/benchmarks/language/get-gptj-data.md     |  2 +-
 .../language/get-llama2-70b-data.md           | 35 +++++--------------
 .../language/get-llama3_1-405b-data.md        | 12 ++++----
 .../language/get-llama3_1-8b-data.md          | 14 ++++----
 .../language/get-mixtral-8x7b-data.md         |  6 ++--
 .../recommendation/get-dlrm-v2-data.md        |  4 +--
 .../speech_to_text/get-whisper-data.md        |  4 +--
 .../benchmarks/text_to_image/get-sdxl-data.md |  4 +--
 11 files changed, 51 insertions(+), 56 deletions(-)

diff --git a/docs/benchmarks/automotive/3d_object_detection/get-pointpainting-data.md b/docs/benchmarks/automotive/3d_object_detection/get-pointpainting-data.md
index 6331b3535b..737c14887a 100644
--- a/docs/benchmarks/automotive/3d_object_detection/get-pointpainting-data.md
+++ b/docs/benchmarks/automotive/3d_object_detection/get-pointpainting-data.md
@@ -13,16 +13,16 @@ The benchmark implementation run command will automatically download the preproc
 
 === "Validation"
 
-    ### Get Validation Dataset
+    ### Get Validation and Calibration Dataset
     ```
-    mlcr get,dataset,waymo -j
+    mlcr get,dataset,waymo,_r2-downloader,_mlc -j
     ```
 
 === "Calibration"
 
-    ### Get Calibration Dataset
+    ### Get Calibration Dataset only
     ```
-    mlcr get,dataset,waymo,calibration -j
+    mlcr get,dataset,waymo,calibration,_r2-downloader,_mlc -j
     ```
 
 - `--outdirname=` could be provided to download the dataset to a specific location.
@@ -33,7 +33,7 @@ The benchmark implementation run command will automatically download the preproc
 The benchmark implementation run command will automatically download the model. In case you want to download only the PointPainting model, you can use the below command.
 
 ```bash
-mlcr get,ml-model,pointpainting -j
+mlcr get,ml-model,pointpainting,_r2-downloader,_mlc -j
 ```
 
 - `--outdirname=` could be provided to download the model files to a specific location.

diff --git a/docs/benchmarks/graph/get-rgat-data.md b/docs/benchmarks/graph/get-rgat-data.md
index bb719fea2e..bbf72d1743 100644
--- a/docs/benchmarks/graph/get-rgat-data.md
+++ b/docs/benchmarks/graph/get-rgat-data.md
@@ -46,7 +46,7 @@ Get the Official MLPerf R-GAT Model
 
     ### PyTorch
     ```
-    mlcr get,ml-model,rgat -j
+    mlcr get,ml-model,rgat,_r2-downloader,_mlcommons -j
     ```
 
 - `--outdirname=` could be provided to download the model to a specific location.
\ No newline at end of file

diff --git a/docs/benchmarks/language/get-deepseek-r1-data.md b/docs/benchmarks/language/get-deepseek-r1-data.md
index 401c4d27bc..b05dda810e 100644
--- a/docs/benchmarks/language/get-deepseek-r1-data.md
+++ b/docs/benchmarks/language/get-deepseek-r1-data.md
@@ -21,4 +21,16 @@ The benchmark implementation run command will automatically download the validat
     ### Get Calibration Dataset
     ```
     mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname= -j
-    ```
\ No newline at end of file
+    ```
+
+## Model
+The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to download only the official model, you can use the below command.
+
+=== "Pytorch"
+
+    === "From MLCOMMONS Storage"
+
+        ### Get the Official MLPerf DeepSeek-R1 model from MLCOMMONS Storage
+        ```
+        mlcr get,ml-model,deepseek-r1,_r2-downloader,_mlc,_dry-run -j
+        ```
\ No newline at end of file

diff --git a/docs/benchmarks/language/get-gptj-data.md b/docs/benchmarks/language/get-gptj-data.md
index 60e2568b6e..bdbd884b60 100644
--- a/docs/benchmarks/language/get-gptj-data.md
+++ b/docs/benchmarks/language/get-gptj-data.md
@@ -36,7 +36,7 @@ Get the Official MLPerf GPT-J Model
 
     ### Pytorch
     ```
-    mlcr get,ml-model,gptj,_pytorch -j
+    mlcr get,ml-model,gptj,_fp32,_pytorch,_r2-downloader -j
     ```
 
 - `--outdirname=` could be provided to download the model to a specific location.
\ No newline at end of file

diff --git a/docs/benchmarks/language/get-llama2-70b-data.md b/docs/benchmarks/language/get-llama2-70b-data.md
index d4daa423ec..d9094992d4 100644
--- a/docs/benchmarks/language/get-llama2-70b-data.md
+++ b/docs/benchmarks/language/get-llama2-70b-data.md
@@ -9,38 +9,21 @@ hide:
 
 The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.
 
-=== "Nvidia Preprocessed Dataset"
+=== "Preprocessed Dataset"
 
     === "Validation"
         LLAMA2-70b validation run uses the Open ORCA dataset.
 
         ### Get Preprocessed Validation Dataset
         ```
-        mlcr get,dataset,preprocessed,openorca,_validation,_mlcommons,_nvidia -j
+        mlcr get,dataset,preprocessed,openorca,_validation,_r2-downloader,_mlc -j
         ```
 
     === "Calibration"
 
         ### Get Preprocessed Calibration dataset
         ```
-        mlcr get,dataset,preprocessed,openorca,_calibration,_mlcommons,_nvidia -j
-        ```
-
-=== "MLCommons Preprocessed Dataset"
-
-    === "Validation"
-        LLAMA2-70b validation run uses the Open ORCA dataset.
-
-        ### Get Preprocessed Validation Dataset
-        ```
-        mlcr get,dataset,preprocessed,openorca,_validation -j
-        ```
-
-    === "Calibration"
-
-        ### Get Preprocessed Calibration dataset
-        ```
-        mlcr get,dataset,preprocessed,openorca,_calibration -j
+        mlcr get,dataset,preprocessed,openorca,_calibration,_r2-downloader,_mlc -j
         ```
 
 === "Unprocessed Dataset"
@@ -69,13 +52,13 @@ The benchmark implementation run command will automatically download the require
 
 === "Pytorch"
 
-    === "From MLCOMMONS storage"
-
-        > **Note:** One has to accept the [MLCommons Llama 2 License Confidentiality Notice](https://llama2.mlcommons.org/) to access the model files in MLCOMMONS storage.
-
-        ### Get the Official MLPerf LLAMA2-70B model from MLCOMMONS storage
+    === "From MLCOMMONS Storage"
+
+        > **Note:** One has to accept the [MLCommons Llama 2 License Confidentiality Notice](https://llama2.mlcommons.org/) to access the model files in MLCOMMONS Storage.
+
+        ### Get the Official MLPerf LLAMA2-70B model from MLCOMMONS Google Drive
         ```
-        mlcr get,ml-model,llama2-70b,_fp32,_pytorch -j
+        mlcr get,ml-model,llama2-70b,_pytorch,_r2-downloader,_70b,_mlc -j
         ```
 
     === "From Hugging Face repo"

diff --git a/docs/benchmarks/language/get-llama3_1-405b-data.md b/docs/benchmarks/language/get-llama3_1-405b-data.md
index ad05ca8610..364dfa1fff 100644
--- a/docs/benchmarks/language/get-llama3_1-405b-data.md
+++ b/docs/benchmarks/language/get-llama3_1-405b-data.md
@@ -13,14 +13,14 @@ The benchmark implementation run command will automatically download the validat
 
     ### Get Validation Dataset
     ```
-    mlcr get,dataset,mlperf,inference,llama3,_validation --outdirname= -j
+    mlcr get,dataset,mlperf,inference,llama3,_validation,_r2-downloader --outdirname= -j
    ```
 
 === "Calibration"
 
     ### Get Calibration Dataset
     ```
-    mlcr get,dataset,mlperf,inference,llama3,_calibration --outdirname= -j
+    mlcr get,dataset,mlperf,inference,llama3,_calibration,_r2-downloader --outdirname= -j
     ```
 
 - `--outdirname=` could be provided to download the dataset to a specific location.
@@ -30,13 +30,13 @@ The benchmark implementation run command will automatically download the require
 
 === "Pytorch"
 
-    === "From MLCOMMONS Google Drive"
+    === "From MLCOMMONS Storage"
 
-        > **Note:** One has to accept the [MLCommons Llama 3.1 License Confidentiality Notice](http://llama3-1.mlcommons.org/) to access the model files in MLCOMMONS Google Drive.
+        > **Note:** One has to accept the [MLCommons Llama 3.1 License Confidentiality Notice](http://llama3-1.mlcommons.org/) to access the model files in MLCOMMONS Storage.
 
-        ### Get the Official MLPerf LLAMA3.1-405B model from MLCOMMONS Google Drive
+        ### Get the Official MLPerf LLAMA3.1-405B model from MLCOMMONS Storage
         ```
-        mlcr get,ml-model,llama3 -j
+        mlcr get,ml-model,llama3,_mlc,_r2-downloader,_405b --outdirname= -j
         ```
 
     === "From Hugging Face repo"

diff --git a/docs/benchmarks/language/get-llama3_1-8b-data.md b/docs/benchmarks/language/get-llama3_1-8b-data.md
index e24cc37d44..b25f7eca75 100644
--- a/docs/benchmarks/language/get-llama3_1-8b-data.md
+++ b/docs/benchmarks/language/get-llama3_1-8b-data.md
@@ -15,21 +15,21 @@ The benchmark implementation run command will automatically download the validat
 
         ### Get Validation Dataset
         ```
-        mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_rclone --outdirname= -j
+        mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_r2-downloader --outdirname= -j
         ```
 
     === "5000 samples (Edge)"
 
         ### Get Validation Dataset
         ```
-        mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_rclone --outdirname= -j
+        mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_r2-downloader --outdirname= -j
         ```
 
 === "Calibration"
 
     ### Get Calibration Dataset
     ```
-    mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_rclone --outdirname= -j
+    mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_r2-downloader --outdirname= -j
     ```
 
 - `--outdirname=` could be provided to download the dataset to a specific location.
@@ -39,13 +39,13 @@ The benchmark implementation run command will automatically download the require
 
 === "Pytorch"
 
-    === "From MLCOMMONS Google Drive"
+    === "From MLCOMMONS Storage"
 
-        > **Note:** One has to accept the [MLCommons Llama 3.1 License Confidentiality Notice](http://llama3-1.mlcommons.org/) to access the model files in MLCOMMONS Google Drive.
+        > **Note:** One has to accept the [MLCommons Llama 3.1 License Confidentiality Notice](http://llama3-1.mlcommons.org/) to access the model files in MLCOMMONS Storage.
 
-        ### Get the Official MLPerf LLAMA3.1-405B model from MLCOMMONS Cloudfare R2
+        ### Get the Official MLPerf LLAMA3.1-8B model from MLCOMMONS Cloudflare R2
         ```
-        TBD
+        mlcr get,ml-model,llama3,_mlc,_8b,_r2-downloader --outdirname= -j
         ```
 
     === "From Hugging Face repo"

diff --git a/docs/benchmarks/language/get-mixtral-8x7b-data.md b/docs/benchmarks/language/get-mixtral-8x7b-data.md
index cf5225843d..47b173e10a 100644
--- a/docs/benchmarks/language/get-mixtral-8x7b-data.md
+++ b/docs/benchmarks/language/get-mixtral-8x7b-data.md
@@ -12,14 +12,14 @@ The benchmark implementation run command will automatically download the preproc
 
     ### Get Validation Dataset
     ```
-    mlcr get,dataset-mixtral,openorca-mbxp-gsm8k-combined -j
+    mlcr get,dataset-mixtral,openorca-mbxp-gsm8k-combined,_r2-downloader,_validation -j
     ```
 
 === "Calibration"
 
     ### Get Calibration Dataset
     ```
-    mlcr get,dataset-mixtral,openorca-mbxp-gsm8k-combined,_calibration -j
+    mlcr get,dataset-mixtral,openorca-mbxp-gsm8k-combined,_r2-downloader,_calibration -j
     ```
 
 - `--outdirname=` could be provided to download the dataset to a specific location.
@@ -33,7 +33,7 @@ Get the Official MLPerf MIXTRAL-8x7b Model
 
     ### Pytorch
     ```
-    mlcr get,ml-model,mixtral -j
+    mlcr get,ml-model,mixtral,_r2-downloader,_mlc -j
     ```
 
 - `--outdirname=` could be provided to download the model to a specific location.
\ No newline at end of file

diff --git a/docs/benchmarks/recommendation/get-dlrm-v2-data.md b/docs/benchmarks/recommendation/get-dlrm-v2-data.md
index 8505b31bf4..a001911fe7 100644
--- a/docs/benchmarks/recommendation/get-dlrm-v2-data.md
+++ b/docs/benchmarks/recommendation/get-dlrm-v2-data.md
@@ -14,7 +14,7 @@ The benchmark implementation run command will automatically download the validat
 
     ### Get Validation Dataset
     ```
-    mlcr get,dataset,criteo,_validation -j
+    mlcr get,preprocessed,dataset,criteo,_r2-downloader,_mlc -j
     ```
 
 - `--outdirname=` could be provided to download the dataset to a specific location.
@@ -28,7 +28,7 @@ Get the Official MLPerf DLRM v2 Model
 
     ### Pytorch
     ```
-    mlcr get,ml-model,dlrm,_pytorch,_weight_sharded,_rclone -j
+    mlcr get,ml-model,dlrm,_pytorch,_fp32,_weight_sharded,_r2-downloader -j
     ```
 

diff --git a/docs/benchmarks/speech_to_text/get-whisper-data.md b/docs/benchmarks/speech_to_text/get-whisper-data.md
index 9bc97ad9a0..ed9e3b02b0 100644
--- a/docs/benchmarks/speech_to_text/get-whisper-data.md
+++ b/docs/benchmarks/speech_to_text/get-whisper-data.md
@@ -15,7 +15,7 @@ The benchmark implementation run command will automatically download the validat
 
         ### Get Preprocessed Validation Dataset
         ```
-        mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname= -j
+        mlcr get,dataset,whisper,_preprocessed,_mlc,_r2-downloader --outdirname= -j
         ```
 
     === "Unprocessed"
@@ -34,7 +34,7 @@ The benchmark implementation run command will automatically download the require
 
         ### Get the Official MLPerf Whisper model from MLCOMMONS Cloudflare R2
         ```
-        mlcr get,ml-model,whisper,_rclone,_mlc s-j
+        mlcr get,ml-model,whisper,_r2-downloader,_mlc -j
         ```
 
 - `--outdirname=` could be provided to download the model to a specific location.
\ No newline at end of file

diff --git a/docs/benchmarks/text_to_image/get-sdxl-data.md b/docs/benchmarks/text_to_image/get-sdxl-data.md
index 7c5363415c..e7e7281952 100644
--- a/docs/benchmarks/text_to_image/get-sdxl-data.md
+++ b/docs/benchmarks/text_to_image/get-sdxl-data.md
@@ -35,12 +35,12 @@ Get the Official MLPerf Stable Diffusion Model
 === "FP 16"
     ### Pytorch
     ```
-    mlcr get,ml-model,sdxl,_pytorch,_fp16 -j
+    mlcr get,ml-model,sdxl,_pytorch,_fp16,_r2-downloader -j
     ```
 === "FP 32"
     ### Pytorch
     ```
-    mlcr get,ml-model,sdxl,_pytorch,_fp32 -j
+    mlcr get,ml-model,sdxl,_pytorch,_fp32,_r2-downloader -j
     ```
 
 - `--outdirname=` could be provided to download the model to a specific location.

From bb91a72a447930006299a7869dfe5a5e587a2ff6 Mon Sep 17 00:00:00 2001
From: anandhu-eng
Date: Thu, 16 Oct 2025 13:06:57 +0530
Subject: [PATCH 2/3] Fix llama2 assets download instructions

---
 .../language/get-llama2-70b-data.md | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/docs/benchmarks/language/get-llama2-70b-data.md b/docs/benchmarks/language/get-llama2-70b-data.md
index d9094992d4..bf0bb8df35 100644
--- a/docs/benchmarks/language/get-llama2-70b-data.md
+++ b/docs/benchmarks/language/get-llama2-70b-data.md
@@ -9,7 +9,24 @@ hide:
 
 The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.
 
-=== "Preprocessed Dataset"
+=== "Nvidia Preprocessed Dataset"
+
+    === "Validation"
+        LLAMA2-70b validation run uses the Open ORCA dataset.
+
+        ### Get Preprocessed Validation Dataset
+        ```
+        mlcr get,dataset,preprocessed,openorca,_validation,_mlcommons,_nvidia -j
+        ```
+
+    === "Calibration"
+
+        ### Get Preprocessed Calibration dataset
+        ```
+        mlcr get,dataset,preprocessed,openorca,_calibration,_mlcommons,_nvidia -j
+        ```
+
+=== "MLCommons Preprocessed Dataset"
 
     === "Validation"
         LLAMA2-70b validation run uses the Open ORCA dataset.

From 8ebf1593463f53ff6053ff81ed6dedf37df11437 Mon Sep 17 00:00:00 2001
From: anandhu-eng
Date: Thu, 16 Oct 2025 13:23:53 +0530
Subject: [PATCH 3/3] Fix typo

---
 docs/benchmarks/language/get-llama2-70b-data.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/benchmarks/language/get-llama2-70b-data.md b/docs/benchmarks/language/get-llama2-70b-data.md
index bf0bb8df35..c9d3a2d25b 100644
--- a/docs/benchmarks/language/get-llama2-70b-data.md
+++ b/docs/benchmarks/language/get-llama2-70b-data.md
@@ -74,7 +74,7 @@ The benchmark implementation run command will automatically download the require
 
         > **Note:** One has to accept the [MLCommons Llama 2 License Confidentiality Notice](https://llama2.mlcommons.org/) to access the model files in MLCOMMONS Storage.
 
-        ### Get the Official MLPerf LLAMA2-70B model from MLCOMMONS Google Drive
+        ### Get the Official MLPerf LLAMA2-70B model from MLCOMMONS Storage
         ```
         mlcr get,ml-model,llama2-70b,_pytorch,_r2-downloader,_70b,_mlc -j
         ```