
Add InfiniteBench for long context benchmarking #2421

Open
iankur wants to merge 4 commits into main
Conversation

@iankur commented Dec 9, 2024

Motivation

This PR adds support for eval on a long context benchmark, InfiniteBench. See #1273 for more context.

Modifications

Following the discussion in #1273, this PR currently adds code from the TensorRT-LLM repo (link) to load the data, create prompts, and compute scores. Below are sample outputs for both backends using gradientai/Llama-3-8B-Instruct-Gradient-1048k with a maximum input length of ~130K tokens. Please check the README for more details and instructions on how to run both benchmarks. Currently, the predictions from the two backends differ slightly (see below), which I will try to fix.

SGLang

{"question_id": 0, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 71432.", "ground_truth": ["71432"]}
{"question_id": 1, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 69079.", "ground_truth": ["69079"]}
{"question_id": 2, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 89415.", "ground_truth": ["89415"]}
{"question_id": 3, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 61734.", "ground_truth": ["61734"]}
{"question_id": 4, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 40204.", "ground_truth": ["40204"]}
{"question_id": 5, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 80723.", "ground_truth": ["80723"]}
{"question_id": 6, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 55058.", "ground_truth": ["55058"]}
{"question_id": 7, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 16783.", "ground_truth": ["16783"]}
{"question_id": 8, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 15951.", "ground_truth": ["15951"]}
{"question_id": 9, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 52933.", "ground_truth": ["52933"]}

TensorRT-LLM

{"id": 0, "prediction": " 71432.", "ground_truth": ["71432"], "input_lengths": [125339]}
{"id": 1, "prediction": " 69079.", "ground_truth": ["69079"], "input_lengths": [125339]}
{"id": 2, "prediction": " 89415.", "ground_truth": ["89415"], "input_lengths": [125339]}
{"id": 3, "prediction": " 61734.", "ground_truth": ["61734"], "input_lengths": [125339]}
{"id": 4, "prediction": " 40204.", "ground_truth": ["40204"], "input_lengths": [125339]}
{"id": 5, "prediction": " 80723.", "ground_truth": ["80723"], "input_lengths": [125339]}
{"id": 6, "prediction": " 55058.", "ground_truth": ["55058"], "input_lengths": [125339]}
{"id": 7, "prediction": " 16783. Remember it", "ground_truth": ["16783"], "input_lengths": [125339]}
{"id": 8, "prediction": " 15951.", "ground_truth": ["15951"], "input_lengths": [125339]}
{"id": 9, "prediction": " 52933.", "ground_truth": ["52933"], "input_lengths": [125339]}

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Member

@zhyncs left a comment


Nice work! Could we combine these scripts into just one? Something like this:

SHAREGPT_URL = "https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json"

Implement the downloading of the data files inside the script itself to make it more convenient for users.
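
For illustration, a minimal sketch of such an in-script download helper; the InfiniteBench URL and file name below are assumptions, not the final paths used in this PR.

import os
import urllib.request

# Hypothetical dataset location; the actual InfiniteBench file URL may differ.
PASSKEY_URL = "https://huggingface.co/datasets/xinrongzhang2022/InfiniteBench/resolve/main/passkey.jsonl"

def download_dataset(url: str = PASSKEY_URL, data_dir: str = "./data") -> str:
    """Download the benchmark file once and reuse the cached copy afterwards."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, os.path.basename(url))
    if not os.path.exists(path):
        print(f"Downloading {url} to {path}")
        urllib.request.urlretrieve(url, path)
    return path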

@zhyncs
Member

zhyncs commented Dec 9, 2024

Additionally, the section about TensorRT LLM is very good! Would you be willing to help improve this custom task script to make it easier to test TensorRT LLM?
https://github.com/sgl-project/sglang/blob/main/test/srt/experiment_runner.py
ref #2407
If you are open to doing it, it can be implemented in another PR. Thanks!

@zhyncs
Member

zhyncs commented Dec 9, 2024

close #1273

@zhyncs self-assigned this Dec 9, 2024
@zhyncs
Member

zhyncs commented Dec 9, 2024

gradientai/Llama-3-8B-Instruct-Gradient-1048k: GradientAI, LOL, your previous work. cc @michaelfeil

@iankur
Author

iankur commented Dec 9, 2024

@zhyncs

Implement the process of downloading files into a script to make it more convenient for users.

Sounds good. I will merge the downloading step into the SGLang benchmark script; we can keep the separate downloading script for TensorRT.

I will also work on the custom task script PR. I am traveling, so it may take some time, but I will try to do it ASAP.

)
parser.add_argument("--data-dir", type=str, default="./data")
parser.add_argument("--start-idx", type=int, default=0)
parser.add_argument("--end-idx", type=int, default=None)
Contributor


Can you add more descriptions for "--start-idx" and "--end-idx"?

Author


I removed these arguments, which were borrowed from the TensorRT eval script, and added a num-samples argument with a description.
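
A rough sketch of what the replacement argument might look like; the exact default and help text here are assumptions, not the wording used in the PR.

parser.add_argument(
    "--num-samples",
    type=int,
    default=None,
    help="Number of benchmark samples to evaluate; defaults to the full dataset.",
)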

@merrymercy
Contributor

Is this ready to be merged?
We can merge this first and then add it to CI in the next PR.

@merrymercy
Contributor

cc @iankur and @zhyncs. Ready to merge this first part?
