
Add InfiniteBench for long context benchmarking #2421

Open
iankur wants to merge 4 commits into main
Conversation

@iankur commented Dec 9, 2024

Motivation

This PR adds support for eval on a long context benchmark, InfiniteBench. See #1273 for more context.

Modifications

Following the discussion in #1273, this PR currently adds code from the TensorRT-LLM repo (link) to load the data, create prompts, and compute scores. Below are sample outputs for both backends using gradientai/Llama-3-8B-Instruct-Gradient-1048k with a maximum input length of ~130K tokens. Please check the README for more details and instructions on how to run both benchmarks. Currently, the predictions from the two backends differ slightly (see below), which I will try to fix.

SGLang

{"question_id": 0, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 71432.", "ground_truth": ["71432"]}
{"question_id": 1, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 69079.", "ground_truth": ["69079"]}
{"question_id": 2, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 89415.", "ground_truth": ["89415"]}
{"question_id": 3, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 61734.", "ground_truth": ["61734"]}
{"question_id": 4, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 40204.", "ground_truth": ["40204"]}
{"question_id": 5, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 80723.", "ground_truth": ["80723"]}
{"question_id": 6, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 55058.", "ground_truth": ["55058"]}
{"question_id": 7, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 16783.", "ground_truth": ["16783"]}
{"question_id": 8, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 15951.", "ground_truth": ["15951"]}
{"question_id": 9, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 52933.", "ground_truth": ["52933"]}

TensorRT-LLM

{"id": 0, "prediction": " 71432.", "ground_truth": ["71432"], "input_lengths": [125339]}
{"id": 1, "prediction": " 69079.", "ground_truth": ["69079"], "input_lengths": [125339]}
{"id": 2, "prediction": " 89415.", "ground_truth": ["89415"], "input_lengths": [125339]}
{"id": 3, "prediction": " 61734.", "ground_truth": ["61734"], "input_lengths": [125339]}
{"id": 4, "prediction": " 40204.", "ground_truth": ["40204"], "input_lengths": [125339]}
{"id": 5, "prediction": " 80723.", "ground_truth": ["80723"], "input_lengths": [125339]}
{"id": 6, "prediction": " 55058.", "ground_truth": ["55058"], "input_lengths": [125339]}
{"id": 7, "prediction": " 16783. Remember it", "ground_truth": ["16783"], "input_lengths": [125339]}
{"id": 8, "prediction": " 15951.", "ground_truth": ["15951"], "input_lengths": [125339]}
{"id": 9, "prediction": " 52933.", "ground_truth": ["52933"], "input_lengths": [125339]}

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Member

@zhyncs left a comment


Nice work! Could we combine these scripts into just one? Something like this:

SHAREGPT_URL = "https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json"

Implement the downloading of the data files inside the script itself to make it more convenient for users.
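
For illustration, a minimal sketch of such an in-script download helper; the InfiniteBench URL and file name below are assumptions, not the final paths used in this PR.

import os
import urllib.request

# Hypothetical dataset location; the actual InfiniteBench file URL may differ.
PASSKEY_URL = "https://huggingface.co/datasets/xinrongzhang2022/InfiniteBench/resolve/main/passkey.jsonl"

def download_dataset(url: str = PASSKEY_URL, data_dir: str = "./data") -> str:
    """Download the benchmark file once and reuse the cached copy afterwards."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, os.path.basename(url))
    if not os.path.exists(path):
        print(f"Downloading {url} to {path}")
        urllib.request.urlretrieve(url, path)
    return path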

@zhyncs
Member

zhyncs commented Dec 9, 2024

Additionally, the section about TensorRT LLM is very good! Would you be willing to help improve this custom task script to make it easier to test TensorRT LLM?
https://github.com/sgl-project/sglang/blob/main/test/srt/experiment_runner.py
ref #2407
If you are open to doing it, it can be implemented in another PR. Thanks!

@zhyncs
Member

zhyncs commented Dec 9, 2024

close #1273

@zhyncs self-assigned this Dec 9, 2024
@zhyncs
Member

zhyncs commented Dec 9, 2024

gradientai/Llama-3-8B-Instruct-Gradient-1048k: GradientAI, LOL, your previous work. cc @michaelfeil

@iankur
Author

iankur commented Dec 9, 2024

@zhyncs

Implement the process of downloading files into a script to make it more convenient for users.

Sounds good. I will merge the downloading step into the SGLang benchmark script; we can keep the separate downloading script for TensorRT.

I will also work on the custom task script PR. I am traveling, so it may take some time, but I will try to do it ASAP.

)
parser.add_argument("--data-dir", type=str, default="./data")
parser.add_argument("--start-idx", type=int, default=0)
parser.add_argument("--end-idx", type=int, default=None)
Contributor


Can you add more descriptions for "--start-idx" and "--end-idx"?

Author


I removed these arguments, which were borrowed from the TensorRT eval script, and added a num-samples argument with a description.
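
A rough sketch of what the replacement argument might look like; the exact default and help text here are assumptions, not the wording used in the PR.

parser.add_argument(
    "--num-samples",
    type=int,
    default=None,
    help="Number of benchmark samples to evaluate; defaults to the full dataset.",
)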

@merrymercy
Contributor

Is this ready to be merged?
We can merge this first and then add it to CI in the next PR.

@merrymercy
Contributor

cc @iankur and @zhyncs. Ready to merge this first part?
