### Evaluation short description
AIME26 is the most recent AIME-style math reasoning benchmark, used to evaluate LLMs on competition-level problems that require multi-step reasoning and an exact final answer. Like earlier AIME versions, it is widely used in the community as a standard reference benchmark for mathematical reasoning.
### Evaluation metadata
- Paper URL: N/A (AIME is an AMC competition, not a published benchmark paper)
- GitHub URL: N/A
- Dataset URL: https://huggingface.co/datasets/EleutherAI/aime_2024 (AIME-style datasets; AIME26 variants are also available in community repos)
Hi LightEval team,
Does LightEval currently support evaluating models on AIME26?
If not, is there a recommended way to add it as a custom task, or any plan to support it officially?
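For context, here is a minimal sketch of what I imagine the custom-task route would look like, following the community-task pattern in the repo. The dataset repo name (`your-org/aime_2026`), the column names (`problem`, `answer`), and the metric choice are assumptions on my part, not a working config:

```python
# aime26_task.py -- minimal sketch of a LightEval community/custom task.
# Dataset repo, subset, and column names below are placeholders.
from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc


def aime26_prompt(line, task_name: str = None) -> Doc:
    # Map one dataset row to a Doc. AIME answers are integers in 0-999,
    # so exact-match-style scoring against the gold string seems reasonable.
    return Doc(
        task_name=task_name,
        query=f"Problem: {line['problem']}\nAnswer:",
        choices=[str(line["answer"])],
        gold_index=0,
    )


aime26 = LightevalTaskConfig(
    name="aime26",
    prompt_function=aime26_prompt,
    suite=["community"],
    hf_repo="your-org/aime_2026",  # hypothetical dataset repo
    hf_subset="default",
    hf_avail_splits=["train"],
    evaluation_splits=["train"],
    generation_size=32768,  # leave headroom for long chain-of-thought
    metric=[Metrics.quasi_exact_match],  # or a math-specific extraction metric?
    stop_sequence=None,
)

# LightEval discovers custom tasks through this module-level table.
TASKS_TABLE = [aime26]
```

The idea would then be to pass this file via `--custom-tasks` and select the task under the `community` suite (exact CLI syntax seems to vary across LightEval versions). Is that the recommended approach, or is there a better path?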
Thanks!