This folder contains all the benchmarks for the project. Each benchmark lives in its own subfolder as a git submodule and includes the code and data needed to run it.
- AI Idea Bench 2025: A benchmark for evaluating AI systems on generating novel, creative, and feasible research ideas.
- MLE-bench: A benchmark for evaluating AI agents on machine learning engineering tasks drawn from Kaggle competitions.
- SciCodeBench: A benchmark for scientific code generation and understanding.
To fetch all benchmark submodules after cloning this repository, run:

```shell
git submodule update --init --recursive
```
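Alternatively, the submodules can be fetched in a single step at clone time. This is a sketch; `<repo-url>` is a placeholder for this repository's URL, which is not stated here.

```shell
# Clone the repository and initialize all benchmark submodules in one step.
# Replace <repo-url> with the actual URL of this repository.
git clone --recurse-submodules <repo-url>
```

If a submodule later falls out of date, re-running `git submodule update --init --recursive` inside the checkout brings it back in sync.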