Description
(eval_venv) root@b2c98f779d6b:~/workshop# lighteval endpoint litellm ./qwen3_nothink.yaml 'bigbench:tracking_shuffled_objects' --max-samples 5
[2026-01-31 17:38:00,695] [ WARNING]: --max_samples WAS SET. THESE NUMBERS ARE ONLY PARTIAL AND SHOULD NOT BE USED FOR COMPARISON UNLESS YOU KNOW WHAT YOU ARE DOING. (pipeline.py:132)
[2026-01-31 17:38:00,695] [ INFO]: --- INIT SEEDS --- (pipeline.py:254)
[2026-01-31 17:38:00,695] [ INFO]: --- LOADING TASKS --- (pipeline.py:211)
[2026-01-31 17:38:00,792] [ WARNING]: /root/workshop/eval_venv/lib/python3.12/site-packages/syllapy/data_loader.py:3: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
(warnings.py:110)
[2026-01-31 17:38:00,968] [ INFO]: Loaded 646 task configs in 0.3 seconds (registry.py:379)
╭──────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/main_endpoint.py:304 in litellm │
│ │
│ 301 │ │ remove_reasoning_tags=remove_reasoning_tags, │
│ 302 │ │ reasoning_tags=reasoning_tags, │
│ 303 │ ) │
│ ❱ 304 │ pipeline = Pipeline( │
│ 305 │ │ tasks=tasks, │
│ 306 │ │ pipeline_parameters=pipeline_params, │
│ 307 │ │ evaluation_tracker=evaluation_tracker, │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:142 in __init__ │
│ │
│ 139 │ │ │
│ 140 │ │ # We init tasks first to fail fast if one is badly defined │
│ 141 │ │ self._init_random_seeds() │
│ ❱ 142 │ │ self._init_tasks_and_requests(tasks=tasks) │
│ 143 │ │ │
│ 144 │ │ self.model_config = model_config │
│ 145 │ │ self.accelerator, self.parallel_context = self._init_parallelism_manager() │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:224 in │
│ _init_tasks_and_requests │
│ │
│ 221 │ │ self.tasks_dict: dict[str, LightevalTask] = self.registry.load_tasks() │
│ 222 │ │ LightevalTask.load_datasets(self.tasks_dict, self.pipeline_parameters.dataset_lo │
│ 223 │ │ self.documents_dict = { │
│ ❱ 224 │ │ │ task.full_name: task.get_docs(self.pipeline_parameters.max_samples) for _, t │
│ 225 │ │ } │
│ 226 │ │ │
│ 227 │ │ self.sampling_docs = collections.defaultdict(list) │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/lighteval_task.py:378 in │
│ get_docs │
│ │
│ 375 │ │ Raises: │
│ 376 │ │ │ ValueError: If no documents are available for evaluation. │
│ 377 │ │ """ │
│ ❱ 378 │ │ eval_docs = self.eval_docs() │
│ 379 │ │ │
│ 380 │ │ if len(eval_docs) == 0: │
│ 381 │ │ │ raise ValueError(f"Task {self.name} has no documents to evaluate skipping.") │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/lighteval_task.py:355 in │
│ eval_docs │
│ │
│ 352 │ │ │ list[Doc]: Evaluation documents. │
│ 353 │ │ """ │
│ 354 │ │ if self._docs is None: │
│ ❱ 355 │ │ │ self._docs = self._get_docs_from_split(self.evaluation_split) │
│ 356 │ │ │ if self.must_remove_duplicate_docs: │
│ 357 │ │ │ │ self._docs = self.remove_duplicate_docs(self._docs) │
│ 358 │ │ return self._docs │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/lighteval_task.py:298 in │
│ _get_docs_from_split │
│ │
│ 295 │ │ │
│ 296 │ │ docs = [] │
│ 297 │ │ for split in splits: │
│ ❱ 298 │ │ │ for ix, item in enumerate(self.dataset[split]): │
│ 299 │ │ │ │ # Some tasks formatting is applied differently when the document is used │
│ 300 │ │ │ │ # vs when it's used for the actual prompt. That's why we store whether w │
│ 301 │ │ │ │ # doc for a fewshot sample (few_shots=True) or not, which then leads to │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/datasets/dataset_dict.py:86 in __getitem__ │
│ │
│ 83 │ │
│ 84 │ def __getitem__(self, k) -> Dataset: │
│ 85 │ │ if isinstance(k, (str, NamedSplit)) or len(self) == 0: │
│ ❱ 86 │ │ │ return super().__getitem__(k) │
│ 87 │ │ else: │
│ 88 │ │ │ available_suggested_splits = [ │
│ 89 │ │ │ │ split for split in (Split.TRAIN, Split.TEST, Split.VALIDATION) if split │
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'default'
Expected behavior
The `bigbench:tracking_shuffled_objects` task should load its dataset and evaluate the 5 requested samples, instead of crashing with `KeyError: 'default'` while resolving the evaluation split.
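For what it's worth, the last frame of the traceback is `datasets.DatasetDict.__getitem__`, which falls through to a plain `dict` lookup for string keys. So the task config appears to be asking for a split named `default` that the loaded dataset doesn't expose. The failure mode can be reproduced in isolation with a minimal stand-in (`FakeDatasetDict` is hypothetical, only mimicking the fact that `DatasetDict` subclasses `dict`):

```python
# Hypothetical stand-in for datasets.DatasetDict (which subclasses dict):
# a string key goes straight to dict lookup, so a split name that was
# never loaded raises KeyError, exactly as in the traceback above.
class FakeDatasetDict(dict):
    def __getitem__(self, k):
        if isinstance(k, str) or len(self) == 0:
            # mirrors datasets/dataset_dict.py:86 from the traceback
            return super().__getitem__(k)
        raise TypeError(f"unsupported key type: {type(k)!r}")

ds = FakeDatasetDict({"train": [1, 2, 3]})

try:
    ds["default"]  # the split name lighteval asks for
except KeyError as err:
    print(f"KeyError: {err}")  # -> KeyError: 'default'
```

Inspecting which splits the dataset actually ships (e.g. printing `list(self.dataset)` just before the failing loop in `_get_docs_from_split`) would confirm whether the task config's `evaluation_split` simply doesn't match the dataset's split names.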
Version info
Python 3.12 in a virtualenv at `/root/workshop/eval_venv` (per the traceback paths), lighteval installed via pip. OS details and the exact lighteval version/commit are not recorded in the log above.