
[BUG] KeyError: 'default' #1163

@2niuhe

Description

Running `bigbench:tracking_shuffled_objects` through the litellm endpoint fails while loading the task's dataset: the configured evaluation split `default` is not present in the loaded dataset, so the split lookup raises `KeyError: 'default'`.
(eval_venv) root@b2c98f779d6b:~/workshop# lighteval endpoint litellm  ./qwen3_nothink.yaml  'bigbench:tracking_shuffled_objects' --max-samples 5
[2026-01-31 17:38:00,695] [ WARNING]: --max_samples WAS SET. THESE NUMBERS ARE ONLY PARTIAL AND SHOULD NOT BE USED FOR COMPARISON UNLESS YOU KNOW WHAT YOU ARE DOING. (pipeline.py:132)
[2026-01-31 17:38:00,695] [    INFO]: --- INIT SEEDS --- (pipeline.py:254)
[2026-01-31 17:38:00,695] [    INFO]: --- LOADING TASKS --- (pipeline.py:211)
[2026-01-31 17:38:00,792] [ WARNING]: /root/workshop/eval_venv/lib/python3.12/site-packages/syllapy/data_loader.py:3: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
 (warnings.py:110)
[2026-01-31 17:38:00,968] [    INFO]: Loaded 646 task configs in 0.3 seconds (registry.py:379)
╭──────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/main_endpoint.py:304 in litellm    │
│                                                                                                    │
│   301 │   │   remove_reasoning_tags=remove_reasoning_tags,                                         │
│   302 │   │   reasoning_tags=reasoning_tags,                                                       │
│   303 │   )                                                                                        │
│ ❱ 304 │   pipeline = Pipeline(                                                                     │
│   305 │   │   tasks=tasks,                                                                         │
│   306 │   │   pipeline_parameters=pipeline_params,                                                 │
│   307 │   │   evaluation_tracker=evaluation_tracker,                                               │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:142 in __init__        │
│                                                                                                    │
│   139 │   │                                                                                        │
│   140 │   │   # We init tasks first to fail fast if one is badly defined                           │
│   141 │   │   self._init_random_seeds()                                                            │
│ ❱ 142 │   │   self._init_tasks_and_requests(tasks=tasks)                                           │
│   143 │   │                                                                                        │
│   144 │   │   self.model_config = model_config                                                     │
│   145 │   │   self.accelerator, self.parallel_context = self._init_parallelism_manager()           │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:224 in                 │
│ _init_tasks_and_requests                                                                           │
│                                                                                                    │
│   221 │   │   self.tasks_dict: dict[str, LightevalTask] = self.registry.load_tasks()               │
│   222 │   │   LightevalTask.load_datasets(self.tasks_dict, self.pipeline_parameters.dataset_lo     │
│   223 │   │   self.documents_dict = {                                                              │
│ ❱ 224 │   │   │   task.full_name: task.get_docs(self.pipeline_parameters.max_samples) for _, t     │
│   225 │   │   }                                                                                    │
│   226 │   │                                                                                        │
│   227 │   │   self.sampling_docs = collections.defaultdict(list)                                   │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/lighteval_task.py:378 in     │
│ get_docs                                                                                           │
│                                                                                                    │
│   375 │   │   Raises:                                                                              │
│   376 │   │   │   ValueError: If no documents are available for evaluation.                        │
│   377 │   │   """
│ ❱ 378 │   │   eval_docs = self.eval_docs()                                                         │
│   379 │   │                                                                                        │
│   380 │   │   if len(eval_docs) == 0:                                                              │
│   381 │   │   │   raise ValueError(f"Task {self.name} has no documents to evaluate skipping.")     │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/lighteval_task.py:355 in     │
│ eval_docs                                                                                          │
│                                                                                                    │
│   352 │   │   │   list[Doc]: Evaluation documents.                                                 │
│   353 │   │   """                                                                                  │
│   354 │   │   if self._docs is None:                                                               │
│ ❱ 355 │   │   │   self._docs = self._get_docs_from_split(self.evaluation_split)                    │
│   356 │   │   │   if self.must_remove_duplicate_docs:                                              │
│   357 │   │   │   │   self._docs = self.remove_duplicate_docs(self._docs)                          │
│   358 │   │   return self._docs                                                                    │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/lighteval_task.py:298 in     │
│ _get_docs_from_split                                                                               │
│                                                                                                    │
│   295 │   │                                                                                        │
│   296 │   │   docs = []                                                                            │
│   297 │   │   for split in splits:                                                                 │
│ ❱ 298 │   │   │   for ix, item in enumerate(self.dataset[split]):                                  │
│   299 │   │   │   │   # Some tasks formatting is applied differently when the document is used     │
│   300 │   │   │   │   # vs when it's used for the actual prompt. That's why we store whether w     │
│   301 │   │   │   │   # doc for a fewshot sample (few_shots=True) or not, which then leads to      │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/datasets/dataset_dict.py:86 in __getitem__   │
│                                                                                                    │
│     83 │                                                                                           │
│     84 │   def __getitem__(self, k) -> Dataset:                                                    │
│     85 │   │   if isinstance(k, (str, NamedSplit)) or len(self) == 0:                              │
│ ❱   86 │   │   │   return super().__getitem__(k)                                                   │
│     87 │   │   else:                                                                               │
│     88 │   │   │   available_suggested_splits = [                                                  │
│     89 │   │   │   │   split for split in (Split.TRAIN, Split.TEST, Split.VALIDATION) if split     │
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'default'
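For context, a minimal sketch of the failure mode (illustrative only, not lighteval code): `datasets.DatasetDict` subclasses `dict`, and its `__getitem__` falls through to `dict.__getitem__` for string keys (see frame at `dataset_dict.py:86` above). So asking for a split name that was never loaded raises a plain `KeyError`. Here the task config's `evaluation_split` is `default`, while the loaded bigbench dataset presumably only exposes other splits; the split names below are assumptions for illustration.

```python
# Sketch: a DatasetDict behaves like a plain dict of split-name -> Dataset.
# Looking up a split the dataset was not built with raises KeyError, which
# is exactly the traceback's final line.
dataset = {"train": ["doc1", "doc2"], "validation": ["doc3"]}  # stand-in splits

try:
    dataset["default"]          # evaluation_split taken from the task config
except KeyError as exc:
    print(f"KeyError: {exc}")   # prints: KeyError: 'default'
```

This suggests a mismatch between the split name in the task config and the splits actually published for this dataset, rather than a problem in the endpoint backend.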

Expected behavior

The task should load its evaluation split and run the evaluation on the 5 sampled documents without raising `KeyError: 'default'`.

Version info

Please provide your operating system, lighteval version or commit if you installed from main, and pip/conda environment if your problem concerns dependencies.
