Describe the bug
The lextreme benchmark fails with a KeyError (or other configuration errors) because the evaluation_splits defined in the task configuration (["validation", "test"]) do not match the splits actually available in the dataset for several subsets.
To Reproduce

```python
task = "lextreme:multi_eurlex_level_1|5"
pipeline = Pipeline(
    tasks=task,
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)
pipeline.evaluate()
pipeline.save_and_push_results()
pipeline.show_results()
```

Traceback (abridged):

```
    141 self._init_random_seeds()
--> 142 self._init_tasks_and_requests(tasks=tasks)
    144 self.model_config = model_config
    145 self.accelerator, self.parallel_context = self._init_parallelism_manager()
...
     88 available_suggested_splits = [
     89     split for split in (Split.TRAIN, Split.TEST, Split.VALIDATION) if split in self
     90 ]

KeyError: 'validation'
```

Expected behavior
The configuration should only reference splits that are actually available on the Hugging Face Hub for each subset.
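A minimal sketch of what a defensive fix could look like (a hypothetical helper, not lighteval's actual API): filter the configured evaluation_splits down to the splits the dataset actually provides, and raise a descriptive error instead of a bare KeyError when none remain.

```python
def resolve_evaluation_splits(configured_splits, available_splits):
    """Keep only the configured splits that actually exist in the dataset.

    Raises a descriptive ValueError instead of a bare KeyError when none
    of the configured splits are available (e.g. a subset that ships no
    'validation' split).
    """
    resolved = [s for s in configured_splits if s in available_splits]
    if not resolved:
        raise ValueError(
            f"None of the configured splits {configured_splits} exist; "
            f"available splits are {sorted(available_splits)}"
        )
    return resolved


# Example: a subset that only ships 'train' and 'test'
print(resolve_evaluation_splits(["validation", "test"], {"train", "test"}))  # → ['test']
```

The actual available splits per subset can be checked on the Hugging Face Hub (or with `datasets.get_dataset_split_names`) before writing them into the task configuration.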
Version info
- OS: mac
- Lighteval version: main (local development)