Describe the bug
I encounter an ArrowInvalid error while saving the experiment tracker. Most of the evaluation completes, but the error occurs during saving.
The error info is as follows:
[2025-04-06 18:27:14,942] [ INFO]: Saving experiment tracker (evaluation_tracker.py:180)
| Task |Version|Metric|Value | |Stderr|
|-------------|------:|------|-----:|---|-----:|
|all | |em |0.0102|± |0.0020|
| | |qem |0.0110|± |0.0021|
| | |pem |0.1925|± |0.0078|
| | |pqem |0.3937|± |0.0097|
| | |acc |0.2578|± |0.0087|
|helm:med_qa:0| 0|em |0.0102|± |0.0020|
| | |qem |0.0110|± |0.0021|
| | |pem |0.1925|± |0.0078|
| | |pqem |0.3937|± |0.0097|
| | |acc |0.2578|± |0.0087|
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/main_v │
│ llm.py:163 in vllm │
│ │
│ 160 │ │
│ 161 │ results = pipeline.get_results() │
│ 162 │ │
│ ❱ 163 │ pipeline.save_and_push_results() │
│ 164 │ │
│ 165 │ return results │
│ 166 │
│ │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/pipeli │
│ ne.py:536 in save_and_push_results │
│ │
│ 533 │ def save_and_push_results(self): │
│ 534 │ │ logger.info("--- SAVING AND PUSHING RESULTS ---") │
│ 535 │ │ if self.is_main_process(): │
│ ❱ 536 │ │ │ self.evaluation_tracker.save() │
│ 537 │ │
│ 538 │ def _init_final_dict(self): │
│ 539 │ │ if self.is_main_process(): │
│ │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/loggin │
│ g/evaluation_tracker.py:201 in save │
│ │
│ 198 │ │ details_datasets: dict[str, Dataset] = {} │
│ 199 │ │ for task_name, task_details in self.details_logger.details.ite │
│ 200 │ │ │ # Create a dataset from the dictionary - we force cast to │
│ ❱ 201 │ │ │ dataset = Dataset.from_list([asdict(detail) for detail in │
│ 202 │ │ │ │
│ 203 │ │ │ # We don't keep 'id' around if it's there │
│ 204 │ │ │ column_names = dataset.column_names │
│ │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_d │
│ ataset.py:986 in from_list │
│ │
│ 983 │ │ """ │
│ 984 │ │ # for simplicity and consistency wrt OptimizedTypedSequence w │
│ 985 │ │ mapping = {k: [r.get(k) for r in mapping] for k in mapping[0] │
│ ❱ 986 │ │ return cls.from_dict(mapping, features, info, split) │
│ 987 │ │
│ 988 │ @staticmethod │
│ 989 │ def from_csv( │
│ │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_d │
│ ataset.py:940 in from_dict │
│ │
│ 937 │ │ │ │ ) │
│ 938 │ │ │ arrow_typed_mapping[col] = data │
│ 939 │ │ mapping = arrow_typed_mapping │
│ ❱ 940 │ │ pa_table = InMemoryTable.from_pydict(mapping=mapping) │
│ 941 │ │ if info is None: │
│ 942 │ │ │ info = DatasetInfo() │
│ 943 │ │ info.features = features │
│ │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/table.p │
│ y:758 in from_pydict │
│ │
│ 755 │ │ Returns: │
│ 756 │ │ │ `datasets.table.Table` │
│ 757 │ │ """ │
│ ❱ 758 │ │ return cls(pa.Table.from_pydict(*args, **kwargs)) │
│ 759 │ │
│ 760 │ @classmethod │
│ 761 │ def from_pylist(cls, mapping, *args, **kwargs): │
│ │
│ in pyarrow.lib._Tabular.from_pydict:1968 │
│ │
│ in pyarrow.lib._from_pydict:6337 │
│ │
│ in pyarrow.lib.asarray:402 │
│ │
│ in pyarrow.lib.array:252 │
│ │
│ in pyarrow.lib._handle_arrow_array_protocol:114 │
│ │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_w │
│ riter.py:229 in __arrow_array__ │
│ │
│ 226 │ │ │ │ out = list_of_np_array_to_pyarrow_listarray(data) │
│ 227 │ │ │ else: │
│ 228 │ │ │ │ trying_cast_to_python_objects = True │
│ ❱ 229 │ │ │ │ out = pa.array(cast_to_python_objects(data, only_1d_fo │
│ 230 │ │ │ # use smaller integer precisions if possible │
│ 231 │ │ │ if self.trying_int_optimization: │
│ 232 │ │ │ │ if pa.types.is_int64(out.type): │
│ │
│ in pyarrow.lib.array:372 │
│ │
│ in pyarrow.lib._sequence_to_array:42 │
│ │
│ in pyarrow.lib.pyarrow_internal_check_status:155 │
│ │
│ in pyarrow.lib.check_status:92 │
╰──────────────────────────────────────────────────────────────────────────────╯
ArrowInvalid: cannot mix list and non-list, non-null values
To Reproduce
I executed the following command to evaluate Qwen2.5-0.5B-Instruct on the med_qa benchmark, but got the error above.
NAMESPACE=Qwen
MODEL_NAME=Qwen2.5-0.5B-Instruct
MODEL=$NAMESPACE/$MODEL_NAME
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=2048,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:2048,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL
TASK=med_qa
LOG_FILE=logs/evals/${TASK}_${MODEL_NAME}.log
CUDA_VISIBLE_DEVICES=0 nohup lighteval vllm $MODEL_ARGS "helm|$TASK|0|0" \
--use-chat-template \
--output-dir $OUTPUT_DIR \
> ${LOG_FILE} 2>&1 &
Expected behavior
Evaluation results are saved successfully.
Version info
Ubuntu 24.04
lighteval 0.8.1
cuda 12.4
vllm 0.8.3
torch 2.6.0