[BUG] encounter an ArrowInvalid error while saving experiment tracker #660

@DIaacKr

Describe the bug

I encounter an ArrowInvalid error while saving the experiment tracker.
Most of the evaluation completes (the results table is printed), but the error occurs at the saving step.
The error output is as follows:

[2025-04-06 18:27:14,942] [    INFO]: Saving experiment tracker (evaluation_tracker.py:180)
|    Task     |Version|Metric|Value |   |Stderr|
|-------------|------:|------|-----:|---|-----:|
|all          |       |em    |0.0102|±  |0.0020|
|             |       |qem   |0.0110|±  |0.0021|
|             |       |pem   |0.1925|±  |0.0078|
|             |       |pqem  |0.3937|±  |0.0097|
|             |       |acc   |0.2578|±  |0.0087|
|helm:med_qa:0|      0|em    |0.0102|±  |0.0020|
|             |       |qem   |0.0110|±  |0.0021|
|             |       |pem   |0.1925|±  |0.0078|
|             |       |pqem  |0.3937|±  |0.0097|
|             |       |acc   |0.2578|±  |0.0087|
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/main_v │
│ llm.py:163 in vllm                                                           │
│                                                                              │
│   160 │                                                                      │
│   161 │   results = pipeline.get_results()                                   │
│   162 │                                                                      │
│ ❱ 163 │   pipeline.save_and_push_results()                                   │
│   164 │                                                                      │
│   165 │   return results                                                     │
│   166                                                                        │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/pipeli │
│ ne.py:536 in save_and_push_results                                           │
│                                                                              │
│   533 │   def save_and_push_results(self):                                   │
│   534 │   │   logger.info("--- SAVING AND PUSHING RESULTS ---")              │
│   535 │   │   if self.is_main_process():                                     │
│ ❱ 536 │   │   │   self.evaluation_tracker.save()                             │
│   537 │                                                                      │
│   538 │   def _init_final_dict(self):                                        │
│   539 │   │   if self.is_main_process():                                     │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/lighteval/loggin │
│ g/evaluation_tracker.py:201 in save                                          │
│                                                                              │
│   198 │   │   details_datasets: dict[str, Dataset] = {}                      │
│   199 │   │   for task_name, task_details in self.details_logger.details.ite │
│   200 │   │   │   # Create a dataset from the dictionary - we force cast to  │
│ ❱ 201 │   │   │   dataset = Dataset.from_list([asdict(detail) for detail in  │
│   202 │   │   │                                                              │
│   203 │   │   │   # We don't keep 'id' around if it's there                  │
│   204 │   │   │   column_names = dataset.column_names                        │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_d │
│ ataset.py:986 in from_list                                                   │
│                                                                              │
│    983 │   │   """                                                           │
│    984 │   │   # for simplicity and consistency wrt OptimizedTypedSequence w │
│    985 │   │   mapping = {k: [r.get(k) for r in mapping] for k in mapping[0] │
│ ❱  986 │   │   return cls.from_dict(mapping, features, info, split)          │
│    987 │                                                                     │
│    988 │   @staticmethod                                                     │
│    989 │   def from_csv(                                                     │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_d │
│ ataset.py:940 in from_dict                                                   │
│                                                                              │
│    937 │   │   │   │   )                                                     │
│    938 │   │   │   arrow_typed_mapping[col] = data                           │
│    939 │   │   mapping = arrow_typed_mapping                                 │
│ ❱  940 │   │   pa_table = InMemoryTable.from_pydict(mapping=mapping)         │
│    941 │   │   if info is None:                                              │
│    942 │   │   │   info = DatasetInfo()                                      │
│    943 │   │   info.features = features                                      │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/table.p │
│ y:758 in from_pydict                                                         │
│                                                                              │
│    755 │   │   Returns:                                                      │
│    756 │   │   │   `datasets.table.Table`                                    │
│    757 │   │   """                                                           │
│ ❱  758 │   │   return cls(pa.Table.from_pydict(*args, **kwargs))             │
│    759 │                                                                     │
│    760 │   @classmethod                                                      │
│    761 │   def from_pylist(cls, mapping, *args, **kwargs):                   │
│                                                                              │
│ in pyarrow.lib._Tabular.from_pydict:1968                                     │
│                                                                              │
│ in pyarrow.lib._from_pydict:6337                                             │
│                                                                              │
│ in pyarrow.lib.asarray:402                                                   │
│                                                                              │
│ in pyarrow.lib.array:252                                                     │
│                                                                              │
│ in pyarrow.lib._handle_arrow_array_protocol:114                              │
│                                                                              │
│ /home/lc/.conda/envs/lighteval/lib/python3.11/site-packages/datasets/arrow_w │
│ riter.py:229 in __arrow_array__                                              │
│                                                                              │
│   226 │   │   │   │   out = list_of_np_array_to_pyarrow_listarray(data)      │
│   227 │   │   │   else:                                                      │
│   228 │   │   │   │   trying_cast_to_python_objects = True                   │
│ ❱ 229 │   │   │   │   out = pa.array(cast_to_python_objects(data, only_1d_fo │
│   230 │   │   │   # use smaller integer precisions if possible               │
│   231 │   │   │   if self.trying_int_optimization:                           │
│   232 │   │   │   │   if pa.types.is_int64(out.type):                        │
│                                                                              │
│ in pyarrow.lib.array:372                                                     │
│                                                                              │
│ in pyarrow.lib._sequence_to_array:42                                         │
│                                                                              │
│ in pyarrow.lib.pyarrow_internal_check_status:155                             │
│                                                                              │
│ in pyarrow.lib.check_status:92                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
ArrowInvalid: cannot mix list and non-list, non-null values
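
The message suggests that at least one column of the per-sample details holds a list in some rows and a non-list value in others. Below is a minimal diagnostic sketch for locating such a column, assuming the details are available as plain dicts (as `asdict(detail)` produces in the traceback). The function name `find_mixed_columns` and the sample field names are hypothetical and not part of lighteval. Columns are assembled the same way the traceback shows `Dataset.from_list` doing it: keys from the first record, values via `r.get(k)`.

```python
from collections import defaultdict


def find_mixed_columns(records):
    """Return columns whose non-null values mix list-like and non-list values.

    Hypothetical helper, not part of lighteval or datasets.
    Maps each offending column to the row indices of each kind of value.
    """
    mixed = {}
    if not records:
        return mixed
    # Keys are taken from the first record, mirroring Dataset.from_list.
    for key in records[0]:
        kinds = defaultdict(list)
        for row_index, record in enumerate(records):
            value = record.get(key)
            if value is None:
                continue  # nulls do not trigger the error, so skip them
            # Assumption: tuples are treated as list-like, like Python lists.
            kind = "list" if isinstance(value, (list, tuple)) else "non-list"
            kinds[kind].append(row_index)
        if len(kinds) == 2:
            mixed[key] = dict(kinds)
    return mixed


# Illustrative records only; the real detail fields may differ.
records = [
    {"example": "q1", "predictions": ["A"], "gold": "A"},
    {"example": "q2", "predictions": "B", "gold": "B"},
    {"example": "q3", "predictions": None, "gold": "C"},
]
print(find_mixed_columns(records))
# {'predictions': {'list': [0], 'non-list': [1]}}
```

Running a check like this on the list built at `evaluation_tracker.py:201`, just before the `Dataset.from_list` call, should point to the field and rows that need a consistent type.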

To Reproduce

I ran the following script to evaluate Qwen2.5-0.5B-Instruct on the med_qa benchmark, and it failed with the error above.

NAMESPACE=Qwen
MODEL_NAME=Qwen2.5-0.5B-Instruct #
MODEL=$NAMESPACE/$MODEL_NAME
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=2048,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:2048,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

TASK=med_qa
LOG_FILE=logs/evals/${TASK}_${MODEL_NAME}.log
CUDA_VISIBLE_DEVICES=0 nohup lighteval vllm $MODEL_ARGS "helm|$TASK|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR \
    > ${LOG_FILE} 2>&1 &

Expected behavior

The evaluation results should be saved successfully.

Version info

Ubuntu 24.04
lighteval 0.8.1
cuda 12.4
vllm 0.8.3
torch 2.6.0
