Benchmark data to evaluate numerical reasoning and information fusion of LLMs.
SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs
Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Hassan Foroosh, Dong Yu, Fei Liu
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL'24), Bangkok, Thailand.
Arxiv Paper
- Select the task from data/
- Import GeneralTaskLoader from sportsmetrics.py
from sportsmetrics import GeneralTaskLoader
batch_size = False # by default
if not batch_size:
# load the task instance one by one
for i in task.iter_instance():
yiled i['system_message'], i['user_message']
else:
# load the task instance by batch
for i in task.iter_batch(batch_size):
yiled i['system_message'], i['user_message']
Instance from TaskLoader
{
"id": str,
"system_message": str,
"user_message": str,
"ground_truth": dict()
}
The LLM is mandatorily required to generate responses in JSON format.
- reasoning-team_points_tracking: Tracking team points in one match.
- reasoning-key_stats_tracking: Tracking the key statistics for sports analytics.
- conflict-one_point_rule: All scoring actions in the competition are set to be worth only one point.
- conflict-swap_{num}_players: Swap {num} of spalyer between two teams.
- robustness-duplicate_{prob}: Replicate the non-scoring move with a probability of {prob}.
- robustness-remove_{prob}: Remove the non-scoring move with a probability of {prob}.
- robustness-shuffled_pbp: Shuffle the order of all moves in play-by-play descriptions while maintain the original order of timestamps.
- robustness-{num}_fiction_names: Randomly select {num} of players from both teams and replace them with names from fiction movies.
- Set <API-Key> in ./openai.yaml
api-key: <Your API>
parameters:
temperature: 0
max_tokens: 4096
top_p: 1
frequency_penalty: 0
presence_penalty: 0
- Customize the script evaluation_sample.py accordingly to generate responses.
Bibtex
@misc{hu2024sportsmetricsblendingtextnumerical,
title={SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs},
author={Yebowen Hu and Kaiqiang Song and Sangwoo Cho and Xiaoyang Wang and Hassan Foroosh and Dong Yu and Fei Liu},
year={2024},
eprint={2402.10979},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2402.10979},
}