hipe-eval/HIPE-OCRepair-2026-data

HIPE-OCRepair-2026 Data Repository

HIPE-OCRepair-2026 is an ICDAR 2026 Competition focused on LLM-assisted OCR post-correction of historical documents, with a particular emphasis on historical newspapers.

With renewed interest driven by large language models (LLMs), OCR post-correction has (re)gained momentum, resulting in a growing number of models and experimental approaches. However, these efforts often rely on heterogeneous legacy datasets that come with important limitations, making systematic evaluation and meaningful comparison across approaches difficult.

A central question motivating this competition is:

To what extent can modern large language models address the OCR debt accumulated in large-scale digitized historical collections?

The competition addresses this by providing HIPE-OCRepair-Bench, a unified multilingual benchmark for OCR post-correction, comprising curated datasets, an evaluation protocol, baseline systems, and an open leaderboard.
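The official evaluation protocol and its metrics are specified in the Participation Guidelines and implemented by the HIPE-OCRepair-scorer. The sketch below is not that scorer. It is only a hedged illustration of how OCR post-correction is often measured: a character error rate (CER), computed as the Levenshtein edit distance between a system output and the ground truth, divided by the ground-truth length. It assumes plain-string inputs and no text normalisation, and the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Hypothetical CER helper (not the official HIPE-OCRepair scorer):
    edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


if __name__ == "__main__":
    ground_truth = "The quick brown fox"
    raw_ocr = "Tbe qnick brown fox"    # two character substitutions
    corrected = "The quick brown fox"  # a perfect post-correction
    print(f"raw OCR CER:   {cer(raw_ocr, ground_truth):.3f}")
    print(f"corrected CER: {cer(corrected, ground_truth):.3f}")
```

Under this illustrative metric, a post-correction system helps when the CER of its output is lower than the CER of the raw OCR it started from; here the two substitutions give 2/19 for the raw OCR and 0 after correction.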

📋 Participation Guidelines

All information about the task, datasets, evaluation protocol, and submission instructions is available in the Participation Guidelines.

🔗 Important Links

🌐 Competition website: https://hipe-eval.github.io/HIPE-OCRepair-2026/
📋 Participation Guidelines: README-Participation-Guidelines.md
📈 Scorer: https://github.com/hipe-eval/HIPE-OCRepair-scorer
📊 Evaluation repository (available after the competition): https://github.com/hipe-eval/HIPE-OCRepair-2026-eval
🏆 Leaderboard (coming soon): https://huggingface.co/spaces/hipe-ocrepair-2026-eval
📝 Registration & contact: see the competition website

📦 Data

Data is available:

  • in the data/ folder of this repository and in the git releases
  • later: also on Zenodo

Release History

  • 02.03.2026: release v0.9

Acknowledgements

The HIPE-OCRepair-2026 organising team expresses its sincere appreciation to the ICDAR 2026 Competition Committee for its overall coordination and support.

HIPE-OCRepair-2026 is part of the HIPE-eval series of shared tasks on historical document and information processing and evaluation.

HIPE-eval editions are organised within the framework of the Impresso – Media Monitoring of the Past project, funded by the Swiss National Science Foundation under grant No. CRSII5_213585 and by the Luxembourg National Research Fund under grant No. 17498891.

About

Data for the HIPE-OCRepair-2026 shared task (ICDAR 2026 Competition).
