HIPE-OCRepair-2026 is an ICDAR 2026 Competition focused on LLM-assisted OCR post-correction of historical documents, with a particular emphasis on historical newspapers.
Driven by large language models (LLMs), OCR post-correction has (re)gained momentum, resulting in a growing number of models and experimental approaches. However, these efforts often rely on heterogeneous legacy datasets with significant limitations, which makes systematic evaluation and meaningful comparison across approaches difficult.
A central question motivating this competition is:
To what extent can modern large language models address the OCR debt accumulated in large-scale digitized historical collections?
The competition addresses this by providing HIPE-OCRepair-Bench, a unified multilingual benchmark for OCR post-correction, comprising curated datasets, an evaluation protocol, baseline systems, and an open leaderboard.
All information about the task, datasets, evaluation protocol, and submission instructions is available in the Participation Guidelines.
| Resource | Link |
|---|---|
| 🌐 Competition website | https://hipe-eval.github.io/HIPE-OCRepair-2026/ |
| 📋 Participation Guidelines | README-Participation-Guidelines.md |
| 📈 Scorer | https://github.com/hipe-eval/HIPE-OCRepair-scorer |
| 📊 Evaluation repository (after competition) | https://github.com/hipe-eval/HIPE-OCRepair-2026-eval |
| 🏆 Leaderboard (to come) | https://huggingface.co/spaces/hipe-ocrepair-2026-eval |
| 📝 Registration & contact | see competition website |
Data is available:
- 20.03.2026: release of the train and dev sets for the `dta19` dataset | release tag `v0.9.2`.
- 11.03.2026: hotfix for the `impresso-snippets` dataset | release tag `v0.9.1`.
- 02.03.2026: first data release with `overproof`, `icdar17`, `impresso-nzz` and `impresso-snippets` | release tag `v0.9`.
The HIPE-OCRepair-2026 organising team expresses its sincere appreciation to the ICDAR-2026 Competition Committee for the overall coordination and support.
HIPE-OCRepair-2026 is part of the HIPE-eval series of shared tasks on historical document and information processing and evaluation.
HIPE-eval editions are organised within the framework of the Impresso – Media Monitoring of the Past project, funded by the Swiss National Science Foundation under grant No. CRSII5_213585 and by the Luxembourg National Research Fund under grant No. 17498891.