
Testing OCR on Llama Vision and Google Vision

shubhh139 edited this page Jun 14, 2025 · 1 revision

✅ English Language (General Images)

Google Vision:
Extracted text from images with high accuracy.
Handled printed and handwritten text well.
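A real Google Vision run needs the google-cloud-vision package and service-account credentials, so the live call is shown commented out below. This sketch exercises only the response parsing, using the REST-style JSON shape where `textAnnotations[0].description` holds the complete extracted string; the file name is a placeholder.

```python
# Illustrative sketch, not the project's actual test harness.

def extract_full_text(response: dict) -> str:
    """Pull the full detected text from a text_detection response.

    In the REST JSON shape, textAnnotations[0].description is the
    complete extracted string; later entries are per-word boxes.
    """
    annotations = response.get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""

# Live call (requires credentials):
# from google.cloud import vision
# client = vision.ImageAnnotatorClient()
# with open("sample.png", "rb") as f:
#     image = vision.Image(content=f.read())
# result = client.text_detection(image=image)

mock = {"textAnnotations": [{"description": "Hello OCR"},
                            {"description": "Hello"}]}
print(extract_full_text(mock))  # -> Hello OCR
```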

LLaMA 3.2 Vision models (11B/90B):
Performed well on English text.
Output quality was good for clear printed or typed English text.
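The LLaMA Vision tests can be reproduced against a local model served by Ollama. The model name and prompt below are assumptions, and the live call (which needs a running Ollama server with the model pulled) is commented out; only the request construction is exercised.

```python
# Sketch of building an OCR request for the ollama Python client.
# Model name and prompt wording are assumptions.

def build_ocr_request(image_path: str, model: str = "llama3.2-vision") -> dict:
    """Build a chat request asking a vision model to transcribe an image."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": "Transcribe all text visible in this image.",
            "images": [image_path],  # the client accepts local file paths
        }],
    }

# Live call (requires a running Ollama server):
# import ollama
# reply = ollama.chat(**build_ocr_request("sample.png"))
# print(reply["message"]["content"])

req = build_ocr_request("sample.png")
print(req["model"])  # -> llama3.2-vision
```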

🌐 Other Languages (Including Hindi, Marathi, etc.)

Google Vision:
Handled multilingual OCR efficiently.
Accurately extracted text from images in languages other than English, including Hindi, Marathi, and other Indic scripts.
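For Indic scripts, Google Vision accepts language hints that narrow the OCR search space. A minimal sketch of that payload, assuming the standard BCP-47 tags "hi" (Hindi) and "mr" (Marathi); the commented call shows where the hint plugs into `text_detection`.

```python
# Illustrative helper; the hint codes are standard BCP-47 language tags.

def indic_image_context(languages=("hi", "mr")) -> dict:
    """Build a Vision API ImageContext restricting OCR to given languages."""
    return {"language_hints": list(languages)}

# Used with a live client:
# result = client.text_detection(image=image,
#                                image_context=indic_image_context())

print(indic_image_context())  # -> {'language_hints': ['hi', 'mr']}
```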

LLaMA Vision:
Struggled with non-English scripts, especially Indic languages.
Failed to recognize Devanagari (the script used for Hindi and Marathi) reliably, owing to the model's limited multilingual support.
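One way to quantify this failure mode is to measure what fraction of an OCR output's characters fall in the Devanagari Unicode block (U+0900 to U+097F): a near-zero ratio on a Hindi or Marathi image means the model dropped or transliterated the script. This checker is an illustration, not the project's actual evaluation script.

```python
def devanagari_ratio(text: str) -> float:
    """Fraction of non-space characters that are Devanagari codepoints."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if "\u0900" <= c <= "\u097f")
    return hits / len(chars)

print(devanagari_ratio("नमस्ते दुनिया"))   # all Devanagari -> 1.0
print(devanagari_ratio("namaste duniya"))  # transliterated output -> 0.0
```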

🔤 Indic Language OCR via indic-trocr

indic-trocr was explored to handle OCR for Indic languages (e.g., Hindi, Marathi).

The initial setup ran into several issues during:

    Cloning the repository

    Installing dependencies

    Running the provided scripts

These issues were resolved, and the environment is now set up.
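A setup along these lines sidesteps the dependency clashes hit earlier by isolating the project in a virtual environment. The repository URL and requirements file are placeholders (the real ones belong to the indic-trocr project), so those lines are commented out.

```shell
set -e
python3 -m venv .venv-indic            # isolated env avoids dependency clashes
. .venv-indic/bin/activate
# git clone <indic-trocr-repo-url> indic-trocr
# pip install -r indic-trocr/requirements.txt
echo "indic-trocr environment ready"
```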

⚠️ Note: Testing with real-world datasets is still pending.