Testing OCR on Llama Vision and Google Vision
shubhh139 edited this page Jun 14, 2025
Google Vision:
Extracted text from images with high accuracy.
Handled both printed and handwritten text well.
Llama 3.2 Vision models (11B and 90B):
Performed well on English text.
Produced good output for clear, printed or typed English sentences.
Google Vision:
Handled multilingual OCR tasks well.
Accurately extracted text in languages other than English, including Hindi, Marathi, and other languages written in Indic scripts.
Llama Vision:
Struggled with non-English scripts, especially Indic languages.
Failed to transcribe Devanagari characters (used for Hindi and Marathi) correctly, due to the model's limited multilingual support.
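A minimal sketch of how this multilingual comparison could be reproduced. The first function sends a local image to Google Cloud Vision document text detection with optional Hindi/Marathi language hints (it assumes the `google-cloud-vision` package is installed and credentials are configured). The second is a simple check that works on any engine's output, including Llama Vision's: it measures how much of the returned text actually falls in the Devanagari Unicode block (U+0900 to U+097F). Both function names and the default hints are illustrative assumptions, not part of either API.

```python
import unicodedata

# Devanagari Unicode block (used for Hindi, Marathi, ...): U+0900 to U+097F.
DEVANAGARI = range(0x0900, 0x0980)


def google_vision_ocr(image_path: str, language_hints=("hi", "mr")) -> str:
    """Run Google Cloud Vision document text detection on a local image.

    Hypothetical helper. Requires `google-cloud-vision` and configured
    credentials; the import is deferred so the script check below can be
    used without them.
    """
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    # Hints are optional; Vision can auto-detect the language when none are given.
    context = vision.ImageContext(language_hints=list(language_hints))
    response = client.document_text_detection(image=image, image_context=context)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def devanagari_ratio(text: str) -> float:
    """Fraction of letters and combining marks in `text` that are Devanagari.

    Digits, punctuation and whitespace are skipped so that layout noise
    does not skew the score. Returns 0.0 if no letters are present.
    """
    considered = devanagari = 0
    for ch in text:
        # Keep letters (L*) and marks (M*), e.g. vowel signs and the virama.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        considered += 1
        if ord(ch) in DEVANAGARI:
            devanagari += 1
    return devanagari / considered if considered else 0.0
```

For a Hindi page, `devanagari_ratio(google_vision_ocr("page.png"))` (with `page.png` a placeholder path) should be close to 1.0; a ratio near 0.0 suggests the engine romanized, translated, or dropped the script. This only detects the script, not whether the transcription is correct.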
🔤 Indic Language OCR via indic-trocr
indic-trocr was explored to handle OCR for Indic languages (e.g., Hindi, Marathi).
The initial setup ran into issues at several stages:
Cloning the repository
Installing dependencies
Running the provided scripts
These issues were resolved, and the environment is now set up.
⚠️ Note: Testing with real-world datasets is still pending.
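Once real-world data is available, one common way to compare indic-trocr with Google Vision and Llama Vision is character error rate (CER): the edit distance between an engine's output and a ground-truth transcription, divided by the length of the ground truth. The sketch below uses only the standard library and is not part of indic-trocr; the function names are hypothetical. It applies NFC normalization first, on the assumption that references and outputs may encode the same visible Devanagari text with different code-point sequences (for example, a nukta letter as one code point or as base letter plus U+093C).

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length, after NFC normalization."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

For example, an output that drops the final vowel sign of "नमस्ते" gives a CER of 1/6. Note that CER counts code points, so a single Devanagari conjunct counts as several characters, and CER can exceed 1.0 when the output is much longer than the reference.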