Advanced client-side invoice OCR, parsing, export, and history system, powered entirely by browser technologies with no backend required.
Live Demo: https://atorhub.github.io/anj-dual-ocr-parser/
- Quick OCR – fast text extraction for simple bills
- Enhanced OCR – high-accuracy deep recognition
- PDF Support via pdf.js
- Automatic merge of both OCR passes for improved accuracy
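
How the two passes are merged is not documented here, so the following is only a minimal sketch of one plausible strategy: compare the passes line by line and keep whichever line reports the higher confidence. The `OcrLine` shape and the `mergePasses` name are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical shape for one recognized line (assumption for this sketch).
interface OcrLine {
  text: string;
  confidence: number; // 0-100, higher is better
}

// Merge two passes position by position. Where both passes produced a line
// at the same index, keep the one with the higher confidence; where only one
// pass has a line, keep that line.
function mergePasses(quick: OcrLine[], enhanced: OcrLine[]): OcrLine[] {
  const merged: OcrLine[] = [];
  const length = Math.max(quick.length, enhanced.length);
  for (let i = 0; i < length; i++) {
    const q = quick[i];
    const e = enhanced[i];
    if (q !== undefined && e !== undefined) {
      // Ties go to the enhanced pass.
      merged.push(q.confidence > e.confidence ? q : e);
    } else if (q !== undefined) {
      merged.push(q);
    } else if (e !== undefined) {
      merged.push(e);
    }
  }
  return merged;
}
```

Index-based alignment is the simplest choice and only works well when both passes split the page into the same lines; a real merge would likely need fuzzier alignment.
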
Automatically extracts:
- Merchant / Shop name
- Invoice date
- Total amount + currency detection
- Line items: name, qty, price, total
- Category detection (Food, Shopping, Finance, etc.)
- Auto-corrections (date fixes, currency normalization)
- Issue detection & confidence scoring
- Extracted fields
- Items table
- Raw OCR text
- Cleaned text
- JSON structured output
- Issues & corrections viewer
Export parsed invoices as:
- JSON
- TXT
- CSV
- PDF (with preview capture)
- ZIP bundle (JSON + TXT + CSV + Preview PNG)
- Save every parsed invoice locally
- Reload instantly from history
- Clear all saved invoices
- 100% offline and persistent
Includes 6 animated / pastel / galaxy themes:
- Rose Nebula
- Lilac Glow
- Cotton Candy Sky
- Galaxy Glitter
- Dreamy Blush
- Fairy Dust
Themes are selectable and saved automatically.
- No servers
- No API keys
- No payments
- No privacy issues
Runs fully in-browser using:
- pdf.js
- Tesseract.js
- IndexedDB
- html2canvas
- jsPDF
- JSZip
- If PDF → converted to text with pdf.js
- If Image → processed with Tesseract.js
- Quick + Enhanced OCR → combined
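
The routing between the PDF path and the image path could be sketched as below. This is an illustration under assumptions: `classifyInput`, the `InputKind` type, and the list of image extensions are hypothetical and not taken from the project. The `name` and `type` fields match those of a browser `File` object.

```typescript
type InputKind = "pdf" | "image" | "unsupported";

// Decide which extraction path a file should take. The MIME type is checked
// first, with the file extension as a fallback for when the type is empty.
function classifyInput(file: { name: string; type: string }): InputKind {
  const type = file.type.toLowerCase();
  const name = file.name.toLowerCase();
  if (type === "application/pdf" || /\.pdf$/.test(name)) {
    return "pdf"; // would go to pdf.js
  }
  if (type.indexOf("image/") === 0 || /\.(png|jpe?g|webp|bmp)$/.test(name)) {
    return "image"; // would go to Tesseract.js
  }
  return "unsupported";
}
```
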
Custom rule engine detects:
- Date formats (DD/MM/YYYY, YYYY-MM-DD, etc.)
- Totals (₹, Rs, $, €, £ detection)
- Item rows with qty/price
- Also corrects decimals, removes noise, and fixes broken lines
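
As a rough illustration of the date and total rules, here is a minimal regex-based sketch. The function names, the `DetectedTotal` shape, and the exact patterns are assumptions; the project's real rules may differ and probably cover more formats. Item rows are left out.

```typescript
// Hypothetical result shape for this sketch.
interface DetectedTotal {
  currency: string; // normalized code such as "INR" or "USD"
  amount: number;
}

// Map of recognized currency markers to normalized codes (assumed mapping).
const CURRENCY_CODES: { [symbol: string]: string } = {
  "₹": "INR",
  "rs": "INR",
  "$": "USD",
  "€": "EUR",
  "£": "GBP",
};

function pad2(s: string): string {
  return s.length < 2 ? "0" + s : s;
}

// Look for YYYY-MM-DD first, then DD/MM/YYYY; return the date as YYYY-MM-DD.
function detectDate(text: string): string | null {
  const iso = /\b(\d{4})-(\d{2})-(\d{2})\b/.exec(text);
  if (iso) return iso[1] + "-" + iso[2] + "-" + iso[3];
  const dmy = /\b(\d{1,2})\/(\d{1,2})\/(\d{4})\b/.exec(text);
  if (dmy) return dmy[3] + "-" + pad2(dmy[2]) + "-" + pad2(dmy[1]);
  return null;
}

// Return the first line that has the word "total", a currency marker, and an
// amount. The word boundary before "total" skips "Subtotal".
function detectTotal(text: string): DetectedTotal | null {
  const lines = text.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const m = /\btotal\b.*?(₹|\brs\.?|\$|€|£)\s*([\d,]+(?:\.\d{1,2})?)/i.exec(lines[i]);
    if (m) {
      const symbol = m[1].toLowerCase().replace(".", "");
      const amount = parseFloat(m[2].replace(/,/g, ""));
      return { currency: CURRENCY_CODES[symbol], amount: amount };
    }
  }
  return null;
}
```
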
Checks for:
- Missing totals
- Invalid dates
- Mangled characters (e.g., “₹” → “Rs”)
- Bad line items
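
The checks above could be expressed as a single function that returns a list of issue codes. This is a sketch under assumptions: the `InvoiceDraft` shape, the issue code strings, and the specific tests are all hypothetical. Here mangled text is detected via the Unicode replacement character and a bad line item is one where qty × price does not match its total.

```typescript
// Hypothetical draft shape for this sketch.
interface InvoiceDraft {
  date: string | null; // expected as YYYY-MM-DD
  total: number | null;
  items: { name: string; qty: number; price: number; total: number }[];
  rawText: string;
}

// True only for real calendar dates written as YYYY-MM-DD.
function isValidIsoDate(s: string): boolean {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s);
  if (!m) return false;
  const y = Number(m[1]);
  const mo = Number(m[2]);
  const d = Number(m[3]);
  // Date.UTC rolls over impossible dates (e.g. Feb 30), so compare the parts back.
  const dt = new Date(Date.UTC(y, mo - 1, d));
  return dt.getUTCFullYear() === y && dt.getUTCMonth() === mo - 1 && dt.getUTCDate() === d;
}

function findIssues(inv: InvoiceDraft): string[] {
  const issues: string[] = [];
  if (inv.total === null || isNaN(inv.total)) issues.push("missing-total");
  if (inv.date === null || !isValidIsoDate(inv.date)) issues.push("invalid-date");
  // U+FFFD is what decoders emit for bytes they could not interpret.
  if (inv.rawText.indexOf("\uFFFD") !== -1) issues.push("mangled-characters");
  for (let i = 0; i < inv.items.length; i++) {
    const it = inv.items[i];
    // Allow a small tolerance for rounding.
    if (Math.abs(it.qty * it.price - it.total) > 0.01) issues.push("bad-line-item:" + i);
  }
  return issues;
}
```
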
All preview sections update instantly.
Data converts into:
- JSON (structured)
- TXT (raw)
- CSV (Excel friendly)
- PDF (visual snapshot)
- ZIP (bundle)
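
As an example of one of these conversions, the sketch below turns line items into CSV text. The `LineItem` shape, the column order, and the function names are assumptions for illustration. Quoting follows the usual CSV convention of wrapping fields that contain commas, quotes, or line breaks and doubling embedded quotes.

```typescript
// Hypothetical line item shape for this sketch.
interface LineItem {
  name: string;
  qty: number;
  price: number;
  total: number;
}

// Quote a field only when it contains a comma, a double quote, or a line break.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Build CSV text with a header row followed by one row per item.
function itemsToCsv(items: LineItem[]): string {
  const rows: string[] = ["name,qty,price,total"];
  for (let i = 0; i < items.length; i++) {
    const it = items[i];
    rows.push([it.name, it.qty, it.price, it.total].map(csvField).join(","));
  }
  return rows.join("\r\n");
}
```

In the browser, the resulting string can be wrapped in `new Blob([csv], { type: "text/csv" })` and offered as a download.
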
Saved via IndexedDB with:
- Merchant
- Date
- Total
- Items
- Raw text
- Corrections
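
A minimal sketch of how such a record might be written to IndexedDB follows. The database name, store name, `id` and `savedAt` fields, and all function names are assumptions and not taken from the project; only the six listed fields come from the description above. The IndexedDB calls themselves (`indexedDB.open`, `createObjectStore`, `transaction`, `put`) are standard browser APIs.

```typescript
// Hypothetical record shape: the six listed fields plus an assumed id and timestamp.
interface HistoryRecord {
  id: string;
  savedAt: string; // ISO timestamp
  merchant: string;
  date: string;
  total: number;
  items: { name: string; qty: number; price: number; total: number }[];
  rawText: string;
  corrections: string[];
}

const DB_NAME = "invoice-history"; // assumed name
const STORE = "invoices"; // assumed name

// Build a record, deriving the id from the merchant and the save time.
function makeRecord(fields: Omit<HistoryRecord, "id" | "savedAt">, now: Date): HistoryRecord {
  const savedAt = now.toISOString();
  return { id: fields.merchant + "|" + savedAt, savedAt: savedAt, ...fields };
}

// Open the database, creating the object store on first use.
function openHistoryDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open(DB_NAME, 1);
    request.onupgradeneeded = () => {
      request.result.createObjectStore(STORE, { keyPath: "id" });
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Insert or overwrite one record; resolves when the transaction completes.
function saveInvoice(db: IDBDatabase, record: HistoryRecord): Promise<void> {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(STORE, "readwrite");
    tx.objectStore(STORE).put(record);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```

Usage would look like `openHistoryDb().then((db) => saveInvoice(db, makeRecord(fields, new Date())))`, which only works in a browser context where `indexedDB` is available.
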