Ocrs

ocrs is a Rust library and CLI tool for extracting text from images, also known as OCR (Optical Character Recognition).

The goal is to create a modern OCR engine that:

Works well on a wide variety of images (scanned documents, photos containing text, screenshots, etc.) with little or no preprocessing, in contrast to earlier engines like Tesseract. This is achieved by using machine learning more extensively in the pipeline.
Is easy to compile and run across a variety of platforms, including WebAssembly
Is trained on open and liberally licensed datasets
Has a codebase that is easy to understand and modify

Under the hood, the library uses neural network models trained in PyTorch, which are then exported to ONNX and executed using the RTen engine. See the models section for more details.

Status

ocrs is currently in an early preview. Expect more errors than commercial OCR engines.

CLI installation

To install the CLI tool, you will first need Rust and Cargo installed. Then run:

$ cargo install ocrs-cli

CLI usage

To extract text from an image, run:

$ ocrs image.png

When the tool is run for the first time, it will download the required models automatically and store them in ~/.cache/ocrs.

Additional examples

Extract text from an image and write to content.txt:

$ ocrs image.png -o content.txt

Extract text and layout information from the image in JSON format:

$ ocrs image.png --json -o content.json

Annotate an image to show the location of detected words and lines:

$ ocrs image.png --png -o annotated.png
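
To process more than one image, the commands above can be wrapped in a small shell loop. The following is a minimal sketch rather than a feature of ocrs itself: the function names output_name and ocr_dir are hypothetical, and it assumes that ocrs is on your PATH and that the images are PNG files. It uses only the -o option shown above.

```shell
#!/bin/sh
# Hypothetical helper: derive the text output path for an image path
# by replacing the final extension with .txt
# (for example scans/page1.png becomes scans/page1.txt).
output_name() {
    printf '%s.txt\n' "${1%.*}"
}

# Hypothetical helper: run ocrs on every PNG file in the given directory
# and write the extracted text next to each image.
ocr_dir() {
    for img in "$1"/*.png; do
        # When the glob matches nothing, the pattern is left as-is; skip it.
        [ -e "$img" ] || continue
        ocrs "$img" -o "$(output_name "$img")" ||
            printf 'ocrs failed on %s\n' "$img" >&2
    done
}
```

After loading these functions into your shell, running ocr_dir scans would write scans/page1.txt for scans/page1.png, and so on for each PNG in that directory.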

Models and datasets

ocrs uses neural network models written in PyTorch. See the ocrs-models repository for more details about the models and datasets, as well as tools for training custom models. These models are also available in ONNX format for use with other machine learning runtimes.

Development

To build and run the ocrs library and CLI tool locally, you will need a recent stable version of Rust installed. Then run:

git clone https://github.com/robertknight/ocrs.git
cd ocrs
cargo run -p ocrs-cli -r -- image.png

Library installation

See the ocrs crate README for details on how to use ocrs as a Rust library.
