Releases: naptha/tesseract.js
v6.0.0
What's Changed
- Fixed memory leaks (#977)
  - This version fixed a long-standing issue where memory would rise over time, eventually leading to a crash.
- Reduced runtime and memory usage for most users by updating default formats (#916).
- Fixed compatibility with Electron main process (#925)
- Fixed bug where user-provided parameters were overwritten by defaults (#975).
Breaking Changes
- All output formats other than `text` are now disabled by default.
  - To re-enable the `hocr` output (for example), set the following: `worker.recognize(image, {}, { hocr: true })`
  - See the project documentation for a list of possible output formats.
- The JavaScript object output format (`blocks`) was tweaked.
  - Only the array of blocks (`blocks`) is returned.
    - Previous versions would automatically generate lists of every unit of text (`words`, `symbols`, etc.).
    - If needed, these should now be generated by the user.
  - Only text-based blocks are reported.
    - Previous versions reported non-text blocks when detected by Tesseract (e.g. line segments).
  - The shape of some objects was changed.
    - See the type declarations for reference on properties.
    - The main properties (`text` and `bbox`) are unchanged.
- Various functions and options previously marked as deprecated have been removed.
  - This includes `worker.initialize` and `worker.loadLanguage`, along with several deprecated options from v2.
See #993 for additional discussion about this release.
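Because the flat `words` and `symbols` lists are no longer generated, one way to rebuild them is to walk the `blocks` tree. The sketch below assumes the nesting is blocks, then `paragraphs`, `lines`, `words`, and `symbols` (check the type declarations for the exact shape in your version). `flattenBlocks` is a hypothetical helper name, not part of the library.

```javascript
// Hypothetical helper: rebuilds the flat lists that earlier versions generated automatically.
// Assumes each block has `paragraphs`, each paragraph `lines`, each line `words`,
// and each word `symbols`; a missing level is treated as empty.
function flattenBlocks(blocks) {
  const out = { paragraphs: [], lines: [], words: [], symbols: [] };
  for (const block of blocks || []) {
    for (const paragraph of block.paragraphs || []) {
      out.paragraphs.push(paragraph);
      for (const line of paragraph.lines || []) {
        out.lines.push(line);
        for (const word of line.words || []) {
          out.words.push(word);
          out.symbols.push(...(word.symbols || []));
        }
      }
    }
  }
  return out;
}

// Usage sketch (assumes the `blocks` output was enabled in the recognize call):
// const { data } = await worker.recognize(image, {}, { blocks: true });
// const { words } = flattenBlocks(data.blocks);
```

Walking the tree once and collecting every level keeps the helper cheap even for large pages, and callers can ignore the lists they do not need.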
New Contributors
- @IgorAufricht made their first contribution in #971
Full Changelog: v5.1.1...v6.0.0
v5.1.1
v5.1.0
What's Changed
- Added line size metrics to `blocks` output (#906)
  - `line` objects now include a property named `rowAttributes`, which is an object containing `ascenders`, `descenders`, and `row_height` metrics
  - These metrics allow for manual font size calculations that are more accurate than using the `font_size` property.
- Updates to documentation, types, and dependencies
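As a rough illustration of reading the new line metrics, the sketch below gathers `rowAttributes` for every line in a `blocks` result. It assumes lines are reached through `paragraphs` and `lines` arrays on each block. `collectRowMetrics` is a hypothetical helper name. How the three metrics are combined into a font size is left to the caller, since the release notes do not give a formula.

```javascript
// Hypothetical helper: collects the rowAttributes metrics for every line in a `blocks` result.
// Assumes blocks -> paragraphs -> lines nesting; lines without rowAttributes are skipped.
function collectRowMetrics(blocks) {
  const rows = [];
  for (const block of blocks || []) {
    for (const paragraph of block.paragraphs || []) {
      for (const line of paragraph.lines || []) {
        const attrs = line.rowAttributes;
        if (!attrs) continue; // e.g. output produced by a version without these metrics
        rows.push({
          text: line.text,
          ascenders: attrs.ascenders,
          descenders: attrs.descenders,
          rowHeight: attrs.row_height,
        });
      }
    }
  }
  return rows;
}
```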
New Contributors
- @Kishlay-notabot made their first contribution in #896
- @k-nero made their first contribution in #922
Full Changelog: v5.0.5...v5.1.0
v5.0.5
What's Changed
- Fixed bug triggered by running `worker.recognize` while a previous call to `worker.recognize` is still running (#875)
  - Sending multiple jobs to the same worker at the same time is still not recommended.
  - Instead, schedulers should be used to coordinate running jobs in parallel (see the project's scheduler example)
- Fixed bug with `rotateAuto` option unnecessarily inflating runtime (#892)
- Minor fixes to documentation and types
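The scheduler approach recommended above might look roughly like the following. This is a minimal sketch, not the project's own example: `recognizeAll` and `workerCount` are hypothetical names, and the Tesseract.js module is passed in as `tesseract` so the function does not depend on how it is imported.

```javascript
// Sketch: run several recognition jobs in parallel through a scheduler
// instead of sending them all to a single worker.
// `tesseract` is the imported Tesseract.js module (v5-style createWorker assumed).
async function recognizeAll(tesseract, images, workerCount = 2, lang = "eng") {
  const scheduler = tesseract.createScheduler();
  try {
    for (let i = 0; i < workerCount; i++) {
      scheduler.addWorker(await tesseract.createWorker(lang));
    }
    // The scheduler queues each job and hands it to the next idle worker.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob("recognize", image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Terminating the scheduler also terminates the workers added to it.
    await scheduler.terminate();
  }
}

// Usage sketch:
// const Tesseract = require("tesseract.js");
// const texts = await recognizeAll(Tesseract, ["page1.png", "page2.png"], 2);
```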
Full Changelog: v5.0.4...v5.0.5
v5.0.4
What's Changed
- Fixed support for setting "init only" parameters using `config` option of `createWorker` (#862)
  - For example, `load_number_dawg` is an "init only" parameter that cannot be set using either `worker.setParameters` or the `options` argument of `worker.recognize`.
  - However, `load_number_dawg` can be set by the following `createWorker` statement: `createWorker('eng', "0", {}, {load_number_dawg: "0"});`
- Improvements to documentation
Full Changelog: v5.0.3...v5.0.4
v5.0.3
What's Changed
- Minor changes to types, documentation, and build
New Contributors
- @dora-micha made their first contribution in #843
Full Changelog: v5.0.2...v5.0.3
v5.0.2
What's Changed
- Fixed bugs with wrong lang data being loaded per #834 and #835 by @Balearica in #836
Version `5.0.1` is nearly identical to `5.0.2` and was the latest version for under a day, so does not have its own release notes.
Full Changelog: v5.0.0...v5.0.2
v5.0.0
What's Changed
Major New Features
- Significantly smaller file sizes
  - 54% smaller file sizes for English, 73% smaller for Chinese (see #806 for details)
  - This results in a ~50% decrease in runtime for first-time users (who do not yet have the data downloaded/cached)
- Significantly lower memory usage
  - Worker memory utilization in the web benchmark is reduced from 311 MB to 164 MB (47% reduction)
  - The lower memory footprint makes it feasible to use more workers, significantly improving performance for projects that utilize schedulers for parallel processing
- Compatible with iOS 17 (using default settings)
  - iOS 17 broke compatibility with Tesseract.js v4; upgrading to v5 should resolve the issue
  - See discussion section below for details
Breaking Changes Impacting Many Users
- `createWorker` arguments changed
  - Setting non-default language and OEM now happens in `createWorker`
    - E.g. `createWorker("chi_sim", 1)`
- `worker.initialize` and `worker.loadLanguage` functions now do nothing and can be deleted from code
  - Loading the language and initialization now occurs in `createWorker`
  - Workers can be re-initialized with different settings using `worker.reinitialize`
In other words, code should be modified from this:
const worker = await Tesseract.createWorker();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const ret = await worker.recognize(file);
To this:
const worker = await Tesseract.createWorker("eng");
const ret = await worker.recognize(file);
Breaking Changes Impacting Fewer Users
- Users who manually set `corePath` will need to update the contents of their `corePath` directory
  - `corePath` should point to a directory that contains all 4 of the files below from Tesseract.js-core v5:
    - `tesseract-core.wasm.js`
    - `tesseract-core-simd.wasm.js`
    - `tesseract-core-lstm.wasm.js`
    - `tesseract-core-simd-lstm.wasm.js`
  - Tesseract.js will automatically select the correct version to use
- `worker.detect` function disabled by default
  - Orientation + script detection is a function of the Legacy model only, which is no longer included by default
  - To enable, set arguments `legacyCore: true` and `legacyLang: true` in `createWorker` options
    - E.g. `Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true});`
- Language of progress logs standardized
  - This should only impact users who parse status logs (e.g. to update a loading bar)
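Since the wording of status messages changed, code that drives a loading bar may be more robust if it relies on the numeric progress value and only displays the status text. The sketch below assumes each log message passed to the `logger` option is an object with a `status` string and a `progress` number between 0 and 1. `makeProgressLogger` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a callback suitable for createWorker's `logger` option.
// Assumes each log message looks like { status: string, progress: number in [0, 1] }.
function makeProgressLogger(onUpdate) {
  return (message) => {
    if (!message || typeof message.progress !== "number") return;
    // Clamp and convert to a whole percentage. The status text is passed through
    // for display only and is never compared against a fixed string.
    const fraction = Math.min(1, Math.max(0, message.progress));
    onUpdate(Math.round(fraction * 100), String(message.status || ""));
  };
}

// Usage sketch:
// const worker = await Tesseract.createWorker("eng", 1, {
//   logger: makeProgressLogger((pct, status) => console.log(`${status}: ${pct}%`)),
// });
```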
Non-Breaking Changes
- Language data loaded from `jsdelivr` by default (rather than GitHub Pages)
  - This should result in improved performance and uptime
- Separate "development" build (that produced `tesseract.dev.js` and `worker.dev.js`) removed
- Documentation and examples were modified to prevent new users from using `Tesseract.recognize` and `Tesseract.detect`
  - Users who already use these functions are encouraged to modify their code to use `worker.recognize` and `worker.detect` instead
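For code that currently makes a one-off `Tesseract.recognize` call, a worker-based replacement might look like the sketch below. `recognizeOnce` is a hypothetical name, and the Tesseract.js module is passed in as `tesseract` so the function does not depend on how it is imported.

```javascript
// Sketch: replace a one-off Tesseract.recognize(image) call with an explicit worker.
// `tesseract` is the imported Tesseract.js module (v5-style createWorker assumed).
async function recognizeOnce(tesseract, image, lang = "eng") {
  const worker = await tesseract.createWorker(lang);
  try {
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    await worker.terminate(); // release the worker even if recognition fails
  }
}
```

When several images are processed, it is usually better to create the worker once and reuse it for every `worker.recognize` call, terminating it only at the end.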
Considering upgrading from v2 to v5? See #771 for a full guide for updating.
Full Changelog: v4.1.3...v5.0.0
v4.1.4
What's Changed
- Restored compatibility with certain versions of Node.js v14
Full Changelog: v4.1.3...v4.1.4
v4.1.3
What's Changed
- Detect browsers in a Deno-compatible way by @yudai-nkt in #821
- Minor changes (#821, ff173ce)
New Contributors
- @yudai-nkt made their first contribution in #821
Full Changelog: v4.1.2...v4.1.3