Releases: Crivella/ocr_translate
Release v0.7.4
Changes
- Added ocr_translate_libretranslate plugin to allow using LibreTranslate (https://libretranslate.com/) or similar tool with the same API as a translation backend.
  Running LibreTranslate is separate from this server and the plugin should be pointed to the correct endpoint with the OCT_LIBRETRANSLATE_ENDPOINT environment variable. (See the PLUGIN PAGE for more info)
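For orientation, here is a minimal sketch of how a client could build a request against a LibreTranslate-compatible server whose address comes from OCT_LIBRETRANSLATE_ENDPOINT. The /translate route and the q/source/target/format fields follow the public LibreTranslate API. The helper name build_translate_request, the localhost fallback, and the assumption that the variable holds the base URL are illustrative only and are not taken from the plugin.

```python
import json
import os
import urllib.request


def build_translate_request(text: str, source: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a LibreTranslate-style /translate route."""
    # Assumption: the variable holds the base URL; fall back to a local instance if unset.
    endpoint = os.environ.get("OCT_LIBRETRANSLATE_ENDPOINT", "http://localhost:5000")
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it to a running instance, which is expected to answer with a JSON body containing a translatedText field.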
Full Changelog: v0.7.3...v0.7.4
Release v0.7.3
- Bugfix: missing dependency in plugins_data.json for the ocr_translate-paddle plugin and wrong ocr_translate-google version
Full Changelog: v0.7.2...v0.7.3
Release v0.7.2
Breaking changes
- staka/fugumt-... models have been removed as they are not working with the newer dependencies.
  - Issue related to huggingface/transformers#24657 (comment)
  - Unfortunately the version of transformers/tokenizers can't be lowered below 4.48.0/0.20.2 as tokenizers does not support python 3.13 before https://github.com/huggingface/tokenizers/releases/tag/v0.20.2
- If upgrading python version (e.g. running with docker that now uses 3.13 instead of 3.11) it is suggested to delete your plugin installations first (<OCT_BASE_DIR>/plugins/). Basically every package will be updated anyway.
Major Changes
- dependencies updates:
  - python: support for >=3.10, <=3.11 extended to >=3.10 (will test 3.14 once it is out with all its prebuilt packages in PyPI)
  - CUDA: from 11.8 updated to 12.8 (allows using sm_120 GPUs like the RTX 5000 series)
  - torch: from 2.2.1 updated to 2.8.0
  - easyocr: from 1.7.1 updated to 1.7.2
  - paddleocr: from 2.8.1 updated to 3.2.0
    Different models and a much wider set of languages supported
- Improved logging of the server using rich
- Environment variables:
  - REMOVED:
    - AUTO_CREATE_LANGUAGES - Languages are now created once at first initialization if missing
      So far the languages have never changed across versions, so it was never necessary to recreate/update them.
    - AUTOCREATE_MODELS - Models are now synchronized with the available entrypoints at every server start
      Removed models will be deactivated, and present models will be updated/created.
  - ADDED:
    - OCT_LOGFILE - [true/false/path]. If true, a logfile named $OCT_BASE_DIR/ocr_translate.log will be created. If a path is provided, that will be used instead.
- Logic of the run_server.py moved inside the package to improve testing and modularity (the script will appear much smaller in the release)
- QoL improvements for manual plugins installations
  - If you want to try your own plugins or you want to manually install the supported ones (e.g. with modifications) you can now do so by installing them in the server environment.
  - The server will automatically pick up the plugin module name and add it to the DJANGO apps, and synchronize the models in the database at launch.
  - NOTE: packages coming from the managed plugins will take precedence over manually installed ones. This could cause package conflicts. Mix stuff at your own risk.
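The OCT_LOGFILE behavior described in the environment variable notes above can be sketched as a small resolver. This is an illustration under stated assumptions and not the server's code: the function name resolve_logfile is hypothetical, and treating an unset or empty value as "disabled" is an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_logfile(value: Optional[str], base_dir: str) -> Optional[Path]:
    """Map an OCT_LOGFILE-style value to a log file path, or None when disabled."""
    # Assumption: unset, empty, or 'false' all mean "no logfile".
    if value is None or value.strip().lower() in ("", "false"):
        return None
    # 'true' selects the default file name inside the base directory.
    if value.strip().lower() == "true":
        return Path(base_dir) / "ocr_translate.log"
    # Anything else is taken as an explicit path.
    return Path(value)
```

For example, calling it with the value of OCT_LOGFILE and the value of OCT_BASE_DIR yields either None or the file the log handler should write to.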
Fixes
- Fix #55
New Contributors
- @Prowler1000 made their first contribution in #56
Full Changelog: v0.6.3...v0.7.2
Release v0.6.3
Changes
- Bumped version of ocr_translate_ollama plugin to 0.2.0 in order to allow using newer versions of ollama (> 0.5.5)
Release v0.6.2
Changes
- Added capability to use django-cors-headers to set CORS headers in server responses
- Added capability to set CSRF_TRUSTED_ORIGINS to make admin interface properly work from a docker container
- Documented new environment variables
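As a rough illustration of the two settings mentioned above, the sketch below fills CORS_ALLOWED_ORIGINS (a django-cors-headers setting) and CSRF_TRUSTED_ORIGINS (a Django setting) from environment variables. The variable names EXAMPLE_CORS_ALLOWED_ORIGINS and EXAMPLE_CSRF_TRUSTED_ORIGINS and the helper split_origins are hypothetical; the names the server actually reads are documented in the project and are not reproduced here.

```python
import os


def split_origins(raw: str) -> list[str]:
    """Split a comma- or semicolon-separated origin list, dropping blanks."""
    return [item.strip() for item in raw.replace(";", ",").split(",") if item.strip()]


# Hypothetical variable names, used here only for illustration.
CORS_ALLOWED_ORIGINS = split_origins(os.environ.get("EXAMPLE_CORS_ALLOWED_ORIGINS", ""))
CSRF_TRUSTED_ORIGINS = split_origins(os.environ.get("EXAMPLE_CSRF_TRUSTED_ORIGINS", ""))

# django-cors-headers also needs its app and middleware registered in settings:
#   INSTALLED_APPS += ["corsheaders"]
#   MIDDLEWARE = ["corsheaders.middleware.CorsMiddleware", *MIDDLEWARE]
```

Recent Django versions expect each trusted origin to include its scheme, for example http://localhost:4000.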
Fixes:
- Fix #45
Release v0.6.1
This is a minor release introducing a few convenience features for the server configuration and some improvements to the workings of the plugin manager.
- Ensure changing an option for only one step of the pipeline will not cause the other steps to also be re-triggered
  (unless there is some change in the incoming input from a changed step)
- Added locks to API to avoid loading/using models while plugin_manager is working.
- plugin_manager will try to reinstall failed packages 3 times with an interval before failing
- Added capability to load either the MOST or LAST model used at server start (Fixes #42)
  - LOAD_ON_START can now be set to either most or last to load the most used or the last used model at server start.
    Setting it to true will default to most with a deprecation warning.
- Added 2 environment variables to control server update behavior:
  - OCT_VERSION = ([current_number]/latest) - The version to install/update to. Defaults to the version of the downloaded release.
    If set to latest the server will attempt to update to the latest version.
    Can be set to any version number available on PyPI https://pypi.org/project/django-ocr_translate/#history
  - OCT_AUTOUPDATE = ([false]/true) - If set to true the server will attempt to upgrade the version at every start
    to the configured value of OCT_VERSION (default is release version).
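The LOAD_ON_START rules listed above (most, last, and the deprecated true) could be handled along these lines. This is a sketch and not the server's implementation: the function name parse_load_on_start is hypothetical, and treating any other value as "do not preload" is an assumption.

```python
import warnings
from typing import Optional


def parse_load_on_start(value: Optional[str]) -> Optional[str]:
    """Return 'most', 'last', or None (do not preload) for a LOAD_ON_START-style value."""
    if value is None:
        return None
    normalized = value.strip().lower()
    if normalized in ("most", "last"):
        return normalized
    if normalized == "true":
        # Legacy value: still accepted, mapped to 'most', but flagged as deprecated.
        warnings.warn(
            "LOAD_ON_START=true is deprecated; use 'most' or 'last'",
            DeprecationWarning,
            stacklevel=2,
        )
        return "most"
    # Assumption: any other value (e.g. 'false' or empty) disables preloading.
    return None
```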
Release v0.6.0
The main change is the introduction of a plugin manager to install the plugins+dependencies on demand.
This makes the release versions (both windows EXE and docker image) much smaller, and allows users to decide which functionalities they want to use.
IMPORTANT
From version 0.6 onward python and pip need to be installed on the system (3.10 or 3.11).
See more below in the Changes section.
- Windows: https://www.python.org/downloads/windows/
  - NOTE: make sure to check the box that says "Add Python to PATH" so that pip can be found by the server script without having to make any assumptions
- Linux: Use your package manager (e.g. sudo apt install python3 python3-pip)
While I think that for the most part everything should work fine, I assume there might be some edge cases that I've not considered,
that might make the way I've handled the plugin manager not work for everyone.
Feel free to report such issues and I'll try to fix them as soon as possible.
(If too many problems arise I might consider redesigning the plugin manager itself)
Changes
- Removed the frozen executable from the release files in favor of an Automatic1111 style batch script
  - Even with the plugin manager, installing some dependencies that require actual compilation by invoking pip from within the frozen executable was causing trouble that was non-trivial to fix.
    For this reason I decided to axe the PyInstaller frozen EXE altogether and go with a batch script that will:
    - Allow users to more easily set environment variables (a few of the most relevant ones are already set as empty in the script)
    - Create or reuse a virtual environment in a folder venv in the same directory as the script
    - Install the minimum required packages in it to run the server
    - Run the server
- Added a plugin manager to install/uninstall plugins on demand
  - The installed plugins can be controlled via the new version of the firefox extension or directly using the manage_plugins/ endpoint.
  - The plugins will be installed under $OCT_BASE_DIR/plugins, which by default will be under your user profile (e.g. C:\Users\<USERNAME>\.ocr_translate on windows).
    If you have trouble with space under C:\ consider setting the OCT_BASE_DIR environment variable to a different location.
  - The plugin data is stored in a JSON file inside the project plugins_data.json
  - Version/Scope/Extras of a package to be installed can be controlled via environment variables
    OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS] (e.g. to change torch to version A.B.C you would set OCT_PKG_TORCH_VERSION="A.B.C").
    If the package name contains a - it should be replaced with _min_ in the package name
  - Removed env variable AUTOCREATE_VALIDATED_MODELS and the related server initialization.
    Now models are created/activated or deactivated via the plugin manager, when the respective plugin is installed/uninstalled.
- Streamlined docker image to also use the run_server.py script for initialization.
- Added plugin for ollama (https://github.com/ollama/ollama) for translation using LLMs
- Note ollama needs to be run/installed separately and the plugin will just make calls to the server.
- Use the OCT_OLLAMA_ENDPOINT environment variable to specify the endpoint of the ollama server
(see the plugin page for more details)
- Added plugin for PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) (Box and OCR) (seems to work very well with chinese).
  - The default versions installed by the plugin_manager of paddlepaddle (2.5.2 on linux and 2.6.1 on windows)
    might not work for every system as there can be underlying failures in the C++ code that the plugin uses.
    The version installed can be controlled using the environment variable OCT_PKG_PADDLEPADDLE_VERSION.
- Added possibility to specify extra DJANGO_ALLOWED_HOSTS and a server bind address via environment variables. (Fixes #30)
- Manual model is not implemented as an entrypoint anymore (will work also without recreating models).
- OCR models can now use a tokenizer and a processor from different models.
- Added caching of the languages and allowed box/ocr/tsl models for faster response times on the handshake endpoint.
- New endpoint run_tsl_xua made to work with XUnity.AutoTranslator (https://github.com/bbepis/XUnity.AutoTranslator)
- Improved API return codes
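The OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS] naming rule from the plugin manager notes above can be written down as a tiny helper. The function name pkg_env_var is hypothetical, and the exact casing of the _min_ marker after uppercasing is an assumption read literally from the rule; check the plugin manager before relying on it for packages whose name contains a dash.

```python
def pkg_env_var(package: str, field: str) -> str:
    """Build the OCT_PKG_<NAME>_<FIELD> variable name for a package."""
    if field not in ("VERSION", "SCOPE", "EXTRAS"):
        raise ValueError(f"unknown field: {field}")
    # Assumption: the name is uppercased and '-' becomes the literal '_min_' marker.
    name = package.upper().replace("-", "_min_")
    return f"OCT_PKG_{name}_{field}"
```

With this reading, pkg_env_var("torch", "VERSION") gives OCT_PKG_TORCH_VERSION and pkg_env_var("paddlepaddle", "VERSION") gives OCT_PKG_PADDLEPADDLE_VERSION, matching the two names quoted in these notes.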
Migrating from an older version
As usual, the database will be upgraded automatically to the new version.
For safety, it is suggested to make a copy of it (by default under %USERPROFILE%/.ocr_translate) in case you need to downgrade.
Already downloaded models can be reused, but the new structure is slightly different. Before, you would have something like:
- %USERPROFILE%/.ocr_translate/
  - <huggingface_models>
  - .easyocr/
    - <easyocr_models>
  - tesseract/
    - <tesseract_models>
Now by default you will have:
- %USERPROFILE%/.ocr_translate/ (or whatever OCT_BASE_DIR is set to)
  - models/
    - huggingface/ (or whatever TRANSFORMERS_CACHE is set to)
      - <huggingface_models>
    - easyocr/ (or whatever EASYOCR_PREFIX is set to)
      - <easyocr_models>
    - tesseract/ (or whatever TESSERACT_PREFIX is set to)
      - <tesseract_models>
    - paddleocr/ (or whatever PADDLEOCR_PREFIX is set to)
      - <paddleocr_models>
You can move them manually to mimic the new structure or delete them and let the server re-download them.
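For the two folders whose old and new locations are both named above, the manual move can be planned with a few lines. This sketch only lists the (old, new) pairs and moves nothing; the function name plan_model_moves is hypothetical and it assumes the default layout with no *_PREFIX overrides. Huggingface models sat directly in the base folder in the old layout, so they are not covered and would need to be moved by hand into models/huggingface.

```python
from pathlib import Path


def plan_model_moves(base_dir: str) -> list[tuple[Path, Path]]:
    """List (old, new) folder pairs for the layout change; nothing is moved here."""
    base = Path(base_dir)
    models = base / "models"
    return [
        (base / ".easyocr", models / "easyocr"),
        (base / "tesseract", models / "tesseract"),
    ]
```

Each pair could then be handed to shutil.move once the models/ folder exists and a backup has been taken.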
Plugins will be stored under OCT_BASE_DIR/plugins (default to %USERPROFILE%/.ocr_translate)
- OCT_BASE_DIR/ (default to %USERPROFILE%/.ocr_translate)
  - plugins.json (list of installed plugins)
  - plugins/
    - <plugin_data>/
      The installed python packages, divided by scope depending on whether they are meant to be used for CPU/GPU/BOTH
This folder can go up to several GB when installing torch (huggingface and easyocr) for GPU, so make sure you have enough space.
v0.5.1
What's Changed
Now it is possible to upload manual translations by editing the textboxes (requires extension >=0.2.2). Also the extension can now actually control the advanced options if you like to tinker.
The admin interface of the server has been improved and a superuser is automatically created to access it, so that you can add other models without having to edit plugins or source files.
- Implemented endpoint for manual translation
- Added autocorrect capability to Trie
- Added endpoint for sending allowed options given the loaded models
- Improved admin interface to allow users to more easily add models to the database
- Changed handshake endpoint behavior to send more information required by the extension
- Improved run_server script for better modularity and reporting
- Minor fixes
v0.4.0
What's Changed
Now it is possible to use OCR models that work on a single line.
Before, the pipeline would pass the entire BOX to the OCR model, which would make models trained on single lines spit out nonsensical results.
Now models can be created with ocr_mode set to merged [default] or single.
If set to single, the non-merged bounding boxes will be passed to the model.
The text results will afterward be stitched together by reasonably ordering the boxes by line/column chunks.
- Modified the API for the OCRBoxModel: _box_detection should now return a list of dictionaries containing 'merged': tuple[int, int, int, int], the merged bounding box, and 'single': list[tuple[int, int, int, int]], a list of the single bounding boxes that have been merged into merged.
- Modified the database models:
  - OCRModel: Added ocr_mode field with possible values: merged [default], single.
  - BBox: Foreign key from_ocr renamed to from_ocr_merged
  - BBox: Added foreign key from_ocr_single
  - BBox: Added foreign key to_merged (points to the merged BBox generated by merging THIS + other boxes)
  - OCRRun: Foreign key result renamed to result_merged (denotes the output was from a merged real/mock run)
  - OCRRun: Added foreign key result_single (denotes the output was from a single run)
- Fixed a bug related to Issue #11 where the %userprofile%/.ocr_translate folder was not being properly created by the EXE release if it did not exist.
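To make the new _box_detection return shape concrete, the sketch below builds one such dictionary from a group of single boxes. It is illustrative only: the helper name merge_boxes is hypothetical, and the coordinate order (x_min, y_min, x_max, y_max) is an assumption, since the notes only state that each box is a tuple of four integers.

```python
Box = tuple[int, int, int, int]


def merge_boxes(singles: list[Box]) -> dict:
    """Wrap single boxes and their enclosing box in the 'merged'/'single' layout."""
    if not singles:
        raise ValueError("need at least one box")
    # Assumption: each box is (x_min, y_min, x_max, y_max).
    merged = (
        min(b[0] for b in singles),
        min(b[1] for b in singles),
        max(b[2] for b in singles),
        max(b[3] for b in singles),
    )
    return {"merged": merged, "single": list(singles)}
```

A box-detection plugin would return a list of such dictionaries, one per merged region.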
v0.3.2
IMPORTANT:
Due to this bug the run_server.py is not creating the %userprofile%/.ocr_translate folder automatically.
If this is your first install of the server please manually create the %userprofile%/.ocr_translate folder (you can type %userprofile% in file explorer and create a folder named .ocr_translate).
This will be fixed in the next release of the code
What's Changed
- All features for box/ocr/tsl have been moved to plugins in separate packages (They are still bundled together with the EXE release)
- Improved pre-parsing of OCRed text before translation for languages with latin alphabet
- Introduced a way to remove ghost characters generated at the beginning/end of every string (e.g. tesseract would produce random characters at the beginning/end of many strings, probably due to speech bubble edges included in the box).
- Introduced Trie capability (only for languages with a list/freq file ... for now only English)
- Can use trie to detect if an incorrect word ("helloworld") should be split into multiple valid words (["hello", "world"])
- Added English word list/freq file.
- From the plugin for easyocr, the boxes are now merged with higher tolerances, to reduce the occurrence of multiple boxes in a single speech bubble (it would cause translation to be much worse since boxes are translated one by one)
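The word-splitting idea behind the Trie feature above can be illustrated with a small prefix tree and a backtracking split. This is a simplified sketch and not the project's Trie: the class and function names are hypothetical, word frequencies are ignored, and "prefer the longest word first" is an assumed tie-breaking rule.

```python
from typing import Optional

END = "<end>"  # multi-character key, so it cannot clash with a single letter


class Trie:
    """Minimal prefix tree over a word list."""

    def __init__(self, words=()):
        self.root: dict = {}
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            node = node.setdefault(char, {})
        node[END] = True  # mark that a complete word stops here

    def prefixes(self, text: str, start: int):
        """Yield every end index such that text[start:end] is a known word."""
        node = self.root
        for index in range(start, len(text)):
            node = node.get(text[index])
            if node is None:
                return
            if END in node:
                yield index + 1


def split_words(trie: Trie, text: str, start: int = 0) -> Optional[list[str]]:
    """Return one split of text into known words (longest word first), or None."""
    if start == len(text):
        return []
    for end in sorted(trie.prefixes(text, start), reverse=True):
        rest = split_words(trie, text, end)
        if rest is not None:
            return [text[start:end]] + rest
    return None
```

With a word list containing "hello" and "world", split_words returns ["hello", "world"] for "helloworld" and None for a string that cannot be covered by known words.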
NOTE: There is also a plugin to run translation with google translate. It is not included in the EXE release as it does not fit with the main idea of the tool being something that will run entirely on system, but if there is request for it I can include it in successive release bundles (anyway you would need to select it to use it).