Releases: Crivella/ocr_translate

Release v0.7.4

07 Mar 02:23
882ed4f

Changes

  • Added ocr_translate_libretranslate plugin to allow using LibreTranslate (https://libretranslate.com/), or a similar tool exposing the same API, as a translation backend.
    LibreTranslate runs separately from this server, and the plugin must be pointed to the correct endpoint with the OCT_LIBRETRANSLATE_ENDPOINT environment variable. (See the PLUGIN PAGE for more info)
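
Before pointing the plugin at an endpoint, it can help to check that the LibreTranslate server answers at all. The following is a minimal sketch that is independent of the plugin's own code: it builds a request for LibreTranslate's /translate route using only the standard library. It assumes the variable holds the base URL of the server, and the fallback address is a placeholder for a local instance, not a documented default.

```python
import json
import os
import urllib.request

# Assumption: OCT_LIBRETRANSLATE_ENDPOINT holds the base URL of a running server.
# The fallback below is a placeholder for a local instance, not a documented default.
ENDPOINT = os.environ.get("OCT_LIBRETRANSLATE_ENDPOINT", "http://127.0.0.1:5000")


def build_translate_request(text, source, target, endpoint=ENDPOINT):
    """Build a POST request for the LibreTranslate /translate route."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text, source, target, endpoint=ENDPOINT):
    """Send the request and return the translated string from the JSON reply."""
    request = build_translate_request(text, source, target, endpoint)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["translatedText"]
```

Calling translate("hola", "es", "en") against a running instance should return the translated text; a connection error usually means the endpoint value is wrong or the server is not running.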

Full Changelog: v0.7.3...v0.7.4

Release v0.7.3

29 Sep 22:15
19808d1

  • Fixed a missing dependency in plugins_data.json for the ocr_translate-paddle plugin and a wrong ocr_translate-google version

Full Changelog: v0.7.2...v0.7.3

Release v0.7.2

20 Sep 21:39
70c397e

Breaking changes

  • staka/fugumt-... models have been removed as they do not work with the newer dependencies.

  • If upgrading the Python version (e.g. running with docker, which now uses 3.13 instead of 3.11), it is suggested to delete your plugin installations first (<OCT_BASE_DIR>/plugins/). Basically every package will be updated anyway.

Major Changes

  • dependencies updates:
    • python: supported versions extended from >=3.10, <=3.11 to >=3.10 (3.14 will be tested once it is out with all its prebuilt packages on PyPI)
    • CUDA: updated from 11.8 to 12.8 (allows using sm_120 GPUs like the RTX 5000 series)
    • torch: updated from 2.2.1 to 2.8.0
    • easyocr: updated from 1.7.1 to 1.7.2
    • paddleocr: updated from 2.8.1 to 3.2.0
      Different models and a much larger set of supported languages
  • Improved logging of the server using rich
  • Environment variables:
    • REMOVED:
      • AUTO_CREATE_LANGUAGES - Languages are now created once at first initialization if missing.
        So far the languages have never changed across versions, so recreating/updating them was never needed.
      • AUTOCREATE_MODELS - Models are now synchronized with the available entrypoints at every server start.
        Removed models will be deactivated, and present models will be updated/created.
    • ADDED:
      • OCT_LOGFILE - [true/false/path]. If true, a logfile named $OCT_BASE_DIR/ocr_translate.log will be created. If a path is provided, that will be used instead.
  • Logic of the run_server.py moved inside the package to improve testing and modularity (the script will appear much smaller in the release)
  • QoL improvements for manual plugins installations
    • If you want to try your own plugins or manually install the supported ones (e.g. with modifications), you can now do so by installing them in the server environment.
    • The server will automatically pick up the plugin module name, add it to the Django apps, and synchronize the models in the database at launch.
    • NOTE: packages coming from the managed plugins will take precedence over manually installed ones. This could cause package conflicts. Mix them at your own risk.
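
The three accepted forms of OCT_LOGFILE described above (true, false, or a path) can be sketched as a small resolver. This is an illustration of the documented behavior, not the server's actual code; the function name is hypothetical, and treating an unset or empty value the same as false is an assumption.

```python
from pathlib import Path


def resolve_logfile(value, base_dir):
    """Map an OCT_LOGFILE-style value to a log path, or None when file logging is off."""
    if value is None:
        return None  # assumption: unset behaves like "false"
    lowered = value.strip().lower()
    if lowered in ("", "false"):
        return None
    if lowered == "true":
        # Documented default name inside the base directory
        return Path(base_dir) / "ocr_translate.log"
    # Anything else is taken as an explicit path
    return Path(value)
```

For example, resolve_logfile("true", base_dir) points at ocr_translate.log inside base_dir, while any other non-false string is used as the path itself.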

Full Changelog: v0.6.3...v0.7.2

Release v0.6.3

16 Jul 11:53
2a766ac

Changes

  • Bumped version of ocr_translate_ollama plugin to 0.2.0 in order to allow using newer versions of ollama (> 0.5.5)

Release v0.6.2

11 Mar 14:11
0a4e01c

Changes

  • Added capability to use django-cors-headers to set CORS headers in server responses
  • Added capability to set CSRF_TRUSTED_ORIGINS so that the admin interface works properly from a docker container
  • Documented new environment variables

Release v0.6.1

11 Feb 12:54
41be83a

This is a minor release introducing a few convenience features for the server configuration and some improvements to the plugin manager.

  • Changing an option for only one step of the pipeline no longer causes the other steps to be re-triggered
    (unless the change alters the input they receive from the changed step)
  • Added locks to API to avoid loading/using models while plugin_manager is working.
  • plugin_manager will try to reinstall failed packages 3 times with an interval before failing
  • Added capability to load either the MOST or the LAST used model at server start (Fixes #42)
    LOAD_ON_START can now be set to either most or last to load the most used or the last used model at server start.
    Setting it to true will default to most with a deprecation warning.
  • Added 2 environment variables to control server update behavior:
    • OCT_VERSION = ([current_number]/latest) - The version to install/update to. Defaults to the version of the downloaded release.
      If set to latest, the server will attempt to update to the latest version.
      Can be set to any version number available on PyPI https://pypi.org/project/django-ocr_translate/#history
    • OCT_AUTOUPDATE = ([false]/true) - If set to true, the server will attempt to upgrade at every start
      to the configured value of OCT_VERSION (default is the release version).
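
The LOAD_ON_START values described above (most, last, and the deprecated true) can be illustrated with a short sketch. The function name is hypothetical and this is not the server's implementation; in particular, treating any other value as "do not load a model" is an assumption.

```python
import warnings


def normalize_load_on_start(value):
    """Return 'most', 'last', or None for a LOAD_ON_START-style value."""
    lowered = (value or "").strip().lower()
    if lowered in ("most", "last"):
        return lowered
    if lowered == "true":
        # Legacy value: still accepted, mapped to 'most' with a deprecation warning
        warnings.warn(
            "LOAD_ON_START=true is deprecated; use 'most' or 'last'",
            DeprecationWarning,
        )
        return "most"
    # Assumption: anything else (unset, 'false', ...) means no model is loaded at start
    return None
```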

Release v0.6.0

19 Aug 22:11
4170bed

The main change is the introduction of a plugin manager to install the plugins+dependencies on demand.
This makes the release versions (both Windows EXE and docker image) much smaller, and allows users to decide which functionalities they want to use.

IMPORTANT

From version 0.6 onward python and pip need to be installed on the system (3.10 or 3.11).
See more below in the Changes section.

  • Windows: https://www.python.org/downloads/windows/

    • NOTE: make sure to check the box that says "Add Python to PATH" so that pip can be found by the server script without having to make any assumptions
  • Linux: Use your package manager (e.g. sudo apt install python3 python3-pip)

While I think everything should work fine for the most part, I assume there are some edge cases I have not considered
that might make the way I've handled the plugin manager not work for everyone.
Feel free to report such issues and I'll try to fix them as soon as possible.
(If too many problems arise I might consider redesigning the plugin manager itself)

Changes

  • Removed the frozen executable from the release files in favor of an Automatic1111-style batch script
    • Even with the plugin manager, installing dependencies that require actual compilation by invoking pip from within the frozen executable caused problems that were non-trivial to fix.
      For this reason I decided to axe the PyInstaller frozen EXE altogether and go with a batch script that will:
      • Allow user to more easily set environment variables (a few of the most relevant ones are already set as empty in the script)
      • Create or reuse a virtual environment in a folder venv in the same directory as the script
      • Install the minimum required packages in it to run the server
      • Run the server
  • Added a plugin manager to install/uninstall plugins on demand
    • The installed plugins can be controlled via the new version of the Firefox extension or directly using the manage_plugins/ endpoint.

    • The plugins will be installed under $OCT_BASE_DIR/plugins, which by default will be under your user profile (e.g. C:\Users\<USERNAME>\.ocr_translate on Windows).
      If you have trouble with space under C:\, consider setting the OCT_BASE_DIR environment variable to a different location.

    • The plugin data is stored in a JSON file inside the project (plugins_data.json)

    • Version/Scope/Extras of a package to be installed can be controlled via environment variables

      OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS]

      (e.g. to change torch to version A.B.C you would set OCT_PKG_TORCH_VERSION="A.B.C").
      If the package name contains a -, it should be replaced with _min_ in the package name.

    • Removed env variable AUTOCREATE_VALIDATED_MODELS and the related server initialization.
      Now models are created/activated or deactivated via the plugin manager, when the respective plugin is installed/uninstalled.

  • Streamlined docker image to also use the run_server.py script for initialization.
  • Added plugin for ollama (https://github.com/ollama/ollama) for translation using LLMs
    • Note: ollama needs to be installed/run separately; the plugin only makes calls to its server.
    • Use the OCT_OLLAMA_ENDPOINT environment variable to specify the endpoint of the ollama server
      (see the plugin page for more details)
  • Added plugin for PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) (Box and OCR) (seems to work very well with Chinese).
    • The default versions installed by the plugin_manager of paddlepaddle (2.5.2 on linux and 2.6.1 on windows)
      might not work for every system as there can be underlying failures in the C++ code that the plugin uses.
      The version installed can be controlled using the environment variable OCT_PKG_PADDLEPADDLE_VERSION.
  • Added possibility to specify extra DJANGO_ALLOWED_HOSTS and a server bind address via environment variables. (Fixes #30)
  • Manual model is not implemented as an entrypoint anymore (will work also without recreating models).
  • OCR models can now use a tokenizer and a processor from different models.
  • Added caching of the languages and allowed box/ocr/tsl models for faster response times on the handshake endpoint.
  • New endpoint run_tsl_xua made to work with XUnity.AutoTranslator (https://github.com/bbepis/XUnity.AutoTranslator)
  • Improved API return codes
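
The OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS] naming rule from the plugin manager notes above can be sketched as a small helper. The function name is hypothetical, and the sketch follows the notes literally: the package name is upper-cased and each - is replaced with a lowercase _min_. Whether the server expects that marker in lower or upper case is an assumption worth checking against the project documentation.

```python
ALLOWED_FIELDS = ("VERSION", "SCOPE", "EXTRAS")


def pkg_env_var(package_name, field):
    """Build the name of the variable that overrides one setting of a package."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"field must be one of {ALLOWED_FIELDS}, got {field!r}")
    # Upper-case the name first, then substitute '-' with the literal '_min_' marker
    # (the case of the marker is an assumption, taken verbatim from the notes)
    name = package_name.upper().replace("-", "_min_")
    return f"OCT_PKG_{name}_{field}"
```

With this helper, pkg_env_var("torch", "VERSION") gives OCT_PKG_TORCH_VERSION and pkg_env_var("paddlepaddle", "VERSION") gives OCT_PKG_PADDLEPADDLE_VERSION, matching the two variable names mentioned in these notes.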

Migrating from an older version

As usual, the database will be upgraded automatically to the new version.
For safety, it is suggested to make a copy of it (by default under %USERPROFILE%/.ocr_translate) in case you need to downgrade.

Already downloaded models can be reused, but the new structure is slightly different. Before, you would have something like:

  • %USERPROFILE%/.ocr_translate/
    • <huggingface_models>
    • .easyocr/
      • <easyocr_models>
    • tesseract/
      • <tesseract_models>

Now by default you will have:

  • %USERPROFILE%/.ocr_translate/ (or whatever OCT_BASE_DIR is set to)
    • models/
      • huggingface/ (or whatever TRANSFORMERS_CACHE is set to)
        • <huggingface_models>
      • easyocr/ (or whatever EASYOCR_PREFIX is set to)
        • <easyocr_models>
      • tesseract/ (or whatever TESSERACT_PREFIX is set to)
        • <tesseract_models>
      • paddleocr/ (or whatever PADDLEOCR_PREFIX is set to)
        • <paddleocr_models>

You can move them manually to mimic the new structure, or delete them and let the server re-download them.
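
As a rough sketch of the manual move, the snippet below relocates the two explicitly named folders (.easyocr/ and tesseract/) into the new models/ tree. It is an illustration under assumptions, not a supported migration tool: the function name is hypothetical, it assumes the default folder names shown above with no custom prefix variables set, and it leaves the huggingface models alone because the old layout does not name their folders. Back up the base directory before trying anything like this.

```python
import shutil
from pathlib import Path

# Old location (relative to the base dir) -> new location, per the two layouts above
MOVES = {
    ".easyocr": "models/easyocr",
    "tesseract": "models/tesseract",
}


def migrate_models(base_dir):
    """Move the easyocr and tesseract model folders into the new models/ tree.

    Returns the list of destination paths that were actually moved.
    """
    base = Path(base_dir)
    moved = []
    for old, new in MOVES.items():
        src, dst = base / old, base / new
        if not src.is_dir() or dst.exists():
            continue  # nothing to move, or destination already present: leave untouched
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```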

Plugins will be stored under OCT_BASE_DIR/plugins (OCT_BASE_DIR defaults to %USERPROFILE%/.ocr_translate)

  • OCT_BASE_DIR/ (defaults to %USERPROFILE%/.ocr_translate)
    • plugins.json (list of installed plugins)
    • plugins/
      • <plugin_data>/
        The installed Python packages, divided by scope depending on whether they are meant to be used for CPU/GPU/BOTH

This folder can go up to several GB when installing torch (huggingface and easyocr) for GPU, so make sure you have enough space.

v0.5.1

17 Dec 07:45
25360e3

What's Changed

It is now possible to upload a manual translation by editing the textboxes (requires extension >=0.2.2). The extension can now also control the advanced options if you like to tinker.
The admin interface of the server has been improved, and a superuser is automatically created to access it, so you can add other models without having to edit plugins or source files.

  • Implemented endpoint for manual translation
  • Added autocorrect capability to Trie
  • Added endpoint for sending allowed options given the loaded models
  • Improved admin interface to allow users to more easily add models to the database
  • Changed handshake endpoint behavior to send more information required by the extension
  • Improved run_server script for better modularity and reporting
  • Minor fixes

v0.4.0

29 Oct 00:58
b3f8ae4

What's Changed

It is now possible to use OCR models that work on a single line.
Before, the pipeline would pass the entire BOX to the OCR model, which made models trained on single lines produce nonsensical results.
Now a model can be created with ocr_mode set to merged[default] or single.
If set to single, the non-merged bounding boxes will be passed to the model.
The text results will afterward be stitched together by reasonably ordering the boxes by line/column chunks.

  • Modified the API: OCRBoxModel._box_detection should now return a list of dictionaries containing 'merged': tuple[int, int, int, int], the merged bounding box, and 'single': list[tuple[int, int, int, int]], a list of the single bounding boxes that have been merged into merged.
  • Modified the database models:
    • OCRModel: Added ocr_mode field with possible values: merged[default], single.
    • BBox: Foreign key from_ocr renamed to from_ocr_merged
    • BBox: Added foreign key from_ocr_single
    • BBox: Added foreign key to_merged (points to the merged BBox generated by merging THIS + other boxes)
    • OCRRun: Foreign key result renamed to result_merged (denotes the output was from a merged real/mock run)
    • OCRRun: Added foreign key result_single (denotes the output was from a single run)
  • Fixed a bug related to Issue #11 where the %userprofile%/.ocr_translate folder was not being properly created by the EXE release if it did not exist.
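
To make the new _box_detection return shape concrete, here is a minimal sketch of one list entry with its 'merged' and 'single' keys. The helper names are hypothetical, and the coordinate order (x_min, y_min, x_max, y_max) is an assumption made for the example; the notes only state that each box is a tuple of four integers.

```python
def merge_boxes(singles):
    """Return the smallest box containing every box in `singles`.

    Assumes each box is (x_min, y_min, x_max, y_max); the real order may differ.
    """
    x_mins, y_mins, x_maxs, y_maxs = zip(*singles)
    return (min(x_mins), min(y_mins), max(x_maxs), max(y_maxs))


def make_detection(singles):
    """Build one entry of the list returned by a box-detection step."""
    return {"merged": merge_boxes(singles), "single": list(singles)}


# Two line-level boxes belonging to the same speech bubble
detections = [make_detection([(0, 0, 10, 5), (8, 4, 20, 12)])]
```

A model with ocr_mode set to merged would receive the 'merged' box, while one set to single would receive each box in 'single'.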

v0.3.2

09 Oct 04:33
06f9bd9

IMPORTANT:

Due to this bug the run_server.py is not creating the %userprofile%/.ocr_translate folder automatically.
If this is your first install of the server, please manually create the %userprofile%/.ocr_translate folder (you can type %userprofile% in file explorer and create a folder named .ocr_translate).
This will be fixed in the next release of the code.

What's Changed

  • All features for box/ocr/tsl have been moved to plugins in separate packages (they are still bundled together with the EXE release)
  • Improved pre-parsing of OCRed text before translation for languages with a Latin alphabet
    • Introduced a way to remove ghost characters generated at the beginning/end of every string (e.g. tesseract would produce random characters at the beginning/end of many strings, probably due to speech bubble edges included in the box).
    • Introduced Trie capability (only for languages with a list/freq file ... for now only English)
      • Can use the trie to detect if an incorrect word ("helloworld") should be split into multiple valid words (["hello", "world"])
    • Added English word list/freq file.
  • In the plugin for easyocr, the boxes are now merged with higher tolerances to reduce the occurrence of multiple boxes in a single speech bubble (which would make the translation much worse, since boxes are translated one by one)
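
The word-splitting idea mentioned above can be sketched with a small trie. This is a simplified illustration, not the project's Trie class: it ignores word frequencies and simply returns the split that uses the fewest known words, which is an assumption about how ties could be resolved.

```python
END = ""  # terminal marker; never collides with a character key (those have length 1)


class Trie:
    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True

    def word_ends(self, text, start):
        """Yield every index `end` such that text[start:end] is a known word."""
        node = self.root
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                return
            if END in node:
                yield i + 1


def split_words(text, trie):
    """Split `text` into the fewest known words, or return None if impossible."""
    best = [None] * (len(text) + 1)  # best[i] = best split of text[i:]
    best[len(text)] = []
    for start in range(len(text) - 1, -1, -1):
        for end in trie.word_ends(text, start):
            tail = best[end]
            if tail is None:
                continue
            if best[start] is None or len(tail) + 1 < len(best[start]):
                best[start] = [text[start:end]] + tail
    return best[0]


trie = Trie(["hello", "world", "he"])
print(split_words("helloworld", trie))  # ['hello', 'world']
```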

NOTE: There is also a plugin to run translation with Google Translate. It is not included in the EXE release, as it does not fit the main idea of the tool being something that runs entirely on your system, but if there is a request for it I can include it in subsequent release bundles (you would still need to select it to use it).