Web application for automating the translation of manga/comic page images using AI. Targets speech bubbles and text outside of speech bubbles. Supports 54 languages and custom font pack usage.
- Speech bubble detection, segmentation, cleaning (YOLO + SAM 2.1)
- Outside speech bubble text detection & inpainting (YOLO + Flux Kontext/CV2)
- LLM-powered OCR and translations (supports 54 languages)
- Text rendering and alignment (with custom font packs)
- Upscaling (2x-AnimeSharpV4)
- Single/Batch image processing with directory structure preservation and ZIP file support
- Two interfaces: Web UI (Gradio) and CLI
- All-in-one button; no human intervention required
- Various options to tailor the process
- Python 3.10+
- PyTorch (CPU, CUDA, ROCm)
- YOLO model (.pt) for speech bubble detection; auto-downloaded
- Font pack with .ttf/.otf files
- Any LLM for Japanese source text; vision-capable LLM for other languages (API or local)
Download the standalone zip from the Releases page:
- Default package: Download once, run setup.bat before first launch to install dependencies, and update-standalone.bat to update to the latest version (see Updating). Installs PyTorch v2.9.1+cu128.
- Pre-downloaded package: Download per version, no setup required, and no included update script. Contains PyTorch v2.9.1+cu128.
- Both include the Komika (for normal text), Cookies (for OSB text), and Comicka (for either) font packs
- Clone and enter the repo
git clone https://github.com/meangrinch/MangaTranslator.git
cd MangaTranslator
- Create and activate a virtual environment (recommended)
python -m venv venv
# Windows PowerShell/CMD
.\venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
- Install PyTorch (see: PyTorch Install)
# Example (CUDA 12.8)
pip install torch==2.9.1+cu128 torchvision==0.24.1+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
# Example (CPU)
pip install torch torchvision
- Install Nunchaku (optional, for inpainting outside-bubble text)
- Nunchaku wheels are not on PyPI. Install directly from the v1.0.2 GitHub release URL, matching your OS and Python version. CUDA only.
# Example (Windows, Python 3.13, PyTorch 2.9.1)
pip install https://github.com/nunchaku-tech/nunchaku/releases/download/v1.0.2/nunchaku-1.0.2+torch2.9-cp313-cp313-win_amd64.whl
- Install dependencies
pip install -r requirements.txt
- The application will automatically download and use all required models
- Put font packs as subfolders in fonts/ with .otf/.ttf files
- Prefer filenames that include italic/bold or both so variants are detected
- Example structure:
fonts/
├─ CC Wild Words/
│ ├─ CCWildWords-Regular.otf
│ ├─ CCWildWords-Italic.otf
│ ├─ CCWildWords-Bold.otf
│ └─ CCWildWords-BoldItalic.otf
└─ Komika/
  ├─ KOMIKA-HAND.ttf
  └─ KOMIKA-HANDBOLD.ttf
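
To check how a font pack's filenames would sort into variants before using it, a small script along these lines may help. It is only a sketch of keyword-based matching on "bold" and "italic"; the function names `classify_variant` and `scan_font_pack` are hypothetical, and the application's actual detection logic may differ.

```python
from pathlib import Path

# Font file extensions mentioned in this README.
FONT_EXTENSIONS = {".otf", ".ttf"}


def classify_variant(filename: str) -> str:
    """Guess a variant from keywords in the filename (illustrative assumption)."""
    name = filename.lower()
    is_bold = "bold" in name
    is_italic = "italic" in name
    if is_bold and is_italic:
        return "bold_italic"
    if is_bold:
        return "bold"
    if is_italic:
        return "italic"
    return "regular"


def scan_font_pack(pack_dir: Path) -> dict[str, list[str]]:
    """Group the .otf/.ttf files in one font pack folder by guessed variant."""
    variants: dict[str, list[str]] = {}
    for path in sorted(pack_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in FONT_EXTENSIONS:
            variants.setdefault(classify_variant(path.name), []).append(path.name)
    return variants
```

Calling `scan_font_pack(Path("fonts/Komika"))` on the example structure above would group KOMIKA-HAND.ttf under "regular" and KOMIKA-HANDBOLD.ttf under "bold".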
- Providers: Google, OpenAI, Anthropic, xAI, DeepSeek, Z.ai, Moonshot AI, OpenRouter, OpenAI-Compatible
- Web UI: configure provider/model/key in the Config tab (stored locally)
- CLI: pass keys/URLs as flags or via env vars
- Env vars: GOOGLE_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, XAI_API_KEY, DEEPSEEK_API_KEY, ZAI_API_KEY, MOONSHOT_API_KEY, OPENROUTER_API_KEY, OPENAI_COMPATIBLE_API_KEY
- OpenAI-compatible default URL: http://localhost:1234/v1
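
For wrapper scripts that drive the CLI, the provider-to-variable pairing listed above can be captured in a small lookup. This is a minimal sketch: `resolve_api_key` is a hypothetical helper, and letting an explicitly passed key take precedence over the environment is an assumption, not documented behavior of the application.

```python
import os

# Provider names and environment variable names as listed in this README.
PROVIDER_ENV_VARS = {
    "Google": "GOOGLE_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "xAI": "XAI_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
    "Z.ai": "ZAI_API_KEY",
    "Moonshot AI": "MOONSHOT_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
    "OpenAI-Compatible": "OPENAI_COMPATIBLE_API_KEY",
}


def resolve_api_key(provider, cli_value=None, env=None):
    """Return the explicit key if given, else the provider's env var, else None.

    The precedence order is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    return env.get(PROVIDER_ENV_VARS[provider]) or None
```

A caller could then fail early with a clear message when `resolve_api_key("Google")` returns `None`, instead of starting a run that has no key.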
- If you want to use the OSB text pipeline, you need a Hugging Face token with access to the deepghs/AnimeText_yolo and black-forest-labs/FLUX.1-Kontext-dev repositories.
- Follow these steps to create one:
- Sign in or create a Hugging Face account
- Visit and accept the terms on: AnimeText_yolo and FLUX.1 Kontext (dev)
- Create a new access token in your Hugging Face settings with read access to gated repos ("Read access to contents of public gated repos")
- Add the token to the app:
- Web UI: set hf_token in Config
- Env var (alternative): set HUGGINGFACE_TOKEN
- Save config to preserve the token across sessions
- Windows: double-click start-webui.bat (venv must be present for manual install)
- Or run:
python app.py --open-browser
Options: --models (default ./models), --fonts (default ./fonts), --port (default 7676), --cpu.
First launch can take ~1–2 minutes.
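
Because the first launch can take a while, a script that starts the Web UI and then talks to it may want to wait until the port accepts connections. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, the 127.0.0.1 host is an assumption, and 7676 is the default port noted above.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 7676,
                  timeout: float = 180.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait and retry.
            time.sleep(interval)
    return False
```

This only confirms that something is listening on the port, not that the application has finished loading its models.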
Examples:
# Single image, Japanese → English, Google provider
python main.py --input <image_path> \
--font-dir "fonts/Komika" --provider Google --google-api-key <AI...>
# Batch folder, custom source/target languages, OpenAI-Compatible provider (LM Studio)
python main.py --input <folder_path> --batch \
--font-dir "fonts/Komika" \
--input-language <src_lang> --output-language <tgt_lang> \
--provider OpenAI-Compatible --openai-compatible-url http://localhost:1234/v1 \
--output ./output
# Single Image, Japanese → English (Google), OSB text pipeline, custom OSB text font
python main.py --input <image_path> \
--font-dir "fonts/Komika" --provider Google --google-api-key <AI...> \
--osb-enable --osb-font-name "fonts/fast_action"
# Cleaning-only mode (no translation/text rendering)
python main.py --input <image_path> --cleaning-only
# Test mode (no translation; render placeholder text)
python main.py --input <image_path> --test-mode
# Full options
python main.py --help

- Launch the Web UI (use start-webui.bat on Windows, or the command above)
- Upload image(s) in the Translator tab (single) or Batch tab (multiple)
- Choose a font pack; set source/target languages
- Open Config and set: LLM provider/model, API key or endpoint, reading direction (rtl for manga, ltr for comics)
- Click Translate / Start Batch Translating; outputs save to ./output/
- Enable "Cleaning-only Mode" or "Test Mode" in Other to skip translation and/or render placeholder text
- Windows portable:
- Default Package: Run update-standalone.bat. To update requirements, run update-standalone.bat -> setup.bat
- Pre-downloaded Package: Download the latest version from the releases page
- Manual install: from the repo root:
git pull
# Update requirements if needed
venv\Scripts\activate # If venv present
pip install -r requirements.txt

- License: Apache-2.0 (see LICENSE)
- Author: grinnch
- YOLOv8m Speech Bubble Detector: kitsumed
- Comic Speech Bubble Detector YOLOv8m: ogkalu
- SAM 2.1 (Segment Anything): Meta AI
- FLUX.1 Kontext: Black Forest Labs
- Nunchaku: Nunchaku Tech
- 2x-AnimeSharpV4: Kim2091
- Manga OCR: kha-white
- Manga109 YOLO: deepghs
- AnimeText YOLO: deepghs

