Make translation pipeline more token-efficient #929
Merged
ewels merged 1 commit into nextflow-io:master on Mar 31, 2026
Fix verification bugs that caused ~20% wasted API spend:
- Fix translation notice insertion for files without H1 headings (conventions.md, nxf_versions.md, README.md always failed verification)
- Replace LLM-based semantic verification with line-count truncation check (eliminates false-positive retries and 1 API call per file)

Add heading-based chunk translation:
- Split files at ## headings, diff old vs new English chunks
- Only translate changed sections, reuse existing translations for the rest
- Three-pass chunk matching: exact heading, content hash, fuzzy heading
- Falls back to full-file translation when chunking is not feasible

Add file rename/move detection:
- Detect git renames and move translation files instead of orphan+retranslate
- Check if content also changed to decide if re-translation is needed

Add prompt-change efficiency:
- Send prompt diff to model for targeted updates instead of generic "update minimally" instructions

Reduce API parallelism:
- DEFAULT_PARALLEL: 50 → 10
- GitHub Actions max-parallel: 5 languages at a time
- Max concurrent API calls: ~50 (was ~500)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
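For readers unfamiliar with the idea, a line-count truncation check can be sketched roughly as below. This is only an illustration: the function name `looks_truncated` and the `min_ratio` default of 0.9 are assumptions, not the names or threshold used in `verify.py`.

```python
def looks_truncated(source_text: str, translated_text: str, min_ratio: float = 0.9) -> bool:
    """Flag a translation whose line count falls well below the source's.

    Hypothetical helper: the 0.9 threshold is an assumed value for illustration.
    """
    source_lines = len(source_text.splitlines())
    translated_lines = len(translated_text.splitlines())
    if source_lines == 0:
        # Nothing to compare against, so there is nothing to flag.
        return False
    return translated_lines / source_lines < min_ratio
```

A check like this needs no API call and cannot disagree with itself between retries, which is the property the commit message is after when it drops the LLM-based verifier.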
✅ Deploy Preview for nextflow-training ready!
Nextflow linting complete! ✅ 171 files had no errors
Self-merging, as translation is a vibes-only arena and this stands to have a significant impact on Claude API usage.
Summary
- Split files at `##` headings, diff old vs new English, only translate changed sections (estimated 75-85% output token reduction for incremental updates)

Details

Verification bug fixes
- `postprocess.py`: `ensure_translation_notice()` now inserts notice after frontmatter for files without H1 (previously returned unchanged, causing every retry to fail)
- `verify.py`: Replaced LLM-based semantic verification (1 API call per file, false positives on transcripts) with a line-count ratio truncation check

Chunk translation (`chunking.py`, `core.py`, `prompts.py`)
- `split_into_chunks()` splits at `##` headings (code-fence aware)
- `diff_chunks()` uses three-pass matching: exact heading → content hash → fuzzy heading (>0.8 similarity)
- `_translate_chunks_async()` translates only modified/added chunks, reuses existing translations for unchanged sections

File rename detection (`git_utils.py`)
- `get_renamed_files()` uses `git diff --name-status -M` to detect renames
- `gather_work()` moves translation files before computing orphans/missing

Parallelism (`config.py`, `translate.yml`)
- `DEFAULT_PARALLEL`: 50 → 10
- `max-parallel: 5` on language matrix (max 5 × 10 = 50 concurrent API calls)

Test plan
- (`01_hello_world.md`: 7 chunks, `essential_scripting_patterns/index.md`: 12 chunks)
- `translate sync` on one language to verify end-to-end

🤖 Generated with Claude Code
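To make the chunking approach concrete, here is a rough sketch of fence-aware splitting and three-pass matching. It is not the code from `chunking.py`: the function bodies, the `match_chunks` and `chunks_to_translate` names, the SHA-256 content hash, and the use of `difflib.SequenceMatcher` for fuzzy heading similarity are all assumptions for illustration. Fence detection is also simplified (any line starting with three backticks or tildes toggles the fence state).

```python
import hashlib
from difflib import SequenceMatcher


def split_into_chunks(text):
    """Split markdown at '## ' headings, ignoring heading-like lines inside code fences."""
    chunks = []  # list of (heading, body); the preamble has heading None
    heading, body, in_fence = None, [], False
    for line in text.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
        if line.startswith("## ") and not in_fence:
            if heading is not None or body:
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = line[3:].strip(), []
        else:
            body.append(line)
    chunks.append((heading, "\n".join(body).strip()))
    return chunks


def _digest(body):
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def match_chunks(old, new, threshold=0.8):
    """Map each new chunk index to an old chunk index (or None) in three passes."""
    passes = [
        lambda o, n: o[0] == n[0],                    # 1. exact heading
        lambda o, n: _digest(o[1]) == _digest(n[1]),  # 2. identical content hash
        lambda o, n: SequenceMatcher(None, o[0] or "", n[0] or "").ratio() > threshold,  # 3. fuzzy heading
    ]
    unused, result = set(range(len(old))), {}
    for same in passes:
        for i, chunk in enumerate(new):
            if i in result:
                continue
            hit = next((j for j in sorted(unused) if same(old[j], chunk)), None)
            if hit is not None:
                result[i] = hit
                unused.discard(hit)
    return {i: result.get(i) for i in range(len(new))}


def chunks_to_translate(old, new):
    """Indices of new chunks that are unmatched (added) or whose body changed."""
    matches = match_chunks(old, new)
    return [i for i, j in matches.items() if j is None or old[j][1] != new[i][1]]
```

With a scheme like this, only the indices returned by `chunks_to_translate()` would be sent to the model, and the existing translation would be reused for every other chunk.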