feat(docs): move translation catalogues to git submodule (#8143)#8169
Conversation
Move the gettext .po documentation catalogues out of the main tree into the dedicated zeroclaw-labs/zeroclaw-docs-translations submodule mounted at docs/book/po, per RFC zeroclaw-labs#7184 (.po-only scope, zeroclaw-labs#8143). The mount point is path-transparent: book.toml's gettext preprocessor, cargo mdbook sync, and cargo mdbook build read po/ unchanged. The Rust dev loop (build/test/clippy) needs no submodule; only docs-deploy and release docs jobs check it out (submodules: recursive). bump-version.sh pins the gitlink to the submodule's v{version} tag at release time, falling back to main with a warning when the tag is not yet cut. The in-catalogue version-literal sweep moves to the submodule release; the main-tree sweep no longer touches .po files. docs-deploy.yml gains submodules: recursive on its build checkout. Contributor and maintainer docs document the submodule init path and the two-repo release flow.
WareWolf-MoonWall
left a comment
There was a problem hiding this comment.
Review — PR #8169 feat(docs): move translation catalogues to git submodule (#8143)
Reviewer: WareWolf-MoonWall
Head reviewed: e4be90dda27535c5429fb7d019e525a8e78e1ced
Labels: ci, docs, domain:architecture, risk: medium, scripts, size: M
Verdict: --approve
Clean architectural migration of the gettext .po documentation catalogues to a dedicated git submodule. All required checks pass on the first inspection.
✅ Submodule URL is under the zeroclaw-labs org
.gitmodules:
[submodule "docs/book/po"]
path = docs/book/po
url = https://github.com/zeroclaw-labs/zeroclaw-docs-translations.git
URL resolves to zeroclaw-labs/zeroclaw-docs-translations — the project org, not a personal fork. ✅
✅ Submodule pinned to a specific commit
The docs/book/po gitlink in the diff:
+Subproject commit 04c39aa70c798fe717f51817618485a906776f52
This is a concrete SHA, not a floating branch pointer. The submodule is correctly pinned at merge time. ✅
✅ CI checks out the submodule for docs-deploy
.github/workflows/docs-deploy.yml:
- uses: actions/checkout@...
with:
ref: ${{ steps.version.outputs.TAG }}
fetch-depth: 1
+ submodules: recursiveThe submodules: recursive flag is added to the build checkout step. The Rust build / test / clippy / check jobs are confirmed unmodified — they do not checkout the submodule, which is correct since cargo check --workspace with a deinited submodule is verified clean in the Validation Evidence. ✅
✅ Rollback documented with observable failure symptoms (risk: medium requirement met)
Rollback section:
- Fast path:
git revert <merge-sha>, restore four.pofiles from submodule's tagged commit, drop.gitmodules - Feature flags: None (appropriate — this is a structural change, not a runtime toggle)
- Observable failure symptoms:
docs-deploy"Validate .po format" step finds emptydocs/book/po(no*.pomatched), orcargo mdbook buildrenders translated locales empty because submodule was not initialised
These are precise, observable symptoms. The risk: medium Rollback requirement is satisfied. ✅
✅ Fluent key coverage unaffected
Scope boundary explicitly states: "Does NOT touch the Fluent .ftl catalogues, FTL_CATALOGS in crates/zeroclaw-config/src/schema.rs, include_str! / load_ftl_from_disk sites, or locales.toml." The cli_fluent_coverage CI gate checks .ftl files; moving .po files (gettext documentation translations) does not affect it. ✅
✅ Contributor impact documented
PR body, Compatibility section, and updated maintainer docs (docs/book/src/maintainers/docs-and-translations.md, docs/book/src/_snippets/docs-build-commands.md) document:
- Existing clones:
git submodule update --init docs/book/poonce before building docs - Fresh clones:
git clone --recurse-submodules - Rust dev loop: no action required
The documentation update is proportionate to the contributor impact. ✅
✅ Release-time pinning script is correct
scripts/release/bump-version.sh is updated to:
- Pin the submodule gitlink to the matching
v{version}tag at release time - Fall back to the submodule's
mainwith a warning when the tag is not yet cut
The bash -n syntax check passes (per author's Validation Evidence). The fallback-to-main path includes a warning, which is the right operator signal. ✅
✅ No AI attribution trailers
Commit message grep across the single commit returned no matches. Clean. ✅
✅ CI fully green
All 18 jobs pass (run 27958219362): Format, Lint, Test, Build x86_64, Check all features, Check 32-bit, Check aarch64, Check Windows, Docs Style, Security, Installer Drift, Nix Module Eval, Benchmarks Compile, CI Required Gate, Zerocode RPC Boundary, main. ✅
🟡 Warning — Quality gate on .po content does not run on main-repo PRs post-migration
Before this PR, a PR that changed .po files would have those changes run through the main repo's scripts/ci/docs_quality_gate.sh (which validates changed docs files in the diff). After this PR, .po changes happen in the zeroclaw-docs-translations submodule repo, not in the main repo. PRs to the main repo that only bump the submodule gitlink will not re-validate the .po content through the main repo's CI.
This is architecturally expected for a submodule split — the submodule repo must carry its own CI to gate .po quality. Please confirm that zeroclaw-labs/zeroclaw-docs-translations has a CI workflow that:
- Validates
.poformat on PRs to that repo (equivalent to the currentValidate .po formatstep indocs-deploy) - Runs any msgfmt or locale consistency checks currently done in the main repo context
If the submodule repo does not yet have CI, this is a gap that should be tracked as a follow-up issue. Not blocking for this PR (the docs-deploy job does validate format when it checks out the submodule for deployment), but worth confirming before the submodule starts receiving translation contributions directly.
🔵 Suggestion — Consider adding shallow submodule fetch depth to docs-deploy
The docs-deploy.yml checkout uses fetch-depth: 1 for the main repo. Adding shallow-submodules: true to the actions/checkout step would keep the submodule fetch shallow as well, preventing CI from pulling full translation history on every docs build. Minor CI speed optimization.
🟢 Scope boundary is clean
The .ftl move (the other half of RFC #7184) is explicitly out of scope and not touched. The PR touches exactly: .gitmodules, docs/book/po gitlink, four deleted .po files, docs-deploy.yml, bump-version.sh, and three documentation files. No Rust source, no Fluent files, no config schema. Surgical. ✅
Add a CI gate so direct translation contributions to this submodule repo are validated before merge, instead of only failing at deploy time in the consuming docs build. - msgfmt -c gates format and syntax for every catalogue. - A pure msgid-set diff (via msgcat --sort-output) gates parity across catalogues, ignoring translation status so legitimate untranslated entries pass while added, removed, or renamed keys fail. Closes the validation gap raised in zeroclaw-labs/zeroclaw#8169 review.
The docs/book/po submodule pin only advances through bump-version.sh on a release PR, but nothing in the consuming repo validated the pinned catalogues: the submodule's own CI runs on its main branch in isolation and never reports back across the repo boundary, so the main build could pin an unvalidated or broken SHA. Add a PR-time gate in the main repo that checks out the submodule recursively and validates the catalogues at the exact pinned commit: - msgfmt --check-format per catalogue, mirroring the docs-deploy gate - cross-catalogue msgid-set parity so no locale drifts from the others Triggers when a PR moves the gitlink or .gitmodules, so a bad pin blocks the PR that introduces it instead of surfacing post-merge at deploy time.
Audacity88
left a comment
There was a problem hiding this comment.
Reviewed current head 8683b81796e51ad1c43fa184890f10c64c8d214c against #8143 / #7184, the prior @WareWolf-MoonWall review, the submodule gitlink, docs-deploy.yml, bump-version.sh, and the FND-004 CI policy. The .po submodule split itself is close, but the new validation workflow needs two fixes before this can merge.
🟢 What looks good — The submodule split is otherwise path-transparent
Deleting the generated .po files from the main tree, mounting the submodule at docs/book/po, leaving book.toml's gettext path unchanged, and adding recursive checkout in docs-deploy.yml all line up with the accepted .po-only scope. The shared docs-build snippet and maintainer docs also give existing clones explicit submodule initialization steps.
🔴 Blocking — Pin the new checkout action to a reviewed SHA
.github/workflows/validate-translations-pin.yml adds a new workflow with uses: actions/checkout@v4. ZeroClaw's CI policy requires every uses: reference in workflow files to be pinned to a full commit SHA with a trailing version comment; mutable tags are not permitted, including first-party actions/*.
Because this workflow becomes the gate for the new translation-submodule pin, merging it as-is would add an unreviewed mutable action ref where the rest of the repo uses reviewed SHA pins. Please switch this to a reviewed full-SHA pin with the matching version comment.
🔴 Blocking — Compare complete msgids in the parity gate
The new parity step builds each catalogue's ID set with:
msgcat --sort-output "$po" 2>/dev/null | grep '^msgid ' | sort -uThat drops continuation lines for multi-line msgids, and msgcat wraps long entries by default. On the current es.po catalogue, the same normalization sees 12,713 entries, 4,767 of them with continuation lines, while the grep '^msgid ' output captures only 7,946 unique lines. A bad submodule commit could delete or change one of those multi-line source strings and still compare equal as msgid "", so the new CI does not reliably enforce the catalogue parity it claims.
This follows from the same submodule-validation risk @WareWolf-MoonWall called out earlier; the added workflow is the right direction, but it needs a gettext-aware extractor/canonicalizer that preserves complete msgids, or a canonical .po / POT comparison that includes continuation lines.
WareWolf-MoonWall
left a comment
There was a problem hiding this comment.
Review — PR #8169 feat(docs): move translation catalogues to git submodule (#8143)
Reviewer: WareWolf-MoonWall
Prior head reviewed: e4be90dda27535c5429fb7d019e525a8e78e1ced (Round 25 — APPROVED)
Current head: 8683b81796e51ad1c43fa184890f10c64c8d214c
Incremental scope: 1 new file — .github/workflows/validate-translations-pin.yml (+77 lines)
Verdict: --approve
Incremental commit ci(docs): validate translation submodule at pinned commit adds a
validate-pin CI workflow that gates the docs/book/po gitlink on every PR that touches
the submodule path or .gitmodules. The job: checks out with submodules: recursive,
installs gettext, verifies at least one .po catalogue resolves at the pinned SHA,
runs msgfmt --check-format on all catalogues, and validates msgid parity across locales.
This directly closes the gap flagged in the prior round's 🟡 warning: .po quality is now
gated in the main-repo CI before a bad pin can merge, without depending solely on the
submodule repo's own CI. The validate-pin job already ran and passed on the current CI run
(run 27988327545, 22 s). All 19 CI jobs pass.
✅ Resolved — Prior 🟡 .po CI gap is addressed
Round 25 warned: "If the submodule repo does not yet have CI, this is a gap that should be
tracked as a follow-up issue." The new validate-pin workflow closes the main-repo side
of that gap immediately: any PR that moves the docs/book/po gitlink or changes .gitmodules
now runs catalogue-format and msgid-parity checks before merge. The submodule repo's own CI
question is moot for the main repo's merge gate. ✅
✅ Logic correct — four-step validation
git -C docs/book/po rev-parse HEADconfirms the submodule resolved to a concrete SHA.find docs/book/po -maxdepth 1 -name '*.po' | wc -lasserts at least one catalogue is present.msgfmt --check-format "$po" -o /dev/nullvalidates each catalogue's format.msgcat --sort-output | grep '^msgid '+diffvalidates msgid set parity across locales.
Step 4 prevents silent translation drift (one locale gaining or losing strings that others have).
This is the right shape for a submodule-pin gate. ✅
✅ Workflow security posture is correct
permissions: contents: read— no write permissions added.cancel-in-progress: trueconcurrency group — correct for a PR gate.- No secrets, no push, no external network beyond apt and the submodule checkout.
- Triggering only on
docs/book/po,.gitmodules, and the workflow file itself — no over-triggering. ✅
🟡 Warning — actions/checkout@v4 is a floating tag pin
All other new workflow files in this PR family (and across docker-image-pr.yml, docs-deploy.yml)
pin actions/checkout to a full SHA with a version comment, e.g.
actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2.
The new validate-translations-pin.yml uses actions/checkout@v4 — a floating major version tag.
This is inconsistent with the project's supply-chain posture. A malicious or accidental update to
the v4 tag in the actions/actions repo would silently affect this workflow without any diff
in this repo. Not blocking here because the job is read-only (contents: read) and checkout
is a well-trusted action, but it should be brought into line with the rest of the workflow fleet
in a follow-up commit or an immediate fixup.
✅ No AI/bot attribution trailers
Commit message ci(docs): validate translation submodule at pinned commit — clean, no AI trailers. ✅
✅ All CI checks passing
19/19 jobs pass, including the new validate-pin job (22 s, run 27988327545). ✅
Two fixes to the translation-pin gate: - actions/checkout was on the mutable @v4 tag, violating the repo CI policy that every uses: ref be a full reviewed SHA. Pin to the same de0fac2 (v6.0.2) every other workflow uses. - The parity step extracted only `^msgid ` heads, dropping continuation lines for multi-line msgids (msgcat wraps long entries by default). On es.po that saw 7,946 lines instead of the 12,668 complete msgids, so a bad submodule commit could alter a multi-line source string and still compare equal. Canonicalize with msgcat --no-wrap so continuation lines collapse onto the msgid line before comparison, and include msgid_plural.
Addressed both blockers: actions/checkout is pinned to de0fac2 (v6.0.2), the same reviewed SHA every other workflow uses. The parity gate now canonicalizes with msgcat --no-wrap and compares complete msgids (incl. continuation lines and msgid_plural) instead of grep ^msgid heads; verified locally it sees 12,668 full msgids on es.po vs the old 7,946 and parity passes across all four catalogues. Re-requesting review.
Audacity88
left a comment
There was a problem hiding this comment.
Thanks for the quick fix. I re-reviewed current head dcd32a339f37eb940f3492487d49f276d4c1174f against my prior request-changes review, the live PR diff, the updated validate-translations-pin.yml, and public-safe gettext counts from the current pinned catalogues. The checkout pin issue is resolved, but the msgid parity gate still does not compare every complete source string.
✅ Resolved — Checkout is now pinned to a reviewed SHA
The new workflow now uses actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2, matching the repo's full-SHA workflow policy. That resolves the action-pinning blocker from my prior review.
🔴 Blocking — --no-wrap still leaves embedded-newline msgids invisible
The new extract_ids() is a good narrowing of the original bug, but it still only prints lines beginning with msgid or msgid_plural after msgcat --no-wrap --sort-output. --no-wrap stops gettext from wrapping long logical strings, but it cannot collapse msgids that contain real newline characters. Those still render as msgid "" plus continuation lines.
On the current es.po catalogue, msgcat --no-wrap --sort-output still produces 12,713 msgid entries, with 45 entries carrying continuation lines. The new extractor captures 12,668 unique msgid / msgid_plural lines, so those 45 embedded-newline source strings are still reduced to the same msgid "" head and their actual text is not compared. A bad submodule commit could change or delete one of those multi-line strings in one locale while another multi-line entry keeps msgid "" present, and this parity gate would still compare equal.
Please make the extractor accumulate the complete gettext entry content, including continuation lines until the corresponding msgstr, or use a gettext-aware parser/canonicalizer that emits the full logical msgid / msgid_plural values before sorting and diffing.
msgcat --no-wrap stops line wrapping but cannot collapse msgids that contain real newline characters; those 45 es.po entries still reduced to a bare msgid "" head, so a change to one of them compared equal. Replace the grep-head extractor with an awk state machine that accumulates every continuation string line of each msgid / msgid_plural entry into one NUL-delimited record (sort -z, cmp -s). Verified locally it now captures all 12,713 entries incl. the 45 multi-line ones, parity passes across all four catalogues, and a tampered continuation line is detected.
Addressed: the parity extractor is now a gettext-aware awk state machine that accumulates every continuation line of each msgid/msgid_plural entry into one NUL-delimited record (sort -z, cmp -s), so embedded-newline source strings are compared in full. Verified locally it captures all 12,713 es.po entries incl. the 45 multi-line ones, parity passes across all four catalogues, and a tampered continuation line is now detected. Re-requesting review.
Audacity88
left a comment
There was a problem hiding this comment.
Reviewed current head 3c4c671e2efa46dbf6e21acbd21d33a74592a827 against #8143 / #7184, my earlier request-changes review, the latest validate-translations-pin.yml diff, the docs/release guidance, and the live PR checks. I am approving because the two blockers from my prior review are now resolved. The new validate-pin workflow has passed on this head; the broader Quality Gate was still running when I checked, so that still needs to finish before merge.
✅ Resolved — Checkout is pinned to a reviewed SHA
The workflow still uses actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2, so the mutable-action blocker from my first review remains fixed.
✅ Resolved — The parity gate now keeps embedded-newline msgids visible
The latest extract_ids() no longer compares only msgid / msgid_plural header lines. It accumulates each gettext source field, including continuation string lines until the corresponding msgstr, emits NUL-delimited records for each msgid / msgid_plural source field, sorts those records with sort -z, and compares the resulting files with cmp -s.
That fixes the failure mode from my earlier review: entries that render as msgid "" because they contain real embedded newlines now carry their continuation text into the compared record, so a catalogue pin that changes one of those source strings should no longer compare equal by accident.
🟢 What looks good — The submodule boundary is documented and gated
The PR keeps the common Rust build/test/clippy loop submodule-free, mounts the gettext catalogues at the path mdBook already expects, documents the fresh-clone and existing-clone setup commands, and adds a focused main-repo gate for .gitmodules, the docs/book/po gitlink, and the validation workflow itself. That is the right shape for keeping translation-cache churn out of the main tree without making ordinary contributors initialize the submodule unless they are building or syncing translated docs.
Replace the hand-run six-command submodule dance (cargo mdbook sync,
git -C add/commit/push/tag/push, then bump-version.sh) with a single
scripted entry point so the maintainer never types a git -C invocation
or a version literal during a release.
scripts/release/refresh-translations.sh derives the version from
Cargo.toml (the same source bump-version.sh reads), runs the sync pass,
commits and pushes the catalogues to the zeroclaw-docs-translations
submodule, tags it v{version}, pushes the tag, then pins the main-repo
gitlink via bump-version.sh. Supports --no-translate and an explicit
version override; aborts if the tag already exists.
Make the bump-version.sh submodule pin fail closed: a missing v{version}
catalogue tag now aborts the release instead of silently pinning the
gitlink to a moving main, which defeated the reproducibility the tag
exists to guarantee. Drop the duplicate tag fetch.
Rewrite the release runbook Step 2 and the docs-and-translations release
workflow to the single ./scripts/release/refresh-translations.sh call:
no bash prefix, no git -C, no X.Y.Z placeholder. Add a short README note
on cloning with the docs submodule.
Separate the two release scripts by concern and document them as two
ordered steps instead of one script calling the other.
bump-version.sh runs first and owns version-reference sync only: it no
longer pins the docs submodule gitlink. Drop the pin block and correct
the stale comment that claimed the main-tree sweep touches .po files;
the catalogues live in the submodule and own their own version swaps.
refresh-translations.sh runs second and owns the catalogues end to end:
it reads the release version from Cargo.toml, initialises the submodule
if needed, runs the translation pass, commits and pushes the catalogues,
cuts and pushes the v{version} tag, and stages the main-repo gitlink
pinned to that tag. It no longer invokes bump-version.sh.
Rewrite runbook Step 2 and the docs-and-translations release workflow to
run bump-version.sh then refresh-translations.sh in that order, with the
reason refresh reads the version the bump set. Link the seven-step
overview list to its Step headings, fold the translations content into a
single procedural home in Step 2, and leave Step 7 a deploy-time pointer.
Add a README note on cloning with the docs submodule.
…bs#8169) Closes zeroclaw-labs#8143. - e4be90d feat(docs): move translation catalogues to git submodule - 8683b81 ci(docs): validate translation submodule at pinned commit - dcd32a3 fix(ci): pin checkout action and compare complete msgids in parity gate - 3c4c671 fix(ci): compare full gettext entries incl. embedded-newline msgids - 6a6918d feat(release): script docs translation refresh into one command - ab7d9f4 refactor(release): split translation refresh from version bump
Summary
master(all contributions).podocumentation catalogues out of the main tree into the new dedicated submodule zeroclaw-labs/zeroclaw-docs-translations, mounted atdocs/book/po, per RFC RFC: Move translated .ftl and .po files into a git submodule #7184 (.po-only scope). This drops ~208k generated lines from the main tree and decouples translation-cache churn from core PRs.book.tomlis unchanged: the submodule mounts at the gettext preprocessor's defaultpo/directory, socargo mdbook sync,cargo mdbook build, and the preprocessor all resolve catalogues with no path override.scripts/release/bump-version.shpins the submodule gitlink to the matchingv{version}tag at release time, falling back to the submodule'smainwith a warning when the tag is not yet cut. The in-catalogue version-literal sweep moves to the submodule's own release; the main-tree sweep no longer touches.pofiles..github/workflows/docs-deploy.ymlgainssubmodules: recursiveon its build checkout. Rustbuild/test/clippy/checkjobs are untouched and need no submodule.git submodule update --init docs/book/po) and the two-repo release flow (commit + tag the submodule, then bump pins the gitlink)..ftlcatalogues,FTL_CATALOGSincrates/zeroclaw-config/src/schema.rs,include_str!/load_ftl_from_disksites, orlocales.toml. The.ftlmove from RFC RFC: Move translated .ftl and .po files into a git submodule #7184 is explicitly excluded from this issue.messages.potand*.failures.logstay regenerated, gitignored artifacts in both repos.git submodule update --init docs/book/po; the common Rust dev loop is unaffected.docs-deployis the only CI job that checks out the submodule.docs,ci,domain:architecture,risk: medium,size: MValidation Evidence (required)
This is a docs + CI + release-script + repo-structure change with no Rust source edits, so the battery is the docs gate, a shell syntax check, a Rust compile to prove the dev loop stays submodule-free, and end-to-end clone tests.
cargo check --workspacewith the submodule fully deinited (docs/book/poempty) → clean, provingcargo build/test/clippydo not require the submodule.git clone --depth 1 --recurse-submodules --shallow-submodules) registered the submodule from.gitmodules(HTTPS URL) and checked out all four.pofiles at the pinned commit.zeroclaw-docs-translationsdirectly →es.po fr.po ja.po zh-CN.popresent.book.tomlneeds nopo-diroverride (defaultpo/== mount point).mdbook buildof the translated locales was not run locally (requires installingmdbook-i18n-helpers+gettexttoolchain); thedocs-deployjob exercises that path in CI with the newsubmodules: recursivecheckout, and theValidate .po formatstep there will fail loudly if the submodule is absent.Security & Privacy Impact (required)
git submodulefetch is a dev/CI-time clone of a public repo)Compatibility (required)
git submodule update --init docs/book/poonce before building the docs. The Rust dev loop needs no action. Fresh clones should usegit clone --recurse-submodules.Rollback (required for
risk: medium)git revert <merge-sha>, then restore the four.pofiles intodocs/book/pofrom the submodule's tagged commit and drop.gitmodules.docs-deploy"Validate .po format" step finds an emptydocs/book/po(no*.pomatched), orcargo mdbook buildrenders translated locales as empty because the submodule was not initialised.