Wenyan Book Video

Pipeline that converts the Wenyan Book (《文言陰符》) into narrated video chapters with generated audio, transcripts, translations, and Remotion compositions.

Repository Structure

| Path | Purpose |
| --- | --- |
| `book/` | Git submodule tracking wenyan-lang/book; keep it synced instead of editing files directly. |
| `processor/` | Python (uv-managed) tools that parse chapters, segment sentences, translate, transcribe, and synthesize audio. |
| `renderer/` | Remotion project plus scripts that turn processor output into React components and videos. |
| `renderer/public/**` | Generated JSON, transcripts, and audio assets that the renderer consumes. |
| `renderer/src/generated/` | Segment metadata emitted by `bun run scripts/generate-segments.ts`. |
| `transcription-utils/` | Helpers shared by multiple processor scripts. |
| `uploader/` | Python tools to upload generated videos to YouTube with metadata and thumbnails. |

Large generated trees such as renderer/public/audios/, renderer/public/transcripts/build/, and renderer/src/generated/segments-*.ts are reproducible. Regenerate them via the pipeline instead of editing them by hand.

End-to-End Pipeline

flowchart LR
    subgraph Source
        BMD[book/*.md\nWenyan Book text]
    end

    subgraph Processor
        PM[parse-markdown.py → renderer/public/chapters]
        BS[build-sentences.py → renderer/public/sentences]
        SG[segment-text.py → renderer/public/segments]
        TR[translate.py → renderer/public/translations]
        TS[transcribe.py → renderer/public/transcripts]
        BS2[build-segmented-transcripts.py → renderer/public/transcripts/build]
        SY[synthesize.py → renderer/public/audios]
        VC[voice-change.py → renderer/public/audios/female]
        TT[transcribe-titles.py → renderer/public/transcripts/audio-n.txt]
        TA[synthesize-titles.py → renderer/public/audios/audio-n.mp3]
        
        subgraph Utils
            FG[fill-segment-gaps.py]
            RS[reconstruct_segment_transcripts.py]
        end
    end

    subgraph Renderer
        GS[bun run scripts/generate-segments.ts → renderer/src/generated/segments-*.ts]
        RV[bun run remotion render → final videos]
    end

    subgraph Uploader
        UP[python uploader/upload.py → YouTube]
    end

    BMD --> PM --> BS --> SG --> TR --> TS --> BS2 --> SY --> VC --> GS --> RV --> UP
    PM --> TT --> TA --> GS
    TS -.-> RS -.-> BS2
    SG -.-> FG -.-> SG

Consult processor/README.md for script-by-script details.
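
The stage ordering in the flowchart can also be checked mechanically. The following is a minimal sketch and not part of the repository: it encodes only the solid arrows of the diagram (the dashed helper edges through fill-segment-gaps.py and reconstruct_segment_transcripts.py are left out, since SG -.-> FG -.-> SG forms a loop) and uses Python's standard graphlib module to derive one valid run order. The names STAGE_DEPS and run_order are illustrative.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it depends on (solid flowchart arrows only).
STAGE_DEPS = {
    "parse-markdown.py": [],
    "build-sentences.py": ["parse-markdown.py"],
    "segment-text.py": ["build-sentences.py"],
    "translate.py": ["segment-text.py"],
    "transcribe.py": ["translate.py"],
    "build-segmented-transcripts.py": ["transcribe.py"],
    "synthesize.py": ["build-segmented-transcripts.py"],
    "voice-change.py": ["synthesize.py"],
    "transcribe-titles.py": ["parse-markdown.py"],
    "synthesize-titles.py": ["transcribe-titles.py"],
    "generate-segments.ts": ["voice-change.py", "synthesize-titles.py"],
    "remotion render": ["generate-segments.ts"],
    "upload.py": ["remotion render"],
}


def run_order(deps=STAGE_DEPS):
    """Return one valid execution order; raises graphlib.CycleError on a loop."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, stage in enumerate(run_order(), start=1):
        print(f"{step:2d}. {stage}")
```

Because the title branch (transcribe-titles.py, synthesize-titles.py) only depends on parse-markdown.py, its position relative to the main branch may vary between valid orders; the only guarantee is that both branches finish before generate-segments.ts.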

Getting Started

Clone & Install

git clone <repo-url>
cd wenyan-book-video
git submodule update --init --recursive
bun install

Always use bun (bun install, bun run …) for JavaScript/TypeScript workspaces.
Python tooling inside processor/ and uploader/ is managed by uv. Run scripts with uv run … so imports resolve against the package.

Development Environments

Nix Flake shell (recommended)

nix develop          # or allow direnv and just `cd` into the repo

The flake provides Python 3.13, uv, Bun, Node 20, ffmpeg, sox, espeak-ng, git, and other CLI dependencies. Python packages are synced automatically via uv sync when entering the shell.

Manual setup

Install Bun and run bun install at the repo root.
Install uv (pip install uv or via package manager) and run uv sync inside processor/ and uploader/.
Ensure system packages ffmpeg, sox, and espeak-ng are available for the audio pipeline.
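
For a manual setup, a quick way to confirm the tools above are reachable is to look them up on PATH. This is a small sketch under the assumption that each tool is installed under its usual executable name; the helper missing_tools is hypothetical and not shipped with the repository.

```python
import shutil

# Tools named in the setup instructions above.
REQUIRED_TOOLS = ("bun", "uv", "ffmpeg", "sox", "espeak-ng")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The which parameter exists only so the lookup can be swapped out in tests; in normal use the default shutil.which is enough.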

Working the Pipeline

1. Prepare chapter data
   - Run the processor scripts in order: parse-markdown.py, build-sentences.py, segment-text.py, translate.py, transcribe.py, build-segmented-transcripts.py.
   - Invoke each with uv run, e.g. cd processor && uv run python segment-text.py. Pass -c <chapter> to build-segmented-transcripts.py to limit its scope.
   - Titles follow transcribe-titles.py → synthesize-titles.py.
   - Helper scripts: fill-segment-gaps.py fills missing segments; reconstruct_segment_transcripts.py rebuilds transcripts from segments.
2. Synthesize audio
   - uv run python synthesize.py produces raw TTS chunks under renderer/public/audios/. Pass -c <chapter> to limit it to one chapter.
   - uv run python voice-change.py transforms them (e.g., female timbre) into renderer/public/audios/female/.
3. Generate renderer segments
   - From renderer/, run bun run scripts/generate-segments.ts. This snapshots the current processor artifacts into renderer/src/generated/segments-*.ts, which the Remotion components consume.
4. Render videos
   - Still in renderer/, execute bun run remotion render. Confirm beforehand that the renderer/src/generated/segments-*.ts files and the audio/transcript assets exist.
5. Upload to YouTube
   - Configure uploader/config.toml and uploader/client_secrets.json.
   - Upload the rendered video with uv run uploader/upload.py <chapter_id>.
   - The uploader supports thumbnails (uploader/thumbnails/{id}.png) and automatic playlist addition.
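
The processor part of these steps can be sketched as a small driver that builds one uv run command per script. This is an illustration only, not the repository's own orchestration (the dev shell's main command may work differently). It assumes the scripts are run from processor/, passes -c only to the two scripts documented above as accepting it, and treats the chapter value format as unknown. The names build_commands and run_all are hypothetical.

```python
import subprocess

# Processor stages in the order listed above.
PROCESSOR_STAGES = [
    "parse-markdown.py",
    "build-sentences.py",
    "segment-text.py",
    "translate.py",
    "transcribe.py",
    "build-segmented-transcripts.py",
    "synthesize.py",
    "voice-change.py",
]

# Only these scripts are documented above as accepting -c <chapter>.
CHAPTER_AWARE = {"build-segmented-transcripts.py", "synthesize.py"}


def build_commands(chapter=None):
    """Build one `uv run python <script>` argv list per stage."""
    commands = []
    for script in PROCESSOR_STAGES:
        argv = ["uv", "run", "python", script]
        if chapter is not None and script in CHAPTER_AWARE:
            argv += ["-c", str(chapter)]
        commands.append(argv)
    return commands


def run_all(chapter=None, dry_run=True, cwd="processor"):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_commands(chapter):
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the run at the first failing stage.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_all(chapter=1)  # dry run: prints the commands without executing them
```

Defaulting to a dry run keeps the sketch safe to try: translation, transcription, and synthesis may be slow or call external services, so it is worth reading the printed commands before running them for real.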

Handy Commands

segment-text – launches marimo for processor/segment-text.py with live reload.
voice-change – launches marimo for processor/voice-change.py.
main – convenience entry point defined in the dev shell for orchestrating the processor pipeline.

All three are available inside nix develop (or via direnv).

Submodule maintenance

Update the Wenyan text submodule without touching its tracked files directly:

git submodule update --remote book
git add book
git commit -m "Update Wenyan book submodule"

Troubleshooting

Missing Python deps – cd processor && uv sync (or cd uploader && uv sync) rehydrates from uv.lock.
C library/FFmpeg issues – make sure you are inside nix develop or have the binaries installed locally.
Regenerating assets – delete problematic items under renderer/public/audios/, renderer/public/transcripts/build/, or renderer/src/generated/ and rerun the relevant scripts instead of editing the outputs.
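
After deleting outputs, it can help to see which generated trees are now empty before rerunning scripts or rendering. The following is a minimal sketch that only counts files and never deletes anything. It assumes it is run from the repository root, and the helper names count_generated and empty_groups are hypothetical.

```python
from pathlib import Path

# Glob patterns for the reproducible outputs named above, relative to the repo root.
GENERATED_PATTERNS = {
    "audio": "renderer/public/audios/**/*",
    "segmented transcripts": "renderer/public/transcripts/build/**/*",
    "renderer segments": "renderer/src/generated/segments-*.ts",
}


def count_generated(root=".", patterns=GENERATED_PATTERNS):
    """Count regular files matching each pattern under root."""
    base = Path(root)
    return {
        label: sum(1 for p in base.glob(pattern) if p.is_file())
        for label, pattern in patterns.items()
    }


def empty_groups(root=".", patterns=GENERATED_PATTERNS):
    """Return labels of asset groups with no files (candidates for regeneration)."""
    return [label for label, n in count_generated(root, patterns).items() if n == 0]


if __name__ == "__main__":
    for label, n in count_generated().items():
        print(f"{label}: {n} file(s)")
```

An empty group points at the stage to rerun: synthesize.py and voice-change.py for audio, build-segmented-transcripts.py for segmented transcripts, and bun run scripts/generate-segments.ts for renderer segments.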

License

This project is released under the MIT License.
