Single source of truth for AI agents and contributors.
RAG Tech — biweekly tech podcast. Handle: @ragtechdev on Spotify · YouTube · Apple Podcasts · Instagram · TikTok · LinkedIn.
| Name | Role | Image |
|---|---|---|
| Natasha | Software Engineer | public/assets/team/natasha.PNG |
| Saloni | Software Developer | public/assets/team/saloni.PNG |
| Victoria | Solutions Engineer | public/assets/team/victoria.PNG |
All cohost images have transparent backgrounds.
- Config:
public/brand.json— being migrated tobrands/ragtech/brand.json(Phase 0.5) - Logo:
public/assets/logo/transparent-bg-logo.png - Font: Nunito (variable, loaded via
remotion/loadFonts.ts) - Mascot: Techybara (capybara) — PNGs in
public/assets/techybara/ - Intro/outro music:
public/sounds/intro-outro-music.mp3 - Background music:
public/sounds/jazz-cafe-music.mp3
Raw audio/video
↓ [sync] FFT cross-correlation → synced-output-{N}.mp4 (one per angle)
↓ [transcribe] Whisper.cpp → token-level timestamps → transcript.raw.json
↓ [diarize] Speaker turn detection → diarization.json
↓ [assign-speakers] Labels segments with speaker names
↓ [align] WhisperX forced alignment → refines t_dtw, populates t_end
↓ [edit-transcript] Merges phrases → sentences → transcript.doc.txt + transcript.json
Human edits doc (cuts, corrections, hooks, camera cues)
↓ [merge-doc] Applies doc edits → transcript.json
↓ [setup-camera] Face detection per angle → camera-profiles.json
↓ Remotion transcript.json + camera-profiles.json → composed video
Intermediate files: public/transcribe/output/. Synced video(s): public/sync/output/.
Entry point: scripts/wizard.js (60KB procedural — being replaced with typed DAG runner in Phase 2).
meta
videoSrc?: string path relative to /public (overrides composition src prop)
videoSrcs?: string[] all angle paths (multi-angle); used by setup-camera
videoStart?: number source seconds; segments before are excluded
videoEnd?: number source seconds; segments after are excluded
fps: 60
outputAspect?: "9:16" short-form only
segments[]
id, start, end source-video timestamps in seconds
speaker display name (e.g. "Natasha")
text human-readable sentence
cut: boolean true = entire segment removed
tokens[]
t_dtw: number word start time (WhisperX-aligned or Whisper t_dtw)
t_end?: number word end time (forced alignment only)
text: string
cut: boolean
cuts: TimeCut[] [{from, to}] intra-segment ranges to skip
synthetic?: boolean true when created by a > SPEAKER split in the doc; absent on raw segments
hook?: boolean when true, prepended as hook/teaser before main
hookFrom?, hookTo? clip bounds within the segment (seconds)
cameraCues[] explicit camera shot overrides (> CAM directives in doc)
token.t_end is populated only after forced alignment. Without it, deriveCuts falls back to CUT_START_BIAS heuristic.
{
"sourceWidth": 1920, "sourceHeight": 1080,
"outputWidth": 1920, "outputHeight": 1080,
"wideViewport": { "cx": 0.5, "cy": 0.5, "w": 1, "h": 1 },
"angles": {
"angle1": { "videoSrc": "sync/output/synced-output-1.mp4",
"sourceWidth": 1920, "sourceHeight": 1080 }
},
"speakers": {
"Natasha": {
"label": "Natasha",
"angleName": "angle1",
"closeupViewport": { "cx": 0.3, "cy": 0.4, "w": 0.35, "h": 0.35 },
"portraitCx": 0.3
}
}
}CropViewport: cx/cy = normalised centre (0–1), w/h = crop dimensions (0–1).
Full range [videoStart, lastSegment.end] plays continuously. Cuts are opt-in:
| Source | Mechanism |
|---|---|
| Entire segment removed | segment.cut = true |
| Intra-segment word/phrase | segment.cuts[] (from {curly braces} in doc) |
| Inter-segment silence | merge-doc:cut-pauses writes silence ranges into cuts[] |
No implicit cuts. Gaps between segments play as silence unless merge-doc:cut-pauses is run.
| ID | Component | Notes |
|---|---|---|
ragTechVodcast |
MyComposition |
Full episode: hooks → intro → main video |
PodcastIntro |
PodcastIntroComposition |
7 s intro (420 frames @ 60 fps) |
ShortFormClip |
ShortFormClip |
1080 × 1920 @ 60 fps portrait |
transcript.json
→ getActiveSegments() filter by meta.videoStart/videoEnd
→ buildSections() → {hookSections[], mainSections[]}
→ SegmentPlayer / CameraPlayer
→ SectionGroupPlayer OffthreadVideo + trimBefore per section
SegmentPlayer—buildSections,buildMainSubClips, jump-cut engine. At framef:sourceFrame = S(f)via summed section durations.CameraPlayer—buildCameraShots,sourceToOutputFrame, multi-angle viewport. Stacks oneSegmentPlayerper unique angle; active angleopacity:1, othersopacity:0, muted.SectionGroupPlayer— rendersOffthreadVideowithtrimBefore/trimAfterper section.HookOverlay— hook karaoke captions, Techybara mascot, hook timing.CaptionOverlay— short-form full-duration captions.OverlayRenderer— dispatchesGraphicsCue→ component; currently usesReact.FC<any>(fixed in Phase 5).
Viewport transform:
scale = max(outW / (srcW × vp.w), outH / (srcH × vp.h))
tx = (0.5 - vp.cx) × 100%
ty = (0.5 - vp.cy) × 100%
→ CSS: scale(${scale}) translate(${tx}%, ${ty}%)
Fixed —hookClipEnd()has 4 separate implementations (CameraPlayer,SegmentPlayer,Composition,HookOverlay) — can disagree by 1–3 frames.remotion/lib/hookTiming.tsis now the single source of truth; all consumers import from it.buildCaptions()duplicated acrossHookOverlayandCaptionOverlay. Fix:remotion/lib/captions.ts.- No
OverlayErrorBoundary— overlay crash kills the composition. - No transcript validation on load.
| Constant | Value | File |
|---|---|---|
PAUSE_THRESHOLD |
0.8 s | edit-transcript.js |
WORD_DURATION_ESTIMATE |
0.4 s | edit-transcript.js |
CUT_START_BIAS |
1.0 | edit-transcript.js |
HOOK_TAIL_PAD_UNBOUNDED_SECONDS |
0.16 s | remotion/lib/hookTiming.ts |
HOOK_TAIL_PAD_BOUNDED_SECONDS |
0.02 s | remotion/lib/hookTiming.ts |
HOOK_BRIDGE_MAX_GAP_SECONDS |
1.0 s | remotion/lib/hookTiming.ts |
HOOK_END_FADE_FRAMES |
12 | SegmentPlayer.tsx |
DECLICK_FRAMES |
3 | SegmentPlayer.tsx |
MIN_WIDE_S |
1.5 s | CameraPlayer.tsx |
MAX_CLOSEUP_S |
10 s | CameraPlayer.tsx |
PERIODIC_WIDE_S |
45 s | CameraPlayer.tsx |
CUTAWAY_INTERVAL_S |
20 s | CameraPlayer.tsx |
CUTAWAY_DURATION_S |
3 s | CameraPlayer.tsx |
CUTAWAY_WIDE_ANGLE |
'angle3' |
CameraPlayer.tsx |
| File | Purpose | Refactor note |
|---|---|---|
remotion/Composition.tsx |
Root composition, duration calc, asset loading | Add transcript validation (Phase 5) |
remotion/lib/hookTiming.ts |
hookClipEnd(), getHookSubClips(), buildHookSections() — single source of truth for all hook clip boundary calculations |
— |
remotion/components/SegmentPlayer.tsx |
Jump-cut player, section builders | Extract captions (Phase 5) |
remotion/components/CameraPlayer.tsx |
Camera shots, multi-angle viewport (779 lines) | Extract cameraShots lib → <350 lines (Phase 6) |
remotion/components/HookOverlay.tsx |
Hook captions, Techybara (518 lines) | Extract captions.ts (Phase 5) |
remotion/components/OverlayRenderer.tsx |
Graphics cue dispatcher; uses CORE_TEMPLATE_MAP + getBrandOverlays(brand.id) |
Remove remaining brand hardcoding (Phase 0.5 Steps 6–7) |
remotion/lib/brandRegistry.ts |
getBrandOverlays(brandId) — brand overlay registry; ragtech imports from current paths |
Switch to require() form after overlays move to brands/ragtech/components/ (Phase 0.5 Step 3) |
remotion/types/transcript.ts |
Segment, Token, TimeCut, Transcript |
Will import from scripts/types/ (Phase 6) |
remotion/types/camera.ts |
CameraProfiles, CameraShot, CropViewport |
Will import from scripts/types/ (Phase 6) |
remotion/types/brand.ts |
Brand design tokens + extended identity/hosts/mascot/audio; id: string field required by registry |
Move overlays + parameterize (Phase 0.5 Steps 3, 6–7) |
scripts/config/project.ts |
ProjectFile type, readProject/writeProject, ProjectNotFoundError; typed PipelineParams (timestamp_offset, diarization_seed, num_speakers, sync_window_seconds) |
Sprint 1 Issues #1, #3 |
scripts/config/metadata.js |
stampMetadata(artifact, cwd?) — prepends schema_version + tool_versions to any JSON artifact; reads tool versions from .ragtech/project.json if present |
Sprint 1 Issue #4 |
scripts/edit-transcript.js |
Sentence merging, deriveCuts, doc generation |
Migrate to .ts (Phase 3) |
scripts/sync/AudioSyncer.js |
FFT sync, syncMultiple |
Add FFT tie-breaking (Phase 0) |
scripts/wizard.js |
Interactive pipeline runner (60KB) | Replace with DAG runner (Phase 2) |
scripts/camera/setup-camera.js |
Frame extraction, face detection, angles.json |
|
app/camera/page.tsx |
Camera GUI (face box editor, angle tabs) | |
app/editor/page.tsx |
Transcript editor | Add PreviewPlayer + scroll-sync (Phase 8) |
app/editor/Timeline.tsx |
Timeline component (630 lines) | Decompose + add waveform (Phase 8) |
app/components/AutoCarouselForm.tsx |
Carousel generator (810 lines) | Decompose (Phase 8) |
app/context/AuthContext.tsx |
Auth context with hardcoded credentials | Fix (Phase 7) |
vscode-transcript-language/src/extension.js |
VSCode transcript extension | Add Wrap-in-cut command (Phase 0.5) |
Entry: npm run shorts:wizard. Two paths:
Path A — clip from longform: public/edit/transcript.json must exist. User selects time range; wizard creates clips without re-running sync/transcription.
Path B — dedicated portrait recording: Own sync/transcribe/align pipeline rooted at public/shorts/.
Output: public/shorts/short-{id}/transcript.json with meta.outputAspect: "9:16", meta.videoStart/videoEnd, meta.parentTranscript (Path A only). Segments keep absolute timestamps from source.
Scripts: scripts/shorts/extract-short-doc.js, scripts/shorts/merge-short-doc.js, scripts/shorts/portrait-camera-setup.js.
Master doc: docs/PRODUCTION_REFACTOR_PLAN.md — read in full before starting any phase.
Core problems being fixed: non-deterministic output, no project file, no pipeline DAG, no type safety across scripts, hookClipEnd() bug in 4 files, brand content hardcoded in TypeScript, no GPU acceleration.
| Phase | Branch | Goal |
|---|---|---|
| 0 | refactor/p0-project-file |
.ragtech/project.json, deterministic runs, content-addressed artifacts |
| 0.5 | refactor/p0-brand |
brands/ragtech/ directory, extended Brand type, brand abstraction |
| 1 | refactor/p1-hardware |
scripts/config/hardware.ts, GPU-accelerated FFmpeg encode/decode |
| 2 | refactor/p2-dag |
Typed pipeline DAG replaces wizard.js; AI hook suggestions; captions export |
| 3 | refactor/p3-scripts-ts |
Migrate 53 .js scripts → .ts strict |
| 4 | refactor/p4-scripts-tests |
Unit tests, ≥60% coverage on scripts/**/*.ts |
| 5 | refactor/p5-remotion-correctness |
Fix hook timing bug, add error boundaries, transcript validation |
| 6 | refactor/p6-remotion-arch |
BrandContext, extract cameraShots.ts, merge duplicate components |
| 7 | refactor/p7-app-api |
Remove hardcoded credentials, withErrorHandler middleware |
| 8 | refactor/p8-app-components |
PreviewPlayer, waveform, scroll-sync, decompose large components |
| 9 | refactor/p9-polish |
ESLint no-explicit-any, dead code removal |
Phases 0–4 (scripts) and 5–6 (Remotion) can proceed on separate branches in parallel.
.gitignore rule: Every directory under .ragtech/ (and any other runtime-generated directory such as runs/, cache/, output artifact dirs) must be listed in .gitignore. These hold generated/binary data, not source code, and must never be committed.
.ragtech/
project.json episode metadata, tool versions, run parameters
artifacts/{sha256}.mp4 content-addressed artifact store
runs/{timestamp}/ run logs per pipeline stage
brands/
ragtech/
brand.json extended Brand config (identity, hosts, mascot, audio)
assets/ team/, logo/, techybara/, episodes/
sounds/
components/ brand-specific overlays (AIOverlay, RagtechOverlay, etc.)
scripts/
types/ shared TS interfaces (imported by scripts + remotion)
config/ project.ts, hardware.ts, paths.ts, parseArgs.ts, artifacts.ts
pipeline/ dag.ts, runner.ts, nodes/{sync,transcribe,...}.ts
remotion/
lib/ hookTiming.ts, captions.ts, cameraShots.ts, constants.ts
context/ BrandContext.tsx
types/overlayProps.ts discriminated union for all overlay prop types
Standards doc: docs/TESTING_STANDARDS.md — read before writing any test.
Three test types, three runners:
| Type | Location | Runner | Command |
|---|---|---|---|
| Unit (scripts/pipeline) | scripts/**/*.test.ts |
Jest node project |
npm test |
| Unit (React components) | app/**/*.test.tsx, remotion/**/*.test.tsx |
Jest react project |
npm run test:react |
| Integration | tests/integration/**/*.test.ts |
Jest node project |
npm run test:integration |
| E2E | e2e/**/*.test.ts |
Playwright | npm run test:e2e |
First-time Playwright setup: npx playwright install chromium
Every new pure function gets a unit test. Every new React component gets a smoke render test. E2E tests are required for Phase 8 user-facing features. Coverage thresholds are defined per phase in the standards doc.
Key placement rule: If a test uses os.tmpdir(), fs.mkdtempSync, real file reads/writes, or spawns a process, it is an integration test — place it in tests/integration/, not next to the source file in scripts/. Unit tests in scripts/**/*.test.ts must have no real I/O.
For any multi-step task:
- Write an implementation doc in
docs/implementation-guides/before starting - Each step must have a Status check — a single command or file-existence test that confirms it is done
- One isolated commit per step; commit message slug must match the doc heading exactly
- Never combine steps — isolation enables partial recovery
Resuming interrupted work:
git log --oneline— see completed commits- Open the relevant
docs/implementation-guides/file - Match the last commit slug against doc headings → continue from next unstarted step
Per-PR checklist:
- Behaviour parity — smoke-test relevant
npm runscripts or Remotion compositions npm run testpassestsc --noEmitpasses (where applicable)- Scope discipline — only files listed in the implementation doc touched
- No new hardcoded paths in
scripts/; no new duplicated timing constants inremotion/ - (Remotion phases) —
remotion studiolaunches; frame comparison againstdocs/render-baselines/ - Type shapes match spec — if a type is defined in
docs/PRODUCTION_REFACTOR_PLAN.md, the implementation must use the exact field names, types, and required/optional status from the spec. Downstream phase steps depend on specific field names by reference. If the spec must change, update it first and get agreement before diverging in code. - New runtime fields must update the TypeScript type — any field added to a JSON artifact (e.g.,
transcript.json,camera-profiles.json,.ragtech/project.json) by a.jsscript must also be declared in the corresponding TypeScript type inremotion/types/orscripts/types/. Optional fields (present only on some objects) use?: type. Also add a one-line entry to the schema table in this file under the relevant artifact heading.