Tα»± Δα»ng tαΊ‘o video tin tα»©c ngαΊ―n 9:16 (~60s) tiαΊΏng Viα»t cho TikTok / YouTube Shorts / Instagram Reels tα»« URL bΓ i bΓ‘o hoαΊ·c file `.txt`.

Auto-generate Vietnamese 9:16 short news videos (~60s) for TikTok / YouTube Shorts / Instagram Reels from a news URL or `.txt` file.
Auto News Video lΓ mα»t dα»± Γ‘n mΓ£ nguα»n mα» giΓΊp bαΊ‘n biαΊΏn bαΊ₯t kα»³ bΓ i bΓ‘o cΓ΄ng nghα» tiαΊΏng Viα»t nΓ o thΓ nh mα»t video ngαΊ―n motion-graphic chuyΓͺn nghiα»p chα» vα»i 1 lα»nh duy nhαΊ₯t trong Claude Code.
Pipeline tα»± Δα»ng lΓ m cΓ‘c bΖ°α»c:
- Δα»c URL bΓ i bΓ‘o (hoαΊ·c file `.txt`) vΓ phΓ’n tΓch nα»i dung
- Sinh kα»ch bαΊ£n JSON vα»i 6 loαΊ‘i template visual khΓ‘c nhau (hook, comparison, stat-hero, feature-list, callout, outro) β chα»n theo nα»i dung bΓ i viαΊΏt
- Tα»ng hợp giα»ng Δα»c tiαΊΏng Viα»t qua LucyLab hoαΊ·c ElevenLabs
- Render video MP4 vα»i HyperFrames (Puppeteer + GSAP + FFmpeg) β phong cΓ‘ch studio shell + animation hiα»n ΔαΊ‘i
- XuαΊ₯t kΓ¨m script.txt vΓ voice.mp3 Δα» bαΊ‘n import vΓ o CapCut Pro thΓͺm caption / nhαΊ‘c nα»n
- β Phong cΓ‘ch HeyGen-quality: persistent brand shell (icon, channel, handle), grain texture, gradient navy + cyan + purple
- β 6 loαΊ‘i scene template tα»± pick theo nα»i dung β khΓ΄ng rαΊp khuΓ΄n
- β Δa nhΓ cung cαΊ₯p TTS: LucyLab (giα»ng Viα»t tα»± nhiΓͺn + SRT free) hoαΊ·c ElevenLabs (Δa ngΓ΄n ngα»―, nhiα»u voice library)
- β
TΓch hợp Claude Code skill β chα» cαΊ§n `/create-news-video <url>` lΓ xong
- β Mα» rα»ng Δược: schema rΓ΅ rΓ ng, code modular, cΓ³ test suite
| Lα»p | CΓ΄ng nghα» |
|---|---|
| Runtime | Node.js β₯ 22, TypeScript 5+, ESM |
| Render engine | HyperFrames (Puppeteer + GSAP + FFmpeg) |
| TTS providers | LucyLab.io (JSON-RPC, Vietnamese cloning) hoαΊ·c ElevenLabs (REST, multilingual) |
| Validation | Zod (discriminated union schema) |
| HTTP | axios + nock (mocking) |
| Testing | Vitest |
| Audio | FFmpeg + ffprobe (mix, concat vα»i silence) |
| AI/Skill | Claude Code skill (/create-news-video) |
| Visual blocks | HyperFrames registry: grain-overlay, shimmer-sweep, tiktok-follow |
| Fonts | Inter + Anton (Google Fonts) |
HyperFrames lΓ framework HTML-to-video do HeyGen phΓ‘t triα»n vΓ mΓ£ nguα»n mα». KhΓ‘c vα»i cΓ‘ch dΓΉng After Effects hay Premiere thα»§ cΓ΄ng, HyperFrames cho phΓ©p bαΊ‘n viαΊΏt video bαΊ±ng HTML/CSS/JS rα»i render thΓ nh MP4 chαΊ₯t lượng cao mα»t cΓ‘ch deterministic (cΓΉng input β cΓΉng output frame-by-frame).
CΓ‘ch nΓ³ hoαΊ‘t Δα»ng trong dα»± Γ‘n:
- Pipeline sinh ra mα»t file `index.html` chα»©a toΓ n bα» scenes + GSAP timeline
- HyperFrames spawn headless Chrome (Puppeteer) Δα» load file ΔΓ³
- Capture tα»«ng frame α» ΔΓΊng timestamp (30fps Γ 60s = 1800 frames)
- Encode tαΊ₯t cαΊ£ frames + audio thΓ nh MP4 dΓΉng FFmpeg
TαΊ‘i sao chα»n HyperFrames?
- β
CΓ³ sαΊ΅n 50+ pre-built blocks trong registry (transitions, social cards, data viz, kinetic typography...) β dΓΉng `npx hyperframes add <name>`
- β GSAP timeline ΔΓ£ Δược tΓch hợp sαΊ΅n cho animations mượt mΓ 
- β Skill-friendly cho AI agent β Claude/GPT cΓ³ thα» tα»± sinh composition HTML
- β
Lint built-in (`npx hyperframes lint`) phΓ‘t hiα»n lα»i composition trΖ°α»c khi render
- β Aspect ratio 9:16 native β sinh ra cho short-form video
CΓ‘c blocks/components dΓΉng trong dα»± Γ‘n:
- `grain-overlay` β film grain texture xuyΓͺn video (cαΊ£m giΓ‘c "analog warmth")
- `shimmer-sweep` β light pass animation cho text headline
- `tiktok-follow` β outro CTA card (ΔΓ£ sαΊ΅n 1080Γ1920)
| TiΓͺu chΓ | LucyLab | ElevenLabs |
|---|---|---|
| Giα»ng tiαΊΏng Viα»t | βββββ Tα»± nhiΓͺn (voice cloning) | ββββ Tα»t (multilingual) |
| Chi phΓ | RαΊ» (~25k VND / 1M kΓ½ tα»±) | ΔαΊ―t hΖ‘n (~$5 / 30k kΓ½ tα»±) |
| Voice library | Tα»± clone giα»ng | 1000+ voices cΓ³ sαΊ΅n |
| API style | JSON-RPC async (poll) | REST sync (instant) |
| SRT subtitle | β Free, kΓ¨m theo response | β KhΓ΄ng cΓ³ |
| Concurrency | 1 export/account | Parallel OK |
| Languages khΓ‘c | β Chα» tiαΊΏng Viα»t | β 30+ ngΓ΄n ngα»― |
KhuyαΊΏn nghα»:
- π»π³ Chα» lΓ m video tiαΊΏng Viα»t β chα»n LucyLab (rαΊ» + giα»ng tα»± nhiΓͺn + cΓ³ SRT)
- π LΓ m Δa ngΓ΄n ngα»― hoαΊ·c cαΊ§n voice library lα»n β chα»n ElevenLabs
- π KhΓ΄ng chαΊ―c β bαΊ―t ΔαΊ§u vα»i LucyLab, Δα»i sang ElevenLabs sau (chα» cαΊ§n Δα»i `TTS_PROVIDER` trong `.env.local`)
Zod lΓ TypeScript-first schema library. Trong project nΓ y, Zod ΔαΊ£m bαΊ£o script.json (do Claude sinh) luΓ΄n ΔΓΊng cαΊ₯u trΓΊc trΖ°α»c khi pipeline chαΊ‘y.
```ts
// Discriminated union: 6 loαΊ‘i template, mα»i loαΊ‘i cΓ³ data shape khΓ‘c nhau
const TemplateData = z.discriminatedUnion("template", [
  HookData, ComparisonData, StatHeroData, FeatureListData, CalloutData, OutroData
]);
```

Lợi Γch:
- PhΓ‘t hiα»n ngay nαΊΏu Claude sinh script sai (vd: `template: "stat"` khΓ΄ng tα»n tαΊ‘i) β fail Step 1 vα»i error message rΓ΅ rΓ ng
- TypeScript types Δược suy ra tα»± Δα»ng tα»« Zod schema β composer khΓ΄ng cαΊ§n khai bΓ‘o type lαΊ‘i
- Schema = source of truth cho cαΊ£ validation runtime + type compile-time
Project tΓch hợp vα»i Claude Code qua skill markdown ΔαΊ·t tαΊ‘i .claude/skills/create-news-video/SKILL.md. Skill nΓ y hΖ°α»ng dαΊ«n Claude:
- WebFetch URL bΓ i bΓ‘o
- PhΓ’n tΓch nα»i dung tiαΊΏng Viα»t
- Pick template phΓΉ hợp cho tα»«ng scene (comparison nαΊΏu cΓ³ "vs", stat-hero nαΊΏu cΓ³ sα» liα»u...)
- Sinh `script.json` ΔΓΊng schema
- Run pipeline qua Bash
Ζ―u Δiα»m: bαΊ‘n chα» cαΊ§n gΓ΅ /create-news-video <url> β Claude tα»± lΓ m hαΊΏt phαΊ§n "creative" (viαΊΏt kα»ch bαΊ£n tiαΊΏng Viα»t, chα»n template, viαΊΏt cΓ’u hook hαΊ₯p dαΊ«n). PhαΊ§n "deterministic" (gα»i API, render) do Node CLI lo.
Vitest (ESM-native, replacement cho Jest) cho 35 unit tests:
- Schema validation tests vα»i fixtures (valid + invalid scripts)
- TTS client tests vα»i `nock` mock HTTP (khΓ΄ng gα»i API thαΊ­t, khΓ΄ng tα»n quota)
- Audio tools tests vα»i fixture mp3 files (440Hz/2s, 880Hz/3s sine waves)
- HTML composer snapshot test β ΔαΊ£m bαΊ£o output HTML khΓ΄ng bα» break khi refactor
ChαΊ‘y `npm test` Δα» verify mα»i thα»© work trΖ°α»c khi push.
FFmpeg + ffprobe Δược dΓΉng Δα»:

- `ffprobe`: Δo duration cα»§a mp3 tα»«ng scene (Δα» compute timing trong composition)
- `ffmpeg`: concat cΓ‘c scene mp3 vα»i 0.3s silence gap β `voice.mp3` cuα»i
- `ffmpeg` (qua HyperFrames): encode 1800 frame PNG + audio thΓ nh MP4

CαΊ£ hai phαΊ£i cΓ³ trong PATH (`ffmpeg -version`). TrΓͺn Windows: `winget install Gyan.FFmpeg`.
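PhαΊ§n tΓ­nh timing cΓ³ thα» phΓ‘c thαΊ£o Δα»c lαΊ­p vα»i ffprobe nhΖ° sau (tΓͺn hΓ m chα» Δα» minh hα»a, khΓ΄ng phαΊ£i API thαΊ­t cα»§a project): tα»« duration mα»i scene (nhΖ° ffprobe trαΊ£ vα») vΓ  khoαΊ£ng lαΊ·ng 0.3s, tΓ­nh offset bαΊ―t ΔαΊ§u cα»§a tα»«ng scene trong composition:

```typescript
// PhΓ‘c thαΊ£o: tΓ­nh offset bαΊ―t ΔαΊ§u / kαΊΏt thΓΊc cα»§a tα»«ng scene tα»« duration ΔΓ£ Δo.
// GAP_S = 0.3 khα»p vα»i khoαΊ£ng lαΊ·ng 0.3s chΓ¨n giα»―a cΓ‘c file mp3.
const GAP_S = 0.3;

interface SceneTiming { id: string; start: number; end: number }

function computeTimeline(durations: { id: string; seconds: number }[]): SceneTiming[] {
  const out: SceneTiming[] = [];
  let cursor = 0;
  for (const d of durations) {
    out.push({ id: d.id, start: cursor, end: cursor + d.seconds });
    cursor += d.seconds + GAP_S; // scene sau bαΊ―t ΔαΊ§u sau khoαΊ£ng lαΊ·ng
  }
  return out;
}

// VΓ­ dα»₯ vα»i duration cα»§a 2 fixture trong test suite (2s vΓ  3s):
const tl = computeTimeline([
  { id: "scene-hook", seconds: 2 },
  { id: "scene-body-1", seconds: 3 },
]);
// scene-body-1 bαΊ―t ΔαΊ§u tαΊ‘i ~2.3s (sau 2s voice + 0.3s lαΊ·ng)
```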
| Mα»₯c | PhiΓͺn bαΊ£n | Ghi chΓΊ |
|---|---|---|
| Node.js | β₯ 22 | node --version |
| FFmpeg + ffprobe | bαΊ₯t kα»³ phiΓͺn bαΊ£n hiα»n ΔαΊ‘i nΓ o | trong PATH (ffmpeg -version) |
| Chrome / Chromium | bαΊ₯t kα»³ | HyperFrames Puppeteer cαΊ§n β sαΊ½ auto-download lαΊ§n ΔαΊ§u chαΊ‘y |
| Claude Code CLI | latest | cΓ i tαΊ‘i ΔΓ’y |
| TΓ i khoαΊ£n TTS | mα»t trong hai | LucyLab.io HOαΊΆC ElevenLabs |
```bash
# 1. Clone repo
git clone <repo-url> auto_create_video
cd auto_create_video

# 2. CΓ i dependencies
npm install

# 3. TαΊ‘o file env vΓ  Δiα»n API key
cp .env.example .env.local
# β mα» .env.local, chα»n TTS provider (lucylab hoαΊ·c elevenlabs) vΓ  Δiα»n key

# 4. Verify cΓ i ΔαΊ·t
node --version   # β₯ 22
ffmpeg -version  # in version OK
ffprobe -version
npm test         # all tests pass (35 tests)
```

Mα» `.env.local` vΓ  chα»n mα»t trong hai provider:
- ΔΔng kΓ½ tαΊ‘i https://lucylab.io
- LαΊ₯y API key + voice ID (UUID 22 kΓ½ tα»±)
- ΔαΊ·t `TTS_PROVIDER=lucylab`
- β Ζ―u Δiα»m: giα»ng Viα»t tα»± nhiΓͺn (voice cloning), trαΊ£ kΓ¨m file SRT subtitle miα»n phΓ­
- β οΈ HαΊ‘n chαΊΏ: chα» 1 export Δα»ng thα»i mα»i account (pipeline tα»± xα» lΓ½)
```
TTS_PROVIDER=lucylab
VIETNAMESE_API_KEY=sk_live_xxxxxxxxxxxxxxxxxxxx
VIETNAMESE_VOICEID=22charvoiceiduuidhere
```

- ΔΔng kΓ½ tαΊ‘i https://elevenlabs.io
- LαΊ₯y API key tαΊ‘i https://elevenlabs.io/app/settings/api-keys
- Chα»n voice tαΊ‘i https://elevenlabs.io/app/voice-library (lαΊ₯y voice ID)
- ΔαΊ·t `TTS_PROVIDER=elevenlabs`
- β Ζ―u Δiα»m: Δa ngΓ΄n ngα»―, thΖ° viα»n voice phong phΓΊ, chαΊ₯t lượng cao
- β οΈ HαΊ‘n chαΊΏ: ΔαΊ―t hΖ‘n LucyLab, khΓ΄ng cΓ³ SRT Δi kΓ¨m
```
TTS_PROVIDER=elevenlabs
ELEVENLABS_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ELEVENLABS_VOICE_ID=EXAVITQu4vr4xnSDxMaL
ELEVENLABS_MODEL_ID=eleven_multilingual_v2
```

Mα»i video tα»± Δα»ng kαΊΏt thΓΊc vα»i mα»t TikTok follow card (slide tα»« dΖ°α»i lΓͺn + animation click follow) β chuαΊ©n HyperFrames style. TαΊ₯t cαΊ£ lΓ  tΓΉy chα»n β defaults work out of the box:
```
TIKTOK_DISPLAY_NAME=CΓ΄ng nghα» 24h
TIKTOK_HANDLE=@congnghe24h
TIKTOK_FOLLOWERS=1.2M followers

# TΓΉy chα»n: URL αΊ£nh avatar TikTok thαΊ­t cα»§a bαΊ‘n (jpg/png, vuΓ΄ng, β₯256x256)
# NαΊΏu khΓ΄ng set β dΓΉng default `assets/avatar.jpg` ΔΓ£ bundled
TIKTOK_AVATAR_URL=https://example.com/your-avatar.jpg
```

CΓ‘ch thay avatar:
- CΓ‘ch 1 (ΔΖ‘n giαΊ£n): thay file `assets/avatar.jpg` bαΊ±ng αΊ£nh cα»§a bαΊ‘n (square, ~256x256+)
- CΓ‘ch 2 (URL): set `TIKTOK_AVATAR_URL` trong `.env.local` β pipeline tα»± download mα»i lαΊ§n render
Card xuαΊ₯t hiα»n α» giΓ’y thα»© ~1.6 trong scene outro vα»i chuα»i animation:
- Card slide tα»« bottom lΓͺn (0.5s)
- Hold ~0.9s Δα» ngΖ°α»i xem Δα»c
- Button "Follow" press-in + chuyα»n sang "Following β" vα»i mΓ u chuyα»n tα»« Δα» β xΓ‘m Δen
- Card stay visible ΔαΊΏn hαΊΏt video
Mα»i video tα»± Δα»ng cΓ³ sound effect mix layer vΓ o voice (volume thαΊ₯p, khΓ΄ng lαΊ₯n Γ‘t voice). KhΓ΄ng phαΊ£i random β pipeline pick theo loαΊ‘i template:

| Template | Default SFX | Khi nΓ o nghe |
|---|---|---|
| `hook` | `transition/whoosh-soft` | ΔαΊ§u video, entrance dramatic |
| `comparison` | `transition/swoosh` | Khi 2 cards xuαΊ₯t hiα»n |
| `stat-hero` | `emphasis/ding` | LΓΊc sα»/% xuαΊ₯t hiα»n |
| `feature-list` | `transition/pop` | Mα»i bullet appear |
| `callout` | `alert/notification` | Statement quan trα»ng |
| `outro` | `outro/tada` | Ending signature |
Library sounds ΔΓ£ sαΊ΅n trong `assets/sfx/` (download tα»« myinstants.com, royalty-free use):

```
assets/sfx/
βββ transition/  (whoosh-soft, swoosh, pop)
βββ emphasis/    (ding, tick, chime)
βββ alert/       (notification)
βββ outro/       (tada)
```
Tα»± thΓͺm SFX cα»§a bαΊ‘n:
- Download mp3 tα»« myinstants.com hoαΊ·c pixabay sound effects
- ΔαΊ·t vΓ o folder phΓΉ hợp: `assets/sfx/<category>/<name>.mp3`
- Reference trong script.json: `"sfx": { "name": "transition/your-sound", "volume": 0.4 }`
Smart override theo nα»i dung (Claude tα»± pick khi sinh script):
- "cαΊ£nh bΓ‘o", "rα»§i ro" β
alert/notification - "vượt", "kα»· lα»₯c", "xuαΊ₯t sαΊ―c" β
emphasis/chime - Disable cho scene ΔΓ³ β
"sfx": { "name": "none" }
Mα» Claude Code trong thΖ° mα»₯c project vΓ gΓ΅:
```
/create-news-video https://vnexpress.net/iphone-17-200mp
```

HoαΊ·c vα»i file `.txt`:

```
/create-news-video news/my-article.txt
```
Sau ~3-5 phΓΊt (TTS + render):
β Video: output/<slug>-<timestamp>/video.mp4 β video cuα»i
β Audio: output/<slug>-<timestamp>/voice.mp3 β Δα» import CapCut
β Script: output/<slug>-<timestamp>/script.txt β cho CapCut auto-caption
NαΊΏu ΔΓ£ cΓ³ sαΊ΅n script.json (vd Δα» debug hoαΊ·c tα»± viαΊΏt kα»ch bαΊ£n):
```bash
npm run pipeline -- output/<slug>-<timestamp>/script.json
```

NαΊΏu ΔΓ£ cΓ³ voice files trong `voice/` vΓ  muα»n render lαΊ‘i visual:

```bash
npm run rerender -- output/<slug>-<timestamp>
```

```
output/<slug>-<timestamp>/
βββ script.json        # Input JSON (Claude sinh hoαΊ·c bαΊ‘n viαΊΏt tay)
βββ script.txt         # Plain text cho CapCut auto-caption
βββ images/bg.jpg      # og:image ΔΓ£ tαΊ£i (nαΊΏu cΓ³)
βββ voice/
β   βββ scene-hook.mp3   # TTS tα»«ng scene
β   βββ scene-hook.srt   # SRT subtitle (chα» LucyLab)
β   βββ scene-body-1.mp3
βββ voice.mp3          # Voice ΔΓ£ concat (cho CapCut)
βββ index.html         # HyperFrames composition
βββ styles.css         # CSS (copied tα»« template)
βββ animations.js      # GSAP timeline (copied)
βββ hyperframes.json   # HyperFrames project config
βββ meta.json          # HyperFrames metadata
βββ video.mp4          # π Output cuα»i β 1080Γ1920 MP4
```
Mα»i video gα»m:
- Persistent shell xuyΓͺn suα»t (header brand `>_` icon + tΓͺn channel + tag, footer handle TikTok, grain texture, gradient navy)
- 5β8 scene vα»i template Δược Claude pick theo nα»i dung:
| Template | Khi nΓ o dΓΉng | VΓ­ dα»₯ |
|---|---|---|
| `hook` | Scene ΔαΊ§u tiΓͺn (3-5s) | "GPT 5.5" + "AI mαΊ‘nh nhαΊ₯t!" trΓͺn αΊ£nh og:image vα»i shimmer |
| `comparison` | Khi cΓ³ "X vs Y" / "vượt xa" / "so vα»i" | 2 cards: "GPT 5.4 75.1%" cyan vs "GPT 5.5 82.7%" purple (winner) |
| `stat-hero` | Khi cΓ³ sα»/% nα»i bαΊ­t | "1M" giant gradient + "Tokens / cα»­a sα» ngα»― cαΊ£nh" |
| `feature-list` | Liα»t kΓͺ tΓ­nh nΔng | Card cΓ³ 4 bullets dot cyan glow |
| `callout` | Statement / cαΊ£nh bΓ‘o / quote | Glow purple card vα»i "CαΊ£nh bΓ‘o: AI tα»± chα»§ cαΊ§n cΓ’n nhαΊ―c" |
| `outro` | Scene cuα»i (3-5s) | "Theo dΓ΅i ngay" pill + "CΓ΄ng nghα» 24h" giant + underline gradient |
```bash
npm test            # chαΊ‘y 35 unit tests
npm run test:watch  # watch mode
npx tsc --noEmit    # type-check khΓ΄ng build
```

| Lα»i | CΓ‘ch khαΊ―c phα»₯c |
|---|---|
| `Missing VIETNAMESE_API_KEY` / `Missing ELEVENLABS_API_KEY` | Kiα»m tra `.env.local` ΔΓ£ cΓ³ vΓ  ΔΓΊng `TTS_PROVIDER` |
| `hyperframes render failed` | ChαΊ‘y `npx hyperframes render --help` verify CLI; Chrome cΓ i chΖ°a? |
| `LucyLab polling timeout` | TΔng `LUCYLAB_POLL_TIMEOUT_MS` trong `.env.local` (default 120000ms) |
| `ElevenLabs 401 Invalid API key` | Verify key trΓͺn dashboard ElevenLabs, paste lαΊ‘i vΓ o `.env.local` |
| Tα»ng duration ngoΓ i [48s, 72s] | Re-trigger skill, hoαΊ·c chα»nh script.json viαΊΏt dΓ i/ngαΊ―n hΖ‘n |
| `ffprobe: command not found` | CΓ i FFmpeg: Windows `winget install Gyan.FFmpeg`, Mac `brew install ffmpeg` |
- Caption burned-in (forced alignment vα»i Whisper)
- Auto-select background music theo mood
- Multi-news compilation mode (`digest`)
- AI-generated images (Flux/Stable Diffusion khi khΓ΄ng cΓ³ og:image)
- Auto-upload TikTok / YouTube Shorts / Reels qua API
- Logo overlay tΓΉy chα»nh
- Multi-language (English, Chinese)
- Web UI standalone (khΓ΄ng cαΊ§n Claude Code)
MIT β sα» dα»₯ng tα»± do, fork tα»± do, ΔΓ³ng gΓ³p PR tα»± do.
Auto News Video is an open-source project that transforms any Vietnamese tech news article into a professional motion-graphic short video with a single command in Claude Code.
The pipeline automates the following steps:
- Reads the article URL (or `.txt` file) and analyzes the content
- Generates a JSON script picking from 6 visual template types (hook, comparison, stat-hero, feature-list, callout, outro) based on the nature of the content
- Synthesizes Vietnamese voice via LucyLab or ElevenLabs
- Renders MP4 video using HyperFrames (Puppeteer + GSAP + FFmpeg) with studio shell style and modern animation
- Exports script.txt and voice.mp3 alongside, ready to import into CapCut Pro for captions / BGM
- β HeyGen-quality look: persistent brand shell (icon, channel name, handle), grain texture, navy gradient with cyan + purple accents
- β 6 scene template types auto-picked by content β never monotonous
- β Multi-provider TTS: LucyLab (natural Vietnamese + free SRT) or ElevenLabs (multilingual, large voice library)
- β
Claude Code skill integration β just type `/create-news-video <url>` and you're done
- β Extensible: clean schema, modular code, full test suite
| Layer | Technology |
|---|---|
| Runtime | Node.js β₯ 22, TypeScript 5+, ESM |
| Render engine | HyperFrames (Puppeteer + GSAP + FFmpeg) |
| TTS providers | LucyLab.io (JSON-RPC, Vietnamese cloning) or ElevenLabs (REST, multilingual) |
| Validation | Zod (discriminated union schema) |
| HTTP | axios + nock (mocking) |
| Testing | Vitest |
| Audio | FFmpeg + ffprobe (mix, concat with silence) |
| AI/Skill | Claude Code skill (/create-news-video) |
| Visual blocks | HyperFrames registry: grain-overlay, shimmer-sweep, tiktok-follow |
| Fonts | Inter + Anton (Google Fonts) |
HyperFrames is an open-source HTML-to-video framework by HeyGen. Unlike traditional editors (After Effects, Premiere), HyperFrames lets you author videos with HTML/CSS/JS then render to high-quality MP4 deterministically (same input β identical output frame-by-frame).
How it works in this project:
- Pipeline generates an `index.html` containing all scenes + the GSAP timeline
- HyperFrames spawns headless Chrome (Puppeteer) to load it
- Captures each frame at the precise timestamp (30fps Γ 60s = 1800 frames)
- Encodes all frames + audio into MP4 via FFmpeg
Why HyperFrames?
- β
50+ pre-built blocks in the registry (transitions, social cards, data viz, kinetic typography...) β installable via `npx hyperframes add <name>`
- β GSAP timeline built-in for smooth animations
- β AI-agent friendly β Claude/GPT can author compositions in HTML
- β
Built-in lint (`npx hyperframes lint`) catches composition errors before render
- β 9:16 native β designed for short-form video
Blocks/components used in this project:
- `grain-overlay` β film grain texture throughout the video (analog warmth feel)
- `shimmer-sweep` β light pass animation for headline text
- `tiktok-follow` β outro CTA card (already 1080Γ1920)
| Criteria | LucyLab | ElevenLabs |
|---|---|---|
| Vietnamese voice | βββββ Natural (voice cloning) | ββββ Good (multilingual) |
| Cost | Cheap (~$1 / 1M chars) | Pricier (~$5 / 30k chars) |
| Voice library | Self-clone voices | 1000+ voices ready |
| API style | JSON-RPC async (poll) | REST sync (instant) |
| SRT subtitle | β Free, included in response | β Not provided |
| Concurrency | 1 export/account | Parallel OK |
| Other languages | β Vietnamese only | β 30+ languages |
Recommendation:
- π»π³ Vietnamese-only videos β use LucyLab (cheap + natural + with SRT)
- π Multilingual or need large voice library β use ElevenLabs
- π Not sure β start with LucyLab, switch later (just change `TTS_PROVIDER` in `.env.local`)
Zod is a TypeScript-first schema library. In this project, Zod ensures script.json (generated by Claude) always has correct structure before pipeline runs.
```ts
// Discriminated union: 6 template types, each with a different data shape
const TemplateData = z.discriminatedUnion("template", [
  HookData, ComparisonData, StatHeroData, FeatureListData, CalloutData, OutroData
]);
```

Benefits:
- Detects immediately if Claude generates an invalid script (e.g. `template: "stat"` doesn't exist) β fails Step 1 with a clear error
- TypeScript types are inferred from the Zod schema β the composer doesn't need to redeclare types
- Schema = single source of truth for both runtime validation + compile-time types
The project integrates with Claude Code via a skill markdown at .claude/skills/create-news-video/SKILL.md. This skill instructs Claude to:
- WebFetch the article URL
- Analyze Vietnamese content
- Pick the right template per scene (comparison if "vs", stat-hero if numbers...)
- Generate `script.json` matching the schema
- Run the pipeline via Bash
Benefit: just type /create-news-video <url> β Claude handles all "creative" work (writing Vietnamese script, picking templates, crafting catchy hooks). The "deterministic" parts (API calls, rendering) are handled by Node CLI.
Vitest (ESM-native Jest replacement) provides 35 unit tests:
- Schema validation tests with fixtures (valid + invalid scripts)
- TTS client tests with `nock` HTTP mocking (no real API calls, no quota wasted)
- Audio tools tests with fixture mp3 files (440Hz/2s, 880Hz/3s sine waves)
- HTML composer snapshot test β ensures output HTML doesn't break on refactor
Run `npm test` to verify everything works before pushing.
FFmpeg + ffprobe are used to:

- `ffprobe`: measure mp3 duration per scene (to compute timing in the composition)
- `ffmpeg`: concat scene mp3s with a 0.3s silence gap β final `voice.mp3`
- `ffmpeg` (via HyperFrames): encode 1800 PNG frames + audio into MP4

Both must be in PATH (`ffmpeg -version`). On Windows: `winget install Gyan.FFmpeg`.
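The timing math can be sketched independently of ffprobe (the function name is illustrative, not the project's actual API): given per-scene durations (as ffprobe would report them) and the 0.3 s silence gap, compute each scene's start offset in the composition:

```typescript
// Sketch: compute per-scene start/end offsets from measured durations.
// GAP_S = 0.3 matches the 0.3s silence inserted between scene mp3s.
const GAP_S = 0.3;

interface SceneTiming { id: string; start: number; end: number }

function computeTimeline(durations: { id: string; seconds: number }[]): SceneTiming[] {
  const out: SceneTiming[] = [];
  let cursor = 0;
  for (const d of durations) {
    out.push({ id: d.id, start: cursor, end: cursor + d.seconds });
    cursor += d.seconds + GAP_S; // next scene begins after the silence gap
  }
  return out;
}

// Example with the two fixture durations from the test suite (2s and 3s):
const tl = computeTimeline([
  { id: "scene-hook", seconds: 2 },
  { id: "scene-body-1", seconds: 3 },
]);
// scene-body-1 starts at ~2.3s (2s of voice + 0.3s of silence)
```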
| Item | Version | Notes |
|---|---|---|
| Node.js | β₯ 22 | node --version |
| FFmpeg + ffprobe | any modern version | in PATH (ffmpeg -version) |
| Chrome / Chromium | any | required by HyperFrames Puppeteer β auto-downloaded on first render |
| Claude Code CLI | latest | install here |
| TTS account | one of two | LucyLab.io OR ElevenLabs |
```bash
# 1. Clone the repo
git clone <repo-url> auto_create_video
cd auto_create_video

# 2. Install dependencies
npm install

# 3. Create env file and fill in API keys
cp .env.example .env.local
# β open .env.local, choose TTS provider (lucylab or elevenlabs) and fill in the key

# 4. Verify installation
node --version   # β₯ 22
ffmpeg -version  # any version OK
ffprobe -version
npm test         # all 35 tests should pass
```

Open `.env.local` and pick one of the two providers:
- Sign up at https://lucylab.io
- Get API key + voice ID (22-char UUID)
- Set `TTS_PROVIDER=lucylab`
- β Pros: natural Vietnamese voice (cloning), free SRT subtitle file included
- β οΈ Cons: only 1 concurrent export per account (the pipeline handles this)
```
TTS_PROVIDER=lucylab
VIETNAMESE_API_KEY=sk_live_xxxxxxxxxxxxxxxxxxxx
VIETNAMESE_VOICEID=22charvoiceiduuidhere
```

- Sign up at https://elevenlabs.io
- Get API key at https://elevenlabs.io/app/settings/api-keys
- Browse voices at https://elevenlabs.io/app/voice-library (copy the voice ID)
- Set `TTS_PROVIDER=elevenlabs`
- β Pros: multilingual, rich voice library, high quality
- β οΈ Cons: pricier than LucyLab, no SRT included
```
TTS_PROVIDER=elevenlabs
ELEVENLABS_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ELEVENLABS_VOICE_ID=EXAVITQu4vr4xnSDxMaL
ELEVENLABS_MODEL_ID=eleven_multilingual_v2
```

Every video automatically ends with a TikTok follow card (slides up from the bottom + follow-button click animation) β built from the official HyperFrames `tiktok-follow` block. All fields are optional β defaults work out of the box:
```
TIKTOK_DISPLAY_NAME=CΓ΄ng nghα» 24h
TIKTOK_HANDLE=@congnghe24h
TIKTOK_FOLLOWERS=1.2M followers

# Optional: URL to your real TikTok avatar (jpg/png, square, β₯256x256)
# If not set β uses the bundled default `assets/avatar.jpg`
TIKTOK_AVATAR_URL=https://example.com/your-avatar.jpg
```

To change the avatar:
- Option 1 (simple): replace `assets/avatar.jpg` with your image (square, ~256x256+)
- Option 2 (URL): set `TIKTOK_AVATAR_URL` in `.env.local` β the pipeline auto-downloads it on every render
Card appears at ~1.6s into the outro scene with this animation sequence:
- Card slides up from bottom (0.5s)
- Hold ~0.9s for viewer to read
- "Follow" button press-in + transitions to "Following β" with redβdark-gray color shift
- Card stays visible until end of video
Every video automatically gets a sound effect layer mixed into the voice (low volume, doesn't overpower speech). Not random β the pipeline picks based on template type:
| Template | Default SFX | When you hear it |
|---|---|---|
| `hook` | `transition/whoosh-soft` | Start of video, dramatic entrance |
| `comparison` | `transition/swoosh` | When the 2 cards appear |
| `stat-hero` | `emphasis/ding` | When the number/% reveals |
| `feature-list` | `transition/pop` | As each bullet appears |
| `callout` | `alert/notification` | On the important statement |
| `outro` | `outro/tada` | Ending signature |
Bundled sounds in `assets/sfx/` (downloaded from myinstants.com, royalty-free use):

```
assets/sfx/
βββ transition/  (whoosh-soft, swoosh, pop)
βββ emphasis/    (ding, tick, chime)
βββ alert/       (notification)
βββ outro/       (tada)
```
Add your own SFX:
- Download mp3 from myinstants.com or pixabay sound effects
- Drop it into `assets/sfx/<category>/<name>.mp3`
- Reference it in script.json: `"sfx": { "name": "transition/your-sound", "volume": 0.4 }`
Smart override by content (Claude auto-picks when generating script):
- "warning", "risk" β
alert/notification - "exceed", "record", "outstanding" β
emphasis/chime - Disable for this scene β
"sfx": { "name": "none" }
Open Claude Code in the project directory and type:
```
/create-news-video https://vnexpress.net/iphone-17-200mp
```

Or with a `.txt` file:

```
/create-news-video news/my-article.txt
```
After ~3-5 minutes (TTS + render):
β Video: output/<slug>-<timestamp>/video.mp4 β final video
β Audio: output/<slug>-<timestamp>/voice.mp3 β for CapCut
β Script: output/<slug>-<timestamp>/script.txt β for CapCut auto-caption
If you already have a script.json (e.g. for debugging or hand-written script):
```bash
npm run pipeline -- output/<slug>-<timestamp>/script.json
```

If voice files already exist in `voice/` and you only want to re-render visuals:

```bash
npm run rerender -- output/<slug>-<timestamp>
```

```
output/<slug>-<timestamp>/
βββ script.json        # Input JSON (Claude-generated or hand-written)
βββ script.txt         # Plain text for CapCut auto-caption
βββ images/bg.jpg      # Downloaded og:image (if available)
βββ voice/
β   βββ scene-hook.mp3   # TTS per scene
β   βββ scene-hook.srt   # SRT subtitles (LucyLab only)
β   βββ scene-body-1.mp3
βββ voice.mp3          # Concatenated voice (for CapCut)
βββ index.html         # HyperFrames composition
βββ styles.css         # CSS (copied from template)
βββ animations.js      # GSAP timeline (copied)
βββ hyperframes.json   # HyperFrames project config
βββ meta.json          # HyperFrames metadata
βββ video.mp4          # π Final output β 1080Γ1920 MP4
```
Each video consists of:
- Persistent shell throughout (header brand `>_` icon + channel name + tag, footer TikTok handle, grain texture, navy gradient)
- 5β8 scenes with templates picked by Claude based on content:
| Template | When to use | Example |
|---|---|---|
| `hook` | First scene (3-5s) | "GPT 5.5" + "AI mαΊ‘nh nhαΊ₯t!" over og:image with shimmer |
| `comparison` | When content has "X vs Y" / "exceeds" / "compared to" | 2 cards: "GPT 5.4 75.1%" cyan vs "GPT 5.5 82.7%" purple (winner) |
| `stat-hero` | When there's a key number/% | "1M" giant gradient + "Tokens / context window" |
| `feature-list` | When listing features | Card with 4 bullets, cyan glow dots |
| `callout` | Statement / warning / quote | Purple glow card with "Warning: agentic AI needs caution" |
| `outro` | Last scene (3-5s) | "Follow now" pill + "CΓ΄ng nghα» 24h" giant + gradient underline |
```bash
npm test            # run 35 unit tests
npm run test:watch  # watch mode
npx tsc --noEmit    # type-check without build
```

| Error | Fix |
|---|---|
| `Missing VIETNAMESE_API_KEY` / `Missing ELEVENLABS_API_KEY` | Check that `.env.local` exists and `TTS_PROVIDER` matches |
| `hyperframes render failed` | Run `npx hyperframes render --help` to verify the CLI; is Chrome installed? |
| `LucyLab polling timeout` | Increase `LUCYLAB_POLL_TIMEOUT_MS` in `.env.local` (default 120000ms) |
| `ElevenLabs 401 Invalid API key` | Verify the key on the ElevenLabs dashboard, re-paste into `.env.local` |
| Total duration outside [48s, 72s] | Re-trigger the skill, or edit script.json to make the text longer/shorter |
| `ffprobe: command not found` | Install FFmpeg: Windows `winget install Gyan.FFmpeg`, Mac `brew install ffmpeg` |
- Burned-in captions (forced alignment with Whisper)
- Auto-select background music by mood
- Multi-news compilation mode (`digest`)
- AI-generated images (Flux/Stable Diffusion when og:image is unavailable)
- Auto-upload TikTok / YouTube Shorts / Reels via API
- Custom logo overlay
- Multi-language (English, Chinese)
- Standalone Web UI (no Claude Code required)
MIT β use freely, fork freely, PRs welcome.
Pull requests welcome! For major changes, please open an issue first to discuss what you'd like to change.
```bash
# Fork β clone β branch
git checkout -b feature/my-improvement

# Make changes, ensure tests pass
npm test

# Commit (Conventional Commits style)
git commit -m "feat: add Google TTS provider support"

# Push and open PR
git push origin feature/my-improvement
```

- HyperFrames by HeyGen β the HTML-to-video framework that makes this possible
- LucyLab.io β Vietnamese voice cloning API
- ElevenLabs β multilingual TTS
- Anthropic Claude β the LLM that generates scripts via Claude Code skill