shivapreetham/ai-automated-video-gen

AI-Automated-Video-Gen 🎬🤖

Autonomous shorts creator that scrapes trending topics, writes a tight script, generates visuals + voice, assembles a vertical video, and uploads it to Cloudflare R2, hands-off.

Mode: Dynamic agent only (no static/series).

Stack: Flask backend • Pollinations (images) • ElevenLabs (TTS) • ffmpeg (assembly) • Cloudflare R2 (artifacts) • Postgres (state) • Akash (deploy scraper + backend on CPU).

Why this is different

Agentic, not just “prompt → video.” A planner decides when to scrape, how to dedupe topics, how many shots to use, whether the script’s claims are backed by the scraped sources and fit the time budget, and when to publish.

Vendor-swap friendly. Start with Pollinations/ElevenLabs; later plug in your own Akash GPU services for /t2i or /i2v without changing the control plane.

Storage-first. Every artifact is private in R2; you share short-lived signed links only.

Architecture (current reality)

Flask Backend (control plane): Single public API to kick off runs, stream progress, and expose signed URLs. Coordinates scraping, script/storyboard building, image + TTS calls, captioning, and ffmpeg assembly. Persists run state to Postgres.

Scraper (CPU microservice on Akash): Fetches trending/news items from whitelisted sources, normalizes them, dedupes by URL/title/hash, and writes a compact facts JSON for the backend.

External AI services:

Pollinations for text-to-image (frames).

ElevenLabs for voiceover (narration).

Processing: ffmpeg to stitch frames, transitions, narration, and captions into a vertical MP4.

Storage: Cloudflare R2 for frames, audio, captions, and final video; all access via signed URLs only.

Database: Postgres to track topics processed, run status/timings, dedupe keys, and cost counters.

Scheduler: Simple cron/job runner that triggers the dynamic agent N times/day per category.

End-to-end flow

Scrape & select: Scraper pulls trending items for the configured categories → filters by freshness window (e.g., last 24–48h), domain allowlist, and novelty (dedupe) → produces facts.json (title, summary, URL, timestamp, source).
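The README lists the facts.json fields but not its exact shape; a minimal sketch of what the scraper might emit (the wrapper keys and example values are assumptions):

```python
import json

# Hypothetical facts.json record. The fields (title, summary, URL,
# timestamp, source) come from the flow above; everything else is assumed.
fact = {
    "title": "Example trending headline",
    "summary": "Two-sentence summary produced by the scraper.",
    "url": "https://example.com/story",
    "timestamp": "2024-05-01T12:00:00Z",
    "source": "example.com",
}
facts_json = json.dumps({"category": "tech", "items": [fact]}, indent=2)
print(facts_json)
```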

Script: Backend converts the facts into an 80–120 word voice script with a punchy hook, 3–5 beats, and an outro line.

Storyboard: Planner emits shots.json with 5–7 shots, on-screen text per beat, a style preset, and which 1–2 shots get light emphasis (Ken Burns today; i2v later).
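A sketch of what shots.json might look like; the 5–7 shot count, per-beat on-screen text, style preset, and emphasis flags come from the step above, but the field names are assumptions:

```python
import json

# Hypothetical planner output: 5 shots, first two with Ken Burns emphasis.
shots = {
    "style_preset": "bold-minimal",
    "shots": [
        {
            "id": i,
            "prompt": f"scene {i} image prompt",
            "text": f"on-screen text for beat {i}",
            "duration_s": 6.0,
            "emphasis": "ken_burns" if i < 2 else None,  # i2v later
        }
        for i in range(5)
    ],
}
print(json.dumps(shots, indent=2))
```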

Media generation

Images: Pollinations → frame_1..N.png

Voice: ElevenLabs → narration.wav (+ char count for cost)
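On the image side, Pollinations exposes a simple prompt-in-URL API; a sketch of building frame URLs (the width/height/seed query parameters match the public API, but treat the defaults here as assumptions):

```python
from typing import Optional
from urllib.parse import quote

POLLINATIONS_BASE = "https://image.pollinations.ai/prompt"  # public t2i endpoint

def build_image_url(prompt: str, width: int = 720, height: int = 1280,
                    seed: Optional[int] = None) -> str:
    """Build a Pollinations image URL for one storyboard shot."""
    url = f"{POLLINATIONS_BASE}/{quote(prompt)}?width={width}&height={height}"
    if seed is not None:
        url += f"&seed={seed}"  # fixed seed reduces style jitter across shots
    return url

print(build_image_url("neon city skyline, vertical, cinematic", seed=42))
```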

Captions: Build captions.srt aligned to the narration (word/phrase timing derived from the script beats).
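SRT is plain text with numbered cues and `HH:MM:SS,mmm` timestamps; a small builder that turns beat timings into captions (the `(start, end, text)` beat tuples are an assumed intermediate format):

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def build_srt(beats):
    """beats: list of (start_s, end_s, text) tuples from script timing."""
    blocks = []
    for i, (start, end, text) in enumerate(beats, 1):
        blocks.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(blocks)

print(build_srt([(0.0, 2.5, "Hook line"), (2.5, 6.0, "Beat one")]))
```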

Assembly: ffmpeg composes the vertical video (720×1280 or 1080×1920), applies transitions, mixes narration (and optional background music), burns in captions if desired, and exports final/video.mp4.
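One workable ffmpeg invocation for this step can be sketched as a command builder; this simplified version shows a fixed framerate and a burned-in SRT (the `subtitles` filter needs an ffmpeg built with libass, and a real assembly with per-shot durations would use the concat demuxer instead):

```python
def ffmpeg_cmd(frames_pattern: str, narration: str, captions: str, out: str,
               size: str = "1080x1920", fps: int = 30) -> list:
    """Build an ffmpeg command: numbered frames + narration -> vertical MP4
    with burned-in captions. A sketch, not the project's exact invocation."""
    return [
        "ffmpeg", "-y",
        "-start_number", "1",                         # frames are frame_1..N.png
        "-framerate", str(fps), "-i", frames_pattern,  # e.g. frame_%d.png
        "-i", narration,                               # narration.wav
        "-vf", f"scale={size.replace('x', ':')},subtitles={captions}",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-shortest", out,
    ]

cmd = ffmpeg_cmd("frame_%d.png", "narration.wav", "captions.srt",
                 "final/video.mp4")
print(" ".join(cmd))
```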

Upload & publish: Backend uploads artifacts to R2 and returns a signed URL to the final MP4. (Platform auto-publish can be toggled on later.)

Configuration you actually need

Create environment variables (or a config file) for:

ElevenLabs: API key, default voice id, speed

Pollinations: base URL and any rate/backoff parameters

Cloudflare R2: account id, access key, secret key, bucket name, signed URL TTL

Postgres: connection URL

App: Flask secret, base URL, allowed origins, log level

Scraper policy: categories, freshness window (hours), domain allowlist, max items/run, max runs/day

Planner policy: target duration (e.g., 30–45s), number of shots (5–7), caption style, music on/off, failure fallbacks (e.g., drop music if mux fails)
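The list above could be captured in a `.env` file along these lines; the variable names and example values are assumptions, only the groupings come from the list:

```shell
# Hypothetical .env — names are illustrative, grouped per the list above
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
POLLINATIONS_BASE_URL=https://image.pollinations.ai
R2_ACCOUNT_ID=...
R2_ACCESS_KEY_ID=...
R2_SECRET_ACCESS_KEY=...
R2_BUCKET=videos
R2_SIGNED_URL_TTL_S=900
DATABASE_URL=postgres://user:pass@host:5432/videogen
FLASK_SECRET_KEY=...
LOG_LEVEL=info
SCRAPER_FRESHNESS_HOURS=36
SCRAPER_MAX_ITEMS_PER_RUN=20
SCRAPER_MAX_RUNS_PER_DAY=3
PLANNER_TARGET_DURATION_S=40
PLANNER_SHOTS=6
PLANNER_MUSIC=on
```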

Keep secrets server-side only. Do not expose keys in the browser.

What each run produces

facts.json (scraper output + references)

script.json (beats + timestamps)

shots.json (prompts, on-screen text, durations)

frame_i.png (Pollinations)

narration.wav (ElevenLabs)

captions.srt

final/video.mp4 (stored in R2, returned as a signed URL)

Local development checklist (no code here, just the order)

Set your environment variables (above).

Run the Flask backend locally and point it at your Postgres/R2.

Start the scraper locally and verify facts.json lands in R2 (or returns to backend).

Trigger a single dynamic run with one category and confirm: script → images → TTS → captions → ffmpeg → R2 signed URL.

Test rate limits: throttle Pollinations + ElevenLabs with backoff/retry and verify graceful degradation (e.g., fewer shots).
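The exponential backoff/retry mentioned here can be sketched as a small wrapper around any vendor call (the attempt counts and delays are assumptions, and `sleep` is injectable so tests don't wait):

```python
import time

def with_backoff(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry fn() with exponential backoff: 0.5s, 1s, 2s... between tries.
    Re-raises the last error once attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Simulate a vendor call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient 429")
    return "ok"

print(with_backoff(flaky, sleep=lambda s: None))  # retries twice, returns "ok"
```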

Turn on the scheduler for 1 run/day and verify automation.

Deployment on Akash (CPU-only to start)

Containers:

backend (Flask app + ffmpeg + R2 client)

scraper (lightweight Python job service)

Networking: Expose only the backend’s HTTP port publicly; scraper can be private and talk to backend/storage.

State: Use a managed Postgres (Neon/Supabase/RDS) so you don’t lose state when leases churn.

Storage: Keep the R2 bucket private. Backend issues uploads/downloads via signed URLs.

Scheduling: easiest is cron inside the scraper container or a small scheduler thread in the backend; safer is an external job runner that calls the backend’s “start run” endpoint.

Observability: Emit step events and durations; log vendor responses (status codes, ms, chars). Persist a run_audit.json per video with counts and costs.

Guardrails, costs & reality (brutally honest)

Pollinations: convenient but style can wander; use a prompt scaffold + consistent “style preset” and consider a seed to reduce jitter; expect occasional failures—retry with exponential backoff.

ElevenLabs: great quality; watch character quotas; cache intro/outro lines to save cost.

ffmpeg: most failures are from timing or audio mux; if assembly fails, retry without background music and with simpler transitions.

R2 egress: storage is cheap, egress isn’t—prefer signed links over public hosting; avoid re-downloading large assets during retries.

Akash GPUs: currently scarce—don’t block on them. Your path is correct: run scraper + backend on CPU now; swap Pollinations for your Akash /t2i when you land a GPU lease.

Operational policies that make it feel “agentic”

Freshness: ignore items older than your window (e.g., 36h).

Novelty: dedupe by URL + title hash; don’t cover the same topic twice within N days.
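The URL + title hash can be sketched with stdlib hashing; the normalization choices (lowercasing, dropping the query string and trailing slash) are assumptions to make tracking-parameter variants collide:

```python
import hashlib
from urllib.parse import urlsplit

def novelty_key(url: str, title: str) -> str:
    """Dedupe key per the novelty policy: hash of normalized URL + title.
    Store the key in Postgres and skip topics seen within the last N days."""
    parts = urlsplit(url.lower())
    canon = f"{parts.netloc}{parts.path.rstrip('/')}|{title.strip().lower()}"
    return hashlib.sha256(canon.encode()).hexdigest()

a = novelty_key("https://Example.com/story?utm=x", "Big News")
b = novelty_key("https://example.com/story/", "  big news ")
print(a == b)  # same story despite tracking params, case, whitespace
```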

Quality gate: minimal source quality score per domain; auto-reject low-cred sites.

Time budget: if image or TTS stalls, drop to 5 shots and publish anyway.

Compliance: apply a safe-content filter on scraped text and prompts; show a compact “sources” overlay or description.

Roadmap (immediate next steps)

Today

Lock planner policy JSON (duration, shots, presets, fallback rules).

Finish scraper dedupe + allowlist and wire it to backend.

Add per-run cost chip (TTS chars, image calls, egress estimate).

Ship minimal status page that streams step updates and shows the final player via signed URL.

Next

Background music library + loudness normalization.

Thumbnail auto-gen.

Multi-category scheduling with concurrency caps.

Later (when GPUs free up)

Self-host /t2i on Akash; optional /i2v for 1–2 animated shots.

Auto-publish to YouTube/TikTok with rate guards.
