Interactive longform-to-shortform video creator powered by Claude Code. Extracts viral-ready vertical clips from long videos using Claude as the intelligent orchestrator with Remotion-rendered premium animated captions.
Claude Code guides you through a 10-step interactive pipeline:
- Preflight - Validates input video, checks disk space, detects GPU
- Transcribe - GPU-accelerated transcription with word-level timestamps (faster-whisper)
- Detect Content - Auto-classifies: talking-head, screen recording, or podcast
- Analyze - Claude reads the full transcript and scores 8-12 candidate segments
- Present - Shows candidates in a formatted table with scores, hooks, and rationale
- Approve - You pick segments, adjust timecodes, choose caption style and platform
- Snap Boundaries - Aligns cut points to word boundaries, sentence endings, and audio silences
- Prepare - Extracts clips (FFmpeg stream copy) and computes reframe coordinates
- Render - Remotion renders 1080x1920 vertical video with animated captions
- Export - Platform-optimized encoding (YouTube Shorts, TikTok, Instagram Reels)
Demo video/GIF coming soon — showing the full pipeline from input to rendered short with Bold-style captions.
- Claude-powered segment scoring - 5-dimension rubric (hook strength, coherence, emotion, value density, payoff) with weighted scoring. No heuristic keyword matching - Claude understands narrative arcs.
- 3 caption styles - Bold (ALL CAPS, yellow highlights), Bounce (bouncy spring, rotating colors), Clean (minimal fade-in)
- Cursor tracking - For screen recordings, detects mouse cursor via frame differencing and smoothly pans the crop to follow it
- Audio-aware boundary snapping - Never cuts mid-word or mid-sentence. Extends to natural sentence endings and silence points.
- Remotion rendering - React-based single-pass rendering with spring animations, word-level karaoke highlighting, hook text overlays, and progress bars
- GPU acceleration - CUDA for transcription, NVENC for export encoding (falls back to CPU gracefully)
- FFmpeg (system package)
- Python 3.10+
- Node.js 18+
- Claude Code (CLI)
- NVIDIA GPU recommended (for CUDA transcription + NVENC encoding)
# Clone the repository
git clone https://github.com/AgriciDaniel/claude-shorts.git
cd claude-shorts
# Install Python + Node.js dependencies
bash setup.sh
# Install as a Claude Code skill
bash install.shclaude-shorts requires Unix tools (FFmpeg, bash). On Windows, use WSL 2:
# Inside WSL
git clone https://github.com/AgriciDaniel/claude-shorts.git
cd claude-shorts
bash setup.sh
bash install.sh- Creates a Python virtual environment at
~/.shorts-skill/(or reuses~/.video-skill/if it exists) - Installs
faster-whisper,mediapipe,numpy,opencv-python, and PyTorch (CUDA or CPU variant) - Runs
npm installin theremotion/directory - Checks for system dependencies (FFmpeg, jq)
In Claude Code, invoke the skill:
/shorts
Then provide your video file when prompted. Claude will:
- Transcribe the video
- Present scored segment candidates
- Ask which segments to render, caption style, and target platform
- Render and export the final shorts
You: /shorts ~/Videos/my-talk.mp4
Claude: [Transcribes, detects content type, scores segments]
| # | Time | Dur | Score | Hook |
|---|---------------|------|-------|--------------------------------|
| 1 | 04:22 - 05:01 | 39s | 87 | "Nobody talks about this..." |
| 2 | 12:45 - 13:28 | 43s | 82 | "Here's the exact framework." |
| 3 | 08:11 - 08:52 | 41s | 79 | "I tested this for 6 months." |
Claude: Which segments? Caption style? Platform?
You: 1 and 3, bounce style, youtube
Claude: [Snaps boundaries, extracts clips, renders, exports]
Output: shorts/short_01_yt.mp4, shorts/short_03_yt.mp4
claude-shorts/
├── SKILL.md # 10-step interactive pipeline (Claude Code skill)
├── CLAUDE.md # Project-level instructions
├── install.sh # Install to ~/.claude/skills/
├── setup.sh # Python + Node dependency installer
│
├── scripts/
│ ├── transcribe.py # faster-whisper GPU transcription
│ ├── detect_content.py # MediaPipe content type classifier
│ ├── compute_reframe.py # Face tracking + cursor tracking + crop
│ ├── snap_boundaries.py # Audio-aware boundary snapping
│ ├── preflight.sh # Input validation + disk space check
│ ├── detect_gpu.sh # NVIDIA NVENC detection
│ └── export.sh # Platform-specific FFmpeg encoding
│
├── remotion/
│ ├── package.json # Remotion v4 + React 19 + Zod
│ ├── render.mjs # Bundle-once-render-many orchestrator
│ ├── remotion.config.ts
│ └── src/
│ ├── Root.tsx # Composition registry
│ ├── ShortVideo.tsx # Main composition
│ ├── types.ts # Zod schemas for props
│ ├── components/
│ │ ├── VideoFrame.tsx # Reframed video with animated crop pan
│ │ ├── Captions.tsx # Style dispatcher
│ │ ├── BoldCaptions.tsx # Bold ALL CAPS, pop-in spring
│ │ ├── BounceCaptions.tsx # Bouncy scale, bright colors
│ │ ├── CleanCaptions.tsx # Minimal fade-in
│ │ ├── HookOverlay.tsx # First 3.5s hook text
│ │ └── ProgressBar.tsx # Bottom progress indicator
│ ├── hooks/
│ │ └── useCaptionPages.ts # @remotion/captions TikTok-style pages
│ └── styles/
│ ├── fonts.ts # @font-face declarations
│ └── theme.ts # Color palettes per style
│
└── references/
├── scoring-rubric.md # 5-dimension scoring criteria
├── caption-styles.md # Visual specs + spring configs
├── platform-specs.md # YouTube/TikTok/Instagram encoding
└── remotion-patterns.md # Remotion best practices
| Style | Font | Animation | Best For |
|---|---|---|---|
| Bold | Montserrat Bold | Pop-in spring, yellow active word | Business, education |
| Bounce | Bangers | Bouncy scale 70-120-100%, rotating colors | Entertainment, energy |
| Clean | Inter Bold | Fade-in opacity, white + shadow | Professional, interviews |
| Platform | Codec | Bitrate | Audio |
|---|---|---|---|
| YouTube Shorts | H.264 High 4.2 | 12 Mbps | AAC 192k |
| TikTok | H.264 | CRF 18, -preset slow | AAC 128k |
| Instagram Reels | H.264 High 4.2 | 4.5 Mbps (max 5000k) | AAC 128k |
| Content Type | Reframe Strategy | Zoom |
|---|---|---|
| Talking-head | Face-tracked center crop (MediaPipe) | 9:16 exact |
| Screen recording | Cursor-tracked pan with moderate zoom | 55% of source width |
| Podcast | Dominant speaker tracking | 9:16 exact |
- faster-whisper - GPU-accelerated Whisper
- mediapipe - Face detection for content classification + reframing
- numpy - Array operations for cursor tracking smoothing
- opencv-python - Frame differencing for cursor detection
- Remotion v4 - React-based video rendering
- @remotion/captions - TikTok-style word-level captions
- React 19 - Component framework
- Zod - Runtime type validation for props
Caption fonts are bundled from Google Fonts under the SIL Open Font License:
- Montserrat Bold (Bold style)
- Bangers Regular (Bounce style)
- Inter Bold (Clean style)
Claude scores each candidate on 5 weighted dimensions:
| Dimension | Weight | What Claude Looks For |
|---|---|---|
| Hook Strength | 0.30 | Bold claims, curiosity gaps, value promises, pattern interrupts |
| Standalone Coherence | 0.25 | Makes complete sense without any context from the rest of the video |
| Emotional Intensity | 0.20 | Strong opinions, surprise reveals, humor, passion |
| Value Density | 0.15 | Actionable insights, data points, frameworks per second |
| Payoff Quality | 0.10 | Satisfying conclusion - punchline, reveal, call-to-action |
Final score = weighted sum, scale 0-100. Minimum threshold: 60.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
