This implementation adds AI-powered video breakdown functionality to the VortexHub application using Google Cloud services. The system automatically extracts audio from uploaded videos, transcribes the content using Google Speech-to-Text, and analyzes the transcript using Google Gemini AI to generate structured step-by-step instructions.
- Automatic Audio Extraction: Uses FFmpeg to extract high-quality audio from video files
- Speech-to-Text Transcription: Leverages Google Speech-to-Text API for accurate transcription
- AI-Powered Analysis: Uses Google Gemini Pro to extract:
  - Step-by-step instructions
  - Required materials and quantities
  - Tools and equipment needed
  - Safety cautions and warnings
  - Validation questions for learning
- Progress Tracking: Real-time progress updates during processing
- Cost Optimization: Built around lower-cost Google Cloud pricing tiers (see the cost estimates below)
[User Uploads Video] → [Frontend] → [Backend API] → [Google Cloud Services]
↓
[Audio Extraction (FFmpeg)] → [Speech-to-Text] → [Gemini AI Analysis] → [Structured Output]
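End to end, the pipeline above amounts to three sequential HTTP calls against the backend API. A minimal client-side sketch — the response field names (`audio_url`, `transcript`) are assumptions about the payload shape, not the shipped contract:

```python
import requests  # pip install requests

API_URL = "http://localhost:8000"

def breakdown_video(video_path: str) -> dict:
    """Run extract-audio -> transcribe -> analyze-transcript for one video."""
    # 1. Upload the video; the backend extracts audio with FFmpeg
    #    and stores it in Google Cloud Storage.
    with open(video_path, "rb") as f:
        resp = requests.post(f"{API_URL}/extract-audio", files={"file": f})
    resp.raise_for_status()
    audio_url = resp.json()["audio_url"]

    # 2. Transcribe the stored audio with Speech-to-Text.
    resp = requests.post(f"{API_URL}/transcribe", json={"audio_url": audio_url})
    resp.raise_for_status()
    transcript = resp.json()["transcript"]

    # 3. Turn the transcript into structured step data with Gemini.
    resp = requests.post(
        f"{API_URL}/analyze-transcript",
        json={"transcript": transcript, "context": "how-to video"},
    )
    resp.raise_for_status()
    return resp.json()
```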
- AI Video Analysis Section (`CreateProject.js`)
  - New button: "Analyze with AI"
  - Progress indicator during processing
  - Results display showing detected steps, materials, and tools
  - Integration with existing project creation flow
- Google Cloud API Service (`src/services/googleCloudApi.js`)
  - Centralized service for all Google Cloud API calls
  - Error handling and retry logic
  - Progress callback support
- Conditional Submission: Form automatically uses AI data when available
- Progress Feedback: Real-time updates during AI processing
- Error Handling: Comprehensive error messages and recovery
- Data Integration: AI results seamlessly integrated into project creation
- `POST /extract-audio`
  - Accepts a video file upload
  - Extracts audio using FFmpeg
  - Uploads the audio to Google Cloud Storage
  - Returns the GCS audio URL
- `POST /transcribe`
  - Accepts a GCS audio URL
  - Uses the Google Speech-to-Text API
  - Returns a transcript with timestamps
  - Supports multiple languages
- `POST /analyze-transcript`
  - Accepts a transcript and context
  - Uses Google Gemini Pro for analysis
  - Returns structured step data
  - Includes materials, tools, cautions, and questions
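The `/analyze-transcript` step hinges on prompting Gemini for machine-readable output. A hedged sketch of what that prompt might look like — the JSON schema mirrors the fields listed above (steps, materials, tools, cautions, questions), but the exact wording is an assumption, not the shipped prompt:

```python
import json

def build_analysis_prompt(transcript: str, context: str = "") -> str:
    """Compose a Gemini prompt asking for structured step data as JSON."""
    # The schema keys mirror the structured output described in this
    # document; the field descriptions are illustrative placeholders.
    schema = {
        "steps": ["ordered step-by-step instructions"],
        "materials": [{"name": "string", "quantity": "string"}],
        "tools": ["tools and equipment needed"],
        "cautions": ["safety cautions and warnings"],
        "questions": ["validation questions for learning"],
    }
    return (
        "You are analyzing a how-to video transcript.\n"
        f"Context: {context}\n"
        "Return ONLY valid JSON matching this schema:\n"
        f"{json.dumps(schema, indent=2)}\n\n"
        f"Transcript:\n{transcript}"
    )
```

Asking for "ONLY valid JSON" and echoing the schema keeps the response parseable; the backend can then `json.loads` the model output and validate the keys before storing them as project steps.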
- Framework: FastAPI (Python)
- Audio Processing: FFmpeg
- Cloud Services: Google Cloud Platform
- AI Model: Google Gemini Pro (via Vertex AI)
- Storage: Google Cloud Storage
- Create a new Google Cloud project or use an existing one
- Enable the required APIs:

  ```bash
  gcloud services enable speech.googleapis.com
  gcloud services enable aiplatform.googleapis.com
  gcloud services enable storage.googleapis.com
  ```

- Create a service account:

  ```bash
  gcloud iam service-accounts create video-breakdown-sa \
    --display-name="Video Breakdown Service Account"
  ```

- Grant the necessary roles (note: the backend uploads audio files to Cloud Storage, so it needs object write access rather than a viewer-only role):

  ```bash
  gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:video-breakdown-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/speech.client"
  gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:video-breakdown-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"
  gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:video-breakdown-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
  ```

- Download the service account key:

  ```bash
  gcloud iam service-accounts keys create service-account-key.json \
    --iam-account=video-breakdown-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com
  ```
Create a `.env` file in your backend directory:

```bash
# Google Cloud Configuration
GOOGLE_CLOUD_PROJECT_ID=your-project-id
GOOGLE_CLOUD_STORAGE_BUCKET=your-storage-bucket
GOOGLE_VERTEX_AI_LOCATION=us-central1
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

# Backend Configuration
API_HOST=0.0.0.0
API_PORT=8000
CORS_ORIGINS=http://localhost:3000,https://your-frontend-domain.com
```
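The backend can fail fast at startup if any required variable is missing rather than erroring mid-request. A small sketch, assuming the variables are exported (or loaded with a tool such as python-dotenv):

```python
import os

def load_settings() -> dict:
    """Read the Google Cloud settings, failing fast if any is missing."""
    required = [
        "GOOGLE_CLOUD_PROJECT_ID",
        "GOOGLE_CLOUD_STORAGE_BUCKET",
        "GOOGLE_APPLICATION_CREDENTIALS",
    ]
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {
        "project_id": os.environ["GOOGLE_CLOUD_PROJECT_ID"],
        "bucket": os.environ["GOOGLE_CLOUD_STORAGE_BUCKET"],
        # Defaults mirror the sample .env above.
        "location": os.getenv("GOOGLE_VERTEX_AI_LOCATION", "us-central1"),
        "cors_origins": os.getenv("CORS_ORIGINS", "").split(","),
    }
```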
- Install Python dependencies:

  ```bash
  pip install -r backend_requirements.txt
  ```

- Install FFmpeg (required for audio extraction):

  ```bash
  # Ubuntu/Debian
  sudo apt update && sudo apt install ffmpeg

  # macOS
  brew install ffmpeg

  # Windows: download from https://ffmpeg.org/download.html
  ```

- Run the backend:

  ```bash
  python backend_implementation_example.py
  ```
Update your frontend `.env` file:

```bash
# Existing Firebase config...
REACT_APP_FIREBASE_API_KEY=your_firebase_api_key
# ... other Firebase variables

# Backend API URL
REACT_APP_API_URL=http://localhost:8000
```

- Navigate to the Create Project page
- Upload one or more video files
- Fill in project details (name, description, tags)
- Click "Analyze with AI" button
- Wait for processing to complete (progress shown in real-time)
- Review the generated results
- Click "Save with AI Data & Continue"
- Project is created with AI-generated steps
- Continue to annotation page for manual refinement
Speech-to-Text API:
- Standard Model: $0.006 per 15 seconds
- Enhanced Model: $0.009 per 15 seconds
- Video Model: $0.006 per 15 seconds (recommended)
Gemini Pro (Vertex AI):
- Input: $0.0005 per 1K characters
- Output: $0.0015 per 1K characters
Storage:
- Standard: $0.020 per GB per month
- Audio files are typically small (< 10MB per video)
For a 10-minute video:
- Transcription: ~$0.24 (Speech-to-Text video model: 40 fifteen-second increments × $0.006)
- AI analysis: ~$0.05-0.15 (Gemini Pro)
- Storage: ~$0.0002 (audio file)
- Total per video: ~$0.30-0.40

Audio extraction itself runs locally via FFmpeg, so it costs backend compute time but no per-call API fee.
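As a sanity check on the arithmetic, the transcription part of the estimate follows directly from the rates above; the Gemini part depends on transcript length, for which a rough assumption (~150 words per minute, ~6 characters per word) is used here:

```python
def estimate_cost(video_minutes: float) -> float:
    """Approximate USD cost to process one video through the pipeline."""
    # Speech-to-Text video model: $0.006 per 15-second increment.
    stt = (video_minutes * 60 / 15) * 0.006
    # Rough transcript size: ~150 words/min, ~6 chars/word (assumption).
    transcript_chars = video_minutes * 150 * 6
    # Gemini Pro: $0.0005 per 1K input chars, $0.0015 per 1K output chars
    # (treating the output as roughly transcript-sized, an assumption).
    gemini = (transcript_chars / 1000) * (0.0005 + 0.0015)
    return stt + gemini
```

For a 10-minute video this gives $0.24 for transcription plus roughly $0.02 for analysis under these assumptions; denser transcripts or longer Gemini outputs push the analysis cost toward the $0.05-0.15 range quoted above.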
- Use clear audio for better transcription
- Avoid background noise
- Speak clearly and at normal pace
- Use the "video" model for Speech-to-Text (optimized for video content)
- Implement caching for repeated transcriptions
- Clean up temporary files regularly
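The caching tip above can be implemented by keying transcripts on a content hash of the audio, so re-uploading the same video never pays for a second Speech-to-Text call. A sketch — the cache directory is an arbitrary choice, and `transcribe_fn` stands in for the real API call:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("/tmp/transcript-cache")

def cached_transcribe(audio_path: Path, transcribe_fn) -> dict:
    """Return a cached transcript if this exact audio was seen before."""
    # Identical audio bytes always hash to the same cache key.
    digest = hashlib.sha256(audio_path.read_bytes()).hexdigest()
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = CACHE_DIR / f"{digest}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = transcribe_fn(audio_path)  # the real Speech-to-Text call
    cache_file.write_text(json.dumps(result))
    return result
```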
- Implement retry logic for API failures
- Provide clear error messages to users
- Log errors for debugging
- Validate uploaded files
- Implement rate limiting
- Use service accounts with minimal permissions
- Secure API keys and credentials
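The retry tip above usually means exponential backoff around transient API failures. A minimal sketch — in production you would catch the Google client library's transient errors (e.g. service-unavailable or deadline-exceeded exceptions) rather than bare `Exception`:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying on failure with 1s, 2s, 4s... back-off."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
```

Wrapping each Google Cloud call, e.g. `with_retries(lambda: client.recognize(request))`, keeps one flaky response from failing the whole video breakdown.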
- FFmpeg not found
  - Ensure FFmpeg is installed and on your PATH
  - Check the installation with `ffmpeg -version`
- Google Cloud authentication errors
  - Verify the service account key is correct
  - Check API permissions
  - Ensure the required APIs are enabled
- CORS errors
  - Update the CORS configuration in the backend
  - Check that the frontend URL is in the allowed origins
- Audio extraction failures
  - Verify the video file format is supported
  - Check file size limits
  - Ensure sufficient disk space
- Check backend logs for detailed error messages
- Verify environment variables are loaded correctly
- Test API endpoints individually using tools like Postman
- Monitor Google Cloud Console for API usage and errors
- Batch Processing: Process multiple videos simultaneously
- Custom Models: Train custom models for specific domains
- Video Analysis: Add computer vision for visual step detection
- Multi-language Support: Extend the AI analysis and generated instructions to additional languages
- Real-time Processing: Stream processing for live videos
- Advanced Analytics: Detailed usage analytics and insights
For issues and questions:
- Check the troubleshooting section above
- Review Google Cloud documentation
- Check application logs for detailed error messages
- Monitor Google Cloud Console for API usage and quotas