190 changes: 190 additions & 0 deletions LLM_PROVIDERS.md
@@ -0,0 +1,190 @@
# OpenAI API Compatible LLM Support

Vinci Clips now supports multiple LLM providers through an OpenAI-compatible API interface. This allows you to use various AI services for video transcription and analysis.

## Supported Providers

### 1. Google Gemini (Default)
- **API Key**: `GEMINI_API_KEY`
- **Models**: `gemini-1.5-flash`, `gemini-2.5-flash`, etc.
- **Features**: Audio transcription + Text analysis
- **Cost**: Free tier available

### 2. OpenAI
- **API Key**: `OPENAI_API_KEY`
- **Models**: `gpt-3.5-turbo`, `gpt-4`, `gpt-4-turbo`, etc.
- **Features**: Text analysis only (transcription still requires Gemini)
- **Cost**: Pay per use

### 3. OpenAI-Compatible APIs
- **API Key**: `LLM_API_KEY`
- **Base URL**: `LLM_BASE_URL`
- **Examples**: Perplexity, local APIs, gpt4free proxies
- **Features**: Text analysis only
- **Cost**: Varies by provider

## Configuration

### Environment Variables

Add these to your `backend/.env` file:

```env
# Primary LLM Provider
LLM_PROVIDER=gemini # or 'openai'

# Gemini Configuration (recommended for full functionality)
GEMINI_API_KEY=your_gemini_api_key_here
LLM_MODEL=gemini-1.5-flash

# OR OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
LLM_MODEL=gpt-3.5-turbo

# OR Custom OpenAI-Compatible API
LLM_API_KEY=your_api_key_here
LLM_BASE_URL=https://api.example.com/v1
LLM_MODEL=custom-model-name
```

### Provider Selection Logic

1. **Primary Provider**: Set by the `LLM_PROVIDER` environment variable
2. **Automatic Fallback**: If the primary provider fails, the system automatically tries the available alternatives
3. **Provider Detection**: Configured providers are detected automatically based on which API keys are set
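
The detection and selection steps above can be sketched as a small helper. This is illustrative only — the function and variable names are assumptions, not the actual `llmService` API:

```javascript
// Sketch: detect which providers are configured from environment variables.
// Names are hypothetical; the real llmService may structure this differently.
function detectProviders(env) {
  const available = [];
  if (env.GEMINI_API_KEY) available.push('gemini');
  if (env.OPENAI_API_KEY || (env.LLM_API_KEY && env.LLM_BASE_URL)) available.push('openai');
  return available;
}

function pickProvider(env) {
  const available = detectProviders(env);
  const primary = env.LLM_PROVIDER || 'gemini';
  // Fall back to the first available provider if the primary is not configured.
  return available.includes(primary) ? primary : available[0] || null;
}
```

For example, setting `LLM_PROVIDER=openai` with only `GEMINI_API_KEY` present would still select Gemini, which matches the fallback behavior described below.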

## Usage Examples

### Using Gemini (Recommended)
```env
LLM_PROVIDER=gemini
GEMINI_API_KEY=AIza...your_key
LLM_MODEL=gemini-1.5-flash
```

### Using OpenAI
```env
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...your_key
LLM_MODEL=gpt-3.5-turbo
```

### Using Perplexity AI
```env
LLM_PROVIDER=openai
LLM_API_KEY=pplx-...your_key
LLM_BASE_URL=https://api.perplexity.ai
LLM_MODEL=llama-3.1-sonar-small-128k-online
```

### Using Local/Custom API
```env
LLM_PROVIDER=openai
LLM_API_KEY=your_local_key
LLM_BASE_URL=http://localhost:1234/v1
LLM_MODEL=local-model
```

## API Endpoints

### Check Provider Status
```bash
curl http://localhost:8080/clips/llm/provider-info
```

Response:
```json
{
"status": "success",
"data": {
"provider": "gemini",
"available": ["gemini", "openai"],
"model": "gemini-1.5-flash"
}
}
```
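
A client can use this response to verify its setup programmatically. The sketch below assumes the response shape shown above (`data.provider`, `data.available`, `data.model`):

```javascript
// Summarize the provider-info response body; field names are taken from the
// example response above and are otherwise an assumption.
function summarizeProviderInfo(body) {
  const { provider, available = [], model } = body.data || {};
  if (available.length === 0) {
    return 'No LLM provider configured — check your API keys.';
  }
  return `Using ${provider} (${model}); available: ${available.join(', ')}`;
}
```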

## Important Notes

### Audio Transcription Limitation
- **Audio transcription currently only works with Gemini** due to its file upload API
- OpenAI Whisper integration is planned for future releases
- For now, you can use OpenAI for analysis while keeping Gemini for transcription

### Fallback Behavior
- If your primary provider fails, the system automatically tries other configured providers
- This ensures high availability even if one service is down
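
The fallback loop can be sketched as follows (a minimal sketch under assumed names, not the actual implementation):

```javascript
// Sketch: try the primary provider first, then each remaining configured
// provider; rethrow the last error only if all of them fail.
async function analyzeWithFallback(providers, analyzeFn, text) {
  let lastError;
  for (const provider of providers) {
    try {
      return await analyzeFn(provider, text);
    } catch (err) {
      lastError = err; // remember the failure and try the next provider
    }
  }
  throw lastError || new Error('No LLM provider configured');
}
```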

### Cost Optimization
- **Gemini**: Free tier with generous limits, best for getting started
- **OpenAI**: Pay-per-use, higher quality but more expensive
- **Alternatives**: Often cheaper or free options available

## Getting API Keys

### Google Gemini (Free)
1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Click "Create API key"
3. Copy the key to your `.env` file

### OpenAI (Paid)
1. Visit [OpenAI API](https://platform.openai.com/api-keys)
2. Create new secret key
3. Copy the key to your `.env` file

### Perplexity AI (Paid)
1. Visit [Perplexity API](https://www.perplexity.ai/settings/api)
2. Generate API key
3. Set `LLM_BASE_URL=https://api.perplexity.ai`

## Troubleshooting

### "No LLM provider configured"
- Ensure you have set at least one of: `GEMINI_API_KEY`, `OPENAI_API_KEY`, or `LLM_API_KEY`
- Check that your API keys are valid and not expired

### Provider Info Shows an Empty `available` Array
- This means no valid API keys were detected
- Verify your environment variables are loaded correctly
- Check API key format and validity

### Analysis Works But Transcription Fails
- This is expected when using only OpenAI/custom providers
- Keep `GEMINI_API_KEY` for transcription functionality
- The system will use Gemini for transcription and your chosen provider for analysis
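
The routing rule described above — Gemini for transcription, the selected provider for everything else — can be sketched as:

```javascript
// Sketch of task routing; names are illustrative. Transcription is pinned to
// Gemini because only its file upload API is supported today.
function routeTask(task, selectedProvider) {
  if (task === 'transcription') return 'gemini';
  return selectedProvider;
}
```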

## Migration Guide

### From Gemini-Only Setup
Your existing setup will continue working without changes. To add OpenAI:

```env
# Keep existing Gemini configuration
GEMINI_API_KEY=your_existing_key

# Add OpenAI as secondary provider
OPENAI_API_KEY=your_openai_key

# Optional: Switch primary provider
LLM_PROVIDER=openai
```

### For New Installations
Choose your preferred configuration:

**Option 1: Gemini Only (Recommended for beginners)**
```env
LLM_PROVIDER=gemini
GEMINI_API_KEY=your_gemini_key
LLM_MODEL=gemini-1.5-flash
```

**Option 2: Hybrid Setup (Best of both worlds)**
```env
LLM_PROVIDER=openai
GEMINI_API_KEY=your_gemini_key # For transcription
OPENAI_API_KEY=your_openai_key # For analysis
LLM_MODEL=gpt-3.5-turbo
```

This setup gives you free transcription with Gemini and high-quality analysis with OpenAI.
19 changes: 17 additions & 2 deletions backend/.env.example
@@ -1,6 +1,21 @@
# Port for the backend server
PORT=8080

# Gemini API Key
# LLM Provider Configuration
# Primary provider: 'gemini' (default) or 'openai'
LLM_PROVIDER=gemini

# Gemini API Key (Google)
GEMINI_API_KEY="ENTER YOUR API KEY HERE"

# OpenAI API Configuration
# Use OPENAI_API_KEY for official OpenAI API
OPENAI_API_KEY="ENTER YOUR OPENAI API KEY HERE"

# Alternative: Use LLM_API_KEY for OpenAI-compatible APIs (e.g., Perplexity, local APIs)
# LLM_API_KEY="ENTER YOUR API KEY HERE"
# LLM_BASE_URL="https://api.perplexity.ai" # Custom base URL for OpenAI-compatible APIs

# Model selection (provider-specific)
LLM_MODEL=gemini-2.5-flash # For Gemini: gemini-1.5-flash, gemini-2.5-flash, etc.
# LLM_MODEL=gpt-3.5-turbo # For OpenAI: gpt-3.5-turbo, gpt-4, gpt-4-turbo, etc.
22 changes: 22 additions & 0 deletions backend/package-lock.json


3 changes: 2 additions & 1 deletion backend/package.json
@@ -19,9 +19,10 @@
"dotenv": "^16.3.1",
"express": "^4.18.2",
"fluent-ffmpeg": "^2.1.2",
"uuid": "^9.0.1",
"multer": "^2.0.1",
"openai": "^5.23.1",
"redis": "^5.6.0",
"uuid": "^9.0.1",
"winston": "^3.17.0",
"winston-daily-rotate-file": "^5.0.0"
},
93 changes: 3 additions & 90 deletions backend/src/routes/analyze.js
@@ -1,7 +1,7 @@
const express = require('express');
const router = express.Router();
const Transcript = require('../models/Transcript');
const { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } = require('@google/generative-ai');
const llmService = require('../services/llmService');

router.post('/:transcriptId', async (req, res) => {
try {
@@ -13,97 +13,10 @@ router.post('/:transcriptId', async (req, res) => {
// Join transcript segments into a single string for analysis by the LLM
const fullTranscriptText = transcriptDoc.transcript.map(segment => segment.text).join(' ');

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({
model: process.env.LLM_MODEL || 'gemini-1.5-flash',
});

const videoDurationText = transcriptDoc.duration ? ` The video is ${Math.floor(transcriptDoc.duration / 60)}:${String(Math.floor(transcriptDoc.duration % 60)).padStart(2, '0')} long.` : '';

const maxTimeFormatted = Math.floor(transcriptDoc.duration / 60) + ':' + String(Math.floor(transcriptDoc.duration % 60)).padStart(2, '0');

const prompt = `Given the following transcript, propose 3-5 video clips that would make engaging short content.${videoDurationText}

CRITICAL CONSTRAINTS:
- Video duration is EXACTLY ${videoDurationText ? maxTimeFormatted : 'unknown'} - DO NOT suggest any timestamps beyond this
- Each clip should be 30-90 seconds total duration
- All timestamps must be in MM:SS format and within 0:00 to ${maxTimeFormatted}

You can suggest two types of clips:

1. SINGLE SEGMENT clips: One continuous segment from start time to end time
2. MULTI-SEGMENT clips: Multiple segments that when combined tell a coherent story

For single segments: provide 'start' and 'end' times in MM:SS format.
For multi-segments: provide an array of segments in 'segments' field, each with 'start' and 'end' times.

VALIDATION RULES:
- Every timestamp must be ≤ ${maxTimeFormatted}
- Total duration must be 30-90 seconds
- Focus on complete thoughts or exchanges
- Ensure segments make sense when combined

Output format: JSON array where each object has:
- 'title': descriptive title
- For single segments: 'start' and 'end' fields
- For multi-segments: 'segments' array with objects containing 'start' and 'end'

Transcript: ${fullTranscriptText}`;

const result = await model.generateContent({
contents: [{
role: 'user',
parts: [{ text: prompt }],
}],
generationConfig: {
responseMimeType: 'application/json',
responseSchema: {
type: 'ARRAY',
items: {
type: 'OBJECT',
properties: {
title: { type: 'STRING' },
start: { type: 'STRING' },
end: { type: 'STRING' },
segments: {
type: 'ARRAY',
items: {
type: 'OBJECT',
properties: {
start: { type: 'STRING' },
end: { type: 'STRING' },
},
required: ['start', 'end'],
},
},
},
required: ['title'],
propertyOrdering: ['title', 'start', 'end', 'segments'],
},
},
},
safetySettings: [
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_NONE,
},
{
category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
threshold: HarmBlockThreshold.BLOCK_NONE,
},
{
category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
threshold: HarmBlockThreshold.BLOCK_NONE,
},
{
category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
threshold: HarmBlockThreshold.BLOCK_NONE,
},
],
});

const response = await result.response;
const suggestedClips = JSON.parse(response.text());
// Use the LLM service for analysis
const suggestedClips = await llmService.analyzeTranscript(fullTranscriptText, transcriptDoc.duration, maxTimeFormatted);

// Convert MM:SS time format to seconds for database storage
const convertTimeToSeconds = (timeString) => {
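
The `convertTimeToSeconds` helper is truncated in this diff. A minimal sketch of what the MM:SS-to-seconds conversion might look like (the actual helper in `analyze.js` may differ):

```javascript
// Convert an "MM:SS" timestamp string to a number of seconds for storage.
const convertTimeToSeconds = (timeString) => {
  const [minutes, seconds] = timeString.split(':').map(Number);
  return minutes * 60 + seconds;
};
```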