Browser-based audio program generator - creates spoken audio from text with optional background music.
- Text-to-Speech: Convert text files to spoken audio
- Multiple TTS Engines: OpenAI TTS, Google Cloud TTS, Web Speech API
- Background Music: Mix speech with background audio (supports MP3, WAV, OGG, M4A, AAC, FLAC, AIFF)
- Audio Export: Download as WAV (uncompressed) or MP3 (7-10x smaller)
- Robust Error Handling: Automatic retry with backoff for failed API requests
- Smart Caching: IndexedDB-based caching with easy cache management
- Browser-Based: No installation required, runs entirely in your browser
- Modern UI: Clean, responsive interface with Pico.css
- Internet Required: First load requires internet to download styling and MP3 encoder library
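Mixing speech with background music comes down to summing two sample streams, with the music turned down. The sketch below shows that idea on raw `Float32Array` samples. It assumes mono audio at one shared sample rate, and the function name `mixWithBackground`, the 0.3 music gain, and looping short music are illustrative choices, not necessarily what the app does internally (the app itself works through the Web Audio API).

```javascript
// Hypothetical helper: mix mono speech samples with (looping) background music.
// Assumes both inputs are float PCM in [-1, 1] at the same sample rate.
function mixWithBackground(speech, music, musicGain = 0.3) {
  const out = new Float32Array(speech.length);
  for (let i = 0; i < speech.length; i++) {
    // Loop the music when it is shorter than the speech track.
    const bg = music.length > 0 ? music[i % music.length] * musicGain : 0;
    // Sum both signals, then clamp to the valid float PCM range.
    out[i] = Math.max(-1, Math.min(1, speech[i] + bg));
  }
  return out;
}

// Usage with a tiny four-sample example
const speech = new Float32Array([0, 0.5, 0.9, 0]);
const music = new Float32Array([0.5, -0.5]);
console.log(Array.from(mixWithBackground(speech, music)));
```

Clamping keeps every sample inside the [-1, 1] range expected for float PCM; a lower music gain keeps the voice clearly in front and makes clamping rare.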
- Clone the repository
- Install dependencies:
npm install
- Run a local web server:
npm run serve
- Open http://localhost:8080 in your browser
- Upload a phrase file and generate audio!
This is the first stable production release! Major improvements include:
- Automatic Retry Logic: Up to 3 retry attempts for failed API requests, with increasing delays between attempts (1s, 2s, 3s)
- Smart Error Detection: Detects and handles empty responses, network issues, and API errors gracefully
- Detailed Logging: Console logs show exactly what's happening during audio generation
- Better Error Messages: Clear, actionable error messages with phrase context
- Clear Cache Button: Easy one-click cache management in the Output section
- Improved Reliability: Successfully handles intermittent API issues and network problems
- Progress Tracking: See which phrases are being generated and cached
- Comprehensive Test Suite: 161 tests covering all major functionality
- 100% Test Pass Rate: All tests passing with proper mocking
- Better Code Quality: ESLint and Prettier enforced via pre-commit hooks
See the full changelog for details.
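The retry behavior described above can be sketched as a small wrapper around any request that returns audio bytes. This is a minimal illustration, not the app's code: the names `withRetry`, `retryDelayMs`, and `isEmptyAudio` are hypothetical, and the delays follow the 1s, 2s, 3s schedule from the changelog (a strictly exponential schedule would double instead: 1s, 2s, 4s). Empty responses are treated as failures, in line with the "detects empty responses" note.

```javascript
// Hypothetical retry wrapper; the delay schedule mirrors the changelog (1s, 2s, 3s).
const RETRY_DELAYS_MS = [1000, 2000, 3000];

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `retry` (1-based); reuses the last delay if exceeded.
function retryDelayMs(retry, delays = RETRY_DELAYS_MS) {
  return delays[Math.min(retry, delays.length) - 1];
}

// Treat a missing or zero-length audio payload as a failure worth retrying.
function isEmptyAudio(data) {
  return data == null || data.byteLength === 0;
}

async function withRetry(request, { maxRetries = 3, sleep = defaultSleep, log = console.log } = {}) {
  let lastError = null;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (attempt > 0) {
      const ms = retryDelayMs(attempt);
      log(`Retry ${attempt}/${maxRetries} after ${ms} ms: ${lastError.message}`);
      await sleep(ms);
    }
    try {
      const data = await request();
      if (isEmptyAudio(data)) throw new Error("Empty audio response");
      return data;
    } catch (err) {
      lastError = err; // remember the failure and try again (if attempts remain)
    }
  }
  throw new Error(`Request failed after ${maxRetries} retries: ${lastError.message}`);
}

// Usage in the browser (fetch does not reject on HTTP errors, so check res.ok):
// const audio = await withRetry(async () => {
//   const res = await fetch(url, init);
//   if (!res.ok) throw new Error(`HTTP ${res.status}`);
//   return res.arrayBuffer();
// });
```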
OpenAI TTS: High quality, pay-as-you-go pricing
- 💰 Cost: $15/1M characters (standard), $30/1M (HD)
- 🔑 Setup Required:
- Create OpenAI account
- Add payment method (pay-as-you-go, no monthly fees)
- Create API key
- ✅ Quality: Excellent (6 natural-sounding voices)
- ✅ Features: Voice selection, speed control (0.25x-4x)
- ✅ Export: Full audio export and mixing support
- ✅ Simplicity: No cloud project or separate billing account to configure (just a payment method)
Pricing Details:
- Pay only for what you use
- No monthly subscription required
- Example: 100K characters = $1.50 (standard) or $3.00 (HD)
- Example: 1000 phrases @ 150 chars = $2.25 (standard)
Best for: Users who want simple setup, predictable per-use costs, and excellent quality
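Per-character pricing makes cost a one-line calculation. This sketch hard-codes the rates quoted in this README ($15 and $30 per 1M characters), which may change over time; the function name is hypothetical. It reproduces the worked examples above.

```javascript
// Rates quoted in this README (USD per 1M characters); check OpenAI's pricing page for current values.
const OPENAI_RATES_PER_MILLION = { standard: 15, hd: 30 };

function estimateOpenAICost(characters, tier = "standard") {
  const rate = OPENAI_RATES_PER_MILLION[tier];
  if (rate === undefined) throw new Error(`Unknown tier: ${tier}`);
  return (characters / 1_000_000) * rate;
}

// Worked examples from the pricing details above
console.log(estimateOpenAICost(100_000).toFixed(2)); // 1.50
console.log(estimateOpenAICost(100_000, "hd").toFixed(2)); // 3.00
console.log(estimateOpenAICost(1000 * 150).toFixed(2)); // 2.25
```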
Google Cloud TTS: High quality, generous free tier, requires API key and billing setup
- 💰 Cost: Free tier (1M characters/month), then $16/1M characters
- 🔑 Setup Required:
- Create Google Cloud account
- Enable Text-to-Speech API
- Enable billing (credit card required, even for free tier)
- Create API key
- ✅ Quality: Excellent (Neural2/WaveNet voices)
- ✅ Features: Advanced parameters (pitch, rate, volume), dynamic voice discovery
- ✅ Export: Full audio export and mixing support
Billing Details:
- Free tier: 1 million characters/month (WaveNet/Neural2)
- After free tier: $16 per 1 million characters
- Example: 6,600 generations of a 150-character program (990,000 characters) = FREE
- Example: 10,000 generations (1.5M characters, of which 0.5M are billable) = ~$8/month
Best for: Most users (generous free tier), professional quality
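With a free tier, only the characters above the monthly allowance are billed. The sketch below encodes the figures quoted in this README (1M free characters per month, $16 per 1M after that); the function name is hypothetical, and Google's actual billing can differ by voice type, so treat the result as an estimate.

```javascript
// Figures quoted in this README: 1M free characters/month, then $16 per 1M (WaveNet/Neural2).
const GOOGLE_FREE_CHARS_PER_MONTH = 1_000_000;
const GOOGLE_RATE_PER_MILLION = 16;

function estimateGoogleMonthlyCost(charactersThisMonth) {
  // Only the characters beyond the monthly free tier are billed.
  const billable = Math.max(0, charactersThisMonth - GOOGLE_FREE_CHARS_PER_MONTH);
  return (billable / 1_000_000) * GOOGLE_RATE_PER_MILLION;
}

// The two examples above, for a 150-character program:
console.log(estimateGoogleMonthlyCost(6_600 * 150)); // 0 (990,000 chars, within the free tier)
console.log(estimateGoogleMonthlyCost(10_000 * 150)); // 8 (500,000 billable chars)
```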
Google Translate TTS: Requires backend proxy (CORS limitation)
- ✅ Cost: Completely free
- ❌ Status: Not available in browser-only version
- ⚠️ Issue: Google Translate endpoint blocks browser requests (CORS)
- 🔧 Solution: Requires backend server or CORS proxy
Note: This worked in the Python version because it ran server-side. Browser version needs a proxy server to bypass CORS restrictions.
Web Speech API: Free, no setup, limited functionality
- ✅ Cost: Free
- ✅ Setup: None
- ⚠️ Quality: Browser-dependent (Chrome: good, Firefox/Safari: poor)
- ❌ Export: Not supported (playback only)
- ❌ Mixing: Not supported
Best for: Quick previews, testing (not recommended for production)
Setting up OpenAI TTS (Time Required: ~2 minutes, one-time setup)
- Go to OpenAI Platform
- Sign up or sign in with your account
- Note: This is separate from ChatGPT Plus subscription
- Click on "Settings" → "Billing"
- Click "Add payment method"
- Enter your credit card information
- Note: You only pay for what you use (no monthly fees)
Setting up usage limits (recommended):
- Go to "Settings" → "Billing" → "Usage limits"
- Set a monthly budget (e.g., $10)
- You'll be notified when you approach the limit
- Go to API Keys page
- Click "Create new secret key"
- Configure the key:
- Name: "Audio Program Generator" (or any name you prefer)
- Owned by: Select "You"
- Permissions: Select "All" (includes TTS access)
- Alternative: Choose "Restricted" and enable "Model capabilities" for TTS-only access
- Click "Create secret key"
- Copy the API key (it will look like: sk-proj-...)
- Important: Save it securely; you won't be able to see it again
Note: The same API key works for all OpenAI services (ChatGPT API, DALL-E, TTS, Whisper, etc.). There's no separate "TTS-only" key type.
- Open the Audio Program Generator in your browser
- Select "OpenAI TTS" from the TTS Engine dropdown
- Paste your API key in the "OpenAI API Key" field
- Click "Save"
- The key is stored locally in your browser and is sent only to OpenAI's API
Done! You can now generate high-quality audio with OpenAI TTS.
Available Voices:
- Nova (Female - warm and friendly) - Recommended for meditation
- Shimmer (Female - soft and gentle) - Great for relaxation
- Alloy (Neutral - balanced)
- Echo (Male)
- Fable (Male - British)
- Onyx (Male - deep)
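Under the hood, an OpenAI TTS request is a JSON POST to the `/v1/audio/speech` endpoint. The sketch below only builds the request, so it can be inspected or tested without network access. `buildOpenAISpeechRequest` is a hypothetical helper, the `tts-1` / `tts-1-hd` model names are assumed to correspond to the standard and HD tiers priced earlier, and MP3 output is an assumed choice; the app's own request code may differ.

```javascript
// Sketch only: builds (but does not send) a request for OpenAI's speech endpoint.
const OPENAI_VOICES = ["nova", "shimmer", "alloy", "echo", "fable", "onyx"];

function buildOpenAISpeechRequest(apiKey, text, { voice = "nova", speed = 1.0, hd = false } = {}) {
  if (!OPENAI_VOICES.includes(voice)) throw new Error(`Unknown voice: ${voice}`);
  if (speed < 0.25 || speed > 4) throw new Error("speed must be between 0.25 and 4");
  return {
    url: "https://api.openai.com/v1/audio/speech",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: hd ? "tts-1-hd" : "tts-1", // assumed mapping to standard / HD pricing
        input: text,
        voice,
        speed,
        response_format: "mp3",
      }),
    },
  };
}

// Usage in the browser (the key goes only to api.openai.com):
// const { url, init } = buildOpenAISpeechRequest(key, "Take a deep breath");
// const audioBytes = await (await fetch(url, init)).arrayBuffer();
```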
Setting up Google Cloud TTS (Time Required: ~5 minutes, one-time setup)
- Go to Google Cloud Console
- Sign in with your Google account
- Accept the terms of service
- Click the project dropdown at the top of the page
- Click "New Project"
- Enter a project name (e.g., "Audio Program Generator")
- Click "Create"
- In the search bar, type "Text-to-Speech API"
- Click on "Cloud Text-to-Speech API"
- Click "Enable"
- Wait for the API to be enabled (~30 seconds)
- Click the hamburger menu (☰) → "Billing"
- Click "Link a billing account" or "Add billing account"
- Enter your credit card information
- Don't worry: You won't be charged unless you exceed the free tier (1M characters/month)
- You can set up billing alerts to notify you before any charges
Setting up billing alerts (recommended):
- Go to "Billing" → "Budgets & alerts"
- Click "Create budget"
- Set budget to $1 (or any amount)
- Set alert threshold to 50%, 90%, 100%
- You'll receive email alerts if you approach the limit
- Click the hamburger menu (☰) → "APIs & Services" → "Credentials"
- Click "Create Credentials" → "API key"
- Copy the API key (it will look like: AIzaSyC...)
- Optional but recommended: Click "Restrict key"
- Under "API restrictions", select "Restrict key"
- Select "Cloud Text-to-Speech API"
- Click "Save"
- Open the Audio Program Generator in your browser
- Select "Google Cloud TTS" from the TTS Engine dropdown
- Paste your API key in the "Google Cloud API Key" field
- Click "Save"
- The key is stored locally in your browser and is sent only to Google's Text-to-Speech API
Done! You can now generate high-quality audio with Google Cloud TTS.
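A Google Cloud TTS request is a JSON POST to the `text:synthesize` REST method, with the API key passed as a `key` query parameter; the response carries the audio as base64 in `audioContent`. The helpers below are a hypothetical sketch, not the app's code: the default voice `en-US-Neural2-F`, the MP3 encoding, and the assumption that voice discovery uses the `voices` list method are illustrative. `decodeAudioContent` relies on the global `atob`, which browsers and recent Node.js versions provide.

```javascript
// Sketch: builds (but does not send) a Cloud Text-to-Speech v1 REST request.
// The default voice and audio settings are illustrative assumptions.
const GOOGLE_TTS_BASE = "https://texttospeech.googleapis.com/v1";

function buildGoogleSynthesizeRequest(apiKey, text, {
  languageCode = "en-US",
  voiceName = "en-US-Neural2-F",
  speakingRate = 1.0,
  pitch = 0,
  volumeGainDb = 0,
} = {}) {
  return {
    url: `${GOOGLE_TTS_BASE}/text:synthesize?key=${encodeURIComponent(apiKey)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        input: { text },
        voice: { languageCode, name: voiceName },
        audioConfig: { audioEncoding: "MP3", speakingRate, pitch, volumeGainDb },
      }),
    },
  };
}

// Listing available voices is a GET on the voices collection.
function googleVoicesUrl(apiKey, languageCode = "en-US") {
  return `${GOOGLE_TTS_BASE}/voices?languageCode=${encodeURIComponent(languageCode)}&key=${encodeURIComponent(apiKey)}`;
}

// The synthesize response holds base64 audio in `audioContent`; decode it to raw bytes.
function decodeAudioContent(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Usage in the browser:
// const { url, init } = buildGoogleSynthesizeRequest(key, "Take a deep breath");
// const { audioContent } = await (await fetch(url, init)).json();
// const mp3Bytes = decodeAudioContent(audioContent);
```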
Free alternative with no API key required
Microsoft Edge TTS is an unofficial API that provides free, high-quality neural voices without requiring an API key or billing setup.
Status: Implementation planned for future release
Advantages:
- ✅ Completely free
- ✅ No API key required
- ✅ No billing setup needed
- ✅ Excellent voice quality (Neural voices)
- ✅ Works from browser
Limitations:
- ⚠️ Unofficial API (not officially supported by Microsoft)
- ⚠️ Could potentially be rate-limited or discontinued
- ⚠️ Fewer voice options than Google Cloud TTS
When available, this will be a great option for users who want high-quality TTS without the API key setup process.
Troubleshooting Google Cloud TTS:
- Make sure you copied the entire API key
- Check that you enabled the Text-to-Speech API
- Verify the API key restrictions (if set) include Text-to-Speech API
- Billing must be enabled even for free tier usage
- Go to Billing section and link a payment method
- Wait a few minutes after enabling billing
- You've exceeded the free tier (1M characters/month)
- Check your usage in Google Cloud Console → APIs & Services → Dashboard
- Either wait for the monthly reset or upgrade to paid tier
- Your API key is stored locally in your browser (localStorage)
- It's never sent to any server except Google's TTS API
- You can remove it at any time by clicking "Clear" in the app
- For production use, consider restricting the API key to specific domains
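Storing a key in `localStorage` needs only the standard `setItem`, `getItem`, and `removeItem` methods. The sketch below wraps them under hypothetical key names (`apg.apiKey.<engine>`) and includes an in-memory stand-in so the logic can run outside a browser; in the browser you would pass `window.localStorage` instead.

```javascript
// Hypothetical storage key names; the app's real keys may differ.
const storageKeyFor = (engine) => `apg.apiKey.${engine}`;

function saveApiKey(storage, engine, apiKey) {
  storage.setItem(storageKeyFor(engine), apiKey.trim()); // drop stray whitespace from pasting
}

function loadApiKey(storage, engine) {
  return storage.getItem(storageKeyFor(engine)); // null if never saved
}

function clearApiKey(storage, engine) {
  storage.removeItem(storageKeyFor(engine));
}

// Minimal in-memory stand-in exposing the same three methods as window.localStorage.
class MemoryStorage {
  constructor() { this.map = new Map(); }
  getItem(key) { return this.map.has(key) ? this.map.get(key) : null; }
  setItem(key, value) { this.map.set(key, String(value)); }
  removeItem(key) { this.map.delete(key); }
}

// Browser usage: saveApiKey(window.localStorage, "google", pastedKey);
```

Any script running on the same origin can read `localStorage`, which is one more reason to restrict the key to the Text-to-Speech API and to your own domains.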
Create a text file with phrases and durations:
Welcome to your audio program; 2
Take a deep breath; 3
*; 2
Relax your shoulders; 3
*; 5
You are calm and centered; 2
Format: phrase; duration_in_seconds
- Use * for silence
- One phrase per line
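A parser for this format only has to split each line at the semicolon, convert the duration to a number, and recognize `*` as silence. This is a minimal sketch with assumptions of its own: blank lines are skipped, the last `;` on a line is taken as the separator (so phrases may contain semicolons), and malformed lines raise an error naming the line number. `parsePhraseFile` is a hypothetical name.

```javascript
// Sketch of a parser for the "phrase; duration_in_seconds" format described above.
function parsePhraseFile(text) {
  const entries = [];
  text.split(/\r?\n/).forEach((rawLine, index) => {
    const line = rawLine.trim();
    if (line === "") return; // assumption: blank lines are ignored
    const sep = line.lastIndexOf(";");
    if (sep === -1) throw new Error(`Line ${index + 1}: missing ";" separator`);
    const phrase = line.slice(0, sep).trim();
    const durationText = line.slice(sep + 1).trim();
    const duration = Number(durationText);
    if (phrase === "") throw new Error(`Line ${index + 1}: empty phrase`);
    if (durationText === "" || !Number.isFinite(duration) || duration < 0) {
      throw new Error(`Line ${index + 1}: invalid duration "${durationText}"`);
    }
    // "*" marks a silence of `duration` seconds; anything else is spoken text.
    entries.push(phrase === "*"
      ? { silence: true, duration }
      : { silence: false, text: phrase, duration });
  });
  return entries;
}

const program = parsePhraseFile("Welcome to your audio program; 2\nTake a deep breath; 3\n*; 2");
console.log(program.length); // 3
```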
# Clone the repository
git clone https://github.com/jeffwright13/apg-web.git
cd apg-web
# Install dependencies
npm install
# Set up git hooks
npm run prepare
# Start local server
npm run serve
# Open browser to http://localhost:8080
# Run all tests
npm test
# Watch mode
npm run test:watch
# Coverage report
npm run test:coverage
# Lint code
npm run lint
# Auto-fix linting issues
npm run lint:fix
# Format code
npm run format
# Check formatting
npm run format:check
The app can load background music in common audio formats via the Web Audio API (support varies by browser):
✅ Universal Support (All Browsers):
- MP3 (.mp3) - Most common format
- WAV (.wav) - Uncompressed, high quality
✅ Wide Support (Most Browsers):
- OGG Vorbis (.ogg) - All except Safari
- AAC/M4A (.m4a, .aac) - Most browsers
- WebM (.webm) - Chrome, Firefox, Edge
- FLAC (.flac) - Chrome, Edge only (lossless)
- AIFF (.aiff, .aif) - Safari only
Recommendation: Use MP3 for maximum compatibility and small file sizes.
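Since support for the non-MP3 formats varies by browser, a page can ask the browser up front with the standard `HTMLMediaElement.canPlayType()` method, which returns "", "maybe", or "probably". The sketch below maps file extensions to common MIME strings (an assumption; browsers may expect other spellings) and treats any non-empty answer as worth trying. It is only a hint: the decisive test is whether `decodeAudioData` succeeds on the file.

```javascript
// Extension-to-MIME mapping for the formats listed above (common conventions, not guarantees).
const AUDIO_MIME_TYPES = new Map([
  ["mp3", "audio/mpeg"],
  ["wav", "audio/wav"],
  ["ogg", 'audio/ogg; codecs="vorbis"'],
  ["m4a", "audio/mp4"],
  ["aac", "audio/aac"],
  ["webm", "audio/webm"],
  ["flac", "audio/flac"],
  ["aiff", "audio/aiff"],
  ["aif", "audio/aiff"],
]);

function mimeTypeForFile(fileName) {
  const ext = fileName.split(".").pop().toLowerCase();
  return AUDIO_MIME_TYPES.get(ext) ?? null;
}

// In a browser, pass: (type) => document.createElement("audio").canPlayType(type)
function browserMayDecode(fileName, canPlayType) {
  const mime = mimeTypeForFile(fileName);
  return mime !== null && canPlayType(mime) !== "";
}
```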
Generated audio can be exported as WAV (uncompressed, for maximum quality and compatibility with audio editing software) or as MP3 (much smaller files).
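The size gap between the two export formats follows from simple arithmetic: PCM WAV stores every sample (plus a 44-byte header), while constant-bitrate MP3 grows by bitrate times duration. The sketch below is a rough estimator; its defaults (44.1 kHz, stereo, 16-bit, 128 kbps) are assumptions rather than the app's confirmed export settings, and MP3 frame overhead is ignored.

```javascript
// Uncompressed PCM WAV size: 44-byte header + samples × channels × bytes per sample.
function wavSizeBytes(seconds, sampleRate = 44100, channels = 2, bitsPerSample = 16) {
  return 44 + Math.round(seconds * sampleRate) * channels * (bitsPerSample / 8);
}

// Constant-bitrate MP3 size is roughly bitrate × duration (frame overhead ignored).
function mp3SizeBytes(seconds, kbps = 128) {
  return Math.round(((kbps * 1000) / 8) * seconds);
}

const minute = 60;
console.log(wavSizeBytes(minute)); // 10584044
console.log(mp3SizeBytes(minute)); // 960000
console.log((wavSizeBytes(minute) / mp3SizeBytes(minute)).toFixed(1)); // 11.0
```

For 16-bit stereo 44.1 kHz audio (1411.2 kbps), a 7-10x reduction corresponds to MP3 bitrates of roughly 140 to 200 kbps; at 128 kbps the ratio is about 11x.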
Create a text file with phrases and pause durations:
First phrase; 2
Second phrase; 5
*; 3
Third phrase; 0
- Each line: phrase; duration_in_seconds
- Use * for silence
- Duration can be decimal (e.g., 2.5)
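If the duration is read as the pause that follows each phrase (as the "pause durations" wording suggests), the program length is the sum of all speech plus all pauses. The sketch below computes start and end times for each entry. `buildTimeline` is hypothetical, and it assumes the spoken length of each phrase has already been measured from the generated audio; silence lines contribute only their pause.

```javascript
// Hypothetical timeline builder. Each entry's duration is treated as a pause AFTER its speech;
// speechSeconds[i] is the measured length of entry i's generated audio (ignored for silence).
function buildTimeline(entries, speechSeconds) {
  let cursor = 0;
  return entries.map((entry, i) => {
    const start = cursor;
    const spoken = entry.silence ? 0 : speechSeconds[i];
    cursor += spoken + entry.duration;
    return { start, spokenEnd: start + spoken, end: cursor };
  });
}

// The example file above, with made-up speech lengths for illustration
const timeline = buildTimeline(
  [
    { silence: false, text: "First phrase", duration: 2 },
    { silence: false, text: "Second phrase", duration: 5 },
    { silence: true, duration: 3 },
    { silence: false, text: "Third phrase", duration: 0 },
  ],
  [1.5, 1.25, 0, 1],
);
console.log(timeline[timeline.length - 1].end); // 13.75
```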
Pre-commit hooks automatically run:
- ESLint (code quality)
- Prettier (formatting)
- Jest (tests for changed files)
License: MIT
Jeff Wright