Audio Program Generator (Web)

Browser-based audio program generator - creates spoken audio from text with optional background music.

Features

Text-to-Speech: Convert text files to spoken audio
Multiple TTS Engines: OpenAI TTS, Google Cloud TTS, Web Speech API
Background Music: Mix speech with background audio (supports MP3, WAV, OGG, M4A, AAC, FLAC, AIFF)
Audio Export: Download as WAV (uncompressed) or MP3 (7-10x smaller)
Robust Error Handling: Automatic retry with backoff for API failures
Smart Caching: IndexedDB-based caching with easy cache management
Browser-Based: No installation required, runs entirely in your browser
Modern UI: Clean, responsive interface with Pico.css
Internet Required: The first load needs an internet connection to download the styling and the MP3 encoder library

Quick Start

Clone the repository
Install dependencies:
```
npm install
```
Run a local web server:
```
npm run serve
```
Open http://localhost:8080 in your browser
Upload a phrase file and generate audio!

What's New in v1.0.0 🎉

This is the first stable production release! Major improvements include:

Reliability & Error Handling

Automatic Retry Logic: Up to 3 retry attempts with increasing backoff delays (1s, 2s, 3s) for failed API requests
Smart Error Detection: Detects and handles empty responses, network issues, and API errors gracefully
Detailed Logging: Console logs show exactly what's happening during audio generation
Better Error Messages: Clear, actionable error messages with phrase context
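
A retry loop along these lines could produce the behavior listed above. This is a minimal sketch, not the app's actual code: `withRetry`, `exponentialDelays`, and the injected `wait` parameter are hypothetical names. The default schedule reuses the 1s, 2s, 3s delays quoted above; a strictly exponential schedule would double each delay instead (1s, 2s, 4s), which `exponentialDelays` illustrates.

```javascript
// Hypothetical helpers for illustration; not taken from the app's source.
const RETRY_DELAYS_MS = [1000, 2000, 3000]; // schedule quoted in the release notes

// Strictly exponential alternative: baseMs, 2*baseMs, 4*baseMs, ...
function exponentialDelays(retries, baseMs = 1000) {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `request` once, then retry after each delay in `delaysMs` while it fails.
async function withRetry(request, delaysMs = RETRY_DELAYS_MS, wait = sleep) {
  let lastError;
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      const result = await request(attempt);
      // Treat a missing or empty audio payload as a failure so it is retried too.
      if (result == null || result.byteLength === 0) {
        throw new Error("Empty response from TTS API");
      }
      return result;
    } catch (error) {
      lastError = error;
      if (attempt < delaysMs.length) {
        console.warn(
          `Attempt ${attempt + 1} failed (${error.message}); retrying in ${delaysMs[attempt]} ms`
        );
        await wait(delaysMs[attempt]);
      }
    }
  }
  throw lastError; // all attempts failed
}
```

Passing the delay schedule and the `wait` function as parameters keeps the loop easy to test without real timers.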

User Experience

Clear Cache Button: Easy one-click cache management in the Output section
Improved Reliability: Successfully handles intermittent API issues and network problems
Progress Tracking: See which phrases are being generated and cached

Testing & Quality

Comprehensive Test Suite: 161 tests covering all major functionality
100% Test Pass Rate: All tests passing with proper mocking
Better Code Quality: ESLint and Prettier enforced via pre-commit hooks

See the full changelog for details.

Text-to-Speech Engines

OpenAI TTS - Simple & Affordable

High quality, pay-as-you-go pricing

💰 Cost: $15/1M characters (standard), $30/1M (HD)
🔑 Setup Required:
1. Create OpenAI account
2. Add payment method (pay-as-you-go, no monthly fees)
3. Create API key
✅ Quality: Excellent (6 natural-sounding voices)
✅ Features: Voice selection, speed control (0.25x-4x)
✅ Export: Full audio export and mixing support
✅ Simplicity: No billing account setup hassles

Pricing Details:

Pay only for what you use
No monthly subscription required
Example: 100K characters = $1.50 (standard) or $3.00 (HD)
Example: 1000 phrases @ 150 chars = $2.25 (standard)
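
The examples above follow directly from the per-million-character rates. A small estimator, sketched here with a hypothetical function name, reproduces them:

```javascript
// Rates in USD per 1 million characters, as quoted above.
const OPENAI_RATE_PER_MILLION = { standard: 15, hd: 30 };

// Hypothetical helper: estimate the cost of synthesizing `characters` characters.
function estimateOpenAiCost(characters, tier = "standard") {
  const rate = OPENAI_RATE_PER_MILLION[tier];
  if (rate === undefined) throw new Error(`Unknown tier: ${tier}`);
  return (characters / 1_000_000) * rate;
}

console.log(estimateOpenAiCost(100_000).toFixed(2)); // "1.50"
console.log(estimateOpenAiCost(100_000, "hd").toFixed(2)); // "3.00"
console.log(estimateOpenAiCost(1000 * 150).toFixed(2)); // "2.25"
```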

Best for: Users who want simple setup, predictable per-use costs, and excellent quality

Google Cloud Text-to-Speech

High quality, generous free tier, requires API key and billing setup

💰 Cost: Free tier (1M characters/month), then $16/1M characters
🔑 Setup Required:
1. Create Google Cloud account
2. Enable Text-to-Speech API
3. Enable billing (credit card required, even for free tier)
4. Create API key
✅ Quality: Excellent (Neural2/WaveNet voices)
✅ Features: Advanced parameters (pitch, rate, volume), dynamic voice discovery
✅ Export: Full audio export and mixing support

Billing Details:

Free tier: 1 million characters/month (WaveNet/Neural2)
After free tier: $16 per 1 million characters
Example: 6,600 generations of 150-char program = FREE
Example: 10,000 generations = ~$8/month
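
The free-tier arithmetic can be checked with a short calculation. The function name below is hypothetical; the figures are the ones quoted above.

```javascript
const FREE_CHARACTERS_PER_MONTH = 1_000_000; // free tier quoted above
const USD_PER_MILLION_CHARACTERS = 16; // rate after the free tier

// Hypothetical helper: only characters beyond the free tier are billed.
function estimateGoogleMonthlyCost(charactersThisMonth) {
  const billable = Math.max(0, charactersThisMonth - FREE_CHARACTERS_PER_MONTH);
  return (billable / 1_000_000) * USD_PER_MILLION_CHARACTERS;
}

console.log(estimateGoogleMonthlyCost(6_600 * 150)); // 0 (990,000 characters, inside the free tier)
console.log(estimateGoogleMonthlyCost(10_000 * 150)); // 8 (500,000 billable characters)
```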

Best for: Most users (generous free tier), professional quality

Google Translate TTS (gTTS) - Coming Soon

Requires backend proxy (CORS limitation)

✅ Cost: Completely free
❌ Status: Not available in browser-only version
⚠️ Issue: Google Translate endpoint blocks browser requests (CORS)
🔧 Solution: Requires backend server or CORS proxy

Note: This worked in the Python version because it ran server-side. The browser version needs a proxy server to work around CORS restrictions.

Web Speech API - Browser Playback Only

Free, no setup, limited functionality

✅ Cost: Free
✅ Setup: None
⚠️ Quality: Browser-dependent (Chrome: good, Firefox/Safari: poor)
❌ Export: Not supported (playback only)
❌ Mixing: Not supported

Best for: Quick previews, testing (not recommended for production)

TTS Engine Setup Guides

OpenAI TTS Setup

Time Required: ~2 minutes (one-time setup)

Step 1: Create OpenAI Account

Go to OpenAI Platform
Sign up or sign in with your account
Note: This is separate from a ChatGPT Plus subscription

Step 2: Add Payment Method

Click on "Settings" → "Billing"
Click "Add payment method"
Enter your credit card information
Note: You only pay for what you use (no monthly fees)

Setting up usage limits (recommended):

Go to "Settings" → "Billing" → "Usage limits"
Set a monthly budget (e.g., $10)
You'll be notified when you approach the limit

Step 3: Create API Key

Go to API Keys page
Click "Create new secret key"
Configure the key:
- Name: "Audio Program Generator" (or any name you prefer)
- Owned by: Select "You"
- Permissions: Select "All" (includes TTS access)
  - Alternative: Choose "Restricted" and enable "Model capabilities" for TTS-only access
Click "Create secret key"
Copy the API key (it will look like: sk-proj-...)
Important: Save it securely - you won't be able to see it again

Note: The same API key works for all OpenAI services (ChatGPT API, DALL-E, TTS, Whisper, etc.). There's no separate "TTS-only" key type.

Step 4: Use API Key in App

Open the Audio Program Generator in your browser
Select "OpenAI TTS" from the TTS Engine dropdown
Paste your API key in the "OpenAI API Key" field
Click "Save"
The key is stored locally in your browser (not sent anywhere else)

Done! You can now generate high-quality audio with OpenAI TTS.

Available Voices:

Nova (Female - warm and friendly) - Recommended for meditation
Shimmer (Female - soft and gentle) - Great for relaxation
Alloy (Neutral - balanced)
Echo (Male)
Fable (Male - British)
Onyx (Male - deep)
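
For reference, a request to OpenAI's speech endpoint might be built as sketched below. This is an illustration of the public API, not the app's own code: `buildOpenAiSpeechRequest` and `synthesizeWithOpenAi` are hypothetical names, and the API expects the voice names above in lowercase.

```javascript
const OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech";
const OPENAI_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

// Hypothetical helper: assemble the URL and fetch options for one phrase.
function buildOpenAiSpeechRequest(apiKey, text, { voice = "nova", speed = 1.0, hd = false } = {}) {
  if (!OPENAI_VOICES.includes(voice)) throw new RangeError(`Unknown voice: ${voice}`);
  if (speed < 0.25 || speed > 4) throw new RangeError("speed must be between 0.25 and 4");
  return {
    url: OPENAI_SPEECH_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // "tts-1" is the standard model, "tts-1-hd" the HD one.
      body: JSON.stringify({ model: hd ? "tts-1-hd" : "tts-1", input: text, voice, speed }),
    },
  };
}

// Send the request and return the encoded audio bytes as an ArrayBuffer.
async function synthesizeWithOpenAi(apiKey, text, options) {
  const { url, init } = buildOpenAiSpeechRequest(apiKey, text, options);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`OpenAI TTS request failed: HTTP ${response.status}`);
  return response.arrayBuffer();
}
```

Separating request construction from the network call makes the payload easy to inspect and unit-test without spending API credit.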

Google Cloud Text-to-Speech Setup

Time Required: ~5 minutes (one-time setup)

Step 1: Create Google Cloud Account

Go to Google Cloud Console
Sign in with your Google account
Accept the terms of service

Step 2: Create a New Project

Click the project dropdown at the top of the page
Click "New Project"
Enter a project name (e.g., "Audio Program Generator")
Click "Create"

Step 3: Enable Text-to-Speech API

In the search bar, type "Text-to-Speech API"
Click on "Cloud Text-to-Speech API"
Click "Enable"
Wait for the API to be enabled (~30 seconds)

Step 4: Enable Billing

⚠️ Required even for free tier

Click the hamburger menu (☰) → "Billing"
Click "Link a billing account" or "Add billing account"
Enter your credit card information
Don't worry: You won't be charged unless you exceed the free tier (1M characters/month)
You can set up billing alerts to notify you before any charges

Setting up billing alerts (recommended):

Go to "Billing" → "Budgets & alerts"
Click "Create budget"
Set budget to $1 (or any amount)
Set alert thresholds to 50%, 90%, and 100%
You'll receive email alerts if you approach the limit

Step 5: Create API Key

Click the hamburger menu (☰) → "APIs & Services" → "Credentials"
Click "Create Credentials" → "API key"
Copy the API key (it will look like: AIzaSyC...)
Optional but recommended: Click "Restrict key"
- Under "API restrictions", select "Restrict key"
- Select "Cloud Text-to-Speech API"
- Click "Save"

Step 6: Use API Key in App

Open the Audio Program Generator in your browser
Select "Google Cloud TTS" from the TTS Engine dropdown
Paste your API key in the "Google Cloud API Key" field
Click "Save"
The key is stored locally in your browser (not sent anywhere else)

Done! You can now generate high-quality audio with Google Cloud TTS.
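
For reference, a call to the Cloud Text-to-Speech REST endpoint with an API key might look like the sketch below. It illustrates the public API rather than the app's own code: the function names are hypothetical, and the voice `en-US-Neural2-C` is only an example choice.

```javascript
const GOOGLE_TTS_ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize";

// Hypothetical helper: assemble the URL and fetch options for one phrase.
function buildGoogleTtsRequest(
  apiKey,
  text,
  { languageCode = "en-US", voiceName = "en-US-Neural2-C", speakingRate = 1.0, pitch = 0 } = {}
) {
  return {
    url: `${GOOGLE_TTS_ENDPOINT}?key=${encodeURIComponent(apiKey)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        input: { text },
        voice: { languageCode, name: voiceName },
        audioConfig: { audioEncoding: "MP3", speakingRate, pitch },
      }),
    },
  };
}

// The API returns the audio as a base64 string in `audioContent`; decode it to bytes.
function decodeAudioContent(base64) {
  const binary = atob(base64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Send the request and return the decoded MP3 bytes.
async function synthesizeWithGoogle(apiKey, text, options) {
  const { url, init } = buildGoogleTtsRequest(apiKey, text, options);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`Google TTS request failed: HTTP ${response.status}`);
  const data = await response.json();
  return decodeAudioContent(data.audioContent);
}
```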

Microsoft Edge TTS Setup (Alternative - Coming Soon)

Free alternative with no API key required

Microsoft Edge TTS is an unofficial API that provides free, high-quality neural voices without requiring an API key or billing setup.

Status: Implementation planned for future release

Advantages:

✅ Completely free
✅ No API key required
✅ No billing setup needed
✅ Excellent voice quality (Neural voices)
✅ Works from browser

Limitations:

⚠️ Unofficial API (not officially supported by Microsoft)
⚠️ Could potentially be rate-limited or discontinued
⚠️ Fewer voice options than Google Cloud TTS

When available, this will be a great option for users who want high-quality TTS without the API key setup process.

Troubleshooting

"Invalid API key" error

Make sure you copied the entire API key
For Google Cloud, check that the Cloud Text-to-Speech API is enabled in your project
Verify that any API key restrictions include the Cloud Text-to-Speech API

"Billing not enabled" error

Billing must be enabled even for free tier usage
Go to Billing section and link a payment method
Wait a few minutes after enabling billing

"Quota exceeded" error

You've exceeded the free tier (1M characters/month)
Check your usage in Google Cloud Console → APIs & Services → Dashboard
Either wait for the monthly reset or continue on the paid tier

API key security

Your API key is stored locally in your browser (localStorage)
It's never sent to any server except the API of the TTS provider it belongs to (Google or OpenAI)
You can clear it anytime by clicking "Clear" in the app
For production use, consider restricting the API key to specific domains
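
Storing a key in localStorage takes only a few calls. The sketch below is an assumption about how this might be done, not the app's actual code: the storage key name and the function names are hypothetical. The storage object is passed in so the same functions work with `window.localStorage` in the browser or a stand-in object in tests.

```javascript
// Hypothetical storage key; the app's real key name is not documented here.
const API_KEY_STORAGE_KEY = "apg.googleCloudApiKey";

// Save the key, trimming whitespace that often sneaks in when pasting.
function saveApiKey(storage, apiKey) {
  storage.setItem(API_KEY_STORAGE_KEY, apiKey.trim());
}

// Return the stored key, or null if none has been saved.
function loadApiKey(storage) {
  return storage.getItem(API_KEY_STORAGE_KEY);
}

// Remove the stored key, e.g. when the user clicks "Clear".
function clearApiKey(storage) {
  storage.removeItem(API_KEY_STORAGE_KEY);
}

// Browser usage (assumed): saveApiKey(window.localStorage, pastedKey);
```

Keep in mind that anything in localStorage is readable by any script running on the same origin, which is one reason to restrict the key to specific APIs and domains.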

Phrase File Format

Create a text file with phrases and durations:

Welcome to your audio program; 2
Take a deep breath; 3
*; 2
Relax your shoulders; 3
*; 5
You are calm and centered; 2

Format: phrase; duration_in_seconds

Use * for silence
One phrase per line

Installation

# Clone the repository
git clone https://github.com/jeffwright13/apg-web.git
cd apg-web

# Install dependencies
npm install

# Set up git hooks
npm run prepare

Usage

Local Development

# Start local server
npm run serve

# Open browser to http://localhost:8080

Running Tests

# Run all tests
npm test

# Watch mode
npm run test:watch

# Coverage report
npm run test:coverage

Code Quality

# Lint code
npm run lint

# Auto-fix linting issues
npm run lint:fix

# Format code
npm run format

# Check formatting
npm run format:check

Supported Audio Formats

Background Music Files

The app supports all common audio formats via the Web Audio API:

✅ Universal Support (All Browsers):

MP3 (.mp3) - Most common format
WAV (.wav) - Uncompressed, high quality

✅ Wide Support (Most Browsers):

OGG Vorbis (.ogg) - All except Safari
AAC/M4A (.m4a, .aac) - Most browsers
WebM (.webm) - Chrome, Firefox, Edge

⚠️ Limited Support:

FLAC (.flac) - Chrome, Edge only (lossless)
AIFF (.aiff, .aif) - Safari only

Recommendation: Use MP3 for maximum compatibility and small file sizes.

Output Format

Generated audio can be exported as WAV (uncompressed), for maximum quality and compatibility with audio editing software, or as MP3 for smaller files.
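
To get a feel for export sizes, uncompressed WAV size follows from the sample format. The sketch below assumes 44.1 kHz, 16-bit stereo PCM with a 44-byte header, and a constant-bitrate MP3; these parameters and function names are assumptions for illustration, not the app's documented settings.

```javascript
// Hypothetical helper: size of a PCM WAV file in bytes.
function wavSizeBytes(seconds, { sampleRate = 44100, channels = 2, bitsPerSample = 16 } = {}) {
  const WAV_HEADER_BYTES = 44; // canonical PCM WAV header
  return WAV_HEADER_BYTES + Math.round(seconds * sampleRate * channels * (bitsPerSample / 8));
}

// Hypothetical helper: approximate size of a constant-bitrate MP3 in bytes.
function mp3SizeBytes(seconds, kbps = 192) {
  return Math.round((seconds * kbps * 1000) / 8);
}

const oneMinuteWav = wavSizeBytes(60); // 10,584,044 bytes
const oneMinuteMp3 = mp3SizeBytes(60); // 1,440,000 bytes at 192 kbps
console.log((oneMinuteWav / oneMinuteMp3).toFixed(2)); // roughly 7x smaller as MP3
```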

Phrase File Format

Create a text file with phrases and pause durations:

First phrase; 2
Second phrase; 5
*; 3
Third phrase; 0

Each line: phrase; duration_in_seconds
Use * for silence
Duration can be decimal (e.g., 2.5)
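
A parser for this format can be quite small. The following is a minimal sketch, not the app's actual parser: `parsePhraseFile` and the shape of the returned objects are hypothetical, and skipping blank lines is an assumption.

```javascript
// Hypothetical parser for "phrase; duration_in_seconds" lines.
function parsePhraseFile(text) {
  const entries = [];
  text.split(/\r?\n/).forEach((rawLine, index) => {
    const line = rawLine.trim();
    if (line === "") return; // assumption: blank lines are ignored

    // Split on the last ";" so a phrase may itself contain semicolons.
    const separator = line.lastIndexOf(";");
    if (separator === -1) throw new Error(`Line ${index + 1}: missing ";"`);

    const phrase = line.slice(0, separator).trim();
    const durationText = line.slice(separator + 1).trim();
    const duration = Number(durationText);
    if (durationText === "" || !Number.isFinite(duration) || duration < 0) {
      throw new Error(`Line ${index + 1}: invalid duration "${durationText}"`);
    }

    const silence = phrase === "*"; // "*" marks a silent segment
    entries.push({ phrase: silence ? null : phrase, silence, duration });
  });
  return entries;
}

const program = parsePhraseFile("First phrase; 2\nSecond phrase; 5\n*; 3\nThird phrase; 0");
console.log(program.length); // 4
console.log(program[2]); // { phrase: null, silence: true, duration: 3 }
```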

Development

Pre-commit hooks automatically run:

ESLint (code quality)
Prettier (formatting)
Jest (tests for changed files)

License

MIT

Author

Jeff Wright [email protected]
