This project implements a low-latency, scalable AI voice agent using a vendor-neutral architecture. It leverages WebRTC for real-time media streaming, Node.js for secure session management, and Python for the AI reasoning engine (the "Brain").
The system is divided into three distinct layers to ensure security and performance:
- Client Layer (Web): A browser-based interface that captures microphone audio and plays back agent responses via WebRTC.
- Signaling Layer (Node.js): A backend service that mints short-lived tokens, keeping sensitive API keys hidden from the client.
- Media & Intelligence Layer (Python): An asynchronous worker that orchestrates Speech-to-Text (STT), Large Language Model (LLM) reasoning, and Text-to-Speech (TTS).
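The STT, LLM, and TTS hand-off in the Media & Intelligence Layer can be pictured as three awaited stages per conversational turn. The following is a minimal sketch under stated assumptions, not the code in agent/main.py: `handle_turn` and the `fake_*` stages are hypothetical names, and the stand-in stages replace real provider calls so the example runs without credentials.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stage signatures; real providers would wrap network calls.
SpeechToText = Callable[[bytes], Awaitable[str]]
Reasoner = Callable[[str], Awaitable[str]]
TextToSpeech = Callable[[str], Awaitable[bytes]]


async def handle_turn(
    audio: bytes, stt: SpeechToText, llm: Reasoner, tts: TextToSpeech
) -> bytes:
    """Run one conversational turn: caller audio in, agent audio out."""
    transcript = await stt(audio)   # Speech-to-Text
    reply = await llm(transcript)   # LLM reasoning
    return await tts(reply)         # Text-to-Speech


# Stand-in stages (assumptions) so the sketch runs offline.
async def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


async def fake_llm(text: str) -> str:
    return f"You said: {text}"


async def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def demo_turn() -> bytes:
    """Drive a single turn through the three stand-in stages."""
    return asyncio.run(handle_turn(b"hello", fake_stt, fake_llm, fake_tts))
```

Passing the stages in as callables mirrors the "swap providers" idea: replacing one stage does not touch the other two.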
- Low Latency: Optimized WebRTC transport for sub-second conversational response times.
- Secure Token Minting: Server-side credential management to prevent API key exposure.
- Resilient Connectivity: Automatic reconnection logic and exponential backoff for network churn.
- Modular Brain: Easily swap between LLMs (Gemini, GPT-4, etc.) and voice providers.
- Tool Integration: Built-in support for "Client Actions" and server-side tool calling.
- Voice Activity Detection: Silero VAD for natural conversational interruptions.
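The reconnection behavior listed above (retry with exponential backoff) can be sketched as follows. This is an illustrative outline, not the project's actual client logic: `backoff_delays` and `connect_with_retry` are hypothetical helpers, and the base delay, growth factor, cap, and jitter values are assumptions.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(
    base: float = 0.5, factor: float = 2.0, cap: float = 30.0, attempts: int = 6
) -> Iterator[float]:
    """Yield delays base, base*factor, base*factor**2, ... each capped at `cap`."""
    for attempt in range(attempts):
        yield min(cap, base * factor ** attempt)


def connect_with_retry(
    connect: Callable[[], T],
    delays: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
    jitter: float = 0.1,
) -> T:
    """Call `connect`, sleeping between failures; one final try after the last delay."""
    for delay in delays:
        try:
            return connect()
        except ConnectionError:
            # Add up to `jitter` * delay of random slack so that many clients
            # do not all retry at the same instant.
            sleep(delay + random.uniform(0.0, jitter * delay))
    # Final attempt: any ConnectionError now propagates to the caller.
    return connect()
```

Capping the delay keeps a long outage from pushing the wait time up indefinitely, and the injected `sleep` makes the helper easy to exercise without real waiting.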
voice-agent-app/
├── server.js # Node.js Signaling/Token Server
├── package.json # Node dependencies
├── requirements.txt # Python dependencies
├── .env.example # Example environment variables
├── public/ # Frontend assets
│   ├── index.html # UI Layout
│   └── client.js # WebRTC client logic
└── agent/ # AI Logic (The Brain)
    ├── main.py # Python Agent Worker
    ├── .env # Agent environment variables
    └── requirements.txt # Python dependencies
- Node.js v18 or higher
- Python 3.10 or higher
- LiveKit Cloud account (or self-hosted instance)
- API Keys:
  - Google Gemini (for LLM)
  - OpenAI (for STT)
  - LiveKit credentials
# Install Node.js dependencies
npm install
# Install Python dependencies
pip install -r requirements.txt
Create a .env file in the root directory:
# LiveKit Configuration
LIVEKIT_URL=wss://your-livekit-instance.com
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
# AI/LLM Configuration
GEMINI_API_KEY=your_gemini_api_key
OPENAI_API_KEY=your_openai_api_key
# Server Configuration
PORT=3000
NODE_ENV=development
node server.js
The server will start on http://localhost:3000
In a separate terminal:
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: .\venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the agent
python agent/main.py
Open your browser to http://localhost:3000 and click "Start Call" to begin interacting with the voice agent.
- No secrets in browser (all API keys server-side)
- Voice Activity Detection (VAD) for natural conversations
- Async/concurrent request handling
- Proper resource cleanup on disconnect
- CORS security configuration
- Environment variable validation
- Error handling and logging
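Environment variable validation of the kind listed above might look like the sketch below. It uses the variable names and defaults given in this README, but `load_config` is a hypothetical helper and the specific checks (non-empty values, a ws:// or wss:// scheme, an integer port) are assumptions rather than the project's confirmed behavior.

```python
import os
from typing import Mapping

REQUIRED = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
DEFAULTS = {"PORT": "3000", "NODE_ENV": "development"}


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return validated settings, or raise RuntimeError naming what is wrong."""
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )

    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name, default)

    # LIVEKIT_URL is described as a WebSocket URL, so check the scheme (assumed rule).
    if not config["LIVEKIT_URL"].startswith(("ws://", "wss://")):
        raise RuntimeError("LIVEKIT_URL must start with ws:// or wss://")

    config["PORT"] = int(config["PORT"])  # raises ValueError on a non-numeric port
    return config
```

Failing at startup with the full list of missing names tends to be easier to debug than a connection error that surfaces on the first call.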
| Variable | Description | Default |
|---|---|---|
| PORT | Server port | 3000 |
| NODE_ENV | Environment (development/production) | development |
| LIVEKIT_URL | LiveKit server WebSocket URL | Required |
| LIVEKIT_API_KEY | LiveKit API key | Required |
| LIVEKIT_API_SECRET | LiveKit API secret | Required |
The agent loads configuration from environment variables and can be customized by modifying agent/main.py.
Generates a short-lived token for WebRTC connection.
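To illustrate what "short-lived" means here, the sketch below mints a generic HS256-signed JWT with an expiry claim using only the Python standard library. It is an assumption-laden illustration, not the project's implementation: the actual minting happens in server.js, the helper names (`mint_token`, `decode_claims`) are hypothetical, and the claim layout shown is a generic one rather than LiveKit's grant format.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, api_secret: str) -> str:
    digest = hmac.new(api_secret.encode(), signing_input.encode("ascii"), hashlib.sha256)
    return _b64url(digest.digest())


def mint_token(api_key, api_secret, room, username, ttl_seconds=600, now=None) -> str:
    """Build header.payload.signature with an `exp` claim ttl_seconds ahead."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    # Illustrative claim names (assumption), not LiveKit's real grant structure.
    claims = {"iss": api_key, "sub": username, "room": room,
              "iat": issued, "exp": issued + ttl_seconds}
    parts = [_b64url(json.dumps(obj, separators=(",", ":")).encode())
             for obj in (header, claims)]
    signing_input = ".".join(parts)
    return signing_input + "." + _sign(signing_input, api_secret)


def decode_claims(token: str, api_secret: str) -> dict:
    """Verify the signature and return the claims; expiry is left to the caller."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _sign(signing_input, api_secret)):
        raise ValueError("bad signature")
    payload = signing_input.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Because only the server holds the signing secret, the browser receives a token it can present but cannot forge or extend.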
Request:
{
"room": "room-name",
"username": "user-name"
}
Response:
{
"token": "jwt-token-string"
}
- Verify LiveKit credentials are correct
- Check the LIVEKIT_URL is accessible
- Ensure firewalls allow WebRTC traffic
- Check microphone permissions in browser
- Verify OpenAI API key for STT
- Check browser console for errors
- Verify Python version is 3.10+
- Ensure all dependencies are installed: pip install -r requirements.txt
- Check environment variables are set correctly
Distributed under the MIT License. See LICENSE for more information.
Contributions are welcome! Please feel free to submit a Pull Request.
For issues or questions, please open a GitHub issue or contact the maintainers.
Last Updated: March 2026