Production-Ready Voice Agent Architecture

This project implements a low-latency, scalable AI voice agent using a vendor-neutral architecture. It leverages WebRTC for real-time media streaming, Node.js for secure session management, and Python for the AI reasoning engine (the "Brain").

🏗️ Architecture Overview

The system is divided into three distinct layers to ensure security and performance:

Client Layer (Web): A browser-based interface that captures microphone audio and plays back agent responses via WebRTC.
Signaling Layer (Node.js): A backend service that mints short-lived tokens, keeping sensitive API keys hidden from the client.
Media & Intelligence Layer (Python): An asynchronous worker that orchestrates Speech-to-Text (STT), Large Language Model (LLM) reasoning, and Text-to-Speech (TTS).
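
The STT, LLM, TTS hand-off in the Media & Intelligence Layer can be pictured as three awaited stages per conversational turn. The following is a minimal sketch, not the code in agent/main.py: `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for real provider calls, and a production worker would likely stream audio rather than process whole buffers.

```python
import asyncio


# Hypothetical stand-ins for real STT, LLM, and TTS providers.
async def transcribe(audio: bytes) -> str:
    await asyncio.sleep(0)  # placeholder for a network call to an STT service
    return audio.decode("utf-8")


async def generate_reply(text: str) -> str:
    await asyncio.sleep(0)  # placeholder for an LLM request
    return f"You said: {text}"


async def synthesize(text: str) -> bytes:
    await asyncio.sleep(0)  # placeholder for a TTS request
    return text.encode("utf-8")


async def handle_turn(audio: bytes) -> bytes:
    """Run one conversational turn: STT, then LLM, then TTS."""
    transcript = await transcribe(audio)
    reply = await generate_reply(transcript)
    return await synthesize(reply)
```

Because each stage is awaited, the event loop stays free to serve other calls while one turn waits on a provider.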

🚀 Key Features

Low Latency: WebRTC transport tuned to target sub-second conversational response times.
Secure Token Minting: Server-side credential management to prevent API key exposure.
Resilient Connectivity: Automatic reconnection logic and exponential backoff for network churn.
Modular Brain: Easily swap between LLMs (Gemini, GPT-4, etc.) and voice providers.
Tool Integration: Built-in support for "Client Actions" and server-side tool calling.
Voice Activity Detection: Silero VAD for natural conversational interruptions.
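
As an illustration of the "Resilient Connectivity" item, reconnection with exponential backoff could look roughly like the sketch below. The function names, the 0.5 s base delay, the 10 s cap, and the use of full jitter are all assumptions made for the example, not values taken from this project.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 10.0) -> list[float]:
    """Return the un-jittered delay before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_backoff(
    connect: Callable[[], T],
    retries: int = 5,
    base: float = 0.5,
    cap: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(), retrying on ConnectionError with jittered exponential delays."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return connect()
        except ConnectionError:
            # Full jitter: wait a random amount up to the computed delay so
            # many clients do not all reconnect at the same instant.
            sleep(random.uniform(0, delay))
    return connect()  # final attempt; any error propagates to the caller
```

The `sleep` parameter exists only so the retry loop can be exercised without real waiting.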

🛠️ Project Structure

voice-agent-app/
├── server.js                # Node.js Signaling/Token Server
├── package.json             # Node dependencies
├── requirements.txt         # Python dependencies
├── .env.example             # Example environment variables
├── public/                  # Frontend assets
│   ├── index.html          # UI Layout
│   └── client.js           # WebRTC client logic
└── agent/                   # AI Logic (The Brain)
    ├── main.py             # Python Agent Worker
    ├── .env                # Agent environment variables
    └── requirements.txt    # Python dependencies

🚦 Getting Started

Prerequisites

Node.js v18 or higher
Python 3.10 or higher
LiveKit Cloud account (or self-hosted instance)
API Keys:
- Google Gemini (for LLM)
- OpenAI (for STT)
- LiveKit credentials

Installation & Setup

1. Clone and Install Dependencies

# Install Node.js dependencies
npm install

# Python dependencies are installed in step 4, inside a virtual environment

2. Configure Environment Variables

Create a .env file in the root directory (.env.example can be used as a starting point):

# LiveKit Configuration
LIVEKIT_URL=wss://your-livekit-instance.com
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret

# AI/LLM Configuration
GEMINI_API_KEY=your_gemini_api_key
OPENAI_API_KEY=your_openai_api_key

# Server Configuration
PORT=3000
NODE_ENV=development

3. Run the Signaling Server (Node.js)

node server.js

The server starts on http://localhost:3000 by default. Set PORT to use a different port.
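
One way to confirm the server is up is to request a token from the POST /token endpoint described under API Endpoints. The sketch below assumes the default port and the documented request and response shapes; `build_token_request` and `fetch_token` are hypothetical helper names, and only the Python standard library is used.

```python
import json
import urllib.request


def build_token_request(base_url: str, room: str, username: str) -> urllib.request.Request:
    """Build the POST /token request without sending it."""
    body = json.dumps({"room": room, "username": username}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(base_url: str, room: str, username: str, timeout: float = 5.0) -> str:
    """Send the request and return the "token" field of the JSON response."""
    request = build_token_request(base_url, room, username)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["token"]
```

With the server running, `fetch_token("http://localhost:3000", "demo-room", "alice")` should return a token string; a connection error suggests the server is not listening on that port.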

4. Run the AI Agent (Python)

In a separate terminal:

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the agent
python agent/main.py

5. Access the Application

Open your browser to http://localhost:3000 and click "Start Call" to begin interacting with the voice agent.

🛡️ Production Readiness Checklist

No secrets in browser (all API keys server-side)
Voice Activity Detection (VAD) for natural conversations
Async/concurrent request handling
Proper resource cleanup on disconnect
CORS security configuration
Environment variable validation
Error handling and logging
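
The "Async/concurrent request handling" and "Proper resource cleanup on disconnect" items can be combined in one pattern: run each call as its own task and release its resources in a `finally` block. This is a sketch under assumptions; `Session`, `run_session`, and `serve_all` are hypothetical names, not the agent's real classes.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


class Session:
    """Hypothetical per-call session that owns resources needing release."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    async def close(self) -> None:
        # A real session would close audio tracks, sockets, and so on here.
        self.closed = True


async def run_session(session: Session, work: Callable[[Session], Awaitable[None]]) -> None:
    """Run one call and always release its resources, even on error."""
    try:
        await work(session)
    finally:
        await session.close()


async def serve_all(sessions: Iterable[Session], work: Callable[[Session], Awaitable[None]]) -> List[object]:
    """Serve all sessions concurrently; collect errors instead of raising."""
    # return_exceptions=True keeps one failed call from aborting the others.
    return await asyncio.gather(
        *(run_session(s, work) for s in sessions), return_exceptions=True
    )
```

Collecting exceptions rather than raising means a dropped call is reported in the results while the remaining calls continue.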

🔧 Configuration

Node.js Server Configuration

| Variable | Description | Default |
| --- | --- | --- |
| `PORT` | Server port | 3000 |
| `NODE_ENV` | Environment (development/production) | development |
| `LIVEKIT_URL` | LiveKit server WebSocket URL | None (required) |
| `LIVEKIT_API_KEY` | LiveKit API key | None (required) |
| `LIVEKIT_API_SECRET` | LiveKit API secret | None (required) |

Python Agent Configuration

The agent loads configuration from environment variables and can be customized by modifying agent/main.py.
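
Loading and validating that configuration at startup might look like the sketch below. `AgentConfig` and `REQUIRED` are hypothetical names, and treating all five variables from the .env example as required by the agent is an assumption; agent/main.py may read a different set.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumed to be required by the agent; adjust to match agent/main.py.
REQUIRED = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GEMINI_API_KEY",
    "OPENAI_API_KEY",
)


@dataclass(frozen=True)
class AgentConfig:
    livekit_url: str
    livekit_api_key: str
    livekit_api_secret: str
    gemini_api_key: str
    openai_api_key: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AgentConfig":
        """Build the config, failing fast with a list of every missing variable."""
        missing = [name for name in REQUIRED if not env.get(name)]
        if missing:
            raise RuntimeError("Missing environment variables: " + ", ".join(missing))
        if not env["LIVEKIT_URL"].startswith(("ws://", "wss://")):
            raise RuntimeError("LIVEKIT_URL must start with ws:// or wss://")
        # Field names are the lowercased variable names.
        return cls(**{name.lower(): env[name] for name in REQUIRED})
```

Calling `AgentConfig.from_env()` once at startup surfaces configuration mistakes before any call is accepted.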

📚 API Endpoints

POST /token

Generates a short-lived token that the client uses to open its WebRTC connection.

Request:

{
  "room": "room-name",
  "username": "user-name"
}

Response:

{
  "token": "jwt-token-string"
}
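
To show what "short-lived" means for a JWT like the one returned above, the sketch below mints and verifies a generic HS256 token with an expiry claim using only the standard library. It is an illustration, not this server's implementation: the claim names (`sub`, `room`, `iat`, `exp`) and the 10-minute default lifetime are assumptions, and a real deployment would normally rely on the LiveKit server SDK, which defines its own claim layout.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: str, signing_input: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def mint_token(secret: str, room: str, username: str,
               ttl_seconds: int = 600, now: Optional[float] = None) -> str:
    """Return a signed JWT that expires ttl_seconds after it is issued."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    # Illustrative claims only; not LiveKit's actual grant format.
    payload = {"sub": username, "room": room, "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def verify_token(secret: str, token: str, now: Optional[float] = None) -> dict:
    """Check the signature and expiry, then return the payload claims."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(_sign(secret, signing_input), _b64url_decode(signature)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

Because the signing secret never leaves the server, the browser only ever holds a token that stops working after its expiry time.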

🐛 Troubleshooting

Connection Issues

Verify LiveKit credentials are correct
Check that LIVEKIT_URL is reachable from the machine running the server and the agent
Ensure firewalls allow WebRTC traffic
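
A quick reachability check for LIVEKIT_URL could be sketched as follows. It only tests that a TCP connection to the signaling host and port succeeds; it does not exercise the media ports WebRTC uses, so a pass here does not rule out firewall problems. `livekit_endpoint` and `can_connect` are hypothetical helper names.

```python
import socket
from urllib.parse import urlparse


def livekit_endpoint(url: str) -> tuple[str, int]:
    """Extract (host, port) from a ws:// or wss:// URL, applying default ports."""
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss") or not parsed.hostname:
        raise ValueError(f"Expected a ws:// or wss:// URL, got {url!r}")
    default_port = 443 if parsed.scheme == "wss" else 80
    return parsed.hostname, parsed.port or default_port


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = livekit_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_connect(os.environ["LIVEKIT_URL"])` (after `import os`) from the same machine as the agent helps separate credential problems from plain network problems.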

Audio Issues

Check microphone permissions in browser
Verify OpenAI API key for STT
Check browser console for errors

Python Agent Not Starting

Verify Python version is 3.10+
Ensure all dependencies are installed: pip install -r requirements.txt
Check environment variables are set correctly
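
The first two checks above can be automated with a small preflight script. This is a sketch under assumptions: `preflight` is a hypothetical helper, and the module names passed to it must be taken from requirements.txt, since they are not listed here. It expects top-level module names only.

```python
import importlib.util
import sys
from typing import Iterable, List, Optional, Tuple


def preflight(
    modules: Iterable[str],
    version: Optional[Tuple[int, int]] = None,
    minimum: Tuple[int, int] = (3, 10),
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    found = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []
    if found < minimum:
        problems.append(
            f"Python {minimum[0]}.{minimum[1]}+ required, found {found[0]}.{found[1]}"
        )
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Module not importable: {name}")
    return problems
```

Printing the returned list before starting the agent gives a single place to see every failed check at once.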

📝 License

Distributed under the MIT License. See LICENSE for more information.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📧 Support

For issues or questions, please open a GitHub issue or contact the maintainers.

Last Updated: March 2026
