Production-Ready Voice Agent Architecture

This project implements a low-latency, scalable AI voice agent using a vendor-neutral architecture. It leverages WebRTC for real-time media streaming, Node.js for secure session management, and Python for the AI reasoning engine (the "Brain").

🏗️ Architecture Overview

The system is divided into three distinct layers to ensure security and performance:

  • Client Layer (Web): A browser-based interface that captures microphone audio and plays back agent responses via WebRTC.
  • Signaling Layer (Node.js): A backend service that mints short-lived tokens, keeping sensitive API keys hidden from the client.
  • Media & Intelligence Layer (Python): An asynchronous worker that orchestrates Speech-to-Text (STT), Large Language Model (LLM) reasoning, and Text-to-Speech (TTS).
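
As a rough illustration of the Media & Intelligence Layer, the sketch below chains the three stages of one conversational turn with `asyncio`. The function names (`transcribe`, `generate_reply`, `synthesize`, `handle_turn`) are hypothetical placeholders, not taken from agent/main.py, and each stage is a stub standing in for a real STT, LLM, or TTS provider call.

```python
import asyncio


async def transcribe(audio: bytes) -> str:
    # Placeholder STT stage: a real worker would call a speech-to-text provider.
    await asyncio.sleep(0)
    return audio.decode("utf-8")


async def generate_reply(text: str) -> str:
    # Placeholder LLM stage: a real worker would call the configured model.
    await asyncio.sleep(0)
    return f"You said: {text}"


async def synthesize(text: str) -> bytes:
    # Placeholder TTS stage: a real worker would return encoded audio.
    await asyncio.sleep(0)
    return text.encode("utf-8")


async def handle_turn(audio: bytes) -> bytes:
    # One conversational turn: STT, then LLM reasoning, then TTS.
    transcript = await transcribe(audio)
    reply = await generate_reply(transcript)
    return await synthesize(reply)
```

Because every stage is a coroutine, a single worker can serve several sessions concurrently while any one of them waits on a provider response.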

🚀 Key Features

  • Low Latency: Optimized WebRTC transport for sub-second conversational response times.
  • Secure Token Minting: Server-side credential management to prevent API key exposure.
  • Resilient Connectivity: Automatic reconnection logic and exponential backoff for network churn.
  • Modular Brain: Easily swap between LLMs (Gemini, GPT-4, etc.) and voice providers.
  • Tool Integration: Built-in support for "Client Actions" and server-side tool calling.
  • Voice Activity Detection: Silero VAD for natural conversational interruptions.
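
The reconnection behavior listed above can be pictured with a minimal sketch of exponential backoff with full jitter. This is not the project's actual reconnection code; `backoff_delays`, `connect_with_retry`, and their default values are hypothetical names and numbers chosen for illustration.

```python
import random
import time


def backoff_delays(max_attempts=5, base=0.5, cap=10.0, rng=random.random):
    """Yield one delay in seconds per attempt, doubling up to `cap`.

    Each delay is scaled by a random factor in [0, 1) ("full jitter") so
    that many clients dropped at once do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def connect_with_retry(connect, sleep=time.sleep, **backoff_options):
    """Call `connect()` until it succeeds or the attempts run out."""
    last_error = None
    for delay in backoff_delays(**backoff_options):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)  # wait before the next attempt
    raise ConnectionError("all reconnection attempts failed") from last_error
```

Capping the delay keeps a long outage from pushing retries minutes apart, which matters for a live call where the user is waiting.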

🛠️ Project Structure

voice-agent-app/
├── server.js                # Node.js Signaling/Token Server
├── package.json             # Node dependencies
├── requirements.txt         # Python dependencies
├── .env.example             # Example environment variables
├── public/                  # Frontend assets
│   ├── index.html          # UI Layout
│   └── client.js           # WebRTC client logic
└── agent/                   # AI Logic (The Brain)
    ├── main.py             # Python Agent Worker
    ├── .env                # Agent environment variables
    └── requirements.txt    # Python dependencies

🚦 Getting Started

Prerequisites

  • Node.js v18 or higher
  • Python 3.10 or higher
  • LiveKit Cloud account (or self-hosted instance)
  • API Keys:
    • Google Gemini (for LLM)
    • OpenAI (for STT)
    • LiveKit credentials

Installation & Setup

1. Clone and Install Dependencies

# Install Node.js dependencies
npm install

# Install Python dependencies (optional here; step 4 installs them inside a virtual environment)
pip install -r requirements.txt

2. Configure Environment Variables

Create a .env file in the root directory, using .env.example as a starting point:

# LiveKit Configuration
LIVEKIT_URL=wss://your-livekit-instance.com
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret

# AI/LLM Configuration
GEMINI_API_KEY=your_gemini_api_key
OPENAI_API_KEY=your_openai_api_key

# Server Configuration
PORT=3000
NODE_ENV=development

3. Run the Signaling Server (Node.js)

node server.js

The server starts on http://localhost:3000, or on whichever port is set in PORT.

4. Run the AI Agent (Python)

In a separate terminal:

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the agent
python agent/main.py

5. Access the Application

Open your browser to http://localhost:3000 and click "Start Call" to begin interacting with the voice agent.

🛡️ Production Readiness Checklist

  • No secrets in browser (all API keys server-side)
  • Voice Activity Detection (VAD) for natural conversations
  • Async/concurrent request handling
  • Proper resource cleanup on disconnect
  • CORS security configuration
  • Environment variable validation
  • Error handling and logging

🔧 Configuration

Node.js Server Configuration

| Variable | Description | Default |
|---|---|---|
| PORT | Server port | 3000 |
| NODE_ENV | Environment (development/production) | development |
| LIVEKIT_URL | LiveKit server WebSocket URL | Required |
| LIVEKIT_API_KEY | LiveKit API key | Required |
| LIVEKIT_API_SECRET | LiveKit API secret | Required |

Python Agent Configuration

The agent loads configuration from environment variables and can be customized by modifying agent/main.py.
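
One way to make missing configuration fail fast is to validate the environment at startup. The sketch below is an assumption about how that could look, not the code in agent/main.py: `REQUIRED_VARS` and `load_config` are hypothetical names, and the list of required variables is taken from the example .env file above.

```python
import os

# Assumed to be required by the agent; adjust to match agent/main.py.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GEMINI_API_KEY",
    "OPENAI_API_KEY",
)


def load_config(env=None):
    """Return the required settings, or raise if any are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting every missing variable in one error avoids the fix-one-rerun-fix-another loop during first-time setup.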

📚 API Endpoints

POST /token

Generates a short-lived JWT that the client uses to establish its WebRTC connection.

Request:

{
  "room": "room-name",
  "username": "user-name"
}

Response:

{
  "token": "jwt-token-string"
}
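
For a quick smoke test of this endpoint from a script, the following sketch builds the request shown above and decodes the returned token's payload for inspection. It assumes the server listens on http://localhost:3000 and accepts a JSON body; the helper names are hypothetical, and `decode_jwt_payload` only reads the claims without verifying the signature.

```python
import base64
import json
import urllib.request


def build_token_request(room, username, base_url="http://localhost:3000"):
    """Build (but do not send) the POST /token request."""
    body = json.dumps({"room": room, "username": username}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(room, username, base_url="http://localhost:3000"):
    """Send the request and return the "token" field of the JSON response."""
    request = build_token_request(room, username, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["token"]


def decode_jwt_payload(token):
    """Decode the middle segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

With the signaling server running, `decode_jwt_payload(fetch_token("room-name", "user-name"))` shows the token's claims; if the standard `exp` claim is present, it indicates when the token expires.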

🐛 Troubleshooting

Connection Issues

  • Verify LiveKit credentials are correct
  • Check that the LIVEKIT_URL is reachable
  • Ensure firewalls allow WebRTC traffic

Audio Issues

  • Check microphone permissions in browser
  • Verify OpenAI API key for STT
  • Check browser console for errors

Python Agent Not Starting

  • Verify Python version is 3.10+
  • Ensure all dependencies are installed: pip install -r requirements.txt
  • Check environment variables are set correctly

📝 License

Distributed under the MIT License. See LICENSE for more information.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📧 Support

For issues or questions, please open a GitHub issue or contact the maintainers.


Last Updated: March 2026
