LiveKit Voice Agent with MCP Integration

A natural-sounding voice assistant built with LiveKit Agents that seamlessly integrates with any Model Context Protocol (MCP) server. Features LLM-generated conversational responses, intelligent progress announcements, and support for long-running operations.

Key Features

Universal MCP Support: Works with any MCP server (SSE, HTTP, stdio transports)
Natural Conversation: LLM-generated tool announcements that sound human
Smart Progress Updates: Queued, non-repetitive progress announcements
Long Operation Support: Handles operations up to 5 minutes with streaming progress
Automatic Result Unwrapping: Clean JSON data extracted from MCP responses
Voice-First Design: Optimized for spoken interaction

How It Works

Natural Language Tool Announcements

Instead of robotic phrases like "Let me fetch that data", the agent uses GPT-4o-mini to generate contextual, conversational responses:

User: "Which store had the best sales last year?"
Agent: "I'll look up last year's sales for you"    ← Natural, contextual
Agent: "Just pulling up the rankings"              ← Different each time
Agent: "Let me find which store performed best"    ← Relevant to question

Intelligent Progress Management

Long-running operations provide streaming progress updates that:

Are spoken naturally ("I'm analyzing the data now")
Never repeat the same phrase twice
Stop immediately when the operation completes
Don't speak stale updates after the answer is ready

Architecture

User Voice Input
    ↓
Deepgram STT
    ↓
GPT-4o LLM (decides to use tools)
    ↓
Tool Announcement (GPT-4o-mini generates natural phrase)
    ↓
MCP Server (via wrapper)
    ↓ (progress updates)
Progress Queue → Natural Rephrasing → Speech
    ↓ (on completion, clear queue)
Clean Result → GPT-4o → OpenAI TTS
    ↓
User Hears Answer

Installation

Prerequisites

Python 3.10+
API keys for OpenAI and Deepgram
An MCP server endpoint

Setup

Clone this repository:

git clone <repository-url>
cd LiveKit_mcp_client

Install dependencies:
```
pip install -r requirements.txt
```

Create a .env file:

# API Keys
OPENAI_API_KEY=your_openai_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key

# LiveKit Configuration
LIVEKIT_URL=wss://your-livekit-server.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret

# MCP Server
MCP_SERVER_URL=https://your-mcp-server.com/mcp

Usage

Running the Agent

Development mode (with file watching):

python agent.py dev

Production mode:

python agent.py start

Configuration

The agent supports three MCP transport types:

HTTP (Streamable):

MCPServerConfig(
    transport="streamable_http",
    url="https://mcp.example.com/mcp",
    client_session_timeout=300.0  # 5 minutes for slow operations
)

SSE (Server-Sent Events):

MCPServerConfig(
    transport="sse",
    url="https://mcp.example.com/sse",
    sse_read_timeout=300.0
)

Stdio (Local Process):

MCPServerConfig(
    transport="stdio",
    command="uvx",
    args=["mcp-server-sqlite", "--db-path", "/path/to/db.sqlite"]
)

Project Structure

LiveKit_mcp_client/
├── agent.py                    # Main agent with natural response generation
├── mcp_client/
│   ├── __init__.py            # Public API exports
│   ├── server.py              # MCP server configuration and factory
│   └── wrapper.py             # Result unwrapping and progress handling
├── requirements.txt
├── .env.example
└── README.md

Technical Details

MCP Server Wrapper

Wraps LiveKit's MCPServer to:

Unwrap Results: Extracts clean JSON from {"type":"text","text":"data"} format
Progress Callbacks: Intercepts MCP progress updates and queues them
Completion Signals: Notifies when tools finish to clear queues

Natural Response Generation

Uses GPT-4o-mini to generate conversational phrases:

Tool Announcements: Based on user query and tool name
Progress Updates: Rephrases technical messages naturally
Deduplication: Tracks previously used phrases to avoid repetition
Cost: ~$0.000021 per announcement (negligible)
Latency: ~200ms (barely noticeable)

Progress Queue System

Asynchronous queue that:

Stores progress messages per tool
Speaks them one at a time with natural pacing
Stops immediately when tool completes
Clears unspoken messages to prevent stale announcements

Configuration Options

Agent Settings:

FunctionAgent(
    llm=openai.LLM(model="gpt-4o"),         # Main conversation LLM
    max_tool_steps=10,                       # Max tool calls per turn
    client_session_timeout=300.0             # 5 min timeout for tools
)

Natural Response Settings:

generate_natural_response(
    model="gpt-4o-mini",                     # Fast, cheap rephrasing
    temperature=0.7,                         # Natural variety
    max_tokens=30                            # Brief (5-10 words)
)

Progress Settings:

DEDUP_WINDOW_SECONDS = 3.0                  # Don't repeat within 3s

Customization

Adjusting Tool Announcements

Edit the prompt in generate_natural_response() to change the tone:

prompt = f"""You are a [friendly/professional/energetic] voice assistant.
The user just asked: "{user_query}"
You're about to call a tool named "{tool_name}".

Generate a brief, [casual/formal] phrase..."""

Changing Progress Pacing

Adjust the delay between progress messages:

await asyncio.sleep(0.1)  # Default: 100ms between messages

Using Different LLMs

The agent supports any OpenAI-compatible LLM:

llm=openai.LLM(model="gpt-4o-mini")  # Faster, cheaper
# or
llm=openai.LLM(model="gpt-4")        # More capable

Troubleshooting

"Tool execution timed out"

Increase client_session_timeout:

MCPServerConfig(client_session_timeout=600.0)  # 10 minutes

"Maximum tool steps reached"

Increase max_tool_steps:

AgentSession(max_tool_steps=20)

Repetitive announcements

The agent now tracks phrases automatically, but you can adjust deduplication:

DEDUP_WINDOW_SECONDS = 5.0  # Longer window

Progress updates continue after answer

This should not happen with the queue system. Check logs for:

Tool X completed, setting stop flag
Cleared N unspoken progress messages

Performance

Tool Announcement Latency: ~200ms (GPT-4o-mini generation)
Progress Update Latency: ~200ms per message
Cost per Announcement: ~$0.000021
Cost per 1000 conversations: ~$0.05 (assuming 2-3 announcements each)

Dependencies

livekit-agents - Voice agent framework
livekit-plugins-openai - OpenAI LLM and TTS
livekit-plugins-deepgram - Speech-to-text
livekit-plugins-silero - Voice activity detection
openai - For natural response generation
mcp - Model Context Protocol client
python-dotenv - Environment variable management

Acknowledgements

LiveKit - Real-time voice infrastructure
OpenAI - GPT-4o and GPT-4o-mini
Deepgram - Speech-to-text
Silero VAD - Voice activity detection
Model Context Protocol - Tool integration standard

License

MIT License - see LICENSE file for details

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
mcp_client		mcp_client
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
agent.py		agent.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

LiveKit Voice Agent with MCP Integration

Key Features

How It Works

Natural Language Tool Announcements

Intelligent Progress Management

Architecture

Installation

Prerequisites

Setup

Usage

Running the Agent

Configuration

Project Structure

Technical Details

MCP Server Wrapper

Natural Response Generation

Progress Queue System

Configuration Options

Customization

Adjusting Tool Announcements

Changing Progress Pacing

Using Different LLMs

Troubleshooting

"Tool execution timed out"

"Maximum tool steps reached"

Repetitive announcements

Progress updates continue after answer

Performance

Dependencies

Acknowledgements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages