A natural-sounding voice assistant built with LiveKit Agents that integrates with any Model Context Protocol (MCP) server. It features LLM-generated conversational responses, intelligent progress announcements, and support for long-running operations.
- Universal MCP Support: Works with any MCP server (SSE, HTTP, stdio transports)
- Natural Conversation: LLM-generated tool announcements that sound human
- Smart Progress Updates: Queued, non-repetitive progress announcements
- Long Operation Support: Handles operations up to 5 minutes with streaming progress
- Automatic Result Unwrapping: Clean JSON data extracted from MCP responses
- Voice-First Design: Optimized for spoken interaction
Instead of robotic phrases like "Let me fetch that data", the agent uses GPT-4o-mini to generate contextual, conversational responses:
User: "Which store had the best sales last year?"
Agent: "I'll look up last year's sales for you" ← Natural, contextual
Agent: "Just pulling up the rankings" ← Different each time
Agent: "Let me find which store performed best" ← Relevant to question
Long-running operations provide streaming progress updates that:
- Are spoken naturally ("I'm analyzing the data now")
- Avoid repeating phrases that were already spoken
- Stop immediately when the operation completes
- Don't speak stale updates after the answer is ready
User Voice Input
↓
Deepgram STT
↓
GPT-4o LLM (decides to use tools)
↓
Tool Announcement (GPT-4o-mini generates natural phrase)
↓
MCP Server (via wrapper)
↓ (progress updates)
Progress Queue → Natural Rephrasing → Speech
↓ (on completion, clear queue)
Clean Result → GPT-4o → OpenAI TTS
↓
User Hears Answer
- Python 3.10+
- API keys for OpenAI and Deepgram
- A LiveKit server URL, API key, and API secret
- An MCP server endpoint
- Clone this repository:
git clone <repository-url>
cd LiveKit_mcp_client
- Install dependencies:
pip install -r requirements.txt
- Create a .env file:
# API Keys
OPENAI_API_KEY=your_openai_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key

# LiveKit Configuration
LIVEKIT_URL=wss://your-livekit-server.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret

# MCP Server
MCP_SERVER_URL=https://your-mcp-server.com/mcp
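Before starting the agent it can help to fail fast when a variable is missing. A minimal sketch, assuming the variables are already in the process environment (for example after python-dotenv's load_dotenv()); REQUIRED_VARS and missing_env_vars are hypothetical names, not part of this repository:

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "MCP_SERVER_URL",
)


def missing_env_vars(env=None):
    """Hypothetical helper: return required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call missing_env_vars() at startup and exit with a clear message if the returned list is non-empty, rather than failing later on the first API call.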
Development mode (with file watching):
python agent.py dev

Production mode:
python agent.py start

The agent supports three MCP transport types:

HTTP (Streamable):
MCPServerConfig(
    transport="streamable_http",
    url="https://mcp.example.com/mcp",
    client_session_timeout=300.0,  # 5 minutes for slow operations
)

SSE (Server-Sent Events):

MCPServerConfig(
    transport="sse",
    url="https://mcp.example.com/sse",
    sse_read_timeout=300.0,
)

Stdio (Local Process):

MCPServerConfig(
    transport="stdio",
    command="uvx",
    args=["mcp-server-sqlite", "--db-path", "/path/to/db.sqlite"],
)

LiveKit_mcp_client/
├── agent.py # Main agent with natural response generation
├── mcp_client/
│ ├── __init__.py # Public API exports
│ ├── server.py # MCP server configuration and factory
│ └── wrapper.py # Result unwrapping and progress handling
├── requirements.txt
├── .env.example
└── README.md
The wrapper (mcp_client/wrapper.py) wraps LiveKit's MCPServer to:
- Unwrap Results: Extracts clean JSON from the {"type":"text","text":"data"} format
- Progress Callbacks: Intercepts MCP progress updates and queues them
- Completion Signals: Notifies when tools finish so their progress queues can be cleared
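The unwrapping step can be sketched as follows. This is an illustration, not the wrapper's actual code: unwrap_mcp_content is a hypothetical name, and it takes plain dicts shaped like the {"type":"text","text":"data"} items above. With the mcp SDK you would read the .type and .text attributes of the result's content items instead.

```python
import json
from typing import Any


def _maybe_json(text: str) -> Any:
    """Parse text as JSON, falling back to the raw string."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def unwrap_mcp_content(content: list[dict[str, Any]]) -> Any:
    """Hypothetical helper: pull the payload out of MCP text content items."""
    # Non-text items (images, embedded resources) are skipped in this sketch.
    texts = [item["text"] for item in content if item.get("type") == "text"]
    if len(texts) == 1:
        return _maybe_json(texts[0])  # common case: a single JSON document
    return [_maybe_json(text) for text in texts]  # zero or several text items
```

Falling back to the raw string keeps plain-text tool output usable instead of raising.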
Natural response generation (generate_natural_response() in agent.py) uses GPT-4o-mini to generate conversational phrases:
- Tool Announcements: Based on user query and tool name
- Progress Updates: Rephrases technical messages naturally
- Deduplication: Tracks previously used phrases to avoid repetition
- Cost: ~$0.000021 per announcement (negligible)
- Latency: ~200ms (barely noticeable)
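The deduplication idea can be approximated with a time-window check. A minimal sketch, not the repository's code: PhraseTracker is a hypothetical name, the normalization (lowercasing, trimming edge punctuation) is an assumption, and the 3-second window mirrors DEDUP_WINDOW_SECONDS from the Progress Settings section.

```python
import time
from collections.abc import Callable

# Mirrors DEDUP_WINDOW_SECONDS from the Progress Settings section.
DEDUP_WINDOW_SECONDS = 3.0


class PhraseTracker:
    """Hypothetical helper: remembers spoken phrases and rejects recent repeats."""

    def __init__(
        self,
        window: float = DEDUP_WINDOW_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window = window
        self.clock = clock
        self._last_spoken: dict[str, float] = {}  # normalized phrase -> time spoken

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Assumed normalization: ignore case, edge punctuation, and extra spaces.
        return " ".join(phrase.lower().strip(" .!?").split())

    def should_speak(self, phrase: str) -> bool:
        """Return True (and record the phrase) unless it was spoken within the window."""
        key = self._normalize(phrase)
        now = self.clock()
        last = self._last_spoken.get(key)
        if last is not None and now - last < self.window:
            return False
        self._last_spoken[key] = now
        return True

    def recent_phrases(self) -> list[str]:
        """Phrases spoken within the window, e.g. to list in the prompt as wording to avoid."""
        now = self.clock()
        return [p for p, ts in self._last_spoken.items() if now - ts < self.window]
```

Injecting the clock makes the window testable without sleeping; passing recent_phrases() into the GPT-4o-mini prompt lets the model pick new wording instead of only filtering repeats after the fact.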
The progress queue is an asynchronous queue that:
- Stores progress messages per tool
- Speaks them one at a time with natural pacing
- Stops immediately when tool completes
- Clears unspoken messages to prevent stale announcements
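The queue behavior described above can be sketched with asyncio. This is an illustration under assumptions, not the project's implementation: ProgressQueue, put() and complete() are hypothetical names, speak stands in for whatever coroutine hands text to TTS, and queues are keyed by a per-call ID (assumed) so two runs of the same tool do not share state.

```python
import asyncio
from collections.abc import Awaitable, Callable


class ProgressQueue:
    """Hypothetical per-call progress queue: one message at a time, no stale updates."""

    def __init__(self, speak: Callable[[str], Awaitable[None]], delay: float = 0.1) -> None:
        self._speak = speak  # coroutine that hands text to TTS (assumed)
        self._delay = delay  # pause between messages (README default: 100ms)
        self._queues: dict[str, asyncio.Queue[str]] = {}
        self._workers: dict[str, asyncio.Task[None]] = {}
        self._done: set[str] = set()  # stop flags for finished calls

    def put(self, call_id: str, message: str) -> None:
        """Queue a progress message; must be called while the event loop is running."""
        if call_id in self._done:
            return  # call already finished: drop late updates
        if call_id not in self._queues:
            self._queues[call_id] = asyncio.Queue()
            self._workers[call_id] = asyncio.create_task(self._run(call_id))
        self._queues[call_id].put_nowait(message)

    async def _run(self, call_id: str) -> None:
        queue = self._queues[call_id]
        while call_id not in self._done:
            message = await queue.get()
            if call_id in self._done:
                break
            await self._speak(message)
            await asyncio.sleep(self._delay)

    def complete(self, call_id: str) -> int:
        """Set the stop flag, discard unspoken messages, and return how many were dropped."""
        self._done.add(call_id)
        cleared = 0
        queue = self._queues.pop(call_id, None)
        if queue is not None:
            while not queue.empty():
                queue.get_nowait()
                cleared += 1
        worker = self._workers.pop(call_id, None)
        if worker is not None:
            worker.cancel()  # stop at once, even mid-phrase
        return cleared
```

complete() cancels the worker, so a phrase already being spoken is cut off; if you would rather let it finish, await the worker after setting the stop flag instead of cancelling it.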
Agent Settings:
FunctionAgent(
    llm=openai.LLM(model="gpt-4o"),  # Main conversation LLM
    max_tool_steps=10,  # Max tool calls per turn
    client_session_timeout=300.0,  # 5 min timeout for tools
)

Natural Response Settings:
generate_natural_response(
    model="gpt-4o-mini",  # Fast, cheap rephrasing
    temperature=0.7,  # Natural variety
    max_tokens=30,  # Brief (5-10 words)
)

Progress Settings:
DEDUP_WINDOW_SECONDS = 3.0  # Don't repeat within 3s

Edit the prompt in generate_natural_response() to change the tone:

prompt = f"""You are a [friendly/professional/energetic] voice assistant.
The user just asked: "{user_query}"
You're about to call a tool named "{tool_name}".
Generate a brief, [casual/formal] phrase..."""

Adjust the delay between progress messages:

await asyncio.sleep(0.1)  # Default: 100ms between messages

The agent supports any OpenAI-compatible LLM:
llm=openai.LLM(model="gpt-4o-mini")  # Faster, cheaper
# or
llm=openai.LLM(model="gpt-4")  # More capable

If long-running tools time out, increase client_session_timeout:

MCPServerConfig(client_session_timeout=600.0)  # 10 minutes

If the agent runs out of tool calls in a single turn, increase max_tool_steps:

AgentSession(max_tool_steps=20)

If progress phrases still sound repetitive: the agent tracks phrases automatically, but you can widen the deduplication window:

DEDUP_WINDOW_SECONDS = 5.0  # Longer window

If progress updates are spoken after the answer is ready: this should not happen with the queue system. Check the logs for:
Tool X completed, setting stop flag
Cleared N unspoken progress messages
- Tool Announcement Latency: ~200ms (GPT-4o-mini generation)
- Progress Update Latency: ~200ms per message
- Cost per Announcement: ~$0.000021
- Cost per 1000 conversations: ~$0.05 (assuming 2-3 announcements each)
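The per-announcement figure is consistent with GPT-4o-mini's list pricing of $0.15 per million input tokens and $0.60 per million output tokens if an announcement uses about 100 input tokens and 10 output tokens. Those token counts are assumptions, not measurements, and prices may change, so a quick way to redo the arithmetic for your own prompts is:

```python
# GPT-4o-mini list prices in USD per million tokens (check current pricing).
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def announcement_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one generated phrase."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Assumed token counts for one announcement prompt and reply.
per_announcement = announcement_cost(input_tokens=100, output_tokens=10)

# Cost per 1000 conversations at 2 and 3 announcements each.
per_1000_conversations = {n: 1000 * n * per_announcement for n in (2, 3)}
```

With 2 to 3 announcements per conversation this gives about $0.04 to $0.06 per 1,000 conversations, in line with the ~$0.05 estimate.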
- livekit-agents - Voice agent framework
- livekit-plugins-openai - OpenAI LLM and TTS
- livekit-plugins-deepgram - Speech-to-text
- livekit-plugins-silero - Voice activity detection
- openai - For natural response generation
- mcp - Model Context Protocol client
- python-dotenv - Environment variable management
- LiveKit - Real-time voice infrastructure
- OpenAI - GPT-4o and GPT-4o-mini
- Deepgram - Speech-to-text
- Silero VAD - Voice activity detection
- Model Context Protocol - Tool integration standard
MIT License - see LICENSE file for details