An advanced AI pipeline that converts voice messages into Slack DMs using LiveKit, Deepgram STT, Gemini AI processing, and Zapier MCP integration.
- Voice Recording: Capture real-time voice input from your microphone
- Speech-to-Text: High-quality transcription using the Deepgram API
- AI Processing: Message cleanup and enhancement using Google Gemini
- Slack Integration: Direct message delivery via Zapier MCP
- LiveKit Agent: Full voice agent capabilities (optional)
Voice Input → Deepgram STT → Gemini Processing → Zapier MCP → Slack DM
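The flow above can be sketched as a chain of three stages. This is only an illustration of the ordering, not the code in voice_pipeline.py: `run_pipeline` and its parameter names are hypothetical, and each stage is passed in as a plain callable so it can be swapped or stubbed.

```python
from typing import Callable


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    clean_up: Callable[[str], str],
    send_dm: Callable[[str], None],
) -> str:
    """Run the stages in order and return the message that was sent."""
    transcript = transcribe(audio)  # Deepgram STT stage
    message = clean_up(transcript)  # Gemini processing stage
    send_dm(message)                # Zapier MCP -> Slack DM stage
    return message


if __name__ == "__main__":
    # Stub stages stand in for the real services.
    outbox = []
    result = run_pipeline(
        b"fake-audio",
        transcribe=lambda _audio: "um hello team",
        clean_up=lambda text: text.replace("um ", "").capitalize() + ".",
        send_dm=outbox.append,
    )
    print(result, outbox)
```

Passing the stages as arguments keeps each service testable on its own, which matches the advice to test Deepgram, Gemini, and Zapier separately.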
# Run the automated setup script
python setup.py
This will:
- Check Python version compatibility
- Create a virtual environment
- Install all dependencies
- Create a .env file from the template
- Test microphone access
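Two of these checks are easy to picture in code. The sketch below is an assumption about how such a script might work, not the contents of setup.py; `python_is_supported` and `ensure_env_file` are hypothetical names, and the minimum version follows the "Python 3.8+" requirement in this README's checklist.

```python
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 8)  # assumed from the "Python 3.8+" checklist item


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def ensure_env_file(project_dir: Path) -> bool:
    """Copy .env.example to .env unless .env already exists.

    Returns True only when a new .env file was created, so an existing
    file with real keys is never overwritten.
    """
    template = project_dir / ".env.example"
    target = project_dir / ".env"
    if target.exists() or not template.exists():
        return False
    shutil.copyfile(template, target)
    return True


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Created .env:", ensure_env_file(Path.cwd()))
```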
# Clone the repository (if from git)
git clone <your-repo>
cd livekit-mcp
# Or navigate to your existing project directory
cd d:\AI\agent\livekit-mcp
# Create virtual environment
python -m venv venv
# Activate virtual environment
venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txt
Copy the example environment file:
cp .env.example .env
Fill in your API keys in .env:
- LiveKit: Get from LiveKit Cloud Dashboard
- Google Gemini: Get from Google AI Studio
- Deepgram: Get from Deepgram Console
- Zapier MCP: Set up Slack integration at Zapier MCP
- Go to Zapier MCP
- Connect your Slack workspace
- Set up "Send Direct Message" action
- Copy the MCP URL to your .env file
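Before running anything, it can help to confirm that every key is actually set. The following is a minimal sketch: the `LIVEKIT_*` names appear in this README's troubleshooting notes, while `GOOGLE_API_KEY`, `DEEPGRAM_API_KEY`, and `ZAPIER_MCP_URL` are assumed names, so check them against .env.example.

```python
import os

# LIVEKIT_* names come from this README; the last three are assumptions.
REQUIRED_KEYS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GOOGLE_API_KEY",
    "DEEPGRAM_API_KEY",
    "ZAPIER_MCP_URL",
]


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or blank in env."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

This reads the process environment only, so the values from .env must already be loaded (for example by a dotenv loader, if the project uses one).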
python voice_pipeline.py
- Speak your message when prompted (default 5 seconds)
- Review the processed transcript
- Copy the message and tell the AI assistant: "Send this to Kiruthivarma: [your message]"
- Message delivered!
python starter_agent.py dev
Run the full LiveKit agent with voice capabilities and Slack integration.
python starter_server.py
Test LiveKit room management via the MCP protocol.
livekit-mcp/
├── voice_pipeline.py      # Main voice-to-Slack pipeline
├── starter_agent.py       # LiveKit agent with Slack functions
├── starter_server.py      # LiveKit MCP server
├── test_credentials.py    # API credentials testing
├── setup.py               # Automated setup script
├── requirements.txt       # Python dependencies
├── .env.example           # Environment template
├── .env                   # Your API keys (keep private!)
└── README.md              # This file
- Duration: Configurable (default 5 seconds)
- Quality: 16kHz, mono, 16-bit
- Format: WAV (temporary file, auto-cleaned)
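The recording settings above map directly onto the standard-library `wave` module. The sketch below writes 16 kHz, mono, 16-bit PCM to a temporary WAV file and deletes it afterwards. It uses synthetic silence instead of the microphone so it runs anywhere; `write_temp_wav` and `wav_duration_seconds` are hypothetical helper names, not functions from voice_pipeline.py.

```python
import os
import tempfile
import wave

SAMPLE_RATE = 16000  # 16 kHz
CHANNELS = 1         # mono
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)


def write_temp_wav(pcm_bytes: bytes) -> str:
    """Write raw 16-bit PCM to a temporary WAV file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave reopens the file by path
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_bytes)
    return path


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    silence = b"\x00\x00" * (SAMPLE_RATE * 5)  # 5 seconds of silence
    path = write_temp_wav(silence)
    try:
        print(f"{wav_duration_seconds(path):.1f} s written to {path}")
    finally:
        os.remove(path)  # clean up the temporary file
```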
- Model: Nova-2 (high accuracy)
- Language: English
- Real-time: Yes
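For reference, these settings correspond to query parameters on Deepgram's HTTP transcription endpoint. The sketch below assumes the pre-recorded `/v1/listen` endpoint and uses only the standard library; the project itself may use the Deepgram SDK or a streaming connection instead. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(wav_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build a POST request for Nova-2 English transcription of WAV audio."""
    query = urllib.parse.urlencode({"model": "nova-2", "language": "en"})
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=wav_bytes,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )


def extract_transcript(response_json: dict) -> str:
    """Pull the first transcript alternative out of a response body."""
    return response_json["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe(wav_bytes: bytes, api_key: str) -> str:
    """Send the audio and return the transcript (network access required)."""
    request = build_request(wav_bytes, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_transcript(json.load(response))
```

A 401 response from this call usually points at the API key, as described in the troubleshooting notes.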
- Model: gemini-2.0-flash-exp
- Purpose: Clean up transcription, fix grammar
- Output: Natural, readable message
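A cleanup step like this usually comes down to one prompt and one model call. The sketch below assumes the `google-generativeai` package; the prompt wording and both function names are illustrative and are not taken from voice_pipeline.py.

```python
def build_cleanup_prompt(transcript: str) -> str:
    """Build an illustrative cleanup prompt for a raw transcript."""
    return (
        "Clean up this voice transcription. Fix grammar and punctuation, "
        "remove filler words, and keep the original meaning. "
        "Reply with the cleaned message only.\n\n"
        f"Transcription: {transcript.strip()}"
    )


def clean_with_gemini(
    transcript: str,
    api_key: str,
    model_name: str = "gemini-2.0-flash-exp",
) -> str:
    """Send the prompt to Gemini and return the cleaned message.

    Assumes the google-generativeai package is installed; it is imported
    here so the prompt helper works without it.
    """
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(build_cleanup_prompt(transcript))
    return response.text.strip()


if __name__ == "__main__":
    print(build_cleanup_prompt("um so the meeting is uh moved to three"))
```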
- Method: Direct Message
- Target: Configurable user (e.g., Kiruthivarma)
- Format: Text with microphone emoji 🎙️
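As an illustration of the delivery format, the helper below prefixes the cleaned text with the microphone emoji and pairs it with a target user. The function name and the `channel`/`text` keys are assumptions made for this sketch; the actual fields depend on how the Zapier "Send Direct Message" action is configured.

```python
MIC_PREFIX = "🎙️ "


def format_slack_message(text: str, recipient: str) -> dict:
    """Return a hypothetical payload for a Slack direct message."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("message text is empty")
    return {"channel": recipient, "text": MIC_PREFIX + cleaned}


if __name__ == "__main__":
    print(format_slack_message("Meeting moved to 3 pm.", "Kiruthivarma"))
```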
# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template
copy .env.example .env
# Edit .env with your API keys
notepad .env
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
# Edit .env with your API keys
nano .env
# In voice_pipeline.py
duration = 5  # Recording duration in seconds
sample_rate = 16000  # Sample rate in Hz (audio quality)
# Customize the processing prompt
prompt = "Clean up this transcription and format as a clear message..."
# Change target user in the MCP call
channel="username_here"
- No microphone detected: Check sounddevice installation
- Permission denied: Grant microphone access to terminal/IDE
- Poor quality: Adjust sample_rate or duration
- 401 Unauthorized: Check API keys in .env
- Quota exceeded: Verify API usage limits
- Network errors: Check internet connection
- Messages not appearing:
  - Check if user exists in workspace
  - Verify Zapier app permissions
  - Look for messages from "Zapier" bot
- Wrong recipient: Update username in MCP call
- 401 on LiveKit: Verify LIVEKIT_API_KEY and LIVEKIT_API_SECRET
- Connection refused: Check LIVEKIT_URL format
- Module not found: Install missing dependencies
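For the "No microphone detected" case, listing the input-capable devices is a quick first check. This sketch assumes `sounddevice` is installed; `input_devices` and `list_microphones` are hypothetical helper names. The filtering is kept separate from the hardware query so it can be tried without a microphone.

```python
def input_devices(devices) -> list:
    """Return the names of devices that expose at least one input channel."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


def list_microphones() -> list:
    """Query the audio system through sounddevice.

    The import fails if sounddevice or its PortAudio backend is missing,
    which is itself a useful diagnostic.
    """
    import sounddevice as sd

    return input_devices(sd.query_devices())


if __name__ == "__main__":
    try:
        names = list_microphones()
    except Exception as exc:  # covers a missing package or backend
        print("Could not query audio devices:", exc)
    else:
        print("Input devices:", names or "none found")
```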
| Service | Usage | Typical Cost |
|---|---|---|
| Deepgram STT | ~5 sec audio | ~$0.0004 |
| Google Gemini | Text processing | ~$0.0001 |
| LiveKit | Agent hosting | Variable |
| Zapier | Message sending | Free tier available |
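The table's per-message figures can be turned into a rough volume estimate. The sketch below uses only the two fixed rows (Deepgram and Gemini) and leaves out LiveKit and Zapier because their cost is variable or tiered. The figures are the approximate ones from the table, not current pricing.

```python
# Approximate per-message costs in USD, taken from the table above.
DEEPGRAM_PER_MESSAGE = 0.0004  # about 5 seconds of audio
GEMINI_PER_MESSAGE = 0.0001    # text cleanup


def estimated_cost(messages: int) -> float:
    """Estimate the combined Deepgram + Gemini cost for a number of messages."""
    per_message = DEEPGRAM_PER_MESSAGE + GEMINI_PER_MESSAGE
    return round(messages * per_message, 4)


if __name__ == "__main__":
    for count in (100, 1000, 10000):
        print(f"{count} messages: about ${estimated_cost(count):.2f}")
```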
- Never commit .env to version control
- Rotate API keys regularly
- Use environment-specific configurations
- Limit API key permissions where possible
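One way to guard against committing .env is to confirm it is listed in .gitignore. The sketch below is a deliberately simple check that looks for a few literal patterns and does not implement full gitignore matching; `env_is_ignored` is a hypothetical helper name.

```python
from pathlib import Path

# Literal patterns that would cover a top-level .env file.
ENV_PATTERNS = {".env", "/.env", ".env*", "*.env"}


def env_is_ignored(project_dir) -> bool:
    """Return True if .gitignore contains a pattern that covers .env."""
    gitignore = Path(project_dir) / ".gitignore"
    if not gitignore.exists():
        return False
    lines = {line.strip() for line in gitignore.read_text().splitlines()}
    return bool(lines & ENV_PATTERNS)


if __name__ == "__main__":
    if env_is_ignored(Path.cwd()):
        print(".env is ignored by git")
    else:
        print("Warning: add .env to .gitignore")
```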
Extend voice_pipeline.py to recognize specific commands:
if "urgent" in transcript.lower():
    priority = "🚨 URGENT: "
Add support for different Slack users:
recipients = {
    "kiruthivarma": "Kiruthivarma",
    "team": ["user1", "user2", "user3"]
}
Integrate with LiveKit agent for two-way conversations:
await context.session.say("Message sent to Kiruthivarma!")
- Fork the repository
- Create a feature branch
- Add your improvements
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License.
- LiveKit for real-time communication infrastructure
- Deepgram for high-quality speech recognition
- Google Gemini for AI text processing
- Zapier for seamless integrations
"Just built a voice-to-Slack pipeline that lets me send messages by just speaking! The AI cleans up my speech and delivers it perfectly. FUCKING LEGENDARY!" - Happy User
Ready to revolutionize your communication workflow? Start speaking to Slack today!
- Python 3.8+ installed
- Virtual environment created and activated
- Dependencies installed from requirements.txt
- .env file created and configured with all API keys
- Microphone permissions granted
- Zapier MCP Slack integration configured
- Test run completed successfully
If you encounter issues:
- Check the troubleshooting section above
- Verify all API keys are correctly set
- Ensure microphone permissions are granted
- Test individual components (Deepgram, Gemini, Zapier) separately
# Automated setup (first time only)
python setup.py
# Test API credentials
python test_credentials.py
# Test voice recording only
python -c "import sounddevice as sd, numpy as np; print('Recording...'); audio = sd.rec(int(5 * 16000), samplerate=16000, channels=1); sd.wait(); print('Recording complete!')"
# Run main pipeline
python voice_pipeline.py