This example demonstrates an end-to-end (E2E) intelligent voice assistant using NeMo Agent toolkit with WebRTC for real-time speech-to-speech interaction. It showcases a flower shop assistant with custom function registration, voice processing pipeline using NVIDIA Pipecat, and comprehensive observability with Phoenix tracing.
- Custom Function Registration: Demonstrates custom function creation using the NeMo Agent toolkit registration system
- Flower Shop Assistant: Interactive menu browsing, pricing, and cart management functionality
- ReWOO Agent: Uses planning-based approach for efficient task decomposition and execution
- Voice-to-Voice Pipeline: Real-time WebRTC-based speech interaction using NVIDIA Pipecat
- RESTful API Deployment: Production-ready API deployment using `nat serve`
- Phoenix Tracing: Comprehensive observability with Phoenix tracing and monitoring
- Workflow Profiling: Built-in profiling capabilities to analyze performance bottlenecks and optimize workflows
- Evaluation System: Comprehensive evaluation tools to validate and maintain accuracy of agentic workflows
- Clone the voice-agent-examples repository:

  ```bash
  git clone https://github.com/NVIDIA/voice-agent-examples.git
  ```
- Navigate to the example directory:

  ```bash
  cd voice-agent-examples/examples/nat_agent
  ```
- Copy and configure the environment file:

  ```bash
  cp env.example .env   # and add your credentials
  ```
- Set up API keys in the `.env` file:

  Ensure you have the required API keys:

  - `NVIDIA_API_KEY` - Required for accessing NIM ASR, TTS, and LLM models

  Refer to https://build.nvidia.com/ to generate your API keys. Edit the `.env` file to add your keys, or export them directly:

  ```bash
  export NVIDIA_API_KEY=<YOUR_API_KEY>
  ```
- Deploy a coturn server if required:

  If you want to share the app widely or deploy on cloud platforms, you will need to set up a coturn server. Follow the instructions below to modify the example code to use coturn.

  Update `HOST_IP_EXTERNAL` with your machine's IP and run the command below:

  ```bash
  docker run -d --network=host instrumentisto/coturn -n --verbose --log-file=stdout \
    --external-ip=<HOST_IP_EXTERNAL> --listening-ip=0.0.0.0 --lt-cred-mech --fingerprint \
    --user=admin:admin --no-multicast-peers --realm=tokkio.realm.org \
    --min-port=51000 --max-port=52000
  ```

  Add the following configuration to your `bot.py` file to use the coturn server:

  ```python
  ice_servers = [
      IceServer(
          urls="turn:<HOST_IP_EXTERNAL>:3478",
          username="admin",
          credential="admin"
      )
  ]
  ```

  Add the following configuration to your `../webrtc_ui/src/config.ts` file to use the coturn server:

  ```typescript
  export const RTC_CONFIG: ConstructorParameters<typeof RTCPeerConnection>[0] = {
    iceServers: [
      {
        urls: "turn:<HOST_IP_EXTERNAL>:3478",
        username: "admin",
        credential: "admin",
      },
    ],
  };
  ```
For more information, see the turn-server documentation at https://webrtc.org/getting-started/turn-server.
- Deploy the application with either of the options below (note: Phoenix tracing is enabled by default; see Advanced: Phoenix Deployment for details).
- You have access and are logged in to NVIDIA NGC. For step-by-step instructions, refer to the NGC Getting Started Guide.
- You have access to an NVIDIA Turing™, NVIDIA Ampere (e.g., A100), NVIDIA Hopper (e.g., H100), NVIDIA Ada (e.g., L40S), or a later NVIDIA GPU architecture. For more information, refer to the Support Matrix.
- You have Docker installed with support for NVIDIA GPUs. For more information, refer to the Support Matrix.
```bash
export NGC_API_KEY=nvapi-...   # <insert your key>
docker login nvcr.io
```

From the `examples/nat_agent` directory, run the commands below:

```bash
docker compose up --build -d
```

The Docker deployment might take 30-45 minutes the first time. Once all services are up and running, visit `http://<machine-ip>:9000/` in your browser to start interacting with the application. See the next sections for detailed instructions on interacting with the app.
- Python (>=3.11, <3.13)
- uv
All Python dependencies are listed in separate `pyproject.toml` files for the agent and bot components.
```bash
# Navigate to agent directory
cd agent

# Create a virtual environment
uv venv

# Install agent dependencies
source .venv/bin/activate
uv sync
uv pip install -e .

# Edit configs/config.yml to update LLM endpoints as per your deployment
# Update the 'llm' section with your specific model endpoints, API keys, and model names

# Start the NAT agent service
nat serve --config_file configs/config.yml --host 0.0.0.0 --port 8000
```

The agent service will start and be available at http://localhost:8000. You can view the auto-generated API documentation at http://localhost:8000/docs.
Agent Deployment: See `agent/README.md` for comprehensive agent configuration, deployment options, troubleshooting, and advanced features.
In a new terminal, from the main nat_agent directory:
```bash
# Install bot dependencies
source .venv/bin/activate
uv sync
uv pip install -e .

# Start the voice bot server
python bot.py
```

Connect through the voice bot interface for real-time speech interaction. For detailed setup instructions, see the WebRTC UI README.

Visit http://localhost:5173/ in your browser to start interacting with the application. See the next sections for detailed instructions on interacting with the app.
Note: To enable microphone access in Chrome, go to `chrome://flags/`, enable "Insecure origins treated as secure", add `http://<machine-ip>:9000` (for the Docker method) or `http://localhost:5173/` (for the Python method) to the list, and restart Chrome.
You can interact with the application through:
- REST API: Use HTTP requests to interact with the agent directly
- Console Interface: Use `nat run` for text-based interaction
Use curl to test the deployed NAT agent service:
```bash
curl -X 'POST' \
  'http://localhost:8000/generate' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input_message": "What flowers do you have available and what are your prices?"
  }'
```

Before testing, update the LLM endpoints in your configuration file to match your deployment:

```bash
cd agent
# Edit configs/config.yml to update LLM endpoints as per your deployment
# Update the 'llm' section with your specific model endpoints, API keys, and model names
```

Test your workflow using the `nat run` command:

```bash
nat run --config_file configs/config.yml --input "Hello, can you show me the menu?"
```

This example is organized into two main components:
- `agent/` - Contains all files required for NAT agent deployment as a standalone service
  - Agent source code, configurations, and deployment files
  - Separate `pyproject.toml` with agent-specific dependencies (NAT, LangChain, etc.)
  - Docker setup for containerized deployment
  - Complete documentation in `agent/README.md`
- Root directory - Contains the voice bot interface (`bot.py`, `bot_websocket.py`)
  - Use `bot.py` for the WebRTC-based voice interface for real-time interaction
  - Use `bot_websocket.py` for the websocket-based voice pipeline, recommended for evaluation and performance testing
  - Separate `pyproject.toml` with bot-specific dependencies (FastAPI, Pipecat, etc.)
  - Integration with NVIDIA Pipecat for voice processing
Phoenix tracing is already configured in your workflow. Start the Phoenix server to view traces:
```bash
# In a new terminal
phoenix serve
```

Phoenix will be available at:

- Phoenix UI: http://0.0.0.0:6006
- Trace Endpoint: http://0.0.0.0:6006/v1/traces
Troubleshooting: If `phoenix serve` shows any errors, consider deleting the Phoenix database file.
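For example, assuming a default install that keeps its database under `~/.phoenix` (the assumed default working directory; adjust if you have set `PHOENIX_WORKING_DIR`), the reset looks like:

```shell
# Stop any running phoenix server first, then delete the on-disk database.
# ~/.phoenix is the assumed default Phoenix working directory; adjust the
# path if your deployment stores it elsewhere.
rm -f ~/.phoenix/phoenix.db
```

Phoenix recreates an empty database on the next `phoenix serve`, so existing traces are lost.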
To run the performance scripts and voice-based evaluation, it is better to use the websocket-transport-based voice agent pipeline. You can make the switch by uncommenting code in `docker-compose.yml`:

- Change the command for the `python-app` service to use the websocket-based NAT agent pipeline from `bot_websocket.py`
- Mount the websocket-based UI page and update the `STATIC_PATH` environment variable in `python-app`
- Comment out the `ui-app` service used for the WebRTC UI
- Deploy the service using the updated `docker-compose.yml` and access the websocket UI page at `http://HOST_IP:8100/static/index.html`
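As a rough sketch, the edited `docker-compose.yml` might look like the fragment below. The service names follow the steps above, but the mount path and exact `STATIC_PATH` value are assumptions; check the commented-out sections in your actual file:

```yaml
services:
  python-app:
    # Switch to the websocket-based pipeline
    command: python bot_websocket.py
    environment:
      - STATIC_PATH=/app/websocket_ui   # hypothetical mount point
    volumes:
      - ./websocket_ui:/app/websocket_ui

  # ui-app:   # WebRTC UI - comment out when using the websocket transport
  #   ...
```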
To run the profiler and evaluator, use the `nat eval` command with the workflow configuration file. The profiler collects usage statistics and the evaluator assesses workflow accuracy, storing results in the output directory specified in the configuration file.

```bash
cd agent
nat eval --config_file configs/config.yml
```

Speculative speech processing reduces bot response latency by working directly on early interim user transcripts from Nemotron Speech ASR instead of waiting for final transcripts. This feature only works when using Nemotron Speech ASR and is currently set to `true`.
- Toggle using the environment variable `ENABLE_SPECULATIVE_SPEECH`:
  - Docker Compose: set in `python-app.environment` (default is `true`):

    ```yaml
    environment:
      - ENABLE_SPECULATIVE_SPEECH=${ENABLE_SPECULATIVE_SPEECH:-false}
    ```

  - Local run: export before launching:

    ```bash
    export ENABLE_SPECULATIVE_SPEECH=false  # or true
    python bot.py
    ```

- The application will automatically switch processors based on this flag; no code edits needed.
- See the Documentation on Speculative Speech Processing for more details.
You may customize the ASR (Automatic Speech Recognition), Agent (Flower Shop Assistant), and TTS (Text-to-Speech) services by configuring environment variables. This allows you to switch between NIM cloud-hosted models and locally deployed models.
The following environment variables control the endpoints and models:
- `ASR_SERVER_URL`: Address of the Nemotron Speech ASR (speech-to-text) service (e.g., `localhost:50051` for local, `grpc.nvcf.nvidia.com:443` for the cloud endpoint).
- `TTS_SERVER_URL`: Address of the Nemotron Speech TTS (text-to-speech) service (e.g., `localhost:50051` for local, `grpc.nvcf.nvidia.com:443` for the cloud endpoint).
You can set the model, language, and voice using the `ASR_MODEL_NAME`, `TTS_MODEL_NAME`, `ASR_LANGUAGE`, `TTS_LANGUAGE`, and `TTS_VOICE_ID` environment variables.

Update these variables in your Docker Compose configuration to match your deployment and desired models. For more details on available models and configuration options, refer to the NIM NVIDIA Magpie and NIM NVIDIA Parakeet documentation.
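As a sketch, a local run pointed at a locally deployed NIM might export these variables before launching the bot. The values below are illustrative only (they are not this repo's defaults); adjust them to your deployment and check the NIM documentation for the identifiers your models support:

```shell
# Illustrative values only - adjust to match your deployment.
# Point ASR/TTS at a locally deployed NIM over gRPC:
export ASR_SERVER_URL=localhost:50051
export TTS_SERVER_URL=localhost:50051
# Or target the NVIDIA cloud endpoint instead:
# export ASR_SERVER_URL=grpc.nvcf.nvidia.com:443
# export TTS_SERVER_URL=grpc.nvcf.nvidia.com:443

# Language selection (example values; verify against your model's docs):
export ASR_LANGUAGE=en-US
export TTS_LANGUAGE=en-US
```

For the Docker method, set the same variables under `python-app.environment` in your Compose file instead of exporting them in the shell.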
Follow these steps to configure and use the latest Zero-shot Magpie TTS model:
- Update environment variables

  Set `TTS_DOCKER_IMAGE` to the actual image tag `<magpie-tts-zeroshot-image:version>`. Then, configure the settings found in the `Zero-shot TTS Magpie Model` section of your env file. Make sure your `NVIDIA_API_KEY`, with access to the zero-shot model, is correctly entered in your `.env` file.

- Configure the zero-shot audio prompt

  To use your own custom voice with zero-shot TTS:

  - Place your desired audio sample in the workspace directory.
  - Mount the audio file into your container by adding a volume in your `docker-compose.yml` under the `python-app` service:

    ```yaml
    services:
      python-app:
        # ... existing code ...
        volumes:
          - ./audio_prompts:/app/audio_prompts
    ```

  - In your `.env` file, set the `ZERO_SHOT_AUDIO_PROMPT` variable to its path (relative to your application's root):

    ```bash
    ZERO_SHOT_AUDIO_PROMPT=audio_prompts/voice_sample.wav  # Path relative to app root
    ```

  Note: The zero-shot audio prompt is only required when using the Magpie Zero-shot model. For standard Magpie multilingual models, this configuration should be omitted.

- Set TTS environment variables

  In `.env` (for `python-app`), update:

  ```bash
  TTS_VOICE_ID=Magpie-ZeroShot.Female-1
  TTS_MODEL_NAME=magpie_tts_ensemble-Magpie-ZeroShot
  ```
To capture raw audio streams for debugging ASR/TTS quality issues:
```bash
# In .env file
ENABLE_ASR_AUDIO_DUMP=true   # Save input audio
ENABLE_TTS_AUDIO_DUMP=true   # Save output audio
AUDIO_DUMP_PATH=./audio_dumps
```

Audio files are saved in WAV format with a stream ID for correlation.
Permission Issues: If Docker creates the `audio_dumps` folder with different user permissions, accessing it later via the Python deployment or another Docker container may fail. To resolve:

- Pre-create the folder before enabling: `mkdir -p ./audio_dumps`
- Or fix existing permissions: `sudo chown -R $(id -u):$(id -g) ./audio_dumps`