Go2 Visitor Assistant is a reception workflow for a Unitree Go2 robot. It combines face recognition, RAI/LangChain-based dialogue orchestration, rosbridge navigation commands, and text or voice interaction.
The current runtime uses one assistant workflow for both text and voice. Voice mode performs OpenAI Realtime transcription, routes the transcript through the local assistant and tools, synthesizes the assistant response with OpenAI TTS, and plays it through the Go2 speaker backend.
- Visitor identification from RealSense or Unitree front camera input
- One-image-per-person face enrollment with safe augmentation
- Local WebSocket recognition service for assistant tool calls
- RAI/LangChain reception agent with phase-gated tools
- Host and destination resolution from YAML configuration
- Navigation goal publishing to /goal_pose through rosbridge
- Nav2 action status monitoring through a configurable status topic
- Optional follow verification during escort using YOLO person detection
- Text terminal mode and direct Go2 speaker voice mode
.
├── config/ # destinations, visitor registry, rosbridge presets
├── data/
│ ├── enrolled_images/ # one subfolder per enrolled visitor
│ └── gallery/ # generated gallery artifacts
├── scripts/ # launchers for assistant, recognition, and rosbridge
├── src/go2_visitor_assistant/
│ ├── assistant/ # text and voice assistant runtimes
│ ├── domain/ # shared dataclasses and session models
│ ├── infra/ # rosbridge connector
│ ├── recognition/ # enrollment, identification, camera, and WS service
│ └── tools/ # agent tool runtimes
├── requirements.txt # assistant runtime dependencies
└── requirements-recognition.txt # recognition runtime dependencies
Runtime data, debug captures, .env, virtual environments, generated galleries, and enrolled face images should stay local and are ignored by Git.
- The visitor starts a conversation through text input or microphone speech.
- The assistant agent decides when to call the recognition tool.
- Recognition runs locally or through the WebSocket recognition service.
- If the visitor is recognized, the assistant loads the visitor profile and confirms the host or destination.
- If the visitor is unknown, the assistant asks for the target host or destination.
- The destination resolver maps the host or location to a configured navigation target.
- The navigation runtime publishes a geometry_msgs/PoseStamped goal to /goal_pose.
- During escort, optional follow verification can warn the visitor if they stop following.
- After the visit flow completes, the assistant can return the robot to the configured arrival pose.
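The resolution and goal-publishing steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's implementation: the in-memory registry layout, the destination names, the map frame, and the helper names resolve_destination and build_goal_message are all assumptions. Only the outer envelope (the publish operation with topic and msg fields) follows the rosbridge protocol.

```python
import json
import math

# Hypothetical data; the real values live in config/visitor_registry.yaml and
# config/nav_destinations.yaml, whose schema is not shown in this document.
HOSTS = {"alice": "office_a"}
DESTINATIONS = {
    "office_a": {"x": 3.2, "y": -1.5, "yaw": math.pi / 2},
    "arrival": {"x": 0.0, "y": 0.0, "yaw": 0.0},
}


def resolve_destination(query):
    """Map a host name or a destination name to a configured destination key."""
    key = query.strip().lower()
    key = HOSTS.get(key, key)  # a host resolves to its default destination
    if key not in DESTINATIONS:
        raise KeyError(f"no configured destination for {query!r}")
    return key


def build_goal_message(name, frame_id="map"):
    """Build a rosbridge 'publish' envelope carrying a PoseStamped goal."""
    pose = DESTINATIONS[name]
    half_yaw = pose["yaw"] / 2.0
    return {
        "op": "publish",
        "topic": "/goal_pose",
        "msg": {
            # Zero stamp as a placeholder; a real client might fill in the current time.
            "header": {"stamp": {"sec": 0, "nanosec": 0}, "frame_id": frame_id},
            "pose": {
                "position": {"x": pose["x"], "y": pose["y"], "z": 0.0},
                # Planar rotation about the z axis expressed as a quaternion.
                "orientation": {
                    "x": 0.0,
                    "y": 0.0,
                    "z": math.sin(half_yaw),
                    "w": math.cos(half_yaw),
                },
            },
        },
    }


goal = build_goal_message(resolve_destination("Alice"))
payload = json.dumps(goal)  # text frame to send over the rosbridge WebSocket
```

Keeping the message builder free of network code makes the pose conversion easy to check in isolation. A client would typically advertise the topic with its message type before sending the payload.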
- Python 3.11 for the assistant runtime
- System Python with RealSense and GPU-compatible recognition dependencies for the recognition runtime
- ROS 2 Foxy and rosbridge_server on the robot
- A reachable rosbridge WebSocket endpoint
- OpenAI API key
- RAI source checkout or installed RAI dependencies
- Unitree Go2 speaker backend for voice mode
- RealSense camera or Unitree front camera access for face capture
On Jetson or other aarch64 systems, install a compatible onnxruntime-gpu wheel manually if PyPI does not provide one for the target platform.
Create the assistant environment:
cd <project-root>
python3.11 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
cp .env.example .env
Install the recognition dependencies in the Python runtime that has camera and GPU access:
cd <project-root>
<recognition-python> -m pip install --user -U pip
<recognition-python> -m pip install --user -r requirements-recognition.txt
Set at least these values in .env:
OPENAI_API_KEY=<your-api-key>
ROSBRIDGE_HOST=<robot-or-rosbridge-host>
ROSBRIDGE_PORT=9090
RECOGNITION_SERVICE_PYTHON=<recognition-python>
If RAI is not installed in the assistant environment, point RAI_CORE_SRC at the local RAI source tree:
RAI_CORE_SRC=<path-to-rai-core-source>
For voice mode, configure the Go2 speaker backend and microphone:
GO2_IP=<robot-ip>
UNITREE_GO2_CONTROLLER_SRC=<path-to-go2-controller-src>
UNITREE_GO2_CONTROLLER_SITE_PACKAGES=<path-to-go2-controller-site-packages>
VOICE_INPUT_DEVICE=<microphone-name-or-index>
Main configuration files:
- config/nav_destinations.yaml defines named navigation poses, including the arrival pose.
- config/visitor_registry.yaml maps enrolled visitor IDs to names, hosts, and default destinations.
- config/rosbridge_nav.yaml and config/rosbridge_slam.yaml provide rosbridge parameter presets.
- .env controls API credentials, model names, camera source, recognition transport, navigation timeouts, and voice settings.
Path variables in .env may be relative to the project root when launched through the provided scripts.
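One way such relative values could be resolved is sketched below. The helper name resolve_project_path and the example paths are hypothetical; the provided launch scripts may handle this differently.

```python
from pathlib import Path


def resolve_project_path(value, project_root):
    """Return an absolute path, resolving relative values against the project root."""
    path = Path(value).expanduser()
    if not path.is_absolute():
        path = Path(project_root) / path
    return path


root = Path("/opt/go2_visitor_assistant")  # hypothetical project root
gallery = resolve_project_path("data/gallery/gallery_arcface.npz", root)
absolute = resolve_project_path("/srv/models/follow.pt", root)  # left unchanged
```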
Enrollment expects exactly one image per person folder:
data/enrolled_images/
alice/
photo.jpg
bob/
photo.png
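Before building the gallery, the one-image-per-folder rule can be checked with a short script. This is a sketch only: the function name check_enrollment_dir and the set of accepted file extensions are assumptions, and the enrollment script may apply its own validation.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of accepted formats


def check_enrollment_dir(images_dir):
    """Return {person: problem} for every folder that does not hold exactly one image."""
    problems = {}
    for person_dir in sorted(Path(images_dir).iterdir()):
        if not person_dir.is_dir():
            continue  # ignore stray files at the top level
        images = [
            p for p in person_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        ]
        if len(images) != 1:
            problems[person_dir.name] = f"expected 1 image, found {len(images)}"
    return problems
```

An empty result means every person folder satisfies the layout shown above.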
Build the recognition gallery:
./scripts/run_enroll.sh \
--images-dir data/enrolled_images \
--gallery-out data/gallery/gallery_arcface.npz \
--json
This also writes gallery metadata next to the .npz file.
Run the recognition WebSocket service directly:
./scripts/run_recognition_service.sh
The service supports:
- identify_current_visitor
- follow_check
Text and voice launchers start this service automatically when RECOGNITION_WS_HOST is local and the configured port is not already listening.
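The auto-start condition can be approximated with a plain TCP probe. The sketch below is an assumption about how such a check might look, not the launchers' actual logic; the function names and the set of host names treated as local are hypothetical.

```python
import socket

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}  # assumed notion of "local"


def is_port_listening(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_autostart(host, port):
    """Start the service only for a local host whose port is not yet in use."""
    return host in LOCAL_HOSTS and not is_port_listening(host, port)
```

A remote host never triggers a start, because the membership check short-circuits before any connection attempt.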
Capture and identify a visitor without starting the assistant:
./scripts/run_identify.sh \
--gallery data/gallery/gallery_arcface.npz \
--burst-frames 5 \
--json
To save captured frames for debugging:
./scripts/run_identify.sh \
--gallery data/gallery/gallery_arcface.npz \
--save-burst-dir debug/bursts/run_001 \
--json
Set FACE_CAPTURE_SOURCE=realsense or FACE_CAPTURE_SOURCE=unitree_front in .env to choose the camera backend.
Start text mode:
./scripts/run_assistant_text.sh
Start voice mode:
./scripts/run_assistant_voice.sh
Voice mode uses:
- OpenAI Realtime transcription for microphone input
- The same local assistant and tool workflow as text mode
- OpenAI TTS for assistant responses
- Direct Go2 speaker playback through the Unitree controller backend
Start rosbridge for navigation:
./scripts/start_rosbridge_nav.sh
Start rosbridge for SLAM:
./scripts/start_rosbridge_slam.sh
Both scripts source the ROS 2 Foxy environment, force system Python, and launch rosbridge_websocket with the matching parameter file.
The agent is limited to these deterministic tools:
- identify_current_visitor
- get_visitor_profile
- resolve_host_or_destination
- navigate_to_destination
- return_to_arrival_pose
- get_session_context
- end_session
The model controls the conversation wording. Tool execution remains constrained by the session phase and local configuration.
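Phase gating of this kind can be illustrated with a lookup table that the dispatcher consults before running a tool. The tool names below are the ones listed above; the phase names, the phase-to-tool mapping, and the dispatch helper are hypothetical and only show the general idea.

```python
# Hypothetical phases and mapping; only the tool names come from the project.
ALLOWED_TOOLS = {
    "greeting": {"identify_current_visitor", "get_session_context"},
    "identified": {
        "get_visitor_profile",
        "resolve_host_or_destination",
        "get_session_context",
    },
    "destination_confirmed": {"navigate_to_destination", "get_session_context"},
    "arrived": {"return_to_arrival_pose", "end_session", "get_session_context"},
}


class ToolNotAllowed(Exception):
    """Raised when the model requests a tool outside the current phase."""


def dispatch(phase, tool_name, handlers, **kwargs):
    """Run a tool handler only if the current session phase permits it."""
    if tool_name not in ALLOWED_TOOLS.get(phase, set()):
        raise ToolNotAllowed(f"{tool_name} is not available in phase {phase!r}")
    return handlers[tool_name](**kwargs)
```

With a gate like this, a model that asks to navigate before a destination is confirmed gets a refusal rather than a moving robot.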
- Generate data/gallery/gallery_arcface.npz before starting the assistant.
- Keep enrolled face images and generated gallery files out of Git unless your deployment policy explicitly allows storing biometric data.
- Keep .env local; it may contain credentials, robot addresses, and machine-specific paths.
- Navigation assumes one active goal at a time.
- Voice mode requires a working microphone device and Go2 speaker backend imports.
- Follow verification requires the configured YOLO model file and camera access during navigation.