Go2 Visitor Assistant

Go2 Visitor Assistant is a reception workflow for a Unitree Go2 robot. It combines face recognition, RAI/LangChain-based dialogue orchestration, rosbridge navigation commands, and text or voice interaction.

The current runtime uses one assistant workflow for both text and voice. Voice mode performs OpenAI Realtime transcription, routes the transcript through the local assistant and tools, synthesizes the assistant response with OpenAI TTS, and plays it through the Go2 speaker backend.

Capabilities

Visitor identification from RealSense or Unitree front camera input
One-image-per-person face enrollment with safe augmentation
Local WebSocket recognition service for assistant tool calls
RAI/LangChain reception agent with phase-gated tools
Host and destination resolution from YAML configuration
Navigation goal publishing to /goal_pose through rosbridge
Nav2 action status monitoring through a configurable status topic
Optional follow verification during escort using YOLO person detection
Text terminal mode and direct Go2 speaker voice mode

Project Layout

.
├── config/                    # destinations, visitor registry, rosbridge presets
├── data/
│   ├── enrolled_images/        # one subfolder per enrolled visitor
│   └── gallery/                # generated gallery artifacts
├── scripts/                    # launchers for assistant, recognition, and rosbridge
├── src/go2_visitor_assistant/
│   ├── assistant/              # text and voice assistant runtimes
│   ├── domain/                 # shared dataclasses and session models
│   ├── infra/                  # rosbridge connector
│   ├── recognition/            # enrollment, identification, camera, and WS service
│   └── tools/                  # agent tool runtimes
├── requirements.txt            # assistant runtime dependencies
└── requirements-recognition.txt # recognition runtime dependencies

Runtime data, debug captures, .env, virtual environments, generated galleries, and enrolled face images should stay local and are ignored by Git.

Runtime Architecture

The visitor starts a conversation through text input or microphone speech.
The assistant agent decides when to call the recognition tool.
Recognition runs locally or through the WebSocket recognition service.
If the visitor is recognized, the assistant loads the visitor profile and confirms the host or destination.
If the visitor is unknown, the assistant asks for the target host or destination.
The destination resolver maps the host or location to a configured navigation target.
The navigation runtime publishes a geometry_msgs/PoseStamped goal to /goal_pose.
During escort, optional follow verification can warn the visitor if they stop following.
After the visit flow completes, the assistant can return the robot to the configured arrival pose.
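The navigation step publishes a geometry_msgs/PoseStamped goal to /goal_pose through rosbridge. The sketch below only builds the JSON frames of the standard rosbridge protocol (an "advertise" op followed by a "publish" op); it does not send them. It is not the project's navigation runtime. Several details are assumptions: the function names, the "map" frame, and converting a planar yaw into a quaternion.

```python
import json
import math
import time


def yaw_to_quaternion(yaw: float) -> dict:
    """Planar rotation about Z expressed as a unit quaternion."""
    return {"x": 0.0, "y": 0.0, "z": math.sin(yaw / 2.0), "w": math.cos(yaw / 2.0)}


def build_goal_messages(x: float, y: float, yaw: float,
                        frame_id: str = "map", topic: str = "/goal_pose") -> list:
    """Return rosbridge 'advertise' and 'publish' frames for a PoseStamped goal.

    Hypothetical helper; frame_id="map" is an assumption about the robot setup.
    """
    now = time.time()
    sec = int(now)
    nanosec = int((now - sec) * 1e9)  # ROS 2 builtin_interfaces/Time fields
    advertise = {"op": "advertise", "topic": topic, "type": "geometry_msgs/PoseStamped"}
    publish = {
        "op": "publish",
        "topic": topic,
        "msg": {
            "header": {"stamp": {"sec": sec, "nanosec": nanosec}, "frame_id": frame_id},
            "pose": {
                "position": {"x": float(x), "y": float(y), "z": 0.0},
                "orientation": yaw_to_quaternion(yaw),
            },
        },
    }
    return [json.dumps(advertise), json.dumps(publish)]
```

Both strings would then be sent in order over the WebSocket connection to ROSBRIDGE_HOST:ROSBRIDGE_PORT, using whatever WebSocket client the runtime already uses.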

Requirements

Python 3.11 for the assistant runtime
A system Python installation with the RealSense and GPU-compatible recognition dependencies, for the recognition runtime
ROS 2 Foxy and rosbridge_server on the robot
A reachable rosbridge WebSocket endpoint
OpenAI API key
RAI source checkout or installed RAI dependencies
Unitree Go2 speaker backend for voice mode
RealSense camera or Unitree front camera access for face capture

On Jetson or other aarch64 systems, install a compatible onnxruntime-gpu wheel manually if PyPI does not provide one for the target platform.

Setup

Create the assistant environment:

cd <project-root>
python3.11 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
cp .env.example .env

Install the recognition dependencies in the Python runtime that has camera and GPU access:

cd <project-root>
<recognition-python> -m pip install --user -U pip
<recognition-python> -m pip install --user -r requirements-recognition.txt

Set at least these values in .env:

OPENAI_API_KEY=<your-api-key>
ROSBRIDGE_HOST=<robot-or-rosbridge-host>
ROSBRIDGE_PORT=9090
RECOGNITION_SERVICE_PYTHON=<recognition-python>

If RAI is not installed in the assistant environment, point RAI_CORE_SRC at the local RAI source tree:

RAI_CORE_SRC=<path-to-rai-core-source>

For voice mode, configure the Go2 speaker backend and microphone:

GO2_IP=<robot-ip>
UNITREE_GO2_CONTROLLER_SRC=<path-to-go2-controller-src>
UNITREE_GO2_CONTROLLER_SITE_PACKAGES=<path-to-go2-controller-site-packages>
VOICE_INPUT_DEVICE=<microphone-name-or-index>
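Before launching, it can help to confirm that .env defines the required keys. The sketch below is a hypothetical pre-flight check and is not part of the provided scripts. Its parser is deliberately simplified: it ignores blank lines and `#` comments and strips surrounding quotes, but it does not handle `export` prefixes, inline comments, or variable expansion. A library such as python-dotenv covers those cases. The key lists come from the settings above.

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY", "ROSBRIDGE_HOST", "ROSBRIDGE_PORT", "RECOGNITION_SERVICE_PYTHON")
VOICE_KEYS = ("GO2_IP", "UNITREE_GO2_CONTROLLER_SRC",
              "UNITREE_GO2_CONTROLLER_SITE_PACKAGES", "VOICE_INPUT_DEVICE")


def parse_env(text: str) -> dict:
    """Simplified KEY=VALUE parser (hypothetical; not the project's loader)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, voice: bool = False) -> list:
    """List required keys that are absent or empty; voice mode adds speaker/mic keys."""
    keys = REQUIRED_KEYS + (VOICE_KEYS if voice else ())
    return [key for key in keys if not values.get(key)]


def check_env_file(path: str = ".env", voice: bool = False) -> list:
    return missing_keys(parse_env(Path(path).read_text()), voice=voice)
```

For example, `check_env_file(".env", voice=True)` returns an empty list when every text-mode and voice-mode key is set.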

Configuration

Main configuration files:

config/nav_destinations.yaml defines named navigation poses, including the arrival pose.
config/visitor_registry.yaml maps enrolled visitor IDs to names, hosts, and default destinations.
config/rosbridge_nav.yaml and config/rosbridge_slam.yaml provide rosbridge parameter presets.
.env controls API credentials, model names, camera source, recognition transport, navigation timeouts, and voice settings.

When the assistant is launched through the provided scripts, path variables in .env may be given relative to the project root.
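The destination resolver maps a host name or location to a configured navigation target. This README does not document the schemas of config/nav_destinations.yaml and config/visitor_registry.yaml. The sketch below therefore takes their contents as already-loaded dictionaries with a hypothetical shape: destination keys map to x/y/yaw poses, and registry entries carry "host" and "default_destination" fields. The name `resolve_target` is illustrative and is not the project's resolve_host_or_destination tool.

```python
from typing import Optional, Tuple


def _normalize(text: str) -> str:
    """Lower-case and join words with underscores, e.g. 'Meeting Room A' -> 'meeting_room_a'."""
    return "_".join(text.lower().split())


def resolve_target(query: str, destinations: dict, registry: dict) -> Optional[Tuple[str, dict]]:
    """Resolve a spoken host or place name to (destination_key, pose).

    Assumed schema (hypothetical):
      destinations = {"meeting_room_a": {"x": ..., "y": ..., "yaw": ...}, ...}
      registry     = {"alice": {"host": "Dr. Kowalski", "default_destination": "meeting_room_a"}}
    """
    key = _normalize(query)
    # 1) Direct match on a configured destination name.
    if key in destinations:
        return key, destinations[key]
    # 2) Otherwise treat the query as a host name and use that host's default destination.
    for entry in registry.values():
        host = entry.get("host")
        if host and _normalize(host) == key:
            dest = entry.get("default_destination")
            if dest in destinations:
                return dest, destinations[dest]
    return None
```

A `None` result corresponds to the case where the assistant must ask the visitor to rephrase or pick a known destination.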

Enrollment

Enrollment expects exactly one image per person folder:

data/enrolled_images/
  alice/
    photo.jpg
  bob/
    photo.png
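Because enrollment expects exactly one image per person folder, a quick layout check can catch mistakes before running the enrollment script. The following is a hypothetical helper, not part of scripts/. The set of accepted image extensions is an assumption.

```python
from pathlib import Path

# Assumed image extensions; adjust to what the enrollment code actually accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_enrollment_layout(images_dir) -> dict:
    """Return {person_folder: image_count} for every folder that does not hold exactly one image."""
    root = Path(images_dir)
    problems = {}
    for person_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        images = [f for f in person_dir.iterdir()
                  if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES]
        if len(images) != 1:
            problems[person_dir.name] = len(images)
    return problems
```

An empty result means every person folder under data/enrolled_images/ is ready for gallery building.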

Build the recognition gallery:

./scripts/run_enroll.sh \
  --images-dir data/enrolled_images \
  --gallery-out data/gallery/gallery_arcface.npz \
  --json

This also writes gallery metadata next to the .npz file.

Recognition Service

Run the recognition WebSocket service directly:

./scripts/run_recognition_service.sh

The service supports:

identify_current_visitor
follow_check

The text and voice launchers start this service automatically when RECOGNITION_WS_HOST points to the local machine and nothing is already listening on the configured port.
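The auto-start rule above combines two checks: the host must be local, and the port must have no listener. The sketch below expresses that rule with a plain TCP connect probe. The set of "local" host strings and the 0.5 s timeout are assumptions, and the launchers may implement the check differently.

```python
import socket

# Assumed spellings of "local"; 0.0.0.0 is probed via the loopback address.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def is_port_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_autostart(host: str, port: int) -> bool:
    """Start a local recognition service only if the host is local and the port is free."""
    if host not in LOCAL_HOSTS:
        return False
    probe_host = "127.0.0.1" if host == "0.0.0.0" else host
    return not is_port_listening(probe_host, port)
```

Probing with a real connection, rather than trying to bind the port, avoids briefly claiming the port that the service itself is about to use.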

Direct Identification

Capture and identify a visitor without starting the assistant:

./scripts/run_identify.sh \
  --gallery data/gallery/gallery_arcface.npz \
  --burst-frames 5 \
  --json

To save captured frames for debugging:

./scripts/run_identify.sh \
  --gallery data/gallery/gallery_arcface.npz \
  --save-burst-dir debug/bursts/run_001 \
  --json

Set FACE_CAPTURE_SOURCE=realsense or FACE_CAPTURE_SOURCE=unitree_front in .env to choose the camera backend.
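The gallery file name (gallery_arcface.npz) suggests ArcFace-style face embeddings, which are commonly compared by cosine similarity. The sketch below shows one plausible way to combine a burst of frames: average each enrolled identity's similarity across all frames, then accept the best identity only above a threshold. This is an assumption about the approach, not the project's identification code. The 0.4 threshold, the function names, and the plain-list embedding format are illustrative, and the layout of the .npz file is not described here.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify_burst(frame_embeddings, gallery: dict, threshold: float = 0.4):
    """Return (name or None, mean score) for a burst of face embeddings.

    gallery: {visitor_id: reference_embedding}; threshold is an assumed value.
    """
    if not frame_embeddings or not gallery:
        return None, 0.0
    mean_scores = {
        name: sum(cosine(e, ref) for e in frame_embeddings) / len(frame_embeddings)
        for name, ref in gallery.items()
    }
    best = max(mean_scores, key=mean_scores.get)
    score = mean_scores[best]
    return (best if score >= threshold else None), score
```

Averaging across the burst (for example, --burst-frames 5) smooths out single frames with motion blur or a turned head, at the cost of a slightly longer capture.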

Running The Assistant

Start text mode:

./scripts/run_assistant_text.sh

Start voice mode:

./scripts/run_assistant_voice.sh

Voice mode uses:

OpenAI Realtime transcription for microphone input
The same local assistant and tool workflow as text mode
OpenAI TTS for assistant responses
Direct Go2 speaker playback through the Unitree controller backend

Rosbridge Helpers

Start rosbridge for navigation:

./scripts/start_rosbridge_nav.sh

Start rosbridge for SLAM:

./scripts/start_rosbridge_slam.sh

Both scripts source the ROS 2 Foxy environment, force system Python, and launch rosbridge_websocket with the matching parameter file.

Assistant Tool Surface

The agent is limited to these deterministic tools:

identify_current_visitor
get_visitor_profile
resolve_host_or_destination
navigate_to_destination
return_to_arrival_pose
get_session_context
end_session

The model controls the conversation wording. Tool execution remains constrained by the session phase and local configuration.
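Phase gating can be pictured as an allow-list keyed by session phase: the model may request any tool, but the runtime rejects calls the current phase does not permit. The phases and their allow-lists below are hypothetical, and only the seven tool names come from the list above.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical session phases; the project's real phase model may differ."""
    IDENTIFYING = "identifying"
    ROUTING = "routing"
    ESCORTING = "escorting"


# Hypothetical allow-list: which of the seven tools each phase may call.
ALLOWED_TOOLS = {
    Phase.IDENTIFYING: {"identify_current_visitor", "get_session_context", "end_session"},
    Phase.ROUTING: {
        "get_visitor_profile",
        "resolve_host_or_destination",
        "navigate_to_destination",
        "get_session_context",
        "end_session",
    },
    Phase.ESCORTING: {"return_to_arrival_pose", "get_session_context", "end_session"},
}


class ToolNotAllowed(Exception):
    """Raised when the model requests a tool outside the current phase."""


def check_tool(phase: Phase, tool_name: str) -> None:
    """Reject a tool call unless the current phase explicitly allows it."""
    if tool_name not in ALLOWED_TOOLS[phase]:
        raise ToolNotAllowed(f"{tool_name!r} is not allowed in phase {phase.value!r}")
```

With this design, a model that tries to navigate before the visitor has been identified gets an error it can recover from, and the robot does not move.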

Operational Notes

Generate data/gallery/gallery_arcface.npz before starting the assistant.
Keep enrolled face images and generated gallery files out of Git unless your deployment policy explicitly allows storing biometric data.
Keep .env local; it may contain credentials, robot addresses, and machine-specific paths.
Navigation assumes one active goal at a time.
Voice mode requires a working microphone device and an importable Go2 speaker backend (see UNITREE_GO2_CONTROLLER_SRC and UNITREE_GO2_CONTROLLER_SITE_PACKAGES).
Follow verification requires the configured YOLO model file and camera access during navigation.
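Because navigation assumes one active goal at a time, the runtime needs to know whether the previous goal has finished. The sketch below interprets an action_msgs/GoalStatusArray message, as rosbridge would deliver it in JSON, from a status topic such as Nav2's navigate_to_pose action status. It assumes the configured status topic carries that message type. It uses the standard action_msgs/GoalStatus codes and picks the newest goal by its goal_info stamp. The function names are hypothetical.

```python
# Standard action_msgs/msg/GoalStatus codes.
ACTIVE = {1: "accepted", 2: "executing", 3: "canceling"}
TERMINAL = {4: "succeeded", 5: "canceled", 6: "aborted"}


def latest_goal_state(status_msg: dict) -> str:
    """Return the state of the most recently stamped goal, or 'idle' if none."""
    entries = status_msg.get("status_list", [])
    if not entries:
        return "idle"

    def stamp(entry):
        s = entry.get("goal_info", {}).get("stamp", {})
        return (s.get("sec", 0), s.get("nanosec", 0))

    code = max(entries, key=stamp).get("status", 0)
    return ACTIVE.get(code) or TERMINAL.get(code) or "unknown"


def can_send_new_goal(status_msg: dict) -> bool:
    """Enforce one active goal at a time: block while the newest goal is still active."""
    return latest_goal_state(status_msg) not in ACTIVE.values()
```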
