Go2 Visitor Assistant

Go2 Visitor Assistant is a reception workflow for a Unitree Go2 robot. It combines face recognition, RAI/LangChain-based dialogue orchestration, rosbridge navigation commands, and text or voice interaction.

The current runtime uses one assistant workflow for both text and voice. Voice mode performs OpenAI Realtime transcription, routes the transcript through the local assistant and tools, synthesizes the assistant response with OpenAI TTS, and plays it through the Go2 speaker backend.

Capabilities

  • Visitor identification from RealSense or Unitree front camera input
  • One-image-per-person face enrollment with safe augmentation
  • Local WebSocket recognition service for assistant tool calls
  • RAI/LangChain reception agent with phase-gated tools
  • Host and destination resolution from YAML configuration
  • Navigation goal publishing to /goal_pose through rosbridge
  • Nav2 action status monitoring through a configurable status topic
  • Optional follow verification during escort using YOLO person detection
  • Text terminal mode and direct Go2 speaker voice mode

Project Layout

.
├── config/                    # destinations, visitor registry, rosbridge presets
├── data/
│   ├── enrolled_images/        # one subfolder per enrolled visitor
│   └── gallery/                # generated gallery artifacts
├── scripts/                    # launchers for assistant, recognition, and rosbridge
├── src/go2_visitor_assistant/
│   ├── assistant/              # text and voice assistant runtimes
│   ├── domain/                 # shared dataclasses and session models
│   ├── infra/                  # rosbridge connector
│   ├── recognition/            # enrollment, identification, camera, and WS service
│   └── tools/                  # agent tool runtimes
├── requirements.txt            # assistant runtime dependencies
└── requirements-recognition.txt # recognition runtime dependencies

Runtime data, debug captures, .env, virtual environments, generated galleries, and enrolled face images should stay local and are ignored by Git.

Runtime Architecture

  1. The visitor starts a conversation through text input or microphone speech.
  2. The assistant agent decides when to call the recognition tool.
  3. Recognition runs locally or through the WebSocket recognition service.
  4. If the visitor is recognized, the assistant loads the visitor profile and confirms the host or destination.
  5. If the visitor is unknown, the assistant asks for the target host or destination.
  6. The destination resolver maps the host or location to a configured navigation target.
  7. The navigation runtime publishes a geometry_msgs/PoseStamped goal to /goal_pose.
  8. During escort, optional follow verification can warn the visitor if they stop following.
  9. After the visit flow completes, the assistant can return the robot to the configured arrival pose.
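
Step 7 can be illustrated with the rosbridge JSON protocol, where a `publish` operation carries the message body as a nested object. The following is a minimal sketch, not the project's navigation runtime: the helper names, the `map` frame, and the example coordinates are assumptions, and depending on the rosbridge setup an `advertise` operation may need to be sent before the first publish.

```python
import json
import math


def yaw_to_quaternion(yaw: float) -> dict:
    """Convert a planar heading in radians to a quaternion about the z axis."""
    half = yaw / 2.0
    return {"x": 0.0, "y": 0.0, "z": math.sin(half), "w": math.cos(half)}


def build_goal_pose_message(x: float, y: float, yaw: float, frame_id: str = "map") -> str:
    """Build a rosbridge 'publish' request carrying a geometry_msgs/PoseStamped goal.

    The frame_id default is an assumption; use the frame your map is published in.
    """
    request = {
        "op": "publish",
        "topic": "/goal_pose",
        "msg": {
            "header": {"stamp": {"sec": 0, "nanosec": 0}, "frame_id": frame_id},
            "pose": {
                "position": {"x": x, "y": y, "z": 0.0},
                "orientation": yaw_to_quaternion(yaw),
            },
        },
    }
    return json.dumps(request)


# Hypothetical goal: 1.5 m ahead, 0.5 m to the right, facing +y.
payload = build_goal_pose_message(1.5, -0.5, math.pi / 2)
```

The resulting string is what would be sent as a text frame over the rosbridge WebSocket connection.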

Requirements

  • Python 3.11 for the assistant runtime
  • System Python with RealSense and GPU-compatible recognition dependencies for the recognition runtime
  • ROS 2 Foxy and rosbridge_server on the robot
  • A reachable rosbridge WebSocket endpoint
  • OpenAI API key
  • RAI source checkout or installed RAI dependencies
  • Unitree Go2 speaker backend for voice mode
  • RealSense camera or Unitree front camera access for face capture

On Jetson or other aarch64 systems, install a compatible onnxruntime-gpu wheel manually if PyPI does not provide one for the target platform.

Setup

Create the assistant environment:

cd <project-root>
python3.11 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
cp .env.example .env

Install the recognition dependencies in the Python runtime that has camera and GPU access:

cd <project-root>
<recognition-python> -m pip install --user -U pip
<recognition-python> -m pip install --user -r requirements-recognition.txt

Set at least these values in .env:

OPENAI_API_KEY=<your-api-key>
ROSBRIDGE_HOST=<robot-or-rosbridge-host>
ROSBRIDGE_PORT=9090
RECOGNITION_SERVICE_PYTHON=<recognition-python>

If RAI is not installed in the assistant environment, point RAI_CORE_SRC at the local RAI source tree:

RAI_CORE_SRC=<path-to-rai-core-source>

For voice mode, configure the Go2 speaker backend and microphone:

GO2_IP=<robot-ip>
UNITREE_GO2_CONTROLLER_SRC=<path-to-go2-controller-src>
UNITREE_GO2_CONTROLLER_SITE_PACKAGES=<path-to-go2-controller-site-packages>
VOICE_INPUT_DEVICE=<microphone-name-or-index>

Configuration

Main configuration files:

  • config/nav_destinations.yaml defines named navigation poses, including the arrival pose.
  • config/visitor_registry.yaml maps enrolled visitor IDs to names, hosts, and default destinations.
  • config/rosbridge_nav.yaml and config/rosbridge_slam.yaml provide rosbridge parameter presets.
  • .env controls API credentials, model names, camera source, recognition transport, navigation timeouts, and voice settings.
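
To show how the two YAML files could work together, here is a rough sketch of host and destination resolution. The dictionaries stand in for parsed YAML, and their keys, the example names, and `resolve_target` are all assumptions; the real file schemas are not documented here and may differ.

```python
from typing import Optional

# Hypothetical stand-in for config/nav_destinations.yaml after parsing.
DESTINATIONS = {
    "arrival": {"x": 0.0, "y": 0.0, "yaw": 0.0},
    "meeting_room": {"x": 4.2, "y": 1.0, "yaw": 1.57},
}

# Hypothetical host-to-destination mapping derived from the visitor registry.
HOST_TO_DESTINATION = {"alice": "meeting_room"}


def resolve_target(query: str) -> Optional[dict]:
    """Resolve a host name or a location name to a named navigation pose."""
    key = query.strip().lower().replace(" ", "_")
    # A host resolves to its default destination; anything else is tried as a location.
    name = HOST_TO_DESTINATION.get(key, key)
    pose = DESTINATIONS.get(name)
    if pose is None:
        return None  # unknown target; the assistant would need to ask again
    return {"name": name, **pose}
```

Returning `None` for an unknown target keeps the decision about what to say next with the assistant rather than the resolver.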

Path variables in .env may be given relative to the project root when the assistant is launched through the provided scripts.

Enrollment

Enrollment expects one folder per person, each containing exactly one image:

data/enrolled_images/
  alice/
    photo.jpg
  bob/
    photo.png
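
The one-image rule can be checked before building the gallery. The sketch below is an illustration only: the accepted file extensions and the function name are assumptions, and the enrollment script may apply its own validation.

```python
from pathlib import Path

# Assumed set of accepted image extensions; adjust to match your enrollment data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_enrollment_layout(images_dir: Path) -> dict:
    """Map each person folder that breaks the one-image rule to its image count."""
    problems = {}
    for person_dir in sorted(p for p in images_dir.iterdir() if p.is_dir()):
        images = [
            f for f in person_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        ]
        if len(images) != 1:
            problems[person_dir.name] = len(images)
    return problems
```

An empty result means every person folder holds exactly one image.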

Build the recognition gallery:

./scripts/run_enroll.sh \
  --images-dir data/enrolled_images \
  --gallery-out data/gallery/gallery_arcface.npz \
  --json

This also writes gallery metadata next to the .npz file.

Recognition Service

Run the recognition WebSocket service directly:

./scripts/run_recognition_service.sh

The service supports:

  • identify_current_visitor
  • follow_check

Text and voice launchers start this service automatically when RECOGNITION_WS_HOST points to the local machine and nothing is already listening on the configured port.
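
The auto-start condition can be sketched as a TCP connection probe. This is one possible approach, not the launchers' actual logic: the function names and the set of host names treated as local are assumptions.

```python
import socket

# Assumed spellings of "local"; the launchers may recognize others.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_port_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_autostart(host: str, port: int) -> bool:
    """Start the service only for a local host with no listener on the port."""
    return host in LOCAL_HOSTS and not is_port_listening(host, port)
```

For a remote host the probe is skipped entirely, since the launcher could not start a service on another machine anyway.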

Direct Identification

Capture and identify a visitor without starting the assistant:

./scripts/run_identify.sh \
  --gallery data/gallery/gallery_arcface.npz \
  --burst-frames 5 \
  --json

To save captured frames for debugging:

./scripts/run_identify.sh \
  --gallery data/gallery/gallery_arcface.npz \
  --save-burst-dir debug/bursts/run_001 \
  --json

Set FACE_CAPTURE_SOURCE=realsense or FACE_CAPTURE_SOURCE=unitree_front in .env to choose the camera backend.
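
The gallery file name suggests ArcFace embeddings, which are commonly compared with cosine similarity. The sketch below shows that general idea over a burst of frames; it is an assumption about the approach, not the project's matching code, and the threshold, the averaging rule, and the toy three-dimensional vectors are illustrative only.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    burst: Sequence[Sequence[float]],
    gallery: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff; tune on real data
) -> Optional[str]:
    """Average similarity per enrolled ID across the burst; return the best ID at or above threshold."""
    if not burst:
        return None
    best_id, best_score = None, threshold
    for visitor_id, reference in gallery.items():
        score = sum(cosine_similarity(frame, reference) for frame in burst) / len(burst)
        if score >= best_score:
            best_id, best_score = visitor_id, score
    return best_id


# Toy vectors standing in for real face embeddings.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]], gallery))
```

Averaging over several frames makes a single blurred or off-angle capture less likely to decide the outcome.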

Running The Assistant

Start text mode:

./scripts/run_assistant_text.sh

Start voice mode:

./scripts/run_assistant_voice.sh

Voice mode uses:

  • OpenAI Realtime transcription for microphone input
  • The same local assistant and tool workflow as text mode
  • OpenAI TTS for assistant responses
  • Direct Go2 speaker playback through the Unitree controller backend
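
The four stages above form a linear pipeline for each spoken turn. The sketch below wires them together with injected callables so the flow can be followed without any OpenAI or Unitree dependencies; `run_voice_turn` and its parameter names are hypothetical and do not reflect the project's actual interfaces.

```python
from typing import Callable


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # speech-to-text stage
    respond: Callable[[str], str],        # shared assistant and tool workflow
    synthesize: Callable[[str], bytes],   # text-to-speech stage
    play: Callable[[bytes], None],        # speaker playback stage
) -> str:
    """Run one voice turn and return the assistant's text reply."""
    transcript = transcribe(audio).strip()
    if not transcript:
        return ""  # nothing intelligible was heard; skip the assistant turn
    reply = respond(transcript)
    play(synthesize(reply))
    return reply
```

Because `respond` is the same callable in text and voice mode, only the input and output stages differ between the two launchers.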

Rosbridge Helpers

Start rosbridge for navigation:

./scripts/start_rosbridge_nav.sh

Start rosbridge for SLAM:

./scripts/start_rosbridge_slam.sh

Both scripts source the ROS 2 Foxy environment, force system Python, and launch rosbridge_websocket with the matching parameter file.

Assistant Tool Surface

The agent is limited to these deterministic tools:

  • identify_current_visitor
  • get_visitor_profile
  • resolve_host_or_destination
  • navigate_to_destination
  • return_to_arrival_pose
  • get_session_context
  • end_session

The model controls the conversation wording. Tool execution remains constrained by the session phase and local configuration.
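
Phase gating can be pictured as an allow-list lookup performed before any tool runs. The sketch below is an illustration under stated assumptions: the phase names and the assignment of tools to phases are invented for the example, and only the tool names come from the list above.

```python
from typing import Any, Callable, Dict

# Hypothetical phases and allow-lists; the project's real phase model may differ.
ALLOWED_TOOLS = {
    "greeting": {"identify_current_visitor", "get_session_context"},
    "identified": {"get_visitor_profile", "resolve_host_or_destination", "get_session_context"},
    "destination_confirmed": {"navigate_to_destination", "get_session_context"},
    "escort_complete": {"return_to_arrival_pose", "end_session", "get_session_context"},
}


class ToolNotAllowed(Exception):
    """Raised when the model requests a tool outside the current phase."""


def call_tool(phase: str, tool_name: str, tools: Dict[str, Callable[..., Any]], **kwargs: Any) -> Any:
    """Run a tool only if the current session phase permits it."""
    if tool_name not in ALLOWED_TOOLS.get(phase, set()):
        raise ToolNotAllowed(f"{tool_name!r} is not available in phase {phase!r}")
    return tools[tool_name](**kwargs)
```

With a gate like this, a model that asks to navigate before a destination is confirmed gets an error back instead of moving the robot.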

Operational Notes

  • Generate data/gallery/gallery_arcface.npz before starting the assistant.
  • Keep enrolled face images and generated gallery files out of Git unless your deployment policy explicitly allows storing biometric data.
  • Keep .env local; it may contain credentials, robot addresses, and machine-specific paths.
  • Navigation assumes one active goal at a time.
  • Voice mode requires a working microphone device and an importable Go2 speaker backend.
  • Follow verification requires the configured YOLO model file and camera access during navigation.
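
Because navigation assumes a single active goal, it helps to know when the current goal has reached a terminal state before sending another. The sketch below interprets an action status message received through rosbridge. The status codes are those defined by action_msgs/msg/GoalStatus; the function names, the type string passed to rosbridge, and the assumption that the newest goal is listed last are not taken from the project.

```python
import json
from typing import Optional, Tuple

# Status codes defined by action_msgs/msg/GoalStatus.
STATUS_NAMES = {
    0: "UNKNOWN",
    1: "ACCEPTED",
    2: "EXECUTING",
    3: "CANCELING",
    4: "SUCCEEDED",
    5: "CANCELED",
    6: "ABORTED",
}
TERMINAL_CODES = {4, 5, 6}


def build_status_subscription(topic: str) -> str:
    """Build a rosbridge 'subscribe' request for the configured status topic."""
    return json.dumps(
        {"op": "subscribe", "topic": topic, "type": "action_msgs/msg/GoalStatusArray"}
    )


def latest_goal_state(status_array_msg: dict) -> Optional[Tuple[str, bool]]:
    """Return (status name, is_terminal) for the last listed goal, or None if empty.

    Assumes the most recent goal is the last entry in status_list.
    """
    entries = status_array_msg.get("status_list", [])
    if not entries:
        return None
    code = entries[-1]["status"]
    return STATUS_NAMES.get(code, "UNKNOWN"), code in TERMINAL_CODES
```

A caller could hold back the next goal until `is_terminal` is true for the current one.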
