A.I.R.A.

A voice-first personal assistant that feels like JARVIS — at home and on the go.

A.I.R.A. (AI Responsive Assistant) is a two-target voice assistant designed to live with you. Walk into your house and say "Hey AIRA, call mom" — the desktop hub picks up your voice and handles the call. Step outside and the same identity follows you onto a pocket-sized Raspberry Pi: hold a button, speak, get the same answers and actions. One brain, two bodies.

                         ┌────────────────────────────────────────┐
                         │  Backend (FastAPI · Python · async)    │
                         │  ┌────────────┐  ┌─────────────────┐   │
   ┌─────────────┐       │  │  Realtime  │  │ Agent Router    │   │
   │  Home       │ ◄───► │  │  Session   │◄─┤ + Approval Svc  │   │
   │  (wake-word)│  WS   │  │ (OpenAI)   │  └────────┬────────┘   │
   └─────────────┘       │  └────────────┘           │            │
                         │                           ▼            │
   ┌─────────────┐       │  ┌────────────┐  ┌─────────────────┐   │
   │  Pi (PTT)   │ ◄───► │  │  Memory    │  │ Tools           │   │
   │  Whisplay   │  WS   │  │  Service   │  │ (telephony,     │   │
   └─────────────┘       │  │ (PG+Redis) │  │  email, search, │   │
                         │  └────────────┘  │  calendar)      │   │
                         │                  └─────────────────┘   │
                         └────────────────────────────────────────┘

Why this exists

Most voice assistants are either toys (smart speakers that can't do much) or hostile (locked into one ecosystem, recording everything, surfacing ads). A.I.R.A. is built around three beliefs:

Voice should be the primary interface, not a feature. Sub-second latency, natural turn-taking, and interruption handling — anything slower kills the loop.
External actions need explicit consent. Placing a call or sending an email requires a verbal "yes" every time, within a 30-second window. Spending money is blocked outright in v1. No surprise actions.
It should follow you. The home device and the portable Pi share one identity, one memory, one set of contacts. Continue a conversation across rooms or across town.

It's a personal project — not a product, not a startup. The goal is to ship a JARVIS that feels like JARVIS: fast, deferential, deeply integrated with the APIs I actually use.

How it works

A turn, end-to-end

Activation — wake-word "hey aira" (home) or button press (Pi).
Voice capture streams to the backend over a WebSocket as 24 kHz PCM frames.
OpenAI Realtime API transcribes, reasons, and decides whether to call a tool.
Agent Router validates the tool call, classifies its safety tier, and routes it.
Approval Service intercepts Tier 2 actions (call, email) and asks for verbal confirmation.
Tool Executor invokes Telnyx / Gmail / Maps / Calendar with retry, rate limiting, and timeouts.
Memory Service persists the result, the context, and any relevant updates.
TTS response streams back to the device and plays through the speaker.
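The capture step above can be sketched as a framing helper that cuts raw audio into fixed-size chunks before they go over the WebSocket. Only the 24 kHz rate comes from this README; the 16-bit mono sample format and the 20 ms frame length are assumptions chosen for illustration, and `iter_frames` is a hypothetical name rather than a function from the device client.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 24_000   # from the README: 24 kHz PCM
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono samples
FRAME_MS = 20             # assumption: one WebSocket message per 20 ms


def frame_size_bytes(frame_ms: int = FRAME_MS) -> int:
    """Bytes in one frame of 16-bit mono audio at 24 kHz."""
    samples = SAMPLE_RATE_HZ * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE


def iter_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield fixed-size frames; a short final frame is padded with silence."""
    size = frame_size_bytes(frame_ms)
    for start in range(0, len(pcm), size):
        chunk = pcm[start:start + size]
        yield chunk.ljust(size, b"\x00")
```

Under these assumptions a 20 ms frame is 480 samples, or 960 bytes. Smaller frames reduce capture latency at the cost of more messages per second.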

Typical time-to-first-audio over home Wi-Fi: roughly 350-500 ms.

Safety tiers

Every action is classified up front; the tier determines whether it can run automatically.

| Tier | Name | Examples | Policy |
| --- | --- | --- | --- |
| 0 | Read | Look up a business, check the weather | Allow |
| 1 | Draft | Compose an email, generate a call script | Allow + log |
| 2 | Action | Place a call, send an email | Require verbal approval |
| 3 | High risk | Spend money, bulk operations | Blocked in v1 |
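One way to express the tier table in code is an enum plus a policy lookup. This is a minimal sketch, not the agent-router implementation: the tool names in `TOOL_TIERS` are hypothetical, and treating unknown tools as the most restrictive tier is an assumption.

```python
from enum import IntEnum


class Tier(IntEnum):
    READ = 0
    DRAFT = 1
    ACTION = 2
    HIGH_RISK = 3


# Policy per tier, mirroring the table above.
POLICY = {
    Tier.READ: "allow",
    Tier.DRAFT: "allow_and_log",
    Tier.ACTION: "require_verbal_approval",
    Tier.HIGH_RISK: "blocked",
}

# Hypothetical tool names, for illustration only.
TOOL_TIERS = {
    "search_business": Tier.READ,
    "draft_email": Tier.DRAFT,
    "place_call": Tier.ACTION,
    "send_email": Tier.ACTION,
    "make_payment": Tier.HIGH_RISK,
}


def policy_for(tool_name: str) -> str:
    """Look up the policy for a tool; unknown tools are treated as high risk."""
    tier = TOOL_TIERS.get(tool_name, Tier.HIGH_RISK)  # assumption: fail closed
    return POLICY[tier]
```

Failing closed means a tool that was registered without a tier cannot run automatically by accident.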

Approval prompts in ambient (wake-word) mode require a strict "yes" — no fuzzy matching — because the cost of a false positive in an always-on environment is higher than a missed intent.
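The strict-match rule could look roughly like the sketch below. It assumes "strict" means the whole utterance is the single word "yes" (case and surrounding punctuation ignored), and that timestamps are plain seconds. Both function names are hypothetical; only the 30-second window comes from this README.

```python
import string

APPROVAL_WINDOW_S = 30.0  # from the README: 30-second approval window


def is_strict_yes(transcript: str) -> bool:
    """True only when the whole utterance is the single word 'yes'."""
    cleaned = transcript.strip().strip(string.punctuation).strip().lower()
    return cleaned == "yes"


def approve(transcript: str, asked_at: float, heard_at: float) -> bool:
    """Approve only for a strict 'yes' heard inside the approval window."""
    elapsed = heard_at - asked_at
    if elapsed < 0 or elapsed > APPROVAL_WINDOW_S:
        return False
    return is_strict_yes(transcript)
```

With this rule, "yeah", "yes please", and "yes, call her" are all rejected, which trades a few missed approvals for fewer accidental ones.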

The two devices

Home — ambient wake-word

A desktop or mini-PC (currently targeting an old laptop or an NVIDIA DGX Spark) running the device client with a far-field USB mic array and a powered speaker. It wakes on "hey aira" and responds conversationally. Because "hey aira" isn't a stock keyword, the wake-word model has to be custom-trained via openWakeWord's TTS-augmented pipeline; that model is not trained yet.

Pocket Pi — press-to-talk

A Raspberry Pi 5 with the PiSugar Whisplay HAT (integrated LCD, mic, speaker, button, battery). Tethered to a phone hotspot when out of the house. Press to talk, release to send — short presses (<500ms) are rejected as accidental, matching the proven gesture from PiSugar/whisplay-ai-chatbot.
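The short-press rejection can be sketched independently of any GPIO library. The `PressToTalk` class below is hypothetical (the README names a `GpioButtonHandler`, whose interface is not shown here), and timestamps are passed in as integer milliseconds so the logic is easy to test.

```python
from typing import Optional

MIN_PRESS_MS = 500  # from the README: presses shorter than 500 ms are rejected


class PressToTalk:
    """Tracks one button and reports whether a release should send audio."""

    def __init__(self, min_press_ms: int = MIN_PRESS_MS) -> None:
        self.min_press_ms = min_press_ms
        self._pressed_at_ms: Optional[int] = None

    def on_press(self, now_ms: int) -> None:
        """Record when the button went down."""
        self._pressed_at_ms = now_ms

    def on_release(self, now_ms: int) -> bool:
        """Return True to send the captured audio, False to discard it."""
        if self._pressed_at_ms is None:
            return False  # release without a matching press
        held_ms = now_ms - self._pressed_at_ms
        self._pressed_at_ms = None
        return held_ms >= self.min_press_ms
```

On real hardware the `now_ms` values would come from a monotonic clock read in the GPIO edge callbacks.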

The activation layer is a Protocol-based abstraction (activation.py), so any new hardware (mobile app, smart watch, kiosk) is a single class away.
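A Protocol-based activation layer might be shaped like the sketch below. The actual interface lives in activation.py and is not reproduced here, so `ActivationSource`, its `is_triggered` method, and the event dictionaries are all hypothetical stand-ins that only illustrate the structural-typing idea.

```python
from typing import Protocol


class ActivationSource(Protocol):
    """Anything that can decide whether a hardware event starts a turn."""

    def is_triggered(self, event: dict) -> bool: ...


class WakeWordActivation:
    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # assumption: detector emits a 0..1 score

    def is_triggered(self, event: dict) -> bool:
        return (
            event.get("type") == "wake_score"
            and event.get("score", 0.0) >= self.threshold
        )


class ButtonActivation:
    def is_triggered(self, event: dict) -> bool:
        return event.get("type") == "button_down"


def any_triggered(sources: list[ActivationSource], event: dict) -> bool:
    """True if at least one configured source accepts the event."""
    return any(source.is_triggered(event) for source in sources)
```

Neither class inherits from `ActivationSource`; a type checker accepts them because their method signatures match, which is what keeps new hardware to a single class.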

Tech stack and decisions

| Layer | Choice | Why |
| --- | --- | --- |
| Voice model | OpenAI Realtime API | Best speech-to-speech latency + native function calling. No open-weight model matches it on the full bundle (latency + reasoning + tool calls + interruption handling) as of early 2026. |
| Wake word | openWakeWord | Open source, runs on CPU, supports custom training. |
| Telephony | Telnyx | ~60% cheaper than Twilio for the same call quality. |
| Email | Gmail API | User-owned accounts, OAuth, no SMTP relay. |
| Backend | FastAPI + asyncpg + Redis | Async all the way down. Predictable latency under load. |
| Database | PostgreSQL 16 (Alembic migrations) | Honest persistence: real migration history, not stub schemas. |
| Cache | Redis 7 | Rate limit counters, session locks, token cache. |
| Workspace | uv + pyproject monorepo | Cross-package editable installs without setup.py rituals. |
| Logs | structlog (JSON) | Searchable, parseable, no ad-hoc print debugging. |
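The "rate limit counters" role of Redis in the table can be illustrated with a fixed-window counter. The sketch below keeps counts in a dictionary so it runs without a server; with Redis the same idea is usually an `INCR` on a per-window key followed by an `EXPIRE`. The class name, key naming, and limits are assumptions, not the project's actual scheme.

```python
class FixedWindowLimiter:
    """In-memory stand-in for a Redis INCR + EXPIRE rate limit counter."""

    def __init__(self, limit: int, window_s: int) -> None:
        self.limit = limit
        self.window_s = window_s
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, key: str, now: float) -> bool:
        """Count one request for `key`; False once the window is full."""
        window = int(now // self.window_s)
        bucket = (key, window)
        count = self._counts.get(bucket, 0) + 1  # Redis equivalent: INCR
        self._counts[bucket] = count
        # Old buckets are never cleaned up here; Redis would EXPIRE the key.
        return count <= self.limit
```

Fixed windows allow a short burst across a window boundary, which is usually acceptable for a single-user assistant.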

The cloud-vs-local question

I evaluated running a local voice-to-voice model on the DGX Spark — Moshi, Llama-Omni, NVIDIA Nemotron-Audio, etc. The honest finding: no open model matches GPT-4o Realtime on the full bundle of latency + reasoning + tool calls + interruption handling as of early 2026. The closest on latency (Moshi) lags badly on reasoning and lacks function calling, which the entire router/approval flow depends on.

The realtime-session package is built as "OpenAI + fallback" so the voice backend can be swapped without architectural rework. I plan to revisit local models in 6-12 months, once the open-weight space has caught up.
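The "OpenAI + fallback" shape can be sketched as a wrapper that tries a primary backend and falls back on failure or timeout. This is an illustration of the pattern only: the real session handles streaming audio rather than text, and `VoiceBackend`, `FallbackSession`, and the two toy backends are hypothetical names.

```python
import asyncio
from typing import Protocol


class VoiceBackend(Protocol):
    async def respond(self, text: str) -> str: ...


class FailingBackend:
    """Toy primary that always fails, standing in for an outage."""

    async def respond(self, text: str) -> str:
        raise ConnectionError("primary unavailable")


class EchoBackend:
    """Toy fallback that just echoes the request."""

    async def respond(self, text: str) -> str:
        return f"fallback: {text}"


class FallbackSession:
    def __init__(
        self, primary: VoiceBackend, fallback: VoiceBackend, timeout_s: float = 2.0
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.timeout_s = timeout_s

    async def respond(self, text: str) -> str:
        """Use the primary backend; switch to the fallback on error or timeout."""
        try:
            return await asyncio.wait_for(
                self.primary.respond(text), timeout=self.timeout_s
            )
        except (ConnectionError, asyncio.TimeoutError):
            return await self.fallback.respond(text)
```

Because callers only see `respond`, swapping the primary for a local model later would not change the code around the session.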

Project structure

apps/
├── backend-api/          # FastAPI service, WebSocket session, auth, metrics
└── device-client/        # Edge client: audio I/O, activation, status display
    └── hardware/         # Pi GPIO button (gpiod), Whisplay LCD/LED (planned)

packages/
├── realtime-session/     # OpenAI Realtime adapter + state machine + cost guard
├── agent-router/         # Intent → tool → workflow with safety-tier classification
├── approval-service/     # Verbal approval flow, allow/blocklist, 30s timeout
├── memory-service/       # Users, contacts, preferences, conversation history
├── tools-core/           # Tool registry + executor (retry, rate limit, quotas)
├── tools-telephony/      # Telnyx adapter
├── tools-email/          # Gmail adapter
├── tools-search/         # Google Maps adapter
├── tools-calendar/       # Google Calendar adapter
└── shared/               # Types, audit logger, common utilities

infra/
├── docker/               # docker-compose for Postgres + Redis (dev)
└── terraform/            # Cloud provisioning (planned)
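The tools-core entry above mentions an executor with retry and timeouts. A minimal sketch of that combination, assuming async tool calls and exponential backoff, is shown below. `call_with_retry` and its parameters are hypothetical and do not reflect the package's real API.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_retry(
    fn: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    timeout_s: float = 5.0,
    base_delay_s: float = 0.1,
) -> T:
    """Run fn with a per-attempt timeout, retrying transient failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(fn(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: base, 2x base, 4x base, ...
                await asyncio.sleep(base_delay_s * 2 ** attempt)
    assert last_error is not None
    raise last_error
```

Only timeouts and connection errors are retried here. Retrying a Tier 2 action such as placing a call would also need an idempotency check, which this sketch leaves out.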

Quickstart

Prerequisites

Python 3.11+
uv package manager
Docker (for local Postgres + Redis), or your own Postgres 15+ / Redis 7+
An OpenAI API key with Realtime API access

Setup

git clone https://github.com/Alex0420W/A.I.R.A.git
cd A.I.R.A

# Bring up Postgres + Redis
docker compose -f infra/docker/docker-compose.yml up -d

# Install all packages in editable mode
uv sync --all-extras

# Configure secrets
cp .env.example .env
# Edit .env: at minimum set OPENAI_API_KEY

# Apply database migrations
uv run alembic upgrade head

# Start the backend
uv run python run-api.py

The API comes up at http://localhost:8000. Hit /health to confirm Postgres, Redis, and OpenAI credentials are wired correctly.
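The health check can also be scripted, for example before starting the device client. The sketch below uses only the standard library and only looks at the HTTP status code, because the shape of the /health response body is not documented here; `backend_is_healthy` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def backend_is_healthy(
    base_url: str = "http://localhost:8000", timeout_s: float = 3.0
) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False
```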

Run the device client

# In another terminal
uv run python -m aira_device_client

Default activation is wake-word. Override per device with environment variables — useful when you want the Pi to default to button mode and the home machine to default to wake-word:

# Pi
export AIRA_DEFAULT_ACTIVATION_MODE=button
export AIRA_DEVICE_ID=pi-pocket-01

# Home
export AIRA_DEFAULT_ACTIVATION_MODE=wake_word
export AIRA_WAKE_WORD=hey_aira
export AIRA_DEVICE_ID=home-hub-01
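On the client side, these variables could be read into a small config object. The variable names come from the exports above; the `DeviceConfig` dataclass, the fallback values, and the validation are assumptions added for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_MODES = {"wake_word", "button"}


@dataclass(frozen=True)
class DeviceConfig:
    device_id: str
    activation_mode: str
    wake_word: Optional[str]


def load_device_config(env: Mapping[str, str] = os.environ) -> DeviceConfig:
    """Build a DeviceConfig from AIRA_* variables, defaulting to wake-word mode."""
    mode = env.get("AIRA_DEFAULT_ACTIVATION_MODE", "wake_word")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown activation mode: {mode!r}")
    # Assumption: the wake word only matters in wake-word mode.
    wake_word = env.get("AIRA_WAKE_WORD", "hey_aira") if mode == "wake_word" else None
    return DeviceConfig(
        device_id=env.get("AIRA_DEVICE_ID", "unknown-device"),  # assumed fallback
        activation_mode=mode,
        wake_word=wake_word,
    )
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.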

Pi-only setup

# On the Raspberry Pi only — adds gpiod for the Whisplay HAT button
uv sync --extra pi

Status

This is a personal-scale project under active development. The honest current state:

| Area | State |
| --- | --- |
| Backend API | Wired end-to-end. Health, metrics, WebSocket session, auth middleware, encryption. |
| Realtime session | OpenAI Realtime adapter, state machine, cost guard, audio buffering. |
| Agent router | Tool registry, safety-tier classification, parameter validation. |
| Approval service | Verbal approval flow, 30s timeout, allow/blocklist. |
| Memory service | Postgres schema, contacts, preferences, conversation history (Alembic migrations). |
| Tools | Telephony, email, search, calendar: all implemented against real APIs (not yet live-tested end-to-end). |
| Device client | Audio I/O, WebSocket reconnection, status indicators, press-to-talk + wake-word activation. |
| Pi GPIO | `GpioButtonHandler` (gpiod, BCM 17, <500ms reject). Pending hardware to test. |
| Whisplay LCD | Not yet ported. `StatusManager` is abstracted to plug in. |
| Wake word | `openWakeWord` integrated; "hey aira" custom model not yet trained. |
| Tests | 56 passing. Structural coverage; integration tests are the next priority. |

The code is ~95% scaffolded; testing is ~30% there. The shipping path is: train wake word → live-test the OpenAI path on home hardware → wire and verify backend audio_commit / audio_cancel routing → end-to-end voice test → port Whisplay LCD when Pi hardware arrives.

Roadmap

v1 — Magic loop

Custom-trained hey_aira wake-word model
Live end-to-end test: home wifi, real OpenAI Realtime, real Telnyx call placed by voice
Pi build with Whisplay HAT (button + LCD + RGB feedback)
Backend audio_commit / audio_cancel routing
Integration tests covering the full voice → tool → response loop

v2 — Personalization

Contact disambiguation that actually learns ("call mom" → which mom?)
Multi-user voice ID
Per-device preferences (Pi defaults vs. home defaults)
Conversation memory pruning that respects what's worth remembering

v3 — Connectors

Calendar event creation from voice
Note-taking and reminder workflows
Voice-driven home automation (Home Assistant bridge)

Development

# Run tests
uv run pytest

# Lint
uv run ruff check .

# Format
uv run ruff format .

# Type check
uv run mypy .

# Pre-commit hooks
uv run pre-commit install

Acknowledgments

OpenAI for the Realtime API
Telnyx for telephony
openWakeWord for open-source wake-word detection
PiSugar/whisplay-ai-chatbot — the reference design the Pi build is modeled on

License

MIT — see LICENSE.
