⚠️ Early stage / work in progress. Naka is a personal project under active development. The architecture below is in place and working, but the surface is small, things move fast, and many more commands are on the way. Treat this as a foundation to build on, not a finished product.
Naka is a personal voice assistant designed to run on a Raspberry Pi. It pairs on-device wake-word detection with Gemini Live (cloud) for speech-to-text, reasoning, and text-to-speech inside a single WebSocket session so the Pi stays near-idle and the heavy lifting happens server-side.
Wake word (on-device, openWakeWord ONNX)
↓
Gemini Live WebSocket session
├── Audio IN → mic stream, PCM 16-bit 16 kHz
├── STT + LLM → Gemini Flash Live
├── TTS → Gemini native audio output, PCM 16-bit 24 kHz
└── Function calling → CommandRegistry → BaseCommand.execute()
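The two PCM formats in the diagram set how many bytes flow each way over the session. Below is a minimal standard-library sketch of that arithmetic and of packing float samples into 16-bit PCM. It assumes mono audio and little-endian byte order. The 100 ms chunk size is only an example, not a value taken from Naka's code.

```python
import array
import sys

IN_RATE = 16_000   # mic -> Gemini Live (PCM 16-bit, 16 kHz)
OUT_RATE = 24_000  # Gemini Live -> speaker (PCM 16-bit, 24 kHz)
SAMPLE_WIDTH = 2   # 16-bit = 2 bytes per sample; mono is assumed


def bytes_per_second(rate: int, width: int = SAMPLE_WIDTH, channels: int = 1) -> int:
    """Raw PCM throughput for a given sample rate."""
    return rate * width * channels


def chunk_bytes(rate: int, ms: int) -> int:
    """Size in bytes of a mono chunk lasting `ms` milliseconds."""
    return bytes_per_second(rate) * ms // 1000


def float_to_pcm16(samples: list[float]) -> bytes:
    """Clip floats to [-1.0, 1.0] and pack them as 16-bit little-endian PCM."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = array.array("h", (int(s * 32767) for s in clipped))
    if sys.byteorder == "big":
        pcm.byteswap()  # assumption: the wire format is little-endian
    return pcm.tobytes()


if __name__ == "__main__":
    print(bytes_per_second(IN_RATE))   # 32000
    print(bytes_per_second(OUT_RATE))  # 48000
    print(chunk_bytes(IN_RATE, 100))   # 3200 bytes per 100 ms of mic audio
```

At these rates the mic uplink is 32 kB/s and the spoken reply is 48 kB/s. Streaming raw PCM at that volume is a light load for the Pi.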
Design choices:
- Everything cloud-side (STT + LLM + TTS) to keep the Pi's load minimal.
- The only on-device model is the wake word (openWakeWord, ONNX).
- Audio playback is decoupled from the receive loop via an `asyncio.Queue` drained with a sentinel, so PortAudio never cuts speech off mid-sentence.
- Each session runs three concurrent async tasks (`send_audio`, `receive_responses`, `watchdog`) coordinated with `asyncio.Event`, with no polling.
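Here is a minimal, self-contained sketch of that pattern using simulated audio. The three task names match the list above. Everything else is an assumption for illustration and not Naka's actual code: the task bodies, the separate `player` coroutine, the `None` sentinel, and the timeout value.

```python
import asyncio
import itertools

SENTINEL = None  # end-of-turn marker: the player stops only after draining everything queued before it


async def send_audio(mic_frames, outbox: list, done: asyncio.Event) -> None:
    """Stream mic frames into the (simulated) session until `done` is set."""
    for frame in mic_frames:
        if done.is_set():
            break
        outbox.append(frame)       # stand-in for sending audio to Gemini Live
        await asyncio.sleep(0)     # yield so the other tasks can run


async def receive_responses(replies, playback: asyncio.Queue, done: asyncio.Event) -> None:
    """Hand audio chunks to the player without ever waiting on the speaker."""
    for chunk in replies:
        if done.is_set():
            break
        await playback.put(chunk)
        await asyncio.sleep(0)
    await playback.put(SENTINEL)   # tell the player the turn is complete
    done.set()                     # wake every task waiting on the event


async def watchdog(done: asyncio.Event, playback: asyncio.Queue, timeout: float) -> None:
    """End the session if nothing else does within `timeout` seconds."""
    try:
        await asyncio.wait_for(done.wait(), timeout)
    except asyncio.TimeoutError:
        await playback.put(SENTINEL)  # unblock the player even if no reply finished
        done.set()


async def player(playback: asyncio.Queue, played: list) -> None:
    """Drain the queue up to the sentinel, so the last chunk is never dropped."""
    while (chunk := await playback.get()) is not SENTINEL:
        played.append(chunk)       # stand-in for writing to a PortAudio stream


async def run_session(mic_frames, replies, timeout: float = 5.0):
    done = asyncio.Event()
    playback: asyncio.Queue = asyncio.Queue()
    outbox: list = []
    played: list = []
    player_task = asyncio.create_task(player(playback, played))
    await asyncio.gather(
        send_audio(mic_frames, outbox, done),
        receive_responses(replies, playback, done),
        watchdog(done, playback, timeout),
    )
    await player_task              # let playback finish instead of cancelling it
    return outbox, played


if __name__ == "__main__":
    # An endless fake mic: only the Event stops send_audio.
    sent, played = asyncio.run(run_session(itertools.count(), [b"hello", b"world"]))
    print(len(sent), played)
```

The receive loop only enqueues, so a slow speaker never stalls it. The session tasks stop when the `Event` is set, not by checking a flag on a timer.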
main.py Entry point — discovers commands, starts the engine
engines/
gemini_live_engine.py Orchestrates wake word + Gemini session + audio I/O
gemini_session.py The Gemini Live WebSocket session
wake_word.py On-device wake-word detection
commands/ Skills the assistant can call (auto-discovered)
base_command.py Abstract base: name, description, schema, execute()
light_control.py Turn lights on/off
weather.py Current weather (Open-Meteo, no API key)
system_info.py CPU + RAM via psutil
media/ Music playback (Spotify player + generic controls)
registry.py Holds commands, builds Gemini function declarations
configs/ Typed TOML config (Pydantic) + .env resolver
training/ Wake-word recording + training pipeline (see below)
models/wakeword/ The wake-word model loaded at startup
utils/ Logger, HTTP client
| Command | What it does |
|---|---|
| Light control | Turn lights on/off (kitchen, bedroom, living room) |
| Weather | Current conditions via Open-Meteo (no API key needed) |
| System info | Reports CPU and RAM usage |
| Media | Play / control music (Spotify, with a generic player layer) |
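The Weather command needs no key because Open-Meteo's forecast endpoint is open. Here is a standard-library sketch of such a lookup using the `current_weather=true` parameter. Celsius and km/h are Open-Meteo's default units. The function names and coordinates are hypothetical, and Naka may route this call through its own `utils/` HTTP client rather than `urllib`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"


def weather_url(latitude: float, longitude: float) -> str:
    """Build an Open-Meteo request for current conditions; no API key involved."""
    params = {"latitude": latitude, "longitude": longitude, "current_weather": "true"}
    return f"{OPEN_METEO_URL}?{urlencode(params)}"


def fetch_current(latitude: float, longitude: float, timeout: float = 5.0) -> dict:
    """GET the forecast endpoint and decode the JSON body."""
    with urlopen(weather_url(latitude, longitude), timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload: dict) -> str:
    """Turn the `current_weather` object into a short sentence the assistant can speak."""
    cw = payload["current_weather"]
    return f"It's {cw['temperature']:.0f} degrees with wind at {cw['windspeed']:.0f} km/h."


if __name__ == "__main__":
    # Hypothetical coordinates (Paris); requires network access.
    print(summarize(fetch_current(48.85, 2.35)))
```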
This list is intentionally short; more commands are coming. Adding one is
deliberately simple: subclass `BaseCommand`, implement `name` / `description` /
`parameters_schema` / `execute()`, drop it under `commands/`, and the registry
discovers it automatically.
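Here is a sketch of that contract and of a registry that discovers commands and dispatches the model's function calls. The four member names come from this README. Other details are assumptions: a synchronous `execute()`, the discovery mechanism (`pkgutil` plus `inspect`), the dict-shaped function declarations, and the `LightControl` example. How Naka turns declarations into google-genai types is not shown here.

```python
from __future__ import annotations

import importlib
import inspect
import pkgutil
from abc import ABC, abstractmethod
from types import ModuleType
from typing import Any


class BaseCommand(ABC):
    """The four members the README lists; everything else here is illustrative."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @property
    @abstractmethod
    def parameters_schema(self) -> dict[str, Any]: ...

    @abstractmethod
    def execute(self, **kwargs: Any) -> dict[str, Any]: ...


class CommandRegistry:
    def __init__(self) -> None:
        self._commands: dict[str, BaseCommand] = {}

    def register(self, command: BaseCommand) -> None:
        self._commands[command.name] = command

    def discover(self, package: ModuleType) -> None:
        """Import each module in `package`; register the concrete BaseCommand subclasses it defines."""
        for info in pkgutil.iter_modules(package.__path__, package.__name__ + "."):
            module = importlib.import_module(info.name)
            for _, cls in inspect.getmembers(module, inspect.isclass):
                defined_here = cls.__module__ == module.__name__  # skip re-imported classes
                if defined_here and issubclass(cls, BaseCommand) and not inspect.isabstract(cls):
                    self.register(cls())

    def function_declarations(self) -> list[dict[str, Any]]:
        """Plain dicts shaped like Gemini function declarations (name/description/parameters)."""
        return [
            {"name": c.name, "description": c.description, "parameters": c.parameters_schema}
            for c in self._commands.values()
        ]

    def dispatch(self, name: str, args: dict[str, Any]) -> dict[str, Any]:
        """Route a function call from the model to the matching command."""
        return self._commands[name].execute(**args)


class LightControl(BaseCommand):
    # Plain class attributes satisfy the abstract properties above.
    name = "light_control"
    description = "Turn the lights on or off in a room."
    parameters_schema = {
        "type": "object",
        "properties": {
            "room": {"type": "string", "enum": ["kitchen", "bedroom", "living room"]},
            "on": {"type": "boolean"},
        },
        "required": ["room", "on"],
    }

    def execute(self, room: str, on: bool) -> dict[str, Any]:
        # Hypothetical: a real command would drive the lights here.
        return {"room": room, "state": "on" if on else "off"}
```

In this sketch, calling `registry.discover(commands)` on the `commands` package registers every concrete subclass. A new file under `commands/` therefore needs no manual wiring.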
The wake word is the part still being figured out. The default openWakeWord
models are trained on synthetic English voices and recognize a non-English
"naka" poorly. The fix is a model trained on your own voice; that pipeline
lives in `training/`.
A first example model ships in the repo at `models/wakeword/naka.onnx`, so
Naka runs out of the box. Consider it a rough first pass, though, not a
finished, reliable detector: expect to retrain it on your own voice, mic, and
language for solid results.
👉 See `training/README.md` for the full record → train → deploy → tune
walkthrough.
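A common way to tune a detector is to pair a score threshold with a cooldown, so one utterance can't fire twice. Here is a model-independent sketch of that trigger logic. It assumes one score in [0, 1] per roughly 80 ms frame, the kind of per-model score openWakeWord's `Model.predict()` returns. `WakeTrigger`, the 0.5 threshold, and the 25-frame cooldown are placeholders to tune, not Naka's settings.

```python
from dataclasses import dataclass, field


@dataclass
class WakeTrigger:
    """Fire when a score crosses `threshold`, then ignore `cooldown_frames` frames."""

    threshold: float = 0.5       # placeholder; raise it if false triggers are common
    cooldown_frames: int = 25    # ~2 s at an assumed 80 ms per frame
    _remaining: int = field(default=0, init=False)

    def update(self, score: float) -> bool:
        """Feed one frame's score; return True only on a fresh detection."""
        if self._remaining > 0:
            self._remaining -= 1
            return False
        if score >= self.threshold:
            self._remaining = self.cooldown_frames
            return True
        return False


if __name__ == "__main__":
    trigger = WakeTrigger(threshold=0.5, cooldown_frames=2)
    print([trigger.update(s) for s in [0.1, 0.7, 0.9, 0.2, 0.8]])
```

Raising the threshold trades missed wakes for fewer false triggers. The cooldown only suppresses repeat firings of the same utterance.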
- ✅ Wake word → Gemini Live session → spoken response loop working
- ✅ Function-calling command system with auto-discovery
- ✅ A handful of commands (lights, weather, system info, media)
- 🚧 Wake-word model quality — example provided, needs proper training
- 🚧 Many more commands planned
- 🚧 APIs and config layout may still change
google-genai · openwakeword · sounddevice · numpy · pydantic ·
python-dotenv · psutil — managed with uv.