Mira — TreeHacks 2026

An AI companion on smart glasses that understands your home in 3D. When a dementia patient says "where are my pills?", Mira actually finds them.

Voice-first assisted living platform combining 3D scene understanding, medical knowledge, and real-time caregiver alerts. Runs on Ray-Ban Meta glasses.

Tech Stack: Next.js · React · Tailwind CSS · Supabase (Postgres + Realtime) · OpenRouter · Whisper STT · OpenAI TTS · Modal (A100 GPU) · Viser · ONNX Runtime Web · Python · FastAPI

System Architecture

Data-flow overview (demo video): flowchart_dataflow.mp4

Inspiration

Three of us have grandparents with Alzheimer's or dementia. If you've been around it, you know the loop: "Have you seen my pills?" five times in an afternoon. It's not just forgetting — it's the loss of autonomy, the slow erosion of someone's confidence that they can manage their own life.

Many dementia patients refuse traditional care, insisting family handle everything, which puts enormous strain on people who can't be present 24/7. Mira is designed for this gap: asynchronous monitoring without pulling out a phone, assistance at 3 AM, and a caregiver dashboard that keeps family informed.

For someone with Alzheimer's, a searchable spatial memory isn't a convenience. It's dignity.

Modules

3D Reconstruction

Module What it does
scenegraph/ Video → 3D scene graph. End-to-end pipeline on Modal (A100): structure-from-motion, Grounding DINO object detection per frame, backprojection into 3D via depth maps + camera poses, geometric overlap + CLIP similarity merging into a Scene Object Graph (see the backprojection sketch after this table). See scenegraph/README.md.
reconstruction/ Dense 3D reconstruction — video → dense depth, camera poses, point clouds.
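
A minimal sketch of the backprojection step in scenegraph/, assuming a pinhole intrinsics matrix, a per-frame depth map, and a camera-to-world pose from SfM. The function and variable names are illustrative, not the pipeline's actual API.

```python
import numpy as np

def backproject_detection(box, depth, K, cam_to_world):
    """Lift the center of a 2D detection box into world coordinates.

    box          -- (x_min, y_min, x_max, y_max) in pixels
    depth        -- H x W depth map for the frame (meters)
    K            -- 3x3 pinhole intrinsics
    cam_to_world -- 4x4 camera-to-world pose from SfM / odometry
    """
    u = int((box[0] + box[2]) / 2)
    v = int((box[1] + box[3]) / 2)
    z = float(depth[v, u])                  # depth at the box center
    # Unproject pixel (u, v) at depth z into the camera frame.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    p_cam = np.array([x, y, z, 1.0])
    # Transform into world coordinates with the camera pose.
    return (cam_to_world @ p_cam)[:3]
```

Per-frame detections lifted this way can then be merged into scene-level objects when their 3D footprints overlap and their CLIP embeddings agree, as the scenegraph/ row describes.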

Camera Localization

Module What it does
hloc_localization/ Visual localization via SuperPoint + LightGlue + PnP. Builds an SfM reference map from video (Modal GPU), then localizes new frames against it in 6DoF. Live pose tracking uses DPVO for continuous 6DoF with periodic HLoc re-localization to correct drift.
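
The re-localization step can be pictured as a RANSAC PnP solve over 2D-3D matches between a query frame and the SfM reference map (the kind of correspondences SuperPoint + LightGlue matching yields). This is a generic OpenCV sketch with assumed inputs, not the hloc_localization code itself.

```python
import cv2
import numpy as np

def localize_frame(pts2d, pts3d, K):
    """Estimate a 6DoF camera pose from 2D-3D correspondences.

    pts2d -- Nx2 keypoint locations in the query image
    pts3d -- Nx3 matched points from the SfM reference map
    K     -- 3x3 camera intrinsics
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d, dtype=np.float64),
        np.asarray(pts2d, dtype=np.float64),
        K, None,                      # no lens distortion assumed
        reprojectionError=4.0,
        iterationsCount=1000,
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> rotation matrix
    return R, tvec, inliers           # world-to-camera rotation / translation
```

In the live setup described above, a pose estimated this way periodically corrects the drift accumulated by the continuous DPVO tracker.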

Object Localization

Module What it does
segmentation/ Image + language query → object segmentation. Open-vocabulary detection and mask generation.
depthanything/ Monocular depth estimation per frame. Used to get the depth of segmented objects — the reading is taken as the median or trimmed mean over the mask, since mask edges bleed into the background (see the sketch below).
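
A minimal NumPy sketch of that robust depth readout; the trimming fraction and the function name are illustrative choices.

```python
import numpy as np

def object_depth(depth_map, mask, trim=0.2):
    """Robust depth of a segmented object.

    depth_map -- H x W monocular depth estimate for the frame
    mask      -- H x W boolean segmentation mask
    trim      -- fraction of extreme values dropped on each side

    Mask edges often bleed into the background, so a plain mean over the
    mask is skewed toward background depth; a trimmed mean (or simply
    np.median over the masked values) ignores those outliers.
    """
    vals = np.sort(depth_map[mask].ravel())
    if vals.size == 0:
        return None
    k = int(trim * vals.size)
    trimmed = vals[k: vals.size - k] if vals.size > 2 * k else vals
    return float(trimmed.mean())
```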

Apps & Services

Module What it does
explorer/ Interactive 3D point cloud viewer (Viser + FastAPI). Natural language query → Gemini multi-view detection → raycast into point cloud → animated camera fly-to (see the raycast sketch after this table). Auto-detects scene orientation via Gemini rotation voting.
mira-chat/ Full-stack Next.js app — resident voice/text chat (LLM + function calling for spatial nav, medical Q&A, medication lookup; see the function-calling sketch after this table), supervisor dashboard with real-time event timeline, caregiver escalation via email. Supabase Realtime for zero-polling updates. See mira-chat/README.md.
android/ Android client app — receives Bluetooth audio + JPEG frames from Ray-Ban Meta glasses, bridges to backend via Tailscale.
securitycam/ Security camera streaming — mediamtx config for RTMP/RTSP/HLS/WebRTC ingestion.
vic-backend/ Backend services.
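
A rough sketch of the raycast step used by explorer/ as described above: cast a ray from the camera pose through the pixel the detector flagged, and take the nearest point-cloud point to that ray. The names, the nearest-point criterion, and the distance threshold are assumptions for illustration.

```python
import numpy as np

def raycast_point_cloud(pixel, K, cam_to_world, points, max_dist=0.05):
    """Find the point-cloud point closest to the ray through `pixel`.

    pixel        -- (u, v) image coordinates of the detected object
    K            -- 3x3 intrinsics; cam_to_world -- 4x4 camera pose
    points       -- Nx3 point cloud in world coordinates
    max_dist     -- reject hits farther than this from the ray (meters)
    """
    # Ray direction in the camera frame, rotated into the world frame.
    d_cam = np.array([(pixel[0] - K[0, 2]) / K[0, 0],
                      (pixel[1] - K[1, 2]) / K[1, 1],
                      1.0])
    d_world = cam_to_world[:3, :3] @ d_cam
    d_world /= np.linalg.norm(d_world)
    origin = cam_to_world[:3, 3]

    # Perpendicular distance of every point to the ray, keeping only
    # points in front of the camera and within the threshold.
    rel = points - origin
    t = rel @ d_world
    perp = np.linalg.norm(rel - np.outer(t, d_world), axis=1)
    valid = (t > 0) & (perp < max_dist)
    if not valid.any():
        return None
    # Return the closest valid point along the ray; the viewer's camera
    # fly-to animation would target this location.
    idx = np.where(valid)[0][np.argmin(t[valid])]
    return points[idx]
```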

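The mira-chat function-calling flow follows the standard OpenAI-style tools API, which OpenRouter serves through an OpenAI-compatible endpoint. The sketch below is in Python for brevity (mira-chat itself is Next.js), and find_object is a hypothetical stand-in for the app's real tools.

```python
import os
from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible API

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

# Hypothetical spatial-navigation tool; the real tool names and schemas live in mira-chat/.
tools = [{
    "type": "function",
    "function": {
        "name": "find_object",
        "description": "Locate a household object in the resident's 3D scene graph.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # any OpenRouter model id works here
    messages=[{"role": "user", "content": "Where are my pills?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]
# Dispatch call.function.name with call.function.arguments against the scene graph,
# then send the result back as a "tool" message so the model can phrase the spoken answer.
```
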
Team

Member Contributions
Nathan ML pipeline & GPU compute on Modal — object localization, 3D scene reconstruction, camera pose tracking with visual odometry. Security cam integration, point cloud viewer.
Victor Streaming video and audio from the Ray-Ban Meta SDK, linking it to the frontend, and delivering audio feedback to the user.
Madhuhaas Agentic system and tool-calling pipeline — pulling patient info, web search for medication lookup. Frontend development.
Antonio Online object localization, animated camera fly-to in the 3D viewer, high-level pitch direction, demo video editing.

What's Next

  • Continuous scene updates. The 3D model is currently from a one-time walkthrough. Objects move. The system should update incrementally as the resident goes about their day.
  • Predictive object tracking. If the event log shows someone leaves their glasses in the fridge every Tuesday, the system should learn that and check there first.
  • Dedicated sensor hardware. Ray-Bans don't expose IMU data, so fall detection needs external hardware — a wearable with accelerometer and gyroscope combined with 3D scene context.
  • Health device integration. Apple Watch vitals, cameras in common areas, other health peripherals feeding into the same event stream.
  • Scaling point clouds. One room works. A full facility requires procedural rendering and level-of-detail management for very large point clouds.
