A web application that uses AI to predict missing family relationships in GEDCOM files. Built with Next.js and TypeScript, it features Graph Neural Network (GNN) and Graph Attention Network (GAT) approaches to parent prediction, with models trained in Python (MLX) and served server-side as ONNX.
- User Authentication: Secure login and registration system
- GEDCOM File Upload: Support for standard genealogy file formats
- Interactive Family Trees: Visual representation with vis-network
- AI-Powered Predictions: Missing parent prediction using GNN/GAT models
- Data Privacy: Users control their data with full deletion capabilities
- Model Training Pipeline: Continuous learning from user data
- Export Capabilities: Download enhanced family tree data
- Frontend: Next.js 14, TypeScript, Tailwind CSS
- Backend: Next.js API Routes, NextAuth.js
- Database: PostgreSQL with Drizzle ORM
- ML: MLX (Python) training pipeline, with server-side ONNX inference via onnxruntime-node
- Visualization: vis-network for family trees
- Deployment: Vercel
- Node.js 18+
- PostgreSQL database
- Python 3.8+ (for ML training)
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd OhanaAI
   ```

2. Install dependencies

   ```bash
   npm install
   ```

3. Set up environment variables

   ```bash
   cp .env.example .env.local
   ```

   Edit `.env.local` with at least:

   - `DATABASE_URL`
   - `NEXTAUTH_SECRET`
   - `NEXTAUTH_URL` (e.g., http://localhost:3000)
   - `EXPORT_SECRET` (used by the ML export endpoint)
   - `ML_EXPORT_API_KEY` (for training scripts that reference it)
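As a quick sanity check, the required variables can be verified before the app boots. This is a minimal hedged sketch — the helper below is hypothetical and not part of the OhanaAI codebase:

```typescript
// Hypothetical startup check -- not part of the repo.
// Returns the names of required environment variables that are unset or blank.
export function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => !env[name] || env[name]!.trim() === "");
}

const REQUIRED = [
  "DATABASE_URL",
  "NEXTAUTH_SECRET",
  "NEXTAUTH_URL",
  "EXPORT_SECRET",
  "ML_EXPORT_API_KEY",
];

const missing = missingEnvVars(
  process.env as Record<string, string | undefined>,
  REQUIRED,
);
if (missing.length > 0) {
  console.warn(`Missing required env vars: ${missing.join(", ")}`);
}
```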
4. Set up the database

   ```bash
   npm run db:migrate
   npm run db:push
   ```

5. Start the development server

   ```bash
   npm run dev
   ```

   Visit http://localhost:3000
When first deployed, the application will show "No trained model available" for predictions. To train your first model:
1. Collect Training Data

   - Users upload GEDCOM files through the web interface
   - Data is automatically processed and prepared for training

2. Export Training Data

   ```bash
   curl -X POST http://localhost:3000/api/ml/export-training-data \
     -H "Content-Type: application/json" \
     -d "{\"authorization\": \"$EXPORT_SECRET\"}"
   ```

   Validate the export (example response):

   ```json
   {
     "message": "Training data exported successfully",
     "count": 123,
     "batches": 2,
     "directory": "/abs/path/to/training_data"
   }
   ```
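The response can also be validated programmatically. A hedged TypeScript sketch — the field names come from the example response above, but the guard itself is hypothetical:

```typescript
// Shape of the export endpoint's success response (per the example above).
interface ExportResponse {
  message: string;
  count: number;
  batches: number;
  directory: string;
}

// Hypothetical runtime type guard -- useful before trusting a parsed JSON body.
export function isExportResponse(value: unknown): value is ExportResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.message === "string" &&
    typeof v.count === "number" &&
    typeof v.batches === "number" &&
    typeof v.directory === "string"
  );
}
```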
3. Train the Model

   ```bash
   pip install -r requirements_mlx.txt  # install MLX/ONNX tooling once
   python3 train_model_mlx.py --data-dir training_data --output-dir models/parent_predictor
   ```

4. Deploy the Model

   Training writes `models/parent_predictor/model.onnx`. The Next.js API loads this ONNX model via `onnxruntime-node`; restart the server (or redeploy) after replacing the file.
- `setup_ml_environment.sh` now installs MLX packages (instead of TensorFlow) and prepares the local virtualenv.
- `scripts/auto_train.py` can be scheduled to fetch exports and run `train_model_mlx.py` automatically.
Place the following under `models/parent_predictor/`:
- ONNX model: `model.onnx`
- Optional metadata: `training_metadata.json`, `training_history.json`
After placing files, restart the server. The predict API will load the ONNX artifact from this directory.
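A hedged sketch of a readiness check over that directory — the helper is hypothetical; only `model.onnx` is required, per the list above:

```typescript
// Hypothetical check: given the filenames found in models/parent_predictor/,
// report whether the required ONNX artifact is present and which optional
// metadata files were found alongside it.
export function modelDirStatus(files: string[]): {
  ready: boolean;
  optionalFound: string[];
} {
  const optional = ["training_metadata.json", "training_history.json"];
  return {
    ready: files.includes("model.onnx"),
    optionalFound: optional.filter((f) => files.includes(f)),
  };
}
```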
Set up a cron job or GitHub Action to periodically:
- Export new training data
- Retrain the model with updated data
- Deploy the improved model
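One hedged way to express that schedule in code — the interval-based trigger below is an illustration, not something the repo provides:

```typescript
// Hypothetical scheduling helper: decide whether a retrain is due, given the
// timestamp of the last training run and a configured interval in hours.
export function shouldRetrain(
  lastTrainedAt: Date | null,
  now: Date,
  intervalHours: number,
): boolean {
  if (lastTrainedAt === null) return true; // never trained -> train now
  const elapsedMs = now.getTime() - lastTrainedAt.getTime();
  return elapsedMs >= intervalHours * 3_600_000;
}
```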
- User uploads GEDCOM → Parsed and stored in database
- Family tree created → Relationships extracted and visualized
- ML data prepared → Graph structure created for training
- Model inference → Predictions generated for missing parents
- Results displayed → Interactive family tree with predictions
- `users`: User accounts and authentication
- `gedcom_files`: Uploaded files and metadata
- `family_trees`: Parsed family relationships
- `ml_training_data`: Processed data for model training
- Graph Construction: Convert family trees to graph structures
- Feature Engineering: Extract per-individual feature vectors (12 dims)
- Model Training: MLX MLP exported to ONNX
- Inference: Server-side parent prediction via `onnxruntime-node` in `/api/ml/predict`
- Confidence Filtering: Controlled via `PREDICTION_CONFIDENCE_THRESHOLD` (default 0.4) and `PREDICTION_MAX_SUGGESTIONS` (optional)
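That filtering step can be sketched in TypeScript. This is a hedged re-implementation of the documented semantics (threshold default 0.4; a cap of 0 meaning "unlimited" follows the env var notes later in this README), not the repo's actual code:

```typescript
interface Prediction {
  candidateId: string;
  confidence: number; // 0..1 score from the ONNX model
}

// Minimal sketch of the documented filtering: drop predictions below the
// confidence threshold, sort best-first, and cap the list unless the cap
// is 0 (0 = unlimited).
export function filterPredictions(
  predictions: Prediction[],
  threshold = 0.4,
  maxSuggestions = 0,
): Prediction[] {
  const kept = predictions
    .filter((p) => p.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  return maxSuggestions > 0 ? kept.slice(0, maxSuggestions) : kept;
}
```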
- `POST /api/auth/register` - User registration
- `POST /api/auth/[...nextauth]` - NextAuth.js endpoints
- `POST /api/gedcom/upload` - Upload GEDCOM file
- `GET /api/gedcom/[id]` - Get file details
- `DELETE /api/gedcom/[id]` - Delete file and all data
- `POST /api/ml/predict` - Generate parent predictions
- `POST /api/ml/export-training-data` - Export training data
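As an illustration of calling the predict endpoint from TypeScript — the request body shown uses a hypothetical field name (`gedcomFileId`), so check the route handler for the actual parameters:

```typescript
// Hypothetical client call to the predict endpoint. The body field
// (gedcomFileId) is an assumed parameter name, not taken from the repo.
export function buildPredictRequest(baseUrl: string, gedcomFileId: string): Request {
  return new Request(`${baseUrl}/api/ml/predict`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ gedcomFileId }),
  });
}

// Usage: const res = await fetch(buildPredictRequest("http://localhost:3000", fileId));
```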
Run locally with all data, models, and emails hosted on this device:
1. Environment

   - Copy env: `cp .env.example .env.local`
   - Set `DATABASE_URL` to your local Postgres
   - Set `NEXTAUTH_SECRET` and `NEXTAUTH_URL` (e.g., http://ohana.local or http://localhost:3000)
   - Optionally set email SMTP: `EMAIL_HOST`, `EMAIL_PORT`, `EMAIL_USER`, `EMAIL_PASS`, `EMAIL_FROM`, `ADMIN_EMAIL`
   - Optional auto-train: `AUTO_TRAIN=true`, `TRAINING_SCRIPT=train_model_m1.py`
2. Start app

   - `npm run dev` (or `npm run build && npm start`)
   - Optional: add `ohana.local` to `/etc/hosts` and front with a local proxy (Caddy/Nginx) for a custom URL.
3. Data storage

   - Uploaded GEDCOM files are saved under `uploads/` (configurable via `STORAGE_ROOT`/`UPLOADS_DIR`). Each upload is hashed per-user so duplicate GEDCOMs are rejected with a helpful message.
   - Models are loaded from `models/parent_predictor/`
   - Training exports are written to `training_data/` (configurable via `TRAINING_DATA_DIR`)

   Override the defaults in `.env.local`:

   ```bash
   STORAGE_ROOT=/Volumes/OhanaData      # optional absolute path (e.g., external drive)
   UPLOADS_DIR=uploads                  # relative to STORAGE_ROOT if not absolute
   TRAINING_DATA_DIR=training_data      # relative to STORAGE_ROOT if not absolute
   ML_EXPORTS_DIR=exports/ml_training   # where export-user-data writes files
   SYNC_GEDCOM_PROCESSING=true          # block upload response until parsing + inference complete
   PREDICTION_CONFIDENCE_THRESHOLD=0.4  # minimum confidence for suggested parents
   PREDICTION_MAX_SUGGESTIONS=5         # cap suggestions per missing parent (set 0 for unlimited)
   ```

   If you leave these blank, the server stores files alongside the app directory (or `/tmp/ohana-ai` on Vercel). All paths are created automatically at runtime.
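The "relative to `STORAGE_ROOT` if not absolute" rule can be sketched in TypeScript — a hedged re-implementation of the documented behavior, not the repo's actual code:

```typescript
import path from "node:path";

// Sketch of the documented path rule: a directory setting is used as-is when
// absolute, otherwise resolved relative to STORAGE_ROOT (or the app directory
// when STORAGE_ROOT is unset).
export function resolveStorageDir(
  dirSetting: string,
  storageRoot: string | undefined,
  appDir: string,
): string {
  if (path.isAbsolute(dirSetting)) return dirSetting;
  return path.join(storageRoot ?? appDir, dirSetting);
}
```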
4. GEDCOM processing

   - By default (`SYNC_GEDCOM_PROCESSING=true`) uploads stay open until parsing, ML data prep, and ONNX inference finish, so users can wait for predictions.
   - Set the env to `false` to fall back to the asynchronous queue (`lib/jobs/gedcomProcessor.ts`).
5. Notifications (optional)

   - On user signup, account deletion, and new parent predictions, an email is sent to `ADMIN_EMAIL` (and a welcome email to the user) if SMTP is configured.
For custom domains via Cloudflare, see DOMAIN_SETUP.md.
1. Connect to Vercel

   ```bash
   npx vercel
   ```

2. Set Environment Variables

   - Add database URL, NextAuth secret, etc.

3. Deploy

   ```bash
   npx vercel --prod
   ```
- Create PostgreSQL database (recommended: Neon, Supabase, or Vercel Postgres)
- Run migrations
- Update environment variables
- Data Encryption: All sensitive data is encrypted
- User Control: Complete data deletion capabilities
- Access Control: Users can only access their own data
- Secure Authentication: NextAuth.js with secure sessions
```bash
npm run db:studio   # Open Drizzle Studio
npm run db:migrate  # Generate migrations
npm run db:push     # Push schema changes
```
```bash
npm run lint   # ESLint
npm run build  # Production build
```
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
For issues and questions:
- Open a GitHub issue
- Check the documentation
- Review the API endpoints
Note: This application is designed for genealogical research and family history. All predictions should be verified through traditional genealogical methods.