A web application that uses AI to predict missing family relationships in GEDCOM files. Built with Next.js and TypeScript, it features Graph Neural Network (GNN) and Graph Attention Network (GAT) approaches to parent prediction, with models trained in Python (MLX) and served server-side as ONNX.
- User Authentication: Secure login and registration system
- GEDCOM File Upload: Support for standard genealogy file formats
- Interactive Family Trees: Visual representation with vis-network
- AI-Powered Predictions: Missing parent prediction using GNN/GAT models
- Data Privacy: Users control their data with full deletion capabilities
- Model Training Pipeline: Continuous learning from user data
- Export Capabilities: Download enhanced family tree data
- Frontend: Next.js 14, TypeScript, Tailwind CSS
- Backend: Next.js API Routes, NextAuth.js
- Database: PostgreSQL with Drizzle ORM
- ML: MLX (Python) training pipeline, with server-side ONNX inference via onnxruntime-node
- Visualization: vis-network for family trees
- Deployment: Vercel
- Node.js 18+
- PostgreSQL database
- Python 3.8+ (for ML training)
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd OhanaAI
   ```

2. Install dependencies

   ```bash
   npm install
   ```

3. Set up environment variables

   ```bash
   cp .env.example .env.local
   ```

   Edit `.env.local` with at least:

   - `DATABASE_URL`
   - `NEXTAUTH_SECRET`
   - `NEXTAUTH_URL` (e.g., http://localhost:3000)
   - `EXPORT_SECRET` (used by the ML export endpoint)
   - `ML_EXPORT_API_KEY` (for training scripts that reference it)
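As a quick sanity check, the required variables can be verified before the app boots. This is a minimal hedged sketch — the helper below is hypothetical and not part of the OhanaAI codebase:

```typescript
// Hypothetical startup check -- not part of the repo.
// Returns the names of required environment variables that are unset or blank.
export function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => !env[name] || env[name]!.trim() === "");
}

const REQUIRED = [
  "DATABASE_URL",
  "NEXTAUTH_SECRET",
  "NEXTAUTH_URL",
  "EXPORT_SECRET",
  "ML_EXPORT_API_KEY",
];

const missing = missingEnvVars(
  process.env as Record<string, string | undefined>,
  REQUIRED,
);
if (missing.length > 0) {
  console.warn(`Missing required env vars: ${missing.join(", ")}`);
}
```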
4. Set up the database

   ```bash
   npm run db:migrate
   npm run db:push
   ```

5. Start the development server

   ```bash
   npm run dev
   ```

   Visit http://localhost:3000
When first deployed, the application will show "No trained model available" for predictions. To train your first model:
1. Collect Training Data

   - Users upload GEDCOM files through the web interface
   - Data is automatically processed and prepared for training

2. Export Training Data

   ```bash
   curl -X POST http://localhost:3000/api/ml/export-training-data \
     -H "Content-Type: application/json" \
     -d "{\"authorization\": \"$EXPORT_SECRET\"}"
   ```

   Validate the export (example response):

   ```json
   {
     "message": "Training data exported successfully",
     "count": 123,
     "batches": 2,
     "directory": "/abs/path/to/training_data"
   }
   ```
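The response can also be validated programmatically. A hedged TypeScript sketch — the field names come from the example response above, but the guard itself is hypothetical:

```typescript
// Shape of the export endpoint's success response (per the example above).
interface ExportResponse {
  message: string;
  count: number;
  batches: number;
  directory: string;
}

// Hypothetical runtime type guard -- useful before trusting a parsed JSON body.
export function isExportResponse(value: unknown): value is ExportResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.message === "string" &&
    typeof v.count === "number" &&
    typeof v.batches === "number" &&
    typeof v.directory === "string"
  );
}
```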
3. Train the Model

   ```bash
   pip install -r requirements_mlx.txt  # install MLX/ONNX tooling once
   python3 train_model_mlx.py --data-dir training_data --output-dir models/parent_predictor
   ```

4. Deploy the Model

   Training writes `models/parent_predictor/model.onnx`. The Next.js API loads this ONNX model via `onnxruntime-node`; restart the server (or redeploy) after replacing the file.
- `setup_ml_environment.sh` now installs MLX packages (instead of TensorFlow) and prepares the local virtualenv.
- `scripts/auto_train.py` can be scheduled to fetch exports and run `train_model_mlx.py` automatically.
Place the following under `models/parent_predictor/`:
- ONNX model: `model.onnx`
- Optional metadata: `training_metadata.json`, `training_history.json`
After placing files, restart the server. The predict API will load the ONNX artifact from this directory.
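A hedged sketch of a readiness check over that directory — the helper is hypothetical; only `model.onnx` is required, per the list above:

```typescript
// Hypothetical check: given the filenames found in models/parent_predictor/,
// report whether the required ONNX artifact is present and which optional
// metadata files were found alongside it.
export function modelDirStatus(files: string[]): {
  ready: boolean;
  optionalFound: string[];
} {
  const optional = ["training_metadata.json", "training_history.json"];
  return {
    ready: files.includes("model.onnx"),
    optionalFound: optional.filter((f) => files.includes(f)),
  };
}
```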
Set up a cron job or GitHub Action to periodically:
- Export new training data
- Retrain the model with updated data
- Deploy the improved model
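One hedged way to express that schedule in code — the interval-based trigger below is an illustration, not something the repo provides:

```typescript
// Hypothetical scheduling helper: decide whether a retrain is due, given the
// timestamp of the last training run and a configured interval in hours.
export function shouldRetrain(
  lastTrainedAt: Date | null,
  now: Date,
  intervalHours: number,
): boolean {
  if (lastTrainedAt === null) return true; // never trained -> train now
  const elapsedMs = now.getTime() - lastTrainedAt.getTime();
  return elapsedMs >= intervalHours * 3_600_000;
}
```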
- User uploads GEDCOM → Parsed and stored in database
- Family tree created → Relationships extracted and visualized
- ML data prepared → Graph structure created for training
- Model inference → Predictions generated for missing parents
- Results displayed → Interactive family tree with predictions
- `users`: User accounts and authentication
- `gedcom_files`: Uploaded files and metadata
- `family_trees`: Parsed family relationships
- `ml_training_data`: Processed data for model training
- Graph Construction: Convert family trees to graph structures
- Feature Engineering: Extract per-individual feature vectors (12 dims)
- Model Training: MLX MLP exported to ONNX
- Inference: Server-side parent prediction via `onnxruntime-node` in `/api/ml/predict`
- Confidence Filtering: Controlled via `PREDICTION_CONFIDENCE_THRESHOLD` (default 0.4) and `PREDICTION_MAX_SUGGESTIONS` (optional)
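That filtering step can be sketched in TypeScript. This is a hedged re-implementation of the documented semantics (threshold default 0.4; a cap of 0 meaning "unlimited" follows the env var notes later in this README), not the repo's actual code:

```typescript
interface Prediction {
  candidateId: string;
  confidence: number; // 0..1 score from the ONNX model
}

// Minimal sketch of the documented filtering: drop predictions below the
// confidence threshold, sort best-first, and cap the list unless the cap
// is 0 (0 = unlimited).
export function filterPredictions(
  predictions: Prediction[],
  threshold = 0.4,
  maxSuggestions = 0,
): Prediction[] {
  const kept = predictions
    .filter((p) => p.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  return maxSuggestions > 0 ? kept.slice(0, maxSuggestions) : kept;
}
```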
- `POST /api/auth/register` - User registration
- `POST /api/auth/[...nextauth]` - NextAuth.js endpoints
- `POST /api/gedcom/upload` - Upload GEDCOM file
- `GET /api/gedcom/[id]` - Get file details
- `DELETE /api/gedcom/[id]` - Delete file and all data
- `POST /api/ml/predict` - Generate parent predictions
- `POST /api/ml/export-training-data` - Export training data
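As an illustration of calling the predict endpoint from TypeScript — the request body shown uses a hypothetical field name (`gedcomFileId`), so check the route handler for the actual parameters:

```typescript
// Hypothetical client call to the predict endpoint. The body field
// (gedcomFileId) is an assumed parameter name, not taken from the repo.
export function buildPredictRequest(baseUrl: string, gedcomFileId: string): Request {
  return new Request(`${baseUrl}/api/ml/predict`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ gedcomFileId }),
  });
}

// Usage: const res = await fetch(buildPredictRequest("http://localhost:3000", fileId));
```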
Run locally with all data, models, and emails hosted on this device:
1. Environment

   - Copy env: `cp .env.example .env.local`
   - Set `DATABASE_URL` to your local Postgres
   - Set `NEXTAUTH_SECRET` and `NEXTAUTH_URL` (e.g., http://ohana.local or http://localhost:3000)
   - Optionally set email SMTP: `EMAIL_HOST`, `EMAIL_PORT`, `EMAIL_USER`, `EMAIL_PASS`, `EMAIL_FROM`, `ADMIN_EMAIL`
   - Optional auto-train: `AUTO_TRAIN=true`, `TRAINING_SCRIPT=train_model_m1.py`
2. Start app

   - `npm run dev` (or `npm run build && npm start`)
   - Optional: add `ohana.local` to `/etc/hosts` and front with a local proxy (Caddy/Nginx) for a custom URL.
3. Data storage

   - Uploaded GEDCOM files are saved under `uploads/` (configurable via `STORAGE_ROOT`/`UPLOADS_DIR`). Each upload is hashed per-user so duplicate GEDCOMs are rejected with a helpful message.
   - Models are loaded from `models/parent_predictor/`
   - Training exports are written to `training_data/` (configurable via `TRAINING_DATA_DIR`)

   Override the defaults in `.env.local`:

   ```bash
   STORAGE_ROOT=/Volumes/OhanaData      # optional absolute path (e.g., external drive)
   UPLOADS_DIR=uploads                  # relative to STORAGE_ROOT if not absolute
   TRAINING_DATA_DIR=training_data      # relative to STORAGE_ROOT if not absolute
   ML_EXPORTS_DIR=exports/ml_training   # where export-user-data writes files
   SYNC_GEDCOM_PROCESSING=true          # block upload response until parsing + inference complete
   PREDICTION_CONFIDENCE_THRESHOLD=0.4  # minimum confidence for suggested parents
   PREDICTION_MAX_SUGGESTIONS=5         # cap suggestions per missing parent (set 0 for unlimited)
   ```

   If you leave these blank, the server stores files alongside the app directory (or `/tmp/ohana-ai` on Vercel). All paths are created automatically at runtime.
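The "relative to `STORAGE_ROOT` if not absolute" rule can be sketched in TypeScript — a hedged re-implementation of the documented behavior, not the repo's actual code:

```typescript
import path from "node:path";

// Sketch of the documented path rule: a directory setting is used as-is when
// absolute, otherwise resolved relative to STORAGE_ROOT (or the app directory
// when STORAGE_ROOT is unset).
export function resolveStorageDir(
  dirSetting: string,
  storageRoot: string | undefined,
  appDir: string,
): string {
  if (path.isAbsolute(dirSetting)) return dirSetting;
  return path.join(storageRoot ?? appDir, dirSetting);
}
```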
4. GEDCOM processing

   - By default (`SYNC_GEDCOM_PROCESSING=true`) uploads stay open until parsing, ML data prep, and ONNX inference finish, so users can wait for predictions.
   - Set the env to `false` to fall back to the asynchronous queue (`lib/jobs/gedcomProcessor.ts`).
5. Notifications (optional)

   - On user signup, account deletion, and new parent predictions, an email is sent to `ADMIN_EMAIL` (and a welcome email to the user) if SMTP is configured.
For custom domains via Cloudflare, see DOMAIN_SETUP.md.
1. Connect to Vercel

   ```bash
   npx vercel
   ```

2. Set Environment Variables

   - Add database URL, NextAuth secret, etc.

3. Deploy

   ```bash
   npx vercel --prod
   ```
- Create PostgreSQL database (recommended: Neon, Supabase, or Vercel Postgres)
- Run migrations
- Update environment variables
- Data Encryption: All sensitive data is encrypted
- User Control: Complete data deletion capabilities
- Access Control: Users can only access their own data
- Secure Authentication: NextAuth.js with secure sessions
```bash
npm run db:studio   # Open Drizzle Studio
npm run db:migrate  # Generate migrations
npm run db:push     # Push schema changes
```
```bash
npm run lint   # ESLint
npm run build  # Production build
```
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
For issues and questions:
- Open a GitHub issue
- Check the documentation
- Review the API endpoints
Note: This application is designed for genealogical research and family history. All predictions should be verified through traditional genealogical methods.