
SnapScribe

A macOS menu bar app that captures screenshots and converts them to searchable, formatted text using 100% local ML inference. No cloud APIs, no subscriptions, no data leaving your machine.

Built for academics, developers, researchers, and knowledge workers who deal with technical content — code, math, structured documents — and value privacy.

How It Works

Capture → Process → Store → Search
  1. Capture a region, window, or full screen via ScreenCaptureKit
  2. Process with local ML — Apple Vision OCR, IBM Docling, or Google Gemma
  3. Store automatically in a local SQLite database with full-text search
  4. Search across all your captures instantly

Processing Modes

Hold modifier keys during capture to select a pipeline:

| Mode | Modifier | Pipeline | Best For |
| --- | --- | --- | --- |
| Vision | (none) | Apple Vision OCR | Fast narrative text |
| Docling | ⌥ Option | granite-docling-258M | Tables, forms, structured docs |
| Vision + Gemma | ⇧ Shift | Vision OCR → Gemma LLM | Enhanced formatting, LaTeX |
| Docling + Gemma | ⌥⇧ Both | Docling → Gemma LLM | Academic papers, complex LaTeX |

Gemma is available in two sizes (4B and 12B) — selectable in Settings.
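
The modifier-to-pipeline selection above can be sketched as a simple lookup. This is illustrative Python only — the key names and `select_pipeline` are hypothetical; the app implements this in Swift:

```python
# Illustrative mapping of held modifier keys to processing pipelines.
# Names are hypothetical; SnapScribe implements this selection in Swift.
PIPELINES = {
    frozenset(): "vision",                            # no modifier
    frozenset({"option"}): "docling",                 # ⌥
    frozenset({"shift"}): "vision+gemma",             # ⇧
    frozenset({"option", "shift"}): "docling+gemma",  # ⌥⇧
}

def select_pipeline(held_modifiers):
    """Map the set of modifier keys held during capture to a pipeline."""
    return PIPELINES[frozenset(held_modifiers)]

print(select_pipeline(["option", "shift"]))  # docling+gemma
```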

Comparison Mode

Enable in Settings to run all 6 pipelines in parallel on a single capture (Vision, Docling, each with no LLM / Gemma 4B / Gemma 12B) and compare results side-by-side.
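
The six pipelines are just the Cartesian product of the two OCR backends and three LLM options — a quick enumeration (illustrative names):

```python
from itertools import product

# Two OCR backends crossed with three LLM stages = six comparison pipelines.
backends = ["vision", "docling"]
llm_stages = [None, "gemma-3-4b", "gemma-3-12b"]

pipelines = list(product(backends, llm_stages))
print(len(pipelines))  # 6
```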

Requirements

  • macOS 14.0+ (Sonoma)
  • Apple Silicon (M1 or later) — required for MLX inference
  • Xcode 15+ with Swift 5.9
  • Python 3.12+ (via pyenv)
  • ~8 GB disk for ML models
  • ~6.5 GB RAM if using comparison mode (both Gemma models loaded)

Setup

1. Python Environment

pyenv install 3.12.10
pyenv shell 3.12.10
pip install mlx-lm mlx-vlm pillow docling-core huggingface-hub

Or use the setup script:

./scripts/dev_setup.sh

2. Download Models

cd models/

# Document understanding (~500MB)
huggingface-cli download ibm-granite/granite-docling-258M-mlx \
  --local-dir granite-docling-258M-mlx

# Gemma 3 4B — lighter, faster (~2GB)
huggingface-cli download mlx-community/gemma-3-4b-it-4bit \
  --local-dir gemma-3-4b-it-4bit

# Gemma 3 12B — higher quality (~4GB)
huggingface-cli download mlx-community/gemma-3-12b-it-4bit \
  --local-dir gemma-3-12b-it-4bit

Or use the download script:

./scripts/download_model.sh

3. Build & Run

xcodebuild -scheme SnapScribe -configuration Debug build

Then open the built app:

open ~/Library/Developer/Xcode/DerivedData/SnapScribe-*/Build/Products/Debug/SnapScribe.app

Or open SnapScribe.xcodeproj in Xcode and hit ⌘R.

Note: Debug builds use hardcoded paths to ~/.pyenv/versions/3.12.10/bin/python3.12 and the local models/ directory. See DoclingProcessor.swift, GemmaProcessor.swift, and GemmaModel.swift.

Architecture

┌─────────────────────────────────────────────────────────┐
│                  SwiftUI Menu Bar App                   │
│                                                         │
│   MenuBarContentView ── AppState ── LibraryWindow       │
│        (Capture/History)     │      (3-column browser)  │
│                              │                          │
│                     ProcessorManager                    │
│                      (Singleton)                        │
└──────────────────────────┬──────────────────────────────┘
                           │
              ┌────────────┼────────────────┐
              │            │                │
        ┌─────▼─────┐  ┌───▼──────────┐  ┌──▼──────────┐
        │ VisionOCR │  │ Docling      │  │ Gemma       │
        │ (native)  │  │ (Python IPC) │  │ (Python IPC)│
        └─────┬─────┘  └──────────────┘  └─────────────┘
              │
        ┌─────▼──────┐
        │CaptureStore│ ← SQLite + FTS5
        └────────────┘

Key Directories

| Directory | Contents |
| --- | --- |
| SnapScribe/App/ | UI views — menu bar, history, library, comparison |
| SnapScribe/Capture/ | ScreenCaptureKit integration, region selection |
| SnapScribe/Inference/ | ML processors — Vision, Docling, Gemma, embeddings |
| SnapScribe/Storage/ | SQLite database, data models, FTS5 search |
| SnapScribe/Settings/ | Settings window |
| SnapScribe/Resources/ | Python inference servers |
| models/ | ML model weights (git-ignored, ~8GB) |
| scripts/ | Dev setup, model download, bundling, signing |

Python IPC

Swift communicates with Python inference servers over stdin/stdout JSON:

Request:  {"id": "uuid", "action": "convert|enhance|ping", "image": "base64", ...}
Response: {"id": "uuid", "status": "success|error", "markdown": "...", ...}

Servers emit {"status": "ready"} once the model is loaded into memory. Models stay resident via ProcessorManager — first call takes 5-10s, subsequent calls are fast.
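
A minimal Python client for this protocol might look like the sketch below. The function names, the server launch command, and any payload fields beyond `id`/`action`/`image` are assumptions based on the shapes shown above, not the app's actual code:

```python
import base64
import json
import subprocess
import uuid

def build_request(action, image_bytes):
    """Build one JSON-line request in the shape shown above (illustrative)."""
    return {
        "id": str(uuid.uuid4()),
        "action": action,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }

def start_server(cmd):
    """Spawn an inference server and block until its ready handshake."""
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    # Server prints {"status": "ready"} once the model is resident in memory.
    if json.loads(proc.stdout.readline()).get("status") != "ready":
        raise RuntimeError("server failed to load its model")
    return proc

def convert(proc, image_bytes):
    """Send a convert request over stdin and read the JSON reply from stdout."""
    proc.stdin.write(json.dumps(build_request("convert", image_bytes)) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())
```

Keeping the subprocess alive between calls is what makes the model stay resident — only the first request pays the load cost.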

Database

Stored at ~/Library/Application Support/SnapScribe/captures.db (SQLite, auto-migrating schema).

  • captures — screenshot metadata, OCR text, enhanced text, user edits, notes, tags, thumbnails
  • folders — hierarchical organization
  • comparison_results — parallel pipeline comparison data
  • captures_fts — FTS5 virtual table for full-text search across all text fields
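
The FTS5 search can be exercised directly with Python's built-in `sqlite3` module. A throwaway in-memory sketch — the real `captures_fts` schema indexes more columns than the two shown here:

```python
import sqlite3

# In-memory stand-in for captures.db; the real table indexes more fields.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE captures_fts USING fts5(ocr_text, notes)")
db.execute(
    "INSERT INTO captures_fts VALUES (?, ?)",
    ("def fibonacci(n): ...", "screenshot of recursion lecture"),
)
db.execute(
    "INSERT INTO captures_fts VALUES (?, ?)",
    ("E = mc^2", "physics slide"),
)

# MATCH runs a full-text query across all indexed columns.
rows = db.execute(
    "SELECT notes FROM captures_fts WHERE captures_fts MATCH ?", ("recursion",)
).fetchall()
print(rows)  # [('screenshot of recursion lecture',)]
```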

Scripts

| Script | Purpose |
| --- | --- |
| scripts/dev_setup.sh | Full dev environment setup (validates Apple Silicon, macOS, Python) |
| scripts/download_model.sh | Download ML models from HuggingFace |
| scripts/bundle_python.sh | Bundle Python runtime for distribution |
| scripts/create_dmg.sh | Create distributable DMG |
| scripts/sign_app.sh | Code signing |

Known Limitations

  • Debug paths are hardcoded — Python and model paths point to the dev machine. Release builds will need a bundled runtime.
  • App Sandbox disabled in Debug to allow Python subprocess access.
  • Not App Store distributable — requires screen recording permission (direct distribution only).
  • No global keyboard shortcuts yet — KeyboardShortcuts package is integrated but not wired up.

License

All rights reserved.
