AGENTS.md

Project Overview

stash is a local-first, Pocket-like CLI app built with TypeScript and SQLite.

Primary goal:

Save links quickly.
Organize links with tags.
Query data in deterministic, machine-friendly JSON for AI agent workflows.

Current implementation status:

Implemented: save, list, tags list, tag add, tag rm, mark read, mark unread, plus read/unread aliases.
Implemented: async tts job system (stash tts, stash tts status, stash tts doctor, stash jobs worker) with durable SQLite state.
Implemented: web/API TTS playback metadata (item_audio, latest-per-item) and extracted-content availability flags.
Implemented: migration tooling (db migrate, db doctor) and baseline schema.
Implemented: automatic migration application for normal data commands.
Implemented: content extraction on save using Mozilla Readability (stores in notes table).
Implemented: thumbnail extraction (metadata-first with content-image fallback) persisted on items.thumbnail_url.
Implemented: automatic X/Twitter status/<id> extraction via Playwright Chromium rendered DOM (public-only, single-status only, no generic fallback for X URLs).
Implemented: extract command to extract or re-extract content for existing items.
Implemented: optional local auto-tagging on save/extract (--auto-tags) with deterministic rules + Python sentence-transformers backend fallback behavior.
Implemented: auto-tag provenance tracking in item_tags (is_manual, is_auto, auto_score, auto_source, auto_model, auto_updated_at).
Implemented: stash tags doctor diagnostics for local auto-tags runtime checks.
Not implemented yet: archive, delete, open, and search commands.

Stack

Node.js 22.x (see .nvmrc; .node-version is included for tool compatibility)
TypeScript
Commander (CLI framework)
Fastify (API server framework)
SQLite via better-sqlite3
dotenv for local script .env loading
Drizzle ORM + Drizzle Kit (schema/migrations)
Package manager: pnpm
Content extraction: Mozilla Readability + linkedom
X/Twitter extraction: Playwright Chromium (headless browser) for public status/<id> URLs
Auto-tags embeddings backend: Python sentence-transformers helper (scripts/auto-tags-embed.py)
TTS provider (default): Coqui TTS (Python 3.11 + espeak-ng)
Default Coqui voice: tts_models/en/vctk/vits|p241
CLI discovery standardized across providers: PATH first, optional env overrides (STASH_GTTS_CLI, STASH_COQUI_TTS_CLI, STASH_FFMPEG_CLI, STASH_SAY_CLI, STASH_AFCONVERT_CLI, STASH_ESPEAK_CLI)
Fallback providers available: Google TTS (gtts), macOS say
Web frontend stack: React + Vite + Material UI
Installed design-review skills:
- web-design-guidelines for baseline UI/accessibility audits
- ui-ux-pro-max for design pattern exploration and ideation

Repository Layout

apps/cli/src/cli.ts: Main CLI command handlers (including stash web).
apps/api/: Local REST API server (Fastify) + PWA static/proxy split-stack orchestration.
apps/web/: React web frontend (feature-centered structure).
packages/core/: Shared DB/domain logic used by CLI + API app.
scripts/with-env.mjs: Script wrapper to auto-load .env for local npm scripts.
drizzle/: SQL migration files.
drizzle.config.ts: Drizzle config.
dist/: compiled output.

Setup

Select the repo Node version (Node 22):

nvm use

Install dependencies:

pnpm install

Create local env file:

cp .env.example .env

Bootstrap:

pnpm run setup

If SQLite native binding errors appear (Could not locate the bindings file) or you recently switched Node versions (ABI mismatch), allow native builds and rebuild/reinstall after nvm use:

nvm use
pnpm approve-builds
pnpm rebuild better-sqlite3
pnpm install

Run the local web app (single command):

pnpm run web

For frontend development with Vite hot reload, use:

pnpm run dev:stack

dev:stack runs API on 4173 and Vite HMR on 5173.

pnpm run setup also installs Playwright Chromium for X/Twitter status/<id> extraction.

Manual recovery (for browser cache cleanup or Playwright updates):

pnpm exec playwright install chromium

Web server defaults (overridable in .env or CLI flags):

STASH_WEB_HOST=0.0.0.0 is the recommended daemon/Tailnet-friendly override.
When STASH_WEB_HOST is unset, foreground stash web defaults to 127.0.0.1 and stash web --daemon defaults to 0.0.0.0.
STASH_API_PORT=4173
STASH_PWA_PORT=5173
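
The host and port defaults above can be summarized as a small resolution function. This is a minimal sketch under stated assumptions, not stash's actual code: resolveWebConfig, WebMode, and WebConfig are hypothetical names, and the sketch models only the environment variables, not the --host, --api-port, or --pwa-port CLI flags.

```typescript
// Hypothetical sketch of the documented defaults; not the real stash implementation.
type WebMode = "foreground" | "daemon";

interface WebConfig {
  host: string;
  apiPort: number;
  pwaPort: number;
}

function resolveWebConfig(
  env: Record<string, string | undefined>,
  mode: WebMode,
): WebConfig {
  // An explicit STASH_WEB_HOST wins; otherwise the default depends on the mode.
  const host =
    env["STASH_WEB_HOST"] ?? (mode === "daemon" ? "0.0.0.0" : "127.0.0.1");
  const apiPort = Number(env["STASH_API_PORT"] ?? "4173");
  const pwaPort = Number(env["STASH_PWA_PORT"] ?? "5173");

  // Port range validation is an assumption added for the sketch.
  for (const port of [apiPort, pwaPort]) {
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`Invalid port: ${port}`);
    }
  }
  // stash web fails fast when the API and PWA would bind the same port.
  if (apiPort === pwaPort) {
    throw new Error("API and PWA ports must differ.");
  }
  return { host, apiPort, pwaPort };
}
```

With an empty environment, the foreground mode resolves to 127.0.0.1 with ports 4173 and 5173, while the daemon mode resolves to 0.0.0.0.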

Database and Migrations

Default DB path:

~/.stash/stash.db

Override path:

CLI flag: --db-path <path>
or env var: STASH_DB_PATH=<path>
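
The override rules can be sketched as a pure function that follows the precedence given in the implementation notes (--db-path, then STASH_DB_PATH, then ~/.stash/stash.db). The name resolveDbPath and its parameters are hypothetical; treating blank values as unset and joining the default path with "/" are assumptions made for the sketch.

```typescript
// Hypothetical helper illustrating the documented precedence:
// --db-path > STASH_DB_PATH > ~/.stash/stash.db
function resolveDbPath(
  flagValue: string | undefined,
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  // Assumption: a blank flag value is treated the same as an absent flag.
  if (flagValue !== undefined && flagValue.trim() !== "") {
    return flagValue;
  }
  const fromEnv = env["STASH_DB_PATH"];
  if (fromEnv !== undefined && fromEnv.trim() !== "") {
    return fromEnv;
  }
  // Default location under the user's home directory (POSIX-style join).
  return `${homeDir}/.stash/stash.db`;
}
```

The home directory is passed in explicitly so the function stays deterministic and easy to test.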

Local development default:

.env.example sets STASH_DB_PATH=.db/stash.db
The dev, setup, start, db:migrate, and db:doctor scripts auto-load .env

Run migration status check:

pnpm run db:doctor -- --json

Apply migrations (usually optional; the CLI auto-applies pending migrations on normal commands):

pnpm run db:migrate -- --json

Generate new migrations from packages/core/src/db/schema.ts changes:

pnpm run db:generate

Content Extraction

When saving URLs, stash automatically:

Fetches the web page
Extracts readable content using Mozilla Readability
Stores the text in the notes table
Extracts a thumbnail URL (og:image/twitter:image first, article image fallback) into items.thumbnail_url
Updates the item title if extraction finds a better one
For public X/Twitter status/<id> URLs, renders the page in Playwright Chromium and extracts from the rendered DOM
X extraction is public-only, single-status only, and does not fall back to generic Readability on failure (strict no-partial-text behavior)
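
The routing decision for X/Twitter URLs depends on recognizing a status/<id> link. The sketch below shows one way such detection could look; it is not the project's actual matcher. The name parseXStatusId, the host list, and the exact /<user>/status/<id> path shape are all assumptions made for illustration.

```typescript
// Assumption: these hosts and the /<user>/status/<id> path shape are illustrative guesses,
// not the list stash actually uses.
const X_STATUS_HOSTS = new Set([
  "x.com",
  "www.x.com",
  "twitter.com",
  "www.twitter.com",
  "mobile.twitter.com",
]);

// Returns the numeric status id as a string, or null when the URL is not a single-status link.
function parseXStatusId(rawUrl: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return null; // Not a parseable absolute URL.
  }
  if (!X_STATUS_HOSTS.has(parsed.hostname)) {
    return null;
  }
  // pathname excludes the query string, so tracking parameters do not interfere.
  const match = /^\/[^/]+\/status\/(\d+)\/?$/.exec(parsed.pathname);
  return match?.[1] ?? null;
}
```

A non-null result would send the URL to the Playwright renderer; a null result leaves it on the regular Readability path.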

To skip extraction (for faster saves or non-article URLs):

stash save https://example.com --no-extract

Core Commands

Save URL:

stash save https://example.com --title "Example" --tag ai --tag typescript --json
stash save https://example.com --auto-tags --json

Save without content extraction:

stash save https://example.com --title "Example" --tag ai --no-extract --json

List items:

stash list --status unread --tag ai --tag-mode all --limit 20 --offset 0 --json

List available tags:

stash tags list --limit 50 --offset 0 --json

Add/remove item tag:

stash tag add 1 ai --json
stash tag rm 1 ai --json

Mark read/unread:

stash mark read 1 --json
stash mark unread 1 --json

Aliases:

stash read 1 --json
stash unread 1 --json

Generate TTS audio:

stash tts 1 --json
stash tts 1 --wait --json
stash tts status 12 --json
stash tts doctor --json
stash jobs worker --once --json

Extract or re-extract content:

stash extract 1 --json
stash extract 1 --force --json
stash extract 1 --auto-tags --json
stash tags doctor --json
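
For agents that assemble commands programmatically, the list invocation shown earlier in this section can be built from a typed options object. This is a sketch only: buildListArgs and ListOptions are hypothetical names, and the function emits just the flags that appear in this document.

```typescript
// Hypothetical options object; only flags documented above are represented.
interface ListOptions {
  status?: string;
  tags?: string[];
  tagMode?: string;
  limit?: number;
  offset?: number;
}

// Builds the argument vector for `stash list ... --json` in a fixed flag order.
function buildListArgs(options: ListOptions): string[] {
  const args: string[] = ["list"];
  if (options.status !== undefined) {
    args.push("--status", options.status);
  }
  // --tag is repeatable, so emit one flag per tag.
  for (const tag of options.tags ?? []) {
    args.push("--tag", tag);
  }
  if (options.tagMode !== undefined) {
    args.push("--tag-mode", options.tagMode);
  }
  if (options.limit !== undefined) {
    args.push("--limit", String(options.limit));
  }
  if (options.offset !== undefined) {
    args.push("--offset", String(options.offset));
  }
  // Always request machine-readable output.
  args.push("--json");
  return args;
}
```

Passing the result as an argument array to a process spawner, rather than concatenating a shell string, avoids quoting problems with tag values.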

UI/UX Workflow for Agents

When making web UI changes, use this sequence:

Implement feature/layout change in apps/web.
Run build checks:

pnpm run build
pnpm --dir apps/web build

Run a quick review with web-design-guidelines (primary quality gate).
Use ui-ux-pro-max only when extra pattern/style exploration is needed.

Guidance:

Treat web-design-guidelines as the baseline checklist.
Treat ui-ux-pro-max as inspiration/reference, not strict rules.
Preserve stash visual identity while improving mobile usability.

Agent-Friendly Interface Contract

Determinism

list sort order is fixed: created_at DESC, id DESC.
Tag normalization: trim + lowercase.
Repeated --tag values are de-duplicated after normalization.
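
The determinism rules above can be illustrated with two small functions. This is a sketch, not the code in packages/core: normalizeTags, ListRow, and compareListRows are hypothetical names, dropping empty tags is an assumption, and created_at is assumed to be an ISO-8601 string so that lexicographic comparison matches chronological order.

```typescript
// Trim + lowercase each tag, then de-duplicate while keeping first-seen order.
function normalizeTags(tags: readonly string[]): string[] {
  const seen = new Set<string>();
  for (const raw of tags) {
    const tag = raw.trim().toLowerCase();
    // Assumption: tags that are empty after trimming are dropped.
    if (tag !== "") {
      seen.add(tag);
    }
  }
  return Array.from(seen);
}

// Hypothetical row shape; created_at is assumed to be an ISO-8601 timestamp string.
interface ListRow {
  id: number;
  created_at: string;
}

// Comparator for the fixed order: created_at DESC, then id DESC as a tie-breaker.
function compareListRows(a: ListRow, b: ListRow): number {
  if (a.created_at !== b.created_at) {
    return a.created_at < b.created_at ? 1 : -1;
  }
  return b.id - a.id;
}
```

The id tie-breaker is what keeps pagination stable when several items share the same created_at value.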

JSON Mode

All major read/mutation commands support --json.

Typical success shape:

{
  "ok": true
}

Typical error shape:

{
  "ok": false,
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Item id must be a positive integer."
  }
}

Exit Codes

0: success
1: internal/unexpected error
2: validation or usage/migration-required errors
3: not found
4: reserved for conflict (not yet emitted by all commands)
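
An agent consuming stash output might combine the exit-code table with the JSON error shape roughly as follows. The names describeExitCode, ExitMeaning, StashFailure, and parseFailure are hypothetical, and this document does not say which stream carries the JSON, so the parser simply takes the captured text as input.

```typescript
type ExitMeaning =
  | "success"
  | "internal_error"
  | "validation_or_usage"
  | "not_found"
  | "conflict"
  | "unknown";

// Maps the documented exit codes to labels; anything else is reported as unknown.
function describeExitCode(code: number): ExitMeaning {
  switch (code) {
    case 0:
      return "success";
    case 1:
      return "internal_error";
    case 2:
      return "validation_or_usage";
    case 3:
      return "not_found";
    case 4:
      return "conflict";
    default:
      return "unknown";
  }
}

interface StashFailure {
  code: string;
  message: string;
}

// Returns the error payload when the text matches the documented error shape, else null.
function parseFailure(output: string): StashFailure | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(output);
  } catch {
    return null; // Not JSON at all.
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const envelope = parsed as { ok?: unknown; error?: unknown };
  if (envelope.ok !== false || typeof envelope.error !== "object" || envelope.error === null) {
    return null;
  }
  const err = envelope.error as { code?: unknown; message?: unknown };
  if (typeof err.code !== "string" || typeof err.message !== "string") {
    return null;
  }
  return { code: err.code, message: err.message };
}
```

Checking the exit code first and only then parsing the payload lets a caller distinguish a not-found item from a validation problem without matching on message text.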

Schema Summary (Current)

items
- Core bookmark record and state: status, timestamps, URL metadata, extracted thumbnail URL (thumbnail_url).
tags
- Unique tag names.
item_tags
- Many-to-many link between items and tags with provenance metadata:
  - is_manual, is_auto, auto_score, auto_source, auto_model, auto_updated_at.
notes
- Optional per-item note content.
item_audio
- Latest generated TTS artifact metadata per item (file_name, provider/voice/format, bytes, timestamp).
tts_jobs
- Durable async TTS queue records (queued|running|succeeded|failed) with per-job error/output metadata.

Initial SQL migration:

drizzle/0000_init.sql
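
The tts_jobs states listed in the schema summary can be modeled as a string-literal union. The sketch below is illustrative rather than the project's actual types: TTS_JOB_STATUSES, isTtsJobStatus, and isTerminalStatus are hypothetical names, and treating succeeded and failed as the terminal states is an inference from the status list.

```typescript
// The four documented job states, in lifecycle order.
const TTS_JOB_STATUSES = ["queued", "running", "succeeded", "failed"] as const;

type TtsJobStatus = (typeof TTS_JOB_STATUSES)[number];

// Narrows an arbitrary string (for example a value read from SQLite or JSON) to the union.
function isTtsJobStatus(value: string): value is TtsJobStatus {
  return (TTS_JOB_STATUSES as readonly string[]).includes(value);
}

// Assumption: a job stops changing once it has succeeded or failed.
function isTerminalStatus(jobStatus: TtsJobStatus): boolean {
  return jobStatus === "succeeded" || jobStatus === "failed";
}
```

A polling client can use a check like isTerminalStatus to decide when to stop requesting job status.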

Development Workflow

Edit code in apps/cli/src/ and packages/.
Build with pnpm run build.
Run DB checks:

pnpm run db:doctor -- --json

Test CLI behavior from compiled output:

node apps/cli/dist/cli.js --help
node apps/cli/dist/cli.js list --help

Pre-PR Handoff Gate

Before opening a PR/MR, run local quality gates and keep them green.

Code changes: must run pnpm run check and fix all failures before PR/MR creation.
Docs-only changes: must run pnpm run format:check before PR/MR creation.
PR flow: use existing gh tooling (gh pr create), then watch checks with gh pr checks --watch.
If CI fails, fix locally, re-run the local gate, push, and re-check until all required checks are green.

Documentation Maintenance Rule

⚠️ When adding new features or modifying existing CLI behavior, update ALL THREE documentation files in the same change:

AGENTS.md - Technical details and implementation notes
README.md - User-facing overview and quick start
docs/CLI_REFERENCE.md - Detailed command reference

This ensures all documentation stays in sync.

Updates should include:

New commands/subcommands
New flags/options/defaults
Output shape changes (especially --json)
Error/exit-code behavior changes
Dependencies or stack changes
Architectural decisions

Implementation Notes

The CLI strips a standalone -- separator in argv parsing to keep pnpm run <script> -- --json working.
setup builds and runs migrations for first-run convenience.
setup also installs Playwright Chromium (pnpm exec playwright install chromium) so X/Twitter headless extraction works after bootstrap.
Normal data commands auto-run pending migrations.
.env is git-ignored; .env.example is committed as the local template.
.db/ is git-ignored local runtime data for repository-local development.
Local npm scripts load .env using dotenv via scripts/with-env.mjs.
CLI DB path precedence remains: --db-path > STASH_DB_PATH > ~/.stash/stash.db.
Auto-tags env controls:
- STASH_AUTO_TAGS_ENABLED, STASH_AUTO_TAGS_MAX, STASH_AUTO_TAGS_MIN_SCORE
- STASH_AUTO_TAGS_MODEL, STASH_AUTO_TAGS_BACKEND, STASH_AUTO_TAGS_PYTHON, STASH_AUTO_TAGS_HELPER
stash web now supports foreground and daemon control flags:
- stash web / stash web --foreground run attached in the current terminal.
- stash web --daemon starts a detached supervisor that restarts the combined web runner on unexpected exit.
- stash web --status reads daemon state/log metadata.
- stash web --stop sends SIGTERM to the daemon supervisor and waits for shutdown.
stash web accepts --host, --api-port, --pwa-port; --status/--stop reject host/port overrides.
stash web --daemon is idempotent for agent callers: a second start returns the existing daemon state instead of failing.
stash web keeps API + PWA in one runtime process; the daemon supervises that single runner rather than separate API/PWA children.
stash web persists daemon files under ~/.stash/ with workspace fallback .stash/ (web-daemon.pid, web-daemon.log, web-daemon.state.json).
stash web startup/status output includes local URLs and best-effort Tailnet URLs from tailscale status --json; loopback binds warn that Tailnet access is unavailable.
stash web still fails fast on identical API/PWA ports.
Web dev (apps/web Vite) reads the same root .env port variables and uses strict port binding.
pnpm run dev:stack starts API (pnpm run dev:api, default API port 4173) and Vite HMR (pnpm run dev:web, default 5173) together for frontend hot reload.
Web UI uses a single mobile-first, single-column layout path across all viewport sizes; desktop split-pane rendering paths were intentionally removed.
Async TTS defaults:
- stash tts <id> enqueues and returns immediately.
- stash tts <id> --wait waits for terminal status.
- stash tts doctor validates local Coqui/espeak/ffmpeg dependencies and reports CLI flag compatibility.
- stash jobs worker runs queue processing loop (--once for one job).
Web/API item payloads now include:
- has_extracted_content: boolean
- tts_audio: null | { file_name, format, provider, voice, bytes, generated_at }
POST /api/items/:id/tts is enqueue-first and returns job metadata (job, poll_url, poll_interval_ms).
POST /api/items and POST /api/items/:id/extract accept optional autoTags: boolean.
Web save UX requests auto-tags when the tags field is left empty.
GET /api/tts-jobs/:id and GET /api/items/:id/tts-jobs expose job status/history for polling/recovery.
tts auto-generated filenames use friendly slugs + timestamp + short random suffix and collision fallback (_2, _3, ...).
Vitest sets STASH_TTS_MOCK_BASE64 in test/vitest.setup.ts for deterministic TTS tests; default pnpm test does not require local Coqui/espeak binaries.
extractContent() routes supported X/Twitter status/<id> URLs to a Playwright-based renderer/parser and may throw typed extraction errors (dependency/launch/timeout/render-blocked) so CLI/API layers can surface actionable diagnostics while preserving EXTRACTION_FAILED.
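
The first note in this section, about stripping a standalone -- separator, can be sketched as below. The function name stripStandaloneSeparator is hypothetical, and removing only the first occurrence is an assumption; the real parser may behave differently.

```typescript
// Removes the first standalone "--" so that `pnpm run <script> -- --json`
// reaches the CLI parser as if "--json" had been passed directly.
function stripStandaloneSeparator(argv: readonly string[]): string[] {
  const index = argv.indexOf("--");
  if (index === -1) {
    return argv.slice(); // Nothing to strip; return a copy.
  }
  return [...argv.slice(0, index), ...argv.slice(index + 1)];
}
```

Only an exact "--" element is removed; flags such as --json that merely start with two dashes are left untouched.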

Near-Term Roadmap

Add archive, delete, open.
Add search command (full-text search leveraging extracted content).
Add PDF export for offline reading.
Add import/export.

Agent notes

Whenever you learn something new about the codebase, are corrected by the user after a mistake, or find yourself repeatedly tweaking commands that fail at first, write it down in .agents/notes.md. This file is just for you; your user won't read it.
Before writing to it, check whether the idea (not the exact wording) is already present. If so, increment the counter in the prefix (e.g. from [0] to [1]). If it is completely new, prefix it with [0]. Once a note reaches a count of 3, codify it in the ## Misc section of this AGENTS.md file.
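
The counter convention for .agents/notes.md can be sketched as a single-line transform. The names bumpNoteCounter and BumpResult are hypothetical, and the sketch assumes each note occupies one line that starts with its [n] prefix.

```typescript
interface BumpResult {
  line: string;
  shouldCodify: boolean;
}

// Increments an existing "[n]" prefix, or adds "[0]" to a note that has none.
function bumpNoteCounter(noteLine: string): BumpResult {
  const match = /^\[(\d+)\]\s*(.*)$/.exec(noteLine);
  if (match === null) {
    // A brand-new idea starts at zero.
    return { line: `[0] ${noteLine.trim()}`, shouldCodify: false };
  }
  const count = Number(match[1]) + 1;
  // A note that reaches 3 should be moved into the Misc section of AGENTS.md.
  return { line: `[${count}] ${match[2]}`, shouldCodify: count >= 3 };
}
```

Deciding whether two notes express the same idea still requires judgment; the function only handles the bookkeeping once that decision is made.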

Misc

Prefer Drizzle ORM for database access in application/runtime code.
Use raw SQL string queries only when Drizzle does not support the required functionality clearly or safely (for example, specialized migration-runner behavior).
Prefer explicit return types on functions and methods (especially exported/public APIs and non-trivial helpers).
Prefer explicit named types over inferred meta-types like ReturnType<typeof ...> in app/core code and tests.
Dependency injection pattern: prefer createXService/createCoreServices factory objects over classes, and inject least-privilege service slices into API route plugins and command handlers.
In this sandbox, local listener startup (for example Vite/dev servers on 127.0.0.1) may fail with listen EPERM; validate listener-bound flows in a less restricted environment when needed.
