
feat: auto-ingest loop for /vault/raw#2

Open
tbrownio wants to merge 2 commits into main from worktree-auto-ingest

Conversation

@tbrownio
Collaborator

Summary

  • Adds a supervised ingest-loop.sh daemon inside the autoblog container. It scans /vault/raw/ on an interval; filters files by mtime stability, size, status: draft frontmatter, and sha dedup; batches them under a byte budget; and spawns one fresh claude -p per batch, invoking the existing ingest-source skill.
  • Extends log.md ingest entries with a sha:<12hex> suffix for dedup; migrate-log-sha.sh backfills pre-existing entries on first boot (idempotent, marker-gated).
  • Drive-by fix: bootstrap-volumes.sh now runs cd / before removing /tmp/vault-seed, so the subsequent git clone of /vault-remote.git into /vault has a valid cwd (previously this left /vault with no checkout).
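The per-file gate described above can be sketched in shell. All names here (is_stable, is_draft, already_ingested, LOG_FILE) are illustrative assumptions, not the actual ingest-loop.sh internals:

```shell
# Hypothetical sketch of the per-file gate: mtime stability, draft
# frontmatter, and sha dedup. Names are assumptions for illustration.
STABILITY_SECS="${INGEST_STABILITY_SECS:-60}"
LOG_FILE="${LOG_FILE:-/vault/log.md}"

# File is "stable" once its mtime is at least STABILITY_SECS old.
is_stable() {
  now=$(date +%s)
  mtime=$(stat -c %Y "$1")
  [ $((now - mtime)) -ge "$STABILITY_SECS" ]
}

# Skip files whose frontmatter still says "status: draft".
is_draft() {
  head -n 20 "$1" | grep -q '^status: draft'
}

# Skip files whose 12-hex sha prefix is already recorded in the log.
already_ingested() {
  sha=$(sha256sum "$1" | cut -c1-12)
  grep -q "sha:$sha" "$LOG_FILE"
}
```

A candidate file would be queued only when is_stable succeeds and both is_draft and already_ingested fail.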

Implementation notes

  • The loop is launched from entrypoint.sh via start-stop-daemon --chuid autoblog, alongside astro dev and sshd, and logs to /var/log/ingest-loop.log.
  • Tunables exposed in .env.example: INGEST_ENABLED, INGEST_POLL_INTERVAL (300), INGEST_STABILITY_SECS (60), INGEST_BATCH_BYTES (200000), INGEST_MAX_BATCHES_PER_WAKE (50). Default CLAUDE_PERM_FLAGS="--dangerously-skip-permissions" (verified against installed CLI).
  • Skill prose extended with an "Auto-ingest contract" section instructing the LLM to use the explicit file list and append the sha: suffix; verify_and_patch_log patches the log in place if the LLM forgets.
  • Failure handling: any failed batch triggers a flat 1-hour backoff that holds until the next successful batch.
  • An auto-loop re-ingest reuses the existing wiki/sources/<date>-<slug>.md filename; the date reflects the first ingest, not the re-ingest.
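The flat backoff above can be sketched as follows. This is a minimal illustration under assumed names (maybe_wait_backoff, record_batch_result, BACKOFF_SECS), not the real script:

```shell
# Illustrative flat 1-hour backoff: one failed batch defers the next
# wake by BACKOFF_SECS; any success clears it. Names are assumptions.
BACKOFF_SECS="${BACKOFF_SECS:-3600}"
backoff_until=0

# Sleep out the remaining backoff window, if one is active.
maybe_wait_backoff() {
  now=$(date +%s)
  if [ "$now" -lt "$backoff_until" ]; then
    sleep $((backoff_until - now))
  fi
}

# $1 is the exit status of the claude -p batch just run.
record_batch_result() {
  if [ "$1" -ne 0 ]; then
    backoff_until=$(( $(date +%s) + BACKOFF_SECS ))
  else
    backoff_until=0
  fi
}
```

Because the backoff is flat rather than exponential, a single success resets the loop to its normal poll interval.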

Test plan

  • docker compose build && docker compose up -d boots cleanly; the container reports healthy.
  • ingest-loop.sh running as autoblog; /var/log/ingest-loop.log writable.
  • Push a markdown file to the vault remote via SSH; within one poll the loop detects it, runs claude -p, creates a wiki/sources/<date>-<slug>.md page (verified frontmatter + wikilinks), updates index.md, appends ## [YYYY-MM-DD] ingest | raw/<path> sha:<12hex> to log.md, commits and pushes.
  • Re-ingest path: change content, push; confirm re-ingest | ... sha:<new> log line and source page updated in place.
  • Draft skip: file with status: draft frontmatter is not ingested; flipping to status: ready triggers ingest.
  • Stability window: a file pushed within INGEST_STABILITY_SECS is deferred to the next poll.
  • Oversize batch: a 300 KB file plus a small file produce two distinct batches.
  • INGEST_ENABLED=0 disables without breaking the container.
  • First-run migration: pre-existing ingest | raw/... entries gain sha: suffix on first boot.
  • Manual ingest via SSH/claude still works unchanged.

Notes / follow-ups

  • The plan assumed printf "\0" in awk; Debian bookworm ships mawk, which silently drops \0. build_batches uses printf "%c", 0 instead, which works in mawk, gawk, and nawk.
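The workaround can be demonstrated in isolation (emit_nul_terminated is a hypothetical name for this sketch, not a function from the PR):

```shell
# Portable NUL separator: mawk (Debian's default awk) silently drops a
# literal "\0" in printf, but "%c" with numeric 0 emits the byte in
# mawk, gawk, and nawk alike.
emit_nul_terminated() {
  for f in "$@"; do
    awk -v name="$f" 'BEGIN { printf "%s%c", name, 0 }'
  done
}
```

NUL-terminated records keep arbitrary filenames (including those with spaces or newlines) safe to pass between the batch builder and its consumers, e.g. via xargs -0.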
  • The .ingest-migration-done marker lives at /vault/.ingest-migration-done and gets committed to the vault by the skill's git add -A. It probably belongs in a .gitignore or outside the working tree; minor follow-up.
  • Per-batch latency is dominated by claude -p; loop-side overhead is negligible. Consider porting the batch builder and log-patching logic to TypeScript if the loop grows state (queues, metrics); Node 20 is already in the image.

Commits

  • git clone in the seed block runs from a working directory that was just removed, leaving /vault with no checkout ("this operation must be run in a work tree"). Switch to / before the rm so the subsequent clone has a valid cwd.
  • Adds a supervised background loop inside the autoblog container that scans /vault/raw every INGEST_POLL_INTERVAL seconds, filters by mtime stability, size, draft frontmatter, and sha-based dedup, then spawns claude -p once per byte-budgeted batch so each ingest pass runs in a fresh Claude Code context. Log entries gain a sha:<prefix> suffix; a one-shot migration backfills pre-existing entries on first boot.