git clone in the seed block runs from a working directory that was just
removed, leaving /vault with no checkout ("this operation must be run in
a work tree"). Switch to / before the rm so the subsequent clone has a
valid cwd.
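
The failure mode can be reproduced outside the container. The following is a minimal sketch, not the actual `bootstrap-volumes.sh` code: `remove_seed_dir` is a hypothetical helper name, and the demo uses a throwaway `mktemp` directory rather than the real seed path.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): remove a scratch directory without
# leaving the shell stranded inside it.
remove_seed_dir() {
    seed_dir=$1
    # Step out first. If the shell's cwd is inside seed_dir, removing it
    # leaves every later command with a working directory that no longer
    # exists, which is what broke the subsequent git clone.
    cd / || return 1
    rm -rf -- "$seed_dir"
}

# Demo on a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
remove_seed_dir "$demo_dir"
pwd    # prints "/"
```

Moving to `/` is the simplest choice because it always exists; any directory that outlives the `rm` would work equally well.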
Adds a supervised background loop inside the autoblog container that scans /vault/raw every INGEST_POLL_INTERVAL seconds, filters by mtime stability, size, draft frontmatter, and sha-based dedup, then spawns claude -p once per byte-budgeted batch so each ingest pass runs in a fresh Claude Code context. Log entries gain a sha:<prefix> suffix; a one-shot migration backfills pre-existing entries on first boot.
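
As a rough illustration of the per-file filter described above, here is a sketch under several assumptions: the function names `is_draft` and `is_candidate` are hypothetical, the hash is assumed to be SHA-256 truncated to 12 hex characters, and GNU `stat` is assumed (as on Debian). The size rule is left out because its threshold is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: function names and argument order are illustrative.
STABILITY_SECS=${INGEST_STABILITY_SECS:-60}

# Succeeds if the file's leading frontmatter block contains "status: draft".
is_draft() {
    awk '
        NR == 1 { if ($0 != "---") exit; next }   # no frontmatter block
        $0 == "---" { exit }                       # end of frontmatter
        /^status:[ \t]*draft[ \t]*$/ { draft = 1; exit }
        END { exit !draft }
    ' "$1"
}

# is_candidate FILE LOG NOW: succeeds if FILE should be ingested on this pass.
is_candidate() {
    local file="$1" log="$2" now="$3" mtime sha
    mtime=$(stat -c %Y "$file") || return 1       # GNU stat: mtime as epoch seconds
    # Skip files modified too recently; they may still be mid-write.
    [ $((now - mtime)) -ge "$STABILITY_SECS" ] || return 1
    # Skip drafts.
    is_draft "$file" && return 1
    # Skip content whose hash prefix is already recorded in the log.
    sha=$(sha256sum "$file" | cut -c1-12)         # assumption: SHA-256, 12 hex chars
    grep -qF "sha:$sha" "$log" 2>/dev/null && return 1
    return 0
}
```

A caller would typically loop over the files under the raw directory and pass `$(date +%s)` as the third argument, collecting the paths for which `is_candidate` succeeds.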
## Summary

- Adds an `ingest-loop.sh` daemon inside the autoblog container that scans `/vault/raw/` on an interval, filters by mtime stability + size + `status: draft` frontmatter + sha dedup, batches by byte budget, and spawns one fresh `claude -p` per batch invoking the existing `ingest-source` skill.
- `log.md` ingest entries gain a `sha:<12hex>` suffix for dedup; `migrate-log-sha.sh` backfills pre-existing entries on first boot (idempotent, marker-gated).
- `bootstrap-volumes.sh` now runs `cd /` before removing `/tmp/vault-seed` so the subsequent `git clone /vault-remote.git /vault` has a valid cwd (previously left `/vault` with no checkout).

## Implementation notes

- Started from `entrypoint.sh` via `start-stop-daemon --chuid autoblog` alongside astro dev and sshd. Logs to `/var/log/ingest-loop.log`.
- `.env.example`: `INGEST_ENABLED`, `INGEST_POLL_INTERVAL` (300), `INGEST_STABILITY_SECS` (60), `INGEST_BATCH_BYTES` (200000), `INGEST_MAX_BATCHES_PER_WAKE` (50). Default `CLAUDE_PERM_FLAGS="--dangerously-skip-permissions"` (verified against installed CLI).
- The LLM is expected to write the `sha:` suffix; `verify_and_patch_log` patches the log in place if the LLM forgets.
- Re-ingest keeps the existing `wiki/sources/<date>-<slug>.md` filename; the date reflects first ingest, not re-ingest.

## Test plan

- `docker compose build && docker compose up -d` clean boot, container healthy.
- `ingest-loop.sh` running as `autoblog`; `/var/log/ingest-loop.log` writable.
- A new raw file spawns `claude -p`, creates a `wiki/sources/<date>-<slug>.md` page (verified frontmatter + wikilinks), updates `index.md`, appends `## [YYYY-MM-DD] ingest | raw/<path> sha:<12hex>` to `log.md`, commits and pushes.
- Changed content produces a `re-ingest | ... sha:<new>` log line and the source page is updated in place.
- A file with `status: draft` frontmatter is not ingested; flipping to `status: ready` triggers ingest.
- A file modified within `STABILITY_SECS` is deferred to the next poll.
- `INGEST_ENABLED=0` disables without breaking the container.
- Pre-existing `ingest | raw/...` entries gain the `sha:` suffix on first boot.
- Manual `ingest` via SSH/claude still works unchanged.

## Notes / follow-ups

- Avoid `printf "\0"` in awk: Debian bookworm ships mawk, which silently drops `\0`. `build_batches` uses `printf "%c", 0` instead, which works in mawk, gawk, and nawk.
- The `.ingest-migration-done` marker lives at `/vault/.ingest-migration-done` and gets committed to the vault by the skill's `git add -A`. It probably belongs in a `.gitignore` or outside the working tree (minor follow-up).
- Runtime is dominated by `claude -p`; loop-side overhead is negligible. Consider porting the batch builder + log-patching logic to TypeScript if the loop grows state (queues, metrics), since Node 20 is already in the image.
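
To make the awk NUL workaround and the byte-budget batching concrete, here is a small sketch. It is not the PR's batch builder: the function name `batch_by_bytes`, the `<size><TAB><path>` input format, and the `<batch><TAB><path>` output records are all assumptions chosen for illustration. The only detail taken from the note above is emitting the NUL byte with `printf "%c", 0`.

```shell
#!/bin/sh
# Hypothetical sketch: group files into batches that stay under a byte budget.
# Input (stdin):  one "<size-bytes><TAB><path>" line per file.
# Output (stdout): NUL-terminated "<batch-number><TAB><path>" records.
batch_by_bytes() {
    awk -F '\t' -v budget="$1" '
        {
            size = $1 + 0
            # Start a new batch when this file would push the current one
            # over budget. An oversized file still gets a batch of its own.
            if (used > 0 && used + size > budget) { batch++; used = 0 }
            used += size
            printf "%d\t%s", batch + 1, $2
            # Emit the NUL terminator via %c rather than a "\0" escape,
            # which mawk silently drops.
            printf "%c", 0
        }'
}

# Usage: three files against a 200-byte budget, shown one record per line.
printf '120\ta.md\n90\tb.md\n50\tc.md\n' | batch_by_bytes 200 | tr '\0' '\n'
```

With these inputs, `a.md` lands in batch 1 and `b.md` and `c.md` share batch 2. NUL-terminated records keep paths containing spaces intact when piped to consumers such as `xargs -0`.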