P0 substantially complete. Embedding-shim deliverable (D4) is superseded by ADR-098 / PR #899 — the shim is deleted; embeddings run in-process via sentence_transformers on the host. Cross-VPS curl acceptance items (GHA runner / drill VPS / prod VPS) moved to #813 and #814 since they're gated on those phases existing.
Tonight's session landed: Tailscale SSH end-to-end via the new tailnet ACL (operator → tag:dgx-llm-host:22), Ollama bound to tailnet (OLLAMA_HOST=0.0.0.0 systemd drop-in), three baseline LLMs pulled (llama3.3:70b, qwen2.5:72b-instruct, gemma2:27b), and make dgx-verify green over tailnet. Pinning the model SHAs in docs/operations/DGX_MODEL_CATALOG.md is the remaining close-the-loop step and ships with the follow-up bundle PR.
Context
Phase P0 of RFC-089. Hardware-bring-up phase; no code changes in the podcast_scraper repo yet. Pure infra setup. Child of umbrella #809.
Deliverables
D1 — Hardware on tailnet ✓
✓ DGX Spark powered on, OS updated, NVIDIA drivers verified (nvidia-smi shows GB10).
✓ MagicDNS hostname dgx-llm-1.tail6d0ed4.ts.net (renamed from spark-2c14 so the resolver stem ^dgx-llm(-N)?$ accepts it). Resolver script at scripts/ops/resolve_dgx_tailnet_host.sh.
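The resolver stem check can be illustrated with a minimal Python sketch. It assumes that `N` in `^dgx-llm(-N)?$` stands for one or more digits, and that the stem is matched against the first DNS label of the MagicDNS name. `host_label_matches` is a hypothetical helper for illustration, not the logic inside scripts/ops/resolve_dgx_tailnet_host.sh.

```python
import re

# Assumption: "N" in the documented stem ^dgx-llm(-N)?$ means one or more digits.
DGX_STEM = re.compile(r"^dgx-llm(-\d+)?$")


def host_label_matches(fqdn: str) -> bool:
    """Return True if the first DNS label of a MagicDNS name fits the stem.

    Hypothetical helper for illustration; the real resolver is a shell script.
    """
    label = fqdn.split(".", 1)[0]
    return DGX_STEM.fullmatch(label) is not None


if __name__ == "__main__":
    for name in ("dgx-llm-1.tail6d0ed4.ts.net", "spark-2c14.tail6d0ed4.ts.net"):
        print(name, host_label_matches(name))
```

Under this reading, the old `spark-2c14` label fails the match and `dgx-llm-1` passes, which is why the rename was needed.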
D2 — Tailscale ACL update ✓
Original network ACL + ADR-098 follow-up: split autogroup:admin from tag:gha-deployer and add :22 for the operator so Tailscale SSH works (autogroup:admin → tag:dgx-llm-host:22,11434,8001). Plus a new top-level ssh block (autogroup:admin → tag:dgx-llm-host for users markodragoljevic + root). Live ACL synced via admin console; repo tailscale/policy.hujson follows in the bundle PR alongside the catalog SHA update.
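For reference, the two rules described above might look roughly like this when written as the JSON that policy.hujson accepts (HuJSON is JSON plus comments and trailing commas). This sketch uses only the selectors, ports and users named in this issue. The `accept` actions are an assumption, the separate tag:gha-deployer rule is omitted because its ports are not listed here, and the repo's actual policy file may differ.

```python
import json


def dgx_policy_fragment() -> dict:
    """Sketch of the DGX-related ACL and SSH rules (hypothetical shape)."""
    return {
        "acls": [
            {
                # Operator access: SSH (22), Ollama (11434), 8001 (the D4 shim port).
                "action": "accept",
                "src": ["autogroup:admin"],
                "dst": ["tag:dgx-llm-host:22,11434,8001"],
            },
        ],
        "ssh": [
            {
                # Tailscale SSH into the tagged DGX host as these local users.
                "action": "accept",  # assumption; "check" is the other option
                "src": ["autogroup:admin"],
                "dst": ["tag:dgx-llm-host"],
                "users": ["markodragoljevic", "root"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(dgx_policy_fragment(), indent=2))
```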
D3 — Ollama install + 4 baseline models — partial
✓ Ollama installed and reachable over the tailnet: the systemd drop-in at /etc/systemd/system/ollama.service.d/override.conf sets Environment="OLLAMA_HOST=0.0.0.0", so Ollama listens on all interfaces (port 11434 by default).
Pull baseline models:
✓ llama3.3:70b (42.5 GB)
✓ qwen2.5:72b-instruct (47.4 GB) — matches the tag RFC-089 specified.
✓ gemma2:27b (15.6 GB) — RFC-089 said gemma2:27b-instruct but that tag doesn't exist; gemma2:27b IS the instruct-tuned default in Ollama
✓ /api/tags returns the 3 LLMs (+ operator-pulled gpt-oss:120b).
☐ Pin SHAs in docs/operations/DGX_MODEL_CATALOG.md — lands in the follow-up bundle PR.
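The /api/tags check and the SHA-pinning step can both be driven from the same response. The sketch below is an assumption about how one might do it, not how make dgx-verify or the catalog is actually implemented. It reads Ollama's /api/tags JSON (a `models` list whose entries carry `name` and `digest`), reports which baselines are missing, and collects per-model digests for the catalog. `fetch_tags`, `missing_baselines` and `model_digests` are hypothetical names.

```python
import json
import urllib.request

# The three baseline LLMs pulled in D3.
BASELINES = ("llama3.3:70b", "qwen2.5:72b-instruct", "gemma2:27b")


def fetch_tags(host: str, port: int = 11434, timeout: float = 10.0) -> dict:
    """GET Ollama's /api/tags over the tailnet and decode the JSON body."""
    url = f"http://{host}:{port}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def missing_baselines(tags: dict, required=BASELINES) -> list:
    """Return the required model names that /api/tags does not list."""
    present = {model.get("name") for model in tags.get("models", [])}
    return [name for name in required if name not in present]


def model_digests(tags: dict) -> dict:
    """Map each listed model name to its digest, e.g. for pinning in a catalog."""
    return {
        model["name"]: model["digest"]
        for model in tags.get("models", [])
        if "name" in model and "digest" in model
    }


if __name__ == "__main__":
    tags = fetch_tags("dgx-llm-1.tail6d0ed4.ts.net")
    print("missing:", missing_baselines(tags))
    for name, digest in sorted(model_digests(tags).items()):
        print(f"{name}\t{digest}")
```

Extra models such as gpt-oss:120b do not affect the missing-baseline check; they simply show up in the digest listing.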
D4 — Embedding shim
Small FastAPI service on port 8001 wrapping sentence-transformers/all-MiniLM-L6-v2...
Superseded by ADR-098 (PR #899). Empirical A/B on the operator's corpus showed MiniLM beats nomic-embed-text under production-realistic chunking; running embeddings in-process on the host has lower latency (~7 ms p50 vs ~33 ms p50 over HTTP) and removes the DGX availability dependency. The shim is deleted; the architectural option (vector_embedding_provider: ollama) remains for future evaluation.
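A minimal sketch of the in-process path, and of how a p50 latency figure like the ones above could be measured. It assumes the `sentence_transformers` package is installed on the host; the import is lazy so the timing helper loads without it. `load_in_process_embedder` and `p50_latency_ms` are hypothetical helpers, and the numbers quoted in this issue were not produced by this code.

```python
import statistics
import time
from typing import Callable, List, Sequence

MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"


def load_in_process_embedder(
    model_name: str = MODEL_NAME,
) -> Callable[[Sequence[str]], List[List[float]]]:
    """Load the model once and return a function that embeds a batch of texts."""
    # Lazy import: only needed when the in-process provider is actually used.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)

    def embed(texts: Sequence[str]) -> List[List[float]]:
        # encode() returns a NumPy array by default; convert to plain lists.
        return model.encode(list(texts)).tolist()

    return embed


def p50_latency_ms(fn: Callable, payload, runs: int = 50, warmup: int = 3) -> float:
    """Call fn(payload) repeatedly and return the median wall-clock time in ms."""
    for _ in range(warmup):
        fn(payload)  # exclude first-call effects such as lazy initialisation
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    embed = load_in_process_embedder()
    print(f"in-process p50: {p50_latency_ms(embed, ['a short transcript chunk']):.1f} ms")
```

Timing an HTTP call to a remote embedding endpoint with the same helper would give the comparable over-the-wire figure.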
D5 — DGX runbook ✓
docs/guides/DGX_RUNBOOK.md exists with the 4 sections.
Status — 2026-06-06
Acceptance criteria
curl http://dgx-llm-1.<tailnet>:11434/api/tags from operator laptop returns the model list (3 baselines + gpt-oss present; verified make dgx-verify green tonight)
curl http://dgx-llm-1.<tailnet>:8001/health returns 200 — moot (ADR-098)
curl from a GHA tailnet runner (tag:gha-deployer) succeeds — moved to #813 (depends on runner registration)
curl from the drill VPS (when up) succeeds — moved to #814 prereq (drill exercises prod profile, requires #814's cloud_with_dgx_whisper_primary to be active)
curl from the prod VPS succeeds — moved to #814 (ACL permits per ADR-096; activation happens when prod profile flips)
scripts/ops/resolve_dgx_tailnet_host.sh exists + functional parity with prod / drill resolvers
docs/operations/DGX_MODEL_CATALOG.md with pinned SHAs (catalog file exists; SHAs pending bundle PR)
docs/guides/DGX_RUNBOOK.md exists with the 4 sections
No code changes in src/podcast_scraper/ in this phase (the #897 work was its own phase / belongs to #812)
Remaining to close: the bundle PR with catalog SHA update + tailscale/policy.hujson SSH block + infra/dgx/converge/inventory.py paramiko kwargs.
Out of scope (deferred to later phases)
tailnet_dgx provider implementation — P2 / #812
local_dgx_* profiles — delivered partially in #897 (#812 closes the rest)
cloud_with_dgx_whisper_primary Whisper provider — #814
References