RFC-089 P0 — DGX Spark bring-up + tailnet join + Ollama install

## Status — 2026-06-06

P0 substantially complete. Embedding-shim deliverable (D4) is **superseded by [ADR-098](../blob/main/docs/adr/ADR-098-embedding-provider-profile-axis.md) / PR #899** — the shim is deleted; embeddings run in-process via `sentence_transformers` on the host. Cross-VPS curl acceptance items (GHA runner / drill VPS / prod VPS) **moved to #813 and #814** since they're gated on those phases existing.

Tonight's session landed: Tailscale SSH end-to-end via the new tailnet ACL (operator → `tag:dgx-llm-host:22`), Ollama reachable over the tailnet (the systemd drop-in sets `OLLAMA_HOST=0.0.0.0`, so it listens on all interfaces), three baseline LLMs pulled (`llama3.3:70b`, `qwen2.5:72b-instruct`, `gemma2:27b`), and `make dgx-verify` green over tailnet. The remaining close-the-loop step is pinning the model SHAs in `docs/operations/DGX_MODEL_CATALOG.md`, which ships with the follow-up bundle PR.

---

## Context

Phase P0 of [RFC-089](../blob/main/docs/rfc/RFC-089-dgx-spark-tailnet-integration.md). Hardware-bring-up phase; no code changes in the podcast_scraper repo yet. Pure infra setup. Child of umbrella **#809**.

## Deliverables

### D1 — Hardware on tailnet ✓

1. ✓ DGX Spark powered on, OS updated, NVIDIA drivers verified (`nvidia-smi` shows GB10).
2. ✓ Tailscale installed; operator-account authentication.
3. ✓ Tag set: `tag:dgx-llm-host`.
4. ✓ MagicDNS hostname `dgx-llm-1.tail6d0ed4.ts.net` (renamed from `spark-2c14` so the resolver stem `^dgx-llm(-N)?$` accepts it). Resolver script at `scripts/ops/resolve_dgx_tailnet_host.sh`.
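
The stem check in item 4 can be illustrated with a short Python sketch. It assumes that `-N` in `^dgx-llm(-N)?$` stands for an optional numeric suffix and that the stem is the first DNS label of the MagicDNS name. The real logic lives in `scripts/ops/resolve_dgx_tailnet_host.sh` and may differ; the function name here is hypothetical.

```python
import re

# Assumption: "-N" in the documented stem means an optional numeric suffix.
DGX_STEM = re.compile(r"^dgx-llm(-\d+)?$")


def is_dgx_host(fqdn: str) -> bool:
    """Return True when the first DNS label matches the dgx-llm stem."""
    stem = fqdn.split(".", 1)[0]
    return DGX_STEM.fullmatch(stem) is not None


# The renamed host passes; the old factory hostname does not.
for name in ("dgx-llm-1.tail6d0ed4.ts.net", "spark-2c14.tail6d0ed4.ts.net"):
    print(name, is_dgx_host(name))
```

Under these assumptions the old `spark-2c14` name is rejected, which is why the rename was needed.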

### D2 — Tailscale ACL update ✓

Original network ACL plus the ADR-098 follow-up: `autogroup:admin` is split from `tag:gha-deployer`, and port 22 is added for the operator so Tailscale SSH works (`autogroup:admin → tag:dgx-llm-host:22,11434,8001`). A new top-level `ssh` block allows `autogroup:admin → tag:dgx-llm-host` for users `markodragoljevic` and `root`. The live ACL is synced via the admin console; the repo copy in `tailscale/policy.hujson` follows in the bundle PR alongside the catalog SHA update.
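
As a rough illustration of the two rules described above, the following Python sketch builds the corresponding policy sections as plain data and prints them as JSON (which is also valid HuJSON). It is not a copy of `tailscale/policy.hujson`: the `action` value of the SSH rule is an assumption, and the separate `tag:gha-deployer` rule is left out because its ports are not stated here. The function name is hypothetical.

```python
import json


def build_policy_fragment() -> dict:
    """Return the operator ACL rule and SSH rule as plain data."""
    return {
        "acls": [
            {
                "action": "accept",
                "src": ["autogroup:admin"],
                # SSH (22), Ollama (11434), and the former shim port (8001).
                "dst": ["tag:dgx-llm-host:22,11434,8001"],
            },
            # The tag:gha-deployer rule is omitted: its ports are not
            # stated in this issue.
        ],
        "ssh": [
            {
                "action": "accept",  # assumption: could equally be "check"
                "src": ["autogroup:admin"],
                "dst": ["tag:dgx-llm-host"],
                "users": ["markodragoljevic", "root"],
            }
        ],
    }


print(json.dumps(build_policy_fragment(), indent=2))
```

Keeping the network rule and the `ssh` rule separate mirrors the split described above: the first opens the ports, the second decides which login users Tailscale SSH will accept.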

### D3 — Ollama install + 4 baseline models — **partial**

1. ✓ Ollama installed and reachable over the tailnet (`Environment="OLLAMA_HOST=0.0.0.0"` in the systemd drop-in at `/etc/systemd/system/ollama.service.d/override.conf`; this listens on all interfaces).
2. Pull baseline models:
   - ✓ `llama3.3:70b` (42.5 GB)
   - ✓ `qwen2.5:72b-instruct` (47.4 GB) — matches the tag named in RFC-089
   - ✓ `gemma2:27b` (15.6 GB) — RFC-089 said `gemma2:27b-instruct` but that tag doesn't exist; `gemma2:27b` IS the instruct-tuned default in Ollama
   - **Deferred to #814:** Whisper Large v3 — not in the Ollama library; mechanism choice (Ollama vision/audio extension vs separate FastAPI shim vs another path) is the prod-Whisper-via-DGX phase's call
3. ✓ `/api/tags` returns the 3 LLMs (+ operator-pulled `gpt-oss:120b`).
4. ☐ Pin SHAs in `docs/operations/DGX_MODEL_CATALOG.md` — pending; ships in the bundle PR.
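
Items 3 and 4 above can be sketched in Python: query `/api/tags`, confirm the three baseline tags are present, and collect each model's digest for pinning. This is a minimal sketch, not the `make dgx-verify` implementation. It assumes the `/api/tags` response has a `models` list whose entries carry `name` and `digest` fields; the helper names and the catalog row layout are hypothetical.

```python
import json
import urllib.request

# Baseline tags listed in D3.
EXPECTED = ("llama3.3:70b", "qwen2.5:72b-instruct", "gemma2:27b")


def fetch_tags(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/api/tags and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def missing_models(payload: dict, expected=EXPECTED) -> list:
    """Return the expected tags that the server did not report."""
    present = {m["name"] for m in payload.get("models", [])}
    return [name for name in expected if name not in present]


def catalog_rows(payload: dict, expected=EXPECTED) -> list:
    """Pair each expected tag with its reported digest (hypothetical layout)."""
    return [
        f"{m['name']}  {m['digest']}"
        for m in payload.get("models", [])
        if m["name"] in expected
    ]


# Usage from a machine on the tailnet (not executed here):
#   payload = fetch_tags("http://dgx-llm-1.tail6d0ed4.ts.net:11434")
#   assert not missing_models(payload)
#   print("\n".join(catalog_rows(payload)))
```

Extra models such as the operator-pulled `gpt-oss:120b` are ignored by both helpers, so their presence does not affect the check.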

### ~~D4 — Embedding shim~~

~~Small FastAPI service on port 8001 wrapping `sentence-transformers/all-MiniLM-L6-v2`...~~

**Superseded by [ADR-098](../blob/main/docs/adr/ADR-098-embedding-provider-profile-axis.md) (PR #899).** Empirical A/B on the operator's corpus showed MiniLM beats nomic-embed-text under production-realistic chunking; running embeddings in-process on the host has lower latency (~7 ms p50 vs ~33 ms p50 over HTTP) and removes the DGX availability dependency. The shim is deleted; the architectural option (`vector_embedding_provider: ollama`) remains for future evaluation.
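
For orientation, in-process embedding with `sentence_transformers` and a simple p50 timing loop might look like the sketch below. It is not the harness that produced the latency figures above; the helper names, the run count, and the timing method are assumptions, and the model is loaded lazily so the module imports even when the dependency is absent.

```python
import statistics
import time

MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"


def load_model(name: str = MODEL_NAME):
    """Load the embedding model in-process (downloads weights on first use)."""
    # Imported lazily so this module loads without the dependency installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(name)


def p50_ms(samples_s) -> float:
    """Median of a list of durations in seconds, expressed in milliseconds."""
    return statistics.median(samples_s) * 1000.0


def time_encode(model, texts, runs: int = 20) -> float:
    """Call model.encode(texts) `runs` times and return the p50 latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        model.encode(texts)
        samples.append(time.perf_counter() - start)
    return p50_ms(samples)


# Usage (requires sentence-transformers to be installed; not executed here):
#   model = load_model()
#   print(time_encode(model, ["a short transcript chunk"]))
```

Any object with an `encode` method can be passed to `time_encode`, so the same loop could time an HTTP-backed provider for a like-for-like comparison.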

### D5 — DGX runbook ✓

`docs/guides/DGX_RUNBOOK.md` exists with the 4 sections.

## Acceptance criteria

- [x] `curl http://dgx-llm-1.<tailnet>:11434/api/tags` from operator laptop returns the model list (3 baselines + gpt-oss present; verified `make dgx-verify` green tonight)
- [ ] ~~`curl http://dgx-llm-1.<tailnet>:8001/health` returns 200~~ — moot (ADR-098)
- [ ] ~~`curl` from a GHA tailnet runner (`tag:gha-deployer`) succeeds~~ — **moved to #813** (depends on runner registration)
- [ ] ~~`curl` from the drill VPS (when up) succeeds~~ — **moved to #814 prereq** (drill exercises prod profile, requires #814's `cloud_with_dgx_whisper_primary` to be active)
- [ ] ~~`curl` from the prod VPS succeeds~~ — **moved to #814** (ACL permits per ADR-096; activation happens when prod profile flips)
- [x] `scripts/ops/resolve_dgx_tailnet_host.sh` exists and has functional parity with the prod / drill resolvers
- [ ] `docs/operations/DGX_MODEL_CATALOG.md` with pinned SHAs **(catalog file exists; SHAs pending bundle PR)**
- [x] `docs/guides/DGX_RUNBOOK.md` exists with the 4 sections
- [x] No code changes to `src/podcast_scraper/` in this phase (the #897 work was its own phase / belongs to #812)

**Remaining to close:** the bundle PR with catalog SHA update + `tailscale/policy.hujson` SSH block + `infra/dgx/converge/inventory.py` paramiko kwargs.

## Out of scope (deferred to later phases)

- `tailnet_dgx` provider implementation — **P2 / #812**
- Profile YAMLs (`local_dgx_*`) — **partially delivered in #897 (#812 closes the rest)**
- `cloud_with_dgx_whisper_primary` Whisper provider — **#814**
- GHA self-hosted runner registration — **#813**
- AI comparison guide updates with real measurements — **#812**
- Embedding shim — **superseded by ADR-098 (PR #899)**

## References

- Umbrella: #809
- Supersession PR: #899 (ADR-098 — embedding shim deleted)
- [RFC-089 §Design](../blob/main/docs/rfc/RFC-089-dgx-spark-tailnet-integration.md)
- [ADR-096](../blob/main/docs/adr/ADR-096-dgx-spark-prod-primary-with-fallback.md)
- [ADR-098](../blob/main/docs/adr/ADR-098-embedding-provider-profile-axis.md)

