RFC-089 P0 — DGX Spark bring-up + tailnet join + Ollama install #810

@chipi

Status — 2026-06-06

P0 substantially complete. Embedding-shim deliverable (D4) is superseded by ADR-098 / PR #899 — the shim is deleted; embeddings run in-process via sentence_transformers on the host. Cross-VPS curl acceptance items (GHA runner / drill VPS / prod VPS) moved to #813 and #814 since they're gated on those phases existing.

Tonight's session landed: Tailscale SSH end-to-end via the new tailnet ACL (operator → tag:dgx-llm-host:22), Ollama bound to tailnet (OLLAMA_HOST=0.0.0.0 systemd drop-in), three baseline LLMs pulled (llama3.3:70b, qwen2.5:72b-instruct, gemma2:27b), and make dgx-verify green over tailnet. Pinning the model SHAs in docs/operations/DGX_MODEL_CATALOG.md is the remaining close-the-loop step and ships with the follow-up bundle PR.


Context

Phase P0 of RFC-089. Hardware-bring-up phase; no code changes in the podcast_scraper repo yet. Pure infra setup. Child of umbrella #809.

Deliverables

D1 — Hardware on tailnet ✓

  1. ✓ DGX Spark powered on, OS updated, NVIDIA drivers verified (nvidia-smi shows GB10).
  2. ✓ Tailscale installed; operator-account authentication.
  3. ✓ Tag set: tag:dgx-llm-host.
  4. ✓ MagicDNS hostname dgx-llm-1.tail6d0ed4.ts.net (renamed from spark-2c14 so the resolver stem ^dgx-llm(-N)?$ accepts it). Resolver script at scripts/ops/resolve_dgx_tailnet_host.sh.
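The stem check that motivated the rename can be illustrated with a short sketch. This is not the contents of scripts/ops/resolve_dgx_tailnet_host.sh, which is not shown here. It assumes that N in ^dgx-llm(-N)?$ stands for a decimal number and that the stem is the first DNS label of the MagicDNS name. The function name `stem_is_accepted` is hypothetical.

```python
import re

# Assumption: "N" in the documented stem pattern means one or more decimal digits.
STEM_PATTERN = re.compile(r"^dgx-llm(-\d+)?$")


def stem_is_accepted(magicdns_name: str) -> bool:
    """Return True if the first DNS label matches the dgx-llm stem pattern."""
    stem = magicdns_name.split(".", 1)[0]
    return STEM_PATTERN.match(stem) is not None


if __name__ == "__main__":
    # The new hostname should pass; the pre-rename hostname should not.
    for name in ("dgx-llm-1.tail6d0ed4.ts.net", "spark-2c14.tail6d0ed4.ts.net"):
        print(name, stem_is_accepted(name))
```

Under these assumptions the old spark-2c14 name is rejected because its first label does not begin with dgx-llm.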

D2 — Tailscale ACL update ✓

Original network ACL plus the ADR-098 follow-up: autogroup:admin is split from tag:gha-deployer and gains :22 so that Tailscale SSH works for the operator (autogroup:admin → tag:dgx-llm-host:22,11434,8001). A new top-level ssh block allows autogroup:admin → tag:dgx-llm-host for users markodragoljevic and root. The live ACL is synced via the admin console; the repo copy, tailscale/policy.hujson, follows in the bundle PR alongside the catalog SHA update.
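For orientation, the two rules described above might look roughly like the fragment below once serialized. This is a sketch built from the description in this issue and Tailscale's general policy layout (an "acls" list and a top-level "ssh" list). The real tailscale/policy.hujson is not shown here and may differ; the variable name `policy_fragment` is hypothetical.

```python
import json

# Assumed shape of the relevant policy entries; the real policy file may differ.
policy_fragment = {
    "acls": [
        {
            "action": "accept",
            "src": ["autogroup:admin"],
            # Port 22 is the addition that lets the operator reach Tailscale SSH.
            "dst": ["tag:dgx-llm-host:22,11434,8001"],
        },
        # The separate tag:gha-deployer rule is omitted: its ports are not stated here.
    ],
    "ssh": [
        {
            "action": "accept",
            "src": ["autogroup:admin"],
            "dst": ["tag:dgx-llm-host"],
            "users": ["markodragoljevic", "root"],
        },
    ],
}

if __name__ == "__main__":
    # Print as plain JSON, which is also valid HuJSON.
    print(json.dumps(policy_fragment, indent=2))
```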

D3 — Ollama install + 4 baseline models — partial

  1. ✓ Ollama installed and reachable over the tailnet (Environment="OLLAMA_HOST=0.0.0.0" systemd drop-in at /etc/systemd/system/ollama.service.d/override.conf; note that 0.0.0.0 listens on every interface, not only the Tailscale one).
  2. Pull baseline models:
    • llama3.3:70b (42.5 GB)
    • qwen2.5:72b-instruct (47.4 GB) — matches the tag specified in RFC-089
    • gemma2:27b (15.6 GB) — RFC-089 said gemma2:27b-instruct but that tag doesn't exist; gemma2:27b IS the instruct-tuned default in Ollama
    • Whisper Large v3: deferred to #814 ("RFC-089 P4 — Prod Whisper primary on DGX with cloud fallback (ADR-096 contract)"). It is not in the Ollama library; the mechanism choice (Ollama vision/audio extension vs separate FastAPI shim vs another path) is left to that prod-Whisper-via-DGX phase.
  3. /api/tags returns the 3 LLMs (+ operator-pulled gpt-oss:120b).
  4. ☐ Pin SHAs in docs/operations/DGX_MODEL_CATALOG.md — deferred to the bundle PR.
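Items 3 and 4 can be checked by hand with a small script. The sketch below is not what make dgx-verify runs (that target is not shown here); it only assumes that Ollama's /api/tags endpoint returns a JSON object with a "models" list whose entries carry "name" and "digest" fields. The function names are hypothetical.

```python
import json
import urllib.request

# Baseline LLMs listed in this issue.
EXPECTED_MODELS = {"llama3.3:70b", "qwen2.5:72b-instruct", "gemma2:27b"}


def fetch_tags(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/api/tags and return the decoded JSON payload."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_digests(tags_payload: dict) -> dict:
    """Map each installed model name to its digest (empty string if absent)."""
    return {m["name"]: m.get("digest", "") for m in tags_payload.get("models", [])}


def missing_models(tags_payload: dict, expected=EXPECTED_MODELS) -> set:
    """Return the expected model names that the payload does not list."""
    return set(expected) - set(model_digests(tags_payload))
```

Calling `fetch_tags("http://dgx-llm-1.tail6d0ed4.ts.net:11434")` from a machine on the tailnet and passing the result to `missing_models` should give an empty set if all three LLMs are present; `model_digests` then yields the values one might copy into the catalog.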

D4 — Embedding shim

Small FastAPI service on port 8001 wrapping sentence-transformers/all-MiniLM-L6-v2...

Superseded by ADR-098 (PR #899). Empirical A/B on the operator's corpus showed MiniLM beats nomic-embed-text under production-realistic chunking; running embeddings in-process on the host has lower latency (~7 ms p50 vs ~33 ms p50 over HTTP) and removes the DGX availability dependency. The shim is deleted; the architectural option (vector_embedding_provider: ollama) remains for future evaluation.
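A p50 figure like the ones quoted above is the median of repeated wall-clock timings. The sketch below shows one way such a number could be measured; it is not the benchmark used for ADR-098, and `p50_ms` and the stand-in workload are hypothetical.

```python
import statistics
import time
from typing import Callable


def p50_ms(fn: Callable[[], object], runs: int = 50) -> float:
    """Call fn `runs` times and return the median latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; swap in the real embedding call to compare paths.
    print(f"p50: {p50_ms(lambda: sum(range(10_000))):.3f} ms")
```

To compare the two paths, the callable could wrap an in-process `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(...)` call on one side and an HTTP request to the embedding endpoint on the other, using the same input text for both.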

D5 — DGX runbook ✓

docs/guides/DGX_RUNBOOK.md exists with the 4 sections.

Acceptance criteria

Remaining to close: the bundle PR with catalog SHA update + tailscale/policy.hujson SSH block + infra/dgx/converge/inventory.py paramiko kwargs.

Out of scope (deferred to later phases)

References

Metadata

Labels

dgx (DGX Spark tailnet integration (RFC-089) — non-prod LLM + embedding backend), infrastructure
