Summary
FleetHub is a central fleet registry inside TangleClaw, a sibling to PortHub. Where PortHub prevents port conflicts across projects, FleetHub is the single source of truth for the AI inference fleet: the inference backend(s) and the consumer nodes that hit them, along with their wiring, capacity, and keys. Eventually, a TangleClaw master-agent session will read FleetHub to coordinate and rewire the fleet.
Why — motivating case study (real, from a 2026-06 Monad-1 session)
Wiring a fleet of OpenClaw nodes (Volta, TiltClaw, RentalClaw) onto a shared Monad-1 inference backend surfaced the gap repeatedly:
- Fleet state lived scattered across individual Claude/TC session memories + per-node configs. Session/context loss forced re-deriving "which node runs what" multiple times in one effort.
- No source of truth for wiring → a node's LiteLLM key was scoped to an old model set and 403'd on the new model (key_model_access_denied), only discovered at runtime.
- No capacity model → two large models pinned the single GPU at its VRAM ceiling, causing silent eviction/thrashing and minutes-long cold-reload timeouts. A registry aware of model sizes + GPU capacity would have flagged it before wiring.
- Coordination didn't scale → changes were hand-relayed between N per-node sessions, risking two sessions editing the same machine's files (the cross-session write-boundary problem).
As the fleet grows (more OpenClaw nodes), this only gets worse. PortHub solved exactly this shape for ports; the inference fleet needs the same.
Entities to register
- Backends (inference servers): id, host/tailnet, endpoints (e.g. LiteLLM :4000, Ollama :11434, ollama-tee :11435), GPU/VRAM capacity, currently-resident models.
- Models: alias, backend route (e.g. ollama_chat/...), size/VRAM, capabilities (tool-calling, context length, reasoning, tok/s), quant.
- Nodes (consumers): id, host/tailnet, type (e.g. OpenClaw Pattern A/B), assigned backend + model, fallback chain, key alias, config location.
- Keys: per-node key alias, allowed-model scope, rate limits (integrate with LiteLLM key mgmt — would have prevented the 403 above).
- Capacity / residency: resident set + VRAM math, contention/headroom warnings.
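The entity list above could map onto a small set of record types. The following is only a sketch of one possible shape, not a settled schema: the class and field names, the VRAM figure, the model alias, and the config path are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Backend:
    id: str
    host: str                       # host or tailnet name
    endpoints: Dict[str, int]       # service name -> port
    vram_gb: float                  # total GPU memory (placeholder unit: GB)
    resident_models: List[str] = field(default_factory=list)  # model aliases


@dataclass
class Model:
    alias: str
    backend_route: str              # e.g. "ollama_chat/..."
    vram_gb: float                  # estimated footprint when resident
    context_length: int
    tool_calling: bool = False
    reasoning: bool = False
    quant: str = ""


@dataclass
class Key:
    alias: str
    allowed_models: Set[str]        # model aliases this key may call
    rpm_limit: Optional[int] = None # rate limit, if any


@dataclass
class Node:
    id: str
    host: str
    type: str                       # e.g. "OpenClaw Pattern A"
    backend_id: str
    model_alias: str
    key_alias: str
    config_path: str
    fallback_chain: List[str] = field(default_factory=list)


@dataclass
class Fleet:
    backends: Dict[str, Backend] = field(default_factory=dict)
    models: Dict[str, Model] = field(default_factory=dict)
    keys: Dict[str, Key] = field(default_factory=dict)
    nodes: Dict[str, Node] = field(default_factory=dict)


# Illustrative registration; every value below is a placeholder.
fleet = Fleet()
fleet.backends["monad-1"] = Backend(
    id="monad-1", host="monad-1",
    endpoints={"litellm": 4000, "ollama": 11434, "ollama-tee": 11435},
    vram_gb=24.0,
)
fleet.models["model-a"] = Model(
    alias="model-a", backend_route="ollama_chat/...",
    vram_gb=14.0, context_length=32768, tool_calling=True,
)
fleet.keys["volta-key"] = Key(alias="volta-key", allowed_models={"model-a"})
fleet.nodes["volta"] = Node(
    id="volta", host="volta", type="OpenClaw Pattern A",
    backend_id="monad-1", model_alias="model-a",
    key_alias="volta-key", config_path="shared/fleet/volta/openclaw.json",
)
```

Keeping keys and models as separate records, referenced by alias from each node, is what lets the registry cross-check a node's key scope against its assigned model.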
API (mirror PortHub)
- GET /api/fleet: list backends / models / nodes.
- POST /api/fleet/nodes: register/update a node's wiring (backend, model, key, config path).
- POST /api/fleet/assign: assign a model to a node with a capacity/conflict pre-check (like PortHub's conflict check before claiming a port: refuse or warn if VRAM can't hold it alongside resident models).
- POST /api/fleet/release: node decommission.
- Capacity query: "can backend X hold model Y alongside its current residents?"
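The pre-check behind the assign endpoint and the capacity query could be as simple as VRAM arithmetic plus a key-scope lookup. The sketch below assumes static per-model size estimates; the function names, the headroom parameter, and all numbers are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class PrecheckResult:
    ok: bool
    reasons: List[str]


def can_hold(capacity_gb: float, resident_gb: Dict[str, float],
             candidate_gb: float, headroom_gb: float = 0.0) -> bool:
    """True if the candidate fits next to the resident set, with headroom."""
    return sum(resident_gb.values()) + candidate_gb + headroom_gb <= capacity_gb


def precheck_assign(model: str, model_sizes_gb: Dict[str, float],
                    capacity_gb: float, resident_gb: Dict[str, float],
                    key_scope: Set[str], headroom_gb: float = 1.0) -> PrecheckResult:
    """Collect every reason an assignment should be refused or flagged."""
    reasons: List[str] = []
    # Key scope: catches the 403 case before it is hit at runtime.
    if model not in key_scope:
        reasons.append(f"key scope does not include {model}")
    # Capacity: only counts the model if it is not already resident.
    if model not in resident_gb:
        needed = model_sizes_gb[model]
        if not can_hold(capacity_gb, resident_gb, needed, headroom_gb):
            used = sum(resident_gb.values())
            reasons.append(
                f"{model} needs {needed} GB but only "
                f"{capacity_gb - used - headroom_gb} GB is free"
            )
    return PrecheckResult(ok=not reasons, reasons=reasons)


# Placeholder numbers: a 24 GB GPU with one 14 GB model already resident.
result = precheck_assign(
    model="model-b",
    model_sizes_gb={"model-a": 14.0, "model-b": 12.0},
    capacity_gb=24.0,
    resident_gb={"model-a": 14.0},
    key_scope={"model-a"},
)
for reason in result.reasons:
    print("refuse:", reason)
```

Returning all reasons rather than failing on the first lets the caller decide whether to refuse outright or only warn.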
Integration with TC shared dirs / project groups
TangleClaw already has categorized shared project directories. FleetHub should:
- Register node config locations within a group's shared dir, know what each file is for (compose, .env, openclaw.json), and offer a suggested structure / conventions for fleet nodes.
- Inject current fleet state into sessions at launch (like shared-docs injection) so any session — and the future master agent — knows the live fleet without re-deriving it. Directly fixes the context-loss problem above.
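Session-launch injection could amount to rendering the registry into a short text block that is prepended to a session's context. The sketch below assumes a hypothetical shape for the GET /api/fleet response; the field names, the header line, and the sample values are illustrative, not an existing TangleClaw format.

```python
from typing import Any, Dict


def render_fleet_context(fleet: Dict[str, Any]) -> str:
    """Render fleet state as plain text suitable for injecting into a session."""
    lines = ["FLEET STATE (from FleetHub)"]
    for b in fleet.get("backends", []):
        resident = ", ".join(b.get("resident_models", [])) or "none"
        lines.append(f"backend {b['id']} @ {b['host']}: resident = {resident}")
    for n in fleet.get("nodes", []):
        lines.append(
            f"node {n['id']}: backend={n['backend']} model={n['model']} "
            f"key={n['key_alias']} config={n['config_path']}"
        )
    return "\n".join(lines)


# Hypothetical response body; every value is a placeholder.
sample = {
    "backends": [
        {"id": "monad-1", "host": "monad-1", "resident_models": ["model-a"]},
    ],
    "nodes": [
        {"id": "volta", "backend": "monad-1", "model": "model-a",
         "key_alias": "volta-key", "config_path": "shared/fleet/volta/openclaw.json"},
    ],
}
context = render_fleet_context(sample)
print(context)
```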
Coordination (future)
The eventual TC master-agent session reads FleetHub to rewire nodes (swap models, rotate keys, rebalance capacity). FleetHub is the registry; the master agent is the actor. This issue covers the registry; the coordinator is a follow-on.
Relationship to PortHub
Same architectural pattern, different resource:
- PortHub = ports; conflict-prevention; lease/release.
- FleetHub = inference fleet (backends, models, nodes, keys, capacity); assignment + capacity-conflict prevention.
Open questions (to flesh out)
- Scope: just the inference fleet, or any "fleet of agents/services"?
- How tightly to integrate with LiteLLM (keys, model list) vs. keep FleetHub provider-agnostic.
- Capacity model fidelity: static VRAM estimates vs. live ollama ps / nvidia-smi polling.
- Auth/ownership: who may mutate fleet entries.
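To make the capacity-fidelity question concrete, the live-polling option might look roughly like the sketch below, which reads per-GPU memory from nvidia-smi's CSV query output (values in MiB). It is offered only to show what that option would involve, not as a decision; the function names are hypothetical, and the poller has to run on the backend host itself.

```python
import subprocess
from typing import List, Tuple

# Queries used and total memory per GPU, one CSV row per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(output: str) -> List[Tuple[int, int]]:
    """Parse rows like '14336, 24576' into (used_mib, total_mib) tuples."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def poll_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi locally and return per-GPU (used_mib, total_mib)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


def headroom_mib(rows: List[Tuple[int, int]]) -> List[int]:
    """Free memory per GPU in MiB."""
    return [total - used for used, total in rows]
```

A static estimate can flag a conflict before anything is loaded, while polling reports what is actually resident; the two could also be combined, with polling used to correct the estimates.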
Drafted from a live Monad-1 fleet-wiring session (the case study above). Expand as needed.