[feature] FleetHub — central fleet registry (sibling to PortHub) for the AI inference fleet #332

## Summary
A central **fleet registry** inside TangleClaw — a sibling to PortHub. Where PortHub prevents port conflicts across projects, **FleetHub** is the single source of truth for the AI inference fleet: the inference backend(s) and the consumer nodes that hit them, their wiring, capacity, and keys. Eventually a TangleClaw master-agent session reads FleetHub to coordinate/rewire the fleet.

## Why — motivating case study (real, from a 2026-06 Monad-1 session)
Wiring a fleet of OpenClaw nodes (Volta, TiltClaw, RentalClaw) onto a shared Monad-1 inference backend surfaced the gap repeatedly:
- **Fleet state lived scattered** across individual Claude/TC session memories + per-node configs. Session/context loss forced re-deriving "which node runs what" multiple times in one effort.
- **No source of truth for wiring** → a node's LiteLLM key was scoped to an old model set and 403'd on the new model (`key_model_access_denied`), only discovered at runtime.
- **No capacity model** → two large models pinned the single GPU at its VRAM ceiling, causing silent eviction/thrashing and minutes-long cold-reload timeouts. A registry aware of model sizes + GPU capacity would have flagged it *before* wiring.
- **Coordination didn't scale** → changes were hand-relayed between N per-node sessions, risking two sessions editing the same machine's files (the cross-session write-boundary problem).

As the fleet grows (more OpenClaw nodes), these problems only get worse. PortHub solved exactly this shape of problem for ports; the inference fleet needs the same.

## Entities to register
- **Backends** (inference servers): id, host/tailnet, endpoints (e.g. LiteLLM `:4000`, Ollama `:11434`, ollama-tee `:11435`), GPU/VRAM capacity, currently-resident models.
- **Models**: alias, backend route (e.g. `ollama_chat/...`), size/VRAM, capabilities (tool-calling, context length, reasoning, tok/s), quant.
- **Nodes** (consumers): id, host/tailnet, type (e.g. OpenClaw Pattern A/B), assigned backend + model, fallback chain, key alias, config location.
- **Keys**: per-node key alias, allowed-model scope, rate limits (integrate with LiteLLM key mgmt — would have prevented the 403 above).
- **Capacity / residency**: resident set + VRAM math, contention/headroom warnings.
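
To make the entity list concrete, here is a minimal sketch of how the records might be modeled. All class and field names (`Model`, `Backend`, `Key`, `Node`, `vram_gb`, `key_allows`, and so on) are illustrative assumptions, not an existing TangleClaw or LiteLLM schema, and the fields are a subset of those listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    alias: str
    route: str        # backend route string (hypothetical value in usage below)
    vram_gb: float    # static VRAM estimate, an assumption of this sketch


@dataclass
class Backend:
    id: str
    host: str
    vram_capacity_gb: float
    resident_models: list[str] = field(default_factory=list)  # model aliases


@dataclass
class Key:
    alias: str
    allowed_models: set[str] = field(default_factory=set)  # allowed-model scope


@dataclass
class Node:
    id: str
    host: str
    backend_id: str
    model_alias: str
    key_alias: str
    fallback_chain: list[str] = field(default_factory=list)
    config_path: str = ""


def key_allows(key: Key, model_alias: str) -> bool:
    """Return True if the key's allowed-model scope includes the model."""
    return model_alias in key.allowed_models
```

With records like these, the scope mismatch from the case study could be checked at registration time rather than at runtime: `key_allows(Key("volta-key", {"old-model"}), "new-model")` returns `False` (both the key alias and the model aliases here are made up for the example).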

## API (mirror PortHub)
- `GET /api/fleet` — list backends / models / nodes.
- `POST /api/fleet/nodes` — register/update a node's wiring (backend, model, key, config path).
- `POST /api/fleet/assign` — assign a model to a node **with a capacity/conflict pre-check** (like PortHub's conflict check before claiming a port — refuse/warn if VRAM can't hold it alongside resident models).
- `POST /api/fleet/release` — release a node's assignment when the node is decommissioned.
- Capacity query: "can backend X hold model Y alongside its current residents?"
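
The capacity pre-check behind `assign` and the capacity query could be sketched roughly as below. This assumes static per-model VRAM estimates in GB (one of the open options, not a decision), and the function name `can_hold`, its parameters, and the `headroom_gb` reserve are all hypothetical.

```python
def can_hold(
    capacity_gb: float,
    resident_gb: dict[str, float],
    candidate: str,
    candidate_gb: float,
    headroom_gb: float = 0.0,
) -> tuple[bool, float]:
    """Check whether a backend can hold a candidate model alongside its residents.

    Returns (fits, free_after_gb). A candidate that is already resident
    costs no additional VRAM.
    """
    used = sum(resident_gb.values())
    extra = 0.0 if candidate in resident_gb else candidate_gb
    free_after = capacity_gb - used - extra
    return free_after >= headroom_gb, free_after
```

For example, with made-up numbers, `can_hold(24.0, {"a": 14.0}, "b", 12.0)` returns `(False, -2.0)`: the second model would overshoot the GPU by 2 GB, which is the kind of conflict the registry should refuse or warn about before wiring.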

## Integration with TC shared dirs / project groups
TangleClaw already has categorized shared project directories. FleetHub should:
- Register node config locations within a group's shared dir, know what each file is for (compose, `.env`, `openclaw.json`), and offer a **suggested structure / conventions** for fleet nodes.
- Inject current fleet state into sessions at launch (like shared-docs injection) so any session — and the future master agent — knows the live fleet without re-deriving it. Directly fixes the context-loss problem above.
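
As a rough illustration of the injection idea, fleet state could be rendered into a short text block that is prepended to a session at launch. The function name `render_fleet_summary`, the dict keys, and the output layout are assumptions made for this sketch; the actual injection mechanism would follow whatever shared-docs injection already does.

```python
def render_fleet_summary(nodes: list[dict[str, str]]) -> str:
    """Render node wiring as a compact Markdown block for session injection."""
    lines = ["## Fleet state"]
    # Sort by node id so the summary is stable between launches.
    for node in sorted(nodes, key=lambda n: n["id"]):
        lines.append(
            f"- {node['id']}: backend={node['backend']} "
            f"model={node['model']} key={node['key']}"
        )
    return "\n".join(lines)
```

Called with entries for the nodes in the case study (model and key values would come from the registry), this yields one line per node, so a new session can read the current wiring instead of re-deriving it.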

## Coordination (future)
The eventual TC **master-agent session** reads FleetHub to rewire nodes (swap models, rotate keys, rebalance capacity). FleetHub is the *registry*; the master agent is the *actor*. This issue covers the registry; the coordinator is a follow-on.

## Relationship to PortHub
Same architectural pattern, different resource:
- **PortHub** = ports; conflict-prevention; lease/release.
- **FleetHub** = inference fleet (backends, models, nodes, keys, capacity); assignment + capacity-conflict prevention.

## Open questions (to flesh out)
- Scope: just the inference fleet, or any "fleet of agents/services"?
- How tightly to integrate with LiteLLM (keys, model list) vs. keeping FleetHub provider-agnostic.
- Capacity model fidelity: static VRAM estimates vs. live `ollama ps` / `nvidia-smi` polling.
- Auth/ownership: who may mutate fleet entries.

---
_Drafted from a live Monad-1 fleet-wiring session (the case study above). Expand as needed._

