[feature] FleetHub — central fleet registry (sibling to PortHub) for the AI inference fleet #332

@Jason-Vaughan

Summary

A central fleet registry inside TangleClaw — a sibling to PortHub. Where PortHub prevents port conflicts across projects, FleetHub is the single source of truth for the AI inference fleet: the inference backend(s) and the consumer nodes that hit them, their wiring, capacity, and keys. Eventually a TangleClaw master-agent session reads FleetHub to coordinate/rewire the fleet.

Why — motivating case study (real, from a 2026-06 Monad-1 session)

Wiring a fleet of OpenClaw nodes (Volta, TiltClaw, RentalClaw) onto a shared Monad-1 inference backend surfaced the gap repeatedly:

  • Fleet state was scattered across individual Claude/TangleClaw (TC) session memories and per-node configs. Session/context loss forced re-deriving "which node runs what" multiple times in one effort.
  • No source of truth for wiring → a node's LiteLLM key was still scoped to an old model set and returned 403 on the new model (key_model_access_denied); this was only discovered at runtime.
  • No capacity model → two large models pinned the single GPU at its VRAM ceiling, causing silent eviction/thrashing and minutes-long cold-reload timeouts. A registry aware of model sizes + GPU capacity would have flagged it before wiring.
  • Coordination didn't scale → changes were hand-relayed between N per-node sessions, risking two sessions editing the same machine's files (the cross-session write-boundary problem).

As the fleet grows (more OpenClaw nodes), this only gets worse. PortHub solved exactly this shape for ports; the inference fleet needs the same.

Entities to register

  • Backends (inference servers): id, host/tailnet, endpoints (e.g. LiteLLM :4000, Ollama :11434, ollama-tee :11435), GPU/VRAM capacity, currently-resident models.
  • Models: alias, backend route (e.g. ollama_chat/...), size/VRAM, capabilities (tool-calling, context length, reasoning, tok/s), quant.
  • Nodes (consumers): id, host/tailnet, type (e.g. OpenClaw Pattern A/B), assigned backend + model, fallback chain, key alias, config location.
  • Keys: per-node key alias, allowed-model scope, rate limits (integrate with LiteLLM key mgmt — would have prevented the 403 above).
  • Capacity / residency: resident set + VRAM math, contention/headroom warnings.
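
To make the entity list concrete, here is a minimal sketch of how the records might look as Python dataclasses. All field names, the 24 GB VRAM figure, the key and model aliases, and the host values are illustrative assumptions, not a proposed schema; only the endpoint ports and node name come from the list above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Backend:
    id: str
    host: str                      # hostname or tailnet name
    endpoints: Dict[str, int]      # service name -> port
    vram_total_gb: float           # GPU capacity (unit is an assumption)
    resident_models: List[str] = field(default_factory=list)


@dataclass
class Model:
    alias: str
    route: str                     # backend route, e.g. "ollama_chat/..."
    vram_gb: float
    quant: str = ""
    capabilities: Dict[str, object] = field(default_factory=dict)


@dataclass
class Key:
    alias: str
    allowed_models: Set[str]       # allowed-model scope
    rpm_limit: Optional[int] = None


@dataclass
class Node:
    id: str
    host: str
    type: str                      # e.g. an OpenClaw pattern label
    backend_id: str
    model_alias: str
    key_alias: str
    fallback_chain: List[str] = field(default_factory=list)
    config_path: str = ""


def key_allows(key: Key, model_alias: str) -> bool:
    """True if the key's scope includes the given model alias."""
    return model_alias in key.allowed_models


# Hypothetical values: a node wired to a model its key does not cover.
monad = Backend(
    id="monad-1",
    host="monad-1",
    endpoints={"litellm": 4000, "ollama": 11434, "ollama-tee": 11435},
    vram_total_gb=24.0,
)
volta_key = Key(alias="volta-key", allowed_models={"old-model"})
volta = Node(
    id="volta", host="volta", type="openclaw-pattern-a",
    backend_id="monad-1", model_alias="new-model", key_alias="volta-key",
)
print(key_allows(volta_key, volta.model_alias))  # False
```

With keys and nodes registered side by side, the scope mismatch from the case study becomes a lookup at wiring time rather than a 403 at runtime.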

API (mirror PortHub)

  • GET /api/fleet — list backends / models / nodes.
  • POST /api/fleet/nodes — register/update a node's wiring (backend, model, key, config path).
  • POST /api/fleet/assign — assign a model to a node with a capacity/conflict pre-check (like PortHub's conflict check before claiming a port — refuse/warn if VRAM can't hold it alongside resident models).
  • POST /api/fleet/release — node decommission.
  • Capacity query: "can backend X hold model Y alongside its current residents?"
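
The capacity query and the pre-check behind POST /api/fleet/assign could be sketched roughly as follows. This is an assumption-laden illustration: the function names, the fixed headroom, the model aliases, and the static VRAM sizes are all hypothetical, and the denial reason simply reuses the error name from the case study.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class AssignResult:
    ok: bool
    reason: str


def can_hold(vram_total_gb: float, resident_gb: Dict[str, float],
             alias: str, size_gb: float, headroom_gb: float = 1.0) -> bool:
    """Can the backend hold `alias` alongside its current residents?"""
    if alias in resident_gb:
        return True  # already resident, so no additional VRAM is needed
    used = sum(resident_gb.values())
    return used + size_gb + headroom_gb <= vram_total_gb


def precheck_assign(vram_total_gb: float, resident_gb: Dict[str, float],
                    alias: str, size_gb: float,
                    key_allowed_models: Set[str]) -> AssignResult:
    """Refuse an assignment before wiring if key scope or VRAM would fail."""
    if alias not in key_allowed_models:
        return AssignResult(False, "key_model_access_denied")
    if not can_hold(vram_total_gb, resident_gb, alias, size_gb):
        return AssignResult(False, "insufficient_vram")
    return AssignResult(True, "ok")


# Hypothetical numbers: a 24 GB GPU with one 14 GB model already resident.
resident = {"big-a": 14.0}
print(precheck_assign(24.0, resident, "big-b", 12.0, {"big-a", "big-b"}))
print(precheck_assign(24.0, resident, "small", 6.0, {"small"}))
```

Whether a failed check should refuse outright or only warn is left open, as in the bullet above; the sketch refuses.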

Integration with TC shared dirs / project groups

TangleClaw already has categorized shared project directories. FleetHub should:

  • Register node config locations within a group's shared dir, know what each file is for (compose, .env, openclaw.json), and offer a suggested structure / conventions for fleet nodes.
  • Inject current fleet state into sessions at launch (like shared-docs injection) so any session — and the future master agent — knows the live fleet without re-deriving it. Directly fixes the context-loss problem above.
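
As a rough illustration of the injection idea, the registry could render a short plain-text summary that is prepended to a session at launch. The dictionary layout, the function name, and every value below (sizes, aliases, the config path) are hypothetical; how TangleClaw actually injects shared docs is not specified here.

```python
from typing import Any, Dict


def render_fleet_summary(fleet: Dict[str, Any]) -> str:
    """Render registry contents as a compact text block for a session."""
    lines = ["Fleet state (from FleetHub):"]
    for backend in fleet.get("backends", []):
        residents = backend["resident_models"]
        used = sum(fleet["models"][m]["vram_gb"] for m in residents)
        names = ", ".join(residents) if residents else "none"
        lines.append(
            f"- backend {backend['id']}: {used:.1f}/"
            f"{backend['vram_total_gb']:.1f} GB resident ({names})"
        )
    for node in fleet.get("nodes", []):
        lines.append(
            f"- node {node['id']}: backend={node['backend_id']} "
            f"model={node['model_alias']} key={node['key_alias']} "
            f"config={node['config_path']}"
        )
    return "\n".join(lines)


# Hypothetical registry contents.
fleet = {
    "models": {"big-a": {"vram_gb": 14.0}},
    "backends": [
        {"id": "monad-1", "vram_total_gb": 24.0, "resident_models": ["big-a"]},
    ],
    "nodes": [
        {"id": "volta", "backend_id": "monad-1", "model_alias": "big-a",
         "key_alias": "volta-key", "config_path": "nodes/volta/openclaw.json"},
    ],
}
print(render_fleet_summary(fleet))
```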

Coordination (future)

The eventual TC master-agent session reads FleetHub to rewire nodes (swap models, rotate keys, rebalance capacity). FleetHub is the registry; the master agent is the actor. This issue covers the registry; the coordinator is a follow-on.

Relationship to PortHub

Same architectural pattern, different resource:

  • PortHub = ports; conflict-prevention; lease/release.
  • FleetHub = inference fleet (backends, models, nodes, keys, capacity); assignment + capacity-conflict prevention.

Open questions (to flesh out)

  • Scope: just the inference fleet, or any "fleet of agents/services"?
  • How tightly to integrate with LiteLLM (keys, model list) vs. keep FleetHub provider-agnostic.
  • Capacity model fidelity: static VRAM estimates vs. live ollama ps / nvidia-smi polling.
  • Auth/ownership: who may mutate fleet entries.
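
On the capacity-fidelity question, one possible shape for the live-polling option is to read used and total GPU memory from nvidia-smi's CSV query output and derive headroom from it. This is only a sketch: the function names are hypothetical, the sample values are made up, and it assumes the poller runs on the GPU host itself.

```python
import subprocess
from typing import List, Tuple

# With "nounits", nvidia-smi reports these memory fields as bare MiB values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' rows (one per GPU) into integer MiB pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def headroom_mib(rows: List[Tuple[int, int]]) -> List[int]:
    """Free VRAM per GPU in MiB."""
    return [total - used for used, total in rows]


def poll_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi locally; raises if the tool is missing or fails."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Parsing a made-up sample line, so no GPU is needed to try the sketch.
sample = "20480, 24576\n"
print(headroom_mib(parse_gpu_memory(sample)))  # [4096]
```

Static estimates remain useful for pre-checks on models that are not yet loaded; live polling only shows what is resident now.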

Drafted from a live Monad-1 fleet-wiring session (the case study above). Expand as needed.
