
Releases: buckster123/LocalRouter

v0.3.0 — Recipe Editor, vLLM Backend, DeepSeek V4

04 May 12:23


What's New

✏️ Recipe Editor TUI

Browse, create, edit, duplicate, and delete recipes without touching TOML files. Manage GPU tiers and docker images from the same menu. Proper TOML round-trip (tomllib + tomli_w), validation, auto-backup on save.
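
A minimal sketch of the load → edit → back up → write cycle the editor performs; the helper name, recipe layout, and field names below are illustrative, not the actual module API:

import shutil, tomllib, tomli_w
from datetime import datetime, timezone
from pathlib import Path

def save_recipe(path: Path, recipe: dict) -> None:
    """Sketch: validate, back up the existing file, then write the recipe back as TOML."""
    if "model" not in recipe:                       # minimal validation example
        raise ValueError("recipe must define a [model] table")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    shutil.copy2(path, path.with_suffix(f".toml.{stamp}.bak"))   # auto-backup on save
    with path.open("wb") as fh:
        tomli_w.dump(recipe, fh)

# Round-trip: tomllib reads, tomli_w writes back.
path = Path("recipes/example.toml")                 # illustrative path
with path.open("rb") as fh:
    recipe = tomllib.load(fh)
recipe["model"]["ctx_size"] = 32768                 # example edit (hypothetical field)
save_recipe(path, recipe)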

⚡ vLLM Serving Backend

New provider type for models too large for llama.cpp. Tensor-parallel serving across multi-GPU clusters with automatic GPU detection, FlashInfer attention, FP8 KV cache, and reasoning parser support. Based on the official vllm/vllm-openai:v0.20.1 image.
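
A hedged sketch of how launch arguments for this backend might be assembled. The flags shown (tensor parallelism, FP8 KV cache, max model length) are standard vLLM options, but the function, defaults, and model id are illustrative, not the project's actual code:

import torch

def build_vllm_args(model_id: str, max_len: int = 131072) -> list[str]:
    """Illustrative only: assemble vLLM server flags from the detected GPU count."""
    num_gpus = torch.cuda.device_count()            # automatic GPU detection
    return [
        "--model", model_id,
        "--tensor-parallel-size", str(num_gpus),    # shard across all visible GPUs
        "--kv-cache-dtype", "fp8",                  # FP8 KV cache
        "--max-model-len", str(max_len),
    ]

# FlashInfer attention is typically selected via VLLM_ATTENTION_BACKEND=FLASHINFER in the
# container environment, and a reasoning parser can be enabled with --reasoning-parser
# for models that emit thinking traces.
print(" ".join(build_vllm_args("deepseek-ai/DeepSeek-V4-Flash")))   # hypothetical model id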

🧠 DeepSeek V4 Support

  • V4-Flash (284B, 13B active): 7 GGUF recipes via llama.cpp + 2 vLLM recipes
  • V4-Pro (1.6T, 49B active): 5 vLLM recipes across datacenter clusters
  • Custom llama.cpp branch support for models with unmerged upstream PRs
  • Split-file GGUF discovery for large sharded models (see the sketch after this list)
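
For the split-file discovery item above, a minimal sketch assuming llama.cpp's usual -00001-of-000NN.gguf shard naming; the function name and refusal behavior are illustrative:

import re
from pathlib import Path

SPLIT_RE = re.compile(r"-(\d{5})-of-(\d{5})\.gguf$")

def find_first_shard(model_dir: Path) -> Path | None:
    """Return the first shard of a split GGUF, or None if the set is incomplete."""
    shards = sorted(p for p in model_dir.glob("*.gguf") if SPLIT_RE.search(p.name))
    if not shards:
        return None
    first = shards[0]
    expected = int(SPLIT_RE.search(first.name).group(2))
    if len(shards) != expected:
        return None                      # missing shards; refuse to launch
    return first                         # llama-server loads the remaining splits itself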

🖥️ Multi-GPU Cluster Tiers

19 GPU tiers (up from 10), including:

  • 2×/4× H100 SXM (160–320 GB)
  • 2×/4×/5× H200 SXM (282–705 GB)
  • 2×/4× B200 SXM (384–768 GB)
  • 8× H100 and 8× A100 clusters

📊 By the Numbers

  • 70 recipes across 4 providers (vast_gguf, vLLM, Together AI, local)
  • 19 GPU tiers from RTX 4090 to 8×B200 SXM
  • ~5,000 lines of Python across 18 modules
  • 4 docker images: prebuilt, builder, vLLM, legacy

Dependencies

  • Added tomli_w>=1.0.0 for recipe editor TOML write-back

Full Changelog

v0.2.0...v0.3.0

v0.2.0 — Modular Rewrite + 12 Bug Fixes

03 May 09:56


What's New

🏗️ Architecture Overhaul

The monolithic vast_manager.py (3,064 lines) has been split into a clean 16-module Python package (localrouter/):

localrouter/
├── config.py           # paths, TOML loading, presets
├── helpers.py          # shell wrappers, formatting
├── providers.py        # Together AI, endpoint management
├── cost.py             # cost estimation, usage tracking
├── local_endpoint.py   # llama-server lifecycle
├── vast_ops.py         # SSH diagnostics, offer browsing
├── hf_browser.py       # HuggingFace model browser
├── proxy.py            # proxy lifecycle helpers
└── menus/              # TUI menu system
    ├── main.py         # entry point, banner
    ├── provider_menus.py
    ├── local_menus.py
    ├── vast_menus.py
    └── tool_menus.py

📦 Now pip-installable

pip install -e '.[all]'
localrouter              # new CLI command
python -m localrouter    # module entry
./vast_manager.py        # backward compatible
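
The dual entry points typically work through a console-script entry in pyproject.toml plus a package-level __main__.py; a minimal sketch of the latter, assuming the entry function lives in the menus/main.py module shown above (not confirmed against the actual source):

# localrouter/__main__.py (sketch; actual contents may differ)
from localrouter.menus.main import main   # assumed entry function

if __name__ == "__main__":
    main()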

🐛 12 Bug Fixes

ID   Severity   What
C1   CRITICAL   Vast launch crash — gpu_choices undefined after provider selection refactor
C2   CRITICAL   Broken f-string in proxy status (Qwen sanitization artifact)
C3   CRITICAL   Proxy streaming broken — StreamResponse.prepare() missing request arg
C4   CRITICAL   ClientTimeout positional args (aiohttp 3.x keyword-only)
M1   MAJOR      Duplicate capture()/run() dead code removed
M2   MAJOR      menu_diagnose() no longer crashes on local/Together endpoints
M3   MAJOR      PUT/PATCH requests no longer silently converted to POST in proxy
M6   MAJOR      urllib.error import missing in usage tracker
M7   MAJOR      Per-GPU disk size defaults in vast_up.sh were dead code (always 60 GB)
M8   MAJOR      KV cache type for the 256K context preset was dead code (always q8_0, should be q4_0)
M9   MAJOR      --flash-attn on → --flash-attn (bare flag) in launch.sh
M10  MAJOR      Non-thinking mode broken — JSON word-splitting in template kwargs
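
For context on C3 and C4, the corrected aiohttp usage looks roughly like this (a sketch with placeholder URLs, not the project's actual proxy code):

import aiohttp
from aiohttp import web

# C4: construct ClientTimeout with keyword arguments under aiohttp 3.x
timeout = aiohttp.ClientTimeout(total=600, connect=30)

async def stream_upstream(request: web.Request) -> web.StreamResponse:
    """Sketch of a streaming proxy handler with the C3/C4 fixes applied."""
    resp = web.StreamResponse()
    await resp.prepare(request)            # C3: prepare() needs the incoming request
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post("http://upstream/v1/chat/completions",   # placeholder URL
                                json=await request.json()) as upstream:
            async for chunk in upstream.content.iter_chunked(4096):
                await resp.write(chunk)
    await resp.write_eof()
    return resp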

🔧 Additional Improvements

  • Proxy strips hop-by-hop headers before forwarding (see the sketch after this list)
  • Streaming error handling for client disconnect
  • UTC timestamps (was using local time with Z suffix)
  • Fixed vast_up.sh HF token handling (Qwen *** sanitization artifacts)
  • Fixed recipe labels (H200 slot count, Qwen3 vs Qwen3.5)
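
A minimal sketch of the hop-by-hop stripping mentioned in the first bullet, using the header set from RFC 7230; the helper name is illustrative:

# RFC 7230 §6.1 hop-by-hop headers, never forwarded by a proxy
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}

def forwardable_headers(headers: dict[str, str]) -> dict[str, str]:
    """Drop hop-by-hop headers (plus anything named in Connection) before proxying."""
    listed = {
        h.strip().lower()
        for h in headers.get("Connection", "").split(",") if h.strip()
    }
    return {
        k: v for k, v in headers.items()
        if k.lower() not in HOP_BY_HOP and k.lower() not in listed
    }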

Upgrade

git pull origin main
pip install -e '.[all]'
# Run as before — backward compatible