Releases: buckster123/LocalRouter
v0.3.0 — Recipe Editor, vLLM Backend, DeepSeek V4
What's New
✏️ Recipe Editor TUI
Browse, create, edit, duplicate, and delete recipes without touching TOML files. Manage GPU tiers and docker images from the same menu. Proper TOML round-trip (tomllib + tomli_w), validation, auto-backup on save.
⚡ vLLM Serving Backend
New provider type for models too large for llama.cpp. Tensor-parallel serving across multi-GPU clusters with automatic GPU detection, FlashInfer attention, FP8 KV cache, and reasoning parser support. Based on the official vllm/vllm-openai:v0.20.1 image.
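The "automatic GPU detection" step can be sketched by parsing `nvidia-smi -L` output to choose a `--tensor-parallel-size` for vLLM. This is a hedged sketch, not LocalRouter's actual logic: the function name and the round-down-to-a-power-of-two heuristic (attention-head counts are usually divisible by one) are assumptions.

```python
import re

def tensor_parallel_size(nvidia_smi_l: str) -> int:
    """Count GPUs in `nvidia-smi -L` output and pick a tensor-parallel size.

    vLLM's --tensor-parallel-size is typically the number of visible GPUs;
    here we round down to a power of two as a conservative default.
    """
    gpus = re.findall(r"^GPU \d+:", nvidia_smi_l, flags=re.MULTILINE)
    n = max(len(gpus), 1)
    tp = 1
    while tp * 2 <= n:
        tp *= 2
    return tp
```

The result would then be interpolated into the container launch command, e.g. `docker run ... vllm/vllm-openai ... --tensor-parallel-size {tp}`.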
🧠 DeepSeek V4 Support
- V4-Flash (284B, 13B active): 7 GGUF recipes via llama.cpp + 2 vLLM recipes
- V4-Pro (1.6T, 49B active): 5 vLLM recipes across datacenter clusters
- Custom llama.cpp branch support for models with unmerged upstream PRs
- Split-file GGUF discovery for large sharded models
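Split-file GGUF discovery boils down to grouping the standard `name-00001-of-00003.gguf` shard naming and accepting only complete sets. A minimal sketch (the function name and return shape are illustrative, not LocalRouter's API):

```python
import re

SHARD_RE = re.compile(r"^(?P<base>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.gguf$")

def find_gguf_shards(filenames: list[str]) -> dict[str, list[str]]:
    """Group sharded GGUF files by base name, keeping only complete sets.

    Returns {base_name: [shard files in order]} for every set where all
    parts 1..total are present; incomplete sets are dropped.
    """
    groups: dict[str, dict] = {}
    for name in filenames:
        m = SHARD_RE.match(name)
        if not m:
            continue  # single-file GGUFs handled elsewhere
        g = groups.setdefault(m["base"], {"total": int(m["total"]), "parts": {}})
        g["parts"][int(m["idx"])] = name
    return {
        base: [g["parts"][i] for i in range(1, g["total"] + 1)]
        for base, g in groups.items()
        if sorted(g["parts"]) == list(range(1, g["total"] + 1))
    }
```

Completeness checking matters here: llama.cpp only needs the first shard on the command line, but it will fail at load time if any later shard is missing.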
🖥️ Multi-GPU Cluster Tiers
19 GPU tiers (up from 10), including:
- 2×/4× H100 SXM (160–320 GB)
- 2×/4×/5× H200 SXM (282–705 GB)
- 2×/4× B200 SXM (384–768 GB)
- 8× H100 and 8× A100 clusters
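The memory ranges above follow directly from the published per-GPU capacities (H100 SXM: 80 GB, H200 SXM: 141 GB, B200 SXM: 192 GB). A trivial sketch of the arithmetic:

```python
# Per-GPU memory in GB for the SXM variants used by the cluster tiers.
PER_GPU_GB = {"H100 SXM": 80, "H200 SXM": 141, "B200 SXM": 192}

def tier_vram_gb(gpu: str, count: int) -> int:
    """Total VRAM for a multi-GPU tier: per-GPU capacity times GPU count."""
    return PER_GPU_GB[gpu] * count
```

For example, 2× and 5× H200 SXM give 282 GB and 705 GB, matching the 282–705 GB range listed.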
📊 By the Numbers
- 70 recipes across 4 providers (vast_gguf, vLLM, Together AI, local)
- 19 GPU tiers from RTX 4090 to 8×B200 SXM
- ~5,000 lines of Python across 18 modules
- 4 docker images: prebuilt, builder, vLLM, legacy
Dependencies
- Added `tomli_w>=1.0.0` for recipe editor TOML write-back
Full Changelog
v0.2.0 — Modular Rewrite + 12 Bug Fixes
What's New
🏗️ Architecture Overhaul
The monolithic vast_manager.py (3,064 lines) has been split into a clean 16-module Python package (localrouter/):
localrouter/
├── config.py # paths, TOML loading, presets
├── helpers.py # shell wrappers, formatting
├── providers.py # Together AI, endpoint management
├── cost.py # cost estimation, usage tracking
├── local_endpoint.py # llama-server lifecycle
├── vast_ops.py # SSH diagnostics, offer browsing
├── hf_browser.py # HuggingFace model browser
├── proxy.py # proxy lifecycle helpers
└── menus/ # TUI menu system
├── main.py # entry point, banner
├── provider_menus.py
├── local_menus.py
├── vast_menus.py
└── tool_menus.py
📦 Now pip-installable
pip install -e '.[all]'
localrouter # new CLI command
python -m localrouter # module entry
./vast_manager.py # backward compatible
🐛 12 Bug Fixes
| ID | Severity | What |
|---|---|---|
| C1 | CRITICAL | Vast launch crash — gpu_choices undefined after provider selection refactor |
| C2 | CRITICAL | Broken f-string in proxy status (Qwen sanitization artifact) |
| C3 | CRITICAL | Proxy streaming broken — StreamResponse.prepare() missing request arg |
| C4 | CRITICAL | ClientTimeout positional args (aiohttp 3.x keyword-only) |
| M1 | MAJOR | Duplicate capture()/run() dead code removed |
| M2 | MAJOR | menu_diagnose() no longer crashes on local/Together endpoints |
| M3 | MAJOR | PUT/PATCH requests no longer silently converted to POST in proxy |
| M6 | MAJOR | urllib.error import missing in usage tracker |
| M7 | MAJOR | Per-GPU disk size defaults in vast_up.sh were dead code (always 60GB) |
| M8 | MAJOR | KV cache type for 256K context preset was dead code (always q8_0, should be q4_0) |
| M9 | MAJOR | --flash-attn on → --flash-attn (bare flag) in launch.sh |
| M10 | MAJOR | Nonthinking mode broken — JSON word-splitting in template kwargs |
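The M10 failure mode is worth spelling out: splitting a command string on whitespace tears a JSON argument apart at the space inside it. A stdlib illustration (the `--chat-template-kwargs` flag and server name here are just example tokens, not the exact command LocalRouter builds):

```python
import shlex

json_arg = '{"enable_thinking": false}'
cmd = f"llama-server --chat-template-kwargs '{json_arg}'"

# Naive whitespace splitting breaks the JSON at the space after the colon.
naive = cmd.split()

# shlex.split honors the shell quoting and keeps the JSON as one token.
safe = shlex.split(cmd)
```

The robust fix is usually to skip string splitting entirely and pass the argv list directly to `subprocess`, so the JSON value is one element from the start.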
🔧 Additional Improvements
- Proxy strips hop-by-hop headers before forwarding
- Streaming error handling for client disconnect
- UTC timestamps (was using local time with Z suffix)
- Fixed vast_up.sh HF token handling (Qwen sanitization artifacts)
- Fixed recipe labels (H200 slot count, Qwen3 vs Qwen3.5)
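Hop-by-hop header stripping, the first item above, can be sketched in pure Python. This is a minimal version under the usual RFC 9110 §7.6.1 rules, not the proxy's actual code; the function name is illustrative:

```python
# Hop-by-hop headers apply to a single connection and must not be
# forwarded by a proxy (RFC 9110 §7.6.1).
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def strip_hop_by_hop(headers: dict[str, str]) -> dict[str, str]:
    """Drop standard hop-by-hop headers plus anything named in Connection:."""
    named = {
        h.strip().lower()
        for h in headers.get("Connection", "").split(",")
        if h.strip()
    }
    return {
        k: v for k, v in headers.items()
        if k.lower() not in HOP_BY_HOP | named
    }
```

Note that `Connection:` can nominate additional headers as hop-by-hop, which is why the function drops those too rather than only the fixed set.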
Upgrade
git pull origin main
pip install -e '.[all]'
# Run as before — backward compatible