Commit 202a784
feat: Add local model runner support (ollama, vllm, llama.cpp) (#5)
* feat: Add local model runner support (ollama, vllm, llama.cpp)
Adds infrastructure for spawning and managing local LLM runners:
- **fetch.rs**: Generic file fetching for local paths and remote URLs
with caching support and huggingface:// URL translation
- **ollama.rs**: Modelfile parsing and generation with parameter
merging for runtime configuration
- **vllm.rs**: vLLM CLI argument generation for OpenAI-compatible
server deployment
- **llamacpp.rs**: llama.cpp server CLI argument generation with
parameter aliasing
- **runner.rs**: RunnerManager for spawning/stopping local model
processes with graceful shutdown
- **ModelConfig**: Unified model configuration with runner field
(external, ollama, vllm, llama-cpp) replacing the old
ModelDefinition enum (backward compatible)
- **context**: Architecture nodes can specify deployment context
override for remote deployment of local runners
Includes test fixture: tinyllama.Modelfile for Modelfile testing.
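As a rough illustration of the `huggingface://` translation described for fetch.rs, a sketch is below. The function name, the `resolve/main` download-URL mapping, and the lack of revision/caching handling are assumptions, not taken from the commit:

```rust
/// Hypothetical sketch of fetch.rs's huggingface:// URL translation.
/// Maps huggingface://{org}/{repo}/{file} to an HTTPS download URL;
/// the real module likely also handles caching and pinned revisions.
fn translate_huggingface_url(url: &str) -> Option<String> {
    let rest = url.strip_prefix("huggingface://")?;
    let mut parts = rest.splitn(3, '/');
    let (org, repo, file) = (parts.next()?, parts.next()?, parts.next()?);
    if org.is_empty() || repo.is_empty() || file.is_empty() {
        return None; // malformed reference
    }
    Some(format!("https://huggingface.co/{org}/{repo}/resolve/main/{file}"))
}

fn main() {
    // Hypothetical org/repo/file, for illustration only.
    let url = translate_huggingface_url("huggingface://some-org/some-repo/model.gguf");
    println!("{url:?}");
}
```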
* Add runner management endpoints and remote deployment tests
- Add /v1/runners endpoints for spawn, list, and stop operations
- Extend AppState to include optional SharedRunnerManager
- Add with_runner_manager() builder for worker mode
- Create integration tests simulating control-plane-to-worker dispatch
- Tests verify multi-instance coordination for remote deployment
This enables the control plane to dispatch spawn commands to workers
representing remote clusters, as discussed in the context override design.
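As a hedged illustration of the dispatch flow: the `/v1/runners` path comes from this commit, but the request body below is an assumption modeled on the ModelConfig fields shown elsewhere in this message, not the actual wire format:

```json
POST /v1/runners
{
  "runner": "ollama",
  "source": "tinyllama.Modelfile",
  "parameters": {}
}
```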
* Add Docker runner support and improve vLLM integration
Docker support:
- Add docker.rs module with DockerConfig, RegistryConfig structs
- Support image-based and Dockerfile-based container deployment
- Handle custom registries with authentication
- Map model parameters to container environment variables
- Support GPU passthrough, volumes, network modes, IPC settings
- Add extra_args for vLLM-specific flags (--swap-space, --tool-call-parser)
vLLM improvements:
- Add is_vllm_installed() and is_cuda_available() checks
- Add HuggingFace token detection from HF_TOKEN env var
- Detect gated models requiring authentication
- Generate proper environment variables for vLLM process
- Add warnings for missing CUDA or HF token
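A minimal sketch of the environment checks above. The helper name `is_cuda_available` mirrors the commit, but the probe itself (shelling out to `nvidia-smi`) and the token filtering are assumptions:

```rust
use std::process::Command;

/// Hypothetical helper: treat a missing or whitespace-only token as absent.
fn usable_token(raw: Option<String>) -> Option<String> {
    raw.filter(|t| !t.trim().is_empty())
}

/// Read the HuggingFace token from the HF_TOKEN env var, as the commit describes.
fn hf_token() -> Option<String> {
    usable_token(std::env::var("HF_TOKEN").ok())
}

/// Best-effort CUDA probe (assumption: nvidia-smi exiting successfully
/// means a GPU is visible; the real check may differ).
fn is_cuda_available() -> bool {
    Command::new("nvidia-smi")
        .output()
        .map(|o| o.status.success())
        .unwrap_or(false)
}

fn main() {
    if !is_cuda_available() {
        eprintln!("warning: CUDA not detected; vLLM may fail to start");
    }
    if hf_token().is_none() {
        eprintln!("warning: HF_TOKEN not set; gated models will fail to download");
    }
}
```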
Example Docker config matching the user's DGX setup:

{
  "runner": "docker",
  "source": "RESMP-DEV/Qwen3-Next-80B-A3B-Instruct-NVFP4",
  "docker": {
    "image": "dgx-vllm:cutlass-nvfp4",
    "network": "host",
    "gpus": "all",
    "ipc": "host",
    "volumes": ["${HOME}/.cache/huggingface:/root/.cache/huggingface"],
    "extra_args": "--swap-space 32 --tool-call-parser hermes"
  },
  "parameters": {
    "tensor_parallel_size": 1,
    "max_model_len": 131072
  }
}
* Refactor extra_args to structured map for cleaner composability
- Change extra_args from string to HashMap<String, Value>
- Add extra_args_to_string() to convert map to CLI format at runtime
- Support bool flags (true=include, false=omit), numbers, strings, arrays
- Arrays repeat the flag for each value (useful for --stop tokens)
Example:

"extra_args": {
  "swap-space": 32,
  "tool-call-parser": "hermes",
  "enable-auto-tool-choice": true
}

Becomes: "--swap-space 32 --tool-call-parser hermes --enable-auto-tool-choice"
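The conversion rules above can be sketched as follows. The commit stores `HashMap<String, serde_json::Value>`; to keep this sketch self-contained it substitutes a small stand-in enum and a `BTreeMap` (for deterministic flag order), so the output ordering differs from the example above:

```rust
use std::collections::BTreeMap;

/// Stand-in for serde_json::Value (the real field holds arbitrary JSON,
/// including floats; this sketch covers the cases the commit describes).
#[allow(dead_code)]
enum ArgValue {
    Bool(bool),
    Num(i64),
    Str(String),
    List(Vec<String>),
}

/// Convert the structured extra_args map into a CLI flag string:
/// true bools emit the bare flag, false bools are omitted,
/// and arrays repeat the flag once per value (e.g. --stop tokens).
fn extra_args_to_string(args: &BTreeMap<String, ArgValue>) -> String {
    let mut parts = Vec::new();
    for (key, value) in args {
        match value {
            ArgValue::Bool(true) => parts.push(format!("--{key}")),
            ArgValue::Bool(false) => {} // false omits the flag entirely
            ArgValue::Num(n) => parts.push(format!("--{key} {n}")),
            ArgValue::Str(s) => parts.push(format!("--{key} {s}")),
            ArgValue::List(items) => {
                for item in items {
                    parts.push(format!("--{key} {item}"));
                }
            }
        }
    }
    parts.join(" ")
}

fn main() {
    let mut args = BTreeMap::new();
    args.insert("swap-space".to_string(), ArgValue::Num(32));
    args.insert("tool-call-parser".to_string(), ArgValue::Str("hermes".into()));
    args.insert("enable-auto-tool-choice".to_string(), ArgValue::Bool(true));
    // BTreeMap iterates alphabetically, so the flags come out sorted:
    // --enable-auto-tool-choice --swap-space 32 --tool-call-parser hermes
    println!("{}", extra_args_to_string(&args));
}
```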
This cleans up the rough edges of CLI tools and makes composition files
more readable and programmatically manipulable.

1 parent c9c078e
17 files changed
Lines changed: 3352 additions & 145 deletions