feat: Add local model runner support (ollama, vllm, llama.cpp) #5
## Summary
Adds infrastructure for spawning and managing local LLM runners with a unified model configuration system.
## New Modules
- `fetch.rs`: Generic file fetching for local paths and remote URLs, with SHA256-based caching and `huggingface://` URL support
- `ollama.rs`: Modelfile parsing/generation with the SBIO pattern (pure parsing, I/O wrappers); see the sketch after this list
- `vllm.rs`: vLLM CLI argument generation for OpenAI-compatible deployment
- `llamacpp.rs`: llama.cpp server CLI argument generation with parameter aliasing
- `runner.rs`: `RunnerManager` for spawning/stopping local processes with graceful shutdown
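As a minimal sketch of the SBIO split in `ollama.rs`, the idea is a pure parsing function over a string plus a thin I/O wrapper around it. The struct and function names below (`Modelfile`, `parse_modelfile`, `load_modelfile`) are assumptions for illustration, not necessarily the ones in the module.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

// Illustrative types; the real ollama.rs may model Modelfiles differently.
#[derive(Debug, Default)]
pub struct Modelfile {
    pub from: Option<String>,
    pub parameters: HashMap<String, String>,
}

/// Pure parsing: no I/O, so it can be unit-tested directly against fixture strings.
pub fn parse_modelfile(input: &str) -> Modelfile {
    let mut mf = Modelfile::default();
    for line in input.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split the keyword from the rest of the line at the first whitespace.
        let mut parts = line.splitn(2, char::is_whitespace);
        match (parts.next(), parts.next()) {
            (Some(kw), Some(rest)) if kw.eq_ignore_ascii_case("FROM") => {
                mf.from = Some(rest.trim().to_string());
            }
            (Some(kw), Some(rest)) if kw.eq_ignore_ascii_case("PARAMETER") => {
                let mut kv = rest.trim().splitn(2, char::is_whitespace);
                if let (Some(k), Some(v)) = (kv.next(), kv.next()) {
                    mf.parameters.insert(k.to_string(), v.trim().to_string());
                }
            }
            _ => {} // Unrecognized directives are ignored in this sketch.
        }
    }
    mf
}

/// Thin I/O wrapper around the pure parser.
pub fn load_modelfile(path: &Path) -> io::Result<Modelfile> {
    Ok(parse_modelfile(&fs::read_to_string(path)?))
}
```

Keeping the parser free of I/O is what lets it be exercised against fixture strings such as the tinyllama Modelfile listed in the test plan.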
## Model Configuration Changes

A new unified `ModelConfig` replaces the old `ModelDefinition` enum (backward compatible):

```json
{
  "models": {
    "my-model": {
      "runner": "ollama",
      "interface": "openai-api",
      "source": "tinyllama:1.1b",
      "parameters": { "temperature": 0.7 }
    }
  }
}
```
Runner types: `external`, `ollama`, `vllm`, `llama-cpp`, `docker`
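A rough sketch of how the unified config could be modeled with serde is shown below; the Rust type and field names are assumptions inferred from the JSON example above, not the actual definitions in the PR.

```rust
use std::collections::HashMap;

use serde::Deserialize;
use serde_json::Value;

/// Runner variants listed above; serde's kebab-case rename maps
/// `LlamaCpp` to the "llama-cpp" string used in the JSON config.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum RunnerKind {
    External,
    Ollama,
    Vllm,
    LlamaCpp,
    Docker,
}

/// One entry under "models" in the example JSON.
#[derive(Debug, Deserialize)]
pub struct ModelConfig {
    pub runner: RunnerKind,
    pub interface: String,
    pub source: String,
    /// Runner-specific parameters, left untyped here for flexibility.
    #[serde(default)]
    pub parameters: HashMap<String, Value>,
}

#[derive(Debug, Deserialize)]
pub struct ModelsConfig {
    pub models: HashMap<String, ModelConfig>,
}

fn main() -> serde_json::Result<()> {
    let raw = r#"{
        "models": {
            "my-model": {
                "runner": "ollama",
                "interface": "openai-api",
                "source": "tinyllama:1.1b",
                "parameters": { "temperature": 0.7 }
            }
        }
    }"#;
    let cfg: ModelsConfig = serde_json::from_str(raw)?;
    println!("{:?}", cfg.models["my-model"].runner); // Ollama
    Ok(())
}
```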
## Architecture Context

Nodes can specify a `context` for remote deployment of local runners:

```json
{
  "name": "gpu-model",
  "layer": 1,
  "adapter": "openai-api",
  "context": "gpu-cluster"
}
```
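For illustration only, such a node entry might deserialize as in the sketch below; `NodeConfig` and its field types are assumed names, not necessarily what the code uses.

```rust
use serde::Deserialize;

/// Mirrors the node JSON above. `context` is optional; per the description,
/// when it is set the node's local runner is deployed on the named remote
/// context rather than the local host.
#[derive(Debug, Deserialize)]
pub struct NodeConfig {
    pub name: String,
    pub layer: u32,
    pub adapter: String,
    #[serde(default)]
    pub context: Option<String>,
}

fn main() -> serde_json::Result<()> {
    let node: NodeConfig = serde_json::from_str(
        r#"{ "name": "gpu-model", "layer": 1, "adapter": "openai-api", "context": "gpu-cluster" }"#,
    )?;
    assert_eq!(node.context.as_deref(), Some("gpu-cluster"));
    Ok(())
}
```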
## Test plan

- `tests/fixtures/tinyllama.Modelfile`