The future is small models, routed intelligently.
Orchestrate multiple LLMs into intelligent, layered pipelines with a single configuration file
Quick Start · Examples · Configuration · CLI
llmnet creates layered AI pipelines where requests flow through multiple models with intelligent routing at each stage. Think of it as a neural network, but each "neuron" is an LLM.
User Query → Router → [Expert A | Expert B | Expert C] → Refiner → Response
Why llmnet?
- Cost optimization: Route simple queries to cheap models, complex ones to powerful models
- Specialization: Use domain-specific fine-tuned models for different query types
- Quality: Add refinement layers to polish responses before delivery
- Flexibility: Swap models without changing code—just update the config
git clone https://github.com/Avarok-Cybersecurity/llmnet.git
cd llmnet
cargo build --release

# Validate configuration
./target/release/llmnet validate examples/basic-chatbot.json
# Start the server
./target/release/llmnet run examples/basic-chatbot.json

# Start the control plane (in one terminal)
./target/release/llmnet serve --control-plane
# Deploy a pipeline (in another terminal)
./target/release/llmnet deploy examples/basic-chatbot.json
# Check status
./target/release/llmnet get pipelines

# Send a request to the pipeline
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llmnet", "messages": [{"role": "user", "content": "Hello!"}]}'Each example includes real-world use cases showing how the same topology applies to different industries.
| Example | Topology | Description | Guide |
|---|---|---|---|
| Basic Chatbot | 1-0-1 | Simple LLM proxy | 📖 Guide |
| Dual Expert | 1-2-1 | Route to specialized handlers | 📖 Guide |
| OpenRouter Pipeline | 1-2-1 | Cloud-native with free models | 📖 Guide |
| Multi-Layer Pipeline | 1-2-1-1 | Add refinement layer | 📖 Guide |
| Conditional Routing | 1-2-1 | Route by input characteristics | 📖 Guide |
| Nemotron Router | 1-2-2-1 | Enterprise with edge cases | 📖 Guide |
| Calculator with Hooks | 1-2-1 | Validation hooks demo | 📖 Guide |
1-0-1: User → LLM → Response (Basic proxy)
1-2-1: User → Router → [A|B] → Response (Dual expert)
1-2-1-1: User → Router → [A|B] → Refiner → Response (With refinement)
1-2-2-1: User → Router → [A|B] → [C|D] → Response (Deep pipeline)
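As a concrete illustration of the notation, here is a minimal sketch of what a 1-2-1 dual-expert config might look like. The model names, endpoints, and use-case strings are placeholders; the node fields follow the reference in the Configuration section below.

```json
{
  "models": {
    "router-model":  { "type": "external", "interface": "openai-api", "url": "http://localhost:11434", "api-key": null },
    "code-model":    { "type": "external", "interface": "openai-api", "url": "http://localhost:11435", "api-key": null },
    "general-model": { "type": "external", "interface": "openai-api", "url": "http://localhost:11436", "api-key": null }
  },
  "architecture": [
    { "name": "router", "layer": 0, "model": "router-model", "adapter": "openai-api", "output-to": [1] },
    { "name": "code-expert", "layer": 1, "model": "code-model", "adapter": "openai-api",
      "use-case": "Programming, debugging, and code review questions", "output-to": ["output"] },
    { "name": "general-expert", "layer": 1, "model": "general-model", "adapter": "openai-api",
      "use-case": "General conversation and everything else", "output-to": ["output"] },
    { "name": "output", "adapter": "output" }
  ]
}
```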
flowchart TB
subgraph Layer0["Layer 0: Input"]
R[Router Model]
end
subgraph Layer1["Layer 1: Specialists"]
A[Expert A]
B[Expert B]
end
subgraph Layer2["Layer 2: Refinement"]
REF[Refiner]
end
subgraph Output["Output Layer"]
O[Response]
end
R -->|"Analyzes intent"| A
R -->|"Routes query"| B
A --> REF
B --> REF
REF --> O
style R fill:#e1f5fe
style REF fill:#fff3e0
style O fill:#c8e6c9
| Concept | Description |
|---|---|
| Layer | A stage in the pipeline (0 = input, higher = deeper) |
| Node | A model endpoint within a layer |
| Router | Layer 0 model that selects which downstream node handles the request |
| Condition | Rule using system variables ($WORD_COUNT > 10) to filter targets |
| Adapter | Protocol: openai-api, output, or ws (WebSocket) |
| Hooks | Pre/post execution logic (observe or transform mode) |
| Functions | Reusable operations: REST, Shell, WebSocket, gRPC |
| Secrets | Credentials from env files, system env, or Vault |
Available in conditions and hooks:
| Variable | Context | Description |
|---|---|---|
| `$INPUT` | Pre/Post hooks | Current input content |
| `$OUTPUT` | Post hooks only | LLM output |
| `$NODE` | Pre/Post hooks | Current node name |
| `$PREV_NODE` | All | Previous node name |
| `$WORD_COUNT` | All | Number of words in input |
| `$INPUT_LENGTH` | All | Character count |
| `$HOP_COUNT` | All | Number of hops so far |
| `$TIMESTAMP` | All | ISO 8601 timestamp |
| `$REQUEST_ID` | All | Unique request UUID |
| `$secrets.*` | Functions | Secret values |
See Conditional Routing Guide for full documentation.
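For example, a sketch of rule-based routing on input size might look like the node below; the node and model names are illustrative, and the exact condition grammar is covered in the guide.

```json
{
  "name": "long-form-expert",
  "layer": 1,
  "model": "my-model",
  "adapter": "openai-api",
  "if": "$WORD_COUNT > 100",
  "output-to": ["output"]
}
```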
A minimal configuration:

{
"models": {
"my-model": {
"type": "external",
"interface": "openai-api",
"url": "http://localhost:11434",
"api-key": null
}
},
"architecture": [
{
"name": "chat",
"layer": 0,
"model": "my-model",
"adapter": "openai-api",
"output-to": ["output"]
},
{
"name": "output",
"adapter": "output"
}
]
}

Model definition:

{
"models": {
"<model-name>": {
"type": "external",
"interface": "openai-api",
"url": "<endpoint-url>",
"api-key": "<optional-key-or-$ENV_VAR>"
}
}
}

Node definition:

{
"architecture": [
{
"name": "<unique-name>",
"layer": 0,
"model": "<model-name>",
"adapter": "openai-api",
"bind-addr": "0.0.0.0",
"bind-port": "8080",
"output-to": [1],
"use-case": "Description for router",
"if": "$WORD_COUNT > 10",
"extra-options": {
"model_override": "specific-model-id"
}
}
]
}

| Field | Type | Description |
|---|---|---|
| `name` | string | Unique node identifier |
| `layer` | number | Pipeline stage (0 = input) |
| `model` | string? | Reference to models section |
| `adapter` | string | openai-api, output, or ws |
| `output-to` | array | Layer numbers [1] or node names ["output"] |
| `use-case` | string? | Description for LLM-based routing |
| `if` | string? | Condition for rule-based routing |
| `hooks` | object? | Pre/post hooks for the node |
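A small sketch of the two `output-to` addressing styles (node names are illustrative): targeting a layer number presumably offers every node in that layer as a routing candidate, while targeting node names pins specific destinations.

```json
{
  "architecture": [
    { "name": "router", "layer": 0, "model": "router-model", "adapter": "openai-api",
      "output-to": [1] },
    { "name": "summarizer", "layer": 1, "model": "general-model", "adapter": "openai-api",
      "output-to": ["output"] },
    { "name": "output", "adapter": "output" }
  ]
}
```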
Load credentials from various sources:
{
"secrets": {
"api-creds": {
"source": "env-file",
"path": "~/.config/llmnet/.env",
"variables": ["API_KEY", "API_SECRET"]
},
"hf-token": {
"source": "env",
"variable": "HF_TOKEN"
},
"vault-secrets": {
"source": "vault",
"address": "https://vault.example.com",
"path": "secret/data/llmnet/api"
}
}
}

Reference secrets using $secrets.{name}.{variable}:
"api-key": "$secrets.api-creds.API_KEY"Define reusable operations for hooks:
{
"functions": {
"log-request": {
"type": "rest",
"method": "POST",
"url": "https://api.example.com/log",
"headers": {"Authorization": "Bearer $secrets.api.TOKEN"},
"body": {"node": "$NODE", "input": "$INPUT"}
},
"validate-output": {
"type": "shell",
"command": "python",
"args": ["validate.py", "--input", "$OUTPUT"],
"timeout": 10
}
}
}

| Type | Description |
|---|---|
| `rest` | HTTP requests (GET, POST, PUT, PATCH, DELETE) |
| `shell` | Execute local commands |
| `websocket` | Send WebSocket messages |
| `grpc` | Call gRPC services |
Execute logic before/after LLM calls:
{
"architecture": [
{
"name": "processor",
"hooks": {
"pre": [
{"function": "log-request", "mode": "observe"}
],
"post": [
{"function": "validate-output", "mode": "transform", "on_failure": "abort"}
]
}
}
]
}

| Mode | Behavior |
|---|---|
| `observe` | Fire-and-forget, doesn't affect pipeline |
| `transform` | Waits for result, can modify data |

| on_failure | Behavior |
|---|---|
| `continue` | Log error, proceed with original data |
| `abort` | Stop pipeline, return error |
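To make the two modes concrete, here is a hedged sketch of a node that logs every request without blocking and rewrites output before delivery. `log-request` is the function defined above, while `redact-pii` is a hypothetical entry you would add to the functions section.

```json
{
  "name": "writer",
  "hooks": {
    "pre": [
      { "function": "log-request", "mode": "observe" }
    ],
    "post": [
      { "function": "redact-pii", "mode": "transform", "on_failure": "continue" }
    ]
  }
}
```

With on_failure set to continue, a failed redaction falls back to the original output instead of aborting the pipeline.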
llmnet provides a kubectl-like interface for managing LLM pipelines across local and remote clusters.
# Run a local pipeline (legacy mode)
llmnet run config.json
# Validate a configuration
llmnet validate config.json
# Start the control plane server
llmnet serve --control-plane
# Deploy a pipeline to the current context
llmnet deploy pipeline.yaml
# List resources
llmnet get pipelines
llmnet get nodes
llmnet get namespaces
# Scale a pipeline
llmnet scale my-pipeline --replicas 3
# Delete resources
llmnet delete pipeline my-pipeline
# View cluster status
llmnet status

Manage connections to multiple LLMNet clusters:
# List available contexts
llmnet context list
# Add a remote cluster context
llmnet context add my-cluster --url http://10.0.0.1:8181
# Switch to a context
llmnet context use my-cluster
# Show current context
llmnet context current

Global options:
-v, --verbose... Increase logging verbosity
--config <PATH> Path to config file (default: ~/.llmnet/config)
-h, --help Print help
-V, --version Print version
For backwards compatibility, you can still run pipelines directly:
# Run with dry-run
llmnet run --dry-run config.json
# Override port
llmnet run --port 9000 config.json
# Load API keys from .env
llmnet run --env-file .env.production config.json

Python:

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
model="llmnet",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

JavaScript:

import OpenAI from 'openai';
const client = new OpenAI({ baseURL: 'http://localhost:8080/v1', apiKey: 'not-needed' });
const response = await client.chat.completions.create({
model: 'llmnet',
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);

cURL:

curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llmnet", "messages": [{"role": "user", "content": "Hello!"}]}'Access multiple models through a single API:
{
"models": {
"router": {
"url": "https://openrouter.ai/api",
"api-key": "$OPENROUTER_API_KEY"
}
},
"architecture": [
{
"name": "router",
"extra-options": {
"model_override": "google/gemma-3-27b-it:free"
}
}
]
}

See OpenRouter Pipeline Guide.
Stream responses or alerts to WebSocket endpoints:
{
"name": "alert-ws",
"if": "$AlertRequired",
"adapter": "ws",
"url": "ws://alerts:3000"
}

For intelligent routing, we recommend NVIDIA's Nemotron-Orchestrator-8B:
{
"models": {
"nemotron": {
"url": "http://localhost:44443"
}
}
}

See Nemotron Router Guide and nemotron-router-8b.md.
MIT License - see LICENSE for details.