|
1 | | -# pine-gate
| 1 | +# pine-gate |
2 | 2 |
|
3 | | -Open-source, K8s-deployable AI Gateway for local & remote LLMs with quotas, streaming, routing, and observability. |
| 3 | +Open‑source, K8s‑deployable AI Gateway for local & remote LLMs with quotas, streaming, routing, and observability. |
| 4 | + |
| 5 | +Project status: beta. Releases follow SemVer. See CONTRIBUTING.md and SECURITY.md for governance.
| 6 | + |
| 7 | +## Table of Contents |
| 8 | +- [Overview](#overview)
| 9 | +- [Features](#features)
| 10 | +- [Get Started](#get-started)
| 11 | +- [Backends Overview](#backends-overview)
| 12 | +- [API Overview](#api-overview)
| 13 | +- [Configuration](#configuration)
| 14 | +- [Observability](#observability)
| 15 | +- [Quotas & Limits](#quotas--limits)
| 16 | +- [Deploy to Kubernetes](#deploy-to-kubernetes)
| 17 | +- [Security](#security)
| 18 | +- [CLI](#cli)
| 19 | + |
| 20 | +## Overview |
| 21 | +pine-gate gives you a single, stable HTTP API in front of multiple LLMs (local and hosted). It handles authentication, rate limiting, usage counting, routing (including canaries), streaming responses, and telemetry — so applications can focus on product logic rather than provider differences. |
4 | 22 |
|
5 | 23 | ## Features |
6 | | -- Backends: Echo, Ollama (local), vLLM (OpenAI-compatible), OpenRouter, OpenAI, Anthropic |
7 | | -- Routing: static rules + canary weighted splits |
8 | | -- SSE streaming end-to-end |
9 | | -- Rate limiting per API key (in-mem or Redis) |
10 | | -- Usage counters and admin endpoint |
11 | | -- Prometheus metrics with backend labels |
12 | | -- OTel tracing (OTLP exporter) |
13 | | -- Docker + Helm (ServiceMonitor + HPA) |
| 24 | +- Multiple model backends: use local and hosted LLMs behind one API — Echo for quick tests, Ollama for local models, vLLM via OpenAI‑compatible endpoints, plus OpenAI, OpenRouter, and Anthropic. |
| 25 | +- Smart request routing: direct traffic by model rules or roll out changes safely with weighted canary splits. |
| 26 | +- Real‑time streaming: stream tokens end‑to‑end over SSE for responsive UIs and CLIs. |
| 27 | +- Built‑in safeguards: per‑key rate limiting (in‑memory or Redis) and simple usage counters with an admin query endpoint. |
| 28 | +- First‑class observability: Prometheus metrics labeled by route and backend, and OpenTelemetry traces exported to your collector (OTLP). |
| 29 | +- Resilience controls: configurable retries and circuit breaking around backends to smooth over transient failures. |
| 30 | +- Production‑ready on Kubernetes: minimal, non‑root image; secure defaults; Helm chart with ServiceMonitor and optional HPA. |
| 31 | +- Easy local development: `make run`, `.env` support, and a tiny `pinectl` CLI to manage and test locally. |
14 | 32 |
|
15 | | -## Quickstart (Local) |
| 33 | +## Get Started |
| 34 | +These steps take you from zero to a working gateway locally, then show how to enable a real backend.
16 | 35 |
|
17 | | -cp .env.example .env # optional, or create your own |
| 36 | +1) Run the gateway with the example config |
18 | 37 | ``` |
19 | 38 | CONFIG_FILE=./configs/config.example.yaml make run |
| 39 | +``` |
| 40 | +Check health and send a test request using the built‑in `echo` backend: |
| 41 | +``` |
20 | 42 | curl -i http://localhost:8080/healthz |
21 | | -curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hello"}' |
| 43 | +curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \ |
| 44 | + -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hello"}' |
22 | 45 | ``` |
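A successful call returns JSON shaped like `{ "model": ..., "output": ... }` (see the API Overview below). To script against the endpoint, here is a minimal sketch that pulls out just the `output` field; the sample payload stands in for a live response, and `jq` is the more robust choice in practice:

```shell
# Hypothetical helper: extract "output" from a completion response
# (shape { model, output } per the API Overview). With the gateway up,
# pipe the curl call above into extract_output instead of the sample.
extract_output() {
  sed -n 's/.*"output"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Simulated response for illustration:
printf '{"model":"echo","output":"hello"}' | extract_output
```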
23 | 46 |
|
24 | | -### Enable OpenRouter |
| 47 | +2) Enable a real backend (example: OpenRouter) |
25 | 48 | ``` |
26 | 49 | PINE_GATE_BACKENDS_OPENROUTER_ENABLED=true \ |
27 | 50 | PINE_GATE_BACKENDS_OPENROUTER_APIKEY=<YOUR_KEY> \ |
28 | 51 | CONFIG_FILE=./configs/config.example.yaml make run |
29 | | -curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST \ |
30 | | - http://localhost:8080/v1/completions -d '{"model":"openrouter:mistralai/mistral-7b-instruct:free","prompt":"hello"}' |
31 | 52 | ``` |
32 | | - |
33 | | -### Enable Ollama (local) |
| 53 | +Then request a model via the `openrouter:` prefix: |
34 | 54 | ``` |
35 | | -PINE_GATE_BACKENDS_OLLAMA_ENABLED=true \ |
36 | | -PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434 \ |
37 | | -CONFIG_FILE=./configs/config.example.yaml make run |
38 | 55 | curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST \ |
39 | | - http://localhost:8080/v1/completions -d '{"model":"ollama:llama3","prompt":"hello"}' |
40 | | -``` |
41 | | - |
42 | | -### vLLM (OpenAI-compatible) |
43 | | -``` |
44 | | -PINE_GATE_BACKENDS_VLLM_ENABLED=true \ |
45 | | -PINE_GATE_BACKENDS_VLLM_BASE_URL=http://localhost:8000/v1 \ |
46 | | -CONFIG_FILE=./configs/config.example.yaml make run |
| 56 | + http://localhost:8080/v1/completions -d '{"model":"openrouter:mistralai/mistral-7b-instruct:free","prompt":"hello"}' |
47 | 57 | ``` |
| 58 | +See the Backends Overview section for other providers and examples.
48 | 59 |
|
49 | | -### OpenAI |
| 60 | +3) Optional: place settings in `.env` |
| 61 | +pine-gate loads a `.env` file automatically. Put your environment variables there instead of prefixing commands: |
50 | 62 | ``` |
51 | | -PINE_GATE_BACKENDS_OPENAI_ENABLED=true \ |
52 | | -PINE_GATE_BACKENDS_OPENAI_APIKEY=<OPENAI_KEY> \ |
53 | | -CONFIG_FILE=./configs/config.example.yaml make run |
| 63 | +# .env |
| 64 | +PINE_GATE_BACKENDS_OLLAMA_ENABLED=true |
| 65 | +PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434 |
| 66 | +OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 |
| 67 | +PINE_GATE_LIMITS_RATE_RPS=5 |
| 68 | +PINE_GATE_LIMITS_BURST=10 |
54 | 69 | ``` |
55 | | - |
56 | | -### Anthropic |
| 70 | +With `.env` present, start with: |
57 | 71 | ``` |
58 | | -PINE_GATE_BACKENDS_ANTHROPIC_ENABLED=true \ |
59 | | -PINE_GATE_BACKENDS_ANTHROPIC_APIKEY=<ANTHROPIC_KEY> \ |
60 | | -CONFIG_FILE=./configs/config.example.yaml make run |
| 72 | +make run |
| 73 | +# or |
| 74 | +./bin/pinectl serve --config ./configs/config.example.yaml |
61 | 75 | ``` |
62 | 76 |
|
63 | | -## Redis Rate Limit + Usage |
| 77 | +4) Optional: Redis for shared rate limits and usage |
64 | 78 | ``` |
65 | 79 | docker run --rm -p 6379:6379 redis:7 |
66 | 80 | PINE_GATE_REDIS_ENABLED=true PINE_GATE_REDIS_ADDR=localhost:6379 \ |
67 | 81 | PINE_GATE_AUTH_ADMIN_KEY=admin CONFIG_FILE=./configs/config.example.yaml make run |
68 | 82 | curl -s -H 'x-admin-key: admin' 'http://localhost:8080/v1/usage?key=dev-key' |
69 | 83 | ``` |
70 | 84 |
|
71 | | -## Tracing (Jaeger all-in-one) |
| 85 | +5) Optional: Tracing to Jaeger (OTLP) |
72 | 86 | ``` |
73 | 87 | docker run --rm -p 16686:16686 -p 4318:4318 -e COLLECTOR_OTLP_ENABLED=true jaegertracing/all-in-one:1.57 |
74 | 88 | OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 CONFIG_FILE=./configs/config.example.yaml make run |
75 | | -
|
76 | | -## .env Support |
77 | | -- pine-gate automatically loads a `.env` file from the working directory. |
78 | | -- Put any configuration env vars in `.env` instead of prefixing commands, for example: |
79 | | -
|
80 | | -``` |
81 | | -# .env |
82 | | -PINE_GATE_BACKENDS_OLLAMA_ENABLED=true |
83 | | -PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434 |
84 | | -PINE_GATE_BACKENDS_OPENROUTER_ENABLED=false |
85 | | -PINE_GATE_BACKENDS_OPENROUTER_APIKEY= |
86 | | -PINE_GATE_BACKENDS_OPENAI_ENABLED=false |
87 | | -PINE_GATE_BACKENDS_VLLM_ENABLED=false |
88 | | -OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 |
89 | | -PINE_GATE_LIMITS_RATE_RPS=5 |
90 | | -PINE_GATE_LIMITS_BURST=10 |
91 | | - |
92 | | -With `.env` present, simply run: |
93 | | -``` |
94 | | -make run |
95 | | -``` |
96 | | -# or |
97 | | -``` |
98 | | -./bin/pinectl serve --config ./configs/config.example.yaml |
99 | 89 | ``` |
100 | 90 |
|
101 | | -## Helm Install |
| 91 | +## Backends Overview |
| 92 | +Choose a backend by prefixing the model (e.g., `openai:gpt-4o-mini`, `ollama:llama3`). Enable providers via environment variables. |
| 93 | +- Echo: built‑in for local testing (no network call) |
| 94 | +- Ollama: local models via `ollama:<model>` |
| 95 | +- vLLM: OpenAI‑compatible server via `vllm:<model>` |
| 96 | +- OpenAI: hosted models via `openai:<model>` |
| 97 | +- OpenRouter: marketplace via `openrouter:<model>` |
| 98 | +- Anthropic: hosted models via `anthropic:<model>`
| | +
| 99 | +Read more: docs/backends.md
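The `backend:model` convention above can be split with plain parameter expansion; this is a sketch, assuming the first colon separates the backend prefix from the provider's model id (later colons, as in OpenRouter's `:free` suffix, belong to the model):

```shell
# Sketch: split "backend:model" at the first colon only; everything after
# it (including further colons, e.g. OpenRouter's ":free" suffix) is the model id.
model='openrouter:mistralai/mistral-7b-instruct:free'
backend=${model%%:*}   # remove everything from the first ':' onward -> "openrouter"
name=${model#*:}       # remove everything up to the first ':' -> the model id
echo "$backend -> $name"
```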
| 100 | + |
| 101 | +## API Overview |
| 102 | +Two core endpoints power synchronous and streaming use cases. |
| 103 | +- `POST /v1/completions` — JSON request `{ model, prompt }` → `{ model, output }`
| 104 | +- `GET /v1/stream` — SSE stream of tokens with `model` and `prompt` as query params
| | +
| 105 | +Health and metrics are also available:
| 106 | +- `GET /healthz` — service health
| 107 | +- `GET /metrics` — Prometheus metrics
| | +
| 108 | +Read more: docs/api.md and docs/openapi.yaml
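For `GET /v1/stream`, here is a sketch of consuming the response body from the shell. The `data:` framing is standard SSE, though the exact event payloads pine-gate emits are an assumption; the simulated input stands in for a live stream, which you would produce with `curl -N`:

```shell
# Sketch: concatenate the payloads of SSE "data:" lines (standard SSE framing;
# pine-gate's exact event format is an assumption here). Live usage:
#   curl -N -H 'x-api-key: dev-key' \
#     'http://localhost:8080/v1/stream?model=echo&prompt=hello' | read_sse
read_sse() {
  while IFS= read -r line; do
    case "$line" in
      data:*) printf '%s' "${line#data: }" ;;
    esac
  done
}

# Simulated two-event stream for illustration:
printf 'data: hel\n\ndata: lo\n\n' | read_sse; echo
```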
| 109 | + |
| 110 | +## Configuration |
| 111 | +Configuration comes from environment variables, a YAML file, and defaults. Environment variables (including from `.env`) take precedence. |
| 112 | +Read more: docs/configuration.md |
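As an illustration of how the two sources line up, here is a hypothetical fragment assuming the `PINE_GATE_*` variables shown earlier map to nested YAML keys (e.g. `PINE_GATE_LIMITS_RATE_RPS` → `limits.rate_rps`); the authoritative key names live in `configs/config.example.yaml` and docs/configuration.md:

```yaml
# Hypothetical sketch only -- key names are inferred from the env vars above;
# consult configs/config.example.yaml for the real layout.
limits:
  rate_rps: 5        # PINE_GATE_LIMITS_RATE_RPS
  burst: 10          # PINE_GATE_LIMITS_BURST
backends:
  ollama:
    enabled: true    # PINE_GATE_BACKENDS_OLLAMA_ENABLED
    host: http://localhost:11434
```

An exported `PINE_GATE_LIMITS_RATE_RPS=20` would override the file's `5`, since environment variables (including those from `.env`) take precedence.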
| 113 | + |
| 114 | +## Observability |
| 115 | +Prometheus metrics track request rate, latency, and errors, labeled by route and backend. OpenTelemetry spans trace requests and backend calls.
| 116 | +Read more: docs/observability.md and docs/tracing.md |
| 117 | + |
| 118 | +## Quotas & Limits |
| 119 | +Per‑key token buckets enforce rate limits. Use Redis to share limits and counters across replicas. |
| 120 | +Read more: docs/quotas-limits.md |
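Assuming a standard token bucket (which the rate/burst pair suggests, though the exact algorithm is an assumption), the refill arithmetic looks like this, with parameters mirroring the `.env` example above:

```shell
# Sketch: standard token-bucket refill math. The algorithm itself is an
# assumption; pine-gate only documents the rate/burst knobs.
rate=5    # PINE_GATE_LIMITS_RATE_RPS: tokens added per second
burst=10  # PINE_GATE_LIMITS_BURST: bucket capacity

idle=4    # seconds since this key's last request
tokens=$(( idle * rate ))
if [ "$tokens" -gt "$burst" ]; then tokens=$burst; fi  # cap at capacity

echo "$tokens requests available after ${idle}s idle"
```

So a quiet key can fire up to 10 requests at once, after which it is throttled to roughly 5 requests per second.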
| 121 | + |
| 122 | +## Deploy to Kubernetes |
| 123 | +Use the Helm chart for production‑grade defaults and easy toggles. ServiceMonitor, HPA, and OTel are available via values. |
| 124 | +Quick install: |
102 | 125 | ``` |
103 | 126 | helm install pine-gate charts/pine-gate --set auth.apiKey=dev-key |
104 | | -# Optional extras: |
105 | | -# --set backends.openrouter.enabled=true --set backends.openrouter.apiKey=$OPENROUTER_API_KEY |
106 | | -# --set redis.enabled=true --set auth.adminKey=admin |
107 | | -# --set monitoring.enabled=true --set otel.enabled=true |
108 | 127 | kubectl port-forward deploy/pine-gate 8080:8080 |
109 | 128 | ``` |
| 129 | +Read more: docs/deploy-k8s.md and charts/pine-gate/README.md |
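The chart's toggles can also be collected in a values file instead of repeated `--set` flags. The keys below are the ones the chart exposes for auth, backends, Redis, monitoring, and OTel; the values shown are examples, and `charts/pine-gate/values.yaml` is the authoritative list:

```yaml
# my-values.yaml -- example values; see charts/pine-gate/values.yaml for defaults.
auth:
  apiKey: dev-key
  adminKey: admin          # enables the /v1/usage admin endpoint
backends:
  openrouter:
    enabled: true
    apiKey: ""             # better injected from a Secret than set inline
redis:
  enabled: true            # shared rate limits and usage across replicas
monitoring:
  enabled: true            # ServiceMonitor for the Prometheus Operator
otel:
  enabled: true            # OTLP trace export
```

Install with `helm install pine-gate charts/pine-gate -f my-values.yaml`.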
110 | 130 |
|
111 | 131 | ## Security |
112 | | -- Container: distroless non-root, read-only FS, dropped capabilities, seccomp RuntimeDefault |
113 | | -- K8s: security contexts set via Helm values |
| 132 | +The container runs as non‑root with a read‑only filesystem and dropped capabilities; security contexts are set in the chart. |
| 133 | +Read more: docs/security.md |
| 134 | + |
| 135 | +## CLI |
| 136 | +`pinectl` helps you run the gateway locally, print effective config, send test requests, and open a tiny TUI dashboard. |
| 137 | +Read more: docs/cli.md |
114 | 138 |
|
115 | | -## See Also |
116 | | -- docs/RUNBOOK.md |
117 | | -- docs/SLOs.md |