
Commit ac7a223

updated docs + minor bugs
Signed-off-by: amanycodes <[email protected]>
1 parent 7a8f849 commit ac7a223

File tree

30 files changed (+1370, -169 lines)


README.md

Lines changed: 95 additions & 74 deletions
@@ -1,117 +1,138 @@
-p:ine-gate
+# pine-gate
 
-Open-source, K8s-deployable AI Gateway for local & remote LLMs with quotas, streaming, routing, and observability.
+Open‑source, K8s‑deployable AI Gateway for local & remote LLMs with quotas, streaming, routing, and observability.
+
+Project status: beta. Follows SemVer for releases. See CONTRIBUTING.md and SECURITY.md for governance.
+
+## Table of Contents
+- Overview
+- Features
+- Get Started
+- Backends Overview
+- API Overview
+- Configuration
+- Observability
+- Quotas & Limits
+- Deploy to Kubernetes
+- Security
+- CLI
+
+## Overview
+pine-gate gives you a single, stable HTTP API in front of multiple LLMs (local and hosted). It handles authentication, rate limiting, usage counting, routing (including canaries), streaming responses, and telemetry — so applications can focus on product logic rather than provider differences.
 
 ## Features
-- Backends: Echo, Ollama (local), vLLM (OpenAI-compatible), OpenRouter, OpenAI, Anthropic
-- Routing: static rules + canary weighted splits
-- SSE streaming end-to-end
-- Rate limiting per API key (in-mem or Redis)
-- Usage counters and admin endpoint
-- Prometheus metrics with backend labels
-- OTel tracing (OTLP exporter)
-- Docker + Helm (ServiceMonitor + HPA)
+- Multiple model backends: use local and hosted LLMs behind one API — Echo for quick tests, Ollama for local models, vLLM via OpenAI‑compatible endpoints, plus OpenAI, OpenRouter, and Anthropic.
+- Smart request routing: direct traffic by model rules or roll out changes safely with weighted canary splits.
+- Real‑time streaming: stream tokens end‑to‑end over SSE for responsive UIs and CLIs.
+- Built‑in safeguards: per‑key rate limiting (in‑memory or Redis) and simple usage counters with an admin query endpoint.
+- First‑class observability: Prometheus metrics labeled by route and backend, and OpenTelemetry traces exported to your collector (OTLP).
+- Resilience controls: configurable retries and circuit breaking around backends to smooth over transient failures.
+- Production‑ready on Kubernetes: minimal, non‑root image; secure defaults; Helm chart with ServiceMonitor and optional HPA.
+- Easy local development: `make run`, `.env` support, and a tiny `pinectl` CLI to manage and test locally.
 
-## Quickstart (Local)
+## Get Started
+This path gets you from zero to a working gateway locally, then shows how to try a backend.
 
-cp .env.example .env # optional, or create your own
+1) Run the gateway with the example config
 ```
 CONFIG_FILE=./configs/config.example.yaml make run
+```
+Check health and send a test request using the built‑in `echo` backend:
+```
 curl -i http://localhost:8080/healthz
-curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hello"}'
+curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \
+  -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hello"}'
 ```
 
-### Enable OpenRouter
+2) Enable a real backend (example: OpenRouter)
 ```
 PINE_GATE_BACKENDS_OPENROUTER_ENABLED=true \
 PINE_GATE_BACKENDS_OPENROUTER_APIKEY=<YOUR_KEY> \
 CONFIG_FILE=./configs/config.example.yaml make run
-curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST \
-  http://localhost:8080/v1/completions -d '{"model":"openrouter:mistralai/mistral-7b-instruct:free","prompt":"hello"}'
 ```
-
-### Enable Ollama (local)
+Then request a model via the `openrouter:` prefix:
 ```
-PINE_GATE_BACKENDS_OLLAMA_ENABLED=true \
-PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434 \
-CONFIG_FILE=./configs/config.example.yaml make run
 curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST \
-  http://localhost:8080/v1/completions -d '{"model":"ollama:llama3","prompt":"hello"}'
-```
-
-### vLLM (OpenAI-compatible)
-```
-PINE_GATE_BACKENDS_VLLM_ENABLED=true \
-PINE_GATE_BACKENDS_VLLM_BASE_URL=http://localhost:8000/v1 \
-CONFIG_FILE=./configs/config.example.yaml make run
+  http://localhost:8080/v1/completions -d '{"model":"openrouter:mistralai/mistral-7b-instruct:free","prompt":"hello"}'
 ```
+See Backends for other providers and examples.
 
-### OpenAI
+3) Optional: place settings in `.env`
+pine-gate loads a `.env` file automatically. Put your environment variables there instead of prefixing commands:
 ```
-PINE_GATE_BACKENDS_OPENAI_ENABLED=true \
-PINE_GATE_BACKENDS_OPENAI_APIKEY=<OPENAI_KEY> \
-CONFIG_FILE=./configs/config.example.yaml make run
+# .env
+PINE_GATE_BACKENDS_OLLAMA_ENABLED=true
+PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434
+OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318
+PINE_GATE_LIMITS_RATE_RPS=5
+PINE_GATE_LIMITS_BURST=10
 ```
-
-### Anthropic
+With `.env` present, start with:
 ```
-PINE_GATE_BACKENDS_ANTHROPIC_ENABLED=true \
-PINE_GATE_BACKENDS_ANTHROPIC_APIKEY=<ANTHROPIC_KEY> \
-CONFIG_FILE=./configs/config.example.yaml make run
+make run
+# or
+./bin/pinectl serve --config ./configs/config.example.yaml
 ```
 
-## Redis Rate Limit + Usage
+4) Optional: Redis for shared rate limits and usage
 ```
 docker run --rm -p 6379:6379 redis:7
 PINE_GATE_REDIS_ENABLED=true PINE_GATE_REDIS_ADDR=localhost:6379 \
 PINE_GATE_AUTH_ADMIN_KEY=admin CONFIG_FILE=./configs/config.example.yaml make run
 curl -s -H 'x-admin-key: admin' 'http://localhost:8080/v1/usage?key=dev-key'
 ```
 
-## Tracing (Jaeger all-in-one)
+5) Optional: Tracing to Jaeger (OTLP)
 ```
 docker run --rm -p 16686:16686 -p 4318:4318 -e COLLECTOR_OTLP_ENABLED=true jaegertracing/all-in-one:1.57
 OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 CONFIG_FILE=./configs/config.example.yaml make run
-
-## .env Support
-- pine-gate automatically loads a `.env` file from the working directory.
-- Put any configuration env vars in `.env` instead of prefixing commands, for example:
-
-```
-# .env
-PINE_GATE_BACKENDS_OLLAMA_ENABLED=true
-PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434
-PINE_GATE_BACKENDS_OPENROUTER_ENABLED=false
-PINE_GATE_BACKENDS_OPENROUTER_APIKEY=
-PINE_GATE_BACKENDS_OPENAI_ENABLED=false
-PINE_GATE_BACKENDS_VLLM_ENABLED=false
-OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318
-PINE_GATE_LIMITS_RATE_RPS=5
-PINE_GATE_LIMITS_BURST=10
-
-With `.env` present, simply run:
-```
-make run
-```
-# or
-```
-./bin/pinectl serve --config ./configs/config.example.yaml
 ```
 
-## Helm Install
+## Backends Overview
+Choose a backend by prefixing the model (e.g., `openai:gpt-4o-mini`, `ollama:llama3`). Enable providers via environment variables.
+- Echo: built‑in for local testing (no network call)
+- Ollama: local models via `ollama:<model>`
+- vLLM: OpenAI‑compatible server via `vllm:<model>`
+- OpenAI: hosted models via `openai:<model>`
+- OpenRouter: marketplace via `openrouter:<model>`
+- Anthropic: hosted models via `anthropic:<model>`
+Read more: docs/backends.md
+
+## API Overview
+Two core endpoints power synchronous and streaming use cases.
+- `POST /v1/completions` — JSON request `{ model, prompt }` → response `{ model, output }`
+- `GET /v1/stream` — SSE stream of tokens with `model` and `prompt` as query params
+Health and metrics are also available:
+- `GET /healthz` — service health
+- `GET /metrics` — Prometheus metrics
+Read more: docs/api.md and docs/openapi.yaml
+
+## Configuration
+Configuration comes from environment variables, a YAML file, and defaults. Environment variables (including from `.env`) take precedence.
+Read more: docs/configuration.md
+
+## Observability
+Prometheus metrics include request rate, latency, errors, and backend labels. OpenTelemetry spans trace requests and backend calls.
+Read more: docs/observability.md and docs/tracing.md
+
+## Quotas & Limits
+Per‑key token buckets enforce rate limits. Use Redis to share limits and counters across replicas.
+Read more: docs/quotas-limits.md
+
+## Deploy to Kubernetes
+Use the Helm chart for production‑grade defaults and easy toggles. ServiceMonitor, HPA, and OTel are available via values.
+Quick install:
 ```
 helm install pine-gate charts/pine-gate --set auth.apiKey=dev-key
-# Optional extras:
-# --set backends.openrouter.enabled=true --set backends.openrouter.apiKey=$OPENROUTER_API_KEY
-# --set redis.enabled=true --set auth.adminKey=admin
-# --set monitoring.enabled=true --set otel.enabled=true
 kubectl port-forward deploy/pine-gate 8080:8080
 ```
+Read more: docs/deploy-k8s.md and charts/pine-gate/README.md
 
 ## Security
-- Container: distroless non-root, read-only FS, dropped capabilities, seccomp RuntimeDefault
-- K8s: security contexts set via Helm values
+The container runs as non‑root with a read‑only filesystem and dropped capabilities; security contexts are set in the chart.
+Read more: docs/security.md
+
+## CLI
+`pinectl` helps you run the gateway locally, print effective config, send test requests, and open a tiny TUI dashboard.
+Read more: docs/cli.md
 
-## See Also
-- docs/RUNBOOK.md
-- docs/SLOs.md
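Note on the new API Overview: the quickstart only exercises `POST /v1/completions`. A sketch of hitting the `GET /v1/stream` SSE endpoint it documents, assuming the same `x-api-key` auth and the `model`/`prompt` query params the README describes:

```
curl -N -H 'x-api-key: dev-key' \
  'http://localhost:8080/v1/stream?model=echo&prompt=hello'
```

curl's `-N` disables output buffering, so each SSE event prints as it arrives.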

assets/pine-gate.png

132 KB

charts/pine-gate/README.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+# pine-gate Helm Chart
+
+Helm chart to deploy pine-gate to Kubernetes.
+
+## Installation
+```bash
+kubectl create ns pine-gate || true
+helm upgrade --install pine-gate charts/pine-gate -n pine-gate \
+  --set auth.apiKey=dev-key
+```
+
+### Enable common options
+- Redis (rate limiting & usage counters):
+```bash
+--set redis.enabled=true \
+--set redis.addr=redis-master.default.svc.cluster.local:6379
+```
+- OpenTelemetry exporter:
+```bash
+--set otel.enabled=true \
+--set otel.endpoint=otel-collector.monitoring.svc.cluster.local:4318 \
+--set otel.protocol=http
+```
+- Monitoring (ServiceMonitor):
+```bash
+--set monitoring.enabled=true
+```
+
+### Enable backends
+```bash
+# Ollama (local models)
+--set backends.ollama.enabled=true \
+--set backends.ollama.host=http://ollama.pine-gate.svc.cluster.local:11434
+
+# vLLM (OpenAI-compatible)
+--set backends.vllm.enabled=true \
+--set backends.vllm.baseURL=http://vllm.pine-gate.svc.cluster.local:8000/v1
+
+# OpenAI
+--set backends.openai.enabled=true \
+--set backends.openai.apiKey=$OPENAI_API_KEY
+
+# OpenRouter
+--set backends.openrouter.enabled=true \
+--set backends.openrouter.apiKey=$OPENROUTER_API_KEY
+
+# Anthropic
+--set backends.anthropic.enabled=true \
+--set backends.anthropic.apiKey=$ANTHROPIC_API_KEY
+```
+
+## Values
+
+Key settings (see `values.yaml` for full list):
+
+- `image.repository` (string) — container repo
+- `image.tag` (string) — image tag (default: `dev`)
+- `auth.apiKey` (string) — required API key
+- `auth.adminKey` (string) — admin key for `/v1/usage`
+- `limits.rateRPS` (int) — token bucket rate (default 5)
+- `limits.burst` (int) — token bucket burst (default 10)
+- `redis.*` — Redis connection (disabled by default)
+- `otel.enabled` (bool) — enable OTLP exporter
+- `otel.endpoint` (string) — OTLP collector endpoint (host:port)
+- `monitoring.enabled` (bool) — create `ServiceMonitor`
+- `backends.*` — per-backend enablement and config
+
+## Notes
+- The container runs as non-root with a read-only filesystem and dropped capabilities.
+- Expose via an Ingress with TLS termination in front of the Service.
+- See `docs/deploy-k8s.md` for more deployment guidance.
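Taken together, the snippets above compose into a single install command; a sketch that reuses the illustrative in-cluster addresses from this README:

```bash
helm upgrade --install pine-gate charts/pine-gate -n pine-gate \
  --set auth.apiKey=dev-key \
  --set auth.adminKey=admin \
  --set redis.enabled=true \
  --set redis.addr=redis-master.default.svc.cluster.local:6379 \
  --set monitoring.enabled=true \
  --set otel.enabled=true \
  --set otel.endpoint=otel-collector.monitoring.svc.cluster.local:4318
```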

cmd/gateway/main.go

Lines changed: 2 additions & 2 deletions
@@ -31,8 +31,8 @@ func main() {
 	_ = gotenv.Load()
 	cfgFile := envOr("CONFIG_FILE", "./configs/config.example.yaml")
 
-	logger := telemetry.NewLogger()
-	defer logger.Sync()
+	logger := telemetry.NewLogger()
+	defer func() { _ = logger.Sync() }()
 	_ = os.Setenv("APP_VERSION", version)
 	tp := telemetry.InitTracing("go-ai-gateway")
 	defer func() { _ = tp.Shutdown(context.Background()) }()
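The fix swaps `defer logger.Sync()` for a closure that discards the returned error explicitly, the usual way to keep errcheck-style linters quiet on deferred cleanup calls. A minimal standalone illustration of the pattern (hypothetical file, not from this repo):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	f, err := os.Create("out.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// `defer f.Close()` would leave Close's error unchecked; wrapping it
	// in a closure discards that error on purpose, which linters accept.
	defer func() { _ = f.Close() }()

	_, _ = fmt.Fprintln(f, "hello")
}
```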

cmd/pinectl/main.go

Lines changed: 29 additions & 19 deletions
@@ -42,7 +42,7 @@ func main() {
 		if err != nil {
 			return err
 		}
-		defer resp.Body.Close()
+		defer func() { _ = resp.Body.Close() }()
 		b, _ := io.ReadAll(resp.Body)
 		fmt.Printf("%s: %s\n", resp.Status, string(b))
 		if resp.StatusCode != http.StatusOK {
@@ -106,18 +106,26 @@ func main() {
 		if serveCfg == "" {
 			serveCfg = "./configs/config.example.yaml"
 		}
-		if servePort > 0 {
-			os.Setenv("PINE_GATE_SERVER_PORT", fmt.Sprintf("%d", servePort))
-		}
-		if otelEndpoint != "" {
-			os.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", otelEndpoint)
-		}
-		if rateRPS > 0 {
-			os.Setenv("PINE_GATE_LIMITS_RATE_RPS", fmt.Sprintf("%d", rateRPS))
-		}
-		if burst > 0 {
-			os.Setenv("PINE_GATE_LIMITS_BURST", fmt.Sprintf("%d", burst))
-		}
+		if servePort > 0 {
+			if err := os.Setenv("PINE_GATE_SERVER_PORT", fmt.Sprintf("%d", servePort)); err != nil {
+				return err
+			}
+		}
+		if otelEndpoint != "" {
+			if err := os.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", otelEndpoint); err != nil {
+				return err
+			}
+		}
+		if rateRPS > 0 {
+			if err := os.Setenv("PINE_GATE_LIMITS_RATE_RPS", fmt.Sprintf("%d", rateRPS)); err != nil {
+				return err
+			}
+		}
+		if burst > 0 {
+			if err := os.Setenv("PINE_GATE_LIMITS_BURST", fmt.Sprintf("%d", burst)); err != nil {
+				return err
+			}
+		}
 		return server.Run(version, serveCfg)
 	},
 }
@@ -151,17 +159,19 @@ func main() {
 		if err != nil {
 			return err
 		}
-		defer resp.Body.Close()
+		defer func() { _ = resp.Body.Close() }()
 		if resp.StatusCode != 200 {
 			b, _ := io.ReadAll(resp.Body)
 			return fmt.Errorf("%s: %s", resp.Status, string(b))
 		}
 		buf := make([]byte, 1024)
 		for {
 			n, err := resp.Body.Read(buf)
-			if n > 0 {
-				os.Stdout.Write(buf[:n])
-			}
+			if n > 0 {
+				if _, err := os.Stdout.Write(buf[:n]); err != nil {
+					return err
+				}
+			}
 			if err != nil {
 				break
 			}
@@ -177,7 +187,7 @@ func main() {
 		if err != nil {
 			return err
 		}
-		defer resp.Body.Close()
+		defer func() { _ = resp.Body.Close() }()
 		rb, _ := io.ReadAll(resp.Body)
 		if resp.StatusCode != 200 {
 			return fmt.Errorf("%s: %s", resp.Status, string(rb))
@@ -309,7 +319,7 @@ func fetchMetricsCmd(addr string) tea.Cmd {
 		if err != nil {
 			return metricsMsg{nil, err}
 		}
-		defer resp.Body.Close()
+		defer func() { _ = resp.Body.Close() }()
 		b, _ := io.ReadAll(resp.Body)
 		totals := parseGatewayTotals(string(b))
 		return metricsMsg{totals: totals, err: nil}
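The streaming hunk keeps the hand-rolled 1024-byte read/write loop, now with the write error checked. An `io.Copy` would be the more compact idiom; a sketch for comparison, not what the commit does:

```go
// io.Copy streams resp.Body to stdout until EOF and reports either
// the read or the write error through a single return value.
if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
	return err
}
```

The explicit loop does keep the chunk size visible, which can be handy when reasoning about token-by-token flushing.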

configs/config.example.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ backends:
     base_url: "https://openrouter.ai/api/v1"
     timeout_seconds: 15
   ollama:
-    enabled: false
+    enabled: true
    host: "http://localhost:11434"
     timeout_seconds: 15
   openai:
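This flips the example config to enable Ollama by default. Because environment variables (including `.env`) override the YAML, per the README's Configuration section, the old behavior is still one override away; a sketch:

```
PINE_GATE_BACKENDS_OLLAMA_ENABLED=false CONFIG_FILE=./configs/config.example.yaml make run
```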
