|
1 | | -# pine-gate
| 1 | +# pine-gate |
2 | 2 |
|
3 | | -Open-source, K8s-deployable AI Gateway for local & remote LLMs with quotas, streaming, routing, and observability. |
| 3 | +Open‑source, K8s‑deployable AI Gateway for local & remote LLMs with quotas, streaming, routing, and observability. |
| 4 | + |
| 5 | +Project status: beta. Releases follow SemVer. See CONTRIBUTING.md and SECURITY.md for governance.
| 6 | + |
| 7 | +## Table of Contents |
| 8 | +- [Overview](#overview)
| 9 | +- [Features](#features)
| 10 | +- [Get Started](#get-started)
| 11 | +- [Backends Overview](#backends-overview)
| 12 | +- [API Overview](#api-overview)
| 13 | +- [Configuration](#configuration)
| 14 | +- [Observability](#observability)
| 15 | +- [Quotas & Limits](#quotas--limits)
| 16 | +- [Deploy to Kubernetes](#deploy-to-kubernetes)
| 17 | +- [Security](#security)
| 18 | +- [CLI](#cli)
| 19 | + |
| 20 | +## Overview |
| 21 | +pine-gate gives you a single, stable HTTP API in front of multiple LLMs (local and hosted). It handles authentication, rate limiting, usage counting, routing (including canaries), streaming responses, and telemetry — so applications can focus on product logic rather than provider differences. |
4 | 22 |
|
5 | 23 | ## Features |
6 | | -- Backends: Echo, Ollama (local), vLLM (OpenAI-compatible), OpenRouter, OpenAI, Anthropic |
7 | | -- Routing: static rules + canary weighted splits |
8 | | -- SSE streaming end-to-end |
9 | | -- Rate limiting per API key (in-mem or Redis) |
10 | | -- Usage counters and admin endpoint |
11 | | -- Prometheus metrics with backend labels |
12 | | -- OTel tracing (OTLP exporter) |
13 | | -- Docker + Helm (ServiceMonitor + HPA) |
| 24 | +- Multiple model backends: use local and hosted LLMs behind one API — Echo for quick tests, Ollama for local models, vLLM via OpenAI‑compatible endpoints, plus OpenAI, OpenRouter, and Anthropic. |
| 25 | +- Smart request routing: direct traffic by model rules or roll out changes safely with weighted canary splits. |
| 26 | +- Real‑time streaming: stream tokens end‑to‑end over SSE for responsive UIs and CLIs. |
| 27 | +- Built‑in safeguards: per‑key rate limiting (in‑memory or Redis) and simple usage counters with an admin query endpoint. |
| 28 | +- First‑class observability: Prometheus metrics labeled by route and backend, and OpenTelemetry traces exported to your collector (OTLP). |
| 29 | +- Resilience controls: configurable retries and circuit breaking around backends to smooth over transient failures. |
| 30 | +- Production‑ready on Kubernetes: minimal, non‑root image; secure defaults; Helm chart with ServiceMonitor and optional HPA. |
| 31 | +- Easy local development: `make run`, `.env` support, and a tiny `pinectl` CLI to manage and test locally. |
14 | 32 |
|
15 | | -## Quickstart (Local) |
| 33 | +## Get Started |
| 34 | +These steps take you from zero to a working gateway locally, then show how to enable a real backend.
16 | 35 |
|
17 | | -cp .env.example .env # optional, or create your own |
| 36 | +1) Run the gateway with the example config |
18 | 37 | ``` |
19 | 38 | CONFIG_FILE=./configs/config.example.yaml make run |
| 39 | +``` |
| 40 | +Check health and send a test request using the built‑in `echo` backend: |
| 41 | +``` |
20 | 42 | curl -i http://localhost:8080/healthz |
21 | | -curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hello"}' |
| 43 | +curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \ |
| 44 | + -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hello"}' |
22 | 45 | ``` |
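A successful call returns JSON shaped like `{ "model": ..., "output": ... }` (see the API Overview below). To script against the endpoint, here is a minimal sketch that pulls out just the `output` field; the sample payload stands in for a live response, and `jq` is the more robust choice in practice:

```shell
# Hypothetical helper: extract "output" from a completion response
# (shape { model, output } per the API Overview). With the gateway up,
# pipe the curl call above into extract_output instead of the sample.
extract_output() {
  sed -n 's/.*"output"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Simulated response for illustration:
printf '{"model":"echo","output":"hello"}' | extract_output
```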
23 | 46 |
|
24 | | -### Enable OpenRouter |
| 47 | +2) Enable a real backend (example: OpenRouter) |
25 | 48 | ``` |
26 | 49 | PINE_GATE_BACKENDS_OPENROUTER_ENABLED=true \ |
27 | 50 | PINE_GATE_BACKENDS_OPENROUTER_APIKEY=<YOUR_KEY> \ |
28 | 51 | CONFIG_FILE=./configs/config.example.yaml make run |
29 | | -curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST \ |
30 | | - http://localhost:8080/v1/completions -d '{"model":"openrouter:mistralai/mistral-7b-instruct:free","prompt":"hello"}' |
31 | 52 | ``` |
32 | | - |
33 | | -### Enable Ollama (local) |
| 53 | +Then request a model via the `openrouter:` prefix: |
34 | 54 | ``` |
35 | | -PINE_GATE_BACKENDS_OLLAMA_ENABLED=true \ |
36 | | -PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434 \ |
37 | | -CONFIG_FILE=./configs/config.example.yaml make run |
38 | 55 | curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST \ |
39 | | - http://localhost:8080/v1/completions -d '{"model":"ollama:llama3","prompt":"hello"}' |
40 | | -``` |
41 | | - |
42 | | -### vLLM (OpenAI-compatible) |
43 | | -``` |
44 | | -PINE_GATE_BACKENDS_VLLM_ENABLED=true \ |
45 | | -PINE_GATE_BACKENDS_VLLM_BASE_URL=http://localhost:8000/v1 \ |
46 | | -CONFIG_FILE=./configs/config.example.yaml make run |
| 56 | + http://localhost:8080/v1/completions -d '{"model":"openrouter:mistralai/mistral-7b-instruct:free","prompt":"hello"}' |
47 | 57 | ``` |
| 58 | +See the Backends Overview section for other providers and examples.
48 | 59 |
|
49 | | -### OpenAI |
| 60 | +3) Optional: place settings in `.env` |
| 61 | +pine-gate loads a `.env` file automatically. Put your environment variables there instead of prefixing commands: |
50 | 62 | ``` |
51 | | -PINE_GATE_BACKENDS_OPENAI_ENABLED=true \ |
52 | | -PINE_GATE_BACKENDS_OPENAI_APIKEY=<OPENAI_KEY> \ |
53 | | -CONFIG_FILE=./configs/config.example.yaml make run |
| 63 | +# .env |
| 64 | +PINE_GATE_BACKENDS_OLLAMA_ENABLED=true |
| 65 | +PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434 |
| 66 | +OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 |
| 67 | +PINE_GATE_LIMITS_RATE_RPS=5 |
| 68 | +PINE_GATE_LIMITS_BURST=10 |
54 | 69 | ``` |
55 | | - |
56 | | -### Anthropic |
| 70 | +With `.env` present, start with: |
57 | 71 | ``` |
58 | | -PINE_GATE_BACKENDS_ANTHROPIC_ENABLED=true \ |
59 | | -PINE_GATE_BACKENDS_ANTHROPIC_APIKEY=<ANTHROPIC_KEY> \ |
60 | | -CONFIG_FILE=./configs/config.example.yaml make run |
| 72 | +make run |
| 73 | +# or |
| 74 | +./bin/pinectl serve --config ./configs/config.example.yaml |
61 | 75 | ``` |
62 | 76 |
|
63 | | -## Redis Rate Limit + Usage |
| 77 | +4) Optional: Redis for shared rate limits and usage |
64 | 78 | ``` |
65 | 79 | docker run --rm -p 6379:6379 redis:7 |
66 | 80 | PINE_GATE_REDIS_ENABLED=true PINE_GATE_REDIS_ADDR=localhost:6379 \ |
67 | 81 | PINE_GATE_AUTH_ADMIN_KEY=admin CONFIG_FILE=./configs/config.example.yaml make run |
68 | 82 | curl -s -H 'x-admin-key: admin' 'http://localhost:8080/v1/usage?key=dev-key' |
69 | 83 | ``` |
70 | 84 |
|
71 | | -## Tracing (Jaeger all-in-one) |
| 85 | +5) Optional: Tracing to Jaeger (OTLP) |
72 | 86 | ``` |
73 | 87 | docker run --rm -p 16686:16686 -p 4318:4318 -e COLLECTOR_OTLP_ENABLED=true jaegertracing/all-in-one:1.57 |
74 | 88 | OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 CONFIG_FILE=./configs/config.example.yaml make run |
75 | | -
|
76 | | -## .env Support |
77 | | -- pine-gate automatically loads a `.env` file from the working directory. |
78 | | -- Put any configuration env vars in `.env` instead of prefixing commands, for example: |
79 | | -
|
80 | | -``` |
81 | | -# .env |
82 | | -PINE_GATE_BACKENDS_OLLAMA_ENABLED=true |
83 | | -PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434 |
84 | | -PINE_GATE_BACKENDS_OPENROUTER_ENABLED=false |
85 | | -PINE_GATE_BACKENDS_OPENROUTER_APIKEY= |
86 | | -PINE_GATE_BACKENDS_OPENAI_ENABLED=false |
87 | | -PINE_GATE_BACKENDS_VLLM_ENABLED=false |
88 | | -OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 |
89 | | -PINE_GATE_LIMITS_RATE_RPS=5 |
90 | | -PINE_GATE_LIMITS_BURST=10 |
91 | | - |
92 | | -With `.env` present, simply run: |
93 | | -``` |
94 | | -make run |
95 | | -``` |
96 | | -# or |
97 | | -``` |
98 | | -./bin/pinectl serve --config ./configs/config.example.yaml |
99 | 89 | ``` |
100 | 90 |
|
101 | | -## Helm Install |
| 91 | +## Backends Overview |
| 92 | +Choose a backend by prefixing the model (e.g., `openai:gpt-4o-mini`, `ollama:llama3`). Enable providers via environment variables. |
| 93 | +- Echo: built‑in for local testing (no network call) |
| 94 | +- Ollama: local models via `ollama:<model>` |
| 95 | +- vLLM: OpenAI‑compatible server via `vllm:<model>` |
| 96 | +- OpenAI: hosted models via `openai:<model>` |
| 97 | +- OpenRouter: marketplace via `openrouter:<model>` |
| 98 | +- Anthropic: hosted models via `anthropic:<model>`
| | +
| 99 | +Read more: docs/backends.md
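The `backend:model` convention above can be split with plain parameter expansion; this is a sketch, assuming the first colon separates the backend prefix from the provider's model id (later colons, as in OpenRouter's `:free` suffix, belong to the model):

```shell
# Sketch: split "backend:model" at the first colon only; everything after
# it (including further colons, e.g. OpenRouter's ":free" suffix) is the model id.
model='openrouter:mistralai/mistral-7b-instruct:free'
backend=${model%%:*}   # remove everything from the first ':' onward -> "openrouter"
name=${model#*:}       # remove everything up to the first ':' -> the model id
echo "$backend -> $name"
```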
| 100 | + |
| 101 | +## API Overview |
| 102 | +Two core endpoints power synchronous and streaming use cases. |
| 103 | +- `POST /v1/completions` — JSON request `{ model, prompt }` → `{ model, output }`
| 104 | +- `GET /v1/stream` — SSE stream of tokens with `model` and `prompt` as query params
| | +
| 105 | +Health and metrics are also available:
| 106 | +- `GET /healthz` — service health
| 107 | +- `GET /metrics` — Prometheus metrics
| | +
| 108 | +Read more: docs/api.md and docs/openapi.yaml
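For `GET /v1/stream`, here is a sketch of consuming the response body from the shell. The `data:` framing is standard SSE, though the exact event payloads pine-gate emits are an assumption; the simulated input stands in for a live stream, which you would produce with `curl -N`:

```shell
# Sketch: concatenate the payloads of SSE "data:" lines (standard SSE framing;
# pine-gate's exact event format is an assumption here). Live usage:
#   curl -N -H 'x-api-key: dev-key' \
#     'http://localhost:8080/v1/stream?model=echo&prompt=hello' | read_sse
read_sse() {
  while IFS= read -r line; do
    case "$line" in
      data:*) printf '%s' "${line#data: }" ;;
    esac
  done
}

# Simulated two-event stream for illustration:
printf 'data: hel\n\ndata: lo\n\n' | read_sse; echo
```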
| 109 | + |
| 110 | +## Configuration |
| 111 | +Configuration comes from environment variables, a YAML file, and defaults. Environment variables (including from `.env`) take precedence. |
| 112 | +Read more: docs/configuration.md |
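As an illustration of how the two sources line up, here is a hypothetical fragment assuming the `PINE_GATE_*` variables shown earlier map to nested YAML keys (e.g. `PINE_GATE_LIMITS_RATE_RPS` → `limits.rate_rps`); the authoritative key names live in `configs/config.example.yaml` and docs/configuration.md:

```yaml
# Hypothetical sketch only -- key names are inferred from the env vars above;
# consult configs/config.example.yaml for the real layout.
limits:
  rate_rps: 5        # PINE_GATE_LIMITS_RATE_RPS
  burst: 10          # PINE_GATE_LIMITS_BURST
backends:
  ollama:
    enabled: true    # PINE_GATE_BACKENDS_OLLAMA_ENABLED
    host: http://localhost:11434
```

An exported `PINE_GATE_LIMITS_RATE_RPS=20` would override the file's `5`, since environment variables (including those from `.env`) take precedence.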
| 113 | + |
| 114 | +## Observability |
| 115 | +Prometheus metrics track request rate, latency, and errors, labeled by route and backend. OpenTelemetry spans trace requests and backend calls.
| 116 | +Read more: docs/observability.md and docs/tracing.md |
| 117 | + |
| 118 | +## Quotas & Limits |
| 119 | +Per‑key token buckets enforce rate limits. Use Redis to share limits and counters across replicas. |
| 120 | +Read more: docs/quotas-limits.md |
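Assuming a standard token bucket (which the rate/burst pair suggests, though the exact algorithm is an assumption), the refill arithmetic looks like this, with parameters mirroring the `.env` example above:

```shell
# Sketch: standard token-bucket refill math. The algorithm itself is an
# assumption; pine-gate only documents the rate/burst knobs.
rate=5    # PINE_GATE_LIMITS_RATE_RPS: tokens added per second
burst=10  # PINE_GATE_LIMITS_BURST: bucket capacity

idle=4    # seconds since this key's last request
tokens=$(( idle * rate ))
if [ "$tokens" -gt "$burst" ]; then tokens=$burst; fi  # cap at capacity

echo "$tokens requests available after ${idle}s idle"
```

So a quiet key can fire up to 10 requests at once, after which it is throttled to roughly 5 requests per second.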
| 121 | + |
| 122 | +## Deploy to Kubernetes |
| 123 | +Use the Helm chart for production‑grade defaults and easy toggles. ServiceMonitor, HPA, and OTel are available via values. |
| 124 | +Quick install: |
102 | 125 | ``` |
103 | 126 | helm install pine-gate charts/pine-gate --set auth.apiKey=dev-key |
104 | | -# Optional extras: |
105 | | -# --set backends.openrouter.enabled=true --set backends.openrouter.apiKey=$OPENROUTER_API_KEY |
106 | | -# --set redis.enabled=true --set auth.adminKey=admin |
107 | | -# --set monitoring.enabled=true --set otel.enabled=true |
108 | 127 | kubectl port-forward deploy/pine-gate 8080:8080 |
109 | 128 | ``` |
| 129 | +Read more: docs/deploy-k8s.md and charts/pine-gate/README.md |
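The chart's toggles can also be collected in a values file instead of repeated `--set` flags. The keys below are the ones the chart exposes for auth, backends, Redis, monitoring, and OTel; the values shown are examples, and `charts/pine-gate/values.yaml` is the authoritative list:

```yaml
# my-values.yaml -- example values; see charts/pine-gate/values.yaml for defaults.
auth:
  apiKey: dev-key
  adminKey: admin          # enables the /v1/usage admin endpoint
backends:
  openrouter:
    enabled: true
    apiKey: ""             # better injected from a Secret than set inline
redis:
  enabled: true            # shared rate limits and usage across replicas
monitoring:
  enabled: true            # ServiceMonitor for the Prometheus Operator
otel:
  enabled: true            # OTLP trace export
```

Install with `helm install pine-gate charts/pine-gate -f my-values.yaml`.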
110 | 130 |
|
111 | 131 | ## Security |
112 | | -- Container: distroless non-root, read-only FS, dropped capabilities, seccomp RuntimeDefault |
113 | | -- K8s: security contexts set via Helm values |
| 132 | +The container runs as non‑root with a read‑only filesystem and dropped capabilities; security contexts are set in the chart. |
| 133 | +Read more: docs/security.md |
| 134 | + |
| 135 | +## CLI |
| 136 | +`pinectl` helps you run the gateway locally, print effective config, send test requests, and open a tiny TUI dashboard. |
| 137 | +Read more: docs/cli.md |
114 | 138 |
|
115 | | -## See Also |
116 | | -- docs/RUNBOOK.md |
117 | | -- docs/SLOs.md |