Open‑source, K8s‑deployable AI Gateway for local & remote LLMs with quotas, streaming, routing, and observability.
- Overview
- Features
- Get Started
- Backends Overview
- API Overview
- Configuration
- Observability
- Quotas & Limits
- Deploy to Kubernetes
- Security
- CLI
## Overview

pine-gate gives you a single, stable HTTP API in front of multiple LLMs (local and hosted). It handles authentication, rate limiting, usage counting, routing (including canaries), streaming responses, and telemetry, so applications can focus on product logic rather than provider differences.
## Features

- Multiple model backends: use local and hosted LLMs behind one API. Echo for quick tests, Ollama for local models, vLLM via OpenAI‑compatible endpoints, plus OpenAI, OpenRouter, and Anthropic.
- Smart request routing: direct traffic by model rules or roll out changes safely with weighted canary splits.
- Real‑time streaming: stream tokens end‑to‑end over SSE for responsive UIs and CLIs.
- Built‑in safeguards: per‑key rate limiting (in‑memory or Redis) and simple usage counters with an admin query endpoint.
- First‑class observability: Prometheus metrics labeled by route and backend, and OpenTelemetry traces exported to your collector (OTLP).
- Resilience controls: configurable retries and circuit breaking around backends to smooth over transient failures.
- Production‑ready on Kubernetes: minimal, non‑root image; secure defaults; Helm chart with ServiceMonitor and optional HPA.
- Easy local development: `make run`, `.env` support, and a tiny `pinectl` CLI to manage and test locally.
## Get Started

This path gets you from zero to a working gateway locally, then shows how to try a real backend.
- Run the gateway with the example config
```bash
CONFIG_FILE=./configs/config.example.yaml make run
```
Check health and send a test request using the built‑in echo backend:
```bash
curl -i http://localhost:8080/healthz

curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \
  -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hello"}'
```
- Enable a real backend (example: OpenRouter)
```bash
PINE_GATE_BACKENDS_OPENROUTER_ENABLED=true \
PINE_GATE_BACKENDS_OPENROUTER_APIKEY=<YOUR_KEY> \
CONFIG_FILE=./configs/config.example.yaml make run
```
Then request a model via the `openrouter:` prefix:
```bash
curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST \
  http://localhost:8080/v1/completions \
  -d '{"model":"openrouter:mistralai/mistral-7b-instruct:free","prompt":"hello"}'
```
See Backends for other providers and examples.
- Optional: place settings in `.env`. pine-gate loads a `.env` file automatically, so you can put your environment variables there instead of prefixing commands:
```bash
# .env
PINE_GATE_BACKENDS_OLLAMA_ENABLED=true
PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318
PINE_GATE_LIMITS_RATE_RPS=5
PINE_GATE_LIMITS_BURST=10
```
With `.env` present, start with:
```bash
make run
# or
./bin/pinectl serve --config ./configs/config.example.yaml
```
- Optional: Redis for shared rate limits and usage
```bash
docker run --rm -p 6379:6379 redis:7

PINE_GATE_REDIS_ENABLED=true PINE_GATE_REDIS_ADDR=localhost:6379 \
PINE_GATE_AUTH_ADMIN_KEY=admin CONFIG_FILE=./configs/config.example.yaml make run

curl -s -H 'x-admin-key: admin' 'http://localhost:8080/v1/usage?key=dev-key'
```
- Optional: Tracing to Jaeger (OTLP)
```bash
docker run --rm -p 16686:16686 -p 4318:4318 \
  -e COLLECTOR_OTLP_ENABLED=true jaegertracing/all-in-one:1.57

OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 CONFIG_FILE=./configs/config.example.yaml make run
```
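After a request or two, traces should show up in the Jaeger UI (served on port 16686 by the container above). A quick way to generate one:

```bash
# Send a request so the gateway emits a trace, then open the Jaeger UI
curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \
  -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"trace me"}'

# macOS `open`; use xdg-open on Linux
open http://localhost:16686
```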
## Backends Overview

Choose a backend by prefixing the model (e.g., `openai:gpt-4o-mini`, `ollama:llama3`). Enable providers via environment variables.
- Echo: built‑in for local testing (no network call)
- Ollama: local models via `ollama:<model>`
- vLLM: OpenAI‑compatible server via `vllm:<model>`
- OpenAI: hosted models via `openai:<model>`
- OpenRouter: marketplace via `openrouter:<model>`
- Anthropic: hosted models via `anthropic:<model>`

Read more: docs/backends.md
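For example, with Ollama enabled (as in the `.env` above) and a model such as `llama3` pulled locally, requests route through the `ollama:` prefix (the model name here is illustrative):

```bash
# Route a completion to a local Ollama model
curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \
  -X POST http://localhost:8080/v1/completions \
  -d '{"model":"ollama:llama3","prompt":"hello"}'
```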
## API Overview

Two core endpoints power synchronous and streaming use cases.
- `POST /v1/completions` — JSON request `{ model, prompt }` → `{ model, output }`
- `GET /v1/stream` — SSE stream of tokens, with `model` and `prompt` as query params

Health and metrics are also available:

- `GET /healthz` — service health
- `GET /metrics` — Prometheus metrics

Read more: docs/api.md and docs/openapi.yaml
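For instance, the streaming endpoint can be exercised with plain `curl` (`-N` disables buffering so tokens print as they arrive); this sketch assumes the same `x-api-key` auth as the completions endpoint:

```bash
# Stream tokens from the echo backend over SSE
curl -N -H 'x-api-key: dev-key' \
  'http://localhost:8080/v1/stream?model=echo&prompt=hello'
```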
## Configuration

Configuration comes from environment variables, a YAML file, and built‑in defaults. Environment variables (including those loaded from `.env`) take precedence.
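For example, an env var set at launch overrides whatever the YAML file says, so you can tweak a value without editing the file (here, the rate‑limit variable shown earlier):

```bash
# Env var beats the YAML value for the same setting
PINE_GATE_LIMITS_RATE_RPS=2 CONFIG_FILE=./configs/config.example.yaml make run
```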
Read more: docs/configuration.md
## Observability

Prometheus metrics include request rate, latency, errors, and backend labels. OpenTelemetry spans trace requests and backend calls. Read more: docs/observability.md and docs/tracing.md
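To eyeball the metrics locally, scrape the endpoint directly; exact metric names vary by version, so grep broadly:

```bash
# List pine-gate's Prometheus metrics and filter for the interesting ones
curl -s http://localhost:8080/metrics | grep -iE 'request|latency|backend' | head -20
```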
## Quotas & Limits

Per‑key token buckets enforce rate limits. Use Redis to share limits and usage counters across replicas. Read more: docs/quotas-limits.md
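A quick way to watch the limiter kick in; this sketch assumes over-limit requests are rejected with a 4xx status (HTTP 429 is typical, but see docs/quotas-limits.md for the exact behavior):

```bash
# With RATE_RPS=5 and BURST=10 (as in the .env above), a burst of 20
# requests should show a mix of accepted and rejected status codes.
for i in $(seq 1 20); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \
    -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hi"}'
done | sort | uniq -c
```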
## Deploy to Kubernetes

Use the Helm chart for production‑grade defaults and easy toggles. ServiceMonitor, HPA, and OTel are available via values. Quick install:
```bash
helm install pine-gate charts/pine-gate --set auth.apiKey=dev-key
kubectl port-forward deploy/pine-gate 8080:8080
```
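With the port-forward running, the same health check from Get Started works against the cluster:

```bash
# Smoke-test the in-cluster gateway through the port-forward
curl -i http://localhost:8080/healthz
```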
Read more: docs/deploy-k8s.md and charts/pine-gate/README.md
## Security

The container runs as non‑root with a read‑only filesystem and dropped capabilities; security contexts are set in the chart. Read more: docs/security.md
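One way to confirm the hardened settings on a running release (the field path assumes the chart's default single-container Deployment named `pine-gate`, as used by the port-forward above):

```bash
# Print the container securityContext the chart applied
kubectl get deploy pine-gate \
  -o jsonpath='{.spec.template.spec.containers[0].securityContext}'
```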
## CLI

`pinectl` helps you run the gateway locally, print the effective config, send test requests, and open a tiny TUI dashboard.
Read more: docs/cli.md
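Only the `serve` subcommand appears in this README (it started the gateway in Get Started); the full command set, including config printing, test requests, and the TUI, is documented in docs/cli.md:

```bash
# Start the gateway from the CLI against a config file
./bin/pinectl serve --config ./configs/config.example.yaml
```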
