pine-gate

Open‑source, K8s‑deployable AI Gateway for local & remote LLMs with quotas, streaming, routing, and observability.

Table of Contents

  • Overview
  • Features
  • Get Started
  • Backends Overview
  • API Overview
  • Configuration
  • Observability
  • Quotas & Limits
  • Deploy to Kubernetes
  • Security
  • CLI

Overview

pine-gate gives you a single, stable HTTP API in front of multiple LLMs (local and hosted). It handles authentication, rate limiting, usage counting, routing (including canaries), streaming responses, and telemetry so applications can focus on product logic rather than provider differences.

Features

  • Multiple model backends: use local and hosted LLMs behind one API. Echo for quick tests, Ollama for local models, vLLM via OpenAI‑compatible endpoints, plus OpenAI, OpenRouter, and Anthropic.
  • Smart request routing: direct traffic by model rules or roll out changes safely with weighted canary splits (see the sketch after this list).
  • Real‑time streaming: stream tokens end‑to‑end over SSE for responsive UIs and CLIs.
  • Built‑in safeguards: per‑key rate limiting (in‑memory or Redis) and simple usage counters with an admin query endpoint.
  • First‑class observability: Prometheus metrics labeled by route and backend, and OpenTelemetry traces exported to your collector (OTLP).
  • Resilience controls: configurable retries and circuit breaking around backends to smooth over transient failures.
  • Production‑ready on Kubernetes: minimal, non‑root image; secure defaults; Helm chart with ServiceMonitor and optional HPA.
  • Easy local development: make run, .env support, and a tiny pinectl CLI to manage and test locally.
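
To make the routing model concrete, here is a hypothetical weighted canary rule. The field names are illustrative only; the real schema lives in configs/config.example.yaml and docs/configuration.md.

routes:
  - match:
      model: chat-default                 # requests naming this model...
    backends:
      - target: openai:gpt-4o-mini        # ...mostly hit the stable backend
        weight: 90
      - target: openrouter:mistralai/mistral-7b-instruct:free   # canary slice
        weight: 10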

Get Started

This path gets you from zero to a working gateway locally, then shows how to try a backend.

  1. Run the gateway with the example config
CONFIG_FILE=./configs/config.example.yaml make run

Check health and send a test request using the built‑in echo backend:

curl -i http://localhost:8080/healthz
curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \
  -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hello"}'
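
The echo backend reflects the prompt back, so the response should look roughly like this (exact fields may differ):

{"model":"echo","output":"hello"}
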
  2. Enable a real backend (example: OpenRouter)
PINE_GATE_BACKENDS_OPENROUTER_ENABLED=true \
PINE_GATE_BACKENDS_OPENROUTER_APIKEY=<YOUR_KEY> \
CONFIG_FILE=./configs/config.example.yaml make run

Then request a model via the openrouter: prefix:

curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST \
  http://localhost:8080/v1/completions -d '{"model":"openrouter:mistralai/mistral-7b-instruct:free","prompt":"hello"}'

See Backends for other providers and examples.

  3. Optional: place settings in .env

pine-gate loads a .env file automatically. Put your environment variables there instead of prefixing commands:
# .env
PINE_GATE_BACKENDS_OLLAMA_ENABLED=true
PINE_GATE_BACKENDS_OLLAMA_HOST=http://localhost:11434
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318
PINE_GATE_LIMITS_RATE_RPS=5
PINE_GATE_LIMITS_BURST=10

With .env present, start with:

make run
# or
./bin/pinectl serve --config ./configs/config.example.yaml
  4. Optional: Redis for shared rate limits and usage
docker run --rm -p 6379:6379 redis:7
PINE_GATE_REDIS_ENABLED=true PINE_GATE_REDIS_ADDR=localhost:6379 \
PINE_GATE_AUTH_ADMIN_KEY=admin CONFIG_FILE=./configs/config.example.yaml make run
curl -s -H 'x-admin-key: admin' 'http://localhost:8080/v1/usage?key=dev-key'
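
The response is the usage recorded for that key; the shape shown here is only a guess at what the counters look like, so check docs/quotas-limits.md for the actual fields:

{"key":"dev-key","requests":12}
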
  5. Optional: tracing to Jaeger (OTLP)
docker run --rm -p 16686:16686 -p 4318:4318 -e COLLECTOR_OTLP_ENABLED=true jaegertracing/all-in-one:1.57
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 CONFIG_FILE=./configs/config.example.yaml make run
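
After a request or two, traces should appear in the Jaeger UI at http://localhost:16686 (the UI port mapped above; OTLP ingest is on 4318):

curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \
  -X POST http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"trace me"}'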

Backends Overview

Choose a backend by prefixing the model (e.g., openai:gpt-4o-mini, ollama:llama3). Enable providers via environment variables.

  • Echo: built‑in for local testing (no network call)
  • Ollama: local models via ollama:<model>
  • vLLM: OpenAI‑compatible server via vllm:<model>
  • OpenAI: hosted models via openai:<model>
  • OpenRouter: marketplace via openrouter:<model>
  • Anthropic: hosted models via anthropic:<model>
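
For example, with Ollama enabled (step 3 above shows the environment variables), a local model is addressed with the ollama: prefix:

curl -s -H 'x-api-key: dev-key' -H 'Content-Type: application/json' -X POST \
  http://localhost:8080/v1/completions -d '{"model":"ollama:llama3","prompt":"hello"}'

Read more: docs/backends.md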

API Overview

Two core endpoints power synchronous and streaming use cases.

  • POST /v1/completions — JSON request { model, prompt }, JSON response { model, output }
  • GET /v1/stream — SSE stream of tokens, with model and prompt as query params

Health and metrics are also available:

  • GET /healthz — service health
  • GET /metrics — Prometheus metrics
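
For example, streaming from the echo backend with curl (-N disables buffering; this assumes the same x-api-key auth as /v1/completions):

curl -N -H 'x-api-key: dev-key' \
  'http://localhost:8080/v1/stream?model=echo&prompt=hello'

Read more: docs/api.md and docs/openapi.yaml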

Configuration

Configuration comes from environment variables, a YAML file, and built-in defaults; environment variables (including those loaded from .env) take precedence.
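
As a sketch of how the layers relate, the same backend setting can be expressed in either place. The env-to-YAML mapping below is an assumption inferred from the variable names in this README, so treat docs/configuration.md as authoritative:

# configs/config.example.yaml (assumed shape)
backends:
  ollama:
    enabled: true
    host: http://localhost:11434

# overrides the YAML value at runtime (.env or shell)
PINE_GATE_BACKENDS_OLLAMA_ENABLED=false

Read more: docs/configuration.md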

Observability

Prometheus metrics include request rate, latency, errors, and backend labels. OpenTelemetry spans trace requests and backend calls.
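
A minimal Prometheus scrape config for a locally running gateway; no metrics_path override is needed because /metrics is the Prometheus default:

scrape_configs:
  - job_name: pine-gate
    static_configs:
      - targets: ['localhost:8080']

Read more: docs/observability.md and docs/tracing.md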

Quotas & Limits

Per‑key token buckets enforce rate limits. Use Redis to share limits and counters across replicas.
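
With the example limits from step 3 (5 rps, burst 10), a quick burst should start being rejected once the bucket drains. A 429 status is the conventional rejection code for rate limiting, though the exact code used here is an assumption:

for i in $(seq 1 15); do
  curl -s -o /dev/null -w '%{http_code}\n' -H 'x-api-key: dev-key' \
    -H 'Content-Type: application/json' -X POST \
    http://localhost:8080/v1/completions -d '{"model":"echo","prompt":"hi"}'
done

Read more: docs/quotas-limits.md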

Deploy to Kubernetes

Use the Helm chart for production‑grade defaults and easy toggles. ServiceMonitor, HPA, and OTel are available via values. Quick install:

helm install pine-gate charts/pine-gate --set auth.apiKey=dev-key
kubectl port-forward deploy/pine-gate 8080:8080
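
The toggles mentioned above are exposed through chart values. The key names below are illustrative, so confirm them against charts/pine-gate/README.md:

# values.yaml (hypothetical keys)
serviceMonitor:
  enabled: true     # scrape metrics via the Prometheus Operator
autoscaling:
  enabled: true     # optional HPA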

Read more: docs/deploy-k8s.md and charts/pine-gate/README.md

Security

The container runs as non‑root with a read‑only filesystem and dropped capabilities; security contexts are set in the chart.
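
Those settings correspond to a container security context along these lines (a sketch of typical values; the chart templates are authoritative):

securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]

Read more: docs/security.md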

CLI

pinectl helps you run the gateway locally, print effective config, send test requests, and open a tiny TUI dashboard. Read more: docs/cli.md
