Prism translates between Anthropic Messages, OpenAI Chat Completions, OpenAI Responses, and Ollama native APIs in real time. Native system tray, built-in web admin UI, model remapping, and full SSE streaming. Zero config.
Claude Desktop, Cursor, Continue, and other AI tools each expect a specific API format — but cloud providers don't all speak the same language. Prism sits in between, translating requests and responses on the fly so you can point any client at any provider.
One proxy. Every format. No Python.
| | Prism | LiteLLM |
|---|---|---|
| Binary size | ~5 MB | ~200 MB (Python + deps) |
| Memory | ~5–10 MB | ~200–500 MB |
| Startup | < 100 ms | ~2–5 s |
| Runtime deps | None | Python 3.9+, pip packages |
| Anthropic API | ✅ | ✅ |
| OpenAI Chat API | ✅ | ✅ |
| OpenAI Responses API | ✅ | ❌ |
| Ollama Native API | ✅ | ✅ |
| Streaming (SSE) | ✅ | ✅ |
| Model remapping | ✅ | ✅ |
| Tool calling | ✅ | ✅ |
| Thinking/reasoning | ✅ | |
| Per-model reasoning toggle | ✅ | ❌ |
| Reasoning effort validation | ✅ | ❌ |
| Image support | ✅ | ✅ |
| Structured outputs | ✅ | |
| Per-model capabilities | ✅ Tools / Vision / Struct | ❌ |
| models.dev auto-lookup | ✅ | ❌ |
| Provider-per-model routing | ✅ | ❌ |
| Web admin UI | ✅ | ❌ |
| Windows native | ✅ System tray + admin UI | ❌ Requires Python |
```
Your tools                               Cloud providers
──────────                               ───────────────
Claude Desktop ──┐
(Anthropic API)  │                       ┌──────────────┐
                 │    ┌───────────┐      │ Ollama Cloud │
Cursor ──────────┼───→│   Prism   │─────→│ /api/chat    │
(OpenAI API)     │    │  :11434   │      └──────────────┘
                 │    └─────┬─────┘      ┌──────────────┐
Continue ────────┤          │            │ OpenCode Go  │
(OpenAI API)     │          │            │ /v1/chat/... │
                 │          │            └──────────────┘
OpenAI SDK ──────┘          │            ┌──────────────┐
(Responses API)             ├───────────→│ Custom       │
                            │            │ /v1/chat/... │
                            │            └──────────────┘
                            │
                       ┌────┴─────┐
                       │ Admin UI │
                       │  :8765   │
                       └──────────┘
                                         ┌────────────────────┐
                                         │ Codex (via OAuth)  │── Sign in with OpenAI
                                         │ /v1/chat/...       │   account, no API key
                                         └────────────────────┘   needed
```
Prism accepts requests in Anthropic Messages format (/v1/messages), OpenAI Chat Completions format (/v1/chat/completions), or OpenAI Responses format (/v1/responses), translates them to whatever your upstream provider speaks, and translates responses back. Streaming works seamlessly in all directions.
```powershell
./prism.exe
```

That's it. Prism starts on http://127.0.0.1:11434 and a system tray icon appears. A web admin UI is available at http://127.0.0.1:8765/admin.
Open the admin UI from the system tray (right-click → Open Settings) or navigate to http://127.0.0.1:8765/admin. In the Provider tab:
- Select your upstream provider (Ollama Cloud, OpenCode Go, a custom provider, or a Codex OAuth account)
- For API-key providers, enter your API key
- For Codex, click Add Codex Account to sign in with your OpenAI account
- Prism auto-restarts with the new config
You can also configure via %APPDATA%\prism\config.json — see Providers below.
Setting up with Claude Desktop
Edit your Claude Desktop config:
```json
{
  "inferenceProvider": "gateway",
  "inferenceGatewayBaseUrl": "http://127.0.0.1:11434",
  "inferenceGatewayApiKey": "prism",
  "inferenceModels": [
    { "name": "glm-5.1:cloud" },
    { "name": "deepseek-v4-pro:cloud", "supports1m": true }
  ]
}
```

Setting up with Claude Code
Edit ~/.claude/settings.json:
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:11434",
    "ANTHROPIC_AUTH_TOKEN": "prism",
    "ANTHROPIC_API_KEY": ""
  }
}
```

Setting up with Cursor / Continue / other OpenAI clients
Point your client to http://127.0.0.1:11434/v1 with any API key. Prism accepts OpenAI Chat Completions requests and translates them to the configured upstream provider.
Setting up with OpenAI SDK (Responses API)
Set the base URL to http://127.0.0.1:11434/v1. Prism accepts OpenAI Responses API requests at /v1/responses and translates them to the configured upstream provider — including streaming, tool calls, and reasoning.
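```python
from openai import OpenAI

# Point the SDK at the local Prism proxy instead of api.openai.com.
client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="prism",
)

# stream=True asks for the response as a stream of Responses API events.
response = client.responses.create(
    model="glm-5.1:cloud",
    input="Hello!",
    stream=True,
)
```

When launched without arguments, Prism runs as a system tray application with these options: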
| Menu item | Action |
|---|---|
| Start / Stop / Restart Proxy | Control the proxy server process |
| Provider → Ollama Cloud / OpenCode Go / Custom providers | Switch upstream provider on the fly |
| Add Codex Account | Start Codex OAuth flow to link an OpenAI account |
| Refresh Usage | Refresh credit usage for all connected Codex accounts |
| Open Settings | Open the web admin UI in your browser |
| Open Folder | Open the proxy directory in Explorer |
| Edit Model Config | Open model_remapping.json in Notepad |
| Show Logs | Open a live log viewer console |
| Set API Key | Open the web admin UI to set keys |
| Quit | Stop proxy and exit |
Prism includes a built-in web admin interface for managing everything without editing config files by hand.
URL: http://127.0.0.1:8765/admin (configurable via PRISM_ADMIN_PORT)
The admin UI provides:
| Tab | Features |
|---|---|
| Provider | Select default provider, set API keys, add/edit/remove custom providers |
| OAuth | Manage Codex (OpenAI) accounts — sign in, view usage credits, activate, or remove accounts |
| Models | Edit model remapping — default model, known models with per-model provider, reasoning toggle, capabilities (tools/vision/struct), context length, max output tokens, reasoning effort levels, and aliases. Includes models.dev search and auto-fill for model info. |
| Stats | Live and historical performance dashboard (see below) |
| Proxy | Start, stop, and restart the proxy; view status; toggle auto-start at login |
| Logs | Live tail of the last 200 log lines |
Changes are saved immediately and the proxy auto-restarts when needed.
The Stats tab surfaces every metric about your proxy usage:
| Section | What it shows |
|---|---|
| Filter bar | Filter by provider, model, client origin, or date range; refresh button to reload all data |
| Tokens Per Day | Stacked bar chart (input + output) with a total headline — persists across restarts via SQLite |
| Tokens Per Month | Filled line chart showing monthly aggregate totals |
| Live TPS | Real-time tokens/sec hero value with a live sparkline chart (120-point rolling window, updated every second) |
| Session Totals | Running counts: total requests, input tokens, output tokens, and average TPS |
| Client Breakdown | Per-client usage stats showing requests, total tokens, and a distribution pie chart — identifies tools like Claude Code, Cursor, Continue, Copilot, Factory Droid, and more automatically by User-Agent |
| TPS History | Table (model, provider, avg/max TPS) paired with a multi-line chart of 5-minute bucket averages over time |
| By Model | Per-model breakdown of requests, token counts, and average TPS |
| Recent Requests | Timestamped log of the last 50 requests with model, client, token counts, TPS, and duration |
| Data Management | One-click Clear All Stats button to wipe all persisted history |
All request data and TPS snapshots are persisted to %APPDATA%\prism\stats.db (SQLite, WAL mode) so the dashboard survives proxy restarts and page refreshes. Charts are rendered with Chart.js and automatically adapt to light/dark theme.
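As a rough illustration of this kind of persistence, the sketch below opens a SQLite database in WAL mode and averages TPS into 5-minute buckets, similar in spirit to the TPS History view. The `requests` table, its columns, and both function names are assumptions made for this example and are not Prism's actual schema.

```python
import sqlite3

BUCKET_SECONDS = 300  # 5-minute buckets


def open_stats_db(path: str) -> sqlite3.Connection:
    """Open (or create) a stats database in WAL mode. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        "ts INTEGER NOT NULL, model TEXT NOT NULL, tps REAL NOT NULL)"
    )
    return conn


def tps_buckets(conn: sqlite3.Connection) -> list[tuple[int, str, float]]:
    """Return (bucket_start, model, avg_tps) rows ordered by time, then model."""
    return conn.execute(
        "SELECT (ts / ?) * ? AS bucket, model, AVG(tps) FROM requests "
        "GROUP BY bucket, model ORDER BY bucket, model",
        (BUCKET_SECONDS, BUCKET_SECONDS),
    ).fetchall()
```

Because the data lives in a file rather than in process memory, a dashboard built this way can reload history after a restart by re-running the bucket query.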
Prism automatically identifies which tool is making each request by inspecting the User-Agent header. Detected clients include:
Claude Code, Cursor, Continue, GitHub Copilot, Aider, OpenCode, Windsurf, Trae, Factory Droid, Supermaven, and Claude Desktop.
You can override detection by setting the X-Client-Name header on your requests — the value is used directly in stats, so you can tag requests with custom names like "my-script" or "ci-pipeline".
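A minimal sketch of this detection order, with the override checked before the User-Agent. The substring patterns in `KNOWN_CLIENTS`, the `"Unknown"` fallback, and the function name are assumptions for illustration; the real matching rules are not documented here.

```python
from typing import Mapping

# Hypothetical (substring, display name) pairs, matched case-insensitively.
KNOWN_CLIENTS = [
    ("claude-code", "Claude Code"),
    ("cursor", "Cursor"),
    ("continue", "Continue"),
    ("aider", "Aider"),
]


def detect_client(headers: Mapping[str, str]) -> str:
    """Return the client name for stats: explicit override first, then User-Agent."""
    lowered = {key.lower(): value for key, value in headers.items()}
    override = lowered.get("x-client-name", "").strip()
    if override:
        return override  # used as-is, e.g. "ci-pipeline"
    agent = lowered.get("user-agent", "").lower()
    for needle, name in KNOWN_CLIENTS:
        if needle in agent:
            return name
    return "Unknown"
```

For example, a script could send `X-Client-Name: ci-pipeline` to keep its traffic separate from interactive tools in the Client Breakdown section.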
| Variable | Default | Description |
|---|---|---|
| `PRISM_PORT` | `11434` | Port for the proxy server |
| `PRISM_HOST` | `127.0.0.1` | Host to bind (use `0.0.0.0` for network access) |
| `PRISM_ADMIN_PORT` | `8765` | Port for the admin web UI |
| `OLLAMA_API_KEY` | (none) | API key for Ollama Cloud (fallback if not in config) |
| `OPENCODE_GO_API_KEY` | (none) | API key for OpenCode Go (fallback if not in config) |
Prism supports multiple upstream providers, configured via the admin UI or %APPDATA%\prism\config.json:
| Provider | Config key | Upstream format | Endpoint |
|---|---|---|---|
| Ollama Cloud | `ollama_cloud` | Ollama Native | `/api/chat` |
| OpenCode Go | `opencode_go` | OpenAI | `/v1/chat/completions` |
| Custom providers | `custom_providers[]` | OpenAI | `/v1/chat/completions` |
| Codex (via OAuth) | `oauth_accounts[]` | OpenAI | `/v1/chat/completions` |
Each model in your remapping is assigned to a specific provider. When a request arrives, Prism resolves the model, looks up its assigned provider, and routes the request to that upstream — even if other models go to different providers. This means you can mix models from Ollama Cloud, OpenCode Go, custom providers, and OAuth accounts in a single session.
- The `default_provider` field in config is used only as a fallback when a model has no explicit provider assignment.
- Provider routing is handled by the ProviderRouter, which resolves the provider per-request based on the requested model name.
- Models from different providers can coexist — set each model's provider when adding it to Known Models.
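The routing rule in the list above can be sketched as a small lookup. This is an illustration of the described behavior only, not the ProviderRouter's code; the function name and argument shapes are assumptions.

```python
def resolve_provider(model_id: str, known_models: list[dict], default_provider: str) -> str:
    """Return the provider assigned to model_id, or default_provider as a fallback."""
    for entry in known_models:
        # An entry with no (or an empty) provider falls through to the default.
        if entry.get("id") == model_id and entry.get("provider"):
            return entry["provider"]
    return default_provider
```

With two known models assigned to `ollama_cloud` and `opencode_go`, requests for each are routed to different upstreams in the same session, while any model without an assignment goes to `default_provider`.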
You can add multiple custom providers (e.g. OpenRouter, Groq, Together AI) — each with its own name, base URL, and API key. Add, edit, or delete them from the admin UI Provider tab. Custom providers are assigned unique IDs like custom_myprovider_abc123.
Prism supports signing in with your OpenAI account via OAuth (no API key needed). Click Add Codex Account in the admin UI OAuth tab or system tray, and your browser will open for authentication. Once connected, Prism uses your account token automatically, including token refresh and credit usage tracking.
Switch providers from the system tray, admin UI, or by changing the default_provider field — no restart required when using the tray/UI.
Full config example
```json
{
  "default_provider": "ollama_cloud",
  "ollama_cloud": {
    "id": "ollama_cloud",
    "name": "Ollama Cloud",
    "base_url": "https://ollama.com",
    "api_key": ""
  },
  "opencode_go": {
    "id": "opencode_go",
    "name": "OpenCode Go",
    "base_url": "https://opencode.ai/zen/go",
    "api_key": ""
  },
  "custom_providers": [
    {
      "id": "custom_openrouter_abc123",
      "name": "OpenRouter",
      "base_url": "https://openrouter.ai/api/v1",
      "api_key": ""
    }
  ],
  "oauth_accounts": [
    {
      "id": "codex_user_abc123",
      "provider": "codex",
      "label": "Codex",
      "email": "user@example.com",
      "access_token": "...",
      "refresh_token": "...",
      "expires_at": 1234567890,
      "plan_tier": "plus",
      "active": true
    }
  ]
}
```

API keys in the config file take priority. If empty, Prism falls back to these environment variables:
| Variable | Used for |
|---|---|
| `OLLAMA_API_KEY` | Ollama Cloud |
| `OPENCODE_GO_API_KEY` | OpenCode Go |
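The precedence described here (config file first, environment variable second) could look like the following sketch. The function name and the `env` parameter are assumptions added so the example can be run without touching the real environment.

```python
import os
from typing import Mapping

# Environment fallbacks listed in the table above, keyed by config section.
ENV_FALLBACKS = {
    "ollama_cloud": "OLLAMA_API_KEY",
    "opencode_go": "OPENCODE_GO_API_KEY",
}


def resolve_api_key(config: dict, provider_key: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the config-file key if set, else the environment fallback, else ''."""
    configured = config.get(provider_key, {}).get("api_key", "")
    if configured:
        return configured
    env_name = ENV_FALLBACKS.get(provider_key)
    return env.get(env_name, "") if env_name else ""
```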
Prism can remap model names on the fly — useful when clients send model names that don't exist on your upstream provider.
Configured via the admin UI (Models tab) or %APPDATA%\prism\model_remapping.json.
When an unknown model is requested, Prism falls back to this model. Select it from the dropdown in the admin UI or set default_model.
Known models are now rich entries — not just strings — with per-model provider assignment, reasoning toggle, capabilities, and token limits. Each entry includes:
| Field | Type | Description |
|---|---|---|
| `id` | string | Model identifier (e.g. `deepseek-v4-flash:cloud`) |
| `provider` | string | Provider to route this model to (e.g. `ollama_cloud`, `opencode_go`, a custom provider ID, or an OAuth account ID) |
| `reasoning` | bool | Whether this model supports thinking/reasoning |
| `reasoning_effort` | string[] | Allowed reasoning effort levels (`low`, `medium`, `high`, `max`) |
| `context_length` | int | Maximum context window in tokens |
| `max_output_tokens` | int | Maximum output tokens |
| `capabilities.tool_calling` | bool | Supports tool/function calling |
| `capabilities.structured_outputs` | bool | Supports structured/JSON output |
| `capabilities.vision` | bool | Supports image input |
Models matching a known entry pass through without remapping. A model that doesn't match any entry falls back to the default model.
Prism now validates reasoning_effort against each model's capabilities:
- Non-reasoning models: `reasoning_effort` is automatically stripped from requests.
- Reasoning models: Invalid effort values are normalized to the model's first allowed effort (e.g. `"invalid"` → `"medium"` for a model whose allowed list begins with `medium`), with a warning logged.
- Unknown models: `reasoning_effort` is stripped for safety.
- Responses API normalization: `enabled`/`on`/`true` → `medium`; `disabled`/`off`/`false`/`none` → omitted.
- Anthropic → OpenAI translation: Anthropic `thinking` is mapped to `reasoning_effort=medium`.
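The first three rules above can be sketched as a single function. The name `normalize_reasoning_effort`, the logger name, and the handling of a reasoning model with an empty allowed list are assumptions for illustration.

```python
import logging
from typing import Optional

log = logging.getLogger("prism.sketch")  # hypothetical logger name


def normalize_reasoning_effort(request: dict, model_entry: Optional[dict]) -> dict:
    """Return a copy of request with reasoning_effort validated against the model."""
    out = dict(request)
    effort = out.get("reasoning_effort")
    if effort is None:
        return out
    # Unknown models (no entry) and non-reasoning models: strip the field.
    if model_entry is None or not model_entry.get("reasoning"):
        del out["reasoning_effort"]
        return out
    allowed = model_entry.get("reasoning_effort") or []
    if effort not in allowed:
        if allowed:
            log.warning("invalid reasoning_effort %r, using %r", effort, allowed[0])
            out["reasoning_effort"] = allowed[0]
        else:
            # Assumption: with no allowed levels configured, drop the field.
            del out["reasoning_effort"]
    return out
```

The function copies the request rather than mutating it, so the caller can still log or inspect what the client originally sent.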
Map incoming model names to different upstream models.
| Feature | What it does |
|---|---|
| Aliases | Map model names (e.g. claude-3-5-haiku → deepseek-v4-flash:cloud) |
| Default model | Fallback when a requested model isn't recognized |
| Known models | Rich entries with per-model provider, reasoning, and capabilities |
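How the three features in this table combine can be sketched as one lookup. The order shown (alias first, then known model, then default) is an assumption about precedence, and `resolve_model` is a hypothetical name.

```python
def resolve_model(requested: str, remapping: dict) -> str:
    """Map a requested model name to the upstream model name."""
    aliases = remapping.get("aliases", {})
    if requested in aliases:
        return aliases[requested]  # explicit alias wins
    known_ids = {entry["id"] for entry in remapping.get("known_models", [])}
    if requested in known_ids:
        return requested  # known models pass through unchanged
    return remapping["default_model"]  # anything else falls back
```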
The admin UI integrates with models.dev to auto-fill model information. When adding or editing a model:
- Start typing a model ID — a search dropdown appears with results from models.dev, scoped to the selected provider.
- Click Fetch or select a search result to auto-fill:
- Context length
- Max output tokens
- Reasoning toggle and allowed effort levels
- Tool calling, structured output, and vision capabilities
- The lookup runs directly from the admin server (not through the proxy) so it works even when the proxy is stopped.
The search is scoped to the selected provider (Ollama Cloud, OpenCode Go, or custom) — results are fuzzy-matched against models.dev provider keys.
Full remapping example
```json
{
  "default_model": "glm-5.1:cloud",
  "known_models": [
    {
      "id": "glm-5.1:cloud",
      "provider": "ollama_cloud",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true,
        "vision": true
      }
    },
    {
      "id": "deepseek-v4-flash:cloud",
      "provider": "ollama_cloud",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true
      }
    },
    {
      "id": "opencode/deepseek-v4-flash",
      "provider": "opencode_go",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true
      }
    },
    {
      "id": "deepseek-v4-pro:cloud",
      "provider": "ollama_cloud",
      "reasoning": true,
      "reasoning_effort": ["low", "medium", "high", "max"],
      "context_length": 128000,
      "max_output_tokens": 16384,
      "capabilities": {
        "tool_calling": true,
        "structured_outputs": true
      }
    }
  ],
  "aliases": {
    "claude-3-5-haiku": "deepseek-v4-flash:cloud",
    "claude-3-5-haiku-20241022": "deepseek-v4-flash:cloud",
    "claude-3-haiku-20240307": "deepseek-v4-flash:cloud",
    "claude-haiku-3-5-20241022": "deepseek-v4-flash:cloud"
  }
}
```

| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/v1/messages` | `x-api-key` header | Anthropic Messages API |
| `POST` | `/v1/chat/completions` | `Authorization: Bearer <key>` | OpenAI Chat Completions API |
| `POST` | `/v1/responses` | `Authorization: Bearer <key>` | OpenAI Responses API |
| `GET` | `/v1/models` | `Authorization: Bearer <key>` | List available models |
| `GET` | `/health` | None | Health check |
| `GET` | `/api/model-info` | None | Look up model details from models.dev (admin UI) |
| `GET` | `/admin/model-info` | None | Look up model details from models.dev (admin server only) |
| `GET` | `/admin/model-search` | None | Search models on models.dev (admin server only) |
| `POST` | `/v1/messages/count_tokens` | `x-api-key` header | Returns 404 (not supported upstream) |
Prism handles the full translation surface between all API formats:
Anthropic ↔ Ollama
Request mapping:
| Anthropic | Ollama | Notes |
|---|---|---|
| `messages` | `messages` | Content blocks → string or array |
| `system` | `messages[].role=system` | Injected as first message |
| `max_tokens` | `options.num_predict` | |
| `temperature` / `top_p` / `top_k` | `options.*` | |
| `tools` | `tools` | Schema translation |
| `thinking` | `think` | |
| `stop_sequences` | `options.stop` | |
| images (base64) | `images` | Image content blocks → image array |
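The request mapping above can be sketched as a plain dictionary transformation. This covers text, base64 images, sampling options, and `thinking` only; tool definitions and tool result blocks are left out for brevity. The function name is hypothetical and the sketch is not Prism's translator.

```python
def anthropic_to_ollama(req: dict) -> dict:
    """Translate an Anthropic Messages request body into an Ollama /api/chat body."""
    messages = []
    system = req.get("system")
    if system:
        if isinstance(system, list):  # Anthropic also accepts a list of text blocks
            system = "".join(block.get("text", "") for block in system)
        messages.append({"role": "system", "content": system})  # injected first
    for msg in req.get("messages", []):
        out = {"role": msg["role"]}
        content = msg["content"]
        if isinstance(content, str):
            out["content"] = content
        else:
            texts, images = [], []
            for block in content:
                if block.get("type") == "text":
                    texts.append(block["text"])
                elif block.get("type") == "image" and block["source"].get("type") == "base64":
                    images.append(block["source"]["data"])
            out["content"] = "".join(texts)
            if images:
                out["images"] = images
        messages.append(out)
    options = {}
    if "max_tokens" in req:
        options["num_predict"] = req["max_tokens"]
    for key in ("temperature", "top_p", "top_k"):
        if key in req:
            options[key] = req[key]
    if "stop_sequences" in req:
        options["stop"] = req["stop_sequences"]
    body = {"model": req["model"], "messages": messages, "options": options}
    thinking = req.get("thinking")
    if isinstance(thinking, dict):
        body["think"] = thinking.get("type") == "enabled"
    return body
```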
Response mapping:
| Ollama | Anthropic | Notes |
|---|---|---|
| `message.content` | `content[0].text` | Wrapped in content block array |
| `message.tool_calls` | `content[].tool_use` | |
| `message.thinking` | `content[].thinking` | |
| `done_reason: stop` | `stop_reason: end_turn` | |
| `done_reason: length` | `stop_reason: max_tokens` | |
| `done_reason: tool_call` | `stop_reason: tool_use` | |
Anthropic ↔ OpenAI
Request mapping:
| Anthropic | OpenAI | Notes |
|---|---|---|
| `messages` | `messages` | Content blocks → OpenAI format |
| `system` | `messages[].role=system` | |
| `max_tokens` | `max_tokens` | |
| `tools` | `tools` | Schema translation |
| `thinking` | `reasoning_content` | |
| images (base64) | `image_url` (data URI) | Image content blocks → OpenAI image parts |
Response mapping:
| OpenAI | Anthropic | Notes |
|---|---|---|
| `choices[0].message.content` | `content[0].text` | |
| `choices[0].message.tool_calls` | `content[].tool_use` | |
| `choices[0].message.reasoning_content` | `content[].thinking` | |
| `finish_reason: stop` | `stop_reason: end_turn` | |
| `finish_reason: length` | `stop_reason: max_tokens` | |
| `finish_reason: tool_calls` | `stop_reason: tool_use` | |
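A sketch of the response mapping for a non-streaming Chat Completions result. The block ordering (thinking, then text, then tool calls), the `end_turn` default for unrecognized finish reasons, and the function name are assumptions for this example.

```python
import json

FINISH_TO_STOP = {"stop": "end_turn", "length": "max_tokens", "tool_calls": "tool_use"}


def openai_to_anthropic_response(resp: dict) -> dict:
    """Translate a Chat Completions response into an Anthropic Messages response."""
    choice = resp["choices"][0]
    message = choice["message"]
    content = []
    if message.get("reasoning_content"):
        content.append({"type": "thinking", "thinking": message["reasoning_content"]})
    if message.get("content"):
        content.append({"type": "text", "text": message["content"]})
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        content.append({
            "type": "tool_use",
            "id": call["id"],
            "name": fn["name"],
            # OpenAI sends arguments as a JSON string; Anthropic expects an object.
            "input": json.loads(fn.get("arguments") or "{}"),
        })
    usage = resp.get("usage", {})
    return {
        "id": resp.get("id", ""),
        "type": "message",
        "role": "assistant",
        "model": resp.get("model", ""),
        "content": content,
        "stop_reason": FINISH_TO_STOP.get(choice.get("finish_reason"), "end_turn"),
        "usage": {
            "input_tokens": usage.get("prompt_tokens", 0),
            "output_tokens": usage.get("completion_tokens", 0),
        },
    }
```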
OpenAI inbound → Ollama
When an OpenAI client talks to Prism with an Ollama upstream, Prism translates the full OpenAI Chat Completions request/response format to/from Ollama native format — including streaming, tool calls, reasoning content, and images.
| OpenAI | Ollama | Notes |
|---|---|---|
| `reasoning_effort` | `think` | Any non-"off" value enables thinking |
| `image_url` (data URI) | `images` | Base64 data extracted from data URI |
| `response_format` | n/a | Passed through when supported |
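The two non-trivial rows of this table can be sketched as follows: pulling the base64 payload out of a data URI, and turning `reasoning_effort` into a boolean. All three function names are hypothetical, and remote (non-data) image URLs are simply rejected in this sketch.

```python
def data_uri_to_base64(url: str) -> str:
    """Extract the base64 payload from a URI like data:image/png;base64,AAAA."""
    if not url.startswith("data:"):
        raise ValueError("only data URIs are handled in this sketch")
    header, _, payload = url.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    return payload


def openai_message_to_ollama(msg: dict) -> dict:
    """Convert one OpenAI chat message (string or parts) to Ollama's message shape."""
    content = msg.get("content")
    if content is None or isinstance(content, str):
        return {"role": msg["role"], "content": content or ""}
    texts, images = [], []
    for part in content:
        if part.get("type") == "text":
            texts.append(part["text"])
        elif part.get("type") == "image_url":
            images.append(data_uri_to_base64(part["image_url"]["url"]))
    out = {"role": msg["role"], "content": "".join(texts)}
    if images:
        out["images"] = images
    return out


def effort_to_think(effort) -> bool:
    """A missing value or "off" disables thinking; any other value enables it."""
    return effort is not None and effort != "off"
```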
OpenAI inbound → OpenAI (pass-through)
When both the client and upstream speak OpenAI format, Prism applies model remapping and forwards the request with minimal modification. Streaming is passed through as-is.
Responses API ↔ Ollama / OpenAI
Prism translates the OpenAI Responses API (/v1/responses) to the upstream format, whether Ollama or OpenAI:
| Responses API | Chat Completions / Ollama | Notes |
|---|---|---|
| `input` (string) | `messages[].role=user` | Simple string input → user message |
| `input` (array of items) | `messages[]` | `message`, `function_call`, `function_call_output` items mapped |
| `instructions` | `messages[].role=system` | System prompt |
| `tools` (function type) | `tools` | Only `type: function` tools forwarded |
| `reasoning` | `reasoning_effort` / `think` | Reasoning config → thinking mode |
| `text.format` | `response_format` / `format` | Structured output / JSON schema |
| `max_output_tokens` | `max_tokens` / `options.num_predict` | |
| `temperature` / `top_p` | `temperature` / `top_p` | |
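The `input` and `instructions` rows can be sketched as a conversion into Chat Completions messages. This is an illustrative reading of the table, with a hypothetical function name; content parts other than text are ignored here.

```python
def responses_to_chat_messages(req: dict) -> list[dict]:
    """Build Chat Completions messages from a Responses API request body."""
    messages = []
    if req.get("instructions"):
        messages.append({"role": "system", "content": req["instructions"]})
    items = req.get("input", [])
    if isinstance(items, str):  # simple string input becomes one user message
        messages.append({"role": "user", "content": items})
        return messages
    for item in items:
        kind = item.get("type", "message")
        if kind == "message":
            content = item["content"]
            if isinstance(content, list):  # join the text of input/output text parts
                content = "".join(part.get("text", "") for part in content)
            messages.append({"role": item["role"], "content": content})
        elif kind == "function_call":
            messages.append({
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": item["call_id"],
                    "type": "function",
                    "function": {"name": item["name"], "arguments": item["arguments"]},
                }],
            })
        elif kind == "function_call_output":
            messages.append({
                "role": "tool",
                "tool_call_id": item["call_id"],
                "content": item["output"],
            })
    return messages
```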
Response mapping (OpenAI upstream → Responses API):
| Chat Completions | Responses API | Notes |
|---|---|---|
| `message.content` | `output[].message.content[].output_text` | Text content → output parts |
| `message.reasoning_content` | `output[].reasoning` | Reasoning → reasoning item |
| `message.tool_calls` | `output[].function_call` | Tool calls → function call items |
| `finish_reason: stop` | `status: completed` | |
| `finish_reason: length` | `status: incomplete` | |
Streaming: Full Responses API streaming event sequence is emitted — response.created, response.output_item.added, response.output_text.delta, response.output_text.done, response.content_part.added/done, response.output_item.done, response.function_call_arguments.delta/done, and response.completed.
All six routing paths support real-time SSE streaming with correct event translation:
| Inbound | Upstream | Streaming |
|---|---|---|
| Anthropic | Ollama | ✅ Newline-delimited JSON → Anthropic SSE |
| Anthropic | OpenAI | ✅ OpenAI SSE → Anthropic SSE |
| OpenAI Chat | Ollama | ✅ Newline-delimited JSON → OpenAI SSE |
| OpenAI Chat | OpenAI | ✅ Pass-through with model remapping |
| OpenAI Responses | Ollama | ✅ Newline-delimited JSON → Responses API SSE events |
| OpenAI Responses | OpenAI | ✅ OpenAI SSE → Responses API SSE events |
Thinking/reasoning blocks, tool calls, and images are fully supported in all streaming paths.
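To make the first row of the table concrete, here is a minimal sketch that turns Ollama's newline-delimited JSON chunks into the Anthropic SSE event sequence. It handles plain text only (no thinking blocks or tool calls), and the placeholder message ID and function names are assumptions; a real translator also has to track input token counts and multiple content blocks.

```python
import json
from typing import Iterable, Iterator


def sse(event: str, data: dict) -> str:
    """Format one server-sent event."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def ollama_ndjson_to_anthropic_sse(lines: Iterable[str], model: str) -> Iterator[str]:
    """Yield Anthropic-style SSE events for a stream of Ollama /api/chat chunks."""
    yield sse("message_start", {"type": "message_start", "message": {
        "id": "msg_sketch",  # placeholder ID
        "type": "message", "role": "assistant", "model": model, "content": [],
        "stop_reason": None, "usage": {"input_tokens": 0, "output_tokens": 0}}})
    yield sse("content_block_start", {"type": "content_block_start", "index": 0,
                                      "content_block": {"type": "text", "text": ""}})
    stop_reason, output_tokens = "end_turn", 0
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        text = chunk.get("message", {}).get("content", "")
        if text:
            yield sse("content_block_delta", {"type": "content_block_delta", "index": 0,
                                              "delta": {"type": "text_delta", "text": text}})
        if chunk.get("done"):
            if chunk.get("done_reason") == "length":
                stop_reason = "max_tokens"
            output_tokens = chunk.get("eval_count", 0)
            break
    yield sse("content_block_stop", {"type": "content_block_stop", "index": 0})
    yield sse("message_delta", {"type": "message_delta",
                                "delta": {"stop_reason": stop_reason, "stop_sequence": None},
                                "usage": {"output_tokens": output_tokens}})
    yield sse("message_stop", {"type": "message_stop"})
```

Writing the translator as a generator means each upstream chunk can be forwarded to the client as soon as it arrives, rather than after the whole response has been buffered.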
Prism can start automatically when you log in to Windows. Toggle this from the admin UI (Proxy tab → Start at Login), or manage the registry entry described below by hand.
The auto-start feature uses the Windows Registry (HKCU\Software\Microsoft\Windows\CurrentVersion\Run) to launch the Prism executable at login. No admin rights required.
The following features are not supported by upstream providers and are handled gracefully:
- Anthropic: `count_tokens`, `tool_choice`, `metadata`, prompt caching, batches, PDF, URL images
- OpenAI Chat inbound: `/v1/models` returns a static list from config (not proxied), `parallel_tool_calls`, `logprobs`, `seed`, `user`
- OpenAI Responses inbound: `previous_response_id` (conversation continuity), `store`, built-in tools (web search, file search, code interpreter) are filtered out for Ollama upstreams
```powershell
go-winres make --in resource.rc --out resource.syso; go build -ldflags="-H windowsgui" -o prism.exe .
```

The `-H windowsgui` flag hides the console window and enables system tray integration.

To run in console mode (for debugging), build without the flag:

```powershell
go build -o prism.exe .
./prism.exe --serve
```

```powershell
# 1. Start Prism
./prism.exe

# 2. Test Anthropic endpoint
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/messages" -Method POST `
  -ContentType "application/json" `
  -Headers @{"x-api-key"="prism"} `
  -Body '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'

# 3. Test OpenAI Chat Completions endpoint
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/chat/completions" -Method POST `
  -ContentType "application/json" `
  -Headers @{"Authorization"="Bearer prism"} `
  -Body '{"model":"glm-5.1:cloud","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'

# 4. Test OpenAI Responses API endpoint
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/responses" -Method POST `
  -ContentType "application/json" `
  -Headers @{"Authorization"="Bearer prism"} `
  -Body '{"model":"glm-5.1:cloud","input":"hi"}'

# 5. Test model listing
Invoke-RestMethod -Uri "http://127.0.0.1:11434/v1/models" -Headers @{"Authorization"="Bearer prism"}

# 6. Test admin UI
Invoke-RestMethod -Uri "http://127.0.0.1:8765/admin/status"
```

Prism: translate, proxy, stream.
