Use ChatGPT / Codex OAuth as a local OpenAI-compatible API server.
- **OpenAI & Anthropic compatible** — `POST /v1/chat/completions` and `POST /v1/messages` endpoints
- **Claude Code ready** — use Codex models directly from the Claude Code CLI
- **Streaming** — full SSE streaming for both OpenAI and Anthropic protocols
- **Tool calling** — function calls, tool results, and parallel tool calls
- **Image support** — generation, inspection, and base64 image passthrough (including tool result images)
- **Reasoning** — configurable reasoning effort with streaming thinking content
- **Codex features** — `prompt_cache_key`, `previous_response_id`, subagent headers, remote compaction
- **Codex config aware** — reads `CODEX_HOME`/`~/.codex/config.toml` for model and context-window settings
- **Token estimate & compaction helpers** — Anthropic-compatible `/v1/messages/count_tokens` and `/v1/messages/compact`
- **Auto auth** — reads `~/.codex/auth.json` and auto-refreshes OAuth tokens
- **3 implementations** — Python, TypeScript (npm), and Rust — identical behavior
Runs a lightweight HTTP server on localhost that translates standard OpenAI API calls into authenticated requests against the ChatGPT / Codex backend using your existing ~/.codex/auth.json OAuth credentials.
Python, Rust, and TypeScript (npm) implementations are provided — identical functionality, same endpoints, same behavior.
Install the official Codex CLI and log in so that `~/.codex/auth.json` exists:

```bash
npm install -g @openai/codex
codex login
```

The server reads that file to obtain and refresh ChatGPT OAuth tokens automatically.
Install from PyPI:

```bash
pip install codex-as-api
codex-as-api
```

Or with uv:

```bash
uv pip install codex-as-api
codex-as-api
```

Or from source:

```bash
git clone https://github.com/Eunho-J/codex-as-api.git
cd codex-as-api
pip install -e ".[server]"
codex-as-api
```

For the Rust implementation, build and run from source:

```bash
cd rust
cargo build --release
./target/release/codex-as-api
```

For the TypeScript implementation, install from npm and run:

```bash
npm install -g codex-as-api
codex-as-api
```

Or use npx without installing:

```bash
npx codex-as-api
```

Or from source:

```bash
cd ts
npm install
npm run build
node dist/cli.js
```

It can also be used as a library:
```typescript
import { ChatGPTOAuthProvider, createApp } from "codex-as-api";

// Use the provider directly
const provider = new ChatGPTOAuthProvider({ model: "gpt-5.5" });
const response = await provider.chat([
  { role: "system", content: "You are helpful." },
  { role: "user", content: "Hello!" },
]);
console.log(response.content);

// Or create an Express app
const app = createApp();
app.listen(18080);
```

All versions bind to `127.0.0.1:18080` (localhost only) by default.
Environment variables (Python, Rust, and TypeScript):

| Variable | Default | Description |
|---|---|---|
| `CODEX_AS_API_HOST` | `127.0.0.1` | Bind address |
| `CODEX_AS_API_PORT` | `18080` | Listen port |
| `CODEX_AS_API_MODEL` | `~/.codex/config.toml` model, else `gpt-5.5` | Model identifier passed to the Codex backend |
| `CODEX_AS_API_AUTH_PATH` | `~/.codex/auth.json` | Path to the OAuth credentials file |
| `CODEX_HOME` | `~/.codex` | Codex home directory used for `auth.json` and `config.toml` discovery |
The server also reads root-level Codex CLI settings from `~/.codex/config.toml`:

```toml
model = "gpt-5.5"
model_context_window = 200000
model_auto_compact_token_limit = 160000
```

`CODEX_AS_API_MODEL` overrides the Codex config model. The context settings are exposed via `/health` and returned in Anthropic token-count responses.
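To make the precedence concrete, here is a minimal sketch of the documented lookup order (illustrative only; `resolve_model` is a hypothetical helper, not the server's actual code):

```python
# Sketch of the documented model-resolution order:
# CODEX_AS_API_MODEL env var > ~/.codex/config.toml "model" > "gpt-5.5".
import os
import tomllib
from pathlib import Path

def resolve_model() -> str:
    override = os.environ.get("CODEX_AS_API_MODEL")
    if override:
        return override
    codex_home = Path(os.environ.get("CODEX_HOME", str(Path.home() / ".codex")))
    config_path = codex_home / "config.toml"
    if config_path.is_file():
        with config_path.open("rb") as fh:
            model = tomllib.load(fh).get("model")
        if model:
            return model
    return "gpt-5.5"  # documented fallback default
```

Available models: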
| Model | Description |
|---|---|
| `gpt-5.5` | Frontier model for complex coding, research, and real-world work |
| `gpt-5.4` | Strong model for everyday coding |
| `gpt-5.4-mini` | Small, fast, and cost-efficient model for simpler coding tasks |
| `gpt-5.3-codex` | Coding-optimized model |
| `gpt-5.3-codex-spark` | Ultra-fast coding model |
| `gpt-5.2` | Previous generation model |
To use a different port:

```bash
CODEX_AS_API_PORT=9000 codex-as-api
```

To expose on all interfaces (e.g. for remote access):

```bash
CODEX_AS_API_HOST=0.0.0.0 codex-as-api
```

Standard OpenAI chat completions at `POST /v1/chat/completions`. Supports streaming (`stream: true`) and non-streaming.
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ]
  }'
```

Streaming:
```bash
curl -N http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "stream": true
  }'
```

With tools:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You have access to tools."},
      {"role": "user", "content": "What is the weather in Seoul?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ]
  }'
```

Anthropic Messages API compatible endpoint at `POST /v1/messages`. Supports streaming (`stream: true`) and non-streaming. The client's model name is reflected in responses, but the server always uses the configured `CODEX_AS_API_MODEL` for the backend call.
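For example, with the Anthropic Python SDK pointed at the local server (a minimal sketch; the API key is a placeholder because authentication happens via Codex OAuth):

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:18080",
    api_key="unused",  # ignored by the server; Codex OAuth handles auth
)

response = client.messages.create(
    model="claude-sonnet-4-6",  # echoed back; the backend call uses CODEX_AS_API_MODEL
    max_tokens=200,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)
```

The equivalent raw request: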
```bash
curl http://localhost:18080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: unused" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Streaming:
```bash
curl -N http://localhost:18080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: unused" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "stream": true,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Anthropic-compatible token counting helper at `POST /v1/messages/count_tokens`. Codex OAuth does not expose a count-only endpoint equivalent to Anthropic's native API, so this route returns a conservative local estimate plus the configured context-window metadata. The estimate uses UTF-8 byte length as an upper bound for GPT/Codex BPE text tokens, then adds protocol overhead for roles, message boundaries, tools, raw request metadata, and images.
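As a rough illustration of that heuristic (a sketch of the documented idea, not the server's exact accounting; the overhead constant is invented for the example):

```python
# Illustrative upper-bound estimate: UTF-8 byte length for text content
# (byte count >= BPE token count), plus a flat per-message allowance for
# roles and boundaries. The real server also accounts for tools, request
# metadata, and images.
PER_MESSAGE_OVERHEAD = 8  # hypothetical constant

def estimate_input_tokens(messages: list[dict]) -> int:
    total = 0
    for message in messages:
        content = message.get("content", "")
        if isinstance(content, str):
            total += len(content.encode("utf-8"))
        total += PER_MESSAGE_OVERHEAD
    return total
```

Example request: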
```bash
curl http://localhost:18080/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "x-api-key: unused" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

`POST /v1/messages/compact` is an Anthropic-compatible alias for remote conversation compaction. It accepts Anthropic Messages-shaped bodies and returns compacted checkpoint content.
Generate images at `POST /v1/images/generations` via the Codex image generation tool.
```bash
curl http://localhost:18080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "prompt": "a futuristic city at sunset",
    "size": "1024x1024"
  }'
```

Inspect images with a text prompt at `POST /v1/inspect` (custom endpoint).
```bash
curl http://localhost:18080/v1/inspect \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Describe what you see",
    "images": [{"image_url": "data:image/png;base64,iVBORw0KGgo..."}]
  }'
```

Compact a conversation into a checkpoint for continuation at `POST /v1/compact` (custom endpoint); `/v1/messages/compact` provides the Anthropic-compatible alias.
```bash
curl http://localhost:18080/v1/compact \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize our conversation so far."},
      {"role": "assistant", "content": "We discussed the project architecture."}
    ]
  }'
```

Health check at `GET /health`. Returns auth availability, configured model, Codex config path, and context-window settings.

```bash
curl http://localhost:18080/health
# {"status":"ok","auth_available":true,"model":"gpt-5.5","codex_config_path":"/Users/me/.codex/config.toml","context_window":200000,"auto_compact_token_limit":160000}
```

The following features are extensions beyond the standard OpenAI API, designed for Codex CLI compatibility.
`prompt_cache_key` enables prefix-cache stickiness on the Codex backend. When multiple requests share the same `prompt_cache_key`, the backend can reuse cached KV computations for the shared prefix, reducing latency and cost.

When to use: set a stable key per conversation or session. All turns within the same session should share one key, as in the sketch below.

Important: do not use `usage.prompt_tokens_details.cached_tokens` (or `usage.input_tokens_details.cached_tokens`) as a prompt or context-management signal. This server passes the Codex backend usage payload through when it is available, and current Codex OAuth responses may report `cached_tokens: 0` even when `prompt_cache_key` is used. Treat `prompt_cache_key` as a backend cache-affinity hint, not as a guarantee that cache-hit accounting will be exposed through the API response.
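For example, with the OpenAI Python SDK (a sketch; `extra_body` is how the SDK passes this non-standard field, and the same mechanism works for the other Codex extensions below):

```python
# Sketch: reuse one prompt_cache_key across every turn of a session so the
# backend can reuse the cached prefix for the shared conversation history.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="unused")
session_key = "session-abc-123"  # stable for the whole conversation

history = [{"role": "system", "content": "You are a helpful assistant."}]
for user_turn in ["Hello", "What did I just say?"]:
    history.append({"role": "user", "content": user_turn})
    response = client.chat.completions.create(
        model="gpt-5.5",
        messages=history,
        extra_body={"prompt_cache_key": session_key},
    )
    history.append(
        {"role": "assistant", "content": response.choices[0].message.content}
    )
```

The same field in a raw request: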
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "prompt_cache_key": "session-abc-123"
  }'
```

`reasoning_effort` controls how much compute the model spends on reasoning. Valid values: `none`, `minimal`, `low`, `medium`, `high`, `xhigh`.
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "Solve this step by step."},
      {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
    "reasoning_effort": "high"
  }'
```

`previous_response_id` chains responses together on the backend. Pass the response ID from a previous turn to maintain server-side conversation state.
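A sketch of the chaining pattern with the OpenAI Python SDK, assuming the server surfaces the backend response ID as the completion's `id` field (verify against your own responses before relying on this):

```python
# Sketch: pass the previous turn's response ID so the backend keeps
# server-side conversation state between calls.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="unused")

first = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Pick a random city."}],
)
second = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Continue from where we left off."}],
    extra_body={"previous_response_id": first.id},  # assumes id is the backend response ID
)
print(second.choices[0].message.content)
```

The raw equivalent: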
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Continue from where we left off."}
    ],
    "previous_response_id": "resp_abc123"
  }'
```

`subagent` identifies the request as coming from a specific subagent type. Values used by Codex CLI: `review`, `compact`, `memory_consolidation`, `collab_spawn`. It can be passed as a body field or HTTP header:
```bash
# As a body field
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "system", "content": "Review this code."}, {"role": "user", "content": "..."}],
    "subagent": "review"
  }'

# As an HTTP header
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-openai-subagent: review" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "system", "content": "Review this code."}, {"role": "user", "content": "..."}]
  }'
```

`x-openai-memgen-request` flags the request as a memory generation/consolidation request. It can be passed as a body field (bool) or HTTP header (`"true"`/`"false"`):
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-openai-memgen-request: true" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "system", "content": "Consolidate memories."}, {"role": "user", "content": "..."}]
  }'
```

Point the base URL to your local server:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="unused",
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    extra_body={"prompt_cache_key": "my-session"},
)
print(response.choices[0].message.content)
```

Or with the TypeScript SDK:

```typescript
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "http://localhost:18080/v1",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
});
console.log(response.choices[0].message.content);
```

Streaming with curl:

```bash
curl -N http://localhost:18080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a joke."}
],
"stream": true,
"prompt_cache_key": "joke-session"
}'The /v1/messages endpoint is compatible with Claude Code. Claude Code can send its normal Anthropic model names; responses preserve the client-supplied model name, while backend Codex requests use CODEX_AS_API_MODEL or the model from ~/.codex/config.toml.
```bash
# Minimal setup
ANTHROPIC_BASE_URL=http://localhost:18080 \
ANTHROPIC_API_KEY=unused \
claude
```

```bash
# Optional: force the backend Codex model for all Claude Code requests
CODEX_AS_API_MODEL=gpt-5.5 codex-as-api

# In another shell
ANTHROPIC_BASE_URL=http://localhost:18080 \
ANTHROPIC_API_KEY=unused \
claude
```

Request flow:

```
Client (OpenAI SDK / curl)
        |
        v
HTTP Server (FastAPI / Axum / Express)
        |
        +---> ChatGPTOAuthProvider
                  |
                  +---> ~/.codex/auth.json (OAuth tokens, auto-refresh)
                  +---> https://chatgpt.com/backend-api/codex/responses
```
The provider handles:

- Token loading and automatic refresh on 401 (see the sketch after this list)
- OpenAI Responses API over SSE
- `prompt_cache_key` passthrough for prefix-cache stickiness
- Reasoning content streaming (`reasoning_content`, `reasoning`)
- Tool call streaming
- Codex-specific headers (`x-openai-subagent`, `x-openai-memgen-request`)
- `previous_response_id` for response chaining
- Image generation and inspection
- Remote conversation compaction
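A minimal sketch of that token-handling flow, assuming the Codex CLI `auth.json` layout (`{"tokens": {"access_token": ...}}`); the real providers also refresh expired OAuth tokens, which is omitted here:

```python
# Sketch: load the ChatGPT OAuth access token from ~/.codex/auth.json,
# attach it as a Bearer token, and retry once on 401 after a reload.
import json
from pathlib import Path

import httpx

AUTH_PATH = Path.home() / ".codex" / "auth.json"
BACKEND_URL = "https://chatgpt.com/backend-api/codex/responses"

def load_access_token() -> str:
    data = json.loads(AUTH_PATH.read_text())
    return data["tokens"]["access_token"]  # assumed auth.json layout

def post_with_retry(payload: dict) -> httpx.Response:
    headers = {"Authorization": f"Bearer {load_access_token()}"}
    resp = httpx.post(BACKEND_URL, json=payload, headers=headers)
    if resp.status_code == 401:
        # The real provider refreshes the OAuth token here; this sketch
        # just re-reads the file and retries once.
        headers = {"Authorization": f"Bearer {load_access_token()}"}
        resp = httpx.post(BACKEND_URL, json=payload, headers=headers)
    return resp
```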
pip install -e ".[dev,server]"
pip install httpx
pytest tests/ -vcd rust
cargo testcd ts
npm install
npm test- Stop forwarding client
max_tokensas Codexmax_output_tokens, restoring Claude Code compatibility with the Codex OAuth backend. - Add Python, TypeScript, and Rust regression tests for the provider payload.

- Restore immediate Anthropic streaming so Claude Code receives events without waiting for the backend response to finish.
- Use conservative local token estimates for `/v1/messages/count_tokens`; Codex OAuth has no count-only backend endpoint.
- Keep real final streaming usage metadata in `message_delta`.

- Attempted real backend token counting for `/v1/messages/count_tokens` with `max_output_tokens: 0`; this is superseded by v0.3.2 because Codex OAuth rejects count-only requests.
- Forward converted Anthropic tools, tool choice, stop sequences, and thinking/reasoning settings during token-count requests.
- Propagate cumulative Anthropic streaming usage, including cache accounting, server tool use, and service tier metadata when available.
- Pass `max_output_tokens` through provider requests across Python, TypeScript, and Rust.

- Read Codex CLI config from `CODEX_HOME`/`~/.codex/config.toml` across Python, TypeScript, and Rust.
- Use the configured Codex backend model while preserving Anthropic client model names in `/v1/messages` responses.
- Expose `context_window` and `auto_compact_token_limit` through `/health` and `/v1/messages/count_tokens`.
- Add Anthropic-compatible `/v1/messages/count_tokens` and `/v1/messages/compact`.
- Map context-window failures to Anthropic-style `400 invalid_request_error` responses and stream error events.
Apache License 2.0 — derived from OpenAI Codex CLI (Apache-2.0, Copyright 2025 OpenAI).