codex-as-api

Use ChatGPT / Codex OAuth as a local OpenAI-compatible API server.

Features

OpenAI & Anthropic compatible — POST /v1/chat/completions and POST /v1/messages endpoints
Claude Code ready — use Codex models directly from Claude Code CLI
Streaming — full SSE streaming for both OpenAI and Anthropic protocols
Tool calling — function calls, tool results, and parallel tool calls
Image support — generation, inspection, and base64 image passthrough (including tool result images)
Reasoning — configurable reasoning effort with streaming thinking content
Codex features — prompt_cache_key, previous_response_id, subagent headers, remote compaction
Codex config aware — reads CODEX_HOME / ~/.codex/config.toml for model and context-window settings
Token estimate & compaction helpers — Anthropic-compatible /v1/messages/count_tokens and /v1/messages/compact
Auto auth — reads ~/.codex/auth.json and auto-refreshes OAuth tokens
3 implementations — Python, TypeScript (npm), and Rust — identical behavior

What it does

Runs a lightweight HTTP server on localhost that translates standard OpenAI API calls into authenticated requests against the ChatGPT / Codex backend using your existing ~/.codex/auth.json OAuth credentials.

Python, Rust, and TypeScript (npm) implementations are provided — identical functionality, same endpoints, same behavior.

Prerequisites

Install the official Codex CLI and log in so that ~/.codex/auth.json exists:

npm install -g @openai/codex
codex login

The server reads that file to obtain and refresh ChatGPT OAuth tokens automatically.

Install & Run

Python

Install from PyPI:

pip install codex-as-api
codex-as-api

Or with uv:

uv pip install codex-as-api
codex-as-api

Or from source:

git clone https://github.com/Eunho-J/codex-as-api.git
cd codex-as-api
pip install -e ".[server]"
codex-as-api

Rust

cd rust
cargo build --release
./target/release/codex-as-api

TypeScript (npm)

Install from npm and run:

npm install -g codex-as-api
codex-as-api

Or use npx without installing:

npx codex-as-api

Or from source:

cd ts
npm install
npm run build
node dist/cli.js

Can also be used as a library:

import { ChatGPTOAuthProvider, createApp } from "codex-as-api";

// Use the provider directly
const provider = new ChatGPTOAuthProvider({ model: "gpt-5.5" });
const response = await provider.chat(
  [
    { role: "system", content: "You are helpful." },
    { role: "user", content: "Hello!" },
  ],
);
console.log(response.content);

// Or create an Express app
const app = createApp();
app.listen(18080);

All versions bind to 127.0.0.1:18080 (localhost only) by default.

Configuration

Environment variables (Python, Rust, and TypeScript):

Variable	Default	Description
`CODEX_AS_API_HOST`	`127.0.0.1`	Bind address
`CODEX_AS_API_PORT`	`18080`	Listen port
`CODEX_AS_API_MODEL`	`~/.codex/config.toml` `model`, else `gpt-5.5`	Model identifier passed to Codex backend
`CODEX_AS_API_AUTH_PATH`	`~/.codex/auth.json`	Path to OAuth credentials file
`CODEX_HOME`	`~/.codex`	Codex home directory used for `auth.json` and `config.toml` discovery

The server also reads root-level Codex CLI settings from ~/.codex/config.toml:

model = "gpt-5.5"
model_context_window = 200000
model_auto_compact_token_limit = 160000

CODEX_AS_API_MODEL overrides the Codex config model. The context settings are exposed from /health and returned by Anthropic token-count responses.

Supported Models

Model	Description
`gpt-5.5`	Frontier model for complex coding, research, and real-world work
`gpt-5.4`	Strong model for everyday coding
`gpt-5.4-mini`	Small, fast, and cost-efficient model for simpler coding tasks
`gpt-5.3-codex`	Coding-optimized model
`gpt-5.3-codex-spark`	Ultra-fast coding model
`gpt-5.2`	Previous generation model

To use a different port:

CODEX_AS_API_PORT=9000 codex-as-api

To expose on all interfaces (e.g. for remote access):

CODEX_AS_API_HOST=0.0.0.0 codex-as-api

API Endpoints

`POST /v1/chat/completions`

Standard OpenAI chat completions. Supports streaming (stream: true) and non-streaming.

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ]
  }'

Streaming:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "stream": true
  }'

With tools:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You have access to tools."},
      {"role": "user", "content": "What is the weather in Seoul?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ]
  }'

`POST /v1/messages`

Anthropic Messages API compatible endpoint. Supports streaming (stream: true) and non-streaming. The client's model name is reflected in responses, but the server always uses the configured CODEX_AS_API_MODEL for the backend call.

curl http://localhost:18080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: unused" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Streaming:

curl -N http://localhost:18080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: unused" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "stream": true,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

`POST /v1/messages/count_tokens`

Anthropic-compatible token counting helper. Codex OAuth does not expose a count-only endpoint equivalent to Anthropic's native API, so this route returns a conservative local estimate plus the configured context-window metadata. The estimate uses UTF-8 byte length as an upper bound for GPT/Codex BPE text tokens, then adds protocol overhead for roles, message boundaries, tools, raw request metadata, and images.

curl http://localhost:18080/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "x-api-key: unused" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

`POST /v1/messages/compact`

Anthropic-compatible alias for remote conversation compaction. Accepts Anthropic Messages-shaped bodies and returns compacted checkpoint content.

`POST /v1/images/generations`

Generate images via the Codex image generation tool.

curl http://localhost:18080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "prompt": "a futuristic city at sunset",
    "size": "1024x1024"
  }'

`POST /v1/inspect`

Inspect images with a text prompt (custom endpoint).

curl http://localhost:18080/v1/inspect \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Describe what you see",
    "images": [{"image_url": "data:image/png;base64,iVBORw0KGgo..."}]
  }'

`POST /v1/compact`

Compact a conversation into a checkpoint for continuation (custom endpoint). /v1/messages/compact provides the Anthropic-compatible alias.

curl http://localhost:18080/v1/compact \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize our conversation so far."},
      {"role": "assistant", "content": "We discussed the project architecture."}
    ]
  }'

`GET /health`

Health check. Returns auth availability, configured model, Codex config path, and context-window settings.

curl http://localhost:18080/health
# {"status":"ok","auth_available":true,"model":"gpt-5.5","codex_config_path":"/Users/me/.codex/config.toml","context_window":200000,"auto_compact_token_limit":160000}

Codex-Specific Features

These features are extensions beyond the standard OpenAI API, designed for Codex CLI compatibility.

`prompt_cache_key`

Enables prefix-cache stickiness on the Codex backend. When multiple requests share the same prompt_cache_key, the backend can reuse cached KV computations for the shared prefix, reducing latency and cost.

When to use: Set a stable key per conversation or session. All turns within the same session should share one key.

Important: Do not use usage.prompt_tokens_details.cached_tokens (or usage.input_tokens_details.cached_tokens) as a prompt or context-management signal. This server passes through the Codex backend usage payload when it is available, and current Codex OAuth responses may report cached_tokens: 0 even when prompt_cache_key is used. Treat prompt_cache_key as a backend cache-affinity hint, not as a guarantee that cache-hit accounting will be exposed through the API response.

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "prompt_cache_key": "session-abc-123"
  }'

`reasoning_effort`

Controls how much compute the model spends on reasoning. Valid values: none, minimal, low, medium, high, xhigh.

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "Solve this step by step."},
      {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
    "reasoning_effort": "high"
  }'

`previous_response_id`

Chains responses together on the backend. Pass the response ID from a previous turn to maintain server-side conversation state.

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Continue from where we left off."}
    ],
    "previous_response_id": "resp_abc123"
  }'

`subagent` / `x-openai-subagent`

Identifies the request as coming from a specific subagent type. Values used by Codex CLI: review, compact, memory_consolidation, collab_spawn.

Can be passed as a body field or HTTP header:

# As body field
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "system", "content": "Review this code."}, {"role": "user", "content": "..."}],
    "subagent": "review"
  }'

# As HTTP header
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-openai-subagent: review" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "system", "content": "Review this code."}, {"role": "user", "content": "..."}]
  }'

`memgen_request` / `x-openai-memgen-request`

Flags the request as a memory generation/consolidation request. Can be passed as a body field (bool) or HTTP header ("true"/"false"):

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-openai-memgen-request: true" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "system", "content": "Consolidate memories."}, {"role": "user", "content": "..."}]
  }'

Using with OpenAI SDKs

Point the base URL to your local server:

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="unused",
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    extra_body={"prompt_cache_key": "my-session"},
)
print(response.choices[0].message.content)

Node.js (openai SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:18080/v1",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
});
console.log(response.choices[0].message.content);

curl (streaming)

curl -N http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me a joke."}
    ],
    "stream": true,
    "prompt_cache_key": "joke-session"
  }'

Using with Claude Code

The /v1/messages endpoint is compatible with Claude Code. Claude Code can send its normal Anthropic model names; responses preserve the client-supplied model name, while backend Codex requests use CODEX_AS_API_MODEL or the model from ~/.codex/config.toml.

# Minimal setup
ANTHROPIC_BASE_URL=http://localhost:18080 \
ANTHROPIC_API_KEY=unused \
claude

# Optional: force the backend Codex model for all Claude Code requests
CODEX_AS_API_MODEL=gpt-5.5 codex-as-api

# In another shell
ANTHROPIC_BASE_URL=http://localhost:18080 \
ANTHROPIC_API_KEY=unused \
claude

Architecture

Client (OpenAI SDK / curl)
    |
    v
HTTP Server (FastAPI / Axum / Express)
    |
    +---> ChatGPTOAuthProvider
            |
            +---> ~/.codex/auth.json (OAuth tokens, auto-refresh)
            +---> https://chatgpt.com/backend-api/codex/responses

The provider handles:

Token loading and automatic refresh on 401
OpenAI Responses API over SSE
prompt_cache_key passthrough for prefix-cache stickiness
Reasoning content streaming (reasoning_content, reasoning)
Tool call streaming
Codex-specific headers (x-openai-subagent, x-openai-memgen-request)
previous_response_id for response chaining
Image generation and inspection
Remote conversation compaction

Tests

Python

pip install -e ".[dev,server]"
pip install httpx
pytest tests/ -v

Rust

cd rust
cargo test

TypeScript

cd ts
npm install
npm test

Release Notes

v0.3.3

Stop forwarding client max_tokens as Codex max_output_tokens, restoring Claude Code compatibility with the Codex OAuth backend.
Add Python, TypeScript, and Rust regression tests for the provider payload.

v0.3.2

Restore immediate Anthropic streaming so Claude Code receives events without waiting for the backend response to finish.
Use conservative local token estimates for /v1/messages/count_tokens; Codex OAuth has no count-only backend endpoint.
Keep real final streaming usage metadata in message_delta.

v0.3.1

Attempted real backend token counting for /v1/messages/count_tokens with max_output_tokens: 0; this is superseded by v0.3.2 because Codex OAuth rejects count-only requests.
Forward converted Anthropic tools, tool choice, stop sequences, and thinking/reasoning settings during token-count requests.
Propagate cumulative Anthropic streaming usage, including cache accounting, server tool use, and service tier metadata when available.
Pass max_output_tokens through provider requests across Python, TypeScript, and Rust.

v0.3.0

Read Codex CLI config from CODEX_HOME / ~/.codex/config.toml across Python, TypeScript, and Rust.
Use the configured Codex backend model while preserving Anthropic client model names in /v1/messages responses.
Expose context_window and auto_compact_token_limit through /health and /v1/messages/count_tokens.
Add Anthropic-compatible /v1/messages/count_tokens and /v1/messages/compact.
Map context-window failures to Anthropic-style 400 invalid_request_error responses and stream error events.

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
rust		rust
src/codex_as_api		src/codex_as_api
tests		tests
ts		ts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
RELEASES.md		RELEASES.md
pyproject.toml		pyproject.toml
requirements-server.txt		requirements-server.txt
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

codex-as-api

Features

What it does

Prerequisites

Install & Run

Python

Rust

TypeScript (npm)

Configuration

Supported Models

API Endpoints

POST /v1/chat/completions

POST /v1/messages

POST /v1/messages/count_tokens

POST /v1/messages/compact

POST /v1/images/generations

POST /v1/inspect

POST /v1/compact

GET /health

Codex-Specific Features

prompt_cache_key

reasoning_effort

previous_response_id

subagent / x-openai-subagent

memgen_request / x-openai-memgen-request

Using with OpenAI SDKs

Python (openai SDK)

Node.js (openai SDK)

curl (streaming)

Using with Claude Code

Architecture

Tests

Python

Rust

TypeScript

Release Notes

v0.3.3

v0.3.2

v0.3.1

v0.3.0

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /v1/chat/completions`

`POST /v1/messages`

`POST /v1/messages/count_tokens`

`POST /v1/messages/compact`

`POST /v1/images/generations`

`POST /v1/inspect`

`POST /v1/compact`

`GET /health`

`prompt_cache_key`

`reasoning_effort`

`previous_response_id`

`subagent` / `x-openai-subagent`

`memgen_request` / `x-openai-memgen-request`

Packages