
codex-as-api


Use ChatGPT / Codex OAuth as a local OpenAI-compatible API server.

Features

  • OpenAI & Anthropic compatible — POST /v1/chat/completions and POST /v1/messages endpoints
  • Claude Code ready — use Codex models directly from Claude Code CLI
  • Streaming — full SSE streaming for both OpenAI and Anthropic protocols
  • Tool calling — function calls, tool results, and parallel tool calls
  • Image support — generation, inspection, and base64 image passthrough (including tool result images)
  • Reasoning — configurable reasoning effort with streaming thinking content
  • Codex features — prompt_cache_key, previous_response_id, subagent headers, remote compaction
  • Codex config aware — reads CODEX_HOME / ~/.codex/config.toml for model and context-window settings
  • Token estimate & compaction helpers — Anthropic-compatible /v1/messages/count_tokens and /v1/messages/compact
  • Auto auth — reads ~/.codex/auth.json and auto-refreshes OAuth tokens
  • 3 implementations — Python, TypeScript (npm), and Rust — identical behavior

What it does

Runs a lightweight HTTP server on localhost that translates standard OpenAI API calls into authenticated requests against the ChatGPT / Codex backend using your existing ~/.codex/auth.json OAuth credentials.

Python, Rust, and TypeScript (npm) implementations are provided — identical functionality, same endpoints, same behavior.

Prerequisites

Install the official Codex CLI and log in so that ~/.codex/auth.json exists:

npm install -g @openai/codex
codex login

The server reads that file to obtain and refresh ChatGPT OAuth tokens automatically.

Install & Run

Python

Install from PyPI:

pip install codex-as-api
codex-as-api

Or with uv:

uv pip install codex-as-api
codex-as-api

Or from source:

git clone https://github.com/Eunho-J/codex-as-api.git
cd codex-as-api
pip install -e ".[server]"
codex-as-api

Rust

cd rust
cargo build --release
./target/release/codex-as-api

TypeScript (npm)

Install from npm and run:

npm install -g codex-as-api
codex-as-api

Or use npx without installing:

npx codex-as-api

Or from source:

cd ts
npm install
npm run build
node dist/cli.js

The npm package can also be used as a library:

import { ChatGPTOAuthProvider, createApp } from "codex-as-api";

// Use the provider directly
const provider = new ChatGPTOAuthProvider({ model: "gpt-5.5" });
const response = await provider.chat(
  [
    { role: "system", content: "You are helpful." },
    { role: "user", content: "Hello!" },
  ],
);
console.log(response.content);

// Or create an Express app
const app = createApp();
app.listen(18080);

All versions bind to 127.0.0.1:18080 (localhost only) by default.

Configuration

Environment variables (Python, Rust, and TypeScript):

Variable               | Default                                        | Description
---------------------- | ---------------------------------------------- | -----------
CODEX_AS_API_HOST      | 127.0.0.1                                      | Bind address
CODEX_AS_API_PORT      | 18080                                          | Listen port
CODEX_AS_API_MODEL     | model from ~/.codex/config.toml, else gpt-5.5  | Model identifier passed to the Codex backend
CODEX_AS_API_AUTH_PATH | ~/.codex/auth.json                             | Path to the OAuth credentials file
CODEX_HOME             | ~/.codex                                       | Codex home directory used to discover auth.json and config.toml

The server also reads root-level Codex CLI settings from ~/.codex/config.toml:

model = "gpt-5.5"
model_context_window = 200000
model_auto_compact_token_limit = 160000

CODEX_AS_API_MODEL overrides the model from the Codex config. The context-window settings are exposed via /health and returned in Anthropic token-count responses.

Supported Models

Model               | Description
------------------- | -----------
gpt-5.5             | Frontier model for complex coding, research, and real-world work
gpt-5.4             | Strong model for everyday coding
gpt-5.4-mini        | Small, fast, and cost-efficient model for simpler coding tasks
gpt-5.3-codex       | Coding-optimized model
gpt-5.3-codex-spark | Ultra-fast coding model
gpt-5.2             | Previous generation model

To use a different port:

CODEX_AS_API_PORT=9000 codex-as-api

To expose on all interfaces (e.g. for remote access):

CODEX_AS_API_HOST=0.0.0.0 codex-as-api

API Endpoints

POST /v1/chat/completions

Standard OpenAI chat completions. Supports streaming (stream: true) and non-streaming.

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ]
  }'

Streaming:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "stream": true
  }'

With tools:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You have access to tools."},
      {"role": "user", "content": "What is the weather in Seoul?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ]
  }'
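
The request above only starts the loop; the sketch below completes a round trip with the Python SDK, assuming standard OpenAI tool-call semantics (the local get_weather result is a hypothetical stub):

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You have access to tools."},
    {"role": "user", "content": "What is the weather in Seoul?"},
]

first = client.chat.completions.create(model="gpt-5.5", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]  # assumes the model chose to call the tool
args = json.loads(call.function.arguments)

# Run the tool locally (stubbed result), then send it back for the final answer.
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps({"city": args["city"], "temp_c": 21}),
})

final = client.chat.completions.create(model="gpt-5.5", messages=messages, tools=tools)
print(final.choices[0].message.content)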

POST /v1/messages

Anthropic Messages API compatible endpoint. Supports streaming (stream: true) and non-streaming. The client's model name is reflected in responses, but the server always uses the configured CODEX_AS_API_MODEL for the backend call.

curl http://localhost:18080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: unused" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Streaming:

curl -N http://localhost:18080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: unused" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "stream": true,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
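
Since the endpoint follows the Anthropic Messages API, the official anthropic Python SDK should also work when pointed at the local server:

from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:18080", api_key="unused")

message = client.messages.create(
    model="claude-sonnet-4-6",  # echoed back; the backend call uses CODEX_AS_API_MODEL
    max_tokens=200,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)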

POST /v1/messages/count_tokens

Anthropic-compatible token counting helper. Codex OAuth does not expose a count-only endpoint equivalent to Anthropic's native API, so this route returns a conservative local estimate plus the configured context-window metadata. The estimate uses UTF-8 byte length as an upper bound for GPT/Codex BPE text tokens, then adds protocol overhead for roles, message boundaries, tools, raw request metadata, and images.

curl http://localhost:18080/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "x-api-key: unused" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
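
For intuition, the estimate behaves roughly like the sketch below: UTF-8 byte length as an upper bound for text tokens (a BPE token covers at least one byte), plus flat per-item overheads. The overhead constants here are illustrative, not the server's exact numbers.

def estimate_tokens(messages: list[dict], tools: list | None = None) -> int:
    """Conservative upper-bound token estimate (illustrative constants)."""
    PER_MESSAGE_OVERHEAD = 8   # role and message-boundary framing (assumed)
    PER_TOOL_OVERHEAD = 32     # tool schema framing (assumed)
    total = 0
    for msg in messages:
        content = msg.get("content") or ""
        if isinstance(content, str):
            # A BPE token spans at least one byte, so byte length >= token count.
            total += len(content.encode("utf-8"))
        total += PER_MESSAGE_OVERHEAD
    total += PER_TOOL_OVERHEAD * len(tools or [])
    return total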

POST /v1/messages/compact

Anthropic-compatible alias for remote conversation compaction. Accepts Anthropic Messages-shaped bodies and returns compacted checkpoint content.
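
A minimal Python sketch, assuming the same headers and Messages-shaped body as /v1/messages:

import httpx

resp = httpx.post(
    "http://localhost:18080/v1/messages/compact",
    headers={"x-api-key": "unused", "anthropic-version": "2023-06-01"},
    json={
        "model": "claude-sonnet-4-6",
        "max_tokens": 200,
        "messages": [
            {"role": "user", "content": "Summarize our conversation so far."},
            {"role": "assistant", "content": "We discussed the project architecture."},
        ],
    },
)
print(resp.json())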

POST /v1/images/generations

Generate images via the Codex image generation tool.

curl http://localhost:18080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "prompt": "a futuristic city at sunset",
    "size": "1024x1024"
  }'

POST /v1/inspect

Inspect images with a text prompt (custom endpoint).

curl http://localhost:18080/v1/inspect \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Describe what you see",
    "images": [{"image_url": "data:image/png;base64,iVBORw0KGgo..."}]
  }'
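
To build the base64 data URL from a local file, a small Python sketch (photo.png is a placeholder; the payload mirrors the curl example above):

import base64
import pathlib

import httpx

image_b64 = base64.b64encode(pathlib.Path("photo.png").read_bytes()).decode()

resp = httpx.post(
    "http://localhost:18080/v1/inspect",
    json={
        "prompt": "Describe what you see",
        "images": [{"image_url": f"data:image/png;base64,{image_b64}"}],
    },
)
print(resp.json())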

POST /v1/compact

Compact a conversation into a checkpoint for continuation (custom endpoint). /v1/messages/compact provides the Anthropic-compatible alias.

curl http://localhost:18080/v1/compact \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize our conversation so far."},
      {"role": "assistant", "content": "We discussed the project architecture."}
    ]
  }'

GET /health

Health check. Returns auth availability, configured model, Codex config path, and context-window settings.

curl http://localhost:18080/health
# {"status":"ok","auth_available":true,"model":"gpt-5.5","codex_config_path":"/Users/me/.codex/config.toml","context_window":200000,"auto_compact_token_limit":160000}

Codex-Specific Features

These features are extensions beyond the standard OpenAI API, designed for Codex CLI compatibility.

prompt_cache_key

Enables prefix-cache stickiness on the Codex backend. When multiple requests share the same prompt_cache_key, the backend can reuse cached KV computations for the shared prefix, reducing latency and cost.

When to use: Set a stable key per conversation or session. All turns within the same session should share one key.

Important: Do not use usage.prompt_tokens_details.cached_tokens (or usage.input_tokens_details.cached_tokens) as a prompt or context-management signal. This server passes through the Codex backend usage payload when it is available, and current Codex OAuth responses may report cached_tokens: 0 even when prompt_cache_key is used. Treat prompt_cache_key as a backend cache-affinity hint, not as a guarantee that cache-hit accounting will be exposed through the API response.

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "prompt_cache_key": "session-abc-123"
  }'

reasoning_effort

Controls how much compute the model spends on reasoning. Valid values: none, minimal, low, medium, high, xhigh.

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "Solve this step by step."},
      {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
    "reasoning_effort": "high"
  }'

previous_response_id

Chains responses together on the backend. Pass the response ID from a previous turn to maintain server-side conversation state.

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Continue from where we left off."}
    ],
    "previous_response_id": "resp_abc123"
  }'
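
A two-turn sketch with the Python SDK, assuming the completion's id field carries the backend response ID to chain on (verify against your server's actual responses):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="unused")

turn1 = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Pick a secret number."}],
)

# Chain the next turn to the previous backend response
# (assumption: the server surfaces that ID as turn1.id).
turn2 = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "What number did you pick?"}],
    extra_body={"previous_response_id": turn1.id},
)
print(turn2.choices[0].message.content)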

subagent / x-openai-subagent

Identifies the request as coming from a specific subagent type. Values used by Codex CLI: review, compact, memory_consolidation, collab_spawn.

Can be passed as a body field or HTTP header:

# As body field
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "system", "content": "Review this code."}, {"role": "user", "content": "..."}],
    "subagent": "review"
  }'

# As HTTP header
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-openai-subagent: review" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "system", "content": "Review this code."}, {"role": "user", "content": "..."}]
  }'
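
With the Python SDK the same header goes through extra_headers (extra_body works for the body-field form):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="unused")

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "Review this code."},
        {"role": "user", "content": "def add(a, b): return a - b"},
    ],
    extra_headers={"x-openai-subagent": "review"},
)
print(response.choices[0].message.content)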

memgen_request / x-openai-memgen-request

Flags the request as a memory generation/consolidation request. Can be passed as a body field (bool) or HTTP header ("true"/"false"):

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-openai-memgen-request: true" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "system", "content": "Consolidate memories."}, {"role": "user", "content": "..."}]
  }'

Using with OpenAI SDKs

Point the base URL to your local server:

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="unused",
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    extra_body={"prompt_cache_key": "my-session"},
)
print(response.choices[0].message.content)
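
Streaming works the same way with stream=True:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. a final usage frame) may carry no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()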

Node.js (openai SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:18080/v1",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
});
console.log(response.choices[0].message.content);

curl (streaming)

curl -N http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me a joke."}
    ],
    "stream": true,
    "prompt_cache_key": "joke-session"
  }'

Using with Claude Code

The /v1/messages endpoint is compatible with Claude Code. Claude Code can send its normal Anthropic model names; responses preserve the client-supplied model name, while backend Codex requests use CODEX_AS_API_MODEL or the model from ~/.codex/config.toml.

# Start the server (optionally force the backend Codex model for all Claude Code requests)
CODEX_AS_API_MODEL=gpt-5.5 codex-as-api

# In another shell, point Claude Code at the local server
ANTHROPIC_BASE_URL=http://localhost:18080 \
ANTHROPIC_API_KEY=unused \
claude

Architecture

Client (OpenAI SDK / curl)
    |
    v
HTTP Server (FastAPI / Axum / Express)
    |
    +---> ChatGPTOAuthProvider
            |
            +---> ~/.codex/auth.json (OAuth tokens, auto-refresh)
            +---> https://chatgpt.com/backend-api/codex/responses

The provider handles:

  • Token loading and automatic refresh on 401 (sketched after this list)
  • OpenAI Responses API over SSE
  • prompt_cache_key passthrough for prefix-cache stickiness
  • Reasoning content streaming (reasoning_content, reasoning)
  • Tool call streaming
  • Codex-specific headers (x-openai-subagent, x-openai-memgen-request)
  • previous_response_id for response chaining
  • Image generation and inspection
  • Remote conversation compaction
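
A simplified sketch of that refresh-on-401 flow (not the project's actual code; the auth.json layout in the comments and the refresh_tokens helper are assumptions to check against your own setup):

import json
import pathlib

import httpx

AUTH_PATH = pathlib.Path.home() / ".codex" / "auth.json"

def load_access_token() -> str:
    # auth.json is maintained by `codex login`; the "tokens.access_token"
    # layout here is illustrative, so inspect your own file to confirm.
    return json.loads(AUTH_PATH.read_text())["tokens"]["access_token"]

def refresh_tokens() -> str:
    # Hypothetical placeholder: the real provider exchanges the stored
    # refresh_token for fresh tokens and rewrites auth.json.
    raise NotImplementedError

def call_backend(payload: dict) -> httpx.Response:
    url = "https://chatgpt.com/backend-api/codex/responses"
    headers = {"Authorization": f"Bearer {load_access_token()}"}
    resp = httpx.post(url, headers=headers, json=payload)
    if resp.status_code == 401:  # expired token: refresh once and retry
        headers = {"Authorization": f"Bearer {refresh_tokens()}"}
        resp = httpx.post(url, headers=headers, json=payload)
    return resp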

Tests

Python

pip install -e ".[dev,server]"
pip install httpx
pytest tests/ -v

Rust

cd rust
cargo test

TypeScript

cd ts
npm install
npm test

Release Notes

v0.3.3

  • Stop forwarding client max_tokens as Codex max_output_tokens, restoring Claude Code compatibility with the Codex OAuth backend.
  • Add Python, TypeScript, and Rust regression tests for the provider payload.

v0.3.2

  • Restore immediate Anthropic streaming so Claude Code receives events without waiting for the backend response to finish.
  • Use conservative local token estimates for /v1/messages/count_tokens; Codex OAuth has no count-only backend endpoint.
  • Keep real final streaming usage metadata in message_delta.

v0.3.1

  • Attempted real backend token counting for /v1/messages/count_tokens with max_output_tokens: 0; this is superseded by v0.3.2 because Codex OAuth rejects count-only requests.
  • Forward converted Anthropic tools, tool choice, stop sequences, and thinking/reasoning settings during token-count requests.
  • Propagate cumulative Anthropic streaming usage, including cache accounting, server tool use, and service tier metadata when available.
  • Pass max_output_tokens through provider requests across Python, TypeScript, and Rust.

v0.3.0

  • Read Codex CLI config from CODEX_HOME / ~/.codex/config.toml across Python, TypeScript, and Rust.
  • Use the configured Codex backend model while preserving Anthropic client model names in /v1/messages responses.
  • Expose context_window and auto_compact_token_limit through /health and /v1/messages/count_tokens.
  • Add Anthropic-compatible /v1/messages/count_tokens and /v1/messages/compact.
  • Map context-window failures to Anthropic-style 400 invalid_request_error responses and stream error events.

License

Apache License 2.0 — derived from OpenAI Codex CLI (Apache-2.0, Copyright 2025 OpenAI).
