Skip to content

Add durable control-plane state storage#179

Open
jynbil1 wants to merge 1 commit into
XortexAI:mainfrom
jynbil1:codex-durable-control-plane-161-v2
Open

Add durable control-plane state storage#179
jynbil1 wants to merge 1 commit into
XortexAI:mainfrom
jynbil1:codex-durable-control-plane-161-v2

Conversation

@jynbil1
Copy link
Copy Markdown

@jynbil1 jynbil1 commented May 13, 2026

Summary

  • add a Mongo-backed control-plane store for single-use OAuth/MCP tokens, admin sessions, and shared rate-limit counters
  • extend API keys with scopes, expiry, org/project bindings, and production fail-fast behavior for durable storage
  • wire auth/admin/rate-limit paths to durable state and add focused regression coverage

Verification

  • uv run --extra dev python -m pytest tests/unit/test_control_plane_store.py tests/unit/test_database_stores.py tests/api/test_dependencies_and_routes.py
  • uv run --extra dev ruff check src/database/control_plane_store.py src/database/api_key_store.py src/api/dependencies.py src/api/routes/auth.py src/api/routes/admin.py src/api/routes/api_keys.py tests/unit/test_control_plane_store.py tests/unit/test_database_stores.py tests/api/test_dependencies_and_routes.py

/claim #161

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a ControlPlaneStore to manage short-lived state such as admin sessions, OAuth tokens, and rate limits using MongoDB, replacing several in-memory implementations. API keys are enhanced with support for scopes, expiration, and organization/project bindings, including validation logic in the request dependencies. The update also enforces durable storage requirements for production environments and includes comprehensive unit tests for the new functionality. I have no feedback to provide.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 23, 2026

Greptile Summary

This PR adds a MongoDB-backed ControlPlaneStore to durably persist single-use OAuth/MCP tokens, admin sessions, and rate-limit counters that were previously kept in-process memory, and extends API keys with scopes, expiry, and org/project bindings.

  • ControlPlaneStore correctly uses find_one_and_update (atomic) for single-use token consumption, and a TTL index for automatic expiry; admin session migration in admin.py is clean.
  • api_key_store.py's write-failure fallback silently drops scopes, expires_at, org_id, and project_id when recursing to in-memory storage, and the rate-limit MongoDB path uses a non-atomic find_one + update_one that concurrent requests can race past.
  • _mcp_temp_tokens and _oauth_auth_codes in-memory mirrors are now dead code; the expires_at lookup through them is indirect and fragile.
  • scopes are stored and returned in responses but never enforced in the auth pipeline.

Confidence Score: 3/5

Not safe to merge without addressing the rate-limit race condition and the fallback that silently drops key constraints.

The write-failure fallback in create_api_key creates keys without the requested scopes, expiry, or org/project binding — any caller relying on those constraints gets back an unconstrained key with no indication anything went wrong. The MongoDB rate-limiter's find_one + update_one pattern is non-atomic, so concurrent requests can both pass the limit check and overwrite each other's writes, allowing bursts well above the configured quota. Both defects are on paths that run in normal production operation.

src/database/api_key_store.py (fallback drops key constraints) and src/database/control_plane_store.py (non-atomic rate-limit path) need fixes before this ships.

Important Files Changed

Filename Overview
src/database/control_plane_store.py New MongoDB-backed store for single-use tokens, admin sessions, and rate limits; single-use token path is correctly atomic via find_one_and_update, but the rate-limit path's find_one + update_one is non-atomic and can be defeated under concurrency.
src/database/api_key_store.py Adds scopes, expiry, org/project bindings, expiry enforcement, and production fail-fast; the write-failure fallback incorrectly drops the new fields when recursing to in-memory storage.
src/api/dependencies.py Wires org/project context enforcement via _api_key_matches_request_context and replaces the in-process rate limiter with the durable store; enforcement logic is correct but scopes stored on keys are not checked here.
src/api/routes/auth.py Migrates MCP temp tokens and OAuth auth codes to the durable store; validation and single-use semantics are correct, but the local in-memory mirrors (_mcp_temp_tokens, _oauth_auth_codes) are now dead weight with a fragile expires_at lookup.
src/api/routes/admin.py Replaces in-process _admin_sessions dict with durable store; session creation, lookup, and deletion all look correct.
src/api/routes/api_keys.py Extends CRUD endpoints to expose and persist scopes, expiry, org/project bindings; plumbing is correct but the scopes field has no enforcement in the auth layer.
tests/unit/test_control_plane_store.py Covers single-use token single-use semantics, admin session lifecycle, and rate-limit shape; the async rate-limit test lacks a pytest.mark.asyncio decorator but the rest looks solid.
tests/unit/test_database_stores.py Adds coverage for scoped key creation, expiry validation, and production fail-fast on write errors.
tests/api/test_dependencies_and_routes.py New test for bound API key org/project context enforcement; covers both the rejection and acceptance paths.

Sequence Diagram

sequenceDiagram
    participant Client
    participant AuthRoute as auth.py
    participant ControlPlaneStore
    participant MongoDB
    participant APIKeyStore

    Note over AuthRoute,MongoDB: MCP Temp Token Flow
    Client->>AuthRoute: POST /auth/mcp-token (JWT)
    AuthRoute->>ControlPlaneStore: "create_single_use_token(mcp_temp_token, user_id, ttl=10m)"
    ControlPlaneStore->>MongoDB: insert_one(record with token_hash)
    MongoDB-->>ControlPlaneStore: ok
    ControlPlaneStore-->>AuthRoute: "{token, expires_at}"
    AuthRoute-->>Client: MCPTempTokenResponse

    Client->>AuthRoute: POST /auth/mcp-exchange (temp_token)
    AuthRoute->>ControlPlaneStore: consume_single_use_token(mcp_temp_token, token)
    ControlPlaneStore->>MongoDB: "find_one_and_update(exchanged=False to True)"
    MongoDB-->>ControlPlaneStore: record (atomic)
    ControlPlaneStore-->>AuthRoute: user_id
    AuthRoute->>APIKeyStore: create_api_key(user_id)
    APIKeyStore-->>AuthRoute: "{key, scopes, org_id}"
    AuthRoute-->>Client: MCPExchangeResponse(api_key)

    Note over Client,MongoDB: API Request with Bound Key
    Client->>AuthRoute: GET /... Bearer xmem_xxx + X-Org-Id header
    AuthRoute->>APIKeyStore: validate_api_key(token)
    APIKeyStore-->>AuthRoute: "key_doc {org_id, project_id, scopes}"
    AuthRoute->>AuthRoute: _api_key_matches_request_context(key_doc, request)
    alt context mismatch
        AuthRoute-->>Client: 403 Forbidden
    else context ok
        AuthRoute->>ControlPlaneStore: check_rate_limit(identity, max, window)
        ControlPlaneStore->>MongoDB: find_one + update_one (non-atomic)
        MongoDB-->>ControlPlaneStore: allowed/remaining
        ControlPlaneStore-->>AuthRoute: (True, remaining)
        AuthRoute-->>Client: 200 OK
    end
Loading

Comments Outside Diff (1)

  1. src/api/routes/api_keys.py, line 199-202 (link)

    P2 scopes field is stored and returned but never enforced

    API keys can now be created with a scopes list (e.g., ["memory:read"]), but nothing in the auth pipeline reads that list to gate which operations the key may perform. A key scoped to ["memory:read"] can still call write endpoints. If scopes are intended as a future addition, a comment noting the enforcement gap would help; if they are meant to be active now, a scope-check step needs to be added (e.g., in _api_key_matches_request_context or in a dedicated require_scope dependency).

    Fix in Cursor Fix in Codex Fix in Claude Code

Fix All in Cursor Fix All in Codex Fix All in Claude Code

Reviews (1): Last reviewed commit: "Add durable control-plane state" | Re-trigger Greptile

Comment on lines 207 to 208
self._in_memory = True
return self.create_api_key(user_id, name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 When MongoDB insert_one fails and the code falls back to in-memory storage, the recursive call passes only user_id and name, silently dropping scopes, expires_at, org_id, and project_id. Any caller who created a scoped or expiring key will get back an unscoped, non-expiring key with no error, breaking the constraint the caller intended to enforce.

Suggested change
self._in_memory = True
return self.create_api_key(user_id, name)
self._in_memory = True
return self.create_api_key(
user_id,
name,
scopes=scopes,
expires_at=expires_at,
org_id=org_id,
project_id=project_id,
)

Fix in Cursor Fix in Codex Fix in Claude Code

Comment on lines +218 to +244
record = self.rate_limits.find_one({"identity": identity}) or {}
hits = [hit for hit in record.get("hits", []) if hit > cutoff]
if len(hits) >= max_requests:
self.rate_limits.update_one(
{"identity": identity},
{
"$set": {
"hits": hits,
"expires_at": _now() + timedelta(seconds=window_seconds),
}
},
upsert=True,
)
return False, 0

hits.append(now)
self.rate_limits.update_one(
{"identity": identity},
{
"$set": {
"hits": hits,
"expires_at": _now() + timedelta(seconds=window_seconds),
}
},
upsert=True,
)
return True, max(max_requests - len(hits), 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Non-atomic read-modify-write in MongoDB rate limiter

check_rate_limit on the MongoDB path does a find_one then a separate update_one with $set. Two concurrent requests can both read the same hits array (say, [t1, t2], length 2 with max_requests=3), both see they are under the limit, both append their own timestamp, and both call update_one({$set: {hits: [t1, t2, now]}}). The second write overwrites the first, so the counter effectively only advances by 1 instead of 2 — allowing a burst of concurrent calls to exceed the configured limit. The old asyncio.Lock() path was safe for a single process; the durable path needs atomic operations (e.g., $push / $size) or findOneAndUpdate with an aggregation pipeline to avoid this window.

Fix in Cursor Fix in Codex Fix in Claude Code

Comment thread src/api/routes/auth.py
Comment on lines +52 to +53
_mcp_temp_tokens[created["token"]] = {"expires_at": created["expires_at"]}
return created["token"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Redundant in-memory dicts with fragile access pattern

_mcp_temp_tokens (and _oauth_auth_codes) now hold only a mirror of the expires_at returned by the store — they are never consulted for validation (the pop before consume_single_use_token is a no-op for correctness). The access _mcp_temp_tokens[temp_token]["expires_at"] on line 443 works only because both the store call and this access happen in the same synchronous request, but it is fragile: any future refactor that separates creation from the response lookup, or any multi-worker scenario where the store call and the response are on different workers, would produce a KeyError. The expires_at value is already available from created["expires_at"] returned by the store; use it directly and remove the dict entirely.

Fix in Cursor Fix in Codex Fix in Claude Code

Comment on lines +46 to +59
class ControlPlaneStore:
"""MongoDB-backed store for short-lived control-plane state."""

def __init__(self, uri: str = None, database: str = None) -> None:
self._uri = uri or settings.mongodb_uri
self._database = database or settings.mongodb_database
self._client = None
self._db = None
self.records = None
self.rate_limits = None
self._connected = False
self._in_memory = False
self._try_connect()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Three independent ControlPlaneStore instances, three connection pools

ControlPlaneStore() is instantiated at module level in auth.py, admin.py, and dependencies.py. Each instance calls MongoClient(...) in _try_connect, which opens its own connection pool. These instances are never shared, so the application holds three separate pools to the same database. Consider sharing a single instance (e.g., via a module-level singleton or a FastAPI app.state attribute) to avoid unnecessary resource overhead.

Fix in Cursor Fix in Codex Fix in Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant