
feat(fleet): add refresh endpoint #1128

Open
AnobleSCM wants to merge 8 commits into Soju06:main from AnobleSCM:feat/fleet-refresh-endpoint

Conversation

@AnobleSCM

Contributor

Summary

Add API-key-authenticated fleet summary and refresh endpoints so trusted local dashboards can distinguish a read of codex-lb state from an upstream usage-refresh attempt.

Type of change

  • feat: — new user-facing feature or capability
  • fix: — bug fix (no behavior change beyond the bug)
  • refactor: — internal refactor (no behavior change, no API change)
  • docs: — documentation only
  • chore: / ci: / build: — tooling, CI, packaging
  • test: — test-only change
  • Breaking change

Linked issue: N/A

OpenSpec

  • This PR includes / updates an OpenSpec change
  • Not applicable — bug fix that matches the existing spec
  • Not applicable — docs / CI / chore only
  • This PR touches a codex-faithful path (image pipeline, request/response shape, SSE framing, OAuth flow) and preserves upstream-equivalent behavior

Change directory: openspec/changes/add-fleet-refresh-endpoint/

Changes

  • Add GET /api/fleet/summary with a minimal non-sensitive account/window projection.
  • Add POST /api/fleet/refresh behind the existing always-required usage API key dependency.
  • Reuse UsageUpdater.refresh_accounts outside proxy request selection, skip paused/reauth/deactivated accounts, and invalidate relevant caches only when usage was written.
  • Add integration coverage for auth, response shape, sensitive-field omission, unsafe-state skipping, and route-local refresh behavior.
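
The eligibility and cache rules in the third bullet can be sketched as below. This is a minimal illustration under stated assumptions, not the codex-lb implementation: the `AccountStatus` values, the `Account` shape, and both helper names are hypothetical stand-ins chosen for the example.

```python
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum


class AccountStatus(str, Enum):
    # Hypothetical status values; the real codex-lb enum may differ.
    ACTIVE = "active"
    PAUSED = "paused"
    REAUTH_REQUIRED = "reauth_required"
    DEACTIVATED = "deactivated"


# States the refresh endpoint should never touch (per the PR description).
UNSAFE_STATUSES = frozenset(
    {AccountStatus.PAUSED, AccountStatus.REAUTH_REQUIRED, AccountStatus.DEACTIVATED}
)


@dataclass(frozen=True)
class Account:
    id: str
    status: AccountStatus


def split_refresh_candidates(accounts: list[Account]) -> tuple[list[Account], list[Account]]:
    """Return (eligible, skipped), preserving input order."""
    eligible = [a for a in accounts if a.status not in UNSAFE_STATUSES]
    skipped = [a for a in accounts if a.status in UNSAFE_STATUSES]
    return eligible, skipped


def finish_refresh(usage_written: bool, invalidate_caches: Callable[[], None]) -> bool:
    # Only pay for cache invalidation when the refresh actually wrote usage rows.
    if usage_written:
        invalidate_caches()
    return usage_written
```

Keeping the skip decision in a pure function makes the "unsafe-state skipping" test case trivial to express without a database.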

Test plan

uv run pytest tests/integration/test_fleet_summary_api.py -q
uv run ruff check .
uv run ruff format --check .
uv run ty check
git diff --check

Also attempted:

openspec validate --specs
uv run openspec validate --specs
uv run pytest

openspec is not installed in this local shell (command not found / failed spawn). The full uv run pytest run reached 4104 passed / 45 skipped, but three existing account-recovery / usage-updater tests outside this change failed; rerunning those exact tests in isolation reproduces the same failures.

Screenshots / output (optional)

N/A — backend/API change.

Checklist

  • Title is in Conventional Commits format (<type>(<scope>)?: <subject>).
  • Linked the related issue / discussion above.
  • Added or updated tests covering the change.
  • Ran the relevant local-ci subset locally.
  • If touching specs: openspec validate --specs passes and /opsx:verify is clean. Blocked locally because the OpenSpec CLI is unavailable.
  • CHANGELOG is not edited by hand.

@AnobleSCM

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77a27fa3c2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/modules/fleet/api.py Outdated
Comment on lines +46 to +50
usage_written = await UsageUpdater(
    usage_repo,
    context.repository,
    additional_usage_repo,
).refresh_accounts(eligible_accounts, latest_primary)

P1 Badge Use a background session for shielded refreshes

When POST /api/fleet/refresh is cancelled or times out while an upstream fetch is in flight, the per-account work that UsageUpdater.refresh_accounts() runs inside _USAGE_REFRESH_SINGLEFLIGHT.run() keeps going, because it executes in a task protected by asyncio.shield. The repos passed here all share the request-scoped context.session, so FastAPI can close or roll back that session while the shielded task is still using it. Under client disconnects this can lead to failed refreshes or to checked-out connections that are never returned to the pool. Create and own a background session inside the refresh task instead.
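
The hazard can be reproduced with plain asyncio. In this sketch `FakeSession` is a hypothetical stand-in for SQLAlchemy's `AsyncSession`, and `request_scoped_session()` stands in for the FastAPI dependency that closes the session when the handler unwinds; neither is the real codex-lb code.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in for an AsyncSession; records use after close."""

    def __init__(self) -> None:
        self.closed = False
        self.errors: list[str] = []

    async def write(self, row: str) -> None:
        if self.closed:
            self.errors.append(f"write after close: {row}")


@asynccontextmanager
async def request_scoped_session():
    # Mimics a request-scoped dependency: closed as soon as the handler unwinds.
    session = FakeSession()
    try:
        yield session
    finally:
        session.closed = True


async def slow_refresh(session: FakeSession, started: asyncio.Event) -> None:
    started.set()
    await asyncio.sleep(0.05)  # pretend the upstream fetch is in flight
    await session.write("usage row")


async def handler(holder: list[FakeSession], started: asyncio.Event) -> None:
    async with request_scoped_session() as session:
        holder.append(session)
        # shield keeps the inner task alive if this handler is cancelled...
        await asyncio.shield(slow_refresh(session, started))
    # ...but `async with` still closes the shared session on cancellation.


async def main() -> list[str]:
    holder: list[FakeSession] = []
    started = asyncio.Event()
    task = asyncio.create_task(handler(holder, started))
    await started.wait()
    task.cancel()  # client disconnects mid-refresh
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.1)  # let the shielded refresh finish
    return holder[0].errors
```

The shield keeps `slow_refresh` alive, but because the session's lifetime is tied to the handler rather than to the shielded task, the late write lands on a closed session.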

Comment thread app/modules/fleet/api.py Outdated
router = APIRouter(
    prefix="/api/fleet",
    tags=["fleet"],
    dependencies=[Security(validate_usage_api_key), Depends(set_dashboard_error_format)],

P1 Badge Preserve API-key scope in fleet handlers

For deployments that issue account-scoped API keys to clients, this router-level auth dependency only validates the token and discards the ApiKeyData, so both fleet handlers behave as if the key were unscoped: GET /api/fleet/summary lists every account and email, and POST /api/fleet/refresh refreshes every eligible account instead of only the assignedAccountIds. The existing API-key contract treats a non-empty assigned_account_ids as account scope, so these handlers should receive the key data and filter the account and usage queries before returning or refreshing.

@AnobleSCM

Copy link
Copy Markdown
Contributor Author

Addressed the two Codex P1 findings on new head 8fbb99a8:

  • POST /api/fleet/refresh now performs the refresh work inside its own get_background_session() instead of using the request-scoped session.
  • Fleet summary/refresh now receive the validated ApiKeyData and honor account_assignment_scope_enabled / assigned_account_ids for both returned accounts and refresh attempts.

Verification after the fix:

uv run pytest tests/integration/test_fleet_summary_api.py -q
uv run ruff check .
uv run ruff format --check .
uv run ty check
git diff --check

@codex review

@Soju06

Soju06 commented Jul 1, 2026

Owner

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 8fbb99a8d3

Comment thread app/modules/fleet/api.py Outdated
Comment on lines +39 to +40
accounts = await context.service.list_accounts(account_ids=_visible_account_ids(api_key))
return FleetSummaryResponse(accounts=build_fleet_account_summaries(accounts))

P1 Badge Honor API-key quota visibility for fleet summary

For API keys deliberately configured not to see upstream quota data, this new path bypasses the existing visibility controls: /v1/usage parses api_key.usage_sections and suppresses account_pool_usage when hide_upstream_quota_from_api_keys is enabled, but GET /api/fleet/summary returns per-account remaining percentages and reset times for the same key. When usageSections excludes account_pool_usage/upstream_limits or the global privacy toggle is on, callers can recover the hidden quota details by calling this endpoint instead, so the summary should apply the same gates or require a separate explicit permission.

@Soju06 Soju06 added the 🤖 codex: needs work [@codex review] raised an issue label Jul 1, 2026
@AnobleSCM

Contributor Author

Addressed the latest Codex P1 on new head e293ec25:

  • GET /api/fleet/summary now respects API-key quota visibility. If usage_sections does not include account_pool_usage, or hide_upstream_quota_from_api_keys is enabled, the endpoint keeps account identity/status visible but blanks quota-specific fields (remainingPercent, resets, window lengths, lastRefreshAt).
  • Added regression coverage for both per-key usage section hiding and the global privacy toggle.

Verification after the fix:

uv run pytest tests/integration/test_fleet_summary_api.py -q
uv run ruff check .
uv run ruff format --check .
uv run ty check
git diff --check

@codex review

@Soju06 Soju06 removed the 🤖 codex: needs work [@codex review] raised an issue label Jul 1, 2026
@Soju06

Soju06 commented Jul 1, 2026

Owner

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: e293ec254d

Comment thread app/modules/fleet/api.py Outdated
Comment on lines +77 to +81
usage_written = await UsageUpdater(
    usage_repo,
    accounts_repo,
    additional_usage_repo,
).refresh_accounts(eligible_accounts, latest_primary)

P2 Badge Run fleet refresh with a task-owned DB session

When a client disconnects/cancels POST /api/fleet/refresh while an upstream usage fetch is in flight, UsageUpdater.refresh_accounts runs the per-account refresh through the module-level singleflight using asyncio.shield, so that task can continue after this endpoint unwinds. Because the repositories passed here all share the surrounding get_background_session() context, its finalizer rolls back/closes the same AsyncSession while the surviving task may still write usage rows, token metadata, or status updates through it, leading to lost refresh work or broken/leaked pooled connections. Use a session owned by the shielded refresh task or ensure the singleflight is awaited/cancelled before leaving the context.

@AnobleSCM

Contributor Author

Addressed the current-head Codex P2 on new head db1d9326:

  • POST /api/fleet/refresh now runs the refresh body in a task-owned helper and shields that helper from caller cancellation, so its background DB session remains open until shielded usage work and cache invalidation finish.
  • Added a cancellation regression proving the session does not exit while shielded refresh work is still running after request cancellation.
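
The shape of this fix can be sketched in asyncio: the background session is opened inside the shielded coroutine, so only that coroutine decides when it closes. `background_session()` is a hypothetical stand-in for `get_background_session()`, and the sleep stands in for the upstream fetch and usage writes. The module-level task set follows the asyncio documentation's advice to keep a strong reference to tasks that may outlive their caller.

```python
import asyncio
from contextlib import asynccontextmanager

# Strong references to in-flight refresh tasks so they are not garbage-collected.
_background_tasks: set[asyncio.Task] = set()


@asynccontextmanager
async def background_session(log: list[str]):
    # Hypothetical stand-in for get_background_session().
    log.append("session open")
    try:
        yield object()
    finally:
        log.append("session closed")


async def refresh_body(log: list[str], started: asyncio.Event) -> None:
    # The session lives entirely inside this coroutine, so it closes only
    # after the usage writes and cache invalidation complete.
    async with background_session(log):
        started.set()
        await asyncio.sleep(0.05)  # upstream fetch + usage writes
        log.append("usage written")
        log.append("caches invalidated")


async def fleet_refresh(log: list[str], started: asyncio.Event) -> None:
    inner = asyncio.create_task(refresh_body(log, started))
    _background_tasks.add(inner)
    inner.add_done_callback(_background_tasks.discard)
    # Caller cancellation cancels only this await, not the shielded helper.
    await asyncio.shield(inner)


async def main() -> list[str]:
    log: list[str] = []
    started = asyncio.Event()
    task = asyncio.create_task(fleet_refresh(log, started))
    await started.wait()
    task.cancel()  # simulate the client going away
    try:
        await task
    except asyncio.CancelledError:
        log.append("caller cancelled")
    await asyncio.sleep(0.1)  # give the shielded helper time to finish
    return log
```

The caller observes cancellation immediately, while the session closes only after the helper's own work is done.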

Verification after the fix:

uv run pytest tests/integration/test_fleet_summary_api.py -q
uv run ruff check .
uv run ruff format --check .
uv run ty check
git diff --check

@codex review

@Soju06 Soju06 added the 🤖 codex: needs work [@codex review] raised an issue label Jul 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: db1d932613

Comment thread app/modules/fleet/mappers.py Outdated
account_id=account.account_id,
display_name=account.display_name,
email=account.email,
status=account.status,

P2 Badge Hide usage-derived status when quota visibility is disabled

When include_usage is false because the API key lacks account_pool_usage or hide_upstream_quota_from_api_keys is enabled, this still forwards AccountSummary.status. That status is computed by the accounts summary mapper from the latest usage rows, so an otherwise active account with primary or secondary usage at 100% is serialized as rate_limited/quota_exceeded while the window fields are nulled, leaking the hidden quota state through the new fleet endpoint.

@Soju06 Soju06 removed the 🤖 codex: needs work [@codex review] raised an issue label Jul 1, 2026
@AnobleSCM

Contributor Author

Addressed both current-head Codex P2 findings on new head 50f70404:

  • Fleet summary no longer leaks quota-derived status when usage visibility is disabled. Hidden-usage responses now use persisted account status from the account rows and keep usage windows/source timestamps blank.
  • Fleet refresh now opts into a UsageUpdater.refresh_accounts(..., own_singleflight_sessions=True) path where the singleflight refresh factory owns its own background DB session. This keeps the DB session alive with the shielded task itself, even if the caller awaiting the refresh is cancelled.
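
The status fallback in the first bullet can be illustrated as follows. `derive_status` is a hypothetical approximation of the accounts summary mapper (in particular, mapping a full primary window to `rate_limited` and a full secondary window to `quota_exceeded` is an assumption); the point is only that the usage-derived status must not be emitted when usage is hidden.

```python
def derive_status(persisted: str, primary_used_percent: float, secondary_used_percent: float) -> str:
    # Hypothetical approximation of a usage-derived status: an otherwise
    # active account is reported as limited once a window is exhausted.
    if persisted != "active":
        return persisted
    if primary_used_percent >= 100:
        return "rate_limited"
    if secondary_used_percent >= 100:
        return "quota_exceeded"
    return persisted


def fleet_status(persisted: str, usage_derived: str, *, include_usage: bool) -> str:
    # With quota visibility off, "rate_limited"/"quota_exceeded" would reveal
    # that a hidden window hit 100%, so fall back to the stored account status.
    return usage_derived if include_usage else persisted
```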

Regression coverage added/updated:

  • Hidden-usage fleet summary uses an exhausted usage row but still returns persisted active status.
  • Fleet endpoint asserts it opts into owned singleflight sessions.
  • Usage updater unit test proves the owned-session singleflight task keeps its session open after caller cancellation.

Verification after the fix:

uv run pytest tests/integration/test_fleet_summary_api.py -q
uv run pytest tests/unit/test_usage_updater.py::test_refresh_accounts_owned_singleflight_session_outlives_caller_cancellation -q
uv run ruff check .
uv run ruff format --check .
uv run ty check
git diff --check

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 50f704041c

Comment on lines 309 to 312
result = await _USAGE_REFRESH_SINGLEFLIGHT.run(
account.id,
lambda account=account: self._refresh_account_if_stale(
account,
usage_account_id=account.chatgpt_account_id,
interval_seconds=interval,
),
refresh_factory,
)

P2 Badge Avoid joining non-owned singleflight refreshes

When a fleet refresh overlaps an already-running refresh for the same account from paths that call refresh_accounts without own_singleflight_sessions, this call keeps the default join_existing=True, so the owned-session factory above is never executed. If that original waiter is cancelled and its DB context exits, the fleet refresh is still awaiting work tied to the original session, which reintroduces the closed-session failure this flag is meant to avoid; the owned path should avoid joining non-owned in-flight work or key it separately.

@Soju06 Soju06 added the 🤖 codex: needs work [@codex review] raised an issue label Jul 1, 2026
@AnobleSCM

Copy link
Copy Markdown
Contributor Author

Addressed the current-head Codex P2 on new head 94270c59:

  • Owned-session usage refreshes now use a separate singleflight key (owned-session:<account_id>), so fleet refresh cannot join an in-flight non-owned refresh that may be tied to a request-scoped/session-closing caller.
  • Expanded the owned-session unit regression to start a non-owned singleflight for the same account first, then prove the owned-session refresh still starts its own factory and keeps its own session open after caller cancellation.
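
Why joining matters, and how a separate key avoids it, can be shown with a toy singleflight. `SingleFlight` below is a hypothetical minimal implementation, not codex-lb's `_USAGE_REFRESH_SINGLEFLIGHT`: callers that share a key share one task, so a second caller's factory (and any session it would have opened) never runs.

```python
import asyncio
from collections.abc import Awaitable, Callable, Hashable
from typing import Any


class SingleFlight:
    """Toy singleflight: concurrent callers with the same key share one task."""

    def __init__(self) -> None:
        self._inflight: dict[Hashable, asyncio.Task[Any]] = {}

    async def run(self, key: Hashable, factory: Callable[[], Awaitable[Any]]) -> Any:
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.ensure_future(factory())
            self._inflight[key] = task
            task.add_done_callback(lambda _t, k=key: self._inflight.pop(k, None))
        # Shield so one waiter's cancellation does not cancel the shared work.
        return await asyncio.shield(task)


async def main() -> tuple[str, str, list[str]]:
    flight = SingleFlight()
    ran: list[str] = []

    async def refresh(label: str) -> str:
        ran.append(label)
        await asyncio.sleep(0.01)  # pretend upstream fetch
        return label

    # A request-scoped (non-owned) refresh for acct-1 is already in flight.
    first = asyncio.create_task(flight.run("acct-1", lambda: refresh("request-scoped")))
    await asyncio.sleep(0)  # let it register its task
    # Same key: the owned caller's factory never runs; it joins the first task.
    joined = await flight.run("acct-1", lambda: refresh("owned"))
    # Separate key (the owned-session:<account_id> scheme): its own factory runs.
    separate = await flight.run("owned-session:acct-1", lambda: refresh("owned"))
    await first
    return joined, separate, ran
```

With the shared key, the owned caller gets the request-scoped result and inherits that caller's session lifetime; with the separate key, it runs its own factory.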

Verification after the fix:

uv run pytest tests/unit/test_usage_updater.py::test_refresh_accounts_owned_singleflight_session_outlives_caller_cancellation -q
uv run pytest tests/integration/test_fleet_summary_api.py -q
uv run ruff check .
uv run ruff format --check .
uv run ty check
git diff --check

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 94270c59dd

Comment thread app/modules/usage/updater.py Outdated


def _usage_refresh_singleflight_key(account_id: str, *, own_singleflight_session: bool = False) -> str:
    return f"owned-session:{account_id}" if own_singleflight_session else account_id

P2 Badge Use collision-free keys for owned refreshes

If an imported or upstream account id starts with owned-session:, the normal singleflight key for that account is identical to the owned-session key for the account id after the prefix (for example, the normal key for account owned-session:abc equals the owned-session key for account abc). In that case a fleet refresh can still join an unrelated in-flight request-scoped refresh, leaving the intended account unrefreshed and reintroducing the session-lifetime coupling this change is trying to avoid. Use a structured key, or namespace both modes unambiguously.

@AnobleSCM

Copy link
Copy Markdown
Contributor Author

Addressed the current-head Codex P2 on new head 104aa134:

  • Owned-session refresh keys are now structured tuples (("owned-session", account_id)) instead of strings, so they cannot collide with normal account-id keys even when an account id starts with owned-session:.
  • Expanded the owned-session regression to keep both a normal non-owned key and a prefixed non-owned key in flight before proving the owned-session refresh starts its own factory/session.

Verification after the fix:

uv run pytest tests/unit/test_usage_updater.py::test_refresh_accounts_owned_singleflight_session_outlives_caller_cancellation -q
uv run pytest tests/integration/test_fleet_summary_api.py -q
uv run ruff check .
uv run ruff format --check .
uv run ty check
git diff --check

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

Reviewed commit: 104aa13496

@AnobleSCM AnobleSCM marked this pull request as ready for review July 2, 2026 01:54

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 104aa13496

Comment thread app/modules/fleet/api.py


async def _can_view_fleet_usage(api_key: ApiKeyData) -> bool:
    if "account_pool_usage" not in _usage_sections(api_key.usage_sections):

P2 Badge Gate reset/window fields on upstream_limits permission

Even after the quota-visibility fix, this predicate still treats account_pool_usage alone as permission to return the entire fleet summary. /v1/usage gates upstream_limits separately, but the fleet mapper returns per-account resetAt and windowMinutes, so a key configured with usage_sections="account_pool_usage" and no upstream_limits can recover reset/window quota details that /v1/usage hides. Require upstream_limits for those fields, or null them independently.

Labels

🤖 codex: needs work [@codex review] raised an issue

2 participants