fix(litellm): codex-cli is a sidecar (containers:), not an init container #366
Merged
ChatGPT binds OAuth sessions to the IP/device that completed device-auth.
Tokens from a laptop device-auth that are then uploaded to the cluster get
`token_invalidated` on first use (confirmed via direct probe today). The
cloud-codex-cody pod already proved the fix: running device-auth FROM
inside the cluster produces sessions that ChatGPT keeps alive across
cluster usage.
This brings that fix one layer up so Nova/Pixel and any future codex
agent share the same auth surface (LiteLLM), rather than each agent
needing its own pod with its own codex login.
What changes:
1. New `codex-cli` sidecar on the LiteLLM pod. Installs the codex CLI on
first boot, then idles. Operator runs:

    kubectl exec -it deploy/litellm -c codex-cli -- /scripts/auth-login.sh 1

completes device-auth in the browser, and the resulting auth.json lands
on the shared chatgpt-auth volume as /chatgpt-auth/auth-1.json. Repeat
for accounts 2 and 3.
2. codex-auth-rotator now PREFERS pod-side /chatgpt-auth/auth-N.json
files when present, and only falls back to env-var-fed tokens
(laptop-bound, dead) when no pod-side files exist. Keeps the existing
rotation cadence + 429 signal handling unchanged.
3. The chatgpt-auth volume can be a PVC (values:
`litellm.chatgptAuth.persistence.enabled`). Required for the
cluster-bound flow: emptyDir loses tokens on every pod restart. Dev opts
in; defaults stay off so OSS deployments aren't surprised.
4. Adds `strategy.type: Recreate` to the LiteLLM Deployment when the
PVC is enabled. A ReadWriteOnce (single-writer) volume can't hand off
cleanly under RollingUpdate, which starts the new pod before the old
one has released the volume.
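The preference order in item 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the rotator's actual code: only the `/chatgpt-auth/auth-N.json` layout comes from this PR; the `CHATGPT_ACCESS_TOKEN_` env-var prefix and all function names are hypothetical.

```python
from pathlib import Path

# Only the directory and the auth-N.json naming come from this PR; the env-var
# prefix is a hypothetical stand-in for however tokens are fed in via env.
AUTH_DIR = Path("/chatgpt-auth")
ENV_PREFIX = "CHATGPT_ACCESS_TOKEN_"


def pod_side_candidates(auth_dir: Path) -> list[Path]:
    """Return auth-N.json files from the shared volume, ordered by account N."""
    numbered = []
    for path in auth_dir.glob("auth-*.json"):
        suffix = path.stem.removeprefix("auth-")
        if suffix.isdigit():  # skip anything that isn't auth-<number>.json
            numbered.append((int(suffix), path))
    return [path for _, path in sorted(numbered)]


def env_candidates(environ: dict[str, str]) -> list[str]:
    """Return non-empty env-fed tokens, in lexicographic order of their keys."""
    keys = sorted(k for k, v in environ.items() if k.startswith(ENV_PREFIX) and v)
    return [environ[k] for k in keys]


def select_candidates(auth_dir: Path, environ: dict[str, str]) -> tuple[str, list]:
    """Prefer pod-side files; fall back to env tokens only when none exist."""
    pod_files = pod_side_candidates(auth_dir) if auth_dir.is_dir() else []
    if pod_files:
        return "pod", pod_files
    return "env", env_candidates(environ)
```

In this sketch a single completed in-cluster device-auth (one `auth-N.json` on the volume) is enough to stop the env-fed, laptop-bound tokens from being used; a caller would pass something like `dict(os.environ)` as `environ`.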
After this lands + operator does device-auth × N from inside the
codex-cli sidecar, all dev LLM traffic (openclaw moltbot via LiteLLM
chatgpt/ bridge, and any future codex CLI agents pointed at LiteLLM)
uses cluster-bound sessions. Nova/Pixel come back to life without
another laptop device-auth round.
Follow-up: switch cloud-codex-cody to point codex CLI at LiteLLM
(model_provider override + virtual key) so Cody routes through the
same auth surface instead of needing her own /state/.codex/auth.json.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iner

In PR #365 the codex-cli block landed in the initContainers list by mistake, which left the pod stuck at Init:1/2: codex-cli's sleep loop never exits, so the pod never progressed to Running, and helm-upgrade hit the 10m timeout.

Move codex-cli into containers: (sidecar position, after the codex-auth-rotator). The LiteLLM main container can now reach Ready while codex-cli idles in parallel, waiting for an operator exec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
samxu01 added a commit that referenced this pull request May 15, 2026
…ners (#367)

PR #366 was supposed to move codex-cli from initContainers to containers, but the awk move only added the new entry and didn't delete the old one. Result: spec had codex-cli in both lists, k8s rejected with "spec.template.spec.initContainers[1].name: Duplicate value". Strip the leftover container + its comment block. Final structure: containers = [litellm, codex-auth-rotator, codex-cli], initContainers = [codex-auth-seed].

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
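Kubernetes requires container names to be unique across `containers` and `initContainers`, which is what produced the `Duplicate value` rejection. A pre-deploy lint over the rendered pod spec could catch both this and the original #365 mistake before helm-upgrade runs. Minimal sketch, assuming the spec has already been parsed into a dict (for example from `helm template` output); `lint_pod_spec` and the `LONG_RUNNING` set are hypothetical, not part of the chart.

```python
from collections import Counter

# Hypothetical: containers that run forever and so must never be init
# containers (an init container has to exit before the pod can proceed).
LONG_RUNNING = {"litellm", "codex-auth-rotator", "codex-cli"}


def lint_pod_spec(spec: dict) -> list[str]:
    """Return human-readable problems found in a pod spec dict."""
    problems = []
    init_names = [c["name"] for c in spec.get("initContainers", [])]
    main_names = [c["name"] for c in spec.get("containers", [])]

    # Names must be unique across both lists (the #366 regression).
    counts = Counter(init_names + main_names)
    for name, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate container name: {name}")

    # A never-exiting container in initContainers blocks the pod (the #365 bug).
    for name in init_names:
        if name in LONG_RUNNING:
            problems.append(f"long-running container in initContainers: {name}")
    return problems
```

Against the final structure listed above (`containers = [litellm, codex-auth-rotator, codex-cli]`, `initContainers = [codex-auth-seed]`) this sketch reports no problems.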
samxu01 added a commit that referenced this pull request May 15, 2026
* fix(litellm): remove duplicate codex-cli container left in initContainers

PR #366 was supposed to move codex-cli from initContainers to containers, but the awk move only added the new entry and didn't delete the old one. Result: spec had codex-cli in both lists, k8s rejected with "spec.template.spec.initContainers[1].name: Duplicate value". Strip the leftover container + its comment block. Final structure: containers = [litellm, codex-auth-rotator, codex-cli], initContainers = [codex-auth-seed].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(rotator): read codex CLI's nested auth.json shape

codex CLI 0.125 writes auth.json as {tokens: {access_token, refresh_token, id_token}, auth_mode, OPENAI_API_KEY, last_refresh}. The rotator's _read_pod_auth_file only looked at top-level access_token, missed the nested shape, returned None, and fell back to env-var candidates (which are the laptop-bound dead tokens we're trying to escape). Read either shape: nested wins, flat is the legacy rotator-written fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
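The shape-tolerant read from the second fix can be sketched in Python as follows. It is a sketch, not the rotator's actual `_read_pod_auth_file`: the name `read_access_token` is hypothetical, and only the two JSON shapes and the nested-wins rule come from the commit message.

```python
import json
from pathlib import Path
from typing import Optional


def read_access_token(path: Path) -> Optional[str]:
    """Return the access token from either auth.json shape, or None.

    Nested (codex CLI 0.125): {"tokens": {"access_token": ...}, ...}
    Flat (legacy, rotator-written): {"access_token": ...}
    Nested wins when both are present.
    """
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):  # missing/unreadable file, bad JSON or encoding
        return None
    if not isinstance(data, dict):
        return None

    # Check the nested codex CLI shape first so it takes precedence.
    tokens = data.get("tokens")
    if isinstance(tokens, dict):
        token = tokens.get("access_token")
        if isinstance(token, str) and token:
            return token

    # Fall back to the flat, legacy rotator-written shape.
    token = data.get("access_token")
    if isinstance(token, str) and token:
        return token
    return None
```

Returning `None` only for files that are genuinely unreadable or token-less keeps the existing fall-through to env-var candidates for those cases, instead of taking it for every valid codex-written file as the nested-shape bug did.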
Summary
Hotfix for #365. The codex-cli block landed in `initContainers:` instead of `containers:`. Its sleep loop never exits, so the pod was stuck at `Init:1/2` for 15 min and helm-upgrade timed out.
Move codex-cli into the sidecar position so the LiteLLM main container can reach Ready while codex-cli idles in parallel.
Test plan
🤖 Generated with Claude Code