Summary
The /codex, /review, and /ship skill wrappers currently inspect $_CODEX_EXIT only for the timeout sentinel (124). Other non-zero exit codes — including the exit 2 parse error that #1196 (and PR #1209) addresses — are silently swallowed. From the orchestrating agent's perspective, codex appears to produce no output, indistinguishable from a silent stall.
This is a defensive layer separate from the canonical fix in #1209: even after #1209 lands, future Codex CLI breaking changes that change accepted arg combinations will keep masquerading as silent stalls until/unless someone manually inspects the captured stderr.
Repro
The same fixture used in #1196 — pre-#1209 invocation:
codex review "<prompt>" --base main -c 'model_reasoning_effort="xhigh"' --enable web_search_cached < /dev/null
Returns exit 2 with parse error in milliseconds. The skill wrapper currently:
- Captures stderr to a temp file
- Checks
$_CODEX_EXIT for 124
- If not 124, treats as success and returns no diagnostic
- Calling agent sees empty output, hangs waiting for content that will never come
The stderr file at %LOCALAPPDATA%\Temp\codex-err-*.txt contains the actual parse error the entire time, but the wrapper doesn't surface it.
Empirical context
In our project (claude-teams-bot) we've hit this ~5 times across PRs over the past week. Multiple pr-reviewer agents and one build agent independently misdiagnosed this as a Claude Code stall, an Anthropic API issue, or a model rate-limit before someone finally inspected the temp stderr file and saw the parse error. Each misdiagnosis cost ~30-60 minutes of agent time.
CLAUDE.md #54f in our cross-project policy file now reads: "Don't trust silent-stall framing. When an agent reports 'codex stalled silently for N minutes,' the likely answer is NOT a model API stall — it's exit non-zero with empty stdout, swallowed by a wrapper that only handles the timeout exit code."
That's a workaround at the agent-discipline layer, but the wrapper-level fix is cheaper and more general.
Proposed fix
Append to the wrapper's exit-code check:
if [ "$_CODEX_EXIT" = "124" ]; then
echo "Codex timed out after the configured deadline."
elif [ "$_CODEX_EXIT" != "0" ]; then
echo "Codex exited with code $_CODEX_EXIT. Stderr (first 20 lines):"
head -20 "$TMPERR" 2>/dev/null || true
fi
This makes the failure mode self-describing. Even if a future Codex CLI release introduces another arg-shape break, the surfaced stderr tells the calling agent exactly what's wrong.
Stretch — capability probe
Longer-term, replacing the version-string regex in gstack-codex-probe with a capability probe (run codex review --help, parse for known-good arg combinations, dynamically construct the invocation) would make the skill self-correct when Codex changes its arg contract. That's a larger refactor — not blocking on this issue.
Distinct concern. #1209 fixes the specific parse-error trigger by removing the broken arg shape. This issue addresses the wrapper's blindness to ALL non-zero exits, including the next time something similar happens. Both fixes are complementary.
Empirical link
Local workaround documented at docs/operations/codex-cli-workaround.md in our project — comments on #1209 explain the relationship to our empirical recurrence.
Summary
The
/codex,/review, and/shipskill wrappers currently inspect$_CODEX_EXITonly for the timeout sentinel (124). Other non-zero exit codes — including theexit 2parse error that #1196 (and PR #1209) addresses — are silently swallowed. From the orchestrating agent's perspective, codex appears to produce no output, indistinguishable from a silent stall.This is a defensive layer separate from the canonical fix in #1209: even after #1209 lands, future Codex CLI breaking changes that change accepted arg combinations will keep masquerading as silent stalls until/unless someone manually inspects the captured stderr.
Repro
The same fixture used in #1196 — pre-#1209 invocation:
Returns exit 2 with parse error in milliseconds. The skill wrapper currently:
$_CODEX_EXITfor124The stderr file at
%LOCALAPPDATA%\Temp\codex-err-*.txtcontains the actual parse error the entire time, but the wrapper doesn't surface it.Empirical context
In our project (claude-teams-bot) we've hit this ~5 times across PRs over the past week. Multiple pr-reviewer agents and one build agent independently misdiagnosed this as a Claude Code stall, an Anthropic API issue, or a model rate-limit before someone finally inspected the temp stderr file and saw the parse error. Each misdiagnosis cost ~30-60 minutes of agent time.
CLAUDE.md #54f in our cross-project policy file now reads: "Don't trust silent-stall framing. When an agent reports 'codex stalled silently for N minutes,' the likely answer is NOT a model API stall — it's exit non-zero with empty stdout, swallowed by a wrapper that only handles the timeout exit code."
That's a workaround at the agent-discipline layer, but the wrapper-level fix is cheaper and more general.
Proposed fix
Append to the wrapper's exit-code check:
This makes the failure mode self-describing. Even if a future Codex CLI release introduces another arg-shape break, the surfaced stderr tells the calling agent exactly what's wrong.
Stretch — capability probe
Longer-term, replacing the version-string regex in
gstack-codex-probewith a capability probe (runcodex review --help, parse for known-good arg combinations, dynamically construct the invocation) would make the skill self-correct when Codex changes its arg contract. That's a larger refactor — not blocking on this issue.Relation to #1196 / #1209
Distinct concern. #1209 fixes the specific parse-error trigger by removing the broken arg shape. This issue addresses the wrapper's blindness to ALL non-zero exits, including the next time something similar happens. Both fixes are complementary.
Empirical link
Local workaround documented at
docs/operations/codex-cli-workaround.mdin our project — comments on #1209 explain the relationship to our empirical recurrence.