Skip to content

[codex skill] Surface non-zero exit codes from codex wrappers (currently silent on parse errors) #1327

@habassa5

Description

@habassa5

Summary

The /codex, /review, and /ship skill wrappers currently inspect $_CODEX_EXIT only for the timeout sentinel (124). Other non-zero exit codes — including the exit 2 parse error that #1196 (and PR #1209) addresses — are silently swallowed. From the orchestrating agent's perspective, codex appears to produce no output, indistinguishable from a silent stall.

This is a defensive layer separate from the canonical fix in #1209: even after #1209 lands, future Codex CLI breaking changes that change accepted arg combinations will keep masquerading as silent stalls until/unless someone manually inspects the captured stderr.

Repro

The same fixture used in #1196 — pre-#1209 invocation:

codex review "<prompt>" --base main -c 'model_reasoning_effort="xhigh"' --enable web_search_cached < /dev/null

Returns exit 2 with parse error in milliseconds. The skill wrapper currently:

  1. Captures stderr to a temp file
  2. Checks $_CODEX_EXIT for 124
  3. If not 124, treats as success and returns no diagnostic
  4. Calling agent sees empty output, hangs waiting for content that will never come

The stderr file at %LOCALAPPDATA%\Temp\codex-err-*.txt contains the actual parse error the entire time, but the wrapper doesn't surface it.

Empirical context

In our project (claude-teams-bot) we've hit this ~5 times across PRs over the past week. Multiple pr-reviewer agents and one build agent independently misdiagnosed this as a Claude Code stall, an Anthropic API issue, or a model rate-limit before someone finally inspected the temp stderr file and saw the parse error. Each misdiagnosis cost ~30-60 minutes of agent time.

CLAUDE.md #54f in our cross-project policy file now reads: "Don't trust silent-stall framing. When an agent reports 'codex stalled silently for N minutes,' the likely answer is NOT a model API stall — it's exit non-zero with empty stdout, swallowed by a wrapper that only handles the timeout exit code."

That's a workaround at the agent-discipline layer, but the wrapper-level fix is cheaper and more general.

Proposed fix

Append to the wrapper's exit-code check:

if [ "$_CODEX_EXIT" = "124" ]; then
  echo "Codex timed out after the configured deadline."
elif [ "$_CODEX_EXIT" != "0" ]; then
  echo "Codex exited with code $_CODEX_EXIT. Stderr (first 20 lines):"
  head -20 "$TMPERR" 2>/dev/null || true
fi

This makes the failure mode self-describing. Even if a future Codex CLI release introduces another arg-shape break, the surfaced stderr tells the calling agent exactly what's wrong.

Stretch — capability probe

Longer-term, replacing the version-string regex in gstack-codex-probe with a capability probe (run codex review --help, parse for known-good arg combinations, dynamically construct the invocation) would make the skill self-correct when Codex changes its arg contract. That's a larger refactor — not blocking on this issue.

Relation to #1196 / #1209

Distinct concern. #1209 fixes the specific parse-error trigger by removing the broken arg shape. This issue addresses the wrapper's blindness to ALL non-zero exits, including the next time something similar happens. Both fixes are complementary.

Empirical link

Local workaround documented at docs/operations/codex-cli-workaround.md in our project — comments on #1209 explain the relationship to our empirical recurrence.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions