
exec resume: input_tokens grow by ~18K per turn regardless of message size (system prompt re-injected on every resume) #16213

@SwaroopMeher

Description


What version of Codex CLI is running?

codex-cli 0.117.0

What subscription do you have?

ChatGPT Team

Which model were you using?

gpt-5.4

What platform is your computer?

Microsoft Windows NT 10.0.26200.0 x64

What terminal emulator and version are you using (if applicable)?

Windows Terminal (PowerShell)

What issue are you seeing?

Every time I resume a session with codex exec resume --last, the input_tokens in turn.completed grows by ~18,000 tokens — the same as the base cost of Turn 1 — regardless of how short my message is.

After 3 turns with single-sentence messages (≈5 words each, ≈10 tokens), the token count has tripled:

| Turn | input_tokens | cached_input_tokens | Message |
| ---- | ------------ | ------------------- | ------- |
| 1 | 18,080 | 9,728 | "the cat said meow" |
| 2 | 36,197 | 27,776 | "what did the cat say?" |
| 3 | 54,341 | 45,824 | "and what was the cat's name?" |

Each resume adds the full ~18K base cost again, not just the new message content.
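The per-turn growth is easy to confirm from the --json event stream. A minimal sketch, assuming each emitted line is a single JSON object and using the turn.completed event name and input_tokens field reported above (the exact event schema may differ between versions):

```python
import json

def input_token_deltas(json_lines):
    """Collect input_tokens from turn.completed events and return the
    per-turn deltas.  If only the new message were being added, each
    delta after turn 1 would be tiny; instead it is ~18K every time."""
    totals = []
    for line in json_lines:
        event = json.loads(line)
        if event.get("type") == "turn.completed":
            totals.append(event["input_tokens"])
    return [b - a for a, b in zip([0] + totals, totals)]

# The figures reported above, one turn.completed event per exec run:
events = [
    '{"type": "turn.completed", "input_tokens": 18080}',
    '{"type": "turn.completed", "input_tokens": 36197}',
    '{"type": "turn.completed", "input_tokens": 54341}',
]
print(input_token_deltas(events))  # every delta is ~18K, not ~50 tokens
```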

What steps can reproduce the bug?

# Turn 1 — new session
echo "the cat said meow" | codex exec --json -
# turn.completed → input_tokens: 18080

# Turn 2 — resume with a short follow-up
echo "what did the cat say?" | codex exec resume --last --json -
# turn.completed → input_tokens: 36197  (+18,117)

# Turn 3 — another short follow-up
echo "and what was the cat's name?" | codex exec resume --last --json -
# turn.completed → input_tokens: 54341  (+18,144)

Inspecting ~/.codex/sessions/.../rollout-*.jsonl shows the system/developer instructions stored as response_item entries inline in the conversation history. On each exec resume, those stored items are replayed (including the system instructions), and then a fresh copy of the system instructions is prepended for the new turn — so each resume adds one more copy of the full system prompt to the context.
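The duplication can be checked directly against the rollout file. This is a rough sketch, not the real schema: the field layout of rollout-*.jsonl entries and the "You are Codex" marker string are assumptions, so adjust the marker to any distinctive substring of the actual stored instructions:

```python
import json

def count_instruction_items(jsonl_lines, marker="You are Codex"):
    """Count stored history items that contain the system instructions,
    identified by a distinctive substring anywhere in the serialized
    item.  After N resumed turns, the hypothesis above predicts ~N
    copies instead of one."""
    count = 0
    for line in jsonl_lines:
        item = json.loads(line)
        if marker in json.dumps(item):
            count += 1
    return count

# Toy stand-in for a rollout file after two turns:
sample = [
    '{"role": "system", "content": "You are Codex, a coding agent..."}',
    '{"role": "user", "content": "the cat said meow"}',
    '{"role": "system", "content": "You are Codex, a coding agent..."}',
]
print(count_instruction_items(sample))  # 2 copies, one per turn
```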

What is the expected behavior?

The system instructions should appear exactly once per API call, regardless of how many turns have been resumed. A 3-turn conversation with short messages should cost roughly:

Turn 1: ~18K tokens  (system prompt + message)
Turn 2: ~18K + ~50 tokens  (system prompt + turn 1 history + new message)
Turn 3: ~18K + ~100 tokens  (system prompt + turn 1+2 history + new message)

Not 3× the base cost after just 3 short messages.

Additional information

The root cause appears to be that the system/developer instructions are serialised into the session JSONL as conversation history items, rather than being stored separately and re-applied as a prefix. When resuming, both the replayed history (which contains the instructions) and the freshly applied instructions are sent to the model.
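The expected behavior could be sketched as: filter any instruction items out of the replayed history, then prepend exactly one fresh copy. This is only an illustration of the intended invariant, with a hypothetical item shape (role/content dicts), not the actual Codex internals:

```python
def build_request(instructions, history):
    """Build the model request for a resumed turn.  Stored instruction
    items are dropped from the replayed history, and a single fresh
    copy of the instructions is prepended, so the system prompt appears
    exactly once per API call no matter how many times the session has
    been resumed."""
    replayed = [item for item in history if item.get("role") != "system"]
    return [{"role": "system", "content": instructions}] + replayed

# A resumed session whose stored history already holds two copies:
history = [
    {"role": "system", "content": "INSTRUCTIONS"},
    {"role": "user", "content": "the cat said meow"},
    {"role": "system", "content": "INSTRUCTIONS"},
    {"role": "user", "content": "what did the cat say?"},
]
request = build_request("INSTRUCTIONS", history)
print(sum(1 for item in request if item["role"] == "system"))  # 1
```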

Related issues: #10403 (AGENTS.md repeated in JSONL), #4047 (previous_response_id removed), #3841 (previous_response_id missing from request struct).

Metadata

Labels: bug (Something isn't working), context (Issues related to context management, including compaction)
