fix: Fix Anthropic cumulative usage metrics inflation #5276

jtalmi · 2025-11-02T22:46:04Z

Fix Anthropic cumulative usage metrics inflation

Summary

- Fix a bug where Anthropic's cumulative token usage metrics were accumulated instead of replaced during streaming with tool calls
- Add an _is_cumulative_usage flag to Anthropic streaming responses to distinguish cumulative metrics from incremental ones
- Update the base model logic to replace (not accumulate) metrics when the cumulative flag is set
- Add unit tests that verify the fix and check that usage from incremental providers (OpenAI, Gemini, etc.) still accumulates as before

Root Cause: During tool calling, Anthropic returns cumulative usage totals across multiple streaming events (e.g., Event 1: 63k tokens; Event 2: 64k tokens, which already includes Event 1; Event 3: 65k tokens, which already includes Events 1 and 2). Agno was summing these values (63k + 64k + 65k = 192k) instead of using the final cumulative total (65k).

Fix: Mark Anthropic usage metrics as cumulative by setting the _is_cumulative_usage flag to True, then check this flag in the base model and use assignment (=) instead of accumulation (+=) for cumulative metrics.
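A minimal sketch of the replace-versus-accumulate rule described above. The Metrics and ModelResponse classes and the update_metrics helper are hypothetical stand-ins, not agno's actual classes or the real _populate_assistant_message code; only the _is_cumulative_usage flag name and the getattr(..., False) default are taken from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    """Hypothetical stand-in for a per-message token-usage record."""
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class ModelResponse:
    """Hypothetical stand-in for a streamed response chunk that carries usage."""
    response_usage: Optional[Metrics] = None
    # Flag name from this PR; defaults to False so providers that never set it
    # keep the existing accumulating behaviour.
    _is_cumulative_usage: bool = False


def update_metrics(current: Metrics, response: object) -> Metrics:
    """Merge one chunk's usage into the running metrics (hypothetical helper)."""
    usage = getattr(response, "response_usage", None)
    if usage is None:
        return current
    # getattr with a False default also covers response objects that lack the field entirely.
    if getattr(response, "_is_cumulative_usage", False):
        # Cumulative totals: the latest value already includes earlier events, so replace.
        return Metrics(usage.input_tokens, usage.output_tokens)
    # Incremental deltas: add to the running total.
    return Metrics(
        current.input_tokens + usage.input_tokens,
        current.output_tokens + usage.output_tokens,
    )


if __name__ == "__main__":
    m = Metrics()
    for total in (63_325, 64_197, 64_911):
        m = update_metrics(m, ModelResponse(Metrics(input_tokens=total), _is_cumulative_usage=True))
    print(m.input_tokens)  # 64911
```

Because the check goes through getattr rather than attribute access, an object that predates the new field simply falls through to the accumulating branch, which matches the backward-compatibility argument in the Impact section.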

Type of change

Checklist

Code complies with style guidelines
Self-review completed
Documentation updated (comments, docstrings)
Examples and guides: Relevant cookbook examples have been included or updated (if applicable)
Tested in clean environment
Tests added/updated (if applicable)
Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)

Files Modified

libs/agno/agno/models/anthropic/claude.py (lines 586-589)
- Set the _is_cumulative_usage flag to True when parsing streaming usage metrics
libs/agno/agno/models/base.py (lines 810-817)
- Modified _populate_assistant_message to check for cumulative usage flag
- Use assignment (=) for cumulative metrics, accumulation (+=) for incremental metrics
libs/agno/tests/unit/models/test_anthropic_cumulative_usage.py (new file)
- test_anthropic_cumulative_usage_not_inflated: Verifies Anthropic cumulative usage is replaced correctly
- test_non_cumulative_usage_still_accumulates: Ensures OpenAI/Gemini behavior remains unchanged

Test Results

$ python -m pytest libs/agno/tests/unit/models/ -v
======================== 19 passed, 1 warning in 2.89s =========================

All model tests pass, including:

- 7 AWS Bedrock streaming tests
- 4 OpenAI client persistence tests
- 4 OpenAI response ID handling tests
- 2 function call show result tests
- 2 new Anthropic cumulative usage tests ✨

Additional Notes

Before Fix (Bug)

Event 1: 63,325 tokens → metrics = 63,325
Event 2: 64,197 tokens (cumulative) → metrics = 127,522 ❌ (accumulated)
Event 3: 64,911 tokens (cumulative) → metrics = 192,433 ❌ (inflated nearly 3x)

After Fix (Correct)

Event 1: 63,325 tokens → metrics = 63,325
Event 2: 64,197 tokens (cumulative) → metrics = 64,197 ✅ (replaced)
Event 3: 64,911 tokens (cumulative) → metrics = 64,911 ✅ (correct)
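The Before/After traces above can be replayed in a few lines. This sketch only reproduces the arithmetic with the token counts quoted in this PR; the replay function is a hypothetical illustration and does not call agno or the Anthropic API.

```python
# Cumulative totals reported by successive streaming events (numbers from the traces above).
cumulative_totals = [63_325, 64_197, 64_911]


def replay(totals: list[int], replace: bool) -> list[int]:
    """Return the running metric after each event under either merge rule."""
    metric = 0
    history = []
    for total in totals:
        # Buggy path (replace=False) adds every cumulative total;
        # fixed path (replace=True) keeps only the latest one.
        metric = total if replace else metric + total
        history.append(metric)
    return history


before = replay(cumulative_totals, replace=False)
after = replay(cumulative_totals, replace=True)
print(before)  # [63325, 127522, 192433]
print(after)   # [63325, 64197, 64911]
print(f"inflation factor: {before[-1] / after[-1]:.2f}x")  # inflation factor: 2.96x
```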

Impact

- No breaking changes: only fixes the bug for Anthropic
- No impact on other providers: OpenAI, Gemini, etc. continue to work as before
- Fully backward compatible: uses getattr() with a default of False when the flag is missing
- Works for all Anthropic endpoints: native Anthropic, AWS Bedrock Claude, and VertexAI Claude

References

Anthropic API Docs: https://docs.anthropic.com/en/api/messages
Related to tool calling with streaming: https://docs.anthropic.com/en/docs/build-with-claude/tool-use


a (ed29edc)

jtalmi requested a review from a team as a code owner on November 2, 2025, 22:46.

jtalmi added 2 commits on November 2, 2025, 17:46:

a (de79b0d)

Add _is_cumulative_usage field to ModelResponse for type safety (7a12508)
- Add _is_cumulative_usage bool field to ModelResponse dataclass
- Fixes mypy error: ModelResponse has no attribute _is_cumulative_usage
- Defaults to False for backward compatibility
