Description
What Happened?
When Portkey normalizes Anthropic model responses to the OpenAI schema, `prompt_tokens` has different semantics depending on which provider is used to access the same Anthropic model.
| Provider | `prompt_tokens` includes cache tokens? | `total_tokens` includes cache tokens? |
|---|---|---|
| anthropic (direct API) | No | Yes |
| vertex-ai (Anthropic models) | No | Yes |
| bedrock (Anthropic models) | Yes | Yes |
For the same Anthropic model (e.g. `claude-sonnet-4-20250514`), sending the same prompt with cache:
- Anthropic direct / Vertex AI: `prompt_tokens = 100` (non-cached input only), `cache_read_input_tokens = 50`, `completion_tokens = 100`, `total_tokens = 250`
- Bedrock: `prompt_tokens = 150` (includes cached input), `cache_read_input_tokens = 50`, `completion_tokens = 100`, `total_tokens = 250`
Anthropic direct and Vertex AI set `prompt_tokens = input_tokens` (excludes cache). Bedrock sets `prompt_tokens = inputTokens + cacheReadInputTokens + cacheWriteInputTokens` (includes cache).
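The divergence can be reproduced with a small sketch (helper names here are illustrative, not Portkey's actual functions; the field names come from Anthropic's usage object):

```typescript
// Anthropic-style usage fields, as returned in the provider response.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Current anthropic / vertex-ai behavior: cache tokens are NOT counted.
function promptTokensDirect(u: AnthropicUsage): number {
  return u.input_tokens;
}

// Current bedrock behavior: cache tokens ARE folded into prompt_tokens.
function promptTokensBedrock(u: AnthropicUsage): number {
  return (
    u.input_tokens +
    (u.cache_read_input_tokens ?? 0) +
    (u.cache_creation_input_tokens ?? 0)
  );
}

// Same upstream usage, two different normalized values.
const usage: AnthropicUsage = {
  input_tokens: 100,
  output_tokens: 100,
  cache_read_input_tokens: 50,
};

console.log(promptTokensDirect(usage)); // 100
console.log(promptTokensBedrock(usage)); // 150
```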
What Should Have Happened?
All three Anthropic access paths should normalize `prompt_tokens` consistently. The OpenAI convention (which Portkey normalizes to) is that `prompt_tokens` includes cached tokens, with the breakdown available in `prompt_tokens_details.cached_tokens`. Anthropic direct and Vertex AI should match Bedrock's behavior.
Relevant Code Snippet
anthropic/chatComplete.ts#L612C9-L627C11

```ts
usage: {
  prompt_tokens: input_tokens,
  completion_tokens: output_tokens,
  total_tokens:
    input_tokens +
    output_tokens +
    (cache_creation_input_tokens ?? 0) +
    (cache_read_input_tokens ?? 0),
  prompt_tokens_details: {
    cached_tokens: cache_read_input_tokens ?? 0,
  },
  ...(shouldSendCacheUsage && {
    cache_read_input_tokens: cache_read_input_tokens,
    cache_creation_input_tokens: cache_creation_input_tokens,
  }),
},
```
google-vertex-ai/chatComplete.ts#L898C6-L909C9

```ts
usage: {
  prompt_tokens: input_tokens,
  completion_tokens: output_tokens,
  total_tokens: totalTokens,
  prompt_tokens_details: {
    cached_tokens: cache_read_input_tokens,
  },
  ...(shouldSendCacheUsage && {
    cache_read_input_tokens: cache_read_input_tokens,
    cache_creation_input_tokens: cache_creation_input_tokens,
  }),
},
```
bedrock/chatComplete.ts#L550C4-L565C9

```ts
usage: {
  prompt_tokens:
    response.usage.inputTokens +
    cacheReadInputTokens +
    cacheWriteInputTokens,
  completion_tokens: response.usage.outputTokens,
  total_tokens: response.usage.totalTokens, // contains the cache usage as well
  prompt_tokens_details: {
    cached_tokens: cacheReadInputTokens,
  },
  // we only want to be sending this for anthropic models and this is not openai compliant
  ...((cacheReadInputTokens > 0 || cacheWriteInputTokens > 0) && {
    cache_read_input_tokens: cacheReadInputTokens,
    cache_creation_input_tokens: cacheWriteInputTokens,
  }),
},
```
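A minimal sketch of what a consistent mapping could look like for the anthropic and vertex-ai paths, assuming the OpenAI convention described above (function and type names here are illustrative, not Portkey's actual code):

```typescript
// Anthropic-style usage fields, as returned in the provider response.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Hypothetical normalizer: fold cache tokens into prompt_tokens (matching
// Bedrock and the OpenAI convention), and keep the cached share visible in
// prompt_tokens_details.cached_tokens.
function toOpenAIUsage(u: AnthropicUsage) {
  const cacheRead = u.cache_read_input_tokens ?? 0;
  const cacheWrite = u.cache_creation_input_tokens ?? 0;
  const promptTokens = u.input_tokens + cacheRead + cacheWrite;
  return {
    prompt_tokens: promptTokens,
    completion_tokens: u.output_tokens,
    total_tokens: promptTokens + u.output_tokens,
    prompt_tokens_details: { cached_tokens: cacheRead },
  };
}

// With the example numbers from above, all three paths would then agree:
const normalized = toOpenAIUsage({
  input_tokens: 100,
  output_tokens: 100,
  cache_read_input_tokens: 50,
});
console.log(normalized.prompt_tokens); // 150
console.log(normalized.total_tokens); // 250
```

Note that `total_tokens` stays at 250 either way; only the split between `prompt_tokens` and the cache fields changes, which is why the inconsistency is easy to miss until cached and non-cached input are compared across providers.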
Your Twitter/LinkedIn
No response