
bug: Inconsistent prompt_tokens normalization across Anthropic providers (direct API vs Vertex AI vs Bedrock) #1564

@cygkam

Description

What Happened?

When Portkey normalizes Anthropic model responses to OpenAI schema, prompt_tokens has different semantics depending on which provider is used to access the same Anthropic model.

| Provider | `prompt_tokens` includes cache tokens? | `total_tokens` includes cache tokens? |
| --- | --- | --- |
| anthropic (direct API) | No | Yes |
| vertex-ai (Anthropic models) | No | Yes |
| bedrock (Anthropic models) | Yes | Yes |

For the same Anthropic model (e.g. claude-sonnet-4-20250514), sending the same prompt with prompt caching enabled:

  • Anthropic direct / Vertex AI: prompt_tokens = 100 (non-cached input only), cache_read_input_tokens = 50, completion_tokens = 100, total_tokens = 250
  • Bedrock: prompt_tokens = 150 (includes cached input), cache_read_input_tokens = 50, completion_tokens = 100, total_tokens = 250

Anthropic direct and Vertex AI set prompt_tokens = input_tokens (excludes cache). Bedrock sets prompt_tokens = inputTokens + cacheReadInputTokens + cacheWriteInputTokens (includes cache).
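The divergence can be reproduced with a small sketch using the numbers above (field names mirror Anthropic's raw usage payload; the variable names are illustrative, not Portkey's):

```typescript
// Raw usage for the same cached request served through each path.
const usage = {
  input_tokens: 100,               // non-cached input only
  output_tokens: 100,
  cache_read_input_tokens: 50,
  cache_creation_input_tokens: 0,
};

// Anthropic direct / Vertex AI convention: cache tokens excluded.
const promptTokensDirect = usage.input_tokens; // 100

// Bedrock convention: cache tokens folded in.
const promptTokensBedrock =
  usage.input_tokens +
  usage.cache_read_input_tokens +
  usage.cache_creation_input_tokens; // 150

// total_tokens agrees either way (250), which masks the inconsistency:
// direct: 100 + 100 + 50 + 0 = 250; Bedrock: 150 + 100 = 250.
```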

What Should Have Happened?

All three Anthropic access paths should normalize prompt_tokens consistently. The OpenAI convention (which Portkey normalizes to) is that prompt_tokens includes cached tokens, with the breakdown available in prompt_tokens_details.cached_tokens. Anthropic direct and Vertex AI should match Bedrock's behavior.

Relevant Code Snippet

anthropic/chatComplete.ts#L612C9-L627C11

        usage: {
          prompt_tokens: input_tokens,
          completion_tokens: output_tokens,
          total_tokens:
            input_tokens +
            output_tokens +
            (cache_creation_input_tokens ?? 0) +
            (cache_read_input_tokens ?? 0),
          prompt_tokens_details: {
            cached_tokens: cache_read_input_tokens ?? 0,
          },
          ...(shouldSendCacheUsage && {
            cache_read_input_tokens: cache_read_input_tokens,
            cache_creation_input_tokens: cache_creation_input_tokens,
          }),

google-vertex-ai/chatComplete.ts#L898C6-L909C9

      usage: {
        prompt_tokens: input_tokens,
        completion_tokens: output_tokens,
        total_tokens: totalTokens,
        prompt_tokens_details: {
          cached_tokens: cache_read_input_tokens,
        },
        ...(shouldSendCacheUsage && {
          cache_read_input_tokens: cache_read_input_tokens,
          cache_creation_input_tokens: cache_creation_input_tokens,
        }),
      },

bedrock/chatComplete.ts#L550C4-L565C9

      usage: {
        prompt_tokens:
          response.usage.inputTokens +
          cacheReadInputTokens +
          cacheWriteInputTokens,
        completion_tokens: response.usage.outputTokens,
        total_tokens: response.usage.totalTokens, // contains the cache usage as well
        prompt_tokens_details: {
          cached_tokens: cacheReadInputTokens,
        },
        // we only want to be sending this for anthropic models and this is not openai compliant
        ...((cacheReadInputTokens > 0 || cacheWriteInputTokens > 0) && {
          cache_read_input_tokens: cacheReadInputTokens,
          cache_creation_input_tokens: cacheWriteInputTokens,
        }),
      },
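One way to make the three transforms agree is a shared helper along these lines (a minimal sketch; the interface and function names are hypothetical, not Portkey's), producing the OpenAI-convention shape where `prompt_tokens` includes cached tokens and the breakdown lives in `prompt_tokens_details.cached_tokens`:

```typescript
// Hypothetical helper: normalize Anthropic-style usage fields to the
// OpenAI convention. prompt_tokens includes cache read/write tokens;
// the cached portion is also reported in prompt_tokens_details.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

function toOpenAIUsage(u: AnthropicUsage) {
  const cacheRead = u.cache_read_input_tokens ?? 0;
  const cacheWrite = u.cache_creation_input_tokens ?? 0;
  const promptTokens = u.input_tokens + cacheRead + cacheWrite;
  return {
    prompt_tokens: promptTokens,
    completion_tokens: u.output_tokens,
    total_tokens: promptTokens + u.output_tokens,
    prompt_tokens_details: { cached_tokens: cacheRead },
  };
}
```

For the example numbers above, all three paths would then report `prompt_tokens = 150` and `total_tokens = 250`, matching Bedrock's current behavior.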


Labels: bug (Something isn't working), triage

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions