
Conversation

@chitalian
Contributor

We should rely on the model provider returning the usage tokens. Calculating usage tokens on our CPU hurts performance and is not accurate, especially with more complex modalities, function calling, etc.

Since we are prioritizing the gateway moving forward, reducing this extra layer of complexity is helpful.

@vercel

vercel bot commented Oct 8, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| helicone | Ready | Preview | Comment | Nov 21, 2025 6:01pm |
| helicone-bifrost | Ready | Preview | Comment | Nov 21, 2025 6:01pm |
| helicone-eu | Ready | Preview | Comment | Nov 21, 2025 6:01pm |

Contributor

greptile-apps bot left a comment

Greptile Overview

Summary

This PR removes local tokenization dependencies from the Helicone codebase in favor of relying on LLM provider-returned token usage data. The changes eliminate CPU-intensive token counting that was impacting performance and accuracy, particularly for complex scenarios involving function calls and multimodal inputs.

The PR deletes key tokenization files including tokenCounter.ts, gptWorker.ts, and tokenRouter.ts, while removing dependencies on @anthropic-ai/tokenizer, js-tiktoken, and tiktoken from package.json. Body processors for OpenAI, Anthropic, Vercel, and Llama streams have been simplified to rely exclusively on provider-supplied usage data rather than calculating tokens locally.

This architectural shift aligns with Helicone's strategic focus on their AI Gateway, reducing system complexity and computational overhead. The changes maintain backward compatibility by gracefully handling cases where providers don't return usage data, typically returning -1 values with informative error messages directing users to enable stream usage in their provider settings.
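
A minimal sketch of what that graceful fallback could look like in a body processor, assuming an OpenAI-style `usage` object; the function and field names below are illustrative, not Jawn's actual processor API:

```typescript
// Illustrative only: usageFromProvider and the return shape are assumptions,
// not Helicone's real types.
interface ProviderUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
}

function usageFromProvider(body: { usage?: ProviderUsage }) {
  const usage = body.usage;
  if (!usage) {
    // No local tokenization fallback anymore: report -1 and point users at
    // enabling stream usage on the provider side.
    return {
      promptTokens: -1,
      completionTokens: -1,
      totalTokens: -1,
      heliconeError:
        "counting tokens not supported, please see https://docs.helicone.ai/use-cases/enable-stream-usage",
    };
  }
  const prompt = usage.prompt_tokens ?? 0;
  const completion = usage.completion_tokens ?? 0;
  return {
    promptTokens: prompt,
    completionTokens: completion,
    totalTokens: usage.total_tokens ?? prompt + completion,
  };
}
```

Returning -1 sentinels rather than throwing keeps the usage fields populated, which is what makes the backward-compatibility claim above hold when a provider omits usage.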

Important Files Changed

Changed Files

| Filename | Score | Overview |
| --- | --- | --- |
| valhalla/jawn/src/lib/shared/bodyProcessors/openAIStreamProcessor.ts | 4/5 | Removes manual token counting and relies entirely on OpenAI provider usage data with proper fallback handling |
| valhalla/jawn/src/lib/shared/bodyProcessors/anthropicStreamBodyProcessor.ts | 4/5 | Simplifies processor by removing complex model-specific tokenization logic and relying on provider tokens |
| valhalla/jawn/src/lib/tokens/gptWorker.js | 4/5 | Complete deletion of Node.js entry point for TypeScript tokenization worker |
| valhalla/jawn/package.json | 4/5 | Removes three tokenization library dependencies while keeping gpt-tokenizer for potential other uses |
| valhalla/jawn/src/lib/tokens/tokenCounter.ts | 4/5 | Complete removal of all local token counting functions and tokenizer initialization |
| valhalla/jawn/src/lib/shared/bodyProcessors/vercelStreamProcessor.ts | 4/5 | Eliminates 66-line fallback token calculation block and relies solely on stream-provided usage |
| valhalla/jawn/src/lib/shared/bodyProcessors/anthropicBodyProcessor.ts | 4/5 | Removes legacy token counting for older Claude models and uses only Anthropic-provided usage data |
| valhalla/jawn/src/lib/shared/bodyProcessors/llamaStreamBodyProcessor.ts | 4/5 | Removes tokenizer fallback logic and simplifies to use only provider-returned metrics |
| valhalla/jawn/src/lib/routers/tokenRouter.ts | 1/5 | Critical issue: Router deleted but still imported in index.ts, will cause runtime errors |
| valhalla/jawn/src/lib/tokens/gptWorker.ts | 4/5 | Removes GPT tokenizer worker thread implementation to eliminate computational overhead |

Confidence score: 2/5

  • This PR introduces a breaking change that will likely cause immediate runtime failures due to missing tokenRouter import
  • Score significantly lowered due to incomplete cleanup leaving dangling imports that will crash the application
  • Pay close attention to valhalla/jawn/src/lib/routers/tokenRouter.ts deletion and any files importing it


Additional Comments (2)

  1. valhalla/jawn/src/lib/shared/bodyProcessors/anthropicStreamBodyProcessor.ts, line 237-313 (link)

    style: Dead code - these unused functions should be removed entirely instead of keeping them 'for reference'.

  2. valhalla/jawn/src/lib/routers/tokenRouter.ts, line 1 (link)

    logic: Critical issue: The main server file still imports this deleted tokenRouter (line 20 in index.ts) and uses it (line 171). This will cause a runtime error when the server starts. The import and usage need to be removed from index.ts.

10 files reviewed, 3 comments


Comment on lines 22 to 26
totalTokens:
parsedResponseBody?.usage?.input_tokens +
parsedResponseBody?.usage?.output_tokens +
parsedResponseBody?.usage?.cache_creation_input_tokens +
parsedResponseBody?.usage?.cache_read_input_tokens,
Contributor

style: Consider using nullish coalescing (?? 0) for cache token values to handle undefined values safely in the addition

Prompt To Fix With AI
This is a comment left during a code review.
Path: valhalla/jawn/src/lib/shared/bodyProcessors/anthropicBodyProcessor.ts
Line: 22:26

Comment:
**style:** Consider using nullish coalescing (`?? 0`) for cache token values to handle undefined values safely in the addition

How can I resolve this? If you propose a fix, please make it concise.
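
One way the quoted lines 22 to 26 could look with the suggested coalescing applied, assuming the same optional-chained `usage` shape as in the diff above:

```typescript
// Cache token fields can be undefined on models without prompt caching,
// so default each operand to 0 before adding.
totalTokens:
  (parsedResponseBody?.usage?.input_tokens ?? 0) +
  (parsedResponseBody?.usage?.output_tokens ?? 0) +
  (parsedResponseBody?.usage?.cache_creation_input_tokens ?? 0) +
  (parsedResponseBody?.usage?.cache_read_input_tokens ?? 0),
```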

@claude
Contributor

claude bot commented Oct 8, 2025

Claude finished @chitalian's task.


Code Review Complete

Score: 8/10 - Good to merge with minor suggestions

Summary:
This PR successfully removes local tokenization dependencies to rely on LLM provider-returned token usage data. The approach is sound for performance and accuracy, especially for complex scenarios like function calls and multimodal inputs. The implementation is clean with proper fallback handling.

Analysis

What was removed:

  • ✅ Deleted @anthropic-ai/tokenizer, js-tiktoken, tiktoken dependencies
  • ✅ Removed tokenCounter.ts, gptWorker.ts/js, tokenRouter.ts
  • ✅ Cleaned up index.ts imports properly (Greptile's alert was outdated)
  • ✅ Simplified all body processors to rely on provider usage data

What was improved:

  • ✅ Better error handling with informative messages directing users to documentation
  • ✅ Consistent fallback behavior across all processors
  • ✅ Reduced computational overhead and complexity
  • ✅ More accurate token counts from authoritative sources

Minor Issues & Suggestions

1. Null Safety in Anthropic Body Processor (anthropicBodyProcessor.ts:26-27)

// Consider using nullish coalescing for safer addition
totalTokens: (parsedResponseBody.input_tokens ?? 0),
promptTokens: (parsedResponseBody.input_tokens ?? 0),

2. Error Message Consistency
The OpenAI stream processor provides a helpful error message:

helicone_error: "counting tokens not supported, please see https://docs.helicone.ai/use-cases/enable-stream-usage"

Consider adding similar informative errors in other processors when usage data is missing.
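
One lightweight way to keep that message consistent would be a shared constant the processors import; the identifier below is an assumption, not an existing Helicone symbol:

```typescript
// Hypothetical shared constant; each body processor would reference this
// instead of repeating the literal string when provider usage is absent.
export const STREAM_USAGE_NOT_SUPPORTED =
  "counting tokens not supported, please see https://docs.helicone.ai/use-cases/enable-stream-usage";
```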

3. Dead Code Removal (Found in original file)
In anthropicStreamBodyProcessor.ts, there appear to be unused function remnants around lines 237-313 that should be removed entirely rather than kept "for reference".

Security & Performance Assessment

✅ Security: No security concerns identified. The change actually improves security by:

  • Reducing attack surface (fewer dependencies)
  • Relying on authoritative provider data instead of local calculations

✅ Performance: Significant performance improvement by:

  • Eliminating CPU-intensive tokenization
  • Reducing memory usage from tokenizer libraries
  • Faster response processing

✅ Accuracy: Improved accuracy by:

  • Using provider-calculated tokens (especially for function calls, multimodal content)
  • Eliminating estimation errors from local tokenization

Breaking Changes Assessment

Low Risk: The change maintains backward compatibility:

  • Same API surface for usage data
  • Graceful fallback when usage data unavailable (-1 values with helpful errors)
  • Existing integrations should continue working

Testing Recommendations

While not blocking, consider testing:

  1. Stream responses from different providers (OpenAI, Anthropic, Vercel)
  2. Non-stream responses with and without usage data
  3. Error scenarios when providers don't return usage information
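
A hedged sketch of the third scenario as a Jest test; `ExampleStreamProcessor` is a stand-in, not one of Jawn's real body processors, whose interfaces may differ:

```typescript
import { describe, expect, it } from "@jest/globals";

// Stand-in processor mirroring the provider-usage-only behavior: it falls back
// to -1 counts when the response body carries no usage block.
class ExampleStreamProcessor {
  parse(body: { usage?: { prompt_tokens: number; completion_tokens: number } }) {
    if (!body.usage) {
      return { promptTokens: -1, completionTokens: -1, totalTokens: -1 };
    }
    const { prompt_tokens, completion_tokens } = body.usage;
    return {
      promptTokens: prompt_tokens,
      completionTokens: completion_tokens,
      totalTokens: prompt_tokens + completion_tokens,
    };
  }
}

describe("provider response without usage data", () => {
  it("returns -1 counts instead of tokenizing locally", () => {
    const result = new ExampleStreamProcessor().parse({});
    expect(result.totalTokens).toBe(-1);
  });

  it("passes provider-reported usage through untouched", () => {
    const result = new ExampleStreamProcessor().parse({
      usage: { prompt_tokens: 12, completion_tokens: 3 },
    });
    expect(result.totalTokens).toBe(15);
  });
});
```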

Suggestions Summary:

  • Minor: Add nullish coalescing in anthropicBodyProcessor.ts:26-27
  • Cleanup: Remove any remaining dead code from stream processors
  • Enhancement: Add consistent error messages across all processors when usage data is missing

Overall Assessment: This is a well-executed refactor that meaningfully improves performance and accuracy while maintaining API compatibility. The strategic shift toward provider-supplied usage data aligns well with the AI Gateway focus.

chitalian and others added 2 commits November 20, 2025 10:47
Resolved conflicts:
- .claude/settings.local.json: Kept main's version
- anthropicBodyProcessor.ts: Merged logic from main and fixed unsafe arithmetic with nullish coalescing
- Auto-generated files (swagger.json, routes.ts, jawnTypes): Accepted main's versions
- yarn.lock: Accepted main's version

Also fixed unsafe arithmetic operations in anthropicBodyProcessor.ts by using nullish coalescing (?? 0) for cache token calculations.
- Remove tokenRouter import and usage from index.ts (breaking import fix)
- Fix unsafe arithmetic in anthropicStreamBodyProcessor.ts by using nullish coalescing (?? 0)
- Remove dead code functions (recursivelyConsolidateAnthropicListForClaude3 and recursivelyConsolidateAnthropic)
- Remove heliconeCalculated flag since we're relying on provider tokens

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>