remove tokenizer deps #5026
Conversation
Greptile Overview
Summary
This PR removes local tokenization dependencies from the Helicone codebase in favor of relying on LLM provider-returned token usage data. The changes eliminate CPU-intensive token counting that was impacting performance and accuracy, particularly for complex scenarios involving function calls and multimodal inputs.

The PR deletes key tokenization files including tokenCounter.ts, gptWorker.ts, and tokenRouter.ts, while removing dependencies on @anthropic-ai/tokenizer, js-tiktoken, and tiktoken from package.json. Body processors for OpenAI, Anthropic, Vercel, and Llama streams have been simplified to rely exclusively on provider-supplied usage data rather than calculating tokens locally.
This architectural shift aligns with Helicone's strategic focus on their AI Gateway, reducing system complexity and computational overhead. The changes maintain backward compatibility by gracefully handling cases where providers don't return usage data, typically returning -1 values with informative error messages directing users to enable stream usage in their provider settings.
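As a rough illustration of that fallback behavior, here is a minimal sketch (not the actual processor code; the `Usage` shape and function name are assumptions, and the error string is the one quoted in the review below):

```ts
interface Usage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  heliconeError?: string;
}

// Prefer provider-reported usage; degrade to -1 sentinels with a
// pointer to the stream-usage docs when the provider omits it.
function usageFromProvider(body: {
  usage?: { prompt_tokens?: number; completion_tokens?: number };
}): Usage {
  const usage = body.usage;
  if (usage?.prompt_tokens !== undefined && usage?.completion_tokens !== undefined) {
    return {
      promptTokens: usage.prompt_tokens,
      completionTokens: usage.completion_tokens,
      totalTokens: usage.prompt_tokens + usage.completion_tokens,
    };
  }
  // Provider returned no usage data: surface sentinels instead of guessing.
  return {
    promptTokens: -1,
    completionTokens: -1,
    totalTokens: -1,
    heliconeError:
      "counting tokens not supported, please see https://docs.helicone.ai/use-cases/enable-stream-usage",
  };
}
```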
Important Files Changed
Changed Files
| Filename | Score | Overview |
|---|---|---|
| valhalla/jawn/src/lib/shared/bodyProcessors/openAIStreamProcessor.ts | 4/5 | Removes manual token counting and relies entirely on OpenAI provider usage data with proper fallback handling |
| valhalla/jawn/src/lib/shared/bodyProcessors/anthropicStreamBodyProcessor.ts | 4/5 | Simplifies processor by removing complex model-specific tokenization logic and relying on provider tokens |
| valhalla/jawn/src/lib/tokens/gptWorker.js | 4/5 | Complete deletion of Node.js entry point for TypeScript tokenization worker |
| valhalla/jawn/package.json | 4/5 | Removes three tokenization library dependencies while keeping gpt-tokenizer for potential other uses |
| valhalla/jawn/src/lib/tokens/tokenCounter.ts | 4/5 | Complete removal of all local token counting functions and tokenizer initialization |
| valhalla/jawn/src/lib/shared/bodyProcessors/vercelStreamProcessor.ts | 4/5 | Eliminates 66-line fallback token calculation block and relies solely on stream-provided usage |
| valhalla/jawn/src/lib/shared/bodyProcessors/anthropicBodyProcessor.ts | 4/5 | Removes legacy token counting for older Claude models and uses only Anthropic-provided usage data |
| valhalla/jawn/src/lib/shared/bodyProcessors/llamaStreamBodyProcessor.ts | 4/5 | Removes tokenizer fallback logic and simplifies to use only provider-returned metrics |
| valhalla/jawn/src/lib/routers/tokenRouter.ts | 1/5 | Critical issue: Router deleted but still imported in index.ts, will cause runtime errors |
| valhalla/jawn/src/lib/tokens/gptWorker.ts | 4/5 | Removes GPT tokenizer worker thread implementation to eliminate computational overhead |
Confidence score: 2/5
- This PR introduces a breaking change that will likely cause immediate runtime failures due to missing tokenRouter import
- Score significantly lowered due to incomplete cleanup leaving dangling imports that will crash the application
- Pay close attention to valhalla/jawn/src/lib/routers/tokenRouter.ts deletion and any files importing it
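To make that failure mode concrete, a hypothetical reconstruction of the index.ts cleanup (the Express setup, port, and exact mounting call are assumptions; the review cites lines 20 and 171 of the real file):

```ts
import express from "express";

// Removed: this import now points at a deleted module, so Node's module
// resolution would throw at startup, before the server ever binds a port:
// import { tokenRouter } from "./lib/routers/tokenRouter";

const app = express();

// Removed alongside the import: mounting the deleted router.
// app.use(tokenRouter);

app.listen(8585); // port is illustrative
```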
Additional Comments (2)
- valhalla/jawn/src/lib/shared/bodyProcessors/anthropicStreamBodyProcessor.ts, lines 237-313. style: Dead code: these unused functions should be removed entirely instead of being kept "for reference".
- valhalla/jawn/src/lib/routers/tokenRouter.ts, line 1. logic: Critical issue: the main server file still imports this deleted tokenRouter (line 20 of index.ts) and uses it (line 171). This will cause a runtime error when the server starts; the import and usage need to be removed from index.ts.
10 files reviewed, 3 comments
From valhalla/jawn/src/lib/shared/bodyProcessors/anthropicBodyProcessor.ts, lines 22-26:

```ts
totalTokens:
  parsedResponseBody?.usage?.input_tokens +
  parsedResponseBody?.usage?.output_tokens +
  parsedResponseBody?.usage?.cache_creation_input_tokens +
  parsedResponseBody?.usage?.cache_read_input_tokens,
```
style: Consider using nullish coalescing (?? 0) for cache token values to handle undefined values safely in the addition
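For context on the suggestion: in JavaScript, adding `undefined` to a number yields `NaN`, so a response that omits the cache fields would poison the entire sum. A minimal sketch of the coalesced version, using the field names from the diff above:

```ts
const usage = parsedResponseBody?.usage;

// Each term falls back to 0 when the provider omits the field,
// so e.g. 100 + 50 + 0 + 0 === 150 instead of NaN.
const totalTokens =
  (usage?.input_tokens ?? 0) +
  (usage?.output_tokens ?? 0) +
  (usage?.cache_creation_input_tokens ?? 0) +
  (usage?.cache_read_input_tokens ?? 0);
```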
Claude finished @chitalian's task.

Code Review Complete. Score: 8/10, good to merge with minor suggestions.

Summary

What was removed: the local tokenization layer (tokenCounter.ts, gptWorker.ts, gptWorker.js, and tokenRouter.ts) along with the @anthropic-ai/tokenizer, js-tiktoken, and tiktoken dependencies.

What was improved: the OpenAI, Anthropic, Vercel, and Llama body processors now rely exclusively on provider-supplied usage data instead of computing token counts locally.
Minor Issues & Suggestions

1. Null Safety in Anthropic Body Processor

```ts
// Consider using nullish coalescing for safer addition
totalTokens: (parsedResponseBody.input_tokens ?? 0),
promptTokens: (parsedResponseBody.input_tokens ?? 0),
```

2. Error Message Consistency

One processor already surfaces a helpful error when usage data is unavailable:

```ts
helicone_error: "counting tokens not supported, please see https://docs.helicone.ai/use-cases/enable-stream-usage"
```

Consider adding similar informative errors in the other processors when usage data is missing.

3. Dead Code Removal (found in the original file; see the additional comment above).

Security & Performance Assessment

✅ Security: No security concerns identified. If anything, the change improves the dependency surface by removing three third-party tokenizer packages and the worker-thread code that loaded them.
✅ Performance: Significant performance improvement from eliminating CPU-intensive local token counting, including the worker-thread tokenizer.

✅ Accuracy: Improved accuracy by trusting provider-reported token counts, which local tokenizers approximated poorly for function calling and multimodal inputs.

Breaking Changes Assessment

Low Risk: The change maintains backward compatibility. When a provider omits usage data, processors return -1 token counts with an informative error instead of throwing.
Testing Recommendations

While not blocking, consider testing streamed responses from each affected provider, especially cases where the provider omits usage data.

Suggestions Summary: address the three minor items above (null safety, error-message consistency, dead-code removal).

Overall Assessment: This is a well-executed refactor that meaningfully improves performance and accuracy while maintaining API compatibility. The strategic shift toward provider-supplied usage data aligns well with the AI Gateway focus.
Resolved conflicts:
- .claude/settings.local.json: kept main's version
- anthropicBodyProcessor.ts: merged logic from main and fixed unsafe arithmetic with nullish coalescing
- Auto-generated files (swagger.json, routes.ts, jawnTypes): accepted main's versions
- yarn.lock: accepted main's version

Also fixed unsafe arithmetic operations in anthropicBodyProcessor.ts by using nullish coalescing (?? 0) for cache token calculations.
- Remove tokenRouter import and usage from index.ts (breaking import fix)
- Fix unsafe arithmetic in anthropicStreamBodyProcessor.ts by using nullish coalescing (?? 0)
- Remove dead code functions (recursivelyConsolidateAnthropicListForClaude3 and recursivelyConsolidateAnthropic)
- Remove heliconeCalculated flag since we're relying on provider tokens

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
We should rely on the model provider returning the usage tokens. Calculating usage tokens on our CPU hurts performance and is not accurate at all, especially with more complex modalities, function calling, etc.
Since we are prioritizing the gateway moving forward, reducing this extra layer of complexity is helpful