Handle token limit exceptions #5056
Conversation
Claude finished @H2Shami's task — View job

PR Review Complete. Reviewed PR #5056: Handle token limit exceptions — an implementation of token limit exception handling with three strategies (truncate, middle-out, fallback).

Critical Issues Found

🚨 Critical Bug: Truncate Strategy Returns undefined
Greptile Overview
Summary
Implements token limit exception handling with three strategies (truncate, middle-out, fallback) to manage requests exceeding LLM context windows. Users can specify handling via Helicone-Token-Limit-Exception-Handler header.
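For concreteness, a request opts into one of the strategies by setting the new header on an otherwise normal proxied call. A minimal sketch follows; the gateway URL, model, and env variable names are illustrative placeholders, and only the header name and its three values come from this PR:

```typescript
// Hedged sketch: opting into the "middle-out" strategy on a proxied request.
// The gateway URL, model, and env variable names are placeholders; the header
// name and its values ("truncate" | "middle-out" | "fallback") are from this PR.
const veryLongConversation = "..."; // stands in for a prompt that overflows the context window

const response = await fetch("https://oai.helicone.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    "Helicone-Token-Limit-Exception-Handler": "middle-out",
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: veryLongConversation }],
  }),
});
```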
Key Changes
- Added `HeliconeTokenLimitExceptionHandler` enum with three handler types: `truncate`, `middle-out`, and `fallback`
- New `tokenLimitException.ts` utility with token estimation, model lookup, and message truncation algorithms
- Integration in `HeliconeProxyRequestMapper.applyTokenLimitExceptionHandler()` to modify the request body before forwarding to LLM providers
- Token limit calculation accounts for requested completion tokens across multiple provider-specific field names

Issues Found
- Critical: `applyTruncateStrategy` mutates the parsed body but returns `undefined` instead of `JSON.stringify(parsedBody)`, causing the truncate strategy to fail silently
- Code Quality: multiple debug `console.log` statements should be removed before production deployment
Confidence Score: 2/5
- This PR has a critical bug that will cause the truncate strategy to fail silently in production
- Score reflects a critical logic bug in `applyTruncateStrategy` (worker/src/lib/util/tokenLimitException.ts:651-666) that returns `undefined` instead of the modified JSON body, which means the truncate handler won't work. The middle-out and fallback strategies appear to work correctly. Additionally, there are 9 debug `console.log` statements that should be removed.
- Pay close attention to worker/src/lib/util/tokenLimitException.ts — the `applyTruncateStrategy` function must be fixed before merge
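Given the review, the fix is likely a one-line return. A hedged sketch of what the corrected function might look like — the body is inferred from the review comments and the diagram below, not copied from the PR:

```typescript
// Hypothetical reconstruction of applyTruncateStrategy; shape inferred from
// the review comments, not the actual source.
export function applyTruncateStrategy(parsedBody: ParsedRequestPayload): string {
  for (const message of parsedBody.messages ?? []) {
    if (typeof message.content === "string") {
      message.content = truncateAndNormalizeText(message.content);
    }
  }
  // The reported bug: the function mutated parsedBody but never returned,
  // so callers received undefined. Serializing the mutated body fixes it.
  return JSON.stringify(parsedBody);
}
```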
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| worker/src/lib/models/HeliconeHeaders.ts | 5/5 | Added HeliconeTokenLimitExceptionHandler enum and parsing logic for new header - implementation is clean and correct |
| worker/src/lib/models/HeliconeProxyRequest.ts | 4/5 | Integrates token limit exception handling into request processing flow - logic is sound but condition on line 283-284 could use clarifying comment |
| worker/src/lib/util/tokenLimitException.ts | 2/5 | New utility file implementing three token limit strategies - applyTruncateStrategy has critical bug (returns undefined instead of JSON), multiple debug console.log statements need removal |
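For reference, the header parsing in HeliconeHeaders.ts presumably follows the pattern below. Member names are derived from the three header values; the exact implementation is an assumption:

```typescript
// Hedged sketch of the enum and header parsing; the real code in
// worker/src/lib/models/HeliconeHeaders.ts may differ in naming and shape.
export enum HeliconeTokenLimitExceptionHandler {
  Truncate = "truncate",
  MiddleOut = "middle-out",
  Fallback = "fallback",
}

function getTokenLimitExceptionHandler(
  headers: Headers
): HeliconeTokenLimitExceptionHandler | null {
  const raw = headers
    .get("Helicone-Token-Limit-Exception-Handler")
    ?.toLowerCase();
  const known = Object.values(HeliconeTokenLimitExceptionHandler) as string[];
  return raw && known.includes(raw)
    ? (raw as HeliconeTokenLimitExceptionHandler)
    : null;
}
```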
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant Worker as Cloudflare Worker
    participant RW as RequestWrapper
    participant HPRM as HeliconeProxyRequestMapper
    participant TLE as tokenLimitException
    participant Buffer as RequestBodyBuffer
    participant LLM as LLM Provider
    Client->>Worker: HTTP Request with Helicone-Token-Limit-Exception-Handler header
    Worker->>RW: Create RequestWrapper
    RW->>RW: Parse HeliconeHeaders (getTokenLimitExceptionHandler)
    Worker->>HPRM: Create HeliconeProxyRequestMapper
    HPRM->>HPRM: tryToProxyRequest()
    HPRM->>Buffer: safelyGetBody()
    Buffer-->>HPRM: body (ValidRequestBody)
    alt Token Limit Handler Set
        HPRM->>HPRM: applyTokenLimitExceptionHandler(body)
        HPRM->>TLE: parseRequestPayload(body)
        TLE-->>HPRM: ParsedRequestPayload
        HPRM->>TLE: resolvePrimaryModel(parsedBody, modelOverride)
        TLE-->>HPRM: primaryModel
        HPRM->>TLE: estimateTokenCount(parsedBody, primaryModel)
        TLE-->>HPRM: estimatedTokens
        HPRM->>TLE: getModelTokenLimit(provider, primaryModel)
        TLE-->>HPRM: modelContextLimit
        HPRM->>HPRM: Calculate tokenLimit (contextLimit - requestedCompletionTokens)
        alt estimatedTokens > tokenLimit
            alt Handler = Truncate
                HPRM->>TLE: applyTruncateStrategy(parsedBody)
                TLE->>TLE: truncateAndNormalizeText() for each message
                Note over TLE: BUG: Returns undefined instead of JSON string
                TLE-->>HPRM: undefined (should be JSON string)
            else Handler = MiddleOut
                HPRM->>TLE: applyMiddleOutStrategy(parsedBody, primaryModel, tokenLimit)
                TLE->>TLE: middleOutMessagesToFitLimit()
                TLE->>TLE: Split messages into chunks, remove middle chunks
                TLE-->>HPRM: JSON.stringify(finalPayload)
            else Handler = Fallback
                HPRM->>TLE: applyFallbackStrategy(parsedBody, primaryModel, estimatedTokens, tokenLimit)
                TLE->>TLE: selectFallbackModel(parsedBody.model)
                TLE->>TLE: Update parsedBody.model to fallback
                TLE-->>HPRM: JSON.stringify(parsedBody)
            end
            HPRM->>Buffer: tempSetBody(modifiedBody)
            Buffer-->>HPRM: void
        end
    end
    HPRM-->>Worker: HeliconeProxyRequest
    Worker->>LLM: Forward modified request
    LLM-->>Worker: Response
    Worker-->>Client: Response
```
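Read as code, the dispatch step in the diagram amounts to something like the sketch below. The parameter list and helper signatures are assumptions assembled from the diagram, not the PR's actual code:

```typescript
// Hedged reconstruction of applyTokenLimitExceptionHandler's control flow;
// helper names come from the diagram, signatures are assumed.
function applyTokenLimitExceptionHandler(
  body: string,
  handler: "truncate" | "middle-out" | "fallback",
  provider: Provider,
  modelOverride?: string
): string | undefined {
  const parsedBody = parseRequestPayload(body);
  const primaryModel = resolvePrimaryModel(parsedBody, modelOverride);
  const estimatedTokens = estimateTokenCount(parsedBody, primaryModel);
  const contextLimit = getModelTokenLimit(provider, primaryModel);
  if (contextLimit === null) return undefined; // unknown model: leave the body untouched

  // Reserve headroom for the requested completion tokens. The PR checks
  // several provider-specific field names; max_tokens is shown for brevity.
  const tokenLimit = contextLimit - (parsedBody.max_tokens ?? 0);
  if (estimatedTokens <= tokenLimit) return undefined;

  switch (handler) {
    case "truncate":
      return applyTruncateStrategy(parsedBody);
    case "middle-out":
      return applyMiddleOutStrategy(parsedBody, primaryModel, tokenLimit);
    case "fallback":
      return applyFallbackStrategy(parsedBody, primaryModel, estimatedTokens, tokenLimit);
  }
}
```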
3 files reviewed, 10 comments
```typescript
// Excerpt from worker/src/lib/util/tokenLimitException.ts (imports elided in the diff).

// Resolves the context-window size for a model, or null when it cannot be determined.
export function getModelTokenLimit(
  provider: Provider,
  model: string | null | undefined
): number | null {
  if (!model) {
    return null;
  }

  const providerName = heliconeProviderToModelProviderName(provider);
  if (!providerName) {
    return null;
  }

  const config = findModelProviderConfig(model, providerName);
  if (!config || typeof config.contextLength !== "number") {
    return null;
  }

  return config.contextLength;
}

// Tries an exact registry lookup first, then falls back to scanning the
// provider's model catalog.
export function findModelProviderConfig(
  model: string,
  providerName: ModelProviderName
): ModelProviderConfig | null {
  const directConfig = lookupProviderConfig(model, providerName);
  if (directConfig) {
    return directConfig;
  }
  return searchProviderModels(model, providerName);
}

// Queries the registry with each candidate id until one resolves.
export function lookupProviderConfig(
  model: string,
  providerName: ModelProviderName
): ModelProviderConfig | null {
  const candidates = buildLookupCandidates(model);
  for (const candidate of candidates) {
    const result = registry.getModelProviderConfigByProviderModelId(
      candidate,
      providerName
    );
    if (result.error === null && result.data) {
      return result.data;
    }
  }
  return null;
}

// Walks every model the provider offers and fuzzy-matches identifiers.
export function searchProviderModels(
  model: string,
  providerName: ModelProviderName
): ModelProviderConfig | null {
  const providerModelsResult = registry.getProviderModels(providerName);
  if (providerModelsResult.error !== null || !providerModelsResult.data) {
    return null;
  }

  for (const canonicalModel of providerModelsResult.data.values()) {
    const configsResult = registry.getModelProviderConfigs(canonicalModel);
    if (configsResult.error !== null || !configsResult.data) {
      continue;
    }

    for (const config of configsResult.data) {
      if (config.provider !== providerName) {
        continue;
      }

      if (modelIdentifierMatches(model, config.providerModelId)) {
        return config;
      }
    }
  }

  return null;
}

// Generates lookup candidates by repeatedly stripping the last ":"- or
// "-"-delimited segment (e.g. version or date suffixes), in original and
// lowercased form.
export function buildLookupCandidates(model: string): string[] {
  const trimmed = model.trim();
  if (!trimmed) {
    return [];
  }

  const candidates = new Set<string>();
  candidates.add(trimmed);

  const lower = trimmed.toLowerCase();
  if (lower !== trimmed) {
    candidates.add(lower);
  }

  const delimiters = [":", "-"];
  for (const delimiter of delimiters) {
    let current = trimmed;
    while (current.includes(delimiter)) {
      current = current.substring(0, current.lastIndexOf(delimiter));
      const normalized = current.trim();
      if (!normalized || candidates.has(normalized)) {
        continue;
      }
      candidates.add(normalized);
      candidates.add(normalized.toLowerCase());
    }
  }

  return Array.from(candidates);
}

// Fuzzy identifier comparison: exact variant equality, delimiter-anchored
// suffix match in either direction, then a sanitized substring check.
export function modelIdentifierMatches(
  requestModel: string,
  providerModelId: string
): boolean {
  const requestVariants = buildModelIdentifierVariants(requestModel);
  const providerVariants = buildModelIdentifierVariants(providerModelId);

  for (const requestVariant of requestVariants) {
    for (const providerVariant of providerVariants) {
      if (requestVariant === providerVariant) {
        return true;
      }

      if (
        requestVariant.endsWith(`/${providerVariant}`) ||
        requestVariant.endsWith(`:${providerVariant}`) ||
        requestVariant.endsWith(`-${providerVariant}`)
      ) {
        return true;
      }

      if (
        providerVariant.endsWith(`/${requestVariant}`) ||
        providerVariant.endsWith(`:${requestVariant}`) ||
        providerVariant.endsWith(`-${requestVariant}`)
      ) {
        return true;
      }
    }
  }

  const sanitizedRequest = sanitizeModelIdentifier(requestModel);
  const sanitizedProvider = sanitizeModelIdentifier(providerModelId);

  if (sanitizedRequest.length === 0 || sanitizedProvider.length === 0) {
    return false;
  }

  const index = sanitizedRequest.indexOf(sanitizedProvider);
  if (index > 0) {
    return true;
  }

  return false;
}

// Variants of an identifier: as-is, lowercased, and the final ":"/"/" segment.
export function buildModelIdentifierVariants(identifier: string): string[] {
  const trimmed = identifier.trim();
  if (!trimmed) {
    return [];
  }

  const lower = trimmed.toLowerCase();
  const variants = new Set<string>([trimmed, lower]);

  const delimiterParts = lower.split(/[:\/]/);
  if (delimiterParts.length > 1) {
    const lastPart = delimiterParts[delimiterParts.length - 1];
    if (lastPart) {
      variants.add(lastPart);
    }
  }

  return Array.from(variants).filter((variant) => variant.length > 0);
}

// Lowercases and strips every non-alphanumeric character.
export function sanitizeModelIdentifier(identifier: string): string {
  return identifier.toLowerCase().replace(/[^a-z0-9]/g, "");
}
```
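A few concrete calls make the matching behavior easier to follow (the model ids below are examples, not values from the PR):

```typescript
// Candidate expansion strips ":"- and "-"-delimited suffixes right to left:
buildLookupCandidates("claude-3-5-sonnet-20241022");
// => ["claude-3-5-sonnet-20241022", "claude-3-5-sonnet", "claude-3-5", "claude-3", "claude"]

// Variant matching tolerates vendor prefixes...
modelIdentifierMatches("anthropic/claude-3-5-sonnet", "claude-3-5-sonnet"); // true
// ...but a bare prefix of a longer id does not match:
modelIdentifierMatches("gpt-4o", "gpt-4o-mini"); // false
```

This is why `modelIdentifierMatches` checks delimiter-anchored suffixes before falling back to the sanitized substring test: it avoids treating `gpt-4o` and `gpt-4o-mini` as the same model.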
do we have any alternatives for this? shit is heinous to look at
Co-authored-by: Hammad Shami <[email protected]>
Ticket
ENG-3315: Implement Context Length Check
Summary
Add different handlers in the event that a user request exceeds the LLM's context window (more details in the ticket)
More work
I still need to update the docs, but that can be its own small PR
Screenshots
Middle-out
Fallback
Truncate