Handle token limit exceptions #5056
Conversation
Claude finished @H2Shami's task — View job

PR Review Complete. Reviewed PR #5056: Handle token limit exceptions — an implementation of token limit exception handling with three strategies (truncate, middle-out, fallback).

Critical Issues Found

🚨 Critical Bug: Truncate Strategy Returns undefined
Greptile Overview
Summary
Implements token limit exception handling with three strategies (truncate, middle-out, fallback) to manage requests exceeding LLM context windows. Users can specify handling via Helicone-Token-Limit-Exception-Handler header.
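For concreteness, a request opts into one of the strategies by setting the new header on an otherwise normal proxied call. A minimal sketch follows; the gateway URL, model, and env variable names are illustrative placeholders, and only the header name and its three values come from this PR:

```typescript
// Hedged sketch: opting into the "middle-out" strategy on a proxied request.
// The gateway URL, model, and env variable names are placeholders; the header
// name and its values ("truncate" | "middle-out" | "fallback") are from this PR.
const veryLongConversation = "..."; // stands in for a prompt that overflows the context window

const response = await fetch("https://oai.helicone.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    "Helicone-Token-Limit-Exception-Handler": "middle-out",
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: veryLongConversation }],
  }),
});
```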
Key Changes
- Added `HeliconeTokenLimitExceptionHandler` enum with three handler types: `truncate`, `middle-out`, and `fallback`
- New `tokenLimitException.ts` utility with token estimation, model lookup, and message truncation algorithms
- Integration in `HeliconeProxyRequestMapper.applyTokenLimitExceptionHandler()` to modify the request body before forwarding to LLM providers
- Token limit calculation accounts for requested completion tokens across multiple provider-specific field names

Issues Found
- Critical: `applyTruncateStrategy` mutates the parsed body but returns `undefined` instead of `JSON.stringify(parsedBody)`, causing the truncate strategy to fail silently
- Code Quality: multiple debug `console.log` statements should be removed before production deployment
Confidence Score: 2/5
- This PR has a critical bug that will cause the truncate strategy to fail silently in production
- Score reflects a critical logic bug in `applyTruncateStrategy` (worker/src/lib/util/tokenLimitException.ts:651-666) that returns `undefined` instead of the modified JSON body, which means the truncate handler won't work. The middle-out and fallback strategies appear to work correctly. Additionally, there are 9 debug `console.log` statements that should be removed.
- Pay close attention to worker/src/lib/util/tokenLimitException.ts — the `applyTruncateStrategy` function must be fixed before merge
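Given the review, the fix is likely a one-line return. A hedged sketch of what the corrected function might look like — the body is inferred from the review comments and the diagram below, not copied from the PR:

```typescript
// Hypothetical reconstruction of applyTruncateStrategy; shape inferred from
// the review comments, not the actual source.
export function applyTruncateStrategy(parsedBody: ParsedRequestPayload): string {
  for (const message of parsedBody.messages ?? []) {
    if (typeof message.content === "string") {
      message.content = truncateAndNormalizeText(message.content);
    }
  }
  // The reported bug: the function mutated parsedBody but never returned,
  // so callers received undefined. Serializing the mutated body fixes it.
  return JSON.stringify(parsedBody);
}
```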
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| worker/src/lib/models/HeliconeHeaders.ts | 5/5 | Added HeliconeTokenLimitExceptionHandler enum and parsing logic for new header - implementation is clean and correct |
| worker/src/lib/models/HeliconeProxyRequest.ts | 4/5 | Integrates token limit exception handling into request processing flow - logic is sound but condition on line 283-284 could use clarifying comment |
| worker/src/lib/util/tokenLimitException.ts | 2/5 | New utility file implementing three token limit strategies - applyTruncateStrategy has critical bug (returns undefined instead of JSON), multiple debug console.log statements need removal |
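For reference, the header parsing in HeliconeHeaders.ts presumably follows the pattern below. Member names are derived from the three header values; the exact implementation is an assumption:

```typescript
// Hedged sketch of the enum and header parsing; the real code in
// worker/src/lib/models/HeliconeHeaders.ts may differ in naming and shape.
export enum HeliconeTokenLimitExceptionHandler {
  Truncate = "truncate",
  MiddleOut = "middle-out",
  Fallback = "fallback",
}

function getTokenLimitExceptionHandler(
  headers: Headers
): HeliconeTokenLimitExceptionHandler | null {
  const raw = headers
    .get("Helicone-Token-Limit-Exception-Handler")
    ?.toLowerCase();
  const known = Object.values(HeliconeTokenLimitExceptionHandler) as string[];
  return raw && known.includes(raw)
    ? (raw as HeliconeTokenLimitExceptionHandler)
    : null;
}
```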
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant Worker as Cloudflare Worker
    participant RW as RequestWrapper
    participant HPRM as HeliconeProxyRequestMapper
    participant TLE as tokenLimitException
    participant Buffer as RequestBodyBuffer
    participant LLM as LLM Provider
    Client->>Worker: HTTP Request with Helicone-Token-Limit-Exception-Handler header
    Worker->>RW: Create RequestWrapper
    RW->>RW: Parse HeliconeHeaders (getTokenLimitExceptionHandler)
    Worker->>HPRM: Create HeliconeProxyRequestMapper
    HPRM->>HPRM: tryToProxyRequest()
    HPRM->>Buffer: safelyGetBody()
    Buffer-->>HPRM: body (ValidRequestBody)
    alt Token Limit Handler Set
        HPRM->>HPRM: applyTokenLimitExceptionHandler(body)
        HPRM->>TLE: parseRequestPayload(body)
        TLE-->>HPRM: ParsedRequestPayload
        HPRM->>TLE: resolvePrimaryModel(parsedBody, modelOverride)
        TLE-->>HPRM: primaryModel
        HPRM->>TLE: estimateTokenCount(parsedBody, primaryModel)
        TLE-->>HPRM: estimatedTokens
        HPRM->>TLE: getModelTokenLimit(provider, primaryModel)
        TLE-->>HPRM: modelContextLimit
        HPRM->>HPRM: Calculate tokenLimit (contextLimit - requestedCompletionTokens)
        alt estimatedTokens > tokenLimit
            alt Handler = Truncate
                HPRM->>TLE: applyTruncateStrategy(parsedBody)
                TLE->>TLE: truncateAndNormalizeText() for each message
                Note over TLE: BUG: Returns undefined instead of JSON string
                TLE-->>HPRM: undefined (should be JSON string)
            else Handler = MiddleOut
                HPRM->>TLE: applyMiddleOutStrategy(parsedBody, primaryModel, tokenLimit)
                TLE->>TLE: middleOutMessagesToFitLimit()
                TLE->>TLE: Split messages into chunks, remove middle chunks
                TLE-->>HPRM: JSON.stringify(finalPayload)
            else Handler = Fallback
                HPRM->>TLE: applyFallbackStrategy(parsedBody, primaryModel, estimatedTokens, tokenLimit)
                TLE->>TLE: selectFallbackModel(parsedBody.model)
                TLE->>TLE: Update parsedBody.model to fallback
                TLE-->>HPRM: JSON.stringify(parsedBody)
            end
            HPRM->>Buffer: tempSetBody(modifiedBody)
            Buffer-->>HPRM: void
        end
    end
    HPRM-->>Worker: HeliconeProxyRequest
    Worker->>LLM: Forward modified request
    LLM-->>Worker: Response
    Worker-->>Client: Response
```
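Read as code, the dispatch step in the diagram amounts to something like the sketch below. The parameter list and helper signatures are assumptions assembled from the diagram, not the PR's actual code:

```typescript
// Hedged reconstruction of applyTokenLimitExceptionHandler's control flow;
// helper names come from the diagram, signatures are assumed.
function applyTokenLimitExceptionHandler(
  body: string,
  handler: "truncate" | "middle-out" | "fallback",
  provider: Provider,
  modelOverride?: string
): string | undefined {
  const parsedBody = parseRequestPayload(body);
  const primaryModel = resolvePrimaryModel(parsedBody, modelOverride);
  const estimatedTokens = estimateTokenCount(parsedBody, primaryModel);
  const contextLimit = getModelTokenLimit(provider, primaryModel);
  if (contextLimit === null) return undefined; // unknown model: leave the body untouched

  // Reserve headroom for the requested completion tokens. The PR checks
  // several provider-specific field names; max_tokens is shown for brevity.
  const tokenLimit = contextLimit - (parsedBody.max_tokens ?? 0);
  if (estimatedTokens <= tokenLimit) return undefined;

  switch (handler) {
    case "truncate":
      return applyTruncateStrategy(parsedBody);
    case "middle-out":
      return applyMiddleOutStrategy(parsedBody, primaryModel, tokenLimit);
    case "fallback":
      return applyFallbackStrategy(parsedBody, primaryModel, estimatedTokens, tokenLimit);
  }
}
```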
3 files reviewed, 10 comments
```typescript
// Excerpt from worker/src/lib/util/tokenLimitException.ts (imports elided in the diff).

// Resolves the context-window size for a model, or null when it cannot be determined.
export function getModelTokenLimit(
  provider: Provider,
  model: string | null | undefined
): number | null {
  if (!model) {
    return null;
  }

  const providerName = heliconeProviderToModelProviderName(provider);
  if (!providerName) {
    return null;
  }

  const config = findModelProviderConfig(model, providerName);
  if (!config || typeof config.contextLength !== "number") {
    return null;
  }

  return config.contextLength;
}

// Tries an exact registry lookup first, then falls back to scanning the
// provider's model catalog.
export function findModelProviderConfig(
  model: string,
  providerName: ModelProviderName
): ModelProviderConfig | null {
  const directConfig = lookupProviderConfig(model, providerName);
  if (directConfig) {
    return directConfig;
  }
  return searchProviderModels(model, providerName);
}

// Queries the registry with each candidate id until one resolves.
export function lookupProviderConfig(
  model: string,
  providerName: ModelProviderName
): ModelProviderConfig | null {
  const candidates = buildLookupCandidates(model);
  for (const candidate of candidates) {
    const result = registry.getModelProviderConfigByProviderModelId(
      candidate,
      providerName
    );
    if (result.error === null && result.data) {
      return result.data;
    }
  }
  return null;
}

// Walks every model the provider offers and fuzzy-matches identifiers.
export function searchProviderModels(
  model: string,
  providerName: ModelProviderName
): ModelProviderConfig | null {
  const providerModelsResult = registry.getProviderModels(providerName);
  if (providerModelsResult.error !== null || !providerModelsResult.data) {
    return null;
  }

  for (const canonicalModel of providerModelsResult.data.values()) {
    const configsResult = registry.getModelProviderConfigs(canonicalModel);
    if (configsResult.error !== null || !configsResult.data) {
      continue;
    }

    for (const config of configsResult.data) {
      if (config.provider !== providerName) {
        continue;
      }

      if (modelIdentifierMatches(model, config.providerModelId)) {
        return config;
      }
    }
  }

  return null;
}

// Generates lookup candidates by repeatedly stripping the last ":"- or
// "-"-delimited segment (e.g. version or date suffixes), in original and
// lowercased form.
export function buildLookupCandidates(model: string): string[] {
  const trimmed = model.trim();
  if (!trimmed) {
    return [];
  }

  const candidates = new Set<string>();
  candidates.add(trimmed);

  const lower = trimmed.toLowerCase();
  if (lower !== trimmed) {
    candidates.add(lower);
  }

  const delimiters = [":", "-"];
  for (const delimiter of delimiters) {
    let current = trimmed;
    while (current.includes(delimiter)) {
      current = current.substring(0, current.lastIndexOf(delimiter));
      const normalized = current.trim();
      if (!normalized || candidates.has(normalized)) {
        continue;
      }
      candidates.add(normalized);
      candidates.add(normalized.toLowerCase());
    }
  }

  return Array.from(candidates);
}

// Fuzzy identifier comparison: exact variant equality, delimiter-anchored
// suffix match in either direction, then a sanitized substring check.
export function modelIdentifierMatches(
  requestModel: string,
  providerModelId: string
): boolean {
  const requestVariants = buildModelIdentifierVariants(requestModel);
  const providerVariants = buildModelIdentifierVariants(providerModelId);

  for (const requestVariant of requestVariants) {
    for (const providerVariant of providerVariants) {
      if (requestVariant === providerVariant) {
        return true;
      }

      if (
        requestVariant.endsWith(`/${providerVariant}`) ||
        requestVariant.endsWith(`:${providerVariant}`) ||
        requestVariant.endsWith(`-${providerVariant}`)
      ) {
        return true;
      }

      if (
        providerVariant.endsWith(`/${requestVariant}`) ||
        providerVariant.endsWith(`:${requestVariant}`) ||
        providerVariant.endsWith(`-${requestVariant}`)
      ) {
        return true;
      }
    }
  }

  const sanitizedRequest = sanitizeModelIdentifier(requestModel);
  const sanitizedProvider = sanitizeModelIdentifier(providerModelId);

  if (sanitizedRequest.length === 0 || sanitizedProvider.length === 0) {
    return false;
  }

  const index = sanitizedRequest.indexOf(sanitizedProvider);
  if (index > 0) {
    return true;
  }

  return false;
}

// Variants of an identifier: as-is, lowercased, and the final ":"/"/" segment.
export function buildModelIdentifierVariants(identifier: string): string[] {
  const trimmed = identifier.trim();
  if (!trimmed) {
    return [];
  }

  const lower = trimmed.toLowerCase();
  const variants = new Set<string>([trimmed, lower]);

  const delimiterParts = lower.split(/[:\/]/);
  if (delimiterParts.length > 1) {
    const lastPart = delimiterParts[delimiterParts.length - 1];
    if (lastPart) {
      variants.add(lastPart);
    }
  }

  return Array.from(variants).filter((variant) => variant.length > 0);
}

// Lowercases and strips every non-alphanumeric character.
export function sanitizeModelIdentifier(identifier: string): string {
  return identifier.toLowerCase().replace(/[^a-z0-9]/g, "");
}
```
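A few concrete calls make the matching behavior easier to follow (the model ids below are examples, not values from the PR):

```typescript
// Candidate expansion strips ":"- and "-"-delimited suffixes right to left:
buildLookupCandidates("claude-3-5-sonnet-20241022");
// => ["claude-3-5-sonnet-20241022", "claude-3-5-sonnet", "claude-3-5", "claude-3", "claude"]

// Variant matching tolerates vendor prefixes...
modelIdentifierMatches("anthropic/claude-3-5-sonnet", "claude-3-5-sonnet"); // true
// ...but a bare prefix of a longer id does not match:
modelIdentifierMatches("gpt-4o", "gpt-4o-mini"); // false
```

This is why `modelIdentifierMatches` checks delimiter-anchored suffixes before falling back to the sanitized substring test: it avoids treating `gpt-4o` and `gpt-4o-mini` as the same model.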
do we have any alternatives for this? shit is heinous to look at
Co-authored-by: Hammad Shami <[email protected]>
Ticket
ENG-3315: Implement Context Length Check
Summary
Add different handlers in the event that a user request exceeds the LLM's context window (more details in the ticket)
More work
I still need to update the docs, but that can be its own small PR
Screenshots
Middle-out
Fallback
Truncate