
Conversation

@H2Shami
Collaborator

@H2Shami H2Shami commented Oct 16, 2025

Ticket

ENG-3315: Implement Context Length Check

Summary

Add different handlers for the case where a user request exceeds the LLM's context window (more details in the ticket)

More work

I still need to update the docs, but that can be its own small PR

Screenshots

Middle-out

[two screenshots]

Fallback

[two screenshots]

Truncate

[screenshot]

@vercel

vercel bot commented Oct 16, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| helicone | Ready | Preview | Comment | Oct 22, 2025 5:19pm |
| helicone-bifrost | Ready | Preview | Comment | Oct 22, 2025 5:19pm |
| helicone-eu | Ready | Preview | Comment | Oct 22, 2025 5:19pm |

@claude
Contributor

claude bot commented Oct 16, 2025

Claude finished @H2Shami's task.


PR Review Complete

Reviewed PR #5056 (Handle token limit exceptions): an implementation of token limit exception handling with three strategies (truncate, middle-out, fallback).

Todo List:

  • Read and analyze changed files
  • Review token limit exception implementation
  • Identify critical bugs and security issues
  • Check code quality and best practices
  • Provide specific suggestions with examples
  • Assign confidence score and summary

Critical Issues Found

🚨 Critical Bug: Truncate Strategy Returns undefined

File: worker/src/lib/util/tokenLimitException.ts:641-655

The applyTruncateStrategy function has a critical bug where it mutates the parsed body but returns undefined instead of the modified JSON string:

export function applyTruncateStrategy(
  parsedBody: ParsedRequestPayload
): ValidRequestBody | undefined {
  if (!parsedBody.messages) {
    return; // This returns undefined
  }

  for (const message of parsedBody.messages) {
    if (typeof message?.content === "string") {
      message.content = truncateAndNormalizeText(message.content);
    }
  }

  return JSON.stringify(parsedBody); // ✅ This line should be reached
}

Problem: As written in the PR, execution never reaches the final JSON.stringify(parsedBody) line; the function mutates the parsed messages but hands undefined back to the caller. This will cause the truncate strategy to fail silently.

Fix: Make the no-messages exit explicit, and ensure every other path falls through to return JSON.stringify(parsedBody):

if (!parsedBody.messages) {
  return undefined;
}

🧹 Code Quality: Debug Console Logs

File: worker/src/lib/util/tokenLimitException.ts

Found 9 debug console.log statements that should be removed before production:

  • Lines: 50, 65, 443, 654, 674, 679, 694 (and others)

Fix: Remove all console.log statements.
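
If some of these logs are still useful during development, one option (a sketch, not part of this PR; the flag name is a hypothetical) is to gate them behind a debug flag instead of calling console.log directly:

// Sketch: env-gated debug logging. DEBUG_TOKEN_LIMIT is a hypothetical flag.
const DEBUG =
  (globalThis as Record<string, unknown>)["DEBUG_TOKEN_LIMIT"] === "true";

function debugLog(...args: unknown[]): void {
  if (DEBUG) {
    console.log("[tokenLimitException]", ...args);
  }
}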


Architecture & Design Review

Strong Points

  1. Clean Architecture: Well-separated concerns with clear interfaces
  2. Comprehensive Strategy Support: Three different approaches to handle token limits
  3. Provider-Agnostic Design: Handles multiple LLM providers with model-specific token limits
  4. Robust Token Estimation: Good heuristics with model-specific multipliers
  5. Documentation: Excellent documentation with clear examples

⚠️ Areas for Improvement

1. Model Token Limit Logic

File: worker/src/lib/models/HeliconeProxyRequest.ts:283-284

The condition is confusing and has a comment noting special handling for fallback:

if (
  estimatedTokens === null ||
  tokenLimit === null ||
  (estimatedTokens <= tokenLimit &&
    handler != HeliconeTokenLimitExceptionHandler.Fallback) //needed to sort the extra model passed in request
) {
  return;
}

Suggestion: Add a clearer comment explaining why fallback bypasses the token limit check:

// For fallback strategy, we always process the request to normalize the model field
// even when under the token limit (to select primary vs fallback model)
if (
  estimatedTokens === null ||
  tokenLimit === null ||
  (estimatedTokens <= tokenLimit &&
    handler !== HeliconeTokenLimitExceptionHandler.Fallback)
) {
  return;
}

2. Token Estimation Accuracy

File: worker/src/lib/util/tokenLimitException.ts:430-434

The current estimation double-counts tool content:

const combinedText = [toolsText, contentText]
  .filter((segment) => segment.length > 0)
  .join(" ");

const estimated = Math.ceil(
  (combinedText.length + toolsText.length) * heuristic // toolsText counted twice
);

Fix:

const estimated = Math.ceil(combinedText.length * heuristic);

3. Middle-Out Algorithm Complexity

File: worker/src/lib/util/tokenLimitException.ts:625

The middle-out implementation is quite complex (300+ lines). Consider breaking it into smaller, focused functions for better maintainability.

Suggestion: Extract functions like the following (a sketch of possible signatures appears after this list):

  • splitMessageIntoChunks()
  • estimateChunkWeights()
  • selectChunksToKeep()
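
A rough sketch of how that decomposition might look; the chunk size, chars-per-token heuristic, and alternating keep order are all assumptions for illustration, not the PR's actual code:

// Illustrative sketch of a minimal middle-out decomposition.
interface MessageChunk {
  text: string;
  weight: number; // estimated token cost of this chunk
}

// Split a message's text into fixed-size character chunks.
export function splitMessageIntoChunks(
  text: string,
  chunkChars = 1000
): MessageChunk[] {
  const chunks: MessageChunk[] = [];
  for (let i = 0; i < text.length; i += chunkChars) {
    chunks.push({ text: text.slice(i, i + chunkChars), weight: 0 });
  }
  return chunks;
}

// Assign each chunk an estimated token cost via a chars-per-token heuristic.
export function estimateChunkWeights(
  chunks: MessageChunk[],
  charsPerToken = 4
): MessageChunk[] {
  return chunks.map((chunk) => ({
    ...chunk,
    weight: Math.ceil(chunk.text.length / charsPerToken),
  }));
}

// Keep chunks alternately from the start and end, dropping the middle once
// the token budget is exhausted, and return them in original order.
export function selectChunksToKeep(
  chunks: MessageChunk[],
  tokenBudget: number
): MessageChunk[] {
  const head: MessageChunk[] = [];
  const tail: MessageChunk[] = [];
  let used = 0;
  let lo = 0;
  let hi = chunks.length - 1;
  let fromStart = true;
  while (lo <= hi) {
    const next = fromStart ? chunks[lo] : chunks[hi];
    if (used + next.weight > tokenBudget) break;
    used += next.weight;
    if (fromStart) head.push(chunks[lo++]);
    else tail.unshift(chunks[hi--]);
    fromStart = !fromStart;
  }
  return [...head, ...tail];
}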

4. Error Handling

Missing error handling for edge cases (a defensive JSON-parsing sketch follows this list):

  • Invalid JSON parsing
  • Network failures during token estimation
  • Malformed model names
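
For the first bullet, a defensive wrapper along these lines (a sketch; the function name is an assumption) would keep a malformed body from throwing inside the handler:

// Sketch: return null on malformed JSON instead of throwing mid-handler,
// so the caller can fall back to forwarding the body unmodified.
export function safeParseRequestPayload(body: string): unknown {
  try {
    return JSON.parse(body);
  } catch {
    return null;
  }
}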

5. Fallback Model Selection

File: worker/src/lib/util/tokenLimitException.ts:369-375

The current implementation only supports two models. Consider supporting more fallback options:

export function selectFallbackModel(modelField: unknown, attempt: number = 1): string | null {
  const candidates = extractModelCandidates(modelField);
  return candidates[attempt] ?? candidates[candidates.length - 1] ?? null;
}

Security Review

No Critical Security Issues Found

  1. Input Validation: Proper JSON parsing with try-catch
  2. No Code Injection: Safe string manipulation
  3. Rate Limiting Respect: Token limits are respected
  4. Data Sanitization: Good normalization patterns

Performance Considerations

Optimizations

  1. Token Estimation Caching: Consider caching token estimates for repeated content (see the sketch after this list)
  2. Model Registry Lookups: Multiple registry calls could be optimized with caching
  3. Large Message Handling: Middle-out algorithm handles large inputs efficiently
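
For the first two points, a small memoization layer would likely be enough. This sketch (not part of the PR) keys on the model plus a cheap content fingerprint:

// Sketch: cache token estimates per (model, content) key. An unbounded Map is
// shown for brevity; a real worker would want an LRU or a size cap.
const estimateCache = new Map<string, number>();

export function cachedEstimate(
  model: string,
  content: string,
  estimate: (text: string) => number
): number {
  // Cheap fingerprint; a real hash would avoid collisions on long, similar prompts.
  const key = `${model}:${content.length}:${content.slice(0, 64)}`;
  let value = estimateCache.get(key);
  if (value === undefined) {
    value = estimate(content);
    estimateCache.set(key, value);
  }
  return value;
}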

Testing Recommendations

  1. Unit Tests: Add tests for each strategy with edge cases (a sample test follows this list)
  2. Integration Tests: Test with various LLM providers and models
  3. Performance Tests: Test with very large prompts
  4. Error Cases: Test malformed inputs and network failures
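
As a starting point for the first item, a regression test for the truncate bug called out above might look like this (vitest-style; the import path and payload shape are assumptions):

import { describe, expect, it } from "vitest";
import { applyTruncateStrategy } from "../worker/src/lib/util/tokenLimitException";

describe("applyTruncateStrategy", () => {
  it("returns a JSON string when messages are present", () => {
    const body = {
      model: "gpt-4o",
      messages: [{ role: "user", content: "x".repeat(10_000) }],
    };
    const result = applyTruncateStrategy(body as any);
    // Regression guard for the undefined-return bug above.
    expect(typeof result).toBe("string");
  });

  it("returns undefined when there are no messages", () => {
    expect(applyTruncateStrategy({ model: "gpt-4o" } as any)).toBeUndefined();
  });
});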

Score: 3/10 - Critical bug prevents merge

Reason: The truncate strategy contains a critical bug that causes it to fail silently. While the overall architecture and implementation are solid, this bug makes the feature unreliable.

Suggestions Summary:

Critical Fixes (Must Fix Before Merge):

  • ❌ Fix applyTruncateStrategy return value bug
  • ❌ Remove all debug console.log statements

Recommended Improvements:

  • 🔧 Fix token estimation double-counting
  • 🔧 Add clearer comments for fallback logic
  • 🔧 Consider breaking down complex middle-out algorithm
  • 🔧 Add comprehensive error handling
  • 🔧 Add unit tests for all strategies

Once the critical bugs are fixed, this would be a solid 7-8/10 implementation with good architecture and comprehensive functionality.


@H2Shami H2Shami changed the title from "vvHandle token limit exceptions" to "Handle token limit exceptions" Oct 16, 2025
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Summary

Implements token limit exception handling with three strategies (truncate, middle-out, fallback) to manage requests that exceed LLM context windows. Users can specify the handling strategy via the Helicone-Token-Limit-Exception-Handler header (see the example request after the Key Changes list).

Key Changes

  • Added HeliconeTokenLimitExceptionHandler enum with three handler types: truncate, middle-out, and fallback
  • New tokenLimitException.ts utility with token estimation, model lookup, and message truncation algorithms
  • Integration in HeliconeProxyRequestMapper.applyTokenLimitExceptionHandler() to modify request body before forwarding to LLM providers
  • Token limit calculation accounts for requested completion tokens across multiple provider-specific field names
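
For reference, a client would opt in roughly like this; the header values come from the enum above, while the gateway URL and credential names are placeholders for illustration, not taken from this PR:

// Sketch only: opting into the middle-out handler via the new header.
const veryLongPrompt = "..."; // assume a prompt exceeding the model's context window

const response = await fetch("https://oai.helicone.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    "Helicone-Token-Limit-Exception-Handler": "middle-out", // or "truncate" / "fallback"
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [{ role: "user", content: veryLongPrompt }],
  }),
});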

Issues Found

  • Critical: applyTruncateStrategy mutates the parsed body but returns undefined instead of JSON.stringify(parsedBody), causing the truncate strategy to fail silently
  • Code Quality: Multiple debug console.log statements should be removed before production deployment

Confidence Score: 2/5

  • This PR has a critical bug that will cause the truncate strategy to fail silently in production
  • Score reflects a critical logic bug in applyTruncateStrategy (worker/src/lib/util/tokenLimitException.ts:651-666) that returns undefined instead of the modified JSON body, which means the truncate handler won't work. The middle-out and fallback strategies appear to work correctly. Additionally, there are 9 debug console.log statements that should be removed.
  • Pay close attention to worker/src/lib/util/tokenLimitException.ts - the applyTruncateStrategy function must be fixed before merge

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| worker/src/lib/models/HeliconeHeaders.ts | 5/5 | Added HeliconeTokenLimitExceptionHandler enum and parsing logic for the new header; implementation is clean and correct |
| worker/src/lib/models/HeliconeProxyRequest.ts | 4/5 | Integrates token limit exception handling into the request processing flow; logic is sound but the condition on lines 283-284 could use a clarifying comment |
| worker/src/lib/util/tokenLimitException.ts | 2/5 | New utility file implementing the three token limit strategies; applyTruncateStrategy has a critical bug (returns undefined instead of JSON), and multiple debug console.log statements need removal |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Worker as Cloudflare Worker
    participant RW as RequestWrapper
    participant HPRM as HeliconeProxyRequestMapper
    participant TLE as tokenLimitException
    participant Buffer as RequestBodyBuffer
    participant LLM as LLM Provider

    Client->>Worker: HTTP Request with Helicone-Token-Limit-Exception-Handler header
    Worker->>RW: Create RequestWrapper
    RW->>RW: Parse HeliconeHeaders (getTokenLimitExceptionHandler)
    Worker->>HPRM: Create HeliconeProxyRequestMapper
    HPRM->>HPRM: tryToProxyRequest()
    HPRM->>Buffer: safelyGetBody()
    Buffer-->>HPRM: body (ValidRequestBody)
    
    alt Token Limit Handler Set
        HPRM->>HPRM: applyTokenLimitExceptionHandler(body)
        HPRM->>TLE: parseRequestPayload(body)
        TLE-->>HPRM: ParsedRequestPayload
        HPRM->>TLE: resolvePrimaryModel(parsedBody, modelOverride)
        TLE-->>HPRM: primaryModel
        HPRM->>TLE: estimateTokenCount(parsedBody, primaryModel)
        TLE-->>HPRM: estimatedTokens
        HPRM->>TLE: getModelTokenLimit(provider, primaryModel)
        TLE-->>HPRM: modelContextLimit
        HPRM->>HPRM: Calculate tokenLimit (contextLimit - requestedCompletionTokens)
        
        alt estimatedTokens > tokenLimit
            alt Handler = Truncate
                HPRM->>TLE: applyTruncateStrategy(parsedBody)
                TLE->>TLE: truncateAndNormalizeText() for each message
                Note over TLE: BUG: Returns undefined instead of JSON string
                TLE-->>HPRM: undefined (should be JSON string)
            else Handler = MiddleOut
                HPRM->>TLE: applyMiddleOutStrategy(parsedBody, primaryModel, tokenLimit)
                TLE->>TLE: middleOutMessagesToFitLimit()
                TLE->>TLE: Split messages into chunks, remove middle chunks
                TLE-->>HPRM: JSON.stringify(finalPayload)
            else Handler = Fallback
                HPRM->>TLE: applyFallbackStrategy(parsedBody, primaryModel, estimatedTokens, tokenLimit)
                TLE->>TLE: selectFallbackModel(parsedBody.model)
                TLE->>TLE: Update parsedBody.model to fallback
                TLE-->>HPRM: JSON.stringify(parsedBody)
            end
            
            HPRM->>Buffer: tempSetBody(modifiedBody)
            Buffer-->>HPRM: void
        end
    end
    
    HPRM-->>Worker: HeliconeProxyRequest
    Worker->>LLM: Forward modified request
    LLM-->>Worker: Response
    Worker-->>Client: Response

3 files reviewed, 10 comments


@vercel vercel bot temporarily deployed to Preview – helicone-eu October 16, 2025 03:08 Inactive
@vercel vercel bot temporarily deployed to Preview – helicone October 16, 2025 03:08 Inactive
@vercel vercel bot temporarily deployed to Preview – helicone-bifrost October 16, 2025 03:08 Inactive
Comment on lines 448 to 625
export function getModelTokenLimit(
  provider: Provider,
  model: string | null | undefined
): number | null {
  if (!model) {
    return null;
  }

  const providerName = heliconeProviderToModelProviderName(provider);
  if (!providerName) {
    return null;
  }

  const config = findModelProviderConfig(model, providerName);
  if (!config || typeof config.contextLength !== "number") {
    return null;
  }

  return config.contextLength;
}

export function findModelProviderConfig(
  model: string,
  providerName: ModelProviderName
): ModelProviderConfig | null {
  const directConfig = lookupProviderConfig(model, providerName);
  if (directConfig) {
    return directConfig;
  }
  return searchProviderModels(model, providerName);
}

export function lookupProviderConfig(
  model: string,
  providerName: ModelProviderName
): ModelProviderConfig | null {
  const candidates = buildLookupCandidates(model);
  for (const candidate of candidates) {
    const result = registry.getModelProviderConfigByProviderModelId(
      candidate,
      providerName
    );
    if (result.error === null && result.data) {
      return result.data;
    }
  }
  return null;
}

export function searchProviderModels(
  model: string,
  providerName: ModelProviderName
): ModelProviderConfig | null {
  const providerModelsResult = registry.getProviderModels(providerName);
  if (providerModelsResult.error !== null || !providerModelsResult.data) {
    return null;
  }

  for (const canonicalModel of providerModelsResult.data.values()) {
    const configsResult = registry.getModelProviderConfigs(canonicalModel);
    if (configsResult.error !== null || !configsResult.data) {
      continue;
    }

    for (const config of configsResult.data) {
      if (config.provider !== providerName) {
        continue;
      }

      if (modelIdentifierMatches(model, config.providerModelId)) {
        return config;
      }
    }
  }

  return null;
}

export function buildLookupCandidates(model: string): string[] {
  const trimmed = model.trim();
  if (!trimmed) {
    return [];
  }

  const candidates = new Set<string>();
  candidates.add(trimmed);

  const lower = trimmed.toLowerCase();
  if (lower !== trimmed) {
    candidates.add(lower);
  }

  const delimiters = [":", "-"];
  for (const delimiter of delimiters) {
    let current = trimmed;
    while (current.includes(delimiter)) {
      current = current.substring(0, current.lastIndexOf(delimiter));
      const normalized = current.trim();
      if (!normalized || candidates.has(normalized)) {
        continue;
      }
      candidates.add(normalized);
      candidates.add(normalized.toLowerCase());
    }
  }

  return Array.from(candidates);
}

export function modelIdentifierMatches(
  requestModel: string,
  providerModelId: string
): boolean {
  const requestVariants = buildModelIdentifierVariants(requestModel);
  const providerVariants = buildModelIdentifierVariants(providerModelId);

  for (const requestVariant of requestVariants) {
    for (const providerVariant of providerVariants) {
      if (requestVariant === providerVariant) {
        return true;
      }

      if (
        requestVariant.endsWith(`/${providerVariant}`) ||
        requestVariant.endsWith(`:${providerVariant}`) ||
        requestVariant.endsWith(`-${providerVariant}`)
      ) {
        return true;
      }

      if (
        providerVariant.endsWith(`/${requestVariant}`) ||
        providerVariant.endsWith(`:${requestVariant}`) ||
        providerVariant.endsWith(`-${requestVariant}`)
      ) {
        return true;
      }
    }
  }

  const sanitizedRequest = sanitizeModelIdentifier(requestModel);
  const sanitizedProvider = sanitizeModelIdentifier(providerModelId);

  if (sanitizedRequest.length === 0 || sanitizedProvider.length === 0) {
    return false;
  }

  const index = sanitizedRequest.indexOf(sanitizedProvider);
  if (index > 0) {
    return true;
  }

  return false;
}

export function buildModelIdentifierVariants(identifier: string): string[] {
  const trimmed = identifier.trim();
  if (!trimmed) {
    return [];
  }

  const lower = trimmed.toLowerCase();
  const variants = new Set<string>([trimmed, lower]);

  const delimiterParts = lower.split(/[:\/]/);
  if (delimiterParts.length > 1) {
    const lastPart = delimiterParts[delimiterParts.length - 1];
    if (lastPart) {
      variants.add(lastPart);
    }
  }

  return Array.from(variants).filter((variant) => variant.length > 0);
}

export function sanitizeModelIdentifier(identifier: string): string {
  return identifier.toLowerCase().replace(/[^a-z0-9]/g, "");
}
Collaborator Author


do we have any alternatives for this? shit is heinous to look at

@Helicone Helicone deleted a comment from greptile-apps bot Oct 16, 2025
@Helicone Helicone deleted a comment from greptile-apps bot Oct 16, 2025
@vercel vercel bot temporarily deployed to Preview – helicone October 16, 2025 03:17 Inactive
@vercel vercel bot temporarily deployed to Preview – helicone-eu October 16, 2025 03:17 Inactive
@vercel vercel bot temporarily deployed to Preview – helicone-bifrost October 16, 2025 03:17 Inactive