Feature: Real-time Context Budget Guards #71

@Sahil5963

Summary

Implement configurable real-time context budget enforcement to prevent context overflow during chat sessions.

Problem

Without budget enforcement:

  • Large tool results can overflow context mid-conversation
  • No visibility into current context usage
  • Failures happen unexpectedly when limits are hit
  • Difficult to debug context-related issues

Proposed Solution

Configuration Options

interface ContextBudgetConfig {
  enabled?: boolean;                    // Default: false
  
  budget?: {
    contextWindowTokens?: number;       // Your model's context window
    inputHeadroomRatio?: number;        // Default: 0.75 (reserve 25% for output)
    
    // Per-category limits, as fractions of the input budget
    // (contextWindowTokens * inputHeadroomRatio); the defaults sum to 1.0
    systemPromptShare?: number;         // Default: 0.15 (15%)
    historyShare?: number;              // Default: 0.50 (50%)
    toolResultsShare?: number;          // Default: 0.30 (30%)
    skillsShare?: number;               // Default: 0.05 (5%)
  };
  
  enforcement?: {
    mode?: 'warn' | 'truncate' | 'error';  // Default: 'truncate'
    onBudgetExceeded?: (info: BudgetInfo) => void;
  };
  
  monitoring?: {
    enabled?: boolean;                  // Track usage over time
    onUsageUpdate?: (usage: ContextUsage) => void;
  };
}

// Usage
const chat = createChatWithTools({
  contextBudget: {
    enabled: true,
    budget: {
      contextWindowTokens: 128_000,     // e.g., GPT-4 Turbo
      inputHeadroomRatio: 0.75,
      historyShare: 0.5,
      toolResultsShare: 0.3,
    },
    enforcement: {
      mode: 'truncate',
      onBudgetExceeded: (info) => {
        console.warn('Context budget exceeded:', info);
      }
    },
    monitoring: {
      enabled: true,
      onUsageUpdate: (usage) => {
        // Update UI with current usage
        updateUsageIndicator(usage);
      }
    }
  }
});
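
To make the share arithmetic concrete, here is a minimal sketch of how per-category token limits could be derived from the config. It assumes each share is a fraction of the input budget (contextWindowTokens * inputHeadroomRatio). The names resolveCategoryLimits, BudgetShares, and CategoryLimits are hypothetical and only for illustration; they are not part of the proposed API.

```typescript
// Hypothetical helper types: not part of the proposed API.
interface BudgetShares {
  systemPromptShare: number;
  historyShare: number;
  toolResultsShare: number;
  skillsShare: number;
}

interface CategoryLimits {
  available: number; // input budget in tokens
  systemPrompt: number;
  history: number;
  toolResults: number;
  skills: number;
}

// Defaults copied from the comments in ContextBudgetConfig.
const DEFAULT_SHARES: BudgetShares = {
  systemPromptShare: 0.15,
  historyShare: 0.5,
  toolResultsShare: 0.3,
  skillsShare: 0.05,
};

function resolveCategoryLimits(
  contextWindowTokens: number,
  inputHeadroomRatio: number = 0.75,
  shares: Partial<BudgetShares> = {},
): CategoryLimits {
  const s: BudgetShares = { ...DEFAULT_SHARES, ...shares };
  const sum =
    s.systemPromptShare + s.historyShare + s.toolResultsShare + s.skillsShare;
  // Small tolerance so floating-point noise does not reject shares that sum to 1.0.
  if (sum > 1 + 1e-9) {
    throw new RangeError(`Category shares sum to ${sum}, which exceeds 1.0`);
  }
  // Limits are rounded to whole tokens.
  const available = Math.round(contextWindowTokens * inputHeadroomRatio);
  return {
    available,
    systemPrompt: Math.round(available * s.systemPromptShare),
    history: Math.round(available * s.historyShare),
    toolResults: Math.round(available * s.toolResultsShare),
    skills: Math.round(available * s.skillsShare),
  };
}

const exampleLimits = resolveCategoryLimits(128_000, 0.75, {
  historyShare: 0.5,
  toolResultsShare: 0.3,
});
console.log(exampleLimits);
```

With the defaults and a 128,000-token window, this yields an input budget of 96,000 tokens: 14,400 for the system prompt, 48,000 for history, 28,800 for tool results, and 4,800 for skills.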

Context Usage Tracking

interface ContextUsage {
  total: { tokens: number; percent: number };
  breakdown: {
    systemPrompt: { tokens: number; percent: number };
    history: { tokens: number; percent: number };
    toolResults: { tokens: number; percent: number };
    skills: { tokens: number; percent: number };
  };
  budget: {
    available: number;
    remaining: number;
  };
}

// Get current usage
const usage = chat.getContextUsage();
console.log(`Using ${usage.total.percent}% of context budget`);
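
As a rough illustration of what getContextUsage() might compute, the sketch below builds a ContextUsage value from per-category token counts. It assumes token counts are supplied by the caller (tokenization is out of scope here) and that every percent is measured against the available input budget; the issue does not specify either point. computeUsage, Category, and Slice are hypothetical names.

```typescript
type Category = "systemPrompt" | "history" | "toolResults" | "skills";

interface Slice {
  tokens: number;
  percent: number;
}

// Mirrors the ContextUsage interface above, restated so the sketch is self-contained.
interface ContextUsage {
  total: Slice;
  breakdown: Record<Category, Slice>;
  budget: { available: number; remaining: number };
}

// Hypothetical helper: derives usage from token counts the caller already has.
function computeUsage(
  tokens: Record<Category, number>,
  available: number,
): ContextUsage {
  // Assumption: percent is relative to the available budget, rounded to one decimal.
  const slice = (n: number): Slice => ({
    tokens: n,
    percent: available > 0 ? Math.round((n / available) * 1000) / 10 : 0,
  });
  const total =
    tokens.systemPrompt + tokens.history + tokens.toolResults + tokens.skills;
  return {
    total: slice(total),
    breakdown: {
      systemPrompt: slice(tokens.systemPrompt),
      history: slice(tokens.history),
      toolResults: slice(tokens.toolResults),
      skills: slice(tokens.skills),
    },
    // remaining is clamped at zero once the budget is exceeded.
    budget: { available, remaining: Math.max(0, available - total) },
  };
}

const exampleUsage = computeUsage(
  { systemPrompt: 2_000, history: 30_000, toolResults: 15_000, skills: 1_000 },
  96_000,
);
console.log(`Using ${exampleUsage.total.percent}% of context budget`);
```

For the sample counts (48,000 of 96,000 tokens), this logs "Using 50% of context budget".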

Enforcement Modes

Mode      Behavior                       Use Case
warn      Log warning, continue anyway   Development/debugging
truncate  Auto-truncate to fit budget    Production (recommended)
error     Throw error when exceeded      Strict environments
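
The three modes could be dispatched roughly as follows. This is a sketch under stated assumptions: BudgetInfo is referenced in the config but never defined in this issue, so the shape below is hypothetical, as are enforceBudget and ContextBudgetExceededError. It also assumes onBudgetExceeded fires in every mode before the mode-specific behavior runs.

```typescript
type EnforcementMode = "warn" | "truncate" | "error";

// Hypothetical shape: the issue references BudgetInfo without defining it.
interface BudgetInfo {
  usedTokens: number;
  availableTokens: number;
  overBy: number;
}

// Hypothetical error type thrown in 'error' mode.
class ContextBudgetExceededError extends Error {
  readonly info: BudgetInfo;

  constructor(info: BudgetInfo) {
    super(`Context budget exceeded by ${info.overBy} tokens`);
    this.name = "ContextBudgetExceededError";
    this.info = info;
  }
}

// "proceed" means send the request as-is; "truncate" means shrink the context first.
type Decision = "proceed" | "truncate";

function enforceBudget(
  usedTokens: number,
  availableTokens: number,
  mode: EnforcementMode = "truncate",
  onBudgetExceeded?: (info: BudgetInfo) => void,
): Decision {
  if (usedTokens <= availableTokens) {
    return "proceed";
  }
  const info: BudgetInfo = {
    usedTokens,
    availableTokens,
    overBy: usedTokens - availableTokens,
  };
  // Assumption: the callback is invoked regardless of mode.
  onBudgetExceeded?.(info);

  if (mode === "error") {
    throw new ContextBudgetExceededError(info);
  }
  if (mode === "warn") {
    console.warn("Context budget exceeded:", info);
    return "proceed";
  }
  return "truncate";
}

const decision = enforceBudget(120_000, 96_000, "truncate", (info) => {
  console.warn(`Over budget by ${info.overBy} tokens`);
});
console.log(decision);
```

Returning a decision instead of truncating inline keeps the mode check separate from the truncation strategy, which makes each piece easier to test on its own.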

Budget Guard Middleware

// The guard is installed automatically when contextBudget.enabled is true.
// It runs before every LLM call to ensure budget compliance.
const chat = createChatWithTools({
  contextBudget: {
    enabled: true,
    enforcement: {
      mode: 'truncate',
      // Order of truncation when over budget:
      // 1. Oldest tool results
      // 2. Oldest history messages
      // 3. Trim current tool results
    },
  },
});
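
The truncation order above can be sketched as a pure function over a message list. This assumes messages are ordered oldest first, that each message already carries a token count, and that the last message is the "current" one, which is never dropped and is only trimmed if it is a tool result. BudgetedMessage and truncateToBudget are hypothetical names, and trimming is modeled simply as reducing the token count.

```typescript
// Hypothetical message shape, for illustration only.
interface BudgetedMessage {
  kind: "history" | "toolResult";
  tokens: number;
}

function truncateToBudget(
  messages: BudgetedMessage[],
  budget: number,
): BudgetedMessage[] {
  // Work on copies so the caller's array and objects are left untouched.
  const kept = messages.map((m) => ({ ...m }));
  const total = (): number => kept.reduce((sum, m) => sum + m.tokens, 0);

  // Steps 1 and 2: drop the oldest tool results, then the oldest history messages.
  for (const kind of ["toolResult", "history"] as const) {
    while (total() > budget) {
      // Search everything except the last (current) message.
      const idx = kept.slice(0, -1).findIndex((m) => m.kind === kind);
      if (idx === -1) break;
      kept.splice(idx, 1);
    }
  }

  // Step 3: if still over budget, trim the current tool result to fit.
  const last = kept[kept.length - 1];
  const overBy = total() - budget;
  if (overBy > 0 && last !== undefined && last.kind === "toolResult") {
    last.tokens = Math.max(0, last.tokens - overBy);
  }
  return kept;
}

const conversation: BudgetedMessage[] = [
  { kind: "history", tokens: 100 },
  { kind: "toolResult", tokens: 500 },
  { kind: "history", tokens: 200 },
  { kind: "toolResult", tokens: 400 },
  { kind: "toolResult", tokens: 900 }, // current tool result
];
console.log(truncateToBudget(conversation, 1_000));
```

With a budget of 1,000 tokens, the two older tool results and both history messages are dropped and the current 900-token tool result is kept whole. With a budget of 500, the same messages are dropped and the current result is trimmed to 500 tokens.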

Use Cases

  • Production deployments needing reliability
  • Cost-conscious applications
  • Real-time usage monitoring in UI
  • Debugging context-related issues

Benefits

  • ✅ Fully optional - disabled by default
  • ✅ Configurable budgets - tune per model/use case
  • ✅ Prevents unexpected context overflow
  • ✅ Real-time usage visibility
  • ✅ Multiple enforcement modes
  • ✅ Callback hooks for custom handling
  • ✅ Per-category budget allocation

Labels: enhancement (New feature or request)
