Skip to content

Conversation

@connortbot
Copy link
Collaborator

https://linear.app/helicone/issue/ENG-3548/fallback-for-keys-when-kvcache-dies
Affects Jawn + Worker

Context

In the past we've experienced times when KVCache would fail. Given the traffic of the AI Gateway these days and the way its trending, if this ever happens again our Postgres will die.
This adds another layer of caching for provider keys by falling back to a Durable Object namespace built for this use case.

@connortbot connortbot requested a review from chitalian November 18, 2025 01:47
@vercel
Copy link

vercel bot commented Nov 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
helicone Skipped Skipped Nov 18, 2025 1:47am
helicone-bifrost Skipped Skipped Nov 18, 2025 1:47am
helicone-eu Skipped Skipped Nov 18, 2025 1:47am

@claude
Copy link
Contributor

claude bot commented Nov 18, 2025

Claude finished @connortbot's task —— View job


PR Review: Durable Object fallback for AI Gateway provider keys

Todo List:

  • Read repository guidelines (CLAUDE.md)
  • Analyze new Durable Object implementation
  • Review Jawn backend changes
  • Review Worker integration changes
  • Check error handling and fallback logic
  • Assess security implications
  • Evaluate performance impact
  • Provide comprehensive feedback with confidence score

Score: 6/10 - Minor issues and improvements suggested, merge with consideration

Critical Issues & Security Concerns:

1. Type Safety Issue in ProviderKeyCache worker/src/lib/durable-objects/ProviderKeyCache.ts:156

provider: row.provider as any, // Cast from string to ModelProviderName

Problem: Unsafe type casting without validation could lead to runtime errors if invalid provider names are stored.

Suggestion: Add validation:

private validateProvider(provider: string): ModelProviderName {
  const validProviders = ['openai', 'anthropic', 'azure']; // Add all valid providers
  if (!validProviders.includes(provider)) {
    throw new Error(`Invalid provider: ${provider}`);
  }
  return provider as ModelProviderName;
}

// Then use: provider: this.validateProvider(row.provider)

2. Missing Error Boundaries in ProviderKeysManager worker/src/lib/managers/ProviderKeysManager.ts:118-133

Problem: If the Durable Object binding is undefined or DO operations fail, the system may not gracefully degrade.

Suggestion: Add better null checks and error handling:

if (kvFailed && this.env.PROVIDER_KEY_CACHE) {
  try {
    const doId = this.env.PROVIDER_KEY_CACHE.idFromName(orgId);
    const doStub: any = this.env.PROVIDER_KEY_CACHE.get(doId);
    
    // Add timeout and better error handling
    const doKeys: ProviderKey[] | null = await Promise.race([
      doStub.getProviderKeys(orgId),
      new Promise((_, reject) => setTimeout(() => reject(new Error('DO timeout')), 5000))
    ]);
    // ... rest of logic
  } catch (error) {
    console.error("DO cache failed:", error);
    // Consider adding metrics/alerting here
  }
}

Performance & Optimization Issues:

3. Inefficient Cache Update Strategy worker/src/lib/managers/ProviderKeysManager.ts:157-183

Problem: The KV cache update logic fetches existing keys and appends new ones, which could lead to duplicates and bloated cache.

Suggestion: Replace the cache entirely instead of appending:

// Instead of appending, replace the entire cache
await storeInCache(
  `provider_keys_${orgId}`,
  JSON.stringify(keysFromDb), // Use fresh data from DB
  this.env,
  43200
);

4. Database Schema Optimization worker/src/lib/durable-objects/ProviderKeyCache.ts:26-50

Problem: The schema creates a composite primary key that might not be optimal for the query patterns.

Suggestion: Consider adding a dedicated primary key:

CREATE TABLE IF NOT EXISTS provider_keys (
  id TEXT PRIMARY KEY DEFAULT (hex(randomblob(16))),
  org_id TEXT NOT NULL,
  provider TEXT NOT NULL,
  -- ... other fields
  UNIQUE(org_id, provider, cuid)
);

Code Quality Improvements:

5. Hardcoded TTL Values - Multiple files

Problem: TTL value (43200 seconds / 12 hours) is hardcoded in multiple places.

Suggestion: Create a constant:

export const PROVIDER_KEY_CACHE_TTL = 43200; // 12 hours

6. Inconsistent Error Handling worker/src/routers/api/doRouter.ts:54-60

Problem: Error handling returns different response formats.

Suggestion: Standardize error responses:

catch (e) {
  const error = e instanceof Error ? e.message : "Internal server error";
  console.error("Error updating provider keys in DO:", e);
  return InternalResponse.newError(error, 500);
}

7. Missing Input Validation valhalla/jawn/src/lib/clients/cloudflareKV.ts:186-231

Problem: The updateProviderKeysInDO function doesn't validate provider keys before sending.

Suggestion: Add validation:

export async function updateProviderKeysInDO(
  orgId: string,
  providerKeys: ProviderKey[]
): Promise<{ success: boolean; error?: string }> {
  // Validate inputs
  if (!orgId || typeof orgId !== 'string') {
    return { success: false, error: 'Invalid orgId' };
  }
  
  if (!Array.isArray(providerKeys)) {
    return { success: false, error: 'Invalid providerKeys format' };
  }
  
  // ... rest of implementation
}

Architectural Concerns:

8. Authentication Security worker/src/routers/api/auth.ts:22-31

Problem: While timing-safe comparison is good, the function could be more robust.

Suggestion: Add rate limiting and additional security headers:

export function validateAdminAuth(
  requestWrapper: RequestWrapper,
  env: Env,
  rateLimitKey?: string // Add optional rate limiting
): Response | null {
  // Add rate limiting logic here if rateLimitKey provided
  
  const authHeader = requestWrapper.headers.get("Authorization");
  if (!authHeader?.startsWith("Bearer ")) {
    return InternalResponse.unauthorized();
  }
  // ... rest of validation
}

Minor Improvements:

9. Transaction Usage worker/src/lib/durable-objects/ProviderKeyCache.ts:99

The transaction usage is good, but consider adding error handling within the transaction.

10. Documentation - Multiple files

Consider adding JSDoc comments to public methods for better maintainability.


Suggestions Summary:

Critical Fixes Needed:

Recommended Improvements:

Performance Optimizations:

Security Enhancements:

The implementation provides a solid fallback mechanism for provider key caching, which addresses the core requirement. The Durable Object approach is architecturally sound and the three-tier fallback strategy (KV → DO → Postgres) is well-designed for reliability.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Nov 18, 2025

Greptile Summary

  • Adds Durable Object-based fallback cache for provider keys when KV namespace fails, protecting Postgres from traffic spikes
  • Implements three-tier caching strategy: KV Cache → DO Cache → Postgres with automatic fallback and cache repopulation
  • Creates new authenticated API endpoints for Jawn to update DO cache alongside KV updates

Confidence Score: 3/5

  • Safe to merge with caution - solid architecture but has cache consistency bugs that need fixing
  • Score reflects well-designed fallback architecture and proper authentication, but deducted points for cache duplication bug in ProviderKeysManager.ts:162-164 where DB keys are appended instead of merged, and missing error handling for JSON.parse in ProviderKeyCache.ts:161 that could crash the Durable Object
  • Pay close attention to worker/src/lib/managers/ProviderKeysManager.ts for the cache merging logic that creates duplicates

Important Files Changed

Filename Overview
worker/src/lib/durable-objects/ProviderKeyCache.ts New Durable Object implementation for provider key caching with SQL storage, TTL checking, and transaction support
worker/src/lib/managers/ProviderKeysManager.ts Implements fallback logic: KV → DO → Postgres, with cache writes after DB fetch but potential cache consistency issues
valhalla/jawn/src/lib/clients/cloudflareKV.ts Added DO update and invalidation functions for provider keys with HTTP calls to worker endpoints

Sequence Diagram

sequenceDiagram
    participant User
    participant Worker
    participant KVCache
    participant DOCache
    participant Postgres
    
    User->>Worker: Request with provider key needed
    Worker->>KVCache: Get provider keys for org
    alt KV Success
        KVCache-->>Worker: Return cached keys
        Worker-->>User: Process request with key
    else KV Fails
        KVCache--xWorker: Error/Timeout
        Worker->>DOCache: Fallback: Get provider keys
        alt DO Success
            DOCache-->>Worker: Return cached keys
            Worker-->>User: Process request with key
        else DO Fails/Empty
            DOCache--xWorker: Error/Empty
            Worker->>Postgres: Fetch from database
            Postgres-->>Worker: Return keys from DB
            Worker->>DOCache: Update DO cache
            Worker->>KVCache: Update KV cache (if not failed)
            Worker-->>User: Process request with key
        end
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

Comment on lines +162 to +164
if (existingKeys) {
const existingKeysData = JSON.parse(existingKeys) as ProviderKey[];
existingKeysData.push(...keysFromDb);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: After fetching from DB, keysFromDb is appended to existing KV cache, which can create duplicates if the same keys are already cached.

Suggested change
if (existingKeys) {
const existingKeysData = JSON.parse(existingKeys) as ProviderKey[];
existingKeysData.push(...keysFromDb);
const existingKeysData = JSON.parse(existingKeys) as ProviderKey[];
// Merge keys, preferring fresh data from DB
const keyMap = new Map(existingKeysData.map(k => [`${k.provider}-${k.cuid || ''}`, k]));
keysFromDb.forEach(k => keyMap.set(`${k.provider}-${k.cuid || ''}`, k));
await storeInCache(
`provider_keys_${orgId}`,
JSON.stringify(Array.from(keyMap.values())),
Prompt To Fix With AI
This is a comment left during a code review.
Path: worker/src/lib/managers/ProviderKeysManager.ts
Line: 162:164

Comment:
**logic:** After fetching from DB, `keysFromDb` is appended to existing KV cache, which can create duplicates if the same keys are already cached.

```suggestion
          const existingKeysData = JSON.parse(existingKeys) as ProviderKey[];
          // Merge keys, preferring fresh data from DB
          const keyMap = new Map(existingKeysData.map(k => [`${k.provider}-${k.cuid || ''}`, k]));
          keysFromDb.forEach(k => keyMap.set(`${k.provider}-${k.cuid || ''}`, k));
          await storeInCache(
            `provider_keys_${orgId}`,
            JSON.stringify(Array.from(keyMap.values())),
```

How can I resolve this? If you propose a fix, please make it concise.

decrypted_provider_secret_key: row.decrypted_provider_secret_key,
auth_type: row.auth_type as "key" | "session_token",
byok_enabled: row.byok_enabled !== null ? row.byok_enabled === 1 : null,
config: row.config ? JSON.parse(row.config) : null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: JSON.parse on potentially malformed config data lacks error handling - parsing failure will crash the DO

Suggested change
config: row.config ? JSON.parse(row.config) : null,
config: row.config ? (() => { try { return JSON.parse(row.config); } catch { return null; } })() : null,
Prompt To Fix With AI
This is a comment left during a code review.
Path: worker/src/lib/durable-objects/ProviderKeyCache.ts
Line: 161:161

Comment:
**logic:** JSON.parse on potentially malformed config data lacks error handling - parsing failure will crash the DO

```suggestion
      config: row.config ? (() => { try { return JSON.parse(row.config); } catch { return null; } })() : null,
```

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants