Skip to content

P1: Add LLM burst guard (concurrency caps + jittered queueing) #99

@bleuropa

Description

@bleuropa

Summary

Add burst guard controls to reduce provider 429 spikes during multi-agent fan-out.

Context

Current behavior uses retry/backoff and token bucket acquisition, but lacks request-concurrency shaping at burst time.

Relevant code:

  • lib/loomkin/teams/rate_limiter.ex#L23-L80
  • lib/loomkin/llm_retry.ex#L43-L48

Scope

  • Introduce per-provider and per-team in-flight request caps.
  • Queue with jittered dispatch to avoid synchronized retries.
  • Keep existing retry behavior but reduce coordinated burst pressure.

Acceptance Criteria

  • Under synthetic multi-agent load, 429 rate decreases compared to baseline.
  • Requests are queued and drained predictably instead of burst-failing.
  • Telemetry exposes queue depth, wait time, and throttling events.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions