Summary
Add burst guard controls to reduce provider 429 spikes during multi-agent fan-out.
Context
Current behavior uses retry/backoff and token bucket acquisition, but lacks request-concurrency shaping at burst time.
Relevant code:
lib/loomkin/teams/rate_limiter.ex#L23-L80
lib/loomkin/llm_retry.ex#L43-L48
Scope
- Introduce per-provider and per-team in-flight request caps.
- Queue with jittered dispatch to avoid synchronized retries.
- Keep existing retry behavior but reduce coordinated burst pressure.
Acceptance Criteria
- Under synthetic multi-agent load,
429 rate decreases compared to baseline.
- Requests are queued and drained predictably instead of burst-failing.
- Telemetry exposes queue depth, wait time, and throttling events.