[Bug] Streaming completion path skips num_retries / retry_strategy — no retry on rate limit errors #9459

@felipesaezreyes

Description
When using dspy.streamify() (or any configuration where dspy.settings.send_stream is set), all LLM calls go through _get_stream_completion_fn in dspy/clients/lm.py. This streaming path calls litellm.acompletion(stream=True, ...) without passing num_retries or retry_strategy, so rate limit errors (429) crash immediately instead of retrying with exponential backoff.

The non-streaming path correctly passes these parameters:

# Non-streaming (line ~410 in alitellm_completion) — HAS retries ✅
return await litellm.acompletion(
    cache=cache,
    num_retries=num_retries,
    retry_strategy="exponential_backoff_retry",
    headers=headers,
    **request,
)

# Streaming (_get_stream_completion_fn, line ~325) — NO retries ❌
response = await litellm.acompletion(
    cache=cache_kwargs,
    stream=True,
    headers=headers,
    **request,  # num_retries and retry_strategy are missing
)

Reproduction

import dspy

# Configure with retries
lm = dspy.LM("anthropic/claude-opus-4-6", num_retries=5)
dspy.configure(lm=lm)

class SimpleQA(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

module = dspy.ChainOfThought(SimpleQA)

# Without streamify — retries work correctly
result = module(question="What is 2+2?")  # Will retry on 429

# With streamify — retries are skipped
streamed = dspy.streamify(module)
result = streamed(question="What is 2+2?")  # Crashes immediately on 429

To trigger the rate limit error, run many concurrent calls against an API with a low rate limit, or temporarily set a very low TPM limit.

Impact

Any application using dspy.streamify() has zero retry protection against rate limit errors, even when num_retries is explicitly configured. This is particularly severe for:

  • Production workloads using streaming for real-time output
  • Batch processing with many concurrent LLM calls
  • Any deployment hitting API rate limits

Root Cause

_get_stream_completion_fn (line 305) does not accept or forward num_retries:

  1. litellm_completion() and alitellm_completion() receive num_retries as a parameter
  2. They call _get_stream_completion_fn() which does not accept num_retries
  3. The closure returned by _get_stream_completion_fn calls litellm.acompletion() without retry params
  4. The caller functions then call stream_completion() directly, bypassing the retry logic that exists in the non-streaming else branch
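The parameter-dropping pattern above can be illustrated with a minimal, self-contained sketch (hypothetical stand-in names, not the actual dspy source): the outer function receives `num_retries`, but the helper that builds the streaming closure neither accepts nor forwards it, so the value never reaches the completion call.

```python
captured_kwargs = {}

def fake_acompletion(**kwargs):
    # Stand-in for litellm.acompletion; records what it was called with.
    captured_kwargs.update(kwargs)
    return "stream"

def _get_stream_completion_fn(request):
    # Mirrors the reported code path: no num_retries parameter at all.
    def stream_completion():
        return fake_acompletion(stream=True, **request)
    return stream_completion

def alitellm_completion(request, num_retries=5):
    # num_retries arrives here but is silently dropped on the stream path.
    stream_completion = _get_stream_completion_fn(request)
    return stream_completion()

alitellm_completion({"model": "m"}, num_retries=5)
print("num_retries" in captured_kwargs)  # → False: retries never reach the API call
```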

Suggested Fix

Pass num_retries through _get_stream_completion_fn and apply it to the litellm.acompletion call, or wrap the streaming call with tenacity retry logic matching the non-streaming path:

def _get_stream_completion_fn(
    request: dict[str, Any],
    cache_kwargs: dict[str, Any],
    num_retries: int = 0,  # Add this parameter
    sync=True,
    headers: dict[str, Any] | None = None,
):
    # ... existing code ...

    async def stream_completion(request, cache_kwargs):
        response = await litellm.acompletion(
            cache=cache_kwargs,
            stream=True,
            headers=headers,
            num_retries=num_retries,                    # Add this
            retry_strategy="exponential_backoff_retry",  # Add this
            **request,
        )
        # ... rest unchanged ...

And update callers:

# In litellm_completion():
stream_completion = _get_stream_completion_fn(request, cache, num_retries=num_retries, sync=True, headers=headers)

# In alitellm_completion():
stream_completion = _get_stream_completion_fn(request, cache, num_retries=num_retries, sync=False)

Note: If litellm.acompletion(stream=True) doesn't honor num_retries internally (since the stream connection is already opened), the alternative is wrapping the entire stream_completion closure with tenacity.retry matching the same exponential backoff strategy.
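As a sketch of that wrapping alternative, here is the retry shape written with only the standard library so it is explicit (tenacity's `retry`/`wait_exponential` would express the same thing more compactly). `RateLimitError` and `flaky_acompletion` below are stand-ins for illustration, not litellm's real objects:

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for litellm.RateLimitError (HTTP 429)."""

async def with_exponential_backoff(coro_fn, num_retries=3, base_delay=0.01):
    """Retry an async callable on RateLimitError with exponential backoff."""
    for attempt in range(num_retries + 1):
        try:
            return await coro_fn()
        except RateLimitError:
            if attempt == num_retries:
                raise
            # Exponential backoff with jitter: base * 2^attempt * (1 + rand)
            await asyncio.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# The wrapper must cover the whole stream-opening call: once the stream
# connection is open, a mid-stream 429 cannot be retried transparently.
calls = 0

async def flaky_acompletion():
    global calls
    calls += 1
    if calls < 3:
        raise RateLimitError("429")
    return "stream-handle"

result = asyncio.run(with_exponential_backoff(flaky_acompletion, num_retries=5))
print(result, calls)  # → stream-handle 3
```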

Environment

  • DSPy version: 3.1.3 (also confirmed on latest main branch)
  • Python: 3.13
  • litellm: 1.x
