Description
When using dspy.streamify() (or any configuration where dspy.settings.send_stream is set), all LLM calls go through _get_stream_completion_fn in dspy/clients/lm.py. This streaming path calls litellm.acompletion(stream=True, ...) without passing num_retries or retry_strategy, so rate limit errors (429) crash immediately instead of retrying with exponential backoff.
The non-streaming path correctly passes these parameters:
# Non-streaming (line ~410 in alitellm_completion) — HAS retries ✅
return await litellm.acompletion(
    cache=cache,
    num_retries=num_retries,
    retry_strategy="exponential_backoff_retry",
    headers=headers,
    **request,
)

# Streaming (_get_stream_completion_fn, line ~325) — NO retries ❌
response = await litellm.acompletion(
    cache=cache_kwargs,
    stream=True,
    headers=headers,
    **request,  # num_retries and retry_strategy are missing
)
Reproduction
import dspy

# Configure with retries
lm = dspy.LM("anthropic/claude-opus-4-6", num_retries=5)
dspy.configure(lm=lm)

class SimpleQA(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

module = dspy.ChainOfThought(SimpleQA)

# Without streamify — retries work correctly
result = module(question="What is 2+2?")  # Will retry on 429

# With streamify — retries are skipped
streamed = dspy.streamify(module)
result = streamed(question="What is 2+2?")  # Crashes immediately on 429
To trigger the rate limit error, run many concurrent calls against an API with a low rate limit, or temporarily set a very low TPM limit.
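The burst pattern can also be simulated locally with a stub in place of the real API. The sketch below is illustrative only (none of these names exist in DSPy or litellm): it fires concurrent calls against a fake completion function with a small per-window budget, so every call past the budget fails immediately, just as the streaming path does today.

```python
import asyncio

class RateLimitError(Exception):
    """Stands in for a provider 429 in this simulation."""

TPM_LIMIT = 3       # pretend the provider allows only 3 calls per window
calls_in_window = 0

async def fake_completion(prompt: str) -> str:
    """Succeeds until the window's budget is spent, then raises with no retry."""
    global calls_in_window
    calls_in_window += 1
    if calls_in_window > TPM_LIMIT:
        raise RateLimitError("429: rate limit exceeded")
    return f"answer to {prompt!r}"

async def burst(n: int) -> int:
    """Fire n concurrent calls and count how many hit the rate limit."""
    results = await asyncio.gather(
        *(fake_completion(f"q{i}") for i in range(n)),
        return_exceptions=True,
    )
    return sum(isinstance(r, RateLimitError) for r in results)

failures = asyncio.run(burst(10))
print(f"{failures} of 10 calls failed with a simulated 429")
```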
Impact
Any application using dspy.streamify() has zero retry protection against rate limit errors, even when num_retries is explicitly configured. This is particularly severe for:
- Production workloads using streaming for real-time output
- Batch processing with many concurrent LLM calls
- Any deployment hitting API rate limits
Root Cause
_get_stream_completion_fn (line 305) does not accept or forward num_retries:
- litellm_completion() and alitellm_completion() receive num_retries as a parameter
- They call _get_stream_completion_fn(), which does not accept num_retries
- The closure returned by _get_stream_completion_fn calls litellm.acompletion() without retry params
- The caller functions then call stream_completion() directly, bypassing the retry logic that exists in the non-streaming else branch
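The failure mode reduces to a generic pattern: a factory builds a closure but never forwards the retry setting, so the configured value silently evaporates. A minimal, self-contained sketch (all names here are illustrative, not DSPy's actual code):

```python
import asyncio

class RateLimit(Exception):
    pass

attempts = 0

async def flaky_api(**kwargs):
    """Fails on the first call, then succeeds — enough to reveal retries."""
    global attempts
    attempts += 1
    if attempts == 1:
        raise RateLimit("429")
    return "ok"

async def call_with_retries(num_retries: int):
    """Mirrors the non-streaming path: the retry loop wraps the call."""
    for attempt in range(num_retries + 1):
        try:
            return await flaky_api()
        except RateLimit:
            if attempt == num_retries:
                raise
            # real code would sleep base * 2**attempt here

def make_stream_fn():
    """Mirrors _get_stream_completion_fn: no num_retries parameter at all."""
    async def stream_completion():
        return await flaky_api()  # the retry config never reaches this call
    return stream_completion

non_streaming = asyncio.run(call_with_retries(num_retries=2))  # survives the 429

attempts = 0
streaming_crashed = False
try:
    asyncio.run(make_stream_fn()())
except RateLimit:
    streaming_crashed = True  # dies on the first transient error
```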
Suggested Fix
Pass num_retries through _get_stream_completion_fn and apply it to the litellm.acompletion call, or wrap the streaming call with tenacity retry logic matching the non-streaming path:
def _get_stream_completion_fn(
    request: dict[str, Any],
    cache_kwargs: dict[str, Any],
    num_retries: int = 0,  # Add this parameter
    sync=True,
    headers: dict[str, Any] | None = None,
):
    # ... existing code ...
    async def stream_completion(request, cache_kwargs):
        response = await litellm.acompletion(
            cache=cache_kwargs,
            stream=True,
            headers=headers,
            num_retries=num_retries,  # Add this
            retry_strategy="exponential_backoff_retry",  # Add this
            **request,
        )
        # ... rest unchanged ...
And update callers:
# In litellm_completion():
stream_completion = _get_stream_completion_fn(request, cache, num_retries=num_retries, sync=True, headers=headers)
# In alitellm_completion():
stream_completion = _get_stream_completion_fn(request, cache, num_retries=num_retries, sync=False)
Note: If litellm.acompletion(stream=True) doesn't honor num_retries internally (since the stream connection is already opened), the alternative is wrapping the entire stream_completion closure with tenacity.retry matching the same exponential backoff strategy.
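If that fallback is needed, a plain-asyncio sketch of the wrapper (tenacity's retry/wait_exponential decorators would express the same thing more compactly; the backoff constants and the broad except are illustrative, not a proposed final implementation):

```python
import asyncio
import random

async def with_exponential_backoff(coro_factory, num_retries: int = 3,
                                   base: float = 1.0, cap: float = 30.0):
    """Re-invoke coro_factory() on failure, sleeping roughly base * 2**attempt
    (with jitter, capped) between attempts — the same shape as litellm's
    "exponential_backoff_retry" strategy."""
    for attempt in range(num_retries + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == num_retries:
                raise
            delay = min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)

# Hypothetical usage around the streaming closure:
#   response = await with_exponential_backoff(
#       lambda: litellm.acompletion(stream=True, **request),
#       num_retries=num_retries,
#   )
```

Because the whole closure is re-invoked, each retry opens a fresh stream rather than trying to resume a half-consumed one.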
Environment
- DSPy version: 3.1.3 (also confirmed on latest main branch)
- Python: 3.13
- litellm: 1.x