fix(clients): pass num_retries and retry_strategy to streaming LLM calls#9507

Open
abdelhadi703 wants to merge 1 commit into stanfordnlp:main from abdelhadi703:fix/streaming-num-retries

Conversation

@abdelhadi703

Summary

Fix streaming LLM calls to respect num_retries and retry_strategy parameters.

Problem: dspy.streamify() uses _get_stream_completion_fn() which calls litellm.acompletion(stream=True) without num_retries or retry_strategy. Rate limit errors (429) crash immediately instead of retrying with exponential backoff.

Fix:

  1. Add num_retries parameter to _get_stream_completion_fn()
  2. Forward num_retries and retry_strategy="exponential_backoff_retry" to litellm.acompletion(stream=True)
  3. Update both litellm_completion() and alitellm_completion() to pass num_retries
  4. Bonus fix: alitellm_completion() was also missing headers — now added
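A minimal sketch of the forwarding described in steps 1–2. This is not the actual dspy/clients/lm.py source: the litellm.acompletion call is replaced with an injected stub so the retry-kwarg forwarding is visible in isolation, and the surrounding internals are simplified.

```python
import asyncio

def get_stream_completion_fn(acompletion, request: dict, num_retries: int = 0):
    # Sketch only: in dspy this would call litellm.acompletion; here
    # `acompletion` is a stand-in passed by the caller.
    async def stream_completion():
        return await acompletion(
            **request,
            stream=True,
            num_retries=num_retries,
            retry_strategy="exponential_backoff_retry",
        )
    return stream_completion

# Demo with a stub that just echoes the kwargs it received.
async def fake_acompletion(**kwargs):
    return kwargs

fn = get_stream_completion_fn(fake_acompletion, {"model": "m"}, num_retries=5)
kwargs = asyncio.run(fn())
print(kwargs["num_retries"], kwargs["retry_strategy"])
```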

Changes

  • dspy/clients/lm.py:
    • _get_stream_completion_fn: added num_retries=0 parameter
    • stream_completion() closure: pass num_retries and retry_strategy to litellm.acompletion(stream=True)
    • litellm_completion: pass num_retries to _get_stream_completion_fn
    • alitellm_completion: pass num_retries and headers to _get_stream_completion_fn
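The `num_retries=0` default and the conditional retry strategy can be sketched as a small (hypothetical) helper; the function name is illustrative, not part of dspy's API:

```python
def retry_kwargs(num_retries: int = 0) -> dict:
    """Build the retry-related kwargs forwarded to the streaming call.

    With num_retries=0 the strategy stays None, so existing behavior
    (no retries) is preserved.
    """
    strategy = "exponential_backoff_retry" if num_retries > 0 else None
    return {"num_retries": num_retries, "retry_strategy": strategy}

print(retry_kwargs())   # {'num_retries': 0, 'retry_strategy': None}
print(retry_kwargs(5))
```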

Reproduction (from issue)

import dspy

lm = dspy.LM("anthropic/claude-opus-4-6", num_retries=5)
dspy.configure(lm=lm)

# Any DSPy module triggers the bug; a simple Predict is enough.
module = dspy.Predict("question -> answer")

# Without streamify — retries work ✅
result = module(question="What is 2+2?")

# With streamify — now retries work too ✅
streamed = dspy.streamify(module)
result = streamed(question="What is 2+2?")

Security/Reliability

  • No new dependencies
  • Graceful degradation: retry_strategy=None when num_retries=0 preserves existing behavior
  • Matches pattern used in non-streaming code path
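For reference, "exponential backoff" means the wait between retry attempts doubles each time, usually up to a cap. A toy illustration (the base delay and cap here are made-up numbers, not litellm's internals):

```python
def backoff_delays(num_retries: int, base: float = 1.0, cap: float = 60.0) -> list:
    """Toy model of exponential backoff: delay doubles per attempt, capped."""
    return [min(base * (2 ** attempt), cap) for attempt in range(num_retries)]

print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```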

Contribution by abdelhadisalmaoui0909@outlook.fr

The streaming path via _get_stream_completion_fn() was calling
litellm.acompletion(stream=True) without num_retries or retry_strategy,
causing rate limit errors (429) to crash immediately instead of retrying
with exponential backoff.

Fix: pass num_retries and retry_strategy="exponential_backoff_retry"
to _get_stream_completion_fn() and forward them to litellm.acompletion()
in the stream_completion closure.

Additionally, alitellm_completion() was not passing headers to
_get_stream_completion_fn() — fixed alongside.

Fixes stanfordnlp#9459
