fix(clients): pass num_retries and retry_strategy to streaming LLM calls#9507

Open
abdelhadi703 wants to merge 1 commit into stanfordnlp:main from abdelhadi703:fix/streaming-num-retries

Conversation

@abdelhadi703

Summary

Fix streaming LLM calls to respect num_retries and retry_strategy parameters.

Problem: dspy.streamify() uses _get_stream_completion_fn() which calls litellm.acompletion(stream=True) without num_retries or retry_strategy. Rate limit errors (429) crash immediately instead of retrying with exponential backoff.

Fix:

  1. Add num_retries parameter to _get_stream_completion_fn()
  2. Forward num_retries and retry_strategy="exponential_backoff_retry" to litellm.acompletion(stream=True)
  3. Update both litellm_completion() and alitellm_completion() to pass num_retries
  4. Bonus fix: alitellm_completion() was also missing headers — now added
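A minimal sketch of the forwarding described in steps 1–2. This is not the actual dspy/clients/lm.py source: the litellm.acompletion call is replaced with an injected stub so the retry-kwarg forwarding is visible in isolation, and the surrounding internals are simplified.

```python
import asyncio

def get_stream_completion_fn(acompletion, request: dict, num_retries: int = 0):
    # Sketch only: in dspy this would call litellm.acompletion; here
    # `acompletion` is a stand-in passed by the caller.
    async def stream_completion():
        return await acompletion(
            **request,
            stream=True,
            num_retries=num_retries,
            retry_strategy="exponential_backoff_retry",
        )
    return stream_completion

# Demo with a stub that just echoes the kwargs it received.
async def fake_acompletion(**kwargs):
    return kwargs

fn = get_stream_completion_fn(fake_acompletion, {"model": "m"}, num_retries=5)
kwargs = asyncio.run(fn())
print(kwargs["num_retries"], kwargs["retry_strategy"])
```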

Changes

  • dspy/clients/lm.py:
    • _get_stream_completion_fn: added num_retries=0 parameter
    • stream_completion() closure: pass num_retries and retry_strategy to litellm.acompletion(stream=True)
    • litellm_completion: pass num_retries to _get_stream_completion_fn
    • alitellm_completion: pass num_retries and headers to _get_stream_completion_fn
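The `num_retries=0` default and the conditional retry strategy can be sketched as a small (hypothetical) helper; the function name is illustrative, not part of dspy's API:

```python
def retry_kwargs(num_retries: int = 0) -> dict:
    """Build the retry-related kwargs forwarded to the streaming call.

    With num_retries=0 the strategy stays None, so existing behavior
    (no retries) is preserved.
    """
    strategy = "exponential_backoff_retry" if num_retries > 0 else None
    return {"num_retries": num_retries, "retry_strategy": strategy}

print(retry_kwargs())   # {'num_retries': 0, 'retry_strategy': None}
print(retry_kwargs(5))
```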

Reproduction (from issue)

import dspy

lm = dspy.LM("anthropic/claude-opus-4-6", num_retries=5)
dspy.configure(lm=lm)

# Any DSPy module triggers the bug; a simple Predict is enough.
module = dspy.Predict("question -> answer")

# Without streamify — retries work ✅
result = module(question="What is 2+2?")

# With streamify — now retries work too ✅
streamed = dspy.streamify(module)
result = streamed(question="What is 2+2?")

Security/Reliability

  • No new dependencies
  • Graceful degradation: retry_strategy=None when num_retries=0 preserves existing behavior
  • Matches pattern used in non-streaming code path
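For reference, "exponential backoff" means the wait between retry attempts doubles each time, usually up to a cap. A toy illustration (the base delay and cap here are made-up numbers, not litellm's internals):

```python
def backoff_delays(num_retries: int, base: float = 1.0, cap: float = 60.0) -> list:
    """Toy model of exponential backoff: delay doubles per attempt, capped."""
    return [min(base * (2 ** attempt), cap) for attempt in range(num_retries)]

print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```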

Contribution by abdelhadisalmaoui0909@outlook.fr

The streaming path via _get_stream_completion_fn() was calling
litellm.acompletion(stream=True) without num_retries or retry_strategy,
causing rate limit errors (429) to crash immediately instead of retrying
with exponential backoff.

Fix: pass num_retries and retry_strategy="exponential_backoff_retry"
to _get_stream_completion_fn() and forward them to litellm.acompletion()
in the stream_completion closure.

Additionally, alitellm_completion() was not passing headers to
_get_stream_completion_fn() — fixed alongside.

Fixes stanfordnlp#9459
