Skip to content

fix: pass num_retries to streaming completion path#9460

Open
saivedant169 wants to merge 2 commits intostanfordnlp:mainfrom
saivedant169:fix/streaming-retry-passthrough
Open

fix: pass num_retries to streaming completion path#9460
saivedant169 wants to merge 2 commits intostanfordnlp:mainfrom
saivedant169:fix/streaming-retry-passthrough

Conversation

@saivedant169
Copy link
Copy Markdown

@saivedant169 saivedant169 commented Mar 16, 2026

Fixes #9459

Description

The streaming completion path via _get_stream_completion_fn called litellm.acompletion(stream=True) without num_retries or retry_strategy, so rate limit errors (429) crashed immediately instead of retrying with exponential backoff. The non-streaming path correctly passes both parameters.

Changes

  1. Added num_retries parameter to _get_stream_completion_fn()
  2. Pass num_retries and retry_strategy="exponential_backoff_retry" to litellm.acompletion(stream=True) inside the stream closure
  3. Updated litellm_completion() caller to forward num_retries
  4. Updated alitellm_completion() caller to forward num_retries and headers (headers were also previously missing from the streaming path inconsistent with the sync caller and the non-streaming path)

Before vs After

# BEFORE: streaming no retries, crashes on 429
response = await litellm.acompletion(
    cache=cache_kwargs,
    stream=True,
    headers=headers,
    **request,
)

# AFTER: streaming retries match non-streaming path
response = await litellm.acompletion(
    cache=cache_kwargs,
    stream=True,
    num_retries=num_retries,
    retry_strategy="exponential_backoff_retry",
    headers=headers,
    **request,
)

Testing

ruff check and ruff format pass clean.

The streaming path via _get_stream_completion_fn called
litellm.acompletion(stream=True) without num_retries or
retry_strategy, so rate limit errors (429) crashed immediately
instead of retrying with exponential backoff.

Thread num_retries through _get_stream_completion_fn and pass it
along with retry_strategy to the litellm.acompletion call, matching
the non-streaming path behavior.

Also fix alitellm_completion to pass headers to the streaming path
(previously missing, inconsistent with litellm_completion).

Fixes stanfordnlp#9459
def _get_stream_completion_fn(
request: dict[str, Any],
cache_kwargs: dict[str, Any],
num_retries: int = 0,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have num_retries as a required function parameter

headers = request.pop("headers", None)
stream_completion = _get_stream_completion_fn(request, cache, sync=False)
stream_completion = _get_stream_completion_fn(
request, cache, num_retries=num_retries, sync=False, headers=_add_dspy_identifier_to_headers(headers)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the syntax should match lm:355

messages: list[dict[str, Any]] | None = None,
**kwargs
):
def forward(self, prompt: str | None = None, messages: list[dict[str, Any]] | None = None, **kwargs):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo change unless necessary for ruff

…evert formatting

- Make num_retries a required parameter in _get_stream_completion_fn
- Match alitellm_completion syntax with litellm_completion (line 355 style)
- Revert unrelated ruff formatting change on forward() signature
@saivedant169
Copy link
Copy Markdown
Author

Hey @isaacbmiller , wanted to see if there's anything else you need from me on this. Happy to make changes if something's off.

@saivedant169
Copy link
Copy Markdown
Author

Hey @isaacbmiller, all three items from your review are in. num_retries is required, the async caller matches the sync style, and the formatting is reverted. Let me know if anything else needs changing.

@saivedant169
Copy link
Copy Markdown
Author

ForgeArena Review

Decision: PASS (score: 89/100)

Evaluation

  • Tests: 0 passed, 0 failed
  • Lint: passed
  • Build: passed
  • Commands: 2/2 passed

Risk Signals

  • None detected

Policy

  • No violations

approved by ForgeArena - the merge gate for AI-generated code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Streaming completion path skips num_retries / retry_strategy — no retry on rate limit errors

2 participants