Description
When using dspy.streamify() (or any configuration where dspy.settings.send_stream is set), all LLM calls go through _get_stream_completion_fn in dspy/clients/lm.py. This streaming path calls litellm.acompletion(stream=True, ...) without passing num_retries or retry_strategy, so rate limit errors (429) crash immediately instead of retrying with exponential backoff.
The non-streaming path correctly passes these parameters:
# Non-streaming (line ~410 in alitellm_completion) — HAS retries ✅
return await litellm.acompletion(
    cache=cache,
    num_retries=num_retries,
    retry_strategy="exponential_backoff_retry",
    headers=headers,
    **request,
)

# Streaming (_get_stream_completion_fn, line ~325) — NO retries ❌
response = await litellm.acompletion(
    cache=cache_kwargs,
    stream=True,
    headers=headers,
    **request,  # num_retries and retry_strategy are missing
)
Reproduction
import dspy

# Configure with retries
lm = dspy.LM("anthropic/claude-opus-4-6", num_retries=5)
dspy.configure(lm=lm)

class SimpleQA(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

module = dspy.ChainOfThought(SimpleQA)

# Without streamify — retries work correctly
result = module(question="What is 2+2?")  # Will retry on 429

# With streamify — retries are skipped
streamed = dspy.streamify(module)
result = streamed(question="What is 2+2?")  # Crashes immediately on 429
To trigger the rate limit error, run many concurrent calls against an API with a low rate limit, or temporarily set a very low TPM limit.
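The burst pattern can also be simulated locally with a stub in place of the real API. The sketch below is illustrative only (none of these names exist in DSPy or litellm): it fires concurrent calls against a fake completion function with a small per-window budget, so every call past the budget fails immediately, just as the streaming path does today.

```python
import asyncio

class RateLimitError(Exception):
    """Stands in for a provider 429 in this simulation."""

TPM_LIMIT = 3       # pretend the provider allows only 3 calls per window
calls_in_window = 0

async def fake_completion(prompt: str) -> str:
    """Succeeds until the window's budget is spent, then raises with no retry."""
    global calls_in_window
    calls_in_window += 1
    if calls_in_window > TPM_LIMIT:
        raise RateLimitError("429: rate limit exceeded")
    return f"answer to {prompt!r}"

async def burst(n: int) -> int:
    """Fire n concurrent calls and count how many hit the rate limit."""
    results = await asyncio.gather(
        *(fake_completion(f"q{i}") for i in range(n)),
        return_exceptions=True,
    )
    return sum(isinstance(r, RateLimitError) for r in results)

failures = asyncio.run(burst(10))
print(f"{failures} of 10 calls failed with a simulated 429")
```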
Impact
Any application using dspy.streamify() has zero retry protection against rate limit errors, even when num_retries is explicitly configured. This is particularly severe for:
- Production workloads using streaming for real-time output
- Batch processing with many concurrent LLM calls
- Any deployment hitting API rate limits
Root Cause
_get_stream_completion_fn (line 305) does not accept or forward num_retries:
- litellm_completion() and alitellm_completion() receive num_retries as a parameter
- They call _get_stream_completion_fn(), which does not accept num_retries
- The closure returned by _get_stream_completion_fn calls litellm.acompletion() without retry params
- The caller functions then call stream_completion() directly, bypassing the retry logic that exists in the non-streaming else branch
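The failure mode reduces to a generic pattern: a factory builds a closure but never forwards the retry setting, so the configured value silently evaporates. A minimal, self-contained sketch (all names here are illustrative, not DSPy's actual code):

```python
import asyncio

class RateLimit(Exception):
    pass

attempts = 0

async def flaky_api(**kwargs):
    """Fails on the first call, then succeeds — enough to reveal retries."""
    global attempts
    attempts += 1
    if attempts == 1:
        raise RateLimit("429")
    return "ok"

async def call_with_retries(num_retries: int):
    """Mirrors the non-streaming path: the retry loop wraps the call."""
    for attempt in range(num_retries + 1):
        try:
            return await flaky_api()
        except RateLimit:
            if attempt == num_retries:
                raise
            # real code would sleep base * 2**attempt here

def make_stream_fn():
    """Mirrors _get_stream_completion_fn: no num_retries parameter at all."""
    async def stream_completion():
        return await flaky_api()  # the retry config never reaches this call
    return stream_completion

non_streaming = asyncio.run(call_with_retries(num_retries=2))  # survives the 429

attempts = 0
streaming_crashed = False
try:
    asyncio.run(make_stream_fn()())
except RateLimit:
    streaming_crashed = True  # dies on the first transient error
```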
Suggested Fix
Pass num_retries through _get_stream_completion_fn and apply it to the litellm.acompletion call, or wrap the streaming call with tenacity retry logic matching the non-streaming path:
def _get_stream_completion_fn(
    request: dict[str, Any],
    cache_kwargs: dict[str, Any],
    num_retries: int = 0,  # Add this parameter
    sync=True,
    headers: dict[str, Any] | None = None,
):
    # ... existing code ...
    async def stream_completion(request, cache_kwargs):
        response = await litellm.acompletion(
            cache=cache_kwargs,
            stream=True,
            headers=headers,
            num_retries=num_retries,  # Add this
            retry_strategy="exponential_backoff_retry",  # Add this
            **request,
        )
        # ... rest unchanged ...
And update callers:
# In litellm_completion():
stream_completion = _get_stream_completion_fn(request, cache, num_retries=num_retries, sync=True, headers=headers)
# In alitellm_completion():
stream_completion = _get_stream_completion_fn(request, cache, num_retries=num_retries, sync=False)
Note: If litellm.acompletion(stream=True) doesn't honor num_retries internally (since the stream connection is already opened), the alternative is wrapping the entire stream_completion closure with tenacity.retry matching the same exponential backoff strategy.
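If that fallback is needed, a plain-asyncio sketch of the wrapper (tenacity's retry/wait_exponential decorators would express the same thing more compactly; the backoff constants and the broad except are illustrative, not a proposed final implementation):

```python
import asyncio
import random

async def with_exponential_backoff(coro_factory, num_retries: int = 3,
                                   base: float = 1.0, cap: float = 30.0):
    """Re-invoke coro_factory() on failure, sleeping roughly base * 2**attempt
    (with jitter, capped) between attempts — the same shape as litellm's
    "exponential_backoff_retry" strategy."""
    for attempt in range(num_retries + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == num_retries:
                raise
            delay = min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)

# Hypothetical usage around the streaming closure:
#   response = await with_exponential_backoff(
#       lambda: litellm.acompletion(stream=True, **request),
#       num_retries=num_retries,
#   )
```

Because the whole closure is re-invoked, each retry opens a fresh stream rather than trying to resume a half-consumed one.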
Environment
- DSPy version: 3.1.3 (also confirmed on latest main branch)
- Python: 3.13
- litellm: 1.x