
Bedrock Provider Non-Streaming and Streaming Inference Failures #3621

@skamenan7

Description

🐛 Describe the bug

The Bedrock provider in llama-stack returns None for both streaming and non-streaming inference requests. Non-streaming requests fail with 'NoneType' object has no attribute 'choices', and streaming requests fail with TypeError: 'async for' requires an object with __aiter__ method, got NoneType.
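A minimal reproduction sketch, assuming a locally running llama-stack server exposing the OpenAI-compatible path shown in the logs (/v1/openai/v1); the base URL, port, and API key are placeholders, and the model id matches the registered Bedrock model from the logs below:

# Hypothetical reproduction; base_url and api_key are placeholders for a local
# llama-stack server with the Bedrock provider configured.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
model = "bedrock-inference/meta.llama3-1-8b-instruct-v1:0"  # registered model id from the logs
messages = [{"role": "user", "content": "Hello"}]

# Non-streaming: the server responds 500 ("'NoneType' object has no attribute 'choices'").
response = client.chat.completions.create(model=model, messages=messages)
print(response.choices[0].message.content)

# Streaming: the server responds 200, but the SSE body carries an error event
# instead of chat completion chunks.
stream = client.chat.completions.create(model=model, messages=messages, stream=True)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")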

System Info

llama-stack version: Latest main branch
Python version: 3.12.11
Provider: remote::bedrock
Model: meta.llama3-1-8b-instruct-v1:0
Region: us-east-2
Authentication: AWS credentials file

Information

  • The official example scripts
  • My own modified scripts

Error logs

Streaming Error:

TypeError: 'async for' requires an object with __aiter__ method, got NoneType
HTTP 200 is returned, but the SSE body contains only an error event:
data: {"error": {"message": "500: Internal server error: An unexpected error occurred."}}

Error Logs:

INFO     2025-09-30 10:41:56,425 uvicorn.access:476 uncategorized: ::1:36210 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200                     
ERROR    2025-09-30 10:41:56,426 llama_stack.core.server.server:204 core::server: Error in sse_generator                                              
         ╭──────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────╮
         │ /home/skamenan/git/code/llama-stack/.worktrees/bug/bedrock-2nd-error-jira-741/llama_stack/core/server/server.py:197 in sse_generator      │
         │                                                                                                                                           │
         │   194 │   event_gen = None                                                                                                                │
         │   195 │   try:                                                                                                                            │
         │   196 │   │   event_gen = await event_gen_coroutine                                                                                       │
         │ ❱ 197 │   │   async for item in event_gen:                                                                                                │
         │   198 │   │   │   yield create_sse_event(item)                                                                                            │
         │   199 │   except asyncio.CancelledError:                                                                                                  │
         │   200 │   │   logger.info("Generator cancelled")                                                                                          │
         │                                                                                                                                           │
         │ /home/skamenan/git/code/llama-stack/.worktrees/bug/bedrock-2nd-error-jira-741/llama_stack/providers/utils/telemetry/trace_protocol.py:87  │
         │ in async_gen_wrapper                                                                                                                      │
         │                                                                                                                                           │
         │    84 │   │   │   with tracing.span(f"{class_name}.{method_name}", span_attributes) as span:                                              │
         │    85 │   │   │   │   try:                                                                                                                │
         │    86 │   │   │   │   │   count = 0                                                                                                       │
         │ ❱  87 │   │   │   │   │   async for item in method(self, *args, **kwargs):                                                                │
         │    88 │   │   │   │   │   │   yield item                                                                                                  │
         │    89 │   │   │   │   │   │   count += 1                                                                                                  │
         │    90 │   │   │   │   finally:                                                                                                            │
         │                                                                                                                                           │
         │ /home/skamenan/git/code/llama-stack/.worktrees/bug/bedrock-2nd-error-jira-741/llama_stack/core/routers/inference.py:681 in                │
         │ stream_tokens_and_compute_metrics_openai_chat                                                                                             │
         │                                                                                                                                           │
         │   678 │   │   choices_data: dict[int, dict[str, Any]] = {}                                                                                │
         │   679 │   │                                                                                                                               │
         │   680 │   │   try:                                                                                                                        │
         │ ❱ 681 │   │   │   async for chunk in response:                                                                                            │
         │   682 │   │   │   │   # Skip None chunks                                                                                                  │
         │   683 │   │   │   │   if chunk is None:                                                                                                   │
         │   684 │   │   │   │   │   continue                                                                                                        │
         ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         TypeError: 'async for' requires an object with __aiter__ method, got NoneType                    
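The traceback shows the router awaiting the provider call and then iterating whatever it returned; because the Bedrock adapter hands back None rather than an async iterator, the async for at inference.py:681 fails immediately. A standalone sketch of the same failure mode (the function names are illustrative, not the actual llama-stack code):

import asyncio

async def provider_chat_completion():
    # Stands in for the Bedrock adapter: the coroutine completes normally
    # but returns None instead of an async generator of chunks.
    return None

async def router_side():
    response = await provider_chat_completion()
    # Raises: TypeError: 'async for' requires an object with __aiter__ method, got NoneType
    async for chunk in response:
        print(chunk)

asyncio.run(router_side())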

Non-Streaming Error:

ERROR: 'NoneType' object has no attribute 'choices'
HTTP 500: {"detail":"Internal server error: An unexpected error occurred."}

Error Log:

ERROR    2025-09-30 08:47:53,602 llama_stack.core.server.server:263 core::server: Error executing endpoint route='/v1/openai/v1/chat/completions'     
         method='post': 'NoneType' object has no attribute 'choices'                                                                                  
INFO     2025-09-30 08:47:53,603 console_span_processor:28 telemetry: 12:47:53.603 [START] /v1/openai/v1/chat/completions                             
INFO     2025-09-30 08:47:53,603 uvicorn.access:476 uncategorized: ::1:59550 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 500                     
INFO     2025-09-30 08:47:53,609 console_span_processor:39 telemetry: 12:47:53.604 [END] ModelsRoutingTable.get_model [StatusCode.OK] (0.03ms)        
INFO     2025-09-30 08:47:53,610 console_span_processor:48 telemetry:     output: {'identifier': 'bedrock-inference/meta.llama3-1-8b-instruct-v1:0',  
         'provider_resource_id': 'meta.llama3-1-8b-instruct-v1:0', 'provider_id': 'bedrock-inference', 'type': 'model', 'owner': None, 'source':      
         'via_register_api', 'metadata': {}, 'model_type': 'llm'}                                                                                     
INFO     2025-09-30 08:47:53,613 console_span_processor:39 telemetry: 12:47:53.611 [END] ModelsRoutingTable.get_provider_impl [StatusCode.OK] (0.04ms)
INFO     2025-09-30 08:47:53,614 console_span_processor:48 telemetry:     output:                                                                     
         <llama_stack.providers.remote.inference.bedrock.bedrock.BedrockInferenceAdapter object at 0x7fcc023409b0>                                    
INFO     2025-09-30 08:47:53,617 console_span_processor:39 telemetry: 12:47:53.615 [END] InferenceRouter.openai_chat_completion [StatusCode.OK]       
         (10.93ms)                                                                                                                                    
INFO     2025-09-30 08:47:53,618 console_span_processor:48 telemetry:     error: 'NoneType' object has no attribute 'choices'                         
INFO     2025-09-30 08:47:53,621 console_span_processor:39 telemetry: 12:47:53.619 [END] /v1/openai/v1/chat/completions [StatusCode.OK] (15.53ms)     
INFO     2025-09-30 08:47:53,621 console_span_processor:48 telemetry:     raw_path: /v1/openai/v1/chat/completions                                    
INFO     2025-09-30 08:47:53,622 console_span_processor:62 telemetry:  12:47:53.603 [ERROR] Error executing endpoint                                  
         route='/v1/openai/v1/chat/completions' method='post': 'NoneType' object has no attribute 'choices'                                           
INFO     2025-09-30 08:47:53,623 console_span_processor:62 telemetry:  12:47:53.605 [INFO] ::1:59550 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 
         500                                                                                                                                          
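The non-streaming path fails for the same underlying reason: InferenceRouter.openai_chat_completion ends its span normally (10.93ms above), so the adapter is not raising; it returns None, and the first attribute access on that value fails at the server layer. Reduced to its essentials (illustrative only, not llama-stack code):

response = None  # stands in for the Bedrock adapter's non-streaming return value

# The first attribute access reproduces the error in the log above:
# AttributeError: 'NoneType' object has no attribute 'choices'
try:
    print(response.choices[0].message.content)
except AttributeError as exc:
    print(exc)

# A guard of this shape at the adapter or router boundary would make the failure
# explicit instead of an opaque 500 (hypothetical, for illustration only):
if response is None:
    raise RuntimeError("Bedrock adapter returned None from openai_chat_completion")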

Expected behavior

Non-streaming: Should return a complete ChatCompletionResponse with generated text
Streaming: Should return server-sent events with streaming response chunks
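Put differently, the streaming path works as soon as the provider call resolves to something implementing __aiter__, never None. A corrected counterpart to the failure sketch above (names are illustrative, not the actual adapter code):

import asyncio

async def provider_chat_completion():
    # Returns an async generator of chunks instead of None.
    async def _chunks():
        for piece in ("Hello", ", ", "world"):
            yield piece
    return _chunks()

async def router_side():
    response = await provider_chat_completion()
    async for chunk in response:  # works: _chunks() implements __aiter__
        print(chunk, end="")
    print()

asyncio.run(router_side())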
