
Bedrock Provider Non-Streaming and Streaming Inference Failures #3621

@skamenan7

Description

🐛 Describe the bug

The Bedrock provider in llama-stack returns None for both streaming and non-streaming inference requests. Non-streaming requests fail with 'NoneType' object has no attribute 'choices', and streaming requests fail with TypeError: 'async for' requires an object with __aiter__ method, got NoneType.
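A minimal reproduction sketch, assuming a locally running llama-stack server exposing the OpenAI-compatible path shown in the logs (/v1/openai/v1); the base URL, port, and API key are placeholders, and the model id matches the registered Bedrock model from the logs below:

# Hypothetical reproduction; base_url and api_key are placeholders for a local
# llama-stack server with the Bedrock provider configured.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
model = "bedrock-inference/meta.llama3-1-8b-instruct-v1:0"  # registered model id from the logs
messages = [{"role": "user", "content": "Hello"}]

# Non-streaming: the server responds 500 ("'NoneType' object has no attribute 'choices'").
response = client.chat.completions.create(model=model, messages=messages)
print(response.choices[0].message.content)

# Streaming: the server responds 200, but the SSE body carries an error event
# instead of chat completion chunks.
stream = client.chat.completions.create(model=model, messages=messages, stream=True)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")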

System Info

llama-stack version: Latest main branch
Python version: 3.12.11
Provider: remote::bedrock
Model: meta.llama3-1-8b-instruct-v1:0
Region: us-east-2
Authentication: AWS credentials file

Information

  • The official example scripts
  • My own modified scripts

Error logs

Streaming Error:

TypeError: 'async for' requires an object with __aiter__ method, got NoneType
HTTP 200 is returned, but the SSE body contains only an error event:
data: {"error": {"message": "500: Internal server error: An unexpected error occurred."}}

Error Logs:

INFO     2025-09-30 10:41:56,425 uvicorn.access:476 uncategorized: ::1:36210 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200                     
ERROR    2025-09-30 10:41:56,426 llama_stack.core.server.server:204 core::server: Error in sse_generator                                              
         ╭──────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────╮
         │ /home/skamenan/git/code/llama-stack/.worktrees/bug/bedrock-2nd-error-jira-741/llama_stack/core/server/server.py:197 in sse_generator      │
         │                                                                                                                                           │
         │   194 │   event_gen = None                                                                                                                │
         │   195 │   try:                                                                                                                            │
         │   196 │   │   event_gen = await event_gen_coroutine                                                                                       │
         │ ❱ 197 │   │   async for item in event_gen:                                                                                                │
         │   198 │   │   │   yield create_sse_event(item)                                                                                            │
         │   199 │   except asyncio.CancelledError:                                                                                                  │
         │   200 │   │   logger.info("Generator cancelled")                                                                                          │
         │                                                                                                                                           │
         │ /home/skamenan/git/code/llama-stack/.worktrees/bug/bedrock-2nd-error-jira-741/llama_stack/providers/utils/telemetry/trace_protocol.py:87  │
         │ in async_gen_wrapper                                                                                                                      │
         │                                                                                                                                           │
         │    84 │   │   │   with tracing.span(f"{class_name}.{method_name}", span_attributes) as span:                                              │
         │    85 │   │   │   │   try:                                                                                                                │
         │    86 │   │   │   │   │   count = 0                                                                                                       │
         │ ❱  87 │   │   │   │   │   async for item in method(self, *args, **kwargs):                                                                │
         │    88 │   │   │   │   │   │   yield item                                                                                                  │
         │    89 │   │   │   │   │   │   count += 1                                                                                                  │
         │    90 │   │   │   │   finally:                                                                                                            │
         │                                                                                                                                           │
         │ /home/skamenan/git/code/llama-stack/.worktrees/bug/bedrock-2nd-error-jira-741/llama_stack/core/routers/inference.py:681 in                │
         │ stream_tokens_and_compute_metrics_openai_chat                                                                                             │
         │                                                                                                                                           │
         │   678 │   │   choices_data: dict[int, dict[str, Any]] = {}                                                                                │
         │   679 │   │                                                                                                                               │
         │   680 │   │   try:                                                                                                                        │
         │ ❱ 681 │   │   │   async for chunk in response:                                                                                            │
         │   682 │   │   │   │   # Skip None chunks                                                                                                  │
         │   683 │   │   │   │   if chunk is None:                                                                                                   │
         │   684 │   │   │   │   │   continue                                                                                                        │
         ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         TypeError: 'async for' requires an object with __aiter__ method, got NoneType                    
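The traceback shows the router awaiting the provider call and then iterating whatever it returned; because the Bedrock adapter hands back None rather than an async iterator, the async for at inference.py:681 fails immediately. A standalone sketch of the same failure mode (the function names are illustrative, not the actual llama-stack code):

import asyncio

async def provider_chat_completion():
    # Stands in for the Bedrock adapter: the coroutine completes normally
    # but returns None instead of an async generator of chunks.
    return None

async def router_side():
    response = await provider_chat_completion()
    # Raises: TypeError: 'async for' requires an object with __aiter__ method, got NoneType
    async for chunk in response:
        print(chunk)

asyncio.run(router_side())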

Non-Streaming Error:

ERROR: 'NoneType' object has no attribute 'choices'
HTTP 500: {"detail":"Internal server error: An unexpected error occurred."}

Error Log:

ERROR    2025-09-30 08:47:53,602 llama_stack.core.server.server:263 core::server: Error executing endpoint route='/v1/openai/v1/chat/completions'     
         method='post': 'NoneType' object has no attribute 'choices'                                                                                  
INFO     2025-09-30 08:47:53,603 console_span_processor:28 telemetry: 12:47:53.603 [START] /v1/openai/v1/chat/completions                             
INFO     2025-09-30 08:47:53,603 uvicorn.access:476 uncategorized: ::1:59550 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 500                     
INFO     2025-09-30 08:47:53,609 console_span_processor:39 telemetry: 12:47:53.604 [END] ModelsRoutingTable.get_model [StatusCode.OK] (0.03ms)        
INFO     2025-09-30 08:47:53,610 console_span_processor:48 telemetry:     output: {'identifier': 'bedrock-inference/meta.llama3-1-8b-instruct-v1:0',  
         'provider_resource_id': 'meta.llama3-1-8b-instruct-v1:0', 'provider_id': 'bedrock-inference', 'type': 'model', 'owner': None, 'source':      
         'via_register_api', 'metadata': {}, 'model_type': 'llm'}                                                                                     
INFO     2025-09-30 08:47:53,613 console_span_processor:39 telemetry: 12:47:53.611 [END] ModelsRoutingTable.get_provider_impl [StatusCode.OK] (0.04ms)
INFO     2025-09-30 08:47:53,614 console_span_processor:48 telemetry:     output:                                                                     
         <llama_stack.providers.remote.inference.bedrock.bedrock.BedrockInferenceAdapter object at 0x7fcc023409b0>                                    
INFO     2025-09-30 08:47:53,617 console_span_processor:39 telemetry: 12:47:53.615 [END] InferenceRouter.openai_chat_completion [StatusCode.OK]       
         (10.93ms)                                                                                                                                    
INFO     2025-09-30 08:47:53,618 console_span_processor:48 telemetry:     error: 'NoneType' object has no attribute 'choices'                         
INFO     2025-09-30 08:47:53,621 console_span_processor:39 telemetry: 12:47:53.619 [END] /v1/openai/v1/chat/completions [StatusCode.OK] (15.53ms)     
INFO     2025-09-30 08:47:53,621 console_span_processor:48 telemetry:     raw_path: /v1/openai/v1/chat/completions                                    
INFO     2025-09-30 08:47:53,622 console_span_processor:62 telemetry:  12:47:53.603 [ERROR] Error executing endpoint                                  
         route='/v1/openai/v1/chat/completions' method='post': 'NoneType' object has no attribute 'choices'                                           
INFO     2025-09-30 08:47:53,623 console_span_processor:62 telemetry:  12:47:53.605 [INFO] ::1:59550 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 
         500                                                                                                                                          
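The non-streaming path fails for the same underlying reason: InferenceRouter.openai_chat_completion ends its span normally (10.93ms above), so the adapter is not raising; it returns None, and the first attribute access on that value fails at the server layer. Reduced to its essentials (illustrative only, not llama-stack code):

response = None  # stands in for the Bedrock adapter's non-streaming return value

# The first attribute access reproduces the error in the log above:
# AttributeError: 'NoneType' object has no attribute 'choices'
try:
    print(response.choices[0].message.content)
except AttributeError as exc:
    print(exc)

# A guard of this shape at the adapter or router boundary would make the failure
# explicit instead of an opaque 500 (hypothetical, for illustration only):
if response is None:
    raise RuntimeError("Bedrock adapter returned None from openai_chat_completion")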

Expected behavior

Non-streaming: Should return a complete ChatCompletionResponse with generated text
Streaming: Should return server-sent events with streaming response chunks
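Put differently, the streaming path works as soon as the provider call resolves to something implementing __aiter__, never None. A corrected counterpart to the failure sketch above (names are illustrative, not the actual adapter code):

import asyncio

async def provider_chat_completion():
    # Returns an async generator of chunks instead of None.
    async def _chunks():
        for piece in ("Hello", ", ", "world"):
            yield piece
    return _chunks()

async def router_side():
    response = await provider_chat_completion()
    async for chunk in response:  # works: _chunks() implements __aiter__
        print(chunk, end="")
    print()

asyncio.run(router_side())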
