Support Tool Calling with Llama 3.3 on Bedrock #1649

Closed

tamir-alltrue-ai opened this issue May 5, 2025 · 7 comments
Assignees
Labels
question Further information is requested

Comments

@tamir-alltrue-ai commented May 5, 2025

Question

Description

I'm using BedrockConverseModel to call models hosted on Bedrock. When working with Llama 3.3 70B, tool calls do not work.

This seems similar to #1623, and following @DouweM's advice I've attempted to create a custom GenerateToolJsonSchema to match Llama 3.3's tool-call schema (which is the same as Llama 3.1's), but found it confusing. I'm hoping someone with more knowledge of what's going on under the hood can help me out here. Considering that Llama 3 is one of the top 5 most popular LLMs, I hope others would find this useful too.

Code snippet

Below is an example showing that tools work fine with Anthropic on Bedrock, but not with Llama. This is how I convinced myself it's not a problem with Bedrock specifically or with my prompt, but rather a difference in the underlying model.

from pydantic_ai.providers.bedrock import BedrockProvider
from pydantic_ai.models.bedrock import BedrockConverseModel
import boto3
import random

from pydantic_ai import RunContext, Agent

# create an agent with the model id parameterized.
# uses the "dice rolling" example from the pydantic ai docs
# https://ai.pydantic.dev/tools/#registering-function-tools-via-agent-argument
def agent_with_tool_use(model: str):
    
    bedrock_client = boto3.client("bedrock-runtime")
    model = BedrockConverseModel(
        model,
        provider=BedrockProvider(bedrock_client=bedrock_client),
    )
    
    agent = Agent(
        model,
        system_prompt=(
            "You're a dice game, you should roll the die and see if the number "
            "you get back matches the user's guess. If so, tell them they're a winner. "
            "Use the player's name in the response."
        ),
        instrument=True,
    )
    
    
    @agent.tool_plain  
    def roll_die() -> str:
        """Roll a six-sided die and return the result."""
        return str(random.randint(1, 6))
    
    
    @agent.tool  
    def get_player_name(ctx: RunContext[str]) -> str:
        """Get the player's name."""
        return ctx.deps
    
    return agent


# when calling with anthropic, tools are invoked
anthropic_agent = agent_with_tool_use("anthropic.claude-3-5-sonnet-20241022-v2:0")
anthropic_agent.run_sync('My guess is 4', deps='Anne')
#  > AgentRunResult(output='Sorry Anne! You guessed 4, but the die rolled a 5. Better luck next time!')


# when calling with llama, it does not invoke the tool
llama_agent = agent_with_tool_use("us.meta.llama3-3-70b-instruct-v1:0")
llama_agent.run_sync('My guess is 4', deps='Anne')
#  > AgentRunResult(output='{"type": "function", "name": "roll_die", "parameters": {}}') 

Any advice, code snippets, or help on resolving this would be much appreciated. Thanks.

Logfire output

When calling Anthropic, you can see that the tool is registered and called:

[Screenshot]

Full conversation JSON
{
    "agent_name": "anthropic_agent",
    "all_messages_events": [
        {
            "content": "You're a dice game, you should roll the die and see if the number you get back matches the user's guess. If so, tell them they're a winner. Use the player's name in the response.",
            "role": "system",
            "gen_ai.message.index": 0,
            "event.name": "gen_ai.system.message"
        },
        {
            "content": "My guess is 4",
            "role": "user",
            "gen_ai.message.index": 0,
            "event.name": "gen_ai.user.message"
        },
        {
            "role": "assistant",
            "content": "Let me get your name and roll the die to see if you guessed correctly!",
            "tool_calls": [
                {
                    "id": "tooluse_el55vVyWT2WM-h6oQo1zDA",
                    "type": "function",
                    "function": {
                        "name": "get_player_name",
                        "arguments": {}
                    }
                }
            ],
            "gen_ai.message.index": 1,
            "event.name": "gen_ai.assistant.message"
        },
        {
            "content": "Anne",
            "role": "tool",
            "id": "tooluse_el55vVyWT2WM-h6oQo1zDA",
            "name": "get_player_name",
            "gen_ai.message.index": 2,
            "event.name": "gen_ai.tool.message",
            "functionName": "get_player_name"
        },
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "tooluse_dWbCrxm6TteRD-S8e-0eVQ",
                    "type": "function",
                    "function": {
                        "name": "roll_die",
                        "arguments": {}
                    }
                }
            ],
            "gen_ai.message.index": 3,
            "event.name": "gen_ai.assistant.message"
        },
        {
            "content": "2",
            "role": "tool",
            "id": "tooluse_dWbCrxm6TteRD-S8e-0eVQ",
            "name": "roll_die",
            "gen_ai.message.index": 4,
            "event.name": "gen_ai.tool.message",
            "functionName": "roll_die"
        },
        {
            "role": "assistant",
            "content": "Sorry Anne! You guessed 4, but the die landed on 2. Better luck next time!",
            "gen_ai.message.index": 5,
            "event.name": "gen_ai.assistant.message"
        }
    ],
    "final_result": "Sorry Anne! You guessed 4, but the die landed on 2. Better luck next time!",
    "gen_ai.usage.input_tokens": 1592,
    "gen_ai.usage.output_tokens": 119,
    "model_name": "anthropic.claude-3-5-sonnet-20241022-v2:0"
}

But when we call with Llama, the tool does not seem to be correctly registered and it is not invoked:

[Screenshot]
Full conversation history
{
    "agent_name": "llama_agent",
    "all_messages_events": [
        {
            "content": "You're a dice game, you should roll the die and see if the number you get back matches the user's guess. If so, tell them they're a winner. Use the player's name in the response.",
            "role": "system",
            "gen_ai.message.index": 0,
            "event.name": "gen_ai.system.message"
        },
        {
            "content": "My guess is 4",
            "role": "user",
            "gen_ai.message.index": 0,
            "event.name": "gen_ai.user.message"
        },
        {
            "role": "assistant",
            "content": "{\"type\": \"function\", \"name\": \"roll_die\", \"parameters\": {}}",
            "gen_ai.message.index": 1,
            "event.name": "gen_ai.assistant.message"
        }
    ],
    "final_result": {
        "type": "function",
        "name": "roll_die",
        "parameters": {}
    },
    "gen_ai.usage.input_tokens": 242,
    "gen_ai.usage.output_tokens": 19,
    "model_name": "us.meta.llama3-3-70b-instruct-v1:0"
}

Configuration

Relevant libraries

pydantic              2.10.6
pydantic-ai           0.0.55
pydantic-ai-slim      0.1.9
pydantic_core         2.27.2
pydantic-evals        0.0.55
pydantic-graph        0.1.9
pydantic-settings     2.9.1
boto3                 1.38.8
boto3-stubs           1.34.162
botocore              1.38.8
botocore-stubs        1.37.38

Python version 3.11.6

Additional Context

No response

tamir-alltrue-ai added the question label May 5, 2025
@DouweM (Contributor) commented May 6, 2025

@tamir-alltrue-ai It seems like the tool is being registered with the model, and the model is trying to call it, but as the second screenshot and JSON snippet show, it is doing so through a regular text message containing JSON rather than through a special tool_calls property on the message like Anthropic does. Since we're not getting an error and the model uses the correct tool function name, I don't think this is something that can be solved by modifying the tool registration JSON schema.

(The issue where I recommended overriding the schema generation mentioned "All other fields are disallowed and result in a hard API error.", which is not the case here.)

To debug this further, I'd like to see the complete HTTP response to the request where the model is supposed to explicitly pass tool calls, but is passing regular text instead. It's possible there's another property to indicate that PydanticAI should be interpreting the message as a tool call.

Can you please add the following to the top of your code?

import logfire
logfire.configure()
logfire.instrument_httpx(capture_all=True)

Then, from the traces, I'd like to see the response JSON of the httpx trace that will be under chat us.meta.....

DouweM self-assigned this May 6, 2025
@tamir-alltrue-ai (Author)

Thanks for your response @DouweM

It turns out that boto3 doesn't use httpx or requests; it uses botocore, which in turn uses urllib3 under the hood (source).

To my knowledge there is no Logfire instrumentation for urllib3, so I just hooked the built-in logging into Logfire to try and see what botocore is doing:

import logfire
from logging import DEBUG, basicConfig

from logfire.integrations.logging import LogfireLoggingHandler

logfire_handler = LogfireLoggingHandler()
logfire.configure(send_to_logfire="if-token-present", scrubbing=False, metrics=False)

basicConfig(handlers=[logfire_handler], level=DEBUG)

I found this log of the response body to the POST request made by botocore:

Response body:
b'{"metrics":{"latencyMs":426},"output":{"message":{"content":[{"text":"{\\"type\\": \\"function\\", \\"name\\": \\"roll_die\\", \\"parameters\\": {}}"}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":242,"outputTokens":19,"totalTokens":261}}'

Or, parsed:

{
  "metrics": {
    "latencyMs": 426
  },
  "output": {
    "message": {
      "content": [
        {
          "text": "{\"type\": \"function\", \"name\": \"roll_die\", \"parameters\": {}}"
        }
      ],
      "role": "assistant"
    }
  },
  "stopReason": "end_turn",
  "usage": {
    "inputTokens": 242,
    "outputTokens": 19,
    "totalTokens": 261
  }
}

Again you can see that the output is serialized JSON for the function call.
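For what it's worth, here is a minimal sketch (plain Python, not a PydanticAI hook) of how such a text response could be detected and parsed into a tool call as a stopgap, assuming the text is exactly the JSON shape shown above:

import json

def try_parse_text_tool_call(text: str) -> dict | None:
    """Parse a Llama-style text tool call like the one in the response above."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(payload, dict) and payload.get("type") == "function" and "name" in payload:
        return {"name": payload["name"], "arguments": payload.get("parameters", {})}
    return None

# try_parse_text_tool_call('{"type": "function", "name": "roll_die", "parameters": {}}')
# -> {'name': 'roll_die', 'arguments': {}}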

@tamir-alltrue-ai (Author)

@DouweM I've looked into this a bit further, and I'm starting to think it's a limitation of Bedrock. According to this:

Claude, Command and Mistral Large models supports native function calling through AWS Bedrock Converse... Not all models from AWS Bedrock support function calling and the Converse API.

As a result, I tried using Mistral instead of Llama. I get this error from the Bedrock API:

{\"message\":\"Expected toolResult blocks at messages.2.content for the following Ids: tooluse_1XnGK0qiT1WBzqTOqA0NYA\"}

It looks as if the tool is being invoked correctly, but when the results are passed back to the model they are in a format Bedrock doesn't expect.

[Screenshot]

According to the Bedrock documentation, the tool result should be fed back to the model after it's invoked in the following format:

{
    "role": "user",
    "content": [
        {
            "toolResult": {
                "toolUseId": "tooluse_kZJMlvQmRJ6eAyJE5GIl7Q",
                "content": [
                    {
                        "json": {
                            "song": "Elemental Hotel",
                            "artist": "8 Storey Hike"
                        }
                    }
                ]
            }
        }
    ]
}

Is this something that I can fix by overriding the default GenerateToolJsonSchema or some other way?

@DouweM (Contributor) commented May 8, 2025

@tamir-alltrue-ai We build the toolResult block here:

{
    'role': 'user',
    'content': [
        {
            'toolResult': {
                'toolUseId': part.tool_call_id,
                'content': [{'text': part.model_response_str()}],
                'status': 'success',
            }
        }
    ],
}

As you can see, the difference from your example is that we pass {'text': '<json string>'} instead of {'json': <json>}.

Can you try changing that bit from [{'text': part.model_response_str()}] to [{'json': part.model_response_object()}] and seeing if that works?

The tricky thing we're seeing here is that Bedrock runs many different models, and we only have one Bedrock model class that aims to work with all of them. So while this change may work for one model, it may break others. We may need to start implementing some feature flags inside the Bedrock model so we can switch behaviors based on the specific model used.
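Purely as an illustration of that feature-flag idea (the helper and attribute names here are hypothetical, not actual PydanticAI internals), the switch could look roughly like this:

# Hypothetical sketch: choose the toolResult content shape per Bedrock model family.
# None of these names exist in PydanticAI today; this only illustrates the idea.
def tool_result_content(model_id: str, part) -> list[dict]:
    if model_id.startswith(('mistral.', 'meta.llama', 'us.meta.llama')):
        # Some model families may expect a 'json' block (per the Bedrock docs above)
        return [{'json': part.model_response_object()}]
    # Current behavior: a 'text' block, which the thread shows works for Anthropic and Nova
    return [{'text': part.model_response_str()}]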

@tamir-alltrue-ai (Author)

Thanks @DouweM,

Yeah, I agree the challenge is that different models expect different formats. The change you propose is definitely a breaking one: the tests start failing (Nova stops working for structured output) and Anthropic also stops working for tool calling. I also haven't seen it fix the problem with Llama; in the end the assistant still returns:

{"type": "function", "name": "roll_die", "parameters": {}}

But the function is not called.

[Screenshot]

@tamir-alltrue-ai (Author) commented May 8, 2025

If supporting different models differently in Bedrock becomes a new undertaking, I would like to strongly express my interest in it. Our company is trying hard to adopt PydanticAI, but due to data-sharing concerns we need to stay within the AWS environment, and this is a big blocker for us in adopting pydantic-ai.

My intuition is that treating Bedrock as its own model (like OpenAI, Anthropic, etc.) is not quite the right abstraction. Rather, it's just a place where you can host models from Anthropic and others. So I would expect that using Anthropic on Bedrock would use the AnthropicModel class with the httpx client overridden, or perhaps a subclass of AnthropicModel for Bedrock that only changes how requests are made, but nothing about the input/output parsing.

In any case, if there's some other hack I can try in the meantime, or something I can do to help bring attention to this issue so it's put on your roadmap promptly, I'd love to do so! Thanks for your help @DouweM.

@aasthavar

My team and I were experimenting a bit, and this setup ended up working for us:

response = client.converse(
    modelId="us.meta.llama3-3-70b-instruct-v1:0",
    messages=messages,
    system=[{"text": system_prompt}],
    inferenceConfig={
        'maxTokens': 4096,
        'temperature': 0,
        'topP': 1,
    },
    toolConfig={
        "tools": tools,
        "toolChoice": {
            "auto": {}
        }
    },
)
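For reference, the tools list passed to toolConfig here follows Bedrock Converse's toolSpec shape; a minimal entry matching the roll_die tool from the original snippet might look like this (hand-written for illustration):

# Example toolSpec entry for the `tools` list above (Bedrock Converse format);
# the roll_die schema is written by hand here to match the original snippet.
tools = [
    {
        "toolSpec": {
            "name": "roll_die",
            "description": "Roll a six-sided die and return the result.",
            "inputSchema": {"json": {"type": "object", "properties": {}}},
        }
    }
]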

@tamir-alltrue-ai feel free to give it a shot — let me know how it goes!
