Support Tool Calling with Llama 3.3 on Bedrock #1649

Closed

tamir-alltrue-ai opened this issue May 5, 2025 · 7 comments
Assignees
Labels
question Further information is requested

Comments

@tamir-alltrue-ai commented May 5, 2025

Question

Description

I'm using BedrockConverseModel to call models hosted on Bedrock. When working with Llama 3.3 70B, tool calls do not work.

This seems similar to #1623, and following @DouweM's advice I've attempted to create a custom GenerateToolJsonSchema to match Llama 3.3's tool-call schema (which is the same as Llama 3.1's), but found it confusing. I'm hoping someone with more knowledge of what's going on under the hood can help me out here. Considering that Llama 3 is one of the top 5 most popular LLMs, I hope others would find this useful too.

Code snippet

Below is an example showing that tools work fine with Anthropic on Bedrock, but not with Llama. This is how I convinced myself it's not a problem with Bedrock specifically or with my prompt, but rather a difference in the underlying model.

from pydantic_ai.providers.bedrock import BedrockProvider
from pydantic_ai.models.bedrock import BedrockConverseModel
import boto3
import random

from pydantic_ai import RunContext, Agent

# create an agent with the model id parameterized.
# uses the "dice rolling" example from the pydantic ai docs
# https://ai.pydantic.dev/tools/#registering-function-tools-via-agent-argument
def agent_with_tool_use(model: str):
    
    bedrock_client = boto3.client("bedrock-runtime")
    model = BedrockConverseModel(
        model,
        provider=BedrockProvider(bedrock_client=bedrock_client),
    )
    
    agent = Agent(
        model,
        system_prompt=(
            "You're a dice game, you should roll the die and see if the number "
            "you get back matches the user's guess. If so, tell them they're a winner. "
            "Use the player's name in the response."
        ),
        instrument=True,
    )
    
    
    @agent.tool_plain  
    def roll_die() -> str:
        """Roll a six-sided die and return the result."""
        return str(random.randint(1, 6))
    
    
    @agent.tool  
    def get_player_name(ctx: RunContext[str]) -> str:
        """Get the player's name."""
        return ctx.deps
    
    return agent


# when calling with anthropic, tools are invoked
anthropic_agent = agent_with_tool_use("anthropic.claude-3-5-sonnet-20241022-v2:0")
anthropic_agent.run_sync('My guess is 4', deps='Anne')
#  > AgentRunResult(output='Sorry Anne! You guessed 4, but the die rolled a 5. Better luck next time!')


# when calling with llama, it does not invoke the tool
llama_agent = agent_with_tool_use("us.meta.llama3-3-70b-instruct-v1:0")
llama_agent.run_sync('My guess is 4', deps='Anne')
#  > AgentRunResult(output='{"type": "function", "name": "roll_die", "parameters": {}}') 

Any advice, code snippets, or help on resolving this would be much appreciated. Thanks.

Logfire output

When calling Anthropic, you can see that the tool is registered and called:

[Screenshot]

Full conversation JSON
{
    "agent_name": "anthropic_agent",
    "all_messages_events": [
        {
            "content": "You're a dice game, you should roll the die and see if the number you get back matches the user's guess. If so, tell them they're a winner. Use the player's name in the response.",
            "role": "system",
            "gen_ai.message.index": 0,
            "event.name": "gen_ai.system.message"
        },
        {
            "content": "My guess is 4",
            "role": "user",
            "gen_ai.message.index": 0,
            "event.name": "gen_ai.user.message"
        },
        {
            "role": "assistant",
            "content": "Let me get your name and roll the die to see if you guessed correctly!",
            "tool_calls": [
                {
                    "id": "tooluse_el55vVyWT2WM-h6oQo1zDA",
                    "type": "function",
                    "function": {
                        "name": "get_player_name",
                        "arguments": {}
                    }
                }
            ],
            "gen_ai.message.index": 1,
            "event.name": "gen_ai.assistant.message"
        },
        {
            "content": "Anne",
            "role": "tool",
            "id": "tooluse_el55vVyWT2WM-h6oQo1zDA",
            "name": "get_player_name",
            "gen_ai.message.index": 2,
            "event.name": "gen_ai.tool.message",
            "functionName": "get_player_name"
        },
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "tooluse_dWbCrxm6TteRD-S8e-0eVQ",
                    "type": "function",
                    "function": {
                        "name": "roll_die",
                        "arguments": {}
                    }
                }
            ],
            "gen_ai.message.index": 3,
            "event.name": "gen_ai.assistant.message"
        },
        {
            "content": "2",
            "role": "tool",
            "id": "tooluse_dWbCrxm6TteRD-S8e-0eVQ",
            "name": "roll_die",
            "gen_ai.message.index": 4,
            "event.name": "gen_ai.tool.message",
            "functionName": "roll_die"
        },
        {
            "role": "assistant",
            "content": "Sorry Anne! You guessed 4, but the die landed on 2. Better luck next time!",
            "gen_ai.message.index": 5,
            "event.name": "gen_ai.assistant.message"
        }
    ],
    "final_result": "Sorry Anne! You guessed 4, but the die landed on 2. Better luck next time!",
    "gen_ai.usage.input_tokens": 1592,
    "gen_ai.usage.output_tokens": 119,
    "model_name": "anthropic.claude-3-5-sonnet-20241022-v2:0"
}

But when we call with Llama, the tool does not seem to be correctly registered and it is not invoked:

[Screenshot]
Full conversation history
{
    "agent_name": "llama_agent",
    "all_messages_events": [
        {
            "content": "You're a dice game, you should roll the die and see if the number you get back matches the user's guess. If so, tell them they're a winner. Use the player's name in the response.",
            "role": "system",
            "gen_ai.message.index": 0,
            "event.name": "gen_ai.system.message"
        },
        {
            "content": "My guess is 4",
            "role": "user",
            "gen_ai.message.index": 0,
            "event.name": "gen_ai.user.message"
        },
        {
            "role": "assistant",
            "content": "{\"type\": \"function\", \"name\": \"roll_die\", \"parameters\": {}}",
            "gen_ai.message.index": 1,
            "event.name": "gen_ai.assistant.message"
        }
    ],
    "final_result": {
        "type": "function",
        "name": "roll_die",
        "parameters": {}
    },
    "gen_ai.usage.input_tokens": 242,
    "gen_ai.usage.output_tokens": 19,
    "model_name": "us.meta.llama3-3-70b-instruct-v1:0"
}

Configuration

Relevant libraries

pydantic              2.10.6
pydantic-ai           0.0.55
pydantic-ai-slim      0.1.9
pydantic_core         2.27.2
pydantic-evals        0.0.55
pydantic-graph        0.1.9
pydantic-settings     2.9.1
boto3                 1.38.8
boto3-stubs           1.34.162
botocore              1.38.8
botocore-stubs        1.37.38

Python version 3.11.6

Additional Context

No response

tamir-alltrue-ai added the question label May 5, 2025
@DouweM (Contributor) commented May 6, 2025

@tamir-alltrue-ai It seems like the tool is being registered with the model, and the model is trying to call it, but as the second screenshot and JSON snippet show, it is doing so through a regular text message containing JSON rather than through a special tool_calls property on the message like Anthropic does. Since we're not getting an error and the model uses the correct tool function name, I don't think this is something that can be solved by modifying the tool registration JSON schema.

(The issue where I recommended overriding the schema generation mentioned "All other fields are disallowed and result in a hard API error.", which is not the case here.)

To debug this further, I'd like to see the complete HTTP response to the request where the model is supposed to explicitly pass tool calls, but is passing regular text instead. It's possible there's another property to indicate that PydanticAI should be interpreting the message as a tool call.

Can you please add the following to the top of your code?

import logfire
logfire.configure()
logfire.instrument_httpx(capture_all=True)

Then, from the traces, I'd like to see the response JSON of the httpx trace that will be under chat us.meta.....

DouweM self-assigned this May 6, 2025
@tamir-alltrue-ai (Author)

Thanks for your response @DouweM

It turns out that boto3 doesn't use httpx or requests; it uses botocore, which in turn uses urllib3 under the hood (source).

To my knowledge there is no Logfire instrumentation for urllib3, so I just hooked the built-in logging into Logfire to try and see what botocore is doing:

import logfire
from logging import DEBUG, basicConfig

from logfire.integrations.logging import LogfireLoggingHandler

logfire_handler = LogfireLoggingHandler()
logfire.configure(send_to_logfire="if-token-present", scrubbing=False, metrics=False)

basicConfig(handlers=[logfire_handler], level=DEBUG)

I found this log of the response body to the POST request made by botocore:

Response body:
b'{"metrics":{"latencyMs":426},"output":{"message":{"content":[{"text":"{\\"type\\": \\"function\\", \\"name\\": \\"roll_die\\", \\"parameters\\": {}}"}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":242,"outputTokens":19,"totalTokens":261}}'

Or, parsed:

{
  "metrics": {
    "latencyMs": 426
  },
  "output": {
    "message": {
      "content": [
        {
          "text": "{\"type\": \"function\", \"name\": \"roll_die\", \"parameters\": {}}"
        }
      ],
      "role": "assistant"
    }
  },
  "stopReason": "end_turn",
  "usage": {
    "inputTokens": 242,
    "outputTokens": 19,
    "totalTokens": 261
  }
}

Again you can see that the output is serialized JSON for the function call.
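For what it's worth, here is a minimal sketch (plain Python, not a PydanticAI hook) of how such a text response could be detected and parsed into a tool call as a stopgap, assuming the text is exactly the JSON shape shown above:

import json

def try_parse_text_tool_call(text: str) -> dict | None:
    """Parse a Llama-style text tool call like the one in the response above."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(payload, dict) and payload.get("type") == "function" and "name" in payload:
        return {"name": payload["name"], "arguments": payload.get("parameters", {})}
    return None

# try_parse_text_tool_call('{"type": "function", "name": "roll_die", "parameters": {}}')
# -> {'name': 'roll_die', 'arguments': {}}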

@tamir-alltrue-ai (Author)

@DouweM I've looked into this a bit further, and I'm starting to think it's a limitation of Bedrock. According to this:

Claude, Command and Mistral Large models supports native function calling through AWS Bedrock Converse... Not all models from AWS Bedrock support function calling and the Converse API.

As a result, I tried using Mistral instead of Llama. I get this error from the Bedrock API:

{\"message\":\"Expected toolResult blocks at messages.2.content for the following Ids: tooluse_1XnGK0qiT1WBzqTOqA0NYA\"}

It looks as if the tool is being invoked correctly, but when the results are passed back to the model they are in a format Bedrock doesn't expect.

[Screenshot]

According to the Bedrock documentation, the tool result should be fed back to the model after it's invoked in the following format:

{
    "role": "user",
    "content": [
        {
            "toolResult": {
                "toolUseId": "tooluse_kZJMlvQmRJ6eAyJE5GIl7Q",
                "content": [
                    {
                        "json": {
                            "song": "Elemental Hotel",
                            "artist": "8 Storey Hike"
                        }
                    }
                ]
            }
        }
    ]
}

Is this something that I can fix by overriding the default GenerateToolJsonSchema or some other way?

@DouweM (Contributor) commented May 8, 2025

@tamir-alltrue-ai We build the toolResult block here:

{
    'role': 'user',
    'content': [
        {
            'toolResult': {
                'toolUseId': part.tool_call_id,
                'content': [{'text': part.model_response_str()}],
                'status': 'success',
            }
        }
    ],
}

As you can see, the difference from your example is that we pass {'text': '<json string>'} instead of {'json': <json>}.

Can you try changing that bit from [{'text': part.model_response_str()}] to [{'json': part.model_response_object()}] and seeing if that works?

The tricky thing we're seeing here is that Bedrock runs many different models, and we only have one Bedrock model class that aims to work with all of them. So while this change may work for one model, it may break others. We may need to start implementing some feature flags inside the Bedrock model so we can switch behaviors based on the specific model used.
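Purely as an illustration of that feature-flag idea (the helper and attribute names here are hypothetical, not actual PydanticAI internals), the switch could look roughly like this:

# Hypothetical sketch: choose the toolResult content shape per Bedrock model family.
# None of these names exist in PydanticAI today; this only illustrates the idea.
def tool_result_content(model_id: str, part) -> list[dict]:
    if model_id.startswith(('mistral.', 'meta.llama', 'us.meta.llama')):
        # Some model families may expect a 'json' block (per the Bedrock docs above)
        return [{'json': part.model_response_object()}]
    # Current behavior: a 'text' block, which the thread shows works for Anthropic and Nova
    return [{'text': part.model_response_str()}]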

@tamir-alltrue-ai (Author)

Thanks @DouweM,

Yeah, I agree the challenge is that different models expect different formats. The change you propose is definitely a breaking one: the tests start failing (Nova stops working for structured output) and Anthropic also stops working for tool calling. I also haven't seen it fix the problem with Llama; in the end the assistant still returns:

{"type": "function", "name": "roll_die", "parameters": {}}

But the function is not called.

[Screenshot]

@tamir-alltrue-ai (Author) commented May 8, 2025

If supporting different models differently in Bedrock becomes a new undertaking, I would like to strongly express my interest in it. Our company is trying hard to adopt PydanticAI, but due to data-sharing concerns we need to stay within the AWS environment, and this is a big blocker for us in adopting pydantic-ai.

My intuition is that treating Bedrock as its own model (like OpenAI, Anthropic, etc.) is not quite the right abstraction. Rather, it's just a place where you can host models from Anthropic and others. So I would expect that using Anthropic on Bedrock would use the AnthropicModel class with the httpx client overridden, or perhaps a subclass of AnthropicModel for Bedrock that only changes how requests are made, but nothing about the input/output parsing.

In any case, if there's some other hack I can try in the meantime, or something I can do to help bring attention to this issue so it's put on your roadmap promptly, I'd love to do so! Thanks for your help @DouweM.

@aasthavar

My team and I were experimenting a bit, and this setup ended up working for us:

response = client.converse(
    modelId="us.meta.llama3-3-70b-instruct-v1:0",
    messages=messages,
    system=[{"text": system_prompt}],
    inferenceConfig={
        'maxTokens': 4096,
        'temperature': 0,
        'topP': 1,
    },
    toolConfig={
        "tools": tools,
        "toolChoice": {
            "auto": {}
        }
    },
)
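For reference, the tools list passed to toolConfig here follows Bedrock Converse's toolSpec shape; a minimal entry matching the roll_die tool from the original snippet might look like this (hand-written for illustration):

# Example toolSpec entry for the `tools` list above (Bedrock Converse format);
# the roll_die schema is written by hand here to match the original snippet.
tools = [
    {
        "toolSpec": {
            "name": "roll_die",
            "description": "Roll a six-sided die and return the result.",
            "inputSchema": {"json": {"type": "object", "properties": {}}},
        }
    }
]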

@tamir-alltrue-ai feel free to give it a shot — let me know how it goes!
