Tool Calls Occasionally Fail Silently or Return Incomplete Outputs (Groq + Pydantic AI Integration) #1714

Closed as not planned
@Rikhil-Nell

Description

Question

Hi team,

I've been working with pydantic_ai for some time now, integrating it into an agentic architecture that uses both Groq (LLaMA 3.3 70B Versatile) and OpenAI as providers.

Here's the repository where the full setup is implemented:
🔗 https://github.com/Rikhil-Nell/Multi-Agentic-RAG

Here is a link to the deployed Streamlit app as well:
🔗 Streamlit App
(Please note: I'm a student with very limited API credits, so please be mindful when testing.)

The agent is fairly minimal right now. It uses two tools (wiring sketched after this list):

  • A RAG vector search tool (retrieve_relevant_documentation)
  • A dictionary lookup tool (call_dictionary, using the Merriam-Webster API)
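
For context, here is roughly how the agent and tools are wired up. This is a minimal sketch, not the exact code from the repo; the real tool bodies call the vector store and the Merriam-Webster API:

```python
from pydantic_ai import Agent

# Minimal sketch of the setup (system prompt abbreviated; full text below).
agent = Agent(
    "groq:llama-3.3-70b-versatile",
    system_prompt="Be concise, reply professionally. ...",
)


@agent.tool_plain
async def call_dictionary(word: str) -> str:
    """Look up a word's definition."""
    # Real implementation calls the Merriam-Webster API; stubbed for brevity.
    return f"definition of {word!r} (stub)"


@agent.tool_plain
async def retrieve_relevant_documentation(query: str) -> str:
    """RAG vector search over the indexed documents."""
    # Real implementation embeds the query and searches the vector store.
    return f"documentation relevant to {query!r} (stub)"
```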

❗ The Problem

Tool usage is highly inconsistent. At times everything works perfectly: the agent recognizes intent, calls the right tool, parses arguments, and returns the result correctly.

However, during certain stretches (seemingly at random), the model stops making actual tool calls and instead returns placeholder-style output like:

<function=call_dictionary({"word": "suburbs"})</function>

This happens despite:

  • The tool functions being cleanly defined with @tool decorators
  • Well-structured prompts that explicitly direct the model to use tools when needed
  • The same codebase and logic succeeding at other times

A sample system prompt looks like this:

Be concise, reply professionally. Use the tool call_dictionary when asked to define a word. Use retrieve_relevant_documentation for queries out of scope. Never respond with the tool call string; only invoke tools directly and wait for the result. Never fabricate an answer. Always begin with RAG if unsure.

I've verified that:

  • The code path is not skipping function execution (see the message-history sketch after this list)
  • No exceptions are thrown during successful runs
  • The issue does not seem to stem from the tool logic itself
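
For diagnosis, this is roughly how the run's message history can be inspected to tell a real tool call apart from text that merely looks like one (a sketch assuming the part types in pydantic_ai.messages behave as documented):

```python
from pydantic_ai.messages import ModelResponse, TextPart, ToolCallPart

result = agent.run_sync("define the word suburbs")

# Walk the message history: a healthy run shows a ToolCallPart for
# call_dictionary, while a broken run shows the "<function=...>" string
# inside a plain TextPart, i.e. the model never actually invoked the tool.
for message in result.all_messages():
    if isinstance(message, ModelResponse):
        for part in message.parts:
            if isinstance(part, ToolCallPart):
                print("tool call:", part.tool_name, part.args)
            elif isinstance(part, TextPart):
                print("text:", part.content)
```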

It feels like either:

  • The model isn't following the system prompt consistently, or
  • Something is flaky in the tool-call orchestration layer in the pydantic_ai/Groq integration

💡 Questions / Help Needed

  • Is this a known limitation when using Groq models through pydantic_ai?
  • Are there internal retry mechanisms or validation steps I can hook into? (See the sketch after this list for the kind of hook I mean.)
  • Are tool calls non-deterministic across inference providers?
  • How can I debug cases where the model outputs a tool-call string instead of invoking the function?
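
On the retry question, this is the kind of hook I'm imagining. A sketch, assuming the result_validator decorator and the ModelRetry exception in 0.1.x work as documented:

```python
from pydantic_ai import ModelRetry


@agent.result_validator
async def reject_fake_tool_calls(result: str) -> str:
    # If the model wrote the tool call out as text instead of actually
    # invoking the tool, send the run back for another attempt.
    if "<function=" in result:
        raise ModelRetry(
            "Do not write the tool call out as text; invoke the tool instead."
        )
    return result
```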

I'm not sure whether this is a problem with the model, the inference provider, pydantic_ai, or my orchestration, but I'd greatly appreciate any pointers or support.


Yes, I used AI to help write this. I'm too frustrated right now to write a good issue on my own, so I apologize; I will provide any and all information required for diagnosis if it means getting my code to work.

Additional Context

pydantic_ai version: 0.1.10
Python version: 3.13
