[Bug] dspy.ReAct logs a spurious execution error when the model provides final structured outputs in finish #9424

@supermancantfly007

Description

What happened?

dspy.ReAct injects a built-in finish tool with no arguments (args={}), but the agent prompt tells the model to produce
next_thought, next_tool_name, and next_tool_args "also when finishing the task".

When the signature has structured output fields, some models naturally place the final output values into next_tool_args for the
finish step. For example, with a signature that includes fields like:

  • type
  • start_expression
  • end_expression
  • granularity
  • confidence

the model may emit something like:

```json
{
  "next_tool_name": "finish",
  "next_tool_args": {
    "type": "relative",
    "start_expression": "-1 week",
    "end_expression": "-1 week",
    "granularity": "week",
    "confidence": 0.95
  }
}
```

At runtime, DSPy validates finish against args={} and records an execution error similar to:

ValueError: Arg type is not in the tool's args.

However, ReAct still breaks out of the loop after finish and then runs the extraction step, so the final prediction can still be
correct.
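
To make the failure mode concrete, here is a minimal, self-contained sketch (plain Python, not DSPy's actual implementation) of the kind of arg check that produces this error when `finish` is declared with `args={}`:

```python
# Hypothetical sketch of validating model-provided tool args against the
# tool's declared args; this is NOT DSPy's real code, just an illustration.
def validate_tool_args(declared_args: dict, provided_args: dict) -> None:
    for name in provided_args:
        if name not in declared_args:
            raise ValueError(f"Arg {name} is not in the tool's args.")

# finish is declared with no arguments...
finish_declared_args = {}
# ...but the model emits the final structured outputs as tool args:
model_args = {"type": "relative", "granularity": "week"}

try:
    validate_tool_args(finish_declared_args, model_args)
except ValueError as e:
    print(e)  # Arg type is not in the tool's args.
```

Any non-empty `next_tool_args` on the `finish` step trips this check, even though the values are exactly the signature's output fields.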

This creates a misleading failure mode:

- the trajectory contains an execution error even though the run may succeed
- logs become noisy and look like a tool failure
- users are forced to add prompt workarounds such as "finish must always use {}"

From a framework perspective, this feels like an abstraction leak in ReAct, because users need to understand the internal split
between the tool loop and the later extraction pass to avoid this.

### Steps to reproduce


```python
import dspy

class MySig(dspy.Signature):
    text: str = dspy.InputField()
    type: str = dspy.OutputField()
    start_expression: str = dspy.OutputField()
    end_expression: str = dspy.OutputField()
    granularity: str = dspy.OutputField()
    confidence: float = dspy.OutputField()

def ask_human_time(prompt: str) -> str:
    return "last week"

lm = dspy.LM("openai/gpt-4o-mini")  # replace with any model that tends to put final outputs into finish args
dspy.configure(lm=lm)

react = dspy.ReAct(MySig, tools=[ask_human_time], max_iters=3)
pred = react(text="What was the FPY for ADPRR at PTI?")

print(pred.trajectory)
print(pred)
```

Observed behavior:

1. the model may call ask_human_time
2. then it may call finish with structured output fields in next_tool_args
3. DSPy records an execution error for finish
4. the final extraction step may still return a valid prediction

### Expected behavior

One of the following should happen:

1. finish should accept structured final outputs, or
2. extra args passed to finish should be ignored instead of being treated as a tool execution error, or
3. ReAct should provide a clearer structured way to separate "end the tool loop" from "return final outputs"

At minimum, the current behavior should not produce a misleading execution error for a common model behavior.
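
Option 2 could be as small as filtering unknown args for the built-in `finish` tool instead of raising. A hypothetical sketch (again plain Python, not a proposed patch to `dspy/predict/react.py`):

```python
# Hypothetical fix sketch: silently drop args the tool does not declare,
# rather than treating them as an execution error. For finish (args={}),
# this discards the model's structured outputs, which the later
# extraction step recovers anyway.
def filter_finish_args(declared_args: dict, provided_args: dict) -> dict:
    return {k: v for k, v in provided_args.items() if k in declared_args}

print(filter_finish_args({}, {"type": "relative", "confidence": 0.95}))  # {}
```

The trajectory could still record that extra args were dropped, so the information is not lost, just no longer surfaced as a failure.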

### Additional context

There is a related open feature request about finish and the extraction step:
https://github.com/stanfordnlp/dspy/issues/8484

Also, the current main branch still defines finish as a no-arg tool in dspy/predict/react.py.

### DSPy version

3.0.4
