
Fix/tool call accuracy (#2079) #2092


Open · wants to merge 2 commits into main

Conversation

sdivye92

  • Fix ToolCallAccuracy returning scores higher than 1.0 (#2079)
    • Use zip to pair reference_tool_calls and pred_tool_calls one-to-one
  • Updated the is_sequence_aligned method to check that the predicted and reference sequences have the same length
    • This fixes the case where tool call accuracy was 1.0 even though pred_tool_calls contained more tool calls than reference_tool_calls.
    • The equality check between pred_sequence and ref_sequence verifies that both sequences have the same length and that their elements occur in the same order (see the sketch after this list).
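
A minimal sketch of the idea, assuming simplified stand-ins for the code in ragas/src/ragas/metrics/_tool_call_accuracy.py — the ToolCall class, function names, and scoring loop below are illustrative, not the actual ragas implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    # Simplified stand-in for ragas' tool call object.
    name: str
    args: dict = field(default_factory=dict)

def is_sequence_aligned(pred_sequence: list[str], ref_sequence: list[str]) -> bool:
    # After the fix: a plain equality check, so the sequences must have
    # the same length AND list the tool names in the same order; extra
    # predicted calls can no longer slip through.
    return pred_sequence == ref_sequence

def tool_call_accuracy(pred_tool_calls: list[ToolCall],
                       ref_tool_calls: list[ToolCall]) -> float:
    if not ref_tool_calls:
        return 0.0
    pred_names = [tc.name for tc in pred_tool_calls]
    ref_names = [tc.name for tc in ref_tool_calls]
    if not is_sequence_aligned(pred_names, ref_names):
        return 0.0
    # zip() pairs each reference call with exactly one predicted call,
    # so no match is counted twice and the score cannot exceed 1.0.
    matched = sum(
        ref.name == pred.name and ref.args == pred.args
        for ref, pred in zip(ref_tool_calls, pred_tool_calls)
    )
    return matched / len(ref_tool_calls)
```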

sdivye92 added 2 commits June 26, 2025 08:28
- equality check between pred_sequence and ref_sequence now verifies that both are the same length
- sequences compare equal only when their elements occur in the same order
@dosubot added the size:S label (This PR changes 10-29 lines, ignoring generated files.) on Jun 26, 2025

@greptile-apps bot left a comment

PR Summary

Fixed a critical scoring bug in the ToolCallAccuracy metric, which could incorrectly return scores higher than 1.0 due to double-counting matches.

  • Modified ragas/src/ragas/metrics/_tool_call_accuracy.py to use zip() for one-to-one matching between reference and predicted tool calls
  • Updated is_sequence_aligned to enforce equal lengths between the prediction and reference sequences
  • Added validation to ensure tool calls are made in the correct order and quantity
  • Fixed the edge case where perfect scores were given despite extra predicted tool calls (illustrated below)
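
A hypothetical before/after illustration using the sketch above; the tool names and arguments are made up, not taken from the PR or its tests:

```python
# One reference call, but the agent predicted an extra duplicate call.
ref = [ToolCall("search", {"q": "weather"})]
pred = [ToolCall("search", {"q": "weather"}),
        ToolCall("search", {"q": "weather"})]

# Previously, double-counted matches could report a perfect (or >1.0)
# score despite the extra call. With the equal-length alignment check,
# the mismatched sequence lengths fail alignment and score 0.0.
print(tool_call_accuracy(pred, ref))  # 0.0

# An exact match still scores 1.0, and zip() caps the score there.
print(tool_call_accuracy(ref, ref))   # 1.0
```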

1 file reviewed, 1 comment
