

@razkenari commented Dec 30, 2025

The TraceLevelInput type supports list[ToolExecution] in session_history, but _extract_trace_level was not populating it. This caused FaithfulnessEvaluator to evaluate without tool context, even though its prompt mentions tool outputs.

This fix adds tool execution extraction to _extract_trace_level() so that it matches the behavior of _extract_tool_level().

Description

_extract_trace_level() now includes tool executions in session_history, matching the behavior of _extract_tool_level(). This ensures FaithfulnessEvaluator receives tool call/result data for proper faithfulness evaluation.
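A minimal sketch of what the change does, assuming a simplified span model. The PR does not show the real span structure or type definitions, so Span, Message, ToolExecution, TraceLevelInput, to_tool_execution, and the "tool.*" attribute keys below are hypothetical stand-ins. The point is only that trace-level extraction now turns tool spans into list[ToolExecution] entries in session_history instead of dropping them.

```python
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class Span:
    """Hypothetical, simplified trace span (the real span model is not shown in this PR)."""
    kind: str  # assumed values: "user", "assistant", or "tool"
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class Message:
    """Hypothetical chat turn stored in session_history."""
    role: str
    content: str


@dataclass
class ToolExecution:
    """Hypothetical stand-in for the ToolExecution type named in the PR."""
    name: str
    arguments: dict[str, Any]
    result: Any


@dataclass
class TraceLevelInput:
    """Hypothetical stand-in: session_history mixes messages and list[ToolExecution] entries."""
    session_history: list[Union[Message, list[ToolExecution]]]


def to_tool_execution(span: Span) -> ToolExecution:
    # One shared conversion, so trace-level and tool-level extraction build identical records.
    # The "tool.*" attribute keys are assumptions for this sketch.
    return ToolExecution(
        name=span.attributes["tool.name"],
        arguments=span.attributes.get("tool.arguments", {}),
        result=span.attributes.get("tool.result"),
    )


def extract_trace_level(spans: list[Span]) -> TraceLevelInput:
    history: list[Union[Message, list[ToolExecution]]] = []
    pending_tools: list[ToolExecution] = []
    for span in spans:
        if span.kind == "tool":
            # Before the fix, tool spans never reached session_history at trace level.
            pending_tools.append(to_tool_execution(span))
            continue
        if pending_tools:
            # Emit consecutive tool calls as a single list[ToolExecution] entry.
            history.append(pending_tools)
            pending_tools = []
        history.append(Message(role=span.kind, content=span.attributes.get("content", "")))
    if pending_tools:
        history.append(pending_tools)
    return TraceLevelInput(session_history=history)


# Usage: a user turn, one tool call, then the assistant's answer.
spans = [
    Span("user", {"content": "Weather in Paris?"}),
    Span("tool", {"tool.name": "get_weather", "tool.arguments": {"city": "Paris"},
                  "tool.result": "18C, sunny"}),
    Span("assistant", {"content": "It is 18C and sunny."}),
]
trace_input = extract_trace_level(spans)
```

With the tool call present in session_history, an evaluator such as FaithfulnessEvaluator can check the final answer against the tool's result instead of judging it without that context.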

Related Issues

Fixes #76

Documentation PR

N/A - no documentation changes needed

Type of Change

Bug fix

Testing

  • Verified tool executions now appear in TraceLevelInput.session_history

  • Compared output with _extract_tool_level() to confirm consistency

  • Ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

@cagataycali (Member) left a comment

LGTM! ✅

This PR correctly addresses issue #76 where FaithfulnessEvaluator was missing tool execution context.

Review Summary

Change Analysis:

  • Adds tool execution extraction to _extract_trace_level()
  • Matches existing behavior in _extract_tool_level()
  • Small, focused change (+14/-1 lines)
  • Single file modified (trace_extractor.py)

Code Quality:

  • ✅ Fix is consistent with existing patterns in the codebase
  • ✅ PR description clearly explains the problem and solution
  • ✅ Links to related issue (#76)
  • ✅ Checklist completed

Impact:

  • FaithfulnessEvaluator will now properly receive tool call/result data
  • Enables accurate faithfulness evaluation for agents using tools
  • No breaking changes expected

Suggestion for Maintainers:
Consider whether tests should be added to verify that tool executions appear in TraceLevelInput.session_history. The PR mentions manual verification, but automated tests would prevent regressions.
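Following that suggestion, a regression test could look roughly like this. It is a sketch only: the extractor functions here are simplified, hypothetical stand-ins working on plain dicts, not the real code in trace_extractor.py. In the repository, the tests would import the actual _extract_trace_level and _extract_tool_level and build spans with the project's own fixtures. The second test automates the consistency check described in the PR's Testing section.

```python
# Hypothetical stand-ins for the real extractors; spans are plain dicts here.
def to_tool_execution(span):
    return {"name": span["name"], "arguments": span.get("arguments", {}), "result": span.get("result")}


def extract_tool_level(spans):
    # Tool-level view: only the tool executions, in order.
    return [to_tool_execution(s) for s in spans if s["kind"] == "tool"]


def extract_trace_level(spans):
    # Trace-level view after the fix: messages plus grouped tool executions.
    history = []
    for s in spans:
        if s["kind"] == "tool":
            if history and isinstance(history[-1], list):
                history[-1].append(to_tool_execution(s))  # extend the current tool group
            else:
                history.append([to_tool_execution(s)])  # start a new list[ToolExecution] entry
        else:
            history.append({"role": s["kind"], "content": s["content"]})
    return {"session_history": history}


SPANS = [
    {"kind": "user", "content": "Weather in Paris?"},
    {"kind": "tool", "name": "get_weather", "arguments": {"city": "Paris"}, "result": "18C, sunny"},
    {"kind": "assistant", "content": "It is 18C and sunny."},
]


def test_trace_level_includes_tool_executions():
    history = extract_trace_level(SPANS)["session_history"]
    tool_groups = [entry for entry in history if isinstance(entry, list)]
    assert tool_groups, "expected a list[ToolExecution] entry in session_history"
    assert tool_groups[0][0]["name"] == "get_weather"


def test_trace_level_matches_tool_level():
    history = extract_trace_level(SPANS)["session_history"]
    flattened = [tool for entry in history if isinstance(entry, list) for tool in entry]
    assert flattened == extract_tool_level(SPANS)
```

Both tests fail against a version of extract_trace_level that skips tool spans, which is the regression the reviewer wants to guard against.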


Review by strands-coder autonomous agent 🤖

@cagataycali (Member) commented

🤖 Merge Readiness Check

Status: ⚠️ Almost Ready (CI Pending)

| Criteria | Status |
| --- | --- |
| Review Decision | ✅ APPROVED |
| CI Status | ⏳ PENDING |
| Mergeable | ✅ No conflicts |

This fix adds tool execution extraction to trace-level extraction, which is important for accurate evaluation of agents that use tools.

Recommendation: CI needs to run. Once CI passes, this PR is ready for merge.


Automated analysis by strands-coder 🤖



Development

Successfully merging this pull request may close these issues.

[BUG] FaithfulnessEvaluator doesn't receive tool outputs due to missing extraction in TRACE_LEVEL
