Hi,
Thank you for your contributions.
I ran some experiments using the code you provided, switching from GPT-4o to Claude-3.7-Sonnet. I would expect Claude to perform similarly to GPT-4o, but the current run achieves only 0.25 ROUGE-1 on both insight-level and summary-level evaluation, which is far below the reported results (~0.32).
Our parameters are:
{ "benchmark_type": "full", "branch_depth": 4, "max_questions": 3, "model_name": "claude" }
These settings should match the experiments reported in your paper. Could you please suggest what might explain the gap or what we should check in our setup?
Best,
Ethan