
Evaluation Metrics seem to be over-counting / inflating the counts of true positives? #28

Open
keanepotato opened this issue Nov 7, 2024 · 1 comment

Comments

@keanepotato

Hi Kevin,

Thanks for your contribution to the ABSA task. I wanted to bring your attention to the following code block in the utils.py file within the InstructABSA folder. Because a matched prediction is never removed from pred_list, if gt_list contains repeated instances such as ['food', 'food'] but pred_list contains only the single instance ['food'], the model is credited with predicting both instances correctly, even though it missed the second 'food'.

  def get_metrics(self, y_true, y_pred, is_triplet_extraction=False):
    total_pred = 0
    total_gt = 0
    tp = 0
    if not is_triplet_extraction:
        for gt, pred in zip(y_true, y_pred):
            gt_list = gt.split(', ')
            pred_list = pred.split(', ')
            total_pred+=len(pred_list)
            total_gt+=len(gt_list)
            for gt_val in gt_list:
                for pred_val in pred_list:
                    # Containment check (not exact match), and the matched pred_val
                    # is never removed from pred_list, so one prediction can be
                    # counted against several gold entries.
                    if pred_val in gt_val or gt_val in pred_val:
                        tp+=1
                        break

    else:
        for gt, pred in zip(y_true, y_pred):
            gt_list = gt.split(', ')
            pred_list = pred.split(', ')
            total_pred+=len(pred_list)
            total_gt+=len(gt_list)
            for gt_val in gt_list:
                gt_asp = gt_val.split(':')[0]

                try:
                    gt_op = gt_val.split(':')[1]
                except:
                    continue

                try:
                    gt_sent = gt_val.split(':')[2]
                except:
                    continue

                for pred_val in pred_list:
                    pr_asp = pred_val.split(':')[0]

                    try:
                        pr_op = pred_val.split(':')[1]
                    except:
                        continue
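
For what it's worth, here is a minimal sketch of how the aspect-term case could be scored without re-matching, assuming exact multiset matching of the comma-separated spans is the intended behaviour (the function name fixed_tp_count and the snippet as a whole are mine, not code from the repository):

    from collections import Counter

    def fixed_tp_count(y_true, y_pred):
        # Sketch only: exact, multiset-style matching of the comma-separated
        # spans, so ['food', 'food'] vs ['food'] yields tp == 1, not 2.
        tp, total_gt, total_pred = 0, 0, 0
        for gt, pred in zip(y_true, y_pred):
            gt_counts = Counter(gt.split(', '))
            pred_counts = Counter(pred.split(', '))
            total_gt += sum(gt_counts.values())
            total_pred += sum(pred_counts.values())
            # Multiset intersection: each predicted span can match at most
            # one gold span, and vice versa.
            tp += sum((gt_counts & pred_counts).values())
        return tp, total_gt, total_pred

With this, precision (tp / total_pred) and recall (tp / total_gt) can no longer exceed 1, because the intersection count is bounded by both totals.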
@keanepotato
Author

Additionally, I believe this doesn't count exact matches of aspects; it only checks whether one string is contained within the other.

So if the model predicts "of" but the ground truth is "bowl of sushi", that is also counted as a true positive.
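
To make the containment check concrete (a toy example of mine, not taken from the repository or its data):

    gt_val = 'bowl of sushi'
    pred_val = 'of'

    substring_match = pred_val in gt_val or gt_val in pred_val  # True  -> counted as a tp
    exact_match = pred_val == gt_val                            # False -> would not be counted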
