
Evaluation Metrics seem to be over-counting / inflating the counts of true positives? #28

Open
keanepotato opened this issue Nov 7, 2024 · 1 comment

Comments

@keanepotato

Hi Kevin,

Thanks for your contribution to the ABSA task. I wanted to bring your attention to the following code block in the utils.py file within the InstructABSA folder. Because a matched prediction is never removed from pred_list, if gt_list contains repeated instances such as ['food', 'food'] but pred_list contains only the single instance ['food'], the model is credited with predicting both instances correctly, even though it missed the second 'food'.

  def get_metrics(self, y_true, y_pred, is_triplet_extraction=False):
    total_pred = 0
    total_gt = 0
    tp = 0
    if not is_triplet_extraction:
        for gt, pred in zip(y_true, y_pred):
            gt_list = gt.split(', ')
            pred_list = pred.split(', ')
            total_pred+=len(pred_list)
            total_gt+=len(gt_list)
            for gt_val in gt_list:
                for pred_val in pred_list:
                    # Containment check (not exact match), and the matched pred_val
                    # is never removed from pred_list, so one prediction can be
                    # counted against several gold entries.
                    if pred_val in gt_val or gt_val in pred_val:
                        tp+=1
                        break

    else:
        for gt, pred in zip(y_true, y_pred):
            gt_list = gt.split(', ')
            pred_list = pred.split(', ')
            total_pred+=len(pred_list)
            total_gt+=len(gt_list)
            for gt_val in gt_list:
                gt_asp = gt_val.split(':')[0]

                try:
                    gt_op = gt_val.split(':')[1]
                except:
                    continue

                try:
                    gt_sent = gt_val.split(':')[2]
                except:
                    continue

                for pred_val in pred_list:
                    pr_asp = pred_val.split(':')[0]

                    try:
                        pr_op = pred_val.split(':')[1]
                    except:
                        continue
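
For what it's worth, here is a minimal sketch of how the aspect-term case could be scored without re-matching, assuming exact multiset matching of the comma-separated spans is the intended behaviour (the function name fixed_tp_count and the snippet as a whole are mine, not code from the repository):

    from collections import Counter

    def fixed_tp_count(y_true, y_pred):
        # Sketch only: exact, multiset-style matching of the comma-separated
        # spans, so ['food', 'food'] vs ['food'] yields tp == 1, not 2.
        tp, total_gt, total_pred = 0, 0, 0
        for gt, pred in zip(y_true, y_pred):
            gt_counts = Counter(gt.split(', '))
            pred_counts = Counter(pred.split(', '))
            total_gt += sum(gt_counts.values())
            total_pred += sum(pred_counts.values())
            # Multiset intersection: each predicted span can match at most
            # one gold span, and vice versa.
            tp += sum((gt_counts & pred_counts).values())
        return tp, total_gt, total_pred

With this, precision (tp / total_pred) and recall (tp / total_gt) can no longer exceed 1, because the intersection count is bounded by both totals.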
@keanepotato
Author

Additionally, I believe this doesn't count exact matches of aspects; it only checks whether one string is contained within the other.

So if the model predicts "of" but the ground truth is "bowl of sushi", that is also counted as a true positive.
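
To make the containment check concrete (a toy example of mine, not taken from the repository or its data):

    gt_val = 'bowl of sushi'
    pred_val = 'of'

    substring_match = pred_val in gt_val or gt_val in pred_val  # True  -> counted as a tp
    exact_match = pred_val == gt_val                            # False -> would not be counted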
