This repository has been archived by the owner on Mar 1, 2022. It is now read-only.

What is the bug mentioned in the Colab notebook? #24

Open
Manuel-Medina opened this issue Jan 6, 2022 · 7 comments

Comments

@Manuel-Medina

Hello,
First of all, great job with the library! Congratulations! It is really useful and easy to use.

I was checking the Colab Notebook and saw this comment:

# we can loop through and look at each pairwise comparison to see whats going on
# except we can't trust the probabilities order, because of a bug in fitbert

Could you detail what the problem is, please? Does it happen only when using with_prob=True?
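For context, this is roughly the call I mean (a minimal sketch based on the README's rank example; the exact return shape with with_prob=True is my assumption):

    from fitbert import FitBert

    fb = FitBert()  # loads a BERT model on first use

    masked_string = "Why Bert, you're looking ***mask*** today!"
    options = ["buff", "handsome", "strong"]

    # Assumed: with_prob=True returns the ranked options together with
    # their probabilities (this pairing is what the Colab comment says
    # not to trust).
    ranked, probs = fb.rank(masked_string, options=options, with_prob=True)
    print(ranked, probs)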

@sam-writer
Contributor

glad it is useful!

It has been a while, but IIRC the bug is that the probabilities are sorted before being returned
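For illustration, a toy sketch of what that would break (not actual fitbert code, just hypothetical values):

    options = ["buff", "handsome", "banana"]
    probs_in_option_order = [0.30, 0.65, 0.01]  # computed per option, in this order

    # Suspected failure mode: the probability list gets sorted on its own,
    # so index i of the returned list no longer belongs to options[i].
    returned_probs = sorted(probs_in_option_order, reverse=True)

    print(list(zip(options, returned_probs)))
    # [('buff', 0.65), ('handsome', 0.3), ('banana', 0.01)]  <- mispaired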

@Manuel-Medina
Author

Thank you for your reply!

I see. Does this mean that, in its current state, fitbert is not reliable for getting the probabilities of candidates?

Is this bug something you plan to address?

@sam-writer
Contributor

it's possible the bug isn't there anymore. I looked at the code and don't see anything suspicious, but it has been a while. I would test it and see if they make sense. I do plan on rewriting fitbert, but it likely won't happen for... a while, sorry.

@Manuel-Medina
Author

I've taken a look at the code and found the places where the words are ranked:

  1. rank_single

         ranked_pairs = (
             seq(words_ids)
             .map(lambda x: float(probs[0][target_idx][x].item()))
             .zip(words)
             .sorted(key=lambda x: x[0], reverse=True)
         )

  2. rank_multi

         ranked_pairs = (
             seq(options)
             .map(lambda x: masked_sent.replace(self.mask_token, x))
             .map(lambda x: self._get_sentence_probability(x))
             .zip(options)
             .sorted(key=lambda x: x[0], reverse=True)
         )

If I'm reading this correctly, you get the probability for each word, build a list of (prob, word) tuples, and sort it at the end by probability in descending order. This seems to me to be the correct flow:

Say you have

options = ['a', 'an']
masked_sent = 'I am ***mask*** student'

Then each option is replaced in the masked sentence, so you have a sequence like

['I am a student', 'I am an student']

Probabilities are retrieved for each sentence, say

[0.02343, 0.000000123]

Zipping gives

[(0.02343, 'a'), (0.000000123, 'an')]

and sorting by the first element of each tuple will, in this case, return the same sequence.
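Putting that walkthrough into a small runnable sketch (plain Python instead of the seq pipeline, with made-up probabilities standing in for _get_sentence_probability):

    options = ["a", "an"]
    masked_sent = "I am ***mask*** student"

    # Replace the mask token with each option.
    sentences = [masked_sent.replace("***mask***", opt) for opt in options]
    # ['I am a student', 'I am an student']

    # Pretend these came back from the model, in the same order as the options.
    probs = [0.02343, 0.000000123]

    # Zip into (prob, option) pairs and sort by probability, descending,
    # mirroring rank_multi.
    ranked_pairs = sorted(zip(probs, options), key=lambda pair: pair[0], reverse=True)
    print(ranked_pairs)  # [(0.02343, 'a'), (1.23e-07, 'an')]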

Am I looking at the correct place?

@Manuel-Medina
Author

> it's possible the bug isn't there anymore. I looked at the code and don't see anything suspicious, but it has been a while. I would test it and see if they make sense. I do plan on rewriting fitbert, but it likely won't happen for... a while, sorry.

No problem! Thank you for your efforts!
I'm interested in the library, so if I can find the bug and address it, I would like to submit a pull request to fix it 😃

@sam-writer
Contributor

yup, you're looking in the right place and are thinking about it right. so... add some print statements and use an option that really doesn't fit and see if anything looks off?
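something like this, roughly (the rank call and the shape of what with_prob=True returns are assumptions here, based on the README and the Colab snippet):

    from fitbert import FitBert

    fb = FitBert()

    masked_sent = "I am ***mask*** student"
    # Include an option that clearly doesn't fit, so a mispairing between
    # options and probabilities is easy to spot.
    options = ["a", "an", "banana"]

    ranked, probs = fb.rank(masked_sent, options=options, with_prob=True)
    for option, prob in zip(ranked, probs):
        print(option, prob)
    # If "banana" comes out with a suspiciously high probability, the
    # probabilities are likely being sorted independently of the options.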

@Manuel-Medina
Author

Great! I will check it then and report back what I find.

Thank you!
