This repository has been archived by the owner on Mar 1, 2022. It is now read-only.

What is the bug mentioned in the Colab notebook? #24

Open
Manuel-Medina opened this issue Jan 6, 2022 · 7 comments

Comments

@Manuel-Medina

Hello,
First of all, great job with the library! Congratulations! It is really useful and easy to use.

I was checking the Colab Notebook and saw this comment:

# we can loop through and look at each pairwise comparison to see whats going on
# except we can't trust the probabilities order, because of a bug in fitbert

Could you detail what the problem is, please? Does it happen only when using with_prob=True?
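For context, this is roughly the call I mean (a minimal sketch based on the README's rank example; the exact return shape with with_prob=True is my assumption):

    from fitbert import FitBert

    fb = FitBert()  # loads a BERT model on first use

    masked_string = "Why Bert, you're looking ***mask*** today!"
    options = ["buff", "handsome", "strong"]

    # Assumed: with_prob=True returns the ranked options together with
    # their probabilities (this pairing is what the Colab comment says
    # not to trust).
    ranked, probs = fb.rank(masked_string, options=options, with_prob=True)
    print(ranked, probs)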

@sam-writer
Contributor

glad it is useful!

It has been a while, but IIRC the bug is that the probabilities are sorted before being returned
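For illustration, a toy sketch of what that would break (not actual fitbert code, just hypothetical values):

    options = ["buff", "handsome", "banana"]
    probs_in_option_order = [0.30, 0.65, 0.01]  # computed per option, in this order

    # Suspected failure mode: the probability list gets sorted on its own,
    # so index i of the returned list no longer belongs to options[i].
    returned_probs = sorted(probs_in_option_order, reverse=True)

    print(list(zip(options, returned_probs)))
    # [('buff', 0.65), ('handsome', 0.3), ('banana', 0.01)]  <- mispaired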

@Manuel-Medina
Author

Thank you for your reply!

I see. Does this mean that, in its current state, fitbert is not reliable for getting the probabilities of candidates?

Is this bug something you plan to address?

@sam-writer
Contributor

it's possible the bug isn't there anymore. I looked at the code and don't see anything suspicious, but it has been a while. I would test it and see if they make sense. I do plan on rewriting fitbert, but it likely won't happen for... a while, sorry.

@Manuel-Medina
Author

I've taken a look at the code and found the places where the words are ranked:

  1. rank_single

         ranked_pairs = (
             seq(words_ids)
             .map(lambda x: float(probs[0][target_idx][x].item()))
             .zip(words)
             .sorted(key=lambda x: x[0], reverse=True)
         )

  2. rank_multi

         ranked_pairs = (
             seq(options)
             .map(lambda x: masked_sent.replace(self.mask_token, x))
             .map(lambda x: self._get_sentence_probability(x))
             .zip(options)
             .sorted(key=lambda x: x[0], reverse=True)
         )

If I'm reading this correctly, you get the probability for each word, build a list of (prob, word) tuples, and sort it at the end by probability in descending order. This seems to me to be the correct flow:

Say you have

options = ['a', 'an']
masked_sent = 'I am ***mask*** student'

Then each option is replaced in the masked sentence, so you have a sequence like

['I am a student', 'I am an student']

Probabilities are retrieved for each sentence, say

[0.02343, 0.000000123]

Zipping gives

[(0.02343, 'a'), (0.000000123, 'an')]

and sorting by the first element of each tuple will, in this case, return the same sequence.
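Putting that walkthrough into a small runnable sketch (plain Python instead of the seq pipeline, with made-up probabilities standing in for _get_sentence_probability):

    options = ["a", "an"]
    masked_sent = "I am ***mask*** student"

    # Replace the mask token with each option.
    sentences = [masked_sent.replace("***mask***", opt) for opt in options]
    # ['I am a student', 'I am an student']

    # Pretend these came back from the model, in the same order as the options.
    probs = [0.02343, 0.000000123]

    # Zip into (prob, option) pairs and sort by probability, descending,
    # mirroring rank_multi.
    ranked_pairs = sorted(zip(probs, options), key=lambda pair: pair[0], reverse=True)
    print(ranked_pairs)  # [(0.02343, 'a'), (1.23e-07, 'an')]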

Am I looking at the correct place?

@Manuel-Medina
Author

> it's possible the bug isn't there anymore. I looked at the code and don't see anything suspicious, but it has been a while. I would test it and see if they make sense. I do plan on rewriting fitbert, but it likely won't happen for... a while, sorry.

No problem! Thank you for your efforts!
I'm interested in the library, so if I can find the bug and address it, I would like to submit a pull request to fix it 😃

@sam-writer
Contributor

yup, you're looking in the right place and are thinking about it right. so... add some print statements and use an option that really doesn't fit and see if anything looks off?
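something like this, roughly (the rank call and the shape of what with_prob=True returns are assumptions here, based on the README and the Colab snippet):

    from fitbert import FitBert

    fb = FitBert()

    masked_sent = "I am ***mask*** student"
    # Include an option that clearly doesn't fit, so a mispairing between
    # options and probabilities is easy to spot.
    options = ["a", "an", "banana"]

    ranked, probs = fb.rank(masked_sent, options=options, with_prob=True)
    for option, prob in zip(ranked, probs):
        print(option, prob)
    # If "banana" comes out with a suspiciously high probability, the
    # probabilities are likely being sorted independently of the options.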

@Manuel-Medina
Author

Great! I will check it then and report back what I find.

Thank you!
