What is the bug mentioned in the Colab notebook? #24
Comments
Glad it is useful! It has been a while, but IIRC the bug is that the probabilities are sorted before being returned.
Thank you for your reply! I see. Does this mean that, in its current state, fitbert is not reliable for getting the probabilities of candidates? Is this bug something you plan to address?
It's possible the bug isn't there anymore. I looked at the code and don't see anything suspicious, but it has been a while. I would test it and see if the probabilities make sense. I do plan on rewriting fitbert, but it likely won't happen for... a while, sorry.
I've taken a look at the code and found the places where the words are ranked:
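```python
from functional import seq  # PyFunctional

ranked_pairs = (
    seq(words_ids)
    # look up the model's probability for each candidate token id at position target_idx
    .map(lambda x: float(probs[0][target_idx][x].item()))
    # pair each probability with its word: (prob, word)
    .zip(words)
    # sort the pairs by probability, highest first
    .sorted(key=lambda x: x[0], reverse=True)
)
```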
```python
ranked_pairs = (
    seq(options)
    # fill the mask with each option to get one full sentence per option
    .map(lambda x: masked_sent.replace(self.mask_token, x))
    # score each complete sentence
    .map(lambda x: self._get_sentence_probability(x))
    # pair each score with its option: (prob, option)
    .zip(options)
    # sort the pairs by score, highest first
    .sorted(key=lambda x: x[0], reverse=True)
)
```

If I'm reading this correctly, you get the probability for each word, build a list of `(prob, word)` tuples, and at the end sort it by probability in descending order. That seems to me the correct flow. Say you have `options = ['a', 'an']` and `masked_sent = 'I am ***mask*** student'`. Each option is substituted into the masked sentence, so you get a sequence like `['I am a student', 'I am an student']`. A probability is then retrieved for each sentence; say the first sentence scores higher than the second. Zipping and sorting by the first element of each tuple then returns the options in the same order, with each probability still attached to its own option. Am I looking at the correct place?
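To make the ordering question concrete, here is a minimal plain-Python sketch (no PyFunctional, and made-up scores instead of a real model) contrasting the zip-then-sort flow quoted above with a hypothetical variant that sorts the scores before pairing them. This is not fitbert's code; `score`, `FAKE_SCORES`, and both `rank_*` functions are illustrative names.

```python
MASK = "***mask***"
MASKED_SENT = "I am ***mask*** student"

# Made-up scores standing in for a sentence scorer such as fitbert's
# _get_sentence_probability; higher means "more likely". Illustration only.
FAKE_SCORES = {"I am a student": 0.8, "I am an student": 0.05}


def score(sentence):
    return FAKE_SCORES[sentence]


def rank_zip_then_sort(options):
    """Pair each score with its option first, then sort the pairs."""
    probs = [score(MASKED_SENT.replace(MASK, o)) for o in options]
    return sorted(zip(probs, options), key=lambda p: p[0], reverse=True)


def rank_sort_then_zip(options):
    """Hypothetical buggy variant: sorts the scores before pairing them."""
    probs = [score(MASKED_SENT.replace(MASK, o)) for o in options]
    probs = sorted(probs, reverse=True)  # order no longer matches `options`
    return list(zip(probs, options))


for opts in (["a", "an"], ["an", "a"]):
    print(opts, rank_zip_then_sort(opts), rank_sort_then_zip(opts))
```

With the thread's `['a', 'an']` example both functions return the same pairs, because the input order already matches the score order. Listing the better option second (`['an', 'a']`) is what exposes the misalignment in the buggy variant, so reordering the options is a cheap test.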
No problem! Thank you for your efforts!
Yup, you're looking in the right place and thinking about it correctly. So... add some print statements, use an option that really doesn't fit, and see if anything looks off?
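Following that suggestion (print statements plus an option that clearly doesn't fit), here is a hedged sketch of an offline check: rescore each option on its own and compare the result with the probability reported next to it. `score_option`, `FAKE_SCORES`, and `check_ranked_pairs` are hypothetical names, and the scores are invented; in a real test you would swap in whatever per-sentence scorer you are inspecting and feed in the pairs the ranker actually returned.

```python
import math

MASK = "***mask***"
MASKED_SENT = "I am ***mask*** student"

# Made-up per-sentence scores (higher = more likely), standing in for a real
# model so the check runs offline. "banana" is the option that clearly doesn't fit.
FAKE_SCORES = {
    "I am a student": 0.8,
    "I am an student": 0.05,
    "I am banana student": 0.001,
}


def score_option(option):
    return FAKE_SCORES[MASKED_SENT.replace(MASK, option)]


def check_ranked_pairs(ranked_pairs, score_fn):
    """Print each (prob, word) pair and return a list of problems found."""
    problems = []
    for prob, word in ranked_pairs:
        expected = score_fn(word)  # rescore this option by itself
        print(f"{word!r}: reported={prob} rescored={expected}")
        # isclose tolerates tiny floating-point differences between runs
        if not math.isclose(prob, expected):
            problems.append(f"{word!r} reported {prob}, rescored {expected}")
    probs = [p for p, _ in ranked_pairs]
    if probs != sorted(probs, reverse=True):
        problems.append("pairs are not in descending order")
    return problems


good = [(0.8, "a"), (0.05, "an"), (0.001, "banana")]
bad = [(0.8, "banana"), (0.05, "a"), (0.001, "an")]  # scores detached from words

print(check_ranked_pairs(good, score_option))  # no problems: []
print(check_ranked_pairs(bad, score_option))
```

A ranking whose order looks plausible can still carry the wrong number next to each word, which is why the check compares per-word scores rather than only the final order.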
Great! I will check it then and report back on what I find. Thank you!
Hello,
First of all, great job with the library! Congratulations! It is really useful and easy to use.
I was checking the Colab Notebook and saw this comment:
Could you detail what the problem is, please? Does it happen only when using `with_prob=True`?