
Reproducing the results with LLaMa and TriviaQA (Figure 8) #12

Open
YasamanJafari opened this issue Jul 23, 2024 · 3 comments

Comments

@YasamanJafari

Hi,

Thank you for the excellent paper and for providing the code! I have been trying to reproduce the results in Figure 8 of the paper using LLaMa-7B and LLaMa-13B on the TriviaQA dataset, which I downloaded with the command in the README.
However, I get the following values:

| Model | 0 docs | 1 doc | 2 docs | 3 docs |
|---|---|---|---|---|
| 7B | 50.8 | 54.1 | 55.9 | 56.4 |
| 13B | 57.8 | 58.8 | 59.8 | 60.4 |

Could you share any insight into what might explain this discrepancy?
(The numbers for 1-3 documents are close to those in the paper, but there is a ~3% gap for 0 documents.)

[Screenshot, 2024-07-23 12:29:37 PM]

@oriram
Contributor

oriram commented Jul 24, 2024 via email

@YasamanJafari
Author

Thank you for your response!

  1. The same thing happens with the NQ dataset: the results for 1-3 documents are very close to the paper's, but there is a noticeable gap for 0 documents. My results are:

| Model | 0 docs | 1 doc | 2 docs | 3 docs |
|---|---|---|---|---|
| LLaMa1-7B | 14.6% | 28.4% | 28.6% | 28.1% |
| LLaMa1-13B | 18.3% | 30.4% | 30.3% | 30.5% |

  2. I am using LLaMa 1. Did you use a specific checkpoint that might explain the discrepancy?

@YasamanJafari
Author

Hi again, I just wanted to follow up on this and check if there have been any updates about this discrepancy!
