
[Question] Impossible to reproduce results, model performs poorly #32

Open
maximegmd opened this issue Sep 14, 2023 · 2 comments
Labels
question Further information is requested

Comments

@maximegmd

❓ General Questions

I evaluated the model using lm-evaluation-harness on MedMCQA, MedQA-USMLE, and PubMedQA, and it performs barely above Llama 2 7B: only 38% on MedQA-USMLE, 36% on MedMCQA, and 73.9% on PubMedQA.

Could you describe how you got your results?

@maximegmd maximegmd added the question Further information is requested label Sep 14, 2023
@s1ghhh

s1ghhh commented Sep 14, 2023

emmmm, me too.

@llSourcell
Owner

Hey Doc! The evaluation function I used is in the .ipynb attached in the repository. I set a semantic-similarity threshold against the possible USMLE answer choices, so a response doesn't have to match verbatim to count as correct; that's why the reported accuracy is higher. I'm also about to release a new fine-tuned model next week; the goal here is to keep on improving. I just merged my first PR and posted a paid bounty last week for UI issues. Would love your help!
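The notebook itself isn't quoted in this thread, so here is a minimal sketch of threshold-based answer matching as described above. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for the semantic-similarity score (the actual notebook may well use an embedding model instead), and the `match_choice` name, the `0.6` threshold, and the example answer choices are all hypothetical:

```python
from difflib import SequenceMatcher

def match_choice(response: str, choices: list[str], threshold: float = 0.6):
    """Return the answer choice most similar to the model's free-text
    response, or None if no choice clears the threshold.

    Uses difflib's character-level ratio as a stand-in for a semantic
    similarity score; the repository's notebook may differ.
    """
    best_choice, best_score = None, 0.0
    for choice in choices:
        score = SequenceMatcher(None, response.lower(), choice.lower()).ratio()
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice if best_score >= threshold else None

# Hypothetical answer choices for illustration only.
choices = ["Metformin", "Insulin", "Sulfonylurea", "Lifestyle modification"]

# A near-verbatim response still maps to the intended choice...
print(match_choice("metformin.", choices))  # → Metformin
# ...while an unrelated response is scored as no match.
print(match_choice("aspirin", choices))  # → None
```

Note that this is a more permissive grading scheme than lm-evaluation-harness's default exact-match / log-likelihood scoring, which would explain at least part of the gap between the reported numbers and the harness results above.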

pterameta pushed a commit to pterameta/DoctorGPT that referenced this issue Sep 20, 2023