I evaluated the model using lm-evaluation-harness on MedMCQA, MedQA-USMLE, and PubMedQA, and it performs barely above Llama 2 7B: only 38% on MedQA-USMLE, 36% on MedMCQA, and 73.9% on PubMedQA.
Could you describe how you got your results?
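For reference, an lm-evaluation-harness run over these tasks looks roughly like the sketch below. The task names, model path, and harness version are assumptions (they differ between releases), so treat this as an illustration rather than the exact command used:

```python
# Rough sketch of an lm-evaluation-harness run (v0.4-style Python API).
# Task names and the pretrained model path below are assumptions, not the
# exact values from the original evaluation.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=llSourcell/DoctorGPT",        # hypothetical HF path
    tasks=["medmcqa", "medqa_4options", "pubmedqa"],      # names vary by version
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task metrics, e.g. acc / acc_norm
```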
Hey Doc! The evaluation function I used is in the .ipynb attached in the repository. I scored answers with a semantic-similarity threshold against the possible USMLE answer choices, so a response doesn't have to match verbatim to count as correct; that's why the accuracy was higher. I'm also about to release a new fine-tuned model next week. The goal here is to keep on improving. I just merged my first PR and posted a paid bounty last week for UI issues. Would love your help!
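In case it helps anyone reading along, the idea is roughly like the sketch below: the model's free-text answer is matched to the closest answer choice by embedding similarity instead of requiring an exact match. The embedding model and the 0.75 threshold are placeholders; the actual logic lives in the notebook.

```python
# Minimal sketch of similarity-based grading: map a free-text answer to the
# nearest answer choice by cosine similarity of sentence embeddings, and count
# it correct only if that nearest choice is the gold one and is similar enough.
# The encoder and threshold are placeholders, not the notebook's exact values.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

def grade(model_answer: str, choices: list[str], gold_index: int,
          threshold: float = 0.75) -> bool:
    """Return True if the closest choice is the gold one and clears the threshold."""
    answer_emb = encoder.encode(model_answer, convert_to_tensor=True)
    choice_embs = encoder.encode(choices, convert_to_tensor=True)
    sims = util.cos_sim(answer_emb, choice_embs)[0]  # similarity to each choice
    best = int(sims.argmax())
    return best == gold_index and float(sims[best]) >= threshold

# Example: a paraphrased answer can still count as the right choice.
choices = ["Lisinopril", "Metformin", "Atorvastatin", "Warfarin"]
print(grade("I would start the patient on an ACE inhibitor such as lisinopril",
            choices, gold_index=0))
```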
pterameta pushed a commit to pterameta/DoctorGPT that referenced this issue on Sep 20, 2023.