
unable to reproduce the same accuracy #16

Open
beomseokg opened this issue Feb 3, 2025 · 4 comments
Labels
question Further information is requested

Comments

@beomseokg

Hello, thank you for the great repo.

Unfortunately, I'm having trouble reproducing accuracy similar to the paper's, particularly for the Llama-2 13B chat model. I downloaded the synthetic trajectories provided on Google Drive and followed all the steps. Self-differentiation and group planning seem to work, but the accuracy is lower than reported in the paper.

The Llama-2 13B chat model's average F1 score on easy HotpotQA data reaches ~38% (trajectories generated from 13B) and ~44% (trajectories generated from 70B). The number of trajectories is 200, so I expected ~50% and ~60%, as shown in Figure 3(b) and (f).
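For reference, this is how I'm computing F1 — a minimal sketch of the standard SQuAD/HotpotQA token-level F1, in case a metric mismatch explains part of the gap (I'm assuming the repo's evaluation uses this normalization; please correct me if it differs):

```python
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the Eiffel Tower", "Eiffel Tower"))  # → 1.0
```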

  1. Could you please provide the 13B model files (PEFT adapters), by any chance?
  2. Just to make sure, could you confirm I'm loading the correct model for fine-tuning? My commands were:

First:

```shell
python3 -m fastchat.serve.model_worker --port 21002 --worker http://localhost:21002 \
    --model-names llama-2-13b-chat --model-path meta-llama/Llama-2-13b-chat-hf
```

and then:

```shell
Scripts/fastchat_lora.sh
```

@zxlzr
Contributor

zxlzr commented Feb 3, 2025

Hi, we will address this issue as soon as possible. We suggest retrying the run a few times, as different GPUs and environments may introduce some variance.

@zxlzr zxlzr added the question Further information is requested label Feb 3, 2025
@beomseokg
Author

Thank you for the prompt response and suggestion! I'm retrying it now. I do see some variance (e.g., ~41%, ~42%, ~44% for the 70B trajectories with the 13B model), but it's hard to reach a similar level of accuracy (~60%). It would be really helpful if we could use the PEFT model checkpoints to check how the losses evolve and to evaluate them.
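In the meantime, this is roughly how I'd inspect loss evolution from a checkpoint, assuming the fine-tuning goes through the Hugging Face Trainer, which writes a trainer_state.json next to the adapter weights (the sample log_history below is made up purely for illustration):

```python
def loss_curve(trainer_state: dict) -> list:
    """Extract (step, training loss) pairs from a HF Trainer state dict.

    Each log_history entry with a "loss" key is a training-loss log step;
    eval entries use "eval_loss" instead and are skipped here.
    """
    return [
        (entry["step"], entry["loss"])
        for entry in trainer_state.get("log_history", [])
        if "loss" in entry
    ]

# In practice: state = json.load(open("checkpoint-200/trainer_state.json"))
# Made-up example in the shape Trainer writes:
state = {
    "log_history": [
        {"step": 10, "loss": 1.92, "learning_rate": 2e-4},
        {"step": 20, "loss": 1.45, "learning_rate": 2e-4},
        {"step": 20, "eval_loss": 1.50},
    ]
}
print(loss_curve(state))  # → [(10, 1.92), (20, 1.45)]
```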

@Rolnand
Contributor

Rolnand commented Feb 3, 2025

You can refer to this link to deploy the model: Scripts

If you still have problems, you can leave your email address and we will send you the trajectory results we saved in the experiment.

@beomseokg
Author

Appreciate it! I understand there are Scripts for loading models, but could you please share the LoRA checkpoints as well (i.e., the files saved after LoRA fine-tuning)? My email is [email protected].

I think the trajectory results are already shared in the repo (Google Drive).
