

Mistral-7b, Zephyr-7b-alpha #52


YeonwooSung opened this issue Oct 25, 2023 · 1 comment

YeonwooSung commented Oct 25, 2023

Mistral-7b-v0.1, Zephyr-7b-alpha

Mistral-7b outperforms Llama2-13b-hf and gpt-3.5-turbo.
Zephyr-7b-alpha outperforms Mistral-7b and beats Llama2-70b.

DPO vs. PPO (is DPO better for fine-tuning?)

Zephyr-7b-alpha is a fine-tuned version of Mistral-7b, trained with the DPO trainer.
- Uses a subset of UltraFeedback.
- The Hugging Face team found that PPO is fragile with respect to hyperparameters, while DPO is robust to them.
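For reference, a minimal sketch of the DPO loss for a single preference pair, in plain Python on precomputed log-probabilities. It assumes the summed log-probabilities of the chosen and rejected responses under the policy being trained and under the frozen reference (SFT) model are already available. The names `dpo_loss` and `log_sigmoid` and the `beta=0.1` default are illustrative assumptions, not the API of any particular trainer. Unlike PPO, there is no reward model, value function, or clipping range here, and `beta` is the main knob, which is one plausible intuition for why DPO is less sensitive to hyperparameters.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:  # beta=0.1 is an illustrative default
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response:
    under the policy being trained, or under the frozen reference model.
    """
    # How much more (or less) likely each response became relative to the reference
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The implicit reward margin, scaled by beta, goes through -log(sigmoid(.))
    logits = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(logits)


# If the policy has not moved away from the reference, the loss is log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Shifting probability toward the chosen response lowers the loss
print(dpo_loss(-5.0, -20.0, -10.0, -10.0))
```

In practice the loss is averaged over a batch of pairs, and the gradient only flows through the policy log-probabilities; the reference model stays fixed.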

YeonwooSung commented Oct 25, 2023

Source code for the Mistral LLM
