The implementation of stage 2 with axolotl #107

Open
boxiaowave opened this issue May 24, 2024 · 0 comments

Thanks for the wonderful work.

I am trying to improve performance with Medusa-2. However, when I start stage 2 training from the stage 1 model, the Medusa loss is still high. Judging from the source code of the `load_model()` function, the Medusa heads do not seem to be initialized from the base model's weights. Is this a bug?
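One way to narrow this down is to compare the parameter names the stage 2 model expects against the names saved in the stage 1 checkpoint. The snippet below is only a minimal sketch of that check: the `medusa_head` marker and the example key names are assumptions for illustration, not names confirmed from the repository, and the real keys would come from the model's and checkpoint's state dicts.

```python
from typing import Iterable, List


def missing_head_keys(
    model_keys: Iterable[str],
    checkpoint_keys: Iterable[str],
    head_marker: str = "medusa_head",  # assumed substring, adjust to the real names
) -> List[str]:
    """Return head parameter names the model expects but the checkpoint lacks."""
    saved = set(checkpoint_keys)
    return sorted(k for k in model_keys if head_marker in k and k not in saved)


# Hypothetical key names, for illustration only.
model_keys = [
    "lm_head.weight",
    "medusa_head.0.0.linear.weight",
    "medusa_head.0.1.weight",
]
stage1_keys = ["lm_head.weight"]

# Any name listed here would not be restored from the stage 1 checkpoint.
print(missing_head_keys(model_keys, stage1_keys))
```

If the list is non-empty for the actual stage 1 checkpoint, the heads are probably being re-initialized at the start of stage 2, which would be consistent with the high loss.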
