The implementation of stage 2 with axolotl #107

boxiaowave · 2024-05-24T02:42:30Z

Thanks for the wonderful work.

I am trying to improve performance with Medusa-2. When I start stage 2 training from the stage 1 model, the Medusa loss is still high. Judging from the source code of the load_model() function, the Medusa heads do not appear to be initialized from the base model's weights. Is this a bug?
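A minimal PyTorch sketch of the initialization this question refers to. It is not the repository's actual code. It assumes each Medusa head is a residual block followed by a linear projection to the vocabulary, and that "the weights of base model" means the base model's lm_head. The names `ResBlock`, `build_medusa_heads`, and `init_heads_from_lm_head` are hypothetical, as are the toy sizes. Zeroing the residual block's bias as well as its weight is a choice made here so the block starts as an exact identity.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Hypothetical residual block: x + SiLU(Linear(x))."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        # Zero weight and bias so the block is an identity map at init
        # (SiLU(0) == 0, so the residual branch contributes nothing).
        nn.init.zeros_(self.linear.weight)
        nn.init.zeros_(self.linear.bias)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.act(self.linear(x))


def build_medusa_heads(hidden_size: int, vocab_size: int, num_heads: int) -> nn.ModuleList:
    """Build num_heads independent heads: ResBlock -> Linear(hidden, vocab)."""
    return nn.ModuleList(
        [
            nn.Sequential(
                ResBlock(hidden_size),
                nn.Linear(hidden_size, vocab_size, bias=False),
            )
            for _ in range(num_heads)
        ]
    )


def init_heads_from_lm_head(heads: nn.ModuleList, lm_head: nn.Linear) -> None:
    """Copy the base lm_head weight into the final projection of every head."""
    with torch.no_grad():
        for head in heads:
            head[-1].weight.copy_(lm_head.weight)


# Toy sizes, for illustration only.
torch.manual_seed(0)
lm_head = nn.Linear(16, 32, bias=False)  # stand-in for the base model's lm_head
heads = build_medusa_heads(hidden_size=16, vocab_size=32, num_heads=3)
init_heads_from_lm_head(heads, lm_head)

hidden = torch.randn(2, 16)
base_logits = lm_head(hidden)
head_logits = [head(hidden) for head in heads]
```

With this kind of initialization, every head reproduces the base model's logits before any training. If the heads trained in stage 1 were neither loaded nor initialized this way at the start of stage 2, a high initial Medusa loss would be the expected symptom.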
