Thanks for the wonderful work.
I am trying to improve performance with medusa2. However, when I start stage 2 training from the model produced by stage 1, the medusa loss is still high. Judging from the source code of the load_model() function, the medusa heads do not seem to be initialized from the base model's weights. Is this a bug?
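To make the question concrete, here is a minimal sketch of the initialization I expected. It assumes each head ends in a linear projection with the same shape as the base model's output layer. The names ToyMedusaHead and init_heads_from_lm_head are hypothetical and are not taken from the Medusa source, so this may not match how the repository actually builds its heads.

```python
import torch
import torch.nn as nn


class ToyMedusaHead(nn.Module):
    """Hypothetical head: one residual block followed by a vocab projection."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()
        self.proj = nn.Linear(hidden_size, vocab_size, bias=False)
        # Zero the residual branch so the block starts out as an identity map.
        nn.init.zeros_(self.linear.weight)
        nn.init.zeros_(self.linear.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x + self.act(self.linear(x)))


def init_heads_from_lm_head(heads: nn.ModuleList, lm_head: nn.Linear) -> None:
    """Copy (not tie) the base output-layer weights into every head."""
    with torch.no_grad():
        for head in heads:
            head.proj.weight.copy_(lm_head.weight)


# Toy sizes for illustration only; a stand-in for the base model's output layer.
hidden_size, vocab_size, num_heads = 16, 32, 3
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
heads = nn.ModuleList(
    [ToyMedusaHead(hidden_size, vocab_size) for _ in range(num_heads)]
)
init_heads_from_lm_head(heads, lm_head)

# Right after initialization, every head should reproduce the base logits.
x = torch.randn(2, hidden_size)
same = all(torch.allclose(head(x), lm_head(x)) for head in heads)
```

With this kind of initialization, each head would start from the base model's next-token prediction rather than from random weights, which is why I expected a lower medusa loss at the start of stage 2.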