I can't find where the "per-token KL penalty from the SFT model" is applied during PPO training in model/model_training/trainer_rl.py; maybe I missed something. Could you tell me how these two losses are combined?
I found the loss function "PolyLoss" in model/model_training/losses.py. Is this the loss function for the "per-token KL penalty from the SFT model" part? If so, why is a CE (cross-entropy) term combined with it?
Thanks a lot.
This discussion was converted from issue #2608 on June 09, 2023 11:46.