Releases: lucidrains/PaLM-rlhf-pytorch
0.0.32
allow for fine-tuning the entire model without LoRA, project management
0.0.31
in case weight decay is helpful
0.0.30
add gradient clipping for actor critic
0.0.29
be able to generate multiple sequences at the end and pick the one wi…
0.0.28
make sure non-binned rewards model works
0.0.27
make sure non-binned rewards model works
0.0.26
more cleanup
0.0.25
it's just plain PPO now
0.0.24
cleanup further
0.0.23
rename to ActorCritic and cleanup