DPO Fine-Tuning #73

AnirudhJM24 · 2024-10-04T11:49:54Z

The repository contains examples for fine-tuning the model with Supervised Fine-Tuning (SFT). I would like to add examples using TRL (Transformer Reinforcement Learning), particularly Direct Preference Optimization (DPO).
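For context on what such an example would train, here is a minimal, framework-free sketch of the per-example DPO objective. DPO compares a preference pair (a "chosen" and a "rejected" completion for the same prompt) and scores each completion by how far the trained policy's log-probability has moved from a frozen reference model's log-probability. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, not taken from the repository or from TRL's API. The sketch assumes you already have summed sequence log-probabilities from both models.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    Hypothetical helper for illustration: inputs are log pi(y|x) for the
    chosen and rejected completions under the policy and the reference model.
    """
    # Log-ratio of policy vs. reference for each completion
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin: positive when the policy favours the chosen answer
    # more strongly than the reference model does
    z = beta * (chosen_logratio - rejected_logratio)
    # Loss is -log(sigmoid(z)), computed as a numerically stable softplus(-z)
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# A preference record in the prompt/chosen/rejected layout commonly used
# for DPO datasets (example text is made up)
record = {
    "prompt": "What is the capital of France?",
    "chosen": "The capital of France is Paris.",
    "rejected": "France does not have a capital.",
}

# Policy identical to the reference: loss is log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy has shifted probability toward the chosen answer: loss drops below log(2)
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

In a real training loop this value would be averaged over a batch and back-propagated through the policy only; the reference model stays frozen.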

ariG23498 · 2024-10-07T11:39:36Z

Hey @AnirudhJM24

I really like the idea, but I would also ask you to share a rough Colab notebook for this first. I don't want a very complicated setup for SFT in the repository. That said, if you can showcase the workflow in a very simple way, I would be open to adding it.

Also do take a look at the /fine_tune directory for inspiration.
