Fine-tune Llama 3.1 8B with ORPO #82

Open · 2 tasks
Haleshot opened this issue Oct 25, 2024 · 0 comments

📋 Overview

This issue proposes a unified fine-tuning approach for Llama 3.1 8B using ORPO (Odds Ratio Preference Optimization), showcased through interactive marimo notebooks. ORPO combines SFT and DPO into a single training stage, reducing compute and memory requirements while maintaining or improving performance.
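
For context, here is a minimal sketch of what the pipeline might look like with TRL's `ORPOTrainer`. The model ID, dataset, and hyperparameters below are placeholders for illustration, not final choices:

```python
# Minimal ORPO fine-tuning sketch using TRL; all values are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "meta-llama/Meta-Llama-3.1-8B"  # gated repo; requires HF access
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Any preference dataset with prompt/chosen/rejected pairs works here;
# this particular dataset is an assumption for illustration.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = ORPOConfig(
    output_dir="llama-3.1-8b-orpo",
    beta=0.1,  # weight of the odds-ratio term (lambda in the ORPO paper)
    max_length=1024,
    per_device_train_batch_size=2,
    learning_rate=8e-6,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions rename this to processing_class
)
trainer.train()
```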

🎯 Objectives

  • Implement ORPO fine-tuning for the Llama 3.1 8B model using the TRL library

🔍 Technical Details

ORPO Implementation

  • Combine instruction tuning and preference alignment in a single training stage (see the data-format sketch after this list)
  • Use the TRL library for the implementation
  • Target model: Llama 3.1 8B
  • Based on existing community approaches
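
For reference, ORPO trains directly on preference pairs. A record in the format TRL's `ORPOTrainer` consumes looks roughly like this (the text content is invented for illustration):

```python
# One illustrative preference record in the prompt/chosen/rejected format
# expected by TRL's ORPOTrainer (the strings are made up for this example).
example = {
    "prompt": "Explain what ORPO is in one sentence.",
    "chosen": "ORPO is a fine-tuning method that merges supervised "
              "instruction tuning and preference alignment into one stage.",
    "rejected": "ORPO is a kind of database index.",
}
```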

💡 Additional Contributions

  • Plan to create additional marimo demos for other repository tasks/issues
  • Complement existing Gradio demos with marimo alternatives

📝 Todo List

  • Set up the initial marimo notebook structure (see the skeleton sketch after this list)
  • Implement the ORPO fine-tuning pipeline
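
For the first todo item, a bare-bones marimo notebook skeleton might look like the following; the cell contents are placeholders to be filled in as the pipeline is built:

```python
# Skeleton for the ORPO fine-tuning marimo notebook (structure only).
import marimo

app = marimo.App()


@app.cell
def _():
    import marimo as mo

    mo.md("# Fine-tune Llama 3.1 8B with ORPO")
    return (mo,)


@app.cell
def _(mo):
    # Model loading, dataset prep, and ORPOTrainer setup go here.
    mo.md("Training pipeline cells to follow.")
    return


if __name__ == "__main__":
    app.run()
```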

🤝 Related Issues

📚 Resources

Looking forward to feedback and suggestions from the community!
