📋 Overview

This issue proposes a unified fine-tuning approach for Llama 3 using ORPO (Odds Ratio Preference Optimization), showcased through interactive marimo notebooks. ORPO combines SFT and DPO into a single training stage, making it more resource-efficient than the usual two-step pipeline while maintaining or improving performance.
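For context, the ORPO objective (Hong et al., 2024) augments the standard SFT negative log-likelihood with an odds-ratio term that rewards the chosen response $y_w$ over the rejected one $y_l$; $\lambda$ weights the preference term:

$$
\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}, \qquad
\mathcal{L}_{OR} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)
$$

$$
\mathcal{L}_{ORPO} = \mathbb{E}_{(x,\, y_w,\, y_l)}\left[\mathcal{L}_{SFT} + \lambda \cdot \mathcal{L}_{OR}\right]
$$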
🎯 Objectives
- Implement ORPO fine-tuning for the Llama 3.1 8B model using the TRL library
🔍 Technical Details
ORPO Implementation
- Combine instruction tuning and preference alignment in a single training stage (see the sketch below)
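A minimal sketch of what the training loop could look like with TRL's `ORPOTrainer`. The checkpoint name, dataset, and hyperparameters below are illustrative assumptions, not settings decided in this issue:

```python
# Sketch: single-stage ORPO fine-tuning with TRL (assumed setup).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "meta-llama/Meta-Llama-3.1-8B"  # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# ORPO trains on a preference dataset with prompt/chosen/rejected pairs;
# this public dataset is just an example.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = ORPOConfig(
    output_dir="llama3-orpo",
    beta=0.1,                        # weight of the odds-ratio (preference) term
    max_length=1024,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,      # `tokenizer=` in older TRL versions
)
trainer.train()
```

Because ORPO folds the preference signal into the SFT loss, no separate reference model is loaded, which is where the memory savings over DPO come from.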
💡 Additional Contributions
📝 Todo List
🤝 Related Issues
📚 Resources
Looking forward to feedback and suggestions from the community!