📋 Overview

This issue proposes a unified fine-tuning approach for Llama 3 using ORPO (Odds Ratio Preference Optimization), showcased through interactive marimo notebooks. ORPO combines SFT and DPO into a single training stage, making it more resource-efficient than the usual two-step pipeline while maintaining or improving performance.
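For context, the ORPO objective (Hong et al., 2024) augments the standard SFT negative log-likelihood with an odds-ratio term that rewards the chosen response $y_w$ over the rejected one $y_l$; $\lambda$ weights the preference term:

$$
\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}, \qquad
\mathcal{L}_{OR} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)
$$

$$
\mathcal{L}_{ORPO} = \mathbb{E}_{(x,\, y_w,\, y_l)}\left[\mathcal{L}_{SFT} + \lambda \cdot \mathcal{L}_{OR}\right]
$$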
🎯 Objectives
- Implement ORPO fine-tuning for the Llama 3.1 8B model using the TRL library
🔍 Technical Details
ORPO Implementation
- Combine instruction tuning and preference alignment in a single training stage (see the sketch below)
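A minimal sketch of what the training loop could look like with TRL's `ORPOTrainer`. The checkpoint name, dataset, and hyperparameters below are illustrative assumptions, not settings decided in this issue:

```python
# Sketch: single-stage ORPO fine-tuning with TRL (assumed setup).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "meta-llama/Meta-Llama-3.1-8B"  # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# ORPO trains on a preference dataset with prompt/chosen/rejected pairs;
# this public dataset is just an example.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = ORPOConfig(
    output_dir="llama3-orpo",
    beta=0.1,                        # weight of the odds-ratio (preference) term
    max_length=1024,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,      # `tokenizer=` in older TRL versions
)
trainer.train()
```

Because ORPO folds the preference signal into the SFT loss, no separate reference model is loaded, which is where the memory savings over DPO come from.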
💡 Additional Contributions
📝 Todo List
🤝 Related Issues
📚 Resources
Looking forward to feedback and suggestions from the community!