anaumghori/step-distillation

Implementation of Step-by-Step Distillation to train smaller models with less data while outperforming LLMs.

This project explores a custom distillation approach designed to improve the performance of smaller models using rationale-based supervision. The goal is to assess whether incorporating step-by-step reasoning during training can help student models perform better with less labeled data.

Workflow Overview

  1. Model Selection: Used Qwen as the teacher model to generate rationales and T5-small as the student model to be trained.

  2. Dataset Preparation: Adapted the Salesforce/cos_e dataset from Hugging Face by keeping only the input and label columns for simplicity.

  3. Rationale Generation: Used 10-shot prompting with the teacher model to generate explanations (rationales) for the training examples. This step was performed only on the training split. (A sketch covering steps 2 and 3 follows this list.)

  4. Student Model Training: Trained the T5-small model using a custom "step-by-step distillation" process, which incorporates both the original inputs and the generated rationales. (See the second sketch after this list.)

  5. Performance Evaluation: Compared this approach with two baselines:

    • Standard fine-tuning on the original dataset.
    • Few-shot Chain-of-Thought (CoT) inference using Qwen.
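
The sketch below illustrates steps 2 and 3: trimming the Salesforce/cos_e dataset down to an input and a label column, then asking the teacher for a rationale with a 10-shot prompt. It is a minimal sketch, not the notebook code: the Qwen checkpoint name, the prompt template, the `TEN_SHOT_DEMOS` placeholder, and the column-building logic are all assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed teacher checkpoint; the README only says "Qwen".
TEACHER = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER, device_map="auto")

# Step 2: keep only an input column (question + choices) and a label column.
cos_e = load_dataset("Salesforce/cos_e", "v1.11")
train = cos_e["train"].map(
    lambda ex: {
        "input": ex["question"] + " Choices: " + ", ".join(ex["choices"]),
        "label": ex["answer"],
    },
    remove_columns=cos_e["train"].column_names,
)

# Step 3: build the 10-shot prompt; ten hand-written demos would go here (one placeholder shown).
TEN_SHOT_DEMOS = [
    ("Where would you keep uncooked crab meat? Choices: fridge, trash, ocean",
     "Raw seafood spoils quickly, so it is kept cold until it is cooked.",
     "fridge"),
]
FEW_SHOT = "\n\n".join(f"Q: {q}\nReasoning: {r}\nA: {a}" for q, r, a in TEN_SHOT_DEMOS)

def generate_rationale(example):
    prompt = f"{FEW_SHOT}\n\nQ: {example['input']}\nReasoning:"
    ids = tokenizer(prompt, return_tensors="pt").to(teacher.device)
    out = teacher.generate(**ids, max_new_tokens=128, do_sample=False)
    completion = tokenizer.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)
    example["rationale"] = completion.split("\nA:")[0].strip()  # keep only the reasoning part
    return example

train = train.map(generate_rationale)  # rationales are generated for the training split only
```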
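Step 4 is the core of the approach. The usual Distilling Step-by-Step formulation trains the student on two tasks at once: predict the label from the input, and reproduce the teacher's rationale from the same input, with the two losses added together. Below is a minimal sketch of one such training step, assuming a batch with input, label, and rationale fields as prepared above; the task prefixes, the `RATIONALE_WEIGHT` factor, and the optimizer settings are illustrative assumptions rather than the notebook's exact choices.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
student = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)
RATIONALE_WEIGHT = 0.5  # assumed weight balancing the two tasks

def to_labels(enc):
    # Replace padding tokens with -100 so they are ignored by the cross-entropy loss.
    return enc["input_ids"].masked_fill(enc["input_ids"] == tok.pad_token_id, -100)

def distillation_step(batch):
    """One optimizer step on a batch of {"input", "label", "rationale"} lists."""
    # Task 1: predict the answer from the input.
    ans_in = tok(["[label] " + x for x in batch["input"]],
                 return_tensors="pt", padding=True, truncation=True)
    ans_out = tok(batch["label"], return_tensors="pt", padding=True, truncation=True)
    label_loss = student(**ans_in, labels=to_labels(ans_out)).loss

    # Task 2: reproduce the teacher's rationale from the same input.
    exp_in = tok(["[rationale] " + x for x in batch["input"]],
                 return_tensors="pt", padding=True, truncation=True)
    exp_out = tok(batch["rationale"], return_tensors="pt", padding=True, truncation=True)
    rationale_loss = student(**exp_in, labels=to_labels(exp_out)).loss

    loss = label_loss + RATIONALE_WEIGHT * rationale_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Because the rationale task is only used during training, inference runs the label task alone, so the distilled student is no slower than the standard fine-tuned baseline.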

Notebooks

| Notebook | Description |
| --- | --- |
| data_preparation.ipynb | Prepares data by modifying the Salesforce/cos_e dataset, generating rationales with Qwen, and saving the updated dataset to Hugging Face. |
| standard_finetuning.ipynb | Performs standard fine-tuning on the original Salesforce/cos_e dataset using T5-small. |
| step_distillation.ipynb | Trains T5-small using the rationale-augmented distillation approach on the modified dataset. |
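
Both fine-tuned models are compared with exact-match accuracy between the generated answer and the gold label. The helper below is an illustrative sketch, assuming the `student` and `tok` objects from the training sketch above; the standard fine-tuning baseline would be scored the same way, just without the `[label]` prefix.

```python
import torch

def exact_match_accuracy(model, tok, dataset, prefix="[label] "):
    """Fraction of examples whose generated answer matches the gold label exactly."""
    model.eval()
    correct = 0
    for ex in dataset:
        ids = tok(prefix + ex["input"], return_tensors="pt")
        with torch.no_grad():
            out = model.generate(**ids, max_new_tokens=16)
        pred = tok.decode(out[0], skip_special_tokens=True).strip().lower()
        correct += int(pred == ex["label"].strip().lower())
    return correct / len(dataset)
```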

Results

| Method | Training Data Used | Accuracy |
| --- | --- | --- |
| Standard Fine-tuning | 100% | 41.20% |
| Step-by-Step Distillation | ~74% | 52.60% |

The rationale-augmented distillation approach achieved 52.60% accuracy while using approximately 26% less data than standard fine-tuning, which reached 41.20% accuracy. This suggests that training with intermediate reasoning steps can enhance the efficiency and effectiveness of smaller models.
