This project explores a custom distillation approach designed to improve the performance of smaller models using rationale-based supervision. The goal is to assess whether incorporating step-by-step reasoning during training can help student models perform better with less labeled data.
- Model Selection: Used Qwen as the teacher model to generate rationales and T5-small as the student model to be trained.
- Dataset Preparation: Adapted the Salesforce/cos_e dataset from Hugging Face by keeping only the `input` and `label` columns for simplicity.
- Rationale Generation: Used 10-shot prompting with the teacher model to generate explanations (rationales) for the training examples. This step was performed only on the training split.
- Student Model Training: Trained the T5-small model using a custom "step-by-step distillation" process, which incorporates both the original inputs and the generated rationales.
- Performance Evaluation: Compared this approach with two baselines:
  - Standard fine-tuning on the original dataset.
  - Few-shot Chain-of-Thought (CoT) inference using Qwen.
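The rationale-generation step above can be sketched as a k-shot prompt assembled from solved demonstrations. The exact instruction wording and demonstration format used with Qwen are not given here, so the field names and layout below are illustrative assumptions:

```python
# Minimal sketch of assembling a k-shot prompt for rationale generation.
# The demonstration format ("Q:/A:/Explanation:") is an assumption; the
# actual prompt sent to the Qwen teacher may differ.

def build_rationale_prompt(demos, query, k=10):
    """Build a k-shot prompt asking the teacher to explain an answer.

    demos: list of dicts with "input", "label", and "rationale" keys.
    query: dict with "input" and "label" keys (the example to explain).
    """
    parts = []
    for demo in demos[:k]:  # keep at most k demonstrations
        parts.append(
            f"Q: {demo['input']}\n"
            f"A: {demo['label']}\n"
            f"Explanation: {demo['rationale']}\n"
        )
    # The query ends at "Explanation:" so the teacher completes the rationale.
    parts.append(f"Q: {query['input']}\nA: {query['label']}\nExplanation:")
    return "\n".join(parts)
```

The teacher's completion after the final `Explanation:` is then stored as the rationale for that training example.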
| Notebook | Description |
|---|---|
| data_preparation.ipynb | Prepares data by modifying the Salesforce/cos_e dataset, generating rationales with Qwen, and saving the updated dataset to Hugging Face. |
| standard_finetuning.ipynb | Performs standard fine-tuning on the original Salesforce/cos_e dataset using T5-small. |
| step_distillation.ipynb | Trains T5-small using the rationale-augmented distillation approach on the modified dataset. |
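One common way to implement rationale-augmented training of the kind `step_distillation.ipynb` describes is to expand each example into two seq2seq pairs, one targeting the label and one targeting the teacher's rationale, and sum the two losses. The `[label]`/`[rationale]` task prefixes below are illustrative assumptions, not the notebook's exact markers:

```python
# Sketch of expanding one example into two training pairs for
# step-by-step distillation. The student (e.g. T5-small) is then trained
# on both pairs; the total loss is typically
#     L = L_label + lambda * L_rationale
# for some weighting factor lambda. Prefixes here are hypothetical.

def make_distillation_pairs(example):
    """example: dict with "input", "label", and "rationale" keys."""
    return [
        # Task 1: predict the answer from the input.
        {"source": "[label] " + example["input"],
         "target": example["label"]},
        # Task 2: predict the teacher-generated rationale from the input.
        {"source": "[rationale] " + example["input"],
         "target": example["rationale"]},
    ]
```

At inference time only the `[label]` prefix is used, so generating rationales adds no cost when serving the student.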
| Method | Training Data Used | Accuracy |
|---|---|---|
| Standard Fine-tuning | 100% | 41.20% |
| Step-by-Step Distillation | ~74% | 52.60% |
The rationale-augmented distillation approach reached 52.60% accuracy while using approximately 26% less training data than standard fine-tuning, which reached 41.20%. This suggests that training with intermediate reasoning steps can improve both the data efficiency and the accuracy of smaller models.
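The accuracy figures above are consistent with a simple exact-match metric over the test split. The normalization below (lowercasing and whitespace stripping) is an assumption; the notebooks may compare predictions differently:

```python
# Sketch of an exact-match accuracy metric, as is typical for
# multiple-choice answers like those in cos_e. Normalization choices
# here are assumptions, not taken from the notebooks.

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match the reference after light normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)
```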