Distill training pairs for summarizer and explainer #99
Open
Labels
area:ai (AI/ML, NLQ features)
fine-tuning: student-explainability (Fine-tune Qwen 3.5 for SHAP narrator, summarizer, and explainer tasks)
type:feature (New feature)
Description
Summary
Generate training pairs for the summarizer and explainer tasks via Claude API distillation. These are the two existing task types in the pipeline.
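For reference, a minimal sketch of what one distilled pair might look like as a JSONL line. The field names (`task`, `prompt`, `completion`) are assumptions; the actual schema is whatever `training.prepare` validates.

```python
import json

def make_pair(task: str, prompt: str, completion: str) -> str:
    """Serialize one distilled training pair as a single JSONL line.

    Field names are hypothetical, not taken from the pipeline code.
    """
    return json.dumps({"task": task, "prompt": prompt, "completion": completion})

line = make_pair(
    "summarizer",
    "Summarize this advising record: ...",
    "The student is on track to ...",
)
```

Because `json.dumps` emits no embedded newlines by default, each pair stays on exactly one line of the output file.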
Depends On
Tasks
- Run `python -m training.distill --school bishop-state --task summarizer` (~1,500 pairs)
- Run `python -m training.distill --school bishop-state --task explainer` (~1,500 pairs)
- Run `python -m training.prepare --school bishop-state` (validate, dedup, 80/10/10 split)
- Verify pair quality: spot-check 10 pairs per task for correctness
- Commit training data to `training_data/bishop-state/` (JSONL files)
- Track distillation cost (expected: $3-5 total)
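The dedup and 80/10/10 split that `training.prepare` performs could look roughly like this sketch (the canonical-JSON dedup key and fixed shuffle seed are assumptions, not the pipeline's actual implementation):

```python
import json
import random

def dedup_and_split(lines, seed=0):
    """Drop exact-duplicate pairs, then shuffle into an 80/10/10 split."""
    seen, unique = set(), []
    for line in lines:
        # Canonicalize key order so semantically identical pairs dedup.
        key = json.dumps(json.loads(line), sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(line)
    random.Random(seed).shuffle(unique)  # deterministic shuffle
    n = len(unique)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return {
        "train": unique[:n_train],
        "val": unique[n_train:n_train + n_val],
        "test": unique[n_train + n_val:],
    }
```

Seeding the shuffle keeps the split reproducible across reruns, so the committed `train`/`val`/`test` files do not churn.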
Acceptance Criteria
- ≥ 1,200 validated pairs per task after dedup
- Train/val/test splits at `training_data/bishop-state/final/{task}/{split}.jsonl`
- All pairs are valid JSON with correct schema
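The "valid JSON with correct schema" check can be sketched as a line-by-line validator. The required key set is an assumption; swap in whatever fields the pipeline's schema actually mandates.

```python
import json

REQUIRED_KEYS = {"task", "prompt", "completion"}  # assumed schema

def validate_jsonl(lines):
    """Return a list of (line_no, error) for lines failing the schema check."""
    errors = []
    for i, line in enumerate(lines, 1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            errors.append((i, f"invalid JSON: {e}"))
            continue
        missing = REQUIRED_KEYS - obj.keys()
        if missing:
            errors.append((i, f"missing keys: {sorted(missing)}"))
    return errors
```

An empty return list means every line passed; otherwise each tuple pinpoints the offending line for the spot-check.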