Skip to content

Distill training pairs for SHAP narrator #100

@William-Hill

Description

@William-Hill

Summary

Generate training pairs for the new SHAP narrator task. Each pair maps a student's SHAP values + profile to an advisor-facing narrative with grounded explanations and interventions.

Depends On

Prerequisites

  • SHAP data must be populated in student_level_with_predictions.shap_explanations (run ML pipeline with SHAP step)
  • Readiness scores must exist in llm_recommendations (run readiness score generator)

Tasks

  • Ensure SHAP data is populated in DB (run python ai_model/complete_ml_pipeline.py if needed)
  • Run python -m training.distill --school bishop-state --task narrator (~1,500 pairs)
  • Run python -m training.prepare --school bishop-state --task narrator
  • Verify SHAP grounding: spot-check 10 pairs — do narratives reference actual SHAP features?
  • Commit training data to training_data/bishop-state/
  • Track distillation cost (expected: $2-4)

Acceptance Criteria

  • = 1,200 validated narrator pairs after dedup

  • Train/val/test splits at training_data/bishop-state/final/narrator/{split}.jsonl
  • Spot-check confirms narratives cite specific SHAP features by name and magnitude

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions