Distill training pairs for summarizer and explainer #99

@William-Hill

Summary

Generate training pairs for the summarizer and explainer tasks via Claude API distillation. These are the two existing task types in the pipeline.

Depends On

Tasks

  • Run python -m training.distill --school bishop-state --task summarizer (~1,500 pairs)
  • Run python -m training.distill --school bishop-state --task explainer (~1,500 pairs)
  • Run python -m training.prepare --school bishop-state (validate, dedup, 80/10/10 split)
  • Verify pair quality: spot-check 10 pairs per task for correctness
  • Commit training data to training_data/bishop-state/ (JSONL files)
  • Track distillation cost (expected: $3-5 total)
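The dedup and 80/10/10 split steps above can be sketched roughly as follows. This is a minimal illustration, not the actual `training.prepare` implementation: the `input`/`output` field names, the exact-match dedup rule, and the helper names `dedup_pairs` and `split_pairs` are all assumptions made for the example.

```python
import hashlib
import json
import random


def dedup_pairs(pairs):
    """Drop exact duplicate pairs, keeping the first occurrence.

    Assumes each pair is a dict with "input" and "output" keys
    (hypothetical field names; the real schema may differ).
    """
    seen = set()
    unique = []
    for pair in pairs:
        payload = json.dumps([pair["input"], pair["output"]])
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique


def split_pairs(pairs, seed=0):
    """Shuffle with a fixed seed, then split 80/10/10 into train/val/test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }
```

A fixed seed keeps the split reproducible, so re-running the preparation step on the same deduplicated pairs does not move examples between train and test.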

Acceptance Criteria

  • >= 1,200 validated pairs per task after dedup
  • Train/val/test splits at training_data/bishop-state/final/{task}/{split}.jsonl
  • All pairs are valid JSON with correct schema
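One way to check these criteria mechanically is sketched below. It assumes the required fields are `input` and `output` and that the split files are named `train.jsonl`, `val.jsonl`, and `test.jsonl`; both are assumptions for illustration, and the helper names `count_valid_pairs` and `check_task` are hypothetical.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"input", "output"}  # assumed schema, adjust to the real one
MIN_PAIRS = 1200                     # threshold from the acceptance criteria
SPLITS = ("train", "val", "test")    # assumed split file names


def count_valid_pairs(path):
    """Count records in a JSONL file, raising ValueError on the first bad line."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
            if not isinstance(record, dict) or not REQUIRED_KEYS <= record.keys():
                raise ValueError(f"{path}:{line_no}: missing required keys")
            count += 1
    return count


def check_task(final_dir, task):
    """Return (total_pairs, meets_minimum) for one task across all splits."""
    task_dir = Path(final_dir) / task
    total = sum(count_valid_pairs(task_dir / f"{split}.jsonl") for split in SPLITS)
    return total, total >= MIN_PAIRS
```

For example, `check_task("training_data/bishop-state/final", "summarizer")` would return the total pair count for the summarizer task and whether it meets the minimum.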
