Allow defining a training schedule, i.e. shifting the weights of datasets during training on a per-step basis.
One example could be:
```yaml
__module__: megatron.energon
__class__: Metadataset
splits:
  train:
    datasets:
      - weight: 1
        path: ds1
      - weight:
          __module__: megatron.energon
          __class__: WeightSchedule
          linear:  # Maybe "linear" or "step"?
            0: 100    # At iteration 0 (i.e. 0 items yielded on each rank), the weight is 100
            100: 10   # At iteration 100, the weight is 10
            1000: 0   # At iteration 1000 (and onwards), the weight is 0
        path: ds2
```
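To make the intended semantics concrete, here is a minimal Python sketch of what such a `WeightSchedule` could do, assuming linear interpolation between breakpoints, clamped at both ends. The class name matches the YAML above, but the constructor signature and call interface are illustrative assumptions, not the actual energon API:

```python
from bisect import bisect_right


class WeightSchedule:
    """Hypothetical schedule mapping an iteration index to a dataset weight.

    Breakpoints are {iteration: weight} pairs; between breakpoints the
    weight is linearly interpolated, outside the range it is clamped.
    """

    def __init__(self, linear: dict):
        points = sorted(linear.items())  # e.g. {0: 100, 100: 10, 1000: 0}
        self.iters = [it for it, _ in points]
        self.weights = [w for _, w in points]

    def __call__(self, iteration: int) -> float:
        # Clamp to the first/last weight outside the defined range
        if iteration <= self.iters[0]:
            return self.weights[0]
        if iteration >= self.iters[-1]:
            return self.weights[-1]
        # Linearly interpolate between the two surrounding breakpoints
        i = bisect_right(self.iters, iteration)
        it0, it1 = self.iters[i - 1], self.iters[i]
        w0, w1 = self.weights[i - 1], self.weights[i]
        return w0 + (w1 - w0) * (iteration - it0) / (it1 - it0)


# Matches the YAML above: weight 100 at iteration 0, 10 at 100, 0 from 1000 on.
schedule = WeightSchedule({0: 100, 100: 10, 1000: 0})
assert schedule(0) == 100
assert schedule(50) == 55.0  # halfway between 100 and 10
assert schedule(2000) == 0
```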
Discussion:
- The schedule depends on the number of dataset iterations. This may not equal the number of gradient updates, e.g. with gradient accumulation: at an accumulation factor of 8, 800 dataset iterations correspond to 100 optimizer steps. Should we make gradacc / steps_per_iter configurable?
- Maybe use `type: linear` rather than separate `linear:` and `step:` keys? This should be unified with typical lr-schedulers; see the sketch after this list.
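A hedged sketch of the `type:`-style alternative as a small factory, mirroring the common lr-scheduler pattern of one `type` key selecting the behavior. The function name, config shape, and the `step` semantics (hold each weight until the next breakpoint) are assumptions for illustration:

```python
from bisect import bisect_right
from typing import Callable


def make_schedule(config: dict) -> Callable[[int], float]:
    """Hypothetical factory: {"type": "step", "points": {0: 100, 100: 10, 1000: 0}}."""
    points = sorted(config["points"].items())
    iters = [it for it, _ in points]
    weights = [w for _, w in points]

    def step(iteration: int) -> float:
        # Piecewise constant: hold each weight until the next breakpoint
        i = bisect_right(iters, iteration)
        return weights[max(i - 1, 0)]

    def linear(iteration: int) -> float:
        # Linear interpolation between breakpoints, clamped at both ends
        if iteration <= iters[0]:
            return weights[0]
        if iteration >= iters[-1]:
            return weights[-1]
        i = bisect_right(iters, iteration)
        frac = (iteration - iters[i - 1]) / (iters[i] - iters[i - 1])
        return weights[i - 1] + frac * (weights[i] - weights[i - 1])

    schedules = {"step": step, "linear": linear}
    if config["type"] not in schedules:
        raise ValueError(f"Unknown schedule type: {config['type']!r}")
    return schedules[config["type"]]


# A step schedule holds 100 until iteration 100, then 10 until 1000, then 0:
sched = make_schedule({"type": "step", "points": {0: 100, 100: 10, 1000: 0}})
assert [sched(i) for i in (0, 50, 100, 999, 1000)] == [100, 100, 10, 10, 0]
```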
voegtlel changed the title from "Training Schedule" to "Training Schedule / Curriculum" on Oct 10, 2024.