Allow defining a training schedule, i.e. shifting the weights of datasets during training on a per-step basis.
One example could be:
```yaml
__module__: megatron.energon
__class__: Metadataset
splits:
  train:
    datasets:
      - weight: 1
        path: ds1
      - weight:
          __module__: megatron.energon
          __class__: WeightSchedule
          linear:  # Maybe "linear" or "step"?
            0: 100    # At iteration 0 (i.e. 0 items yielded on each rank), the weight is 100
            100: 10   # At iteration 100, the weight is 10
            1000: 0   # At iteration 1000 (and onwards), the weight is 0
        path: ds2
```
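To make the intended semantics concrete, here is a minimal Python sketch of what such a `WeightSchedule` could do, assuming linear interpolation between breakpoints, clamped at both ends. The class name matches the YAML above, but the constructor signature and call interface are illustrative assumptions, not the actual energon API:

```python
from bisect import bisect_right


class WeightSchedule:
    """Hypothetical schedule mapping an iteration index to a dataset weight.

    Breakpoints are {iteration: weight} pairs; between breakpoints the
    weight is linearly interpolated, outside the range it is clamped.
    """

    def __init__(self, linear: dict):
        points = sorted(linear.items())  # e.g. {0: 100, 100: 10, 1000: 0}
        self.iters = [it for it, _ in points]
        self.weights = [w for _, w in points]

    def __call__(self, iteration: int) -> float:
        # Clamp to the first/last weight outside the defined range
        if iteration <= self.iters[0]:
            return self.weights[0]
        if iteration >= self.iters[-1]:
            return self.weights[-1]
        # Linearly interpolate between the two surrounding breakpoints
        i = bisect_right(self.iters, iteration)
        it0, it1 = self.iters[i - 1], self.iters[i]
        w0, w1 = self.weights[i - 1], self.weights[i]
        return w0 + (w1 - w0) * (iteration - it0) / (it1 - it0)


# Matches the YAML above: weight 100 at iteration 0, 10 at 100, 0 from 1000 on.
schedule = WeightSchedule({0: 100, 100: 10, 1000: 0})
assert schedule(0) == 100
assert schedule(50) == 55.0  # halfway between 100 and 10
assert schedule(2000) == 0
```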
Discussion:
- The schedule depends on the number of dataset iterations. This may not equal the number of gradient updates, e.g. with gradient accumulation: at an accumulation factor of 8, 800 dataset iterations correspond to 100 optimizer steps. Should we make gradacc / steps_per_iter configurable?
- Maybe use `type: linear` rather than separate `linear:` and `step:` keys? This should be unified with typical lr-schedulers; see the sketch after this list.
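A hedged sketch of the `type:`-style alternative as a small factory, mirroring the common lr-scheduler pattern of one `type` key selecting the behavior. The function name, config shape, and the `step` semantics (hold each weight until the next breakpoint) are assumptions for illustration:

```python
from bisect import bisect_right
from typing import Callable


def make_schedule(config: dict) -> Callable[[int], float]:
    """Hypothetical factory: {"type": "step", "points": {0: 100, 100: 10, 1000: 0}}."""
    points = sorted(config["points"].items())
    iters = [it for it, _ in points]
    weights = [w for _, w in points]

    def step(iteration: int) -> float:
        # Piecewise constant: hold each weight until the next breakpoint
        i = bisect_right(iters, iteration)
        return weights[max(i - 1, 0)]

    def linear(iteration: int) -> float:
        # Linear interpolation between breakpoints, clamped at both ends
        if iteration <= iters[0]:
            return weights[0]
        if iteration >= iters[-1]:
            return weights[-1]
        i = bisect_right(iters, iteration)
        frac = (iteration - iters[i - 1]) / (iters[i] - iters[i - 1])
        return weights[i - 1] + frac * (weights[i] - weights[i - 1])

    schedules = {"step": step, "linear": linear}
    if config["type"] not in schedules:
        raise ValueError(f"Unknown schedule type: {config['type']!r}")
    return schedules[config["type"]]


# A step schedule holds 100 until iteration 100, then 10 until 1000, then 0:
sched = make_schedule({"type": "step", "points": {0: 100, 100: 10, 1000: 0}})
assert [sched(i) for i in (0, 50, 100, 999, 1000)] == [100, 100, 10, 10, 0]
```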
voegtlel changed the title from "Training Schedule" to "Training Schedule / Curriculum" on Oct 10, 2024.