Training Schedule / Curriculum #17

Open
voegtlel opened this issue Oct 10, 2024 · 0 comments

voegtlel (Collaborator) commented Oct 10, 2024

Allow defining a training schedule, i.e. shifting the weights of datasets from step to step during training.

One example could be:

__module__: megatron.energon
__class__: Metadataset
splits:
  train:
    datasets:
      - weight: 1
        path: ds1
      - weight:
          __module__: megatron.energon
          __class__: WeightSchedule
          linear:  # Maybe "linear" or "step"?
            0: 100  # At iteration 0 (i.e. 0 items yielded on each rank), the weight is 100
            100: 10  # At iteration 100, the weight is 10
            1000: 0  # At iteration 1000 (and onwards), the weight is 0
        path: ds2
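
To make the intended semantics concrete, here is a minimal sketch of how the two variants ("linear" and "step") could be evaluated for the points above. The function names `linear_weight` and `step_weight` are hypothetical and not part of megatron.energon. The sketch assumes the weight is held constant before the first and after the last point, and that "step" keeps the previous point's weight until the next point is reached.

```python
from bisect import bisect_right


def linear_weight(points: dict, iteration: int) -> float:
    """Piecewise-linear interpolation between (iteration, weight) points.

    Hypothetical helper: the weight is clamped to the first/last value
    outside the range covered by the points.
    """
    steps = sorted(points)
    if iteration <= steps[0]:
        return float(points[steps[0]])
    if iteration >= steps[-1]:
        return float(points[steps[-1]])
    hi = bisect_right(steps, iteration)  # index of the first point after `iteration`
    x0, x1 = steps[hi - 1], steps[hi]
    y0, y1 = points[x0], points[x1]
    return y0 + (y1 - y0) * (iteration - x0) / (x1 - x0)


def step_weight(points: dict, iteration: int) -> float:
    """Piecewise-constant variant: hold each weight until the next point."""
    steps = sorted(points)
    idx = max(bisect_right(steps, iteration) - 1, 0)
    return float(points[steps[idx]])


# The points from the example config for ds2.
schedule = {0: 100, 100: 10, 1000: 0}

for it in (0, 50, 100, 550, 1000):
    w2 = linear_weight(schedule, it)
    # ds1 has a fixed weight of 1, so ds2 is sampled with probability w2 / (1 + w2).
    print(it, w2, w2 / (1 + w2))
```

With these assumptions, ds2 dominates sampling at the start and is no longer sampled from iteration 1000 onwards.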

Discussion:

  • The schedule depends on the number of dataset iterations. This may not equal the number of gradient updates, e.g. with gradient accumulation. Should we make gradacc / steps_per_iter configurable?
  • Maybe use type: linear rather than separate linear: and step: keys? This should be unified with typical lr-schedulers.
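
Both discussion points can be illustrated together in a small sketch. Everything here is an assumption for discussion, not an existing megatron.energon API: the class name `WeightScheduleSketch`, the `type` and `points` fields, and the `grad_accum` parameter, which is taken to mean the number of dataset iterations per gradient update.

```python
from dataclasses import dataclass


@dataclass
class WeightScheduleSketch:
    """Hypothetical schedule with an lr-scheduler-like `type` field."""

    type: str            # "linear" or "step"
    points: dict         # schedule step -> weight
    grad_accum: int = 1  # assumed: dataset iterations per gradient update

    def weight_at(self, iteration: int) -> float:
        if self.type not in ("linear", "step"):
            raise ValueError(f"unknown schedule type: {self.type}")
        # Convert dataset iterations into gradient-update steps.
        step = iteration // self.grad_accum
        keys = sorted(self.points)
        # Hold the first/last weight outside the covered range.
        if step <= keys[0]:
            return float(self.points[keys[0]])
        if step >= keys[-1]:
            return float(self.points[keys[-1]])
        for x0, x1 in zip(keys, keys[1:]):
            if x0 <= step < x1:
                y0, y1 = self.points[x0], self.points[x1]
                if self.type == "step":
                    return float(y0)
                return y0 + (y1 - y0) * (step - x0) / (x1 - x0)
        raise AssertionError("unreachable")


sched = WeightScheduleSketch("linear", {0: 100, 100: 10, 1000: 0}, grad_accum=4)
# With 4 iterations per update, iteration 200 corresponds to schedule step 50.
print(sched.weight_at(200))
```

With `grad_accum=1` the schedule is indexed by dataset iterations, as in the original proposal, so the extra parameter would be backwards compatible.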
voegtlel changed the title from "Training Schedule" to "Training Schedule / Curriculum" Oct 10, 2024
voegtlel added and removed the enhancement (New feature or request) label Oct 10, 2024