@YashasviChaurasia
Contributor

Description of the change

This PR fixes the callback registration order in sft_trainer.py so that the fms-acceleration callbacks are registered before the TrainerControllerCallback. As a result, on_save() from fms-acceleration executes before the TrainerControllerCallback's on_save().

This change is required because the fms-acceleration callbacks handle low-level checkpoint preparation, such as unwrapping model wrappers, merging adapter weights, and finalizing the model state before saving. Registering them before the TrainerControllerCallback makes their on_save() run first, so checkpoints are complete and consistent before the TrainerControllerCallback's on_save() runs.
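
The ordering effect can be illustrated with a minimal, self-contained sketch. It does not use the actual fms-acceleration or TrainerController classes: `ToyCallbackHandler`, `AccelerationLikeCallback`, and `ControllerLikeCallback` are hypothetical stand-ins, and the sketch assumes that on_save() is dispatched to callbacks in the order they were registered.

```python
class ToyCallbackHandler:
    """Hypothetical stand-in for a trainer's callback dispatcher.

    Assumption: callbacks are invoked in registration order.
    """

    def __init__(self):
        self.callbacks = []

    def add_callback(self, callback):
        self.callbacks.append(callback)

    def on_save(self, state):
        # Dispatch the save event to every callback, first registered first.
        for callback in self.callbacks:
            callback.on_save(state)


class AccelerationLikeCallback:
    """Hypothetical: stands in for low-level checkpoint preparation."""

    def on_save(self, state):
        state["checkpoint_finalized"] = True
        state["order"].append("acceleration")


class ControllerLikeCallback:
    """Hypothetical: stands in for a controller that reacts to a save."""

    def on_save(self, state):
        # Record whether the checkpoint was already finalized when we ran.
        state["controller_saw_finalized"] = state.get("checkpoint_finalized", False)
        state["order"].append("controller")


def run_save(callbacks):
    """Register callbacks in the given order and fire a single save event."""
    handler = ToyCallbackHandler()
    for callback in callbacks:
        handler.add_callback(callback)
    state = {"order": []}
    handler.on_save(state)
    return state


# Order after this PR: acceleration-like callback registered first.
fixed = run_save([AccelerationLikeCallback(), ControllerLikeCallback()])

# Order before this PR: controller-like callback registered first.
broken = run_save([ControllerLikeCallback(), AccelerationLikeCallback()])
```

In this toy setup, `fixed["controller_saw_finalized"]` is true while `broken["controller_saw_finalized"]` is false, which mirrors why the registration order matters for checkpoint consistency.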

Related issue number

How to verify the PR

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

@github-actions

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@github-actions github-actions bot added the feat label Oct 16, 2025
@YashasviChaurasia YashasviChaurasia marked this pull request as ready for review October 16, 2025 20:11
Collaborator

@dushyantbehl dushyantbehl left a comment

LGTM

Collaborator

@dushyantbehl dushyantbehl left a comment

Requesting clarification on Slack about the design before merging.

@ashokponkumar ashokponkumar enabled auto-merge (squash) October 27, 2025 05:34
@ashokponkumar ashokponkumar dismissed dushyantbehl’s stale review October 27, 2025 05:34

We will go with this for now and will work on the dir in a separate PR.

@ashokponkumar ashokponkumar merged commit ebf5743 into foundation-model-stack:main Oct 27, 2025
9 checks passed