[Callbacks] Remove double initialization, replace with updating the state directly #1169

Open
wants to merge 3 commits into main from kylesayrs/remove-double-init

Conversation

kylesayrs
Collaborator

@kylesayrs kylesayrs commented Feb 19, 2025

Purpose

  • Calling initialize twice is a confusing and misleading API. However, in some training pipelines (the HF Trainer in this example, but potentially others), it is more convenient to initialize once and then update the state later with lazily initialized variables (a sketch of this flow follows below).
  • Regardless of whether the HF Trainer pathway is maintained in the future, LLM Compressor should give users the tools to easily integrate with training scripts. HF Trainer is one such integration that could benefit from a function like this.
  • Because there are currently no modifiers that rely on the optimizer or steps_per_epoch in order to initialize, this change does not break any existing modifiers.
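
For illustration, here is a minimal sketch of the flow described above. The session object and its initialize/update_state methods are hypothetical stand-ins, not the actual llmcompressor API:

```python
# Hypothetical sketch of the intended integration flow (names are assumptions,
# not the actual llmcompressor API): initialize the session once up front, then
# push lazily created objects into the session state as they become available.

class ToySession:
    """Stand-in for a compression session holding shared state."""

    def __init__(self):
        self.state = {}
        self.initialized = False

    def initialize(self, **kwargs):
        # One-time setup: record the provided state and mark the session ready.
        self.state.update(kwargs)
        self.initialized = True

    def update_state(self, **kwargs):
        # Later updates: record new state without re-running initialization.
        assert self.initialized, "initialize() must be called first"
        self.state.update(kwargs)


session = ToySession()
session.initialize(model="my_model", recipe="my_recipe.yaml")

# ... later, inside the trainer, once the optimizer actually exists ...
session.update_state(optimizer="lazily_created_optimizer")
print(session.state)
```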

Prerequisites

Changes

  • Replace the second initialize call in session_mixin with an update_state call, as illustrated below. Because, to my knowledge, no modifiers rely on the optimizer to initialize, this should not break any flows.
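
A hedged before/after illustration of the kind of change described; the method name, arguments, and session calls are assumptions for illustration, not the actual session_mixin code:

```python
# Hypothetical sketch of the change described above (names are illustrative,
# not the actual session_mixin code).

def create_optimizer_before(trainer, session):
    optimizer = trainer.build_optimizer()
    # Old behavior: a second initialize() call re-ran modifier initialization.
    session.initialize(optimizer=optimizer, steps_per_epoch=trainer.steps_per_epoch)
    return optimizer


def create_optimizer_after(trainer, session):
    optimizer = trainer.build_optimizer()
    # New behavior: update the already-initialized session state directly.
    session.update_state(optimizer=optimizer, steps_per_epoch=trainer.steps_per_epoch)
    return optimizer
```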

@kylesayrs kylesayrs added the ready When a PR is ready for review label Feb 19, 2025
@kylesayrs kylesayrs marked this pull request as ready for review February 19, 2025 01:13
@kylesayrs kylesayrs marked this pull request as draft February 19, 2025 01:14
@kylesayrs kylesayrs self-assigned this Feb 19, 2025
@kylesayrs kylesayrs marked this pull request as ready for review February 25, 2025 17:27
Collaborator

@markurtz markurtz left a comment

Let's talk through this change in more detail, as it has a lot of further ramifications.

The first is that initialize was a one-to-one mapping into the modifiers' initialize calls. With that, update_state needs to pass the updated state into the modifiers' initialize as well -- the intent was that modifiers would be able to initialize once they have the required information. For example, LR modifiers need the optimizer and pruning modifiers need the module, and those may not be passed in the same call because they are not in local context of each other in the parent caller code.

The above is important for enabling future expansions: the framework does not hard-code a set list of what must be passed in for initialization, in case someone adds a modifier that requires other info, and it is on the modifiers to raise an exception if they have not been set up with the necessary info by the time we move into the execution pipeline stage.

Additionally, this has implications for the call order: initialization must now happen at a dependable place that always executes before any calls to update_state, which may cause issues with dynamically setting the model, optimizer, and other state we need based on the execution pipeline. The purpose of allowing multiple initialize calls was to avoid a dependency on when it was called. So, taking into account the modifier issues listed above, we essentially make initialization a no-op and replace it with update_state.

If we go this route, we should likely move the logic of setting up variable placeholders and resolving a recipe to session creation, and rename initialize to initialize_state or something like that. This latter point is toward the intended flow, where updating the state should not happen after event lifecycles or finalize have been called, since that would change state.
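
A minimal sketch of the lazy per-modifier initialization pattern described in this comment; the class and method names are hypothetical, not the actual framework classes:

```python
# Hypothetical sketch (illustrative names only): a modifier initializes only once
# the state it needs is present, and raises at execution time if that state never
# arrived. This mirrors the multiple-initialize design described above.

class ToyLRModifier:
    def __init__(self):
        self.initialized = False

    def initialize(self, state: dict) -> bool:
        # Not enough information yet: wait for a later call carrying more state.
        if state.get("optimizer") is None:
            return False
        self.optimizer = state["optimizer"]
        self.initialized = True
        return True

    def on_start(self):
        # By the time execution begins, the required state must have arrived.
        if not self.initialized:
            raise RuntimeError("LR modifier never received an optimizer")


modifier = ToyLRModifier()
modifier.initialize({"model": "my_model"})          # first call: no optimizer yet
modifier.initialize({"optimizer": "my_optimizer"})  # second call completes setup
modifier.on_start()
```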

Base automatically changed from kylesayrs/remove-preinitialize-structure to main February 26, 2025 20:37
rahul-tuli
rahul-tuli previously approved these changes Feb 27, 2025
Collaborator

@rahul-tuli rahul-tuli left a comment

Approving as per offline discussion with the team; great job @kylesayrs

Collaborator

@markurtz markurtz left a comment

Quickly flagging this as scheduled for rework so it isn't accidentally landed, since it will break some flows based on the current implementation.

@kylesayrs
Collaborator Author

@markurtz It is my understanding that there are no modifiers which rely on the optimizer to initialize, so these changes do not break any existing flows. As we move to support flows with delayed initialization, we'll rework the design.

@kylesayrs kylesayrs changed the title [Callbacks] Remove double initialization, replace with update_state [Callbacks] Remove double initialization, replace with updating the state directly Feb 27, 2025
@markurtz
Collaborator

> @markurtz It is my understanding that there are no modifiers which rely on the optimizer to initialize, so these changes do not break any existing flows. As we move to support flows with delayed initialization, we'll rework the design.

Maybe I understood differently from where we ended up in yesterday's sync, but I was under the impression that we were going to rework these changes to address the core intended use cases and not push up breaking changes now. The latter point is something I do want to ensure: if we have planned and expected usage patterns, we are not removing the code that enables them.

Specifically, this PR as-is will break a few main things:

  1. If a user calls initialize and then update_state afterward, that updated state will not be passed to the modifiers. This includes standard pruning, distillation, and other modifiers that rely on the model, teacher, etc. being passed through initialize (see the sketch after this list).
  2. Due to the above, integrations built on top of LLM Compressor must have all required parameters, including the model and teacher, at the time initialize is called. We have active use cases for the optimizer with learning rate, as well as potential future algorithms that will need other data outside of our assumptions.
  3. Finally, these changes mean we never check whether the modifiers have the required state to run, so an exception can surface much later in the call graph, where it is confusing to trace through and understand why a specific modifier is failing.
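
A hedged sketch of the failure mode in point 1, using toy stand-ins rather than the actual session or modifier classes: state pushed in after initialize never reaches the modifiers.

```python
# Hypothetical sketch (illustrative names only): once initialize() has run, later
# update_state() calls change the session's state but are never forwarded to the
# modifiers, so a teacher supplied late is silently ignored.

class ToyDistillationModifier:
    seen_state: dict = {}


class ToySession:
    def __init__(self, modifiers):
        self.modifiers = modifiers
        self.state = {}

    def initialize(self, **kwargs):
        self.state.update(kwargs)
        for modifier in self.modifiers:
            modifier.seen_state = dict(self.state)  # modifiers see only this snapshot

    def update_state(self, **kwargs):
        self.state.update(kwargs)  # session state changes; modifiers are not told


modifier = ToyDistillationModifier()
session = ToySession([modifier])
session.initialize(model="student")
session.update_state(teacher="teacher")   # arrives too late
print("teacher" in modifier.seen_state)   # False: the modifier never saw it
```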

@dsikka
Collaborator

dsikka commented Feb 28, 2025

@markurtz
Can you point to what existing workflow this will currently break within the current codebase?

@markurtz
Collaborator

markurtz commented Feb 28, 2025

> @markurtz Can you point to what existing workflow this will currently break within the current codebase?

Nope, we don't have any code in the current codebase that would break immediately with this. I don't think that should be our standard, though, since one of our core intended use cases is being an importable framework for external training and post-training codebases.

I had not seen the updated code state, which looks much better now, especially after removing the issue of modifiers reporting their initialized state. But the current code advises a solution that won't work: the error raised in initialize tells the user to use the update_state pathway since initialize has already been called, yet a call into update_state will not update the modifiers' initialization. So, if a user calls initialize and then calls update_state later with their model or teacher, that will not work properly and is at odds with what we state in the exception.

For the intended usage, I do want to ensure we adhere to the original designs unless we have good reasons for deviating from them and removing that functionality altogether. We have already seen the issues that come from veering away and removing functionality in the numerous problems we've hit with one-shot pathways. This usage pattern came from integrating with several customer codebases back when this was still SparseML, where we hit numerous issues trying to gather everything into one functional namespace to invoke initialize. The solution was enabling partial initialization so we didn't hit those issues.

I'm not against simplifying and updating the interfaces, and I definitely welcome it, but I do want to ensure those changes move toward future goals and past learnings by replacing that functionality rather than removing it.

@dsikka
Collaborator

dsikka commented Mar 3, 2025

@markurtz I don't think the argument for removing this particular code in this PR is that we no longer want to support external training. It is that we do not have any workflows that would use it at the moment and, in general, it is extremely confusing to understand and difficult to use. The goal would be to update the design so that we improve the API and users' ability to use the framework in the future, once these workflows are prioritized.

@markurtz
Collaborator

markurtz commented Mar 4, 2025

@dsikka, these are all good points to bring up, and I would argue that we should simplify the interface with this PR rather than break or remove that intended functionality and say we're going to tackle it later; this way, we are guaranteed to keep the functionality and not need to scope it back in later. Additionally, I want to stress that the base functionality, such as constant pruning, works with the current state of the codebase. With our focus on integrations, I don't think stripping out functionality is the answer for the short or long term.

For example, a simple fix to this PR would be to change the expected usage pathways as follows (sketched after this list) and, in addition, fix the incorrect error message raised in the current implementation, which would otherwise cause more exceptions and potentially hidden bugs:

  • create / get active session
  • (optional) one or more calls to update_state -> updates the state for the session but does not pass it to the modifiers
  • initialize(optional state that overwrites the current state, if any) -> calls into initialize for all modifiers, meaning they receive exactly one initialize event with the full expected state set by the caller pipeline. Additionally, we can simplify the logic for all modifiers so that they raise an exception if they don't have the required state for their functionality on this invocation, rather than expecting multiple calls or needing to return whether they successfully initialized.
  • any further calls to initialize or update_state raise an exception stating that the session has already been initialized and the state can no longer be updated

The above assumes that initialize is called before the training pipeline runs and before the model is distributed to the expected devices. We'll need to note that in the docs, but otherwise this should simplify our current pathways in a fairly straightforward way without needing to completely remove the logic and wait for it to be scoped in and added back later.
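
A minimal sketch of the lifecycle proposed above, using hypothetical stand-in classes rather than the actual llmcompressor session API:

```python
# Hypothetical sketch of the proposed lifecycle (illustrative names only):
# update_state() may be called any number of times before initialize();
# initialize() hands the accumulated state to the modifiers exactly once;
# any later initialize()/update_state() call raises.

class ToyPruningModifier:
    def initialize(self, state: dict):
        if "model" not in state:
            raise ValueError("pruning modifier requires a model at initialize")
        self.model = state["model"]


class ToySession:
    def __init__(self, modifiers):
        self.modifiers = modifiers
        self.state = {}
        self.initialized = False

    def update_state(self, **kwargs):
        if self.initialized:
            raise RuntimeError("session already initialized; state can no longer be updated")
        self.state.update(kwargs)

    def initialize(self, **kwargs):
        if self.initialized:
            raise RuntimeError("session already initialized")
        self.state.update(kwargs)  # optional state overwrites what was buffered
        for modifier in self.modifiers:
            modifier.initialize(self.state)  # exactly one initialize event each
        self.initialized = True


session = ToySession([ToyPruningModifier()])
session.update_state(model="my_model")          # gathered from one local scope
session.update_state(optimizer="my_optimizer")  # gathered from another
session.initialize()                            # single initialize event for the modifiers
```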

@dsikka
Collaborator

dsikka commented Mar 6, 2025

What currently intended functionality is this breaking, within llmcompressor or the integrations? Until the aforementioned workflows become relevant, it doesn't make sense to scope out a solution, and doing so will not be a priority until then.

@markurtz
Collaborator

markurtz commented Mar 6, 2025

> What currently intended functionality is this breaking, within llmcompressor or the integrations? Until the aforementioned workflows become relevant, it doesn't make sense to scope out a solution, and doing so will not be a priority until then.

Sure, the intended functionality is to dynamically assign the state required for training flows from multiple local scopes/locations within the integration's codebase. We've used it extensively in the past, such as with the YOLO repos when we were actively supporting those. Suppose a customer's or open-source codebase does not rely on the HF Trainer; in that case, this will almost certainly be a requirement, since our Trainer mixin essentially provides the intended functionality that is being argued to be stripped out.

Could you explain a bit more about the rationale for the strong pushback, especially around the latest proposal I put up, which is a pretty minimal implementation on top of this diff?

Separately, as stated before, the exception raised on multiple invocations of initialize advises a solution that will not work in the current implementation, and we need to fix that at a minimum.

Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs force-pushed the kylesayrs/remove-double-init branch from fb55cd6 to 16eed83 on March 10, 2025 at 22:04