[Callbacks] Remove double initialization, replace with updating the state directly #1169
Conversation
Let's talk through this change in more detail as this has a lot of further ramifications.
The first is that initialize was a one-to-one mapping onto the modifiers' initialize calls. Given that, update_state needs to forward its state into the modifiers' initialize calls -- the intent was that modifiers would initialize themselves once they have the required information. For example, LR modifiers need the optimizer and pruning modifiers need the module, and those may not be passed in within the same call because they are not in local scope of each other in the parent caller code.
The above is important for enabling future expansion: the framework does not hard-code the list of what must be passed in for initialization, in case someone adds a modifier that requires other info, and it is on the modifiers to raise an exception if they have not been set up with the necessary info once we move into the execution pipeline stage.
Additionally, this now has implications for call order: initialization must be called at a dependable place that will always execute before any calls to update_state, which may cause issues with dynamic setting of the model, optimizer, and other state we need based on the execution pipeline. The purpose of enabling multiple calls there was so that we didn't have a dependency on when it was called. So, taking into account the modifier issues listed before, we now essentially make initialization a no-op and replace it with update_state.
If we go this route, we should likely move the logic of setting up variable placeholders and resolving a recipe to session creation, and rename initialize to initialize_state or something like that. This latter point supports the intended flow, where updating the state should not happen after event lifecycles or finalize have been called, since that would change state.
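For concreteness, here is a minimal, self-contained sketch of the partial-initialization pattern described above. All names (State, LRModifier, Session) are hypothetical stand-ins, not the actual llmcompressor classes: each initialize() call forwards whatever state exists so far, modifiers finish their own setup only once the objects they need arrive, and it is on the modifier to raise if execution begins before that happens.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class State:
    model: Optional[Any] = None
    optimizer: Optional[Any] = None


class LRModifier:
    """Needs the optimizer; it may arrive on a later initialize() call."""

    def __init__(self) -> None:
        self.initialized = False

    def initialize(self, state: State) -> None:
        if state.optimizer is None:
            return  # not enough info yet; wait for a later call
        # ... set up LR scheduling against state.optimizer here ...
        self.initialized = True

    def check_ready(self) -> None:
        # It is on the modifier to raise if execution starts without setup.
        if not self.initialized:
            raise RuntimeError("LRModifier never received an optimizer")


class Session:
    def __init__(self, modifiers: List[Any]) -> None:
        self.state = State()
        self.modifiers = modifiers

    def initialize(self, model: Any = None, optimizer: Any = None) -> None:
        # Merge whatever is available now into the shared state, then give
        # every modifier another chance to finish initializing itself.
        if model is not None:
            self.state.model = model
        if optimizer is not None:
            self.state.optimizer = optimizer
        for modifier in self.modifiers:
            modifier.initialize(self.state)

    def start_execution(self) -> None:
        for modifier in self.modifiers:
            modifier.check_ready()
```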
Approving as per offline discussion with the team; great job @kylesayrs
Quickly flagging this as scheduled for rework, since it will break some flows based on the current implementation, so it isn't accidentally landed.
@markurtz It is my understanding that there are no modifiers which rely on the optimizer to initialize, so these changes do not break any existing flows. As we move to support flows with delayed initialization, we'll rework the design.
Maybe I understood differently from where we ended up yesterday in the sync, but I was under the impression that we were going to rework these changes to address the core intended use cases and not push up breaking changes now. The latter point is something I do want to ensure: if we have planned and expected usage patterns, we should not be removing code that enables them. Specifically, this PR as is will break a few main things:
@markurtz
Nope, we don't have any code within the current code base that would break immediately with this. I don't think this should be our standard, though, as one of our core intended use cases is as an importable framework for external training and post-training code bases. I had not seen the updated code state, which looks much better now, especially after removing the issue of modifiers reporting the initialized state.

But, for the current code as it stands, the error raised in initialize advises a solution that won't work: it tells the user to use the update_state pathway since initialize has already been called, yet a call into update_state will not update the initialization of the modifiers. So, if a user calls initialize and then calls update_state later with their model or teacher, that will not work properly and is at odds with what we state in the exception.

For the intended usage, I do want to ensure we adhere to the original designs unless we have good reasons for deviating from them and removing that functionality altogether. The cost of veering away and removing functionality shows up in the numerous issues we've hit with one-shot pathways. This usage came from integrating with several customer code bases back when this was still SparseML, where we repeatedly hit issues trying to gather everything into one functional namespace to invoke initialize; the solution was enabling partial initialization so that we didn't hit those issues. I'm not against, and definitely welcome, simplifying and updating the interfaces, but I do want to ensure those changes move towards future goals and past learnings without removing that functionality, replacing it instead.
@markurtz I don't think the argument for removing the particular code in this PR is that we no longer want to support external training. It is that we do not have any workflows that would use this at the moment, and in general it is extremely confusing to understand and difficult to use. The goal would be to update the design so that we improve the API and users' ability to use the framework in the future, once these workflows are prioritized.
@dsikka, all are good points to bring up, and I would argue that we should simplify the interface with this PR rather than break or remove the intended functionality and say we're going to tackle it later; this way, we are guaranteed to keep the functionality and not need to scope it in later. Additionally, I want to stress that the base functionality, such as constant pruning, is functional with the current state of the code base. With our focus on integrations, I don't think stripping out functionality is the answer for the short or long term. For example, a simple fix to this PR would be to change the expected usage pathways and, in addition, fix the incorrect error message raised within the current implementation that would otherwise cause more exceptions / potentially hidden bugs:
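(The snippet originally attached to this comment is not preserved in this excerpt; below is a hedged, hypothetical sketch of the kind of fix being described: a repeated initialize() raises with an accurate message, and update_state() re-runs modifier setup so that the advised pathway actually works. All names are illustrative.)

```python
class PartiallyInitializableSession:
    """Hypothetical sketch: make the advice in the error message valid by
    having update_state() re-run modifier setup with the new state."""

    def __init__(self, modifiers):
        self.modifiers = modifiers
        self.state = {}
        self._initialized = False

    def initialize(self, **kwargs):
        if self._initialized:
            raise RuntimeError(
                "initialize() may only be called once; pass objects that "
                "become available later (model, optimizer, teacher, ...) "
                "to update_state() instead"
            )
        self._initialized = True
        self.update_state(**kwargs)

    def update_state(self, **kwargs):
        self.state.update(kwargs)
        # Re-run modifier initialization so lazily supplied objects are
        # consumed, keeping the pathway advised by the error above valid.
        for modifier in self.modifiers:
            modifier.initialize(self.state)
```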
The above assumes that initialize is called before the training pipeline runs and before the model is distributed to the expected devices. We'll need to note that in the docs, but otherwise this should simplify our current pathways in a fairly straightforward way without needing to completely remove the logic and wait for it to be scoped in / added later.
What current intended functionality is this breaking, within llmcompressor or the integrations? Until the aforementioned workflows become relevant, it doesn't make sense to scope out a solution, and doing so will not be a priority until then.
Sure, the intended functionality is to dynamically assign the state required for training flows from multiple local scopes/locations within the integration's code base, as in the sketch below. We've used it extensively in the past, such as with the YOLO repos when we were actively supporting those. If a customer's or open-source code base does not rely on the HF trainer, this will almost certainly be a requirement, since our Trainer mixin essentially implements the very functionality that is being argued to be stripped out. Could you explain a bit more of the rationale for the strong pushback, especially around the latest proposal I put up, which is a pretty minimal implementation on top of this diff? Separately, as stated before, the current implementation advises a solution that will not work in the exception raised on multiple invocations of initialization, and we need to fix that at a minimum.
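(Illustrative usage only, reusing the toy PartiallyInitializableSession sketched earlier; the integration functions and their names are hypothetical, not taken from any real code base.)

```python
import torch

# The session is created and initialized early, before the model or
# optimizer exist anywhere in the integration's code base.
session = PartiallyInitializableSession(modifiers=[])
session.initialize()


def build_model():
    # The model is created in one local scope...
    model = torch.nn.Linear(8, 8)
    session.update_state(model=model)
    return model


def build_optimizer(model):
    # ...and the optimizer in another, so no single call site could have
    # passed both objects to initialize() together.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    session.update_state(optimizer=optimizer)
    return optimizer


optimizer = build_optimizer(build_model())
```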
Force-pushed from fb55cd6 to 16eed83 (Signed-off-by: Kyle Sayers <[email protected]>)
Purpose

Calling `initialize` twice is a confusing and misleading API. However, in some training pipelines (in this example the HF trainer, but potentially others), it may be more convenient to `initialize` once and then update the state with lazily initialized variables later.

Prerequisites

Changes

Replace the second `initialize` call in session_mixin with an `update_state` call. Because, to my knowledge, no modifiers rely on the optimizer to initialize, this should not break any flows.
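A hedged before/after sketch of this change, using hypothetical names rather than the literal session_mixin code, and the same toy session as in the sketches above:

```python
import torch


class SessionMixinSketch:
    """Hypothetical stand-in for the HF trainer session mixin."""

    def __init__(self, session, model):
        self.session = session
        self.model = model
        # The single initialize() call happens here, before the optimizer
        # exists.
        self.session.initialize(model=model)

    def create_optimizer(self):
        optimizer = torch.optim.SGD(self.model.parameters(), lr=0.1)
        # Previously: self.session.initialize(optimizer=optimizer), i.e.
        # double initialization. Now the lazily created optimizer is
        # forwarded by updating the state directly:
        self.session.update_state(optimizer=optimizer)
        return optimizer
```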