
Add move_to_device kwarg to the optimizer's load_state_dict #1344

Merged
merged 1 commit into bitsandbytes-foundation:main on Sep 19, 2024

Conversation

koute
Contributor

@koute koute commented Aug 31, 2024

This PR makes it possible to load an optimizer checkpoint without automatically moving the optimizer's state to the GPU.

Some background as to why: I keep the optimizer's state on the CPU to save VRAM and manually move it to the GPU as needed. Unfortunately, load_state_dict moves all of the optimizer's tensors to whatever device the model's parameters are currently on, which results in an OOM crash. So currently, before loading an optimizer checkpoint, I have to unnecessarily move my model to the CPU, call the optimizer's load_state_dict, and then move the model back to the GPU. With this PR I can skip this silly dance.
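
For reference, a minimal usage sketch of the new kwarg. The optimizer class, tensor shapes, and checkpoint file name below are illustrative assumptions; move_to_device is the argument this PR adds:

import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

# Checkpoint saved earlier with the optimizer state held on the CPU.
state = torch.load("optimizer.pt", map_location="cpu")

# Default behavior (move_to_device=True) moves every state tensor to
# param.device, i.e. the GPU, which can OOM. Passing False keeps the
# loaded state where it is, so it can be moved manually as needed:
optimizer.load_state_dict(state, move_to_device=False)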


@matthewdouglas matthewdouglas self-assigned this Sep 10, 2024

@hansonw hansonw left a comment


I encountered this issue as well, but when using the paged variants of the optimizers (load_state_dict should re-create paged tensors instead of just calling .to(param.device)).

My solution (see the suggestion below) was to alter the initialization to use self.get_state_buffer instead. It's still somewhat orthogonal to this PR (though the intent is similar); I can submit a separate PR, but I'm curious what the maintainers think.

Comment on lines +200 to +201
if move_to_device:
    value[k] = v.to(param.device)

Suggested change
-if move_to_device:
-    value[k] = v.to(param.device)
+buffer = self.get_state_buffer(v, v.dtype)
+buffer.copy_(v)
+value[k] = buffer
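
For context, a hedged sketch of the scenario this suggestion addresses. The checkpoint file name is an illustrative assumption; PagedAdamW8bit is one of the paged optimizer variants, and get_state_buffer is the helper named above, which the comment indicates allocates paged storage when the optimizer is paged:

import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()
# Paged optimizers back their state with pageable (unified) memory.
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=1e-3)

state = torch.load("optimizer.pt", map_location="cpu")
# With a plain v.to(param.device), the restored state becomes an ordinary
# CUDA tensor and the paging property is lost. The suggestion above instead
# allocates a fresh buffer via self.get_state_buffer(v, v.dtype) and copies
# the checkpointed values into it, preserving paged storage.
optimizer.load_state_dict(state)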

Member

Thanks @hansonw! This seems reasonable as a separate PR!

This makes it possible to load an optimizer checkpoint without
automatically moving the optimizer's state to the GPU.
@matthewdouglas matthewdouglas merged commit 8fc7892 into bitsandbytes-foundation:main Sep 19, 2024
28 checks passed
matthewdouglas pushed a commit to matthewdouglas/bitsandbytes that referenced this pull request Oct 28, 2024
Add move_to_device kwarg to the optimizer's load_state_dict (bitsandbytes-foundation#1344)