Labels: bug
Description
🧠 Describe the Bug
When distilling to RF-DETR, stopping and then resuming training fails with the following error:
```
Restoring states from the checkpoint path at /home/jeroen/pretrain/out/rf-detr-base/checkpoints/last.ckpt
Traceback (most recent call last):
  File "/home/jeroen/pretrain/distill_rfdetr_base.py", line 4, in <module>
    lightly_train.train(
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/lightly_train/_commands/train.py", line 238, in train
    train_from_config(config=config)
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/lightly_train/_commands/train.py", line 419, in train_from_config
    trainer_instance.fit(
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 560, in fit
    call._call_and_handle_interrupt(
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 49, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 598, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 980, in _run
    self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 411, in _restore_modules_and_callbacks
    self.restore_callbacks()
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 328, in restore_callbacks
    call._call_callbacks_on_load_checkpoint(trainer, self._loaded_checkpoint)
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 291, in _call_callbacks_on_load_checkpoint
    callback.on_load_checkpoint(trainer, trainer.lightning_module, checkpoint)
  File "/home/jeroen/.pyenv/versions/rf-detr/lib/python3.11/site-packages/lightly_train/_callbacks/checkpoint.py", line 105, in on_load_checkpoint
    self._models.model.load_state_dict(_checkpoint.models.model.state_dict())
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'RFDETRBase' object has no attribute 'load_state_dict'
```
🔁 Steps to Reproduce
Run the example code from the docs, interrupt the training, then rerun with the `resume_interrupted` argument set to `True`:

```python
import lightly_train

lightly_train.train(
    out="out/rf-detr-base",
    data="./data",
    model="rfdetr/rf-detr-base",
    resume_interrupted=True,
)
```

🤖 Environment Details
- OS: Ubuntu 24.04
- Python version: 3.11
- Frameworks/Libraries (with versions): lightly-train[rf-detr] 0.11.3
- How did you install the package: pip
📌 Additional Context
The RFDETRBase object is not a torch.nn.Module, so calling `load_state_dict` on it fails. I'm not familiar with how lightly-train loads checkpoints, but maybe there's a way to override the default behavior and put the weights in the right place for RF-DETR.
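As a sketch of one possible direction (the internal attribute layout of RFDETRBase is an assumption on my part, and `resolve_state_dict_target` is a hypothetical helper, not part of lightly-train): instead of assuming the top-level object is a torch module, the checkpoint callback could walk the wrapper chain until it finds an object that actually implements `load_state_dict`:

```python
def resolve_state_dict_target(obj, attr="model", max_depth=3):
    """Walk nested wrapper objects until one exposes load_state_dict.

    The inner attribute name 'model' is a guess at how RFDETRBase
    wraps its underlying torch module; adjust to the real layout.
    """
    current = obj
    for _ in range(max_depth + 1):
        if hasattr(current, "load_state_dict"):
            return current
        current = getattr(current, attr, None)
        if current is None:
            break
    raise AttributeError(
        f"{type(obj).__name__} does not wrap an object with load_state_dict"
    )
```

`on_load_checkpoint` could then call `resolve_state_dict_target(...)` on both the live model and the checkpointed one before transferring weights, which would keep the existing behavior for plain torch modules while also handling wrappers like RFDETRBase.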