Crash when using save_state with deepspeed: model.state_dict functions incompatible with new deepspeed.
#596
Labels: bug
🐛 Describe the bug
I've recently been using the code provided in https://github.com/tlc4418/llm_optimization, which in turn uses trlx.
In doing so I encountered a bug that crashes trlx when trying to save, caused by a recent change in deepspeed.
To reproduce, use this gist: https://gist.github.com/JohannesAck/feb31ee5c491ca30771335296ec8b295 and start it with deepspeed by using accelerate launch with a config that enables deepspeed.
This is caused by a change in deepspeed, deepspeedai/DeepSpeed#5408, which changes the call to state_dict to use a keyword argument instead of a positional one.
trlX, however, assumes that the argument will be passed positionally (trlx/trlx/models/modeling_ppo.py, lines 354 to 359 in 3340c2f).
In L359, dict(prefix="v_head.", **kwargs) becomes dict(prefix="v_head.", prefix="") and thus has two values for prefix
and crashes.
Workaround: downgrade deepspeed to a version < 0.14.1 (for example, pip install "deepspeed<0.14.1").
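For reference, the keyword collision described above can be reproduced without trlx or deepspeed. This is a minimal sketch: forward_state_dict is a hypothetical stand-in for the trlx method, and only the dict(prefix="v_head.", **kwargs) expression is taken from the issue.

```python
def forward_state_dict(*args, **kwargs):
    # Hypothetical stand-in: merges a fixed prefix with whatever
    # keyword arguments the caller passed, as in the quoted expression.
    return dict(prefix="v_head.", **kwargs)


# No `prefix` keyword from the caller: the merge works.
print(forward_state_dict())

# Caller passes `prefix` as a keyword (as described for newer deepspeed):
# dict() receives `prefix` twice and raises TypeError.
try:
    forward_state_dict(prefix="")
except TypeError as err:
    print(f"TypeError: {err}")
```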
I'm not sure what the proper solution here would be; just ignoring the prefix argument doesn't sound great either. One option might be to just ignore it if it's an empty string and raise an exception otherwise.
Hope this helps somebody!
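The option mentioned above (ignore an empty prefix, raise otherwise) could look roughly like the sketch below. It is not the trlx fix: merge_prefix is a hypothetical helper, and it only handles a prefix passed as a keyword, not one passed positionally.

```python
def merge_prefix(own_prefix, kwargs):
    """Return a copy of kwargs with `prefix` set to own_prefix.

    An empty-string `prefix` from the caller is dropped;
    any other caller-supplied value raises ValueError.
    """
    kwargs = dict(kwargs)  # copy so the caller's dict is not mutated
    caller_prefix = kwargs.pop("prefix", "")
    if caller_prefix != "":
        raise ValueError(f"unsupported non-empty prefix: {caller_prefix!r}")
    return dict(prefix=own_prefix, **kwargs)


# An empty prefix from the caller is ignored; other kwargs pass through.
print(merge_prefix("v_head.", {"prefix": "", "keep_vars": False}))
```

Inside a state_dict override, the result could be unpacked in place of the original dict(prefix="v_head.", **kwargs) expression, so a non-empty prefix fails loudly instead of being silently dropped.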
Which trlX version are you using?
trlx=0.7.0
Additional system and package information
deepspeed=0.14.4