Replies: 1 comment
I understand that there is an error in the reward function I wrote.
-
Hello Cryolite,
Is the loss curve during Conservative Q-Learning (CQL) training a meaningful indicator of how training is going?
In every training run I have tried, the CQL loss ends up increasing: it stays very stable until 1 million training steps, after which it starts to rise linearly.
Thank you.