import gym

class RewardScaler(gym.RewardWrapper):
    """
    Bring rewards to a reasonable scale for PPO. This is incredibly important
    and affects performance a lot.
    """
    def reward(self, reward):
        # Scale every reward by a constant factor before the agent sees it.
        return reward * 0.01
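For reference, this is applied like any other gym wrapper. A minimal usage sketch, assuming a standard gym environment and the classic gym step API (the env ID here is just a stand-in, not from this issue):

import gym

env = gym.make("CartPole-v1")   # any environment; CartPole is a stand-in
env = RewardScaler(env)         # rewards the agent sees are now 100x smaller

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(reward)                   # scaled reward, e.g. 0.01 instead of 1.0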
Is there a good explanation for this? I would have thought it should converge regardless. Why does it affect convergence speed so much, and how did you even find this to be a problem? How can I tell whether my rewards (on a different game) are too large and need to be scaled? Are there good indicators for that besides performance, which could be affected by any number of parameters?
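One empirical check (my own sketch, not something from this thread): sample a few random rollouts and look at the raw reward statistics before training at all. If per-step rewards or episode returns are far from roughly unit scale, scaling or normalization is worth trying. A minimal sketch, assuming a gym env with the classic step API:

import gym
import numpy as np

def reward_scale_report(env, episodes=10):
    """Collect rewards from random rollouts and report their magnitude."""
    rewards = []
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            _, r, done, _ = env.step(env.action_space.sample())
            rewards.append(r)
    rewards = np.asarray(rewards)
    # Rough heuristic (an assumption, not a hard rule): PPO-style setups
    # tend to behave best when per-step rewards are on the order of 1 or
    # less; a large mean/std here suggests scaling is worth trying.
    print(f"mean={rewards.mean():.3f} std={rewards.std():.3f} "
          f"max|r|={np.abs(rewards).max():.3f}")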