
Normalizing environment wrapper #115

@vzhuang

Description

For MuJoCo envs, it's standard practice to normalize rewards by a running estimate of their standard deviation (e.g. VecNormalize in baselines, NormalizedEnv in rllab). Without it, performance is noticeably worse. In the current PPO implementation, for example, the value function fails to converge because the return magnitudes are too large, and the algorithm needs around 3x as many iterations to converge as the normalized implementations do.
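
A minimal sketch of what such a wrapper could do, assuming a single environment with scalar rewards. Following the approach of baselines' VecNormalize, it tracks a discounted running return, keeps a running variance of that return (Welford's online algorithm here), and divides each reward by the running standard deviation, clipping the result. Rewards are only scaled, not mean-centered. The class names RunningMeanStd and RewardNormalizer and the defaults gamma=0.99, epsilon=1e-8, clip=10.0 are assumptions of this sketch, not an existing API in this repo or in baselines/rllab.

```python
import math


class RunningMeanStd:
    """Running mean/variance of a scalar stream (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self) -> float:
        # Population variance; this sketch reports 1.0 until two samples exist
        # so the very first rewards are not divided by ~0.
        return self._m2 / self.count if self.count >= 2 else 1.0


class RewardNormalizer:
    """Hypothetical reward scaler: divides by the running std of the discounted return."""

    def __init__(self, gamma: float = 0.99, epsilon: float = 1e-8, clip: float = 10.0):
        self.gamma = gamma
        self.epsilon = epsilon
        self.clip = clip
        self.ret = 0.0  # discounted return accumulated over the current episode
        self.ret_rms = RunningMeanStd()

    def __call__(self, reward: float, done: bool) -> float:
        self.ret = self.ret * self.gamma + reward
        self.ret_rms.update(self.ret)
        scaled = reward / math.sqrt(self.ret_rms.var + self.epsilon)
        if done:
            self.ret = 0.0  # start a fresh return at the episode boundary
        return max(-self.clip, min(self.clip, scaled))


# Usage with a made-up stream of large rewards and 200-step episodes.
norm = RewardNormalizer(gamma=0.99)
raw = [500.0 + 10.0 * (i % 7) for i in range(1000)]
scaled = [norm(r, done=(i % 200 == 199)) for i, r in enumerate(raw)]
print(max(raw), max(abs(s) for s in scaled[100:]))
```

In a training loop, the raw reward from each env step would be passed through the normalizer before it reaches the value-function targets; the running statistics keep updating throughout training.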
