
Normalize rewards by standard deviation of discounted return in MuJoCo#149

Open
vzhuang wants to merge 1 commit into astooke:master from vzhuang:normalize_rewards

Conversation


@vzhuang vzhuang commented Apr 21, 2020

Averaged results over 10 runs for PPO on Walker2d-v3:

[plot: walker2dv3normtest — PPO return curves on Walker2d-v3, averaged over 10 runs]

vzhuang (Author) commented Apr 21, 2020

#115
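The scheme proposed here (and requested in #115) divides rewards by a running standard deviation of the discounted return, in the style of the baselines-era `VecNormalize` wrapper. A minimal NumPy sketch of that idea — the `RunningMeanStd` and `discount_return` helpers below are illustrative stand-ins, not rlpyt's actual implementations:

```python
import numpy as np

class RunningMeanStd:
    """Running mean/variance via the parallel (Chan et al.) update.
    Illustrative stand-in for a baselines-style running-stats helper."""
    def __init__(self):
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4

    def update(self, x):
        batch_mean, batch_var, n = x.mean(), x.var(), x.size
        delta = batch_mean - self.mean
        tot = self.count + n
        m2 = self.var * self.count + batch_var * n + delta ** 2 * self.count * n / tot
        self.mean = self.mean + delta * n / tot
        self.var, self.count = m2 / tot, tot

def discount_return(reward, done, bootstrap, discount):
    """Per-step discounted return over a segment, resetting at episode ends."""
    T = len(reward)
    ret = np.zeros(T)
    running = bootstrap
    for t in reversed(range(T)):
        running = reward[t] + discount * running * (1.0 - done[t])
        ret[t] = running
    return ret

# Update the running stats with the (un-bootstrapped) discounted returns,
# then scale the raw rewards by the running std before they reach the algorithm.
rets_rms = RunningMeanStd()
reward = np.array([1.0, 1.0, 1.0, 10.0])
done = np.array([0.0, 0.0, 1.0, 0.0])
ret = discount_return(reward, done, bootstrap=0.0, discount=0.99)
rets_rms.update(ret)
norm_reward = reward / np.sqrt(rets_rms.var + 1e-8)
```

Note the mean is tracked but not subtracted: only the scale of the rewards is normalized, so the sign and relative structure of the reward signal are preserved.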


codecov-io commented Apr 21, 2020

Codecov Report

Merging #149 into master will decrease coverage by 0.00%.
The diff coverage is 20.58%.


@@            Coverage Diff             @@
##           master     #149      +/-   ##
==========================================
- Coverage   22.56%   22.56%   -0.01%     
==========================================
  Files         128      128              
  Lines        7987     8014      +27     
==========================================
+ Hits         1802     1808       +6     
- Misses       6185     6206      +21     
Flag          Coverage Δ
#unittests    22.56% <20.58%> (-0.01%) ⬇️
Impacted Files                                      Coverage Δ
rlpyt/algos/pg/a2c.py 0.00% <0.00%> (ø)
rlpyt/algos/pg/base.py 0.00% <0.00%> (ø)
rlpyt/algos/pg/ppo.py 0.00% <0.00%> (ø)
rlpyt/experiments/configs/mujoco/pg/mujoco_a2c.py 0.00% <ø> (ø)
rlpyt/experiments/configs/mujoco/pg/mujoco_ppo.py 0.00% <ø> (ø)
rlpyt/samplers/base.py 80.00% <ø> (ø)
rlpyt/samplers/collections.py 96.29% <ø> (ø)
rlpyt/samplers/collectors.py 81.03% <ø> (ø)
rlpyt/samplers/parallel/gpu/collectors.py 0.00% <0.00%> (ø)
rlpyt/samplers/serial/sampler.py 97.72% <ø> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 668290d...a15a93b. Read the comment docs.

astooke (Owner) commented Jun 30, 2020

OK, this is interesting, and I think it can be made a lot simpler. As far as I can tell from the PR, the same could be achieved by changing process_returns() of the policy gradient class, right after the following lines:

reward, done, value, bv = (samples.env.reward, samples.env.done,
    samples.agent.agent_info.value, samples.agent.bootstrap_value)
done = done.type(reward.dtype)

by inserting:

if self.normalize_reward:
    return_ = discount_return(reward, done, 0., self.discount)  # NO bootstrapping of value
    self.rets_rms.update(return_.view(-1, 1))  # matching the shape you used; not sure if the extra dim is needed?
    std_dev = torch.sqrt(self.rets_rms.var)
    reward = torch.div(reward, std_dev)

# proceed with computing discounted returns or GAE returns using the scaled reward

I think that accomplishes the same math? And doesn't need to change any files in the sampler :)
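The "same math" intuition can be checked directly: both the discounted return and GAE are linear in the rewards and values, so dividing everything by a constant (here, the running std) divides every advantage by that constant — it does not matter whether the scaling happens in the sampler or at the top of process_returns(). A quick NumPy check with an illustrative `gae` helper (not rlpyt's actual implementation):

```python
import numpy as np

def gae(reward, value, done, bootstrap, discount, lam):
    """Generalized advantage estimation over a T-step segment."""
    T = len(reward)
    adv = np.zeros(T)
    next_value = bootstrap
    running = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - done[t]
        delta = reward[t] + discount * next_value * not_done - value[t]
        running = delta + discount * lam * not_done * running
        adv[t] = running
        next_value = value[t]
    return adv

reward = np.array([1.0, 0.5, 2.0])
value = np.array([0.3, 0.2, 0.1])
done = np.zeros(3)
c = 3.7  # any positive scale, standing in for the running std

# Scaling rewards and values by 1/c scales every advantage by 1/c.
a_scaled = gae(reward / c, value / c, done, 0.0, discount=0.99, lam=0.95)
a_plain = gae(reward, value, done, 0.0, discount=0.99, lam=0.95) / c
```

`a_scaled` and `a_plain` agree to floating-point precision, which is the linearity property the in-algorithm placement relies on (in practice the value net simply learns to predict in the rescaled units).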

astooke (Owner) commented Sep 5, 2020

Any more comments? Has anyone else used this?
