
The random seed doesn't work #33

Open
xffxff opened this issue Nov 13, 2018 · 9 comments


@xffxff

xffxff commented Nov 13, 2018

Even when I set the same random seed, the results differ between runs; you can reproduce this with DDPG. I think tf.set_random_seed(seed) doesn't work, but I don't know how to fix it.

@machinaut

Can you give an example with what you expected to happen and what actually happened?

If TensorFlow is failing to set a seed, maybe you should file a bug on TensorFlow.

Gym environments and NumPy (for example) are seeded separately, and you should not expect seeding TensorFlow to affect either of them.

@jachiam
Contributor

jachiam commented Nov 13, 2018

Ooh, I may not have seeded Gym envs. My bad. Will look into getting this working---it's possible that Gym isn't the only other source of nondeterminism (they can be hard to track down).

@xffxff
Author

xffxff commented Nov 14, 2018

@machinaut @jachiam, I ran the command python ddpg.py -s 2. The first time I got AverageTestEpRet=-594 for epoch one, but the second time I got AverageTestEpRet=-314 for epoch one. I also tried setting

env.seed(seed)
test_env.seed(seed)

but the AverageTestEpRet of epoch one was still different.

@jachiam
Contributor

jachiam commented Nov 15, 2018

@xffxff, hmm, that's a bit unfortunate. I appreciate that you tried this out.

I am not fully sure what could be going wrong here. My suspicion is that it might involve the Python hash seed used to prevent dict collision attacks. See here for an explanation of the issue, and see here for more info. Can you try export PYTHONHASHSEED=0 and then try running your experiment again?
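
As a rough illustration of the hash-seed effect mentioned above: Python randomizes string hashes per process by default, so anything that depends on hash order (set iteration order, for example) can vary between runs. The sketch below starts fresh interpreters with PYTHONHASHSEED=0 and checks that the string hash is stable. The helper name string_hash_in_fresh_interpreter and the test string are made up for this example and are not part of Spinning Up.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(hash_seed=None):
    """Return hash('spinup') as computed by a brand-new Python process.

    Hypothetical helper for illustration. If hash_seed is None, the child
    process uses Python's default (randomized) string hashing.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from a clean setting
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = str(hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('spinup'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# With the hash seed pinned, two separate processes agree.
fixed_a = string_hash_in_fresh_interpreter(0)
fixed_b = string_hash_in_fresh_interpreter(0)
print("pinned hashes match:", fixed_a == fixed_b)

# Without it, the two values will usually differ from process to process.
free_a = string_hash_in_fresh_interpreter()
free_b = string_hash_in_fresh_interpreter()
print("unpinned hashes match:", free_a == free_b)
```

Note that the variable has to be set before the interpreter starts (hence export in the shell); assigning os.environ["PYTHONHASHSEED"] inside an already running script does not change that process's hashing.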

@jachiam
Contributor

jachiam commented Nov 19, 2018

@xffxff I tried out export PYTHONHASHSEED=0, in addition to setting env seed and test_env seed, and did two runs of DDPG with python -m spinup.run ddpg --hid [32] --env HalfCheetah-v2 --steps_per_epoch 1000.

Results from first run:

---------------------------------------
|             Epoch |               1 |
|      AverageEpRet |            -260 |
|          StdEpRet |               0 |
|          MaxEpRet |            -260 |
|          MinEpRet |            -260 |
|  AverageTestEpRet |            -533 |
|      StdTestEpRet |            3.39 |
|      MaxTestEpRet |            -525 |
|      MinTestEpRet |            -536 |
|             EpLen |           1e+03 |
|         TestEpLen |           1e+03 |
| TotalEnvInteracts |           1e+03 |
|      AverageQVals |           0.508 |
|          StdQVals |            1.17 |
|          MaxQVals |            5.88 |
|          MinQVals |           -6.88 |
|            LossPi |           -1.39 |
|             LossQ |           0.964 |
|              Time |            6.86 |
---------------------------------------

Results from second run:

---------------------------------------
|             Epoch |               1 |
|      AverageEpRet |            -260 |
|          StdEpRet |               0 |
|          MaxEpRet |            -260 |
|          MinEpRet |            -260 |
|  AverageTestEpRet |            -533 |
|      StdTestEpRet |            4.89 |
|      MaxTestEpRet |            -526 |
|      MinTestEpRet |            -541 |
|             EpLen |           1e+03 |
|         TestEpLen |           1e+03 |
| TotalEnvInteracts |           1e+03 |
|      AverageQVals |           0.508 |
|          StdQVals |            1.17 |
|          MaxQVals |            5.88 |
|          MinQVals |           -6.88 |
|            LossPi |           -1.39 |
|             LossQ |           0.964 |
|              Time |              16 |
---------------------------------------

Looks like this solves the issue. I'm going to mark this as closed.

@jachiam jachiam closed this as completed Nov 19, 2018
@jachiam
Contributor

jachiam commented Nov 19, 2018

Scratch that. On double-checking, it looks like things diverge after Epoch 1. I don't know where this nondeterminism is coming from.
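
One way to narrow down where two runs part ways is to compare their logged metrics column by column. The sketch below does this for a subset of the epoch-1 values posted above; the function name differing_columns is hypothetical, and Time is ignored because wall-clock time is expected to vary.

```python
def differing_columns(row_a, row_b, ignore=("Time",)):
    """Return the metric names whose values differ between two log rows.

    Hypothetical helper for illustration; each row is a dict mapping a
    column name to its logged value for one epoch.
    """
    return [
        name
        for name in row_a
        if name not in ignore and row_a[name] != row_b.get(name)
    ]


# Subset of the epoch-1 values from the two runs posted in this thread.
first_run = {
    "AverageEpRet": -260,
    "AverageTestEpRet": -533,
    "StdTestEpRet": 3.39,
    "MaxTestEpRet": -525,
    "MinTestEpRet": -536,
    "LossQ": 0.964,
    "Time": 6.86,
}
second_run = {
    "AverageEpRet": -260,
    "AverageTestEpRet": -533,
    "StdTestEpRet": 4.89,
    "MaxTestEpRet": -526,
    "MinTestEpRet": -541,
    "LossQ": 0.964,
    "Time": 16,
}

print(differing_columns(first_run, second_run))
# → ['StdTestEpRet', 'MaxTestEpRet', 'MinTestEpRet']
```

On these numbers the training-side metrics agree at epoch 1, while the spread of the test returns already differs even though the rounded AverageTestEpRet matches.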

@jachiam jachiam reopened this Nov 19, 2018
@jachiam
Contributor

jachiam commented Nov 19, 2018

With env seed setting and export PYTHONHASHSEED=0, TRPO/PPO/VPG are deterministic through at least the first three epochs. I have no idea why DDPG would be different.

@Skalwalker

Is this issue happening in both tensorflow and pytorch? Have the operation-level seeds been properly set?

@Jianengzhang

With the following env seed settings

    env.seed(seed)
    env.action_space.seed(seed)
    test_env.seed(seed)
    test_env.action_space.seed(seed)

the results are the same for the same seed.
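
A plausible reason the extra action_space.seed(seed) calls matter: the action space keeps its own random generator, separate from the environment's, so an algorithm that draws random actions with env.action_space.sample() stays nondeterministic until that generator is seeded too. The toy sketch below illustrates the idea without Gym; ToyEnv, ToyActionSpace and rollout are hypothetical stand-ins, not Gym classes, and they only mimic the seed()/sample() calls used above. Newer Gym releases changed the environment seeding API, so treat the method names as matching the version discussed in this thread.

```python
import random


class ToyActionSpace:
    """Hypothetical stand-in for an action space with its own RNG."""

    def __init__(self):
        self._rng = random.Random()  # seeded from OS entropy by default

    def seed(self, seed):
        self._rng.seed(seed)

    def sample(self):
        return self._rng.uniform(-1.0, 1.0)


class ToyEnv:
    """Hypothetical stand-in for an environment with a separate RNG."""

    def __init__(self):
        self._rng = random.Random()
        self.action_space = ToyActionSpace()

    def seed(self, seed):
        self._rng.seed(seed)

    def reset(self):
        return self._rng.uniform(-0.1, 0.1)


def rollout(seed, seed_action_space):
    """Return the initial observation and three sampled random actions."""
    env = ToyEnv()
    env.seed(seed)
    if seed_action_space:
        env.action_space.seed(seed)
    observation = env.reset()
    actions = [env.action_space.sample() for _ in range(3)]
    return observation, actions


# Seeding only the env fixes the observation but not the sampled actions.
print(rollout(0, seed_action_space=False) == rollout(0, seed_action_space=False))

# Seeding both makes the whole rollout repeatable.
print(rollout(0, seed_action_space=True) == rollout(0, seed_action_space=True))
```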

5 participants