Updates before start_steps #48

Open
unrealwill opened this issue Nov 20, 2018 · 2 comments

@unrealwill

Hello,

In sac.py,

        if t > start_steps:
            a = get_action(o)
        else:
            a = env.action_space.sample()

You use a random policy before start_steps, but you nevertheless start updating the model parameters immediately, using a small replay-buffer dataset.
A more cautious approach would be to update the model parameters only once a sufficiently large dataset has been collected.

Currently we perform start_steps model updates on a small dataset, which means we risk overfitting the parameters to this small dataset early on, and that may take a long time to recover from.
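To put a rough number on that risk, here is a minimal sketch estimating how often each stored transition is drawn for a gradient update. It rests on simplifying assumptions that are mine, not necessarily how sac.py schedules updates: one update per environment step, a buffer holding every transition seen so far, uniform sampling with replacement, and a hypothetical `update_after` step before which no updates run.

```python
def expected_draws(i: int, total_steps: int, batch_size: int, update_after: int = 1) -> float:
    """Expected number of times transition i (1-indexed) is sampled by step total_steps.

    Assumes one update per env step; at step t the buffer holds t transitions and
    each batch draws batch_size of them uniformly with replacement, so transition i
    is picked batch_size / t times on average. No updates run before update_after.
    """
    first_update = max(i, update_after)
    return sum(batch_size / t for t in range(first_update, total_steps + 1))


# Earliest transition, updates from the very first step: roughly 979 draws.
print(round(expected_draws(1, 10_000, batch_size=100)))
# Same transition when updates wait for 1000 stored transitions: roughly 230 draws.
print(round(expected_draws(1, 10_000, batch_size=100, update_after=1000)))
# A transition collected halfway through: roughly 69 draws.
print(round(expected_draws(5_000, 10_000, batch_size=100)))
```

Under these assumptions, reuse grows linearly with batch_size, and the earliest transitions are resampled far more often than later ones. Gating the updates caps their reuse at roughly batch_size · ln(total_steps / update_after).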

It is particularly insidious because with a slow network architecture you won't see a problem, but once you try a faster architecture you will overfit to the small dataset and take a long time to recover. It is also environment-dependent and may depend on the luck of the first few episodes.
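The proposal can be sketched as a training loop that keeps the random-action rule from the quoted snippet but gates gradient updates behind a separate threshold. Everything here is hypothetical and stdlib-only: `ToyEnv` stands in for a Gym environment, `update_after` is an illustrative parameter name, and `update` is a placeholder for the SAC gradient step. The loop returns the buffer size at each update so the effect of the gate is visible.

```python
import random
from typing import Callable, List, Tuple

Transition = Tuple[float, float, float, float, bool]


class ToyEnv:
    """Hypothetical 1-D environment standing in for a Gym env."""

    def __init__(self, max_ep_len: int = 50):
        self.max_ep_len = max_ep_len
        self.x, self.steps = 0.0, 0

    def reset(self) -> float:
        self.x, self.steps = 0.0, 0
        return self.x

    def step(self, a: float) -> Tuple[float, float, bool]:
        self.x += a
        self.steps += 1
        return self.x, -abs(self.x), self.steps >= self.max_ep_len


def run(env: ToyEnv, total_steps: int, start_steps: int, update_after: int,
        batch_size: int, get_action: Callable[[float], float],
        update: Callable[[List[Transition]], None], seed: int = 0) -> List[int]:
    """Collect experience and return the replay-buffer size at every update."""
    rng = random.Random(seed)
    buffer: List[Transition] = []
    sizes_at_update: List[int] = []
    o = env.reset()
    for t in range(total_steps):
        # Same rule as the quoted snippet: uniform random actions until start_steps.
        a = get_action(o) if t > start_steps else rng.uniform(-1.0, 1.0)
        o2, r, d = env.step(a)
        buffer.append((o, a, r, o2, d))
        o = env.reset() if d else o2
        # Proposed gate: skip gradient updates until update_after transitions exist.
        if len(buffer) >= update_after:
            batch = [rng.choice(buffer) for _ in range(batch_size)]
            update(batch)
            sizes_at_update.append(len(buffer))
    return sizes_at_update


sizes = run(ToyEnv(), total_steps=5000, start_steps=1000, update_after=1000,
            batch_size=100, get_action=lambda o: -0.1 * o, update=lambda batch: None)
print(len(sizes), sizes[0])  # 4001 1000
```

Setting `update_after=1` reproduces the current behaviour: the very first update draws all 100 samples from a single stored transition.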

@jachiam
Contributor

jachiam commented Nov 20, 2018

Can you identify an environment where this happens? So far this hasn't been a problem in the environments we've tested.

@unrealwill
Author

I can't say for sure what happens in your code; I haven't run it yet.

I'm currently experimenting with my own code, loosely inspired by yours, toggling extra terms and varying some parameters to get a feel for the algorithm.

I'm mostly playing with simple environments ('Pendulum-v0', 'LunarLanderContinuous-v2', 'BipedalWalkerHardcore-v2'), but it will probably have more impact when you have short episodes (and a big batch_size).

In Pendulum-v0, I observed that training sometimes failed to converge with larger layers (300 units), whereas it did converge with 100-unit layers. Instead, the agent would spin the pendulum up very fast, always in the same direction.
In BipedalWalker, I observed that the agent would stiffen its legs, with the actions saturating at the boundary of the action space; it would then "fall" and start exploring from the boundary of the action space.

Skipping updates before start_steps had some impact and helped mitigate these undesired behaviours.

My code is still buggy, so maybe yours can recover from it, but I figure this issue is fairly independent of those bugs and makes sense in general, so you are probably affected too :)
