Updates before start_steps #48

unrealwill · 2018-11-20T17:33:20Z

Hello,

In sac.py,

        if t > start_steps:
            a = get_action(o)
        else:
            a = env.action_space.sample()

You use a random policy before start_steps, but you nevertheless start updating the model parameters immediately, using a small replay-buffer dataset.
A more cautious approach would be to update the model parameters only once a sufficiently large dataset has been collected.

Currently, we perform start_steps model updates on a small dataset, which means we risk initially overfitting the parameters to that small dataset, and this may take a long time to recover from.

It is particularly insidious because with a slow network architecture you won't see a problem, but once you try a faster one you will overfit to the small dataset and take a long time to recover. It is also environment-dependent and may depend on the luck of the first few episodes.
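A minimal sketch of the gating suggested above. The `update_after` parameter, the `update_fn` callback, the dummy dynamics, and the one-update-per-environment-step schedule are all hypothetical illustrations rather than sac.py's actual code; the only point is that gradient updates are skipped until the buffer holds `update_after` transitions, while the action-selection branch mirrors the snippet quoted earlier.

```python
import math
import random


def sac_style_loop(total_steps, start_steps, update_after, batch_size=100,
                   update_fn=lambda batch: None, seed=0):
    """Toy SAC-style loop (hypothetical). Returns (n_updates, buffer size at first update)."""
    rng = random.Random(seed)
    buffer = []                      # stands in for the replay buffer
    n_updates, first_update_size = 0, None
    obs = 0.0                        # dummy 1-D observation
    for t in range(total_steps):
        if t > start_steps:
            act = math.tanh(obs)             # placeholder for get_action(o)
        else:
            act = rng.uniform(-1.0, 1.0)     # placeholder for env.action_space.sample()
        next_obs = obs + act                 # dummy dynamics, not a real environment
        buffer.append((obs, act, next_obs))
        obs = next_obs
        # Gate: no gradient updates until enough data has been collected.
        if len(buffer) >= update_after:
            # Sample uniformly with replacement from everything stored so far.
            batch = [buffer[rng.randrange(len(buffer))] for _ in range(batch_size)]
            update_fn(batch)                 # would be one SAC gradient step
            n_updates += 1
            if first_update_size is None:
                first_update_size = len(buffer)
    return n_updates, first_update_size


print(sac_style_loop(2000, start_steps=1000, update_after=1000))
print(sac_style_loop(2000, start_steps=1000, update_after=1))
```

Setting `update_after=start_steps` corresponds to the behaviour proposed in this issue, while `update_after=1` trains from the very first transition, on a buffer holding a single sample.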


jachiam · 2018-11-20T21:12:07Z

Can you identify an environment where this happens? So far it hasn't been a problem in the environments we've tested.

unrealwill · 2018-11-21T08:34:57Z

I can't say for sure what happens in your code; I haven't run it yet.

I'm currently fiddling with my own code, loosely inspired by yours, toggling extra terms and varying some parameters to get a feel for the algorithm.

I'm mostly playing with simple environments ('Pendulum-v0', 'LunarLanderContinuous-v2', 'BipedalWalkerHardcore-v2'), but it'll probably have more impact when you have short episodes (and a big batch_size).
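A back-of-the-envelope check of why short episodes and a large batch_size could make this worse. Assume (hypothetically, as a simplified model of a per-episode schedule) that `ep_len` updates run at the end of every episode and that each update draws `batch_size` indices uniformly with replacement from the whole buffer. After episode k the buffer holds k * ep_len transitions, so a transition from the first episode is expected to be drawn ep_len * batch_size / (k * ep_len) = batch_size / k times in that round. Summed over the first start_steps steps, this is batch_size times the harmonic number H(start_steps / ep_len):

```python
def expected_first_episode_draws(start_steps, ep_len, batch_size):
    """Expected times one first-episode transition is sampled before start_steps.

    Hypothetical model: ep_len updates at the end of each episode, each drawing
    batch_size indices uniformly with replacement from the current buffer.
    """
    total = 0.0
    n_episodes = start_steps // ep_len
    for k in range(1, n_episodes + 1):
        buffer_size = k * ep_len                 # transitions stored after episode k
        # ep_len updates, each hitting this transition batch_size / buffer_size times on average
        total += ep_len * batch_size / buffer_size
    return total


for ep_len in (1000, 200):
    print(ep_len, round(expected_first_episode_draws(10000, ep_len, 100), 1))
```

Under these assumptions, with start_steps=10000 and batch_size=100, each first-episode transition is drawn about 293 times with 1000-step episodes and about 450 times with 200-step episodes, and the count scales linearly with batch_size.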

In Pendulum-v0, I observed that it sometimes failed to converge with larger networks (layer size 300), whereas it did converge with layer size 100. It kept spinning the pendulum very fast, always in the same direction.
In BipedalWalker, I observed that it would stiffen its leg, with the actions saturating at the boundary of the action space; it would then "fall" and start exploring from that boundary.
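One plausible contributor to the slow recovery from saturated actions, assuming the policy squashes its unbounded Gaussian sample through tanh (as SAC implementations commonly do): once the pre-squash value u grows large, the action pins near the ±1 boundary and the factor 1 - tanh²(u) that gradients flow through becomes tiny. The helper names below are hypothetical and not taken from sac.py.

```python
import math


def squash(u):
    """Map an unbounded pre-activation u to an action in (-1, 1)."""
    return math.tanh(u)


def squash_grad(u):
    """d tanh(u) / du = 1 - tanh(u)^2, the factor gradients are multiplied by."""
    return 1.0 - math.tanh(u) ** 2


for u in (0.5, 2.0, 5.0, 10.0):
    print(f"u={u:5.1f}  action={squash(u):.6f}  grad={squash_grad(u):.2e}")
```

This only illustrates the squashing nonlinearity; in SAC the entropy term (with its tanh log-probability correction) also pushes against saturation, so how long recovery takes depends on the balance between the two.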

Skipping updates before start_steps had some impact and helped mitigate those undesired behaviours.

My code is still buggy, so maybe yours can recover from it, but I figure this issue is fairly orthogonal to those bugs and makes sense on its own, so you are probably affected too :)
