action bounds #2

jacobzweig · 2016-06-08T03:00:38Z

Hey - I was looking through your code to try to implement some things from here, but I couldn't figure out what you mean by the action_bounds parameter. I'm trying to implement it in an environment where there are 11 possible actions [0:10]. Any suggestions?

MOCR · 2016-06-08T09:00:42Z

What do you mean by 11 possible actions? Do you mean 11 action dimensions, each taking a specific value at every time step, or 11 discrete actions, of which only one is chosen at each time step?
If the actions are discrete, this is not the algorithm you're looking for; I suggest you search for an implementation of DQN.
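
To make the distinction concrete, here is a minimal sketch in plain Python. The names `N`, `sample_discrete_action` and `sample_continuous_action` are hypothetical and not part of this repository, and the [-1, 1] range is only an assumed example.

```python
import random

N = 11  # hypothetical count taken from the question above


def sample_discrete_action(rng):
    """Discrete case: one integer in 0..10 is chosen per time step (DQN-style)."""
    return rng.randrange(N)


def sample_continuous_action(rng, low=-1.0, high=1.0):
    """Continuous case: a vector of 11 real values per time step.

    The [low, high] range is an assumption made for illustration.
    """
    return [rng.uniform(low, high) for _ in range(N)]


rng = random.Random(0)
discrete = sample_discrete_action(rng)      # a single int
continuous = sample_continuous_action(rng)  # a list of 11 floats
```

Only the second case has a per-dimension range, which is what a bounds parameter would describe.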

If it is 11 dimensions, the action_bounds parameter is used by the gradient inverter module, which speeds up the learning process (the idea is explained in this paper: http://arxiv.org/pdf/1511.04143v4.pdf).
It represents the range of each action dimension and has this structure:
[
[max_of_action_dim_0, max_of_action_dim_1, ..., max_of_action_dim_10],
[min_of_action_dim_0, min_of_action_dim_1, ..., min_of_action_dim_10]
]
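
As a rough illustration, the structure above and the inverting-gradients rule from the linked paper could be sketched as follows in plain Python. This is not the repository's code: the function name `invert_gradients`, the [-1, 1] range, and the sign convention (a positive gradient means "increase the action") are assumptions.

```python
# Hypothetical 11-dimensional action space, each dimension assumed to lie in [-1, 1].
n_dims = 11
action_bounds = [
    [1.0] * n_dims,   # row 0: maximum of each action dimension
    [-1.0] * n_dims,  # row 1: minimum of each action dimension
]


def invert_gradients(grads, actions, bounds):
    """Scale each gradient by the headroom left before the action hits its bound.

    A gradient pushing the action up is scaled by (max - a) / (max - min);
    a gradient pushing it down is scaled by (a - min) / (max - min).
    """
    maxs, mins = bounds
    scaled = []
    for g, a, hi, lo in zip(grads, actions, maxs, mins):
        width = hi - lo
        if g > 0:
            scaled.append(g * (hi - a) / width)
        else:
            scaled.append(g * (a - lo) / width)
    return scaled


# Small usage example with two dimensions only.
example = invert_gradients([1.0, -1.0], [0.5, 0.5], [[1.0, 1.0], [-1.0, -1.0]])
```

With this rule the gradient shrinks to zero as an action approaches the bound it is being pushed toward, which is why the module needs the per-dimension maxima and minima.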

jacobzweig · 2016-06-09T17:45:32Z

Thanks! As you suspected, I was misunderstanding. Thank you for the tip and the useful code!
