action bounds #2

Open
jacobzweig opened this issue Jun 8, 2016 · 2 comments
Comments

@jacobzweig

Hey - I was looking through your code to try to implement some things from here, but I couldn't figure out what you mean by the action_bounds parameter. I'm trying to implement it in an environment where there are 11 possible actions [0:10]. Any suggestions?

@MOCR
Owner

MOCR commented Jun 8, 2016

What do you mean by 11 possible actions? 11 action dimensions, each taking a specific value at every time step, or 11 discrete actions of which only one is chosen at each time step?
If the actions are discrete, this is not the algorithm you're looking for; I suggest you search for an implementation of DQN.
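
To make the distinction concrete, here is a minimal sketch of the two kinds of action space. It is not code from this repository: the variable names, the random Q-values, and the [-1, 1] range are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete case: 11 mutually exclusive actions (0..10), one chosen per step.
# A DQN-style agent scores each action and picks the best one.
q_values = rng.normal(size=11)              # hypothetical Q-value per action
discrete_action = int(np.argmax(q_values))  # a single integer in 0..10

# Continuous case: 11 action dimensions, each taking a real value per step.
# The [-1, 1] range is an assumed example, not taken from the repository.
continuous_action = rng.uniform(low=-1.0, high=1.0, size=11)
```

In the first case the agent outputs one index; in the second it outputs a vector of 11 real numbers, which is the setting the bounds below refer to.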

If it is 11 dimensions, the action_bounds parameter is used by the gradient inverter module, which speeds up the learning process (the idea is explained in this paper: http://arxiv.org/pdf/1511.04143v4.pdf).
It represents the range of each action dimension and has this structure:
[
[max_of_action_dim_0, max_of_action_dim_1, ..., max_of_action_dim_10],
[min_of_action_dim_0, min_of_action_dim_1, ..., min_of_action_dim_10]
]
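
As a rough illustration of how such bounds can be used, the sketch below applies the inverting-gradients rule from the linked paper in plain NumPy. It is not the repository's implementation: the function name `invert_gradients`, the [-1, 1] bounds, and the convention that a positive gradient means "increase this action dimension" are all assumptions made for the example.

```python
import numpy as np

def invert_gradients(action_grads, actions, action_bounds):
    """Scale each action gradient by the remaining room toward the bound it points at.

    Assumption: a positive gradient entry means "increase this action dimension".
    action_bounds follows the layout above: row 0 holds maxima, row 1 holds minima.
    """
    upper = np.asarray(action_bounds[0], dtype=float)
    lower = np.asarray(action_bounds[1], dtype=float)
    actions = np.asarray(actions, dtype=float)
    action_grads = np.asarray(action_grads, dtype=float)

    width = upper - lower
    scale_if_increasing = (upper - actions) / width  # shrinks near the max
    scale_if_decreasing = (actions - lower) / width  # shrinks near the min
    scale = np.where(action_grads > 0, scale_if_increasing, scale_if_decreasing)
    return action_grads * scale

# Hypothetical bounds: 11 action dimensions, each limited to [-1, 1].
n_dims = 11
action_bounds = [[1.0] * n_dims, [-1.0] * n_dims]

actions = np.zeros(n_dims)
actions[0] = 0.9   # close to its upper bound
actions[1] = 1.5   # already above its upper bound
grads = np.ones(n_dims)  # every dimension is being pushed upward

scaled = invert_gradients(grads, actions, action_bounds)
```

With these inputs, the gradient for dimension 0 is strongly damped because the action is near its maximum, and the gradient for dimension 1 changes sign, which pushes the out-of-range action back toward the allowed interval.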

@jacobzweig
Author

Thanks! As you suspected, I was misunderstanding. Thank you for the tip and the useful code!
