Hey - I was looking through your code to try to implement some things from here, but I couldn't figure out what you mean by the action_bounds parameter. I'm trying to implement it in an environment where there are 11 possible actions [0:10]. Any suggestions?
What do you mean by 11 possible actions? 11 action dimensions, each taking a specific value at every time step, or 11 discrete actions with only one chosen at each time step?
If they are discrete actions, this is not the algorithm you're looking for; I suggest you search for an implementation of DQN.
If it is 11 dimensions, the action_bounds parameter is used by the gradient inverter module, which speeds up the learning process (the idea is explained in this paper: http://arxiv.org/pdf/1511.04143v4.pdf ).
So it represents the range of each action dimension. It has this structure:
[
[max_of_action_dim_0, max_of_action_dim_1, ..., max_of_action_dim_10],
[min_of_action_dim_0, min_of_action_dim_1, ..., min_of_action_dim_10]
]
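To make that concrete, here is a rough sketch of what the parameter could look like for 11 continuous dimensions, together with a NumPy version of the inverting-gradients rule from the paper linked above. This is not the code from this repository: the function name `invert_gradients`, the [0, 10] range for every dimension, and the convention that a positive gradient means "increase the action" are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical setup: 11 continuous action dimensions, each assumed to lie in [0, 10].
n_dims = 11
action_bounds = [
    [10.0] * n_dims,  # row 0: maximum of each action dimension
    [0.0] * n_dims,   # row 1: minimum of each action dimension
]


def invert_gradients(grads, actions, bounds):
    """Scale each gradient by the headroom left toward the bound it pushes against.

    Assumes a positive gradient means "increase the action".
    """
    p_max = np.asarray(bounds[0], dtype=float)
    p_min = np.asarray(bounds[1], dtype=float)
    grads = np.asarray(grads, dtype=float)
    actions = np.asarray(actions, dtype=float)

    width = p_max - p_min
    toward_max = (p_max - actions) / width  # used when the gradient pushes upward
    toward_min = (actions - p_min) / width  # used when the gradient pushes downward
    return grads * np.where(grads > 0, toward_max, toward_min)
```

With this rule, an action that already sits at its maximum gets a zero gradient when pushed further up, and an action that has drifted past a bound gets its gradient flipped, which pulls it back into range.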