Can you elaborate on running SAC on a discrete action space? #22
Comments
You're actually the second person to ask about this! The first person sent an email. I'll add a sub-section or a "you should know" to the docs to go over this soon.
Thanks. Also, since this tutorial favors learning by doing rather than being purely theoretical, it would be nice to see explanations with some images of the neural network architectures to get a quick overview of how to implement them. For example, SAC uses about five networks: value, value_target, gaussian_policy, and 2 Q_networks. It would be easier to understand if there were some pictorial representation of the networks and their relations.
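Not a diagram, but here is a minimal sketch of those five networks. This is not the repo's actual code; it assumes PyTorch, a hypothetical `mlp` helper, and placeholder layer sizes purely for illustration:

```python
import torch.nn as nn

def mlp(sizes, activation=nn.ReLU, output_activation=nn.Identity):
    """Small fully connected network builder (hypothetical helper)."""
    layers = []
    for i in range(len(sizes) - 1):
        act = activation if i < len(sizes) - 2 else output_activation
        layers += [nn.Linear(sizes[i], sizes[i + 1]), act()]
    return nn.Sequential(*layers)

obs_dim, act_dim, hidden = 8, 2, 64  # placeholder sizes

# The five networks in the original (2018) SAC formulation:
value        = mlp([obs_dim, hidden, 1])        # V(s)
value_target = mlp([obs_dim, hidden, 1])        # slowly updated copy of V(s)
value_target.load_state_dict(value.state_dict())
q1 = mlp([obs_dim + act_dim, hidden, 1])        # Q_1(s, a)
q2 = mlp([obs_dim + act_dim, hidden, 1])        # Q_2(s, a), twin used for the min trick
policy = mlp([obs_dim, hidden, 2 * act_dim])    # Gaussian policy head: mean and log_std
```

The value/Q networks only interact through their loss targets (the Q targets use `value_target`, the value target uses the minimum of the two Q estimates and the policy's log-probabilities), which is the relationship a diagram would show.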
Count me as the 3rd. For a discrete action space, the entropy can be derived directly from the distribution. The policy loss probably needs to maximize advantage * log_probability. What I am confused about is: do we still need 2 Q networks and 1 value network?
Is it just an average over \pi(a|s) for all actions, since the distribution is already parameterized?
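A rough sketch of what that could look like, assuming PyTorch and hypothetical tensor names (`policy_logits`, `q1_values`, `q2_values` all shaped `[batch, n_actions]`); this is one reasonable way to do it, not the official implementation:

```python
import torch
import torch.nn.functional as F

def discrete_sac_policy_loss(policy_logits, q1_values, q2_values, alpha):
    """Policy loss for a categorical policy. Because pi(a|s) is known for every
    discrete action, the expectation over actions is computed exactly as a
    weighted sum instead of being estimated with sampled actions."""
    probs = F.softmax(policy_logits, dim=-1)
    log_probs = F.log_softmax(policy_logits, dim=-1)
    # Keeping two Q networks (clipped double-Q) still helps against overestimation.
    q_min = torch.min(q1_values, q2_values)
    # E_{a ~ pi}[ alpha * log pi(a|s) - Q(s, a) ], summed exactly over all actions.
    return (probs * (alpha * log_probs - q_min)).sum(dim=-1).mean()
```

So yes, the "average over \pi(a|s)" is an exact expectation over the action probabilities, and the two Q networks are typically kept, while a separate value network becomes optional since V(s) can be computed from Q and pi in the same way.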
+1
+1
In the docs, it is mentioned that an alternate version of SAC with a slight change can be used for a discrete action space. Please elaborate with some more details.
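As a hedged sketch of what the "slight change" typically amounts to (assuming PyTorch; `policy_head` and the sizes are placeholders, not names from the docs): the policy outputs one logit per action instead of a Gaussian mean and log-std, and log-probabilities and entropy come directly from the categorical distribution, with no tanh squashing or reparameterization needed.

```python
import torch
from torch.distributions import Categorical

policy_head = torch.nn.Linear(64, 4)      # 64 features -> 4 discrete actions (placeholder sizes)

features = torch.randn(1, 64)             # stand-in for the observation encoder output
dist = Categorical(logits=policy_head(features))
action = dist.sample()                    # discrete action index
log_prob = dist.log_prob(action)          # exact log pi(a|s)
entropy = dist.entropy()                  # exact entropy, no sampling estimate required
```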