Action used for gradient calculation #14

Closed
AlEmerich opened this issue Aug 30, 2018 · 2 comments

Comments

@AlEmerich

Hi, thank you for your implementation, it helped me write my own.

I have a question, though, about the action you use to compute the gradients in ddpg.py, line 71.

Why don't you use action_batch to compute the gradient? I haven't managed to get any agent working, so I can't test the difference.

@j314erre

j314erre commented Feb 1, 2019

Because the formula in Algorithm 1 of the original DDPG paper says you compute the gradient

∇_a Q(s, a | θ^Q) |_{s = s_t, a = μ(s_t)}

i.e. you want the actions generated by the current actor network for these gradients. I believe this is why it is a deterministic policy gradient.
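To make the distinction concrete, here is a minimal PyTorch-style sketch of the two DDPG updates, assuming hypothetical names (actor, critic, state_batch, and so on) and replay-buffer tensors; it is not the code from this repository's ddpg.py:

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                state_batch, action_batch, reward_batch,
                next_state_batch, done_batch,
                actor_opt, critic_opt, gamma=0.99):
    # Critic update: the replay-buffer actions (action_batch) ARE used here,
    # because Q is fit to the transitions that were actually executed.
    with torch.no_grad():
        next_actions = target_actor(next_state_batch)
        target_q = reward_batch + gamma * (1.0 - done_batch) * \
            target_critic(next_state_batch, next_actions)
    critic_loss = F.mse_loss(critic(state_batch, action_batch), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: the actions are regenerated by the CURRENT actor, a = mu(s_t),
    # so backprop through the critic yields grad_a Q(s, a)|_{a = mu(s_t)}, which then
    # flows into the actor's parameters. Feeding action_batch here would give the
    # actor no gradient at all, since those actions do not depend on its parameters.
    actor_loss = -critic(state_batch, actor(state_batch)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

The key point is the last block: the actor loss must be a function of the actor's own output, otherwise there is nothing to differentiate with respect to the actor's parameters.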

@AlEmerich
Author

Yes, I finally figured it out on my own, but thanks for the confirmation :)
