Action used for gradient calculation #14

AlEmerich · 2018-08-30T10:18:48Z

Hi, thank you for your implementation; it helped me write my own.

I have a question, though, about the action you used to compute the gradients in ddpg.py, line 71.

Why don't you use action_batch to compute the gradient? I didn't manage to get any agent working, so I can't test the difference.

j314erre · 2019-02-01T08:36:51Z

Because, in Algorithm 1 of the original DDPG paper, the actor update uses the critic's gradient with respect to the action:

$$\nabla_a Q(s, a \mid \theta^Q)\big|_{s = s_t,\; a = \mu(s_t)}$$

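That is, you want the actions generated by the current actor network, μ(s_t), for these gradients, not the actions stored in the replay buffer. The critic's action-gradient is then chained back through μ into the actor's parameters θ^μ. An action taken from action_batch does not depend on the actor's current weights, so plugging it in would not give the gradient of the actor's objective. I believe this is why it is called a deterministic policy gradient.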
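To make the difference concrete, here is a minimal NumPy sketch, assuming a toy quadratic critic and a linear actor. The names critic_q, critic_grad_a, actor, actor_gradient, numerical_gradient and the matrices K and W are hypothetical and not taken from ddpg.py. It computes the batch-averaged actor gradient with a = μ(s) and checks it against a finite-difference gradient of the actor objective mean Q(s, μ(s)); substituting a stored action_batch gives a different result.

```python
import numpy as np

# Hypothetical toy critic standing in for a trained Q-network:
# Q(s, a) = -||a - s @ K||^2, so dQ/da has a simple closed form.
def critic_q(states, actions, K):
    """Q value for every row of the batch, shape (N,)."""
    return -np.sum((actions - states @ K) ** 2, axis=1)


def critic_grad_a(states, actions, K):
    """Gradient of Q with respect to the action, shape (N, action_dim)."""
    return -2.0 * (actions - states @ K)


# Hypothetical deterministic linear actor: mu(s) = s @ W.
def actor(states, W):
    return states @ W


def actor_gradient(states, W, K, actions=None):
    """Batch-averaged actor gradient
    (1/N) sum_i grad_a Q(s_i, a)|_{a=a_i} * grad_W mu(s_i).

    With actions=None, a_i = mu(s_i), as in Algorithm 1 of the DDPG paper.
    Passing stored actions instead mimics using action_batch.
    """
    if actions is None:
        actions = actor(states, W)
    dq_da = critic_grad_a(states, actions, K)
    # For mu(s) = s @ W, d mu_j(s) / d W[k, j] = s_k, so the chain rule
    # reduces to sum_i outer(s_i, dq_da_i), divided by the batch size.
    return states.T @ dq_da / len(states)


def numerical_gradient(states, W, K, eps=1e-6):
    """Central finite differences of J(W) = mean_i Q(s_i, mu(s_i))."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        w_plus, w_minus = W.copy(), W.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        j_plus = critic_q(states, actor(states, w_plus), K).mean()
        j_minus = critic_q(states, actor(states, w_minus), K).mean()
        grad[idx] = (j_plus - j_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
states = rng.normal(size=(32, 3))
W = rng.normal(size=(3, 2))
K = rng.normal(size=(3, 2))
action_batch = rng.normal(size=(32, 2))  # stand-in for replay-buffer actions

g_dpg = actor_gradient(states, W, K)
g_buffer = actor_gradient(states, W, K, actions=action_batch)
g_true = numerical_gradient(states, W, K)

print(np.allclose(g_dpg, g_true, atol=1e-5))     # True
print(np.allclose(g_buffer, g_true, atol=1e-5))  # False
```

In an autodiff framework you would not write the chain rule by hand: feeding the actor's output into the critic and differentiating the mean Q with respect to the actor's parameters yields the same quantity.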

AlEmerich · 2019-02-03T21:42:21Z

Yes, I finally figured it out on my own, but thanks for the confirmation :)

AlEmerich closed this as completed Feb 3, 2019
