
Question regarding the calculation of the actor gradient. #2

Closed
Tomakko opened this issue May 30, 2016 · 5 comments
Tomakko commented May 30, 2016

Hi,

Why did you include the minus sign in the grad_ys argument of the call below?

self.parameters_gradients = tf.gradients(self.action_output,self.parameters,-self.q_gradient_input/BATCH_SIZE)

As far as I understand, grad_ys weights the gradient of each actor output by the corresponding value (coming from the critic in your case).
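To make my reading concrete, here is a minimal toy sketch of how I understand grad_ys (the variables are made up, not from your repo). It makes me suspect the minus sign is there because the optimizer minimizes, so ascending on Q requires the negated gradient, but I would like to confirm.

```python
import tensorflow as tf

# tf.gradients(ys, xs, grad_ys) returns d(sum_i grad_ys[i] * ys[i]) / d(xs),
# i.e. grad_ys weights each output's gradient before they are summed.
x = tf.Variable([1.0, 2.0])
y = x * x                          # two "actor outputs"
w = tf.constant([3.0, -1.0])       # the weighting, like -dQ/da in your code
g = tf.gradients(y, x, grad_ys=w)  # -> [2 * x[0] * 3, 2 * x[1] * (-1)]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(g))  # [array([ 6., -4.], dtype=float32)]
```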

Thanks!

Tomakko commented Jun 3, 2016

Note that you are dividing by the batch size twice when computing the update for the actor weights:

1. `q_gradient_batch = self.critic_network.gradients(state_batch,action_batch_for_gradients)/BATCH_SIZE` in ddpg.py
2. `self.parameters_gradients = tf.gradients(self.action_output,self.parameters,-self.q_gradient_input/BATCH_SIZE)` in actor.py
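If the intention is to average over the batch once, a minimal sketch of the fix (keeping your names, so treat this as a guess at the intended code) would be to drop one of the two divisions, e.g.:

```python
# ddpg.py -- pass the raw critic gradient through, no division here
q_gradient_batch = self.critic_network.gradients(state_batch, action_batch_for_gradients)

# actor.py -- average over the batch exactly once
self.parameters_gradients = tf.gradients(
    self.action_output, self.parameters, -self.q_gradient_input / BATCH_SIZE)
```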

This might be the cause of your bad performance.

GeremWD commented Jun 5, 2016

Hi,
Did you manage to improve performance by correcting this? Even though I do not see any other error in the code, it still does not work.

Tomakko commented Jun 5, 2016

I am still embedding this into a bigger project; I can probably give you feedback on whether it works in a week or so.
Could you elaborate on what is not working? Do the networks converge?
By the way, note that in the original paper the critic has a learning rate of 0.001, while the author is using 0.0001.
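For reference, a sketch of the optimizer settings reported in the paper (the variable names here are mine, not the repo's):

```python
import tensorflow as tf

# Hyperparameters from Lillicrap et al. (2015), experiment details.
ACTOR_LR = 1e-4   # actor learning rate (same as in this repo)
CRITIC_LR = 1e-3  # critic learning rate -- the repo uses 1e-4 instead

actor_optimizer = tf.train.AdamOptimizer(ACTOR_LR)
critic_optimizer = tf.train.AdamOptimizer(CRITIC_LR)
# The paper additionally applies an L2 weight decay of 1e-2 to the critic.
```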

doomie commented Jun 7, 2016

The paper suggests using batch norm: "We also report results with components of our algorithm (i.e. the target network or batch normalization) removed. In order to perform well across all tasks, both of these additions are necessary." See Figure 2 in https://arxiv.org/pdf/1509.02971.pdf
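A minimal sketch of one way to add it in TF (a hypothetical helper, not this repo's code; a full version would also track running moments for use at evaluation time):

```python
import tensorflow as tf

def batch_norm_layer(x, scope):
    # Normalize a [batch, units] activation with its batch statistics,
    # using learned offset (beta) and scale (gamma) parameters.
    with tf.variable_scope(scope):
        mean, variance = tf.nn.moments(x, axes=[0])
        beta = tf.get_variable('beta', x.get_shape()[1:],
                               initializer=tf.constant_initializer(0.0))
        gamma = tf.get_variable('gamma', x.get_shape()[1:],
                                initializer=tf.constant_initializer(1.0))
        return tf.nn.batch_normalization(x, mean, variance, beta, gamma, 1e-5)
```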

floodsung (Owner) commented
I have fixed the bugs and added batch norm to the actor network!
