This implementation seems to use n-step learning, with n = 30 (SAMPLE_NUMS = 30). Is that right?
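To make clear what I mean by n-step learning, here is a minimal sketch of how n-step returns are commonly computed from a rollout of up to n transitions. This is not taken from the repository: `n_step_returns`, `bootstrap_value`, and `gamma` are hypothetical names I chose for illustration, and I am assuming the rollout is bootstrapped with the critic's value estimate of the state after the last sampled step (0 if the episode ended).

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout of up to n steps.

    rewards:         rewards r_t ... r_{t+n-1} collected in the rollout
    bootstrap_value: assumed critic estimate V(s_{t+n}); use 0.0 if the
                     episode terminated inside the rollout
    gamma:           discount factor (hypothetical default)
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one after it:
    # R_k = r_k + gamma * R_{k+1}
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    # With SAMPLE_NUMS = 30 the rollout would hold up to 30 rewards;
    # three are used here to keep the example short.
    print(n_step_returns([1.0, 1.0, 1.0], bootstrap_value=0.0, gamma=0.5))
```

If the code does something like this over 30 sampled steps before each update, then it is n-step learning with n = 30 rather than a one-step update.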
I have another question about this implementation:
[Q] As far as I know, A2C is a synchronous version of A3C. A3C uses multiple workers to deal with the correlation between samples, and so does A2C. However, this version doesn't support multiple workers, which may bias learning.
Many other A2C implementations update after every step, whereas yours does a single combined update after each round.