How to draw the score curve of PPO? #78

Open
AceChuse opened this issue Dec 19, 2018 · 0 comments


Hello,

I don't know how to draw the PPO score curves shown in the PPO paper. How should I handle the situation where the game is not over but the sample pool is full? In this case, if we end the game, we cannot calculate the score of an episode in which the agent needs more than Horizon (T) interactions with the environment, as in Walker2d-v1. But if we don't end the game when the sample pool is full, the number of samples may be larger than Horizon (T).

I don't know how to resolve this. How did you handle it in the PPO experiments? This matters a lot to me. Thanks for your help.
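To make the situation above concrete, here is a minimal sketch of one common way to handle it. It is an assumption, not a confirmed description of how the PPO paper's experiments were run. Each rollout collects exactly T transitions, the environment is not reset at the rollout boundary, and a running episode return is carried over so that a score is recorded only when an episode really ends. `ToyEnv`, `collect_rollout`, and the constant placeholder policy are hypothetical names made up for illustration.

```python
class ToyEnv:
    """Hypothetical stand-in environment: fixed-length episodes, reward 1.0 per step."""

    def __init__(self, episode_len=5):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, 1.0, done  # (observation, reward, done)


def collect_rollout(env, obs, running_return, horizon):
    """Collect exactly `horizon` transitions; never reset just because the batch is full."""
    batch, finished_returns = [], []
    for _ in range(horizon):
        action = 0  # placeholder policy (assumption for this sketch)
        next_obs, reward, done = env.step(action)
        batch.append((obs, action, reward, done))
        running_return += reward
        if done:
            # The episode really ended: record its full score, then start a new one.
            finished_returns.append(running_return)
            running_return = 0.0
            next_obs = env.reset()
        obs = next_obs
    # obs and running_return are handed back so the next rollout resumes mid-episode.
    return batch, finished_returns, obs, running_return


env = ToyEnv(episode_len=5)
obs, running_return = env.reset(), 0.0
scores = []       # one entry per completed episode
batch_sizes = []  # number of samples in each rollout
for _ in range(4):  # 4 rollouts with horizon T = 3
    batch, finished, obs, running_return = collect_rollout(
        env, obs, running_return, horizon=3
    )
    batch_sizes.append(len(batch))
    scores.extend(finished)
```

With this bookkeeping every batch has exactly T samples, while `scores` contains full-episode returns even for episodes longer than T. A score curve could then be drawn by plotting those returns, possibly smoothed, against the total number of environment steps taken so far.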
