
Inconsistency of notation in Intro to RL: part 1 #80

Open
jiadingfang opened this issue Dec 20, 2018 · 0 comments

jiadingfang commented Dec 20, 2018

Hi OpenAI developers,

Big love for this tutorial on RL! I just started studying the materials, and in Introduction to RL, Part 1, I found a few places where the notation is somewhat inconsistent.

  1. In the Reward and Return section, the reward function is denoted as $$R$$, while in the Bellman equation section, the reward function is denoted as $$r$$.
  2. In the Policies section, a (stochastic) policy $$\pi$$ is defined as a distribution over actions at time $$t$$. However, throughout the tutorial, notations like $$\tau\sim\pi$$ and $$a\sim\pi$$ are both used frequently, which creates some confusion about what distribution $$\pi$$ really is. (I can see why $$\tau\sim\pi$$ is reasonable when a trajectory $$\tau$$ is generated by a policy $$\pi$$, but it's not what the definition promises; see the sketch below this list.)
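
For concreteness, here is a minimal sketch of how I read the two usages, assuming the standard MDP setup (the initial state distribution $$\rho_0$$ and the transition probabilities $$P(s_{t+1}\mid s_t,a_t)$$ are my own labels, not necessarily the tutorial's). For item 1, the lowercase symbol can be read as the realized reward at time $$t$$,

$$r_t = R(s_t, a_t, s_{t+1}),$$

while the uppercase $$R$$ is the reward function itself. For item 2, $$a_t\sim\pi(\cdot\mid s_t)$$ matches the given definition of a stochastic policy, and $$\tau\sim\pi$$ can be read as shorthand for sampling from the trajectory distribution that the policy induces together with the environment dynamics:

$$P(\tau\mid\pi) = \rho_0(s_0)\prod_{t=0}^{T-1} P(s_{t+1}\mid s_t, a_t)\,\pi(a_t\mid s_t).$$

Spelling out conventions like these somewhere in the text would resolve both points.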

These findings are by no means errors, just notational issues that may cause confusion. I hope they can be fixed, or, if these notations are indeed used interchangeably in the literature and you decide not to change them, a brief explanation would be just fine.

Thanks!
