
Inconsistency of notation in Intro to RL: part 1 #80

Open
jiadingfang opened this issue Dec 20, 2018 · 0 comments

jiadingfang commented Dec 20, 2018

Hi OpenAI developers,

Big love for this tutorial on RL! I just started studying the materials, and in Introduction to RL, Part 1, I found a few places where the notation is somewhat inconsistent.

  1. In the Reward and Return section, the reward function is denoted as $$R$$, while in the Bellman equation section, the reward function is denoted as $$r$$.
  2. In the Policies section, a (stochastic) policy $$\pi$$ is defined as a distribution over actions at time $$t$$. However, throughout the tutorial, notations like $$\tau\sim\pi$$ and $$a\sim\pi$$ are both used frequently, which creates some confusion about what distribution $$\pi$$ really is. (I can see why $$\tau\sim\pi$$ is reasonable when a trajectory $$\tau$$ is generated by a policy $$\pi$$, but it's not what the definition promises; see the sketch below this list.)
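
For concreteness, here is a minimal sketch of how I read the two usages, assuming the standard MDP setup (the initial state distribution $$\rho_0$$ and the transition probabilities $$P(s_{t+1}\mid s_t,a_t)$$ are my own labels, not necessarily the tutorial's). For item 1, the lowercase symbol can be read as the realized reward at time $$t$$,

$$r_t = R(s_t, a_t, s_{t+1}),$$

while the uppercase $$R$$ is the reward function itself. For item 2, $$a_t\sim\pi(\cdot\mid s_t)$$ matches the given definition of a stochastic policy, and $$\tau\sim\pi$$ can be read as shorthand for sampling from the trajectory distribution that the policy induces together with the environment dynamics:

$$P(\tau\mid\pi) = \rho_0(s_0)\prod_{t=0}^{T-1} P(s_{t+1}\mid s_t, a_t)\,\pi(a_t\mid s_t).$$

Spelling out conventions like these somewhere in the text would resolve both points.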

These findings are by no means errors, just notational issues that may cause confusion. I hope they can be fixed, or, if these notations are indeed used interchangeably in the literature and you decide not to change them, a brief explanation would be just fine.

Thanks!
