# Summary

- README

## Part I

- Chapter 1: Introduction
- Chapter 2: Multi-armed Bandits
- Chapter 3: Finite Markov Decision Processes
- Chapter 4: Dynamic Programming
- Chapter 5: Monte Carlo Methods
- Chapter 6: Temporal-Difference Learning
- Chapter 7: n-step Bootstrapping
- Chapter 8: Planning and Learning with Tabular Methods

## Part II

- Chapter 9: On-policy Prediction with Approximation
- Chapter 10: On-policy Control with Approximation
- Chapter 11: Off-policy Methods with Approximation
- Chapter 12: Eligibility Traces
- Chapter 13: Policy Gradient Methods

## Part III

- Chapter 14: Psychology
- Chapter 15: Neuroscience
- Chapter 16: Applications and Case Studies