Reinforcement learning (RL) algorithms implemented in PyTorch.
- Q-learning
- Deep Q-Network (DQN)
- Deep Deterministic Policy Gradient (DDPG)
- (Asynchronous) Advantage Actor-Critic (A3C/A2C)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- Prioritized Experience Replay
- Hindsight Experience Replay
- Count-based Exploration
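
As a rough illustration of the first entry in the list, here is a minimal sketch of tabular Q-learning. It is not taken from this repository's code: the 5-state chain environment, the function name `q_learning_chain`, and all hyperparameter values are assumptions made up for the example, and it uses only the Python standard library rather than PyTorch.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain environment.

    States are 0..n_states-1. Action 0 moves left (clamped at state 0),
    action 1 moves right. Reaching the last state gives reward 1 and
    ends the episode; every other step gives reward 0.
    """
    rng = random.Random(seed)
    # One row per state, one column per action, initialised to zero.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else 0.0

            # Q-learning target: bootstrap from the greedy value of the
            # next state, unless the episode has terminated.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next

    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning_chain()):
        print(f"state {state}: left={left:.3f} right={right:.3f}")
```

In this toy setting the learned table should prefer moving right in every non-terminal state. The deep variants in the list (for example DQN) replace the table with a neural network, which is where PyTorch would come in.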

Feel free to open an issue or a pull request.