A variant of Rainbow DQN that reaches a median HNS of 205.7 after only 10M frames (the original Rainbow from Hessel et al. 2017 reached 231.0 using 20x more data). See the paper for more details. This was developed as part of an undergraduate university course on scientific research and writing. A selection of videos is available here.
- We used the large IMPALA-CNN with 2x channels from Espeholt et al. (2018); other networks are also implemented.
- We used spectral normalization in the residual blocks, which resulted in faster learning (especially at the start of training); see the sketch after this list.
- We removed the distributional RL component since we saw no benefit when training for only 10M frames and appreciated the reduced implementation complexity (we tried both C51 and QR-DQN).
- We performed additional hyperparameter tuning (see the paper).
- The implementation uses large, vectorized environments, asynchronous environment interaction, mixed-precision training, and larger batch sizes to improve computational efficiency and reduce training time.
- Integrations and recommended preprocessing for >1000 environments from gym, gym-retro and procgen are provided.
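For illustration only, here is a minimal sketch (not the code from this repository) of an IMPALA-style residual block with spectral normalization applied to its convolutions, as mentioned above; the class and layer names are placeholders:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class ResidualBlock(nn.Module):
    """Illustrative IMPALA-style residual block with spectral normalization."""
    def __init__(self, channels: int):
        super().__init__()
        # spectral_norm wraps each conv and rescales its weights by an
        # estimate of their largest singular value during training
        self.conv1 = spectral_norm(nn.Conv2d(channels, channels, kernel_size=3, padding=1))
        self.conv2 = spectral_norm(nn.Conv2d(channels, channels, kernel_size=3, padding=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(torch.relu(x))
        out = self.conv2(torch.relu(out))
        return x + out  # residual connection

if __name__ == "__main__":
    block = ResidualBlock(32)
    print(block(torch.zeros(1, 32, 21, 21)).shape)  # torch.Size([1, 32, 21, 21])
```

Constraining the spectral norm of the convolution weights in this way tends to stabilize the early phase of training, which matches the faster initial learning we observed.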
Please cite the paper if you use this implementation in your publications.
Install the necessary prerequisites with

```
sudo apt install zlib1g-dev cmake unrar
pip install wandb gym[atari]==0.18.0 imageio moviepy torchsummary tqdm rich procgen gym-retro torch stable_baselines3 atari_py==0.2.9
```
If you intend to use gym Atari games, you will need to install the ROMs separately, e.g., by running:

```
wget http://www.atarimania.com/roms/Roms.rar
unrar x Roms.rar
python -m atari_py.import_roms .
```
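As a quick sanity check that the ROM import worked (an illustrative snippet, assuming the gym 0.18 Atari environment IDs), you can try creating an Atari environment directly:

```python
import gym

# This only succeeds if atari_py can find the imported Qbert ROM.
env = gym.make("QbertNoFrameskip-v4")
print(env.reset().shape)  # (210, 160, 3)
```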
To set up gym-retro games, follow the instructions here.
To get started right away, run

```
python train_rainbow.py --env_name gym:Qbert
```

This will train Rainbow on Atari Qbert and log all results to Weights & Biases and the checkpoints directory.
Please take a look at common/argp.py or run python train_rainbow.py --help for more configuration options.
- With a single RTX 3090 and 12 CPU cores, training for 10M frames takes around 7.5 hours.
- About 15GB of RAM is required. When using a larger replay buffer or subprocess environments, memory use may be much higher.
- Hyperparameters can be configured through command line arguments; defaults can be found in common/argp.py.
- For the highest training throughput, use batch_size=512, parallel_envs=64, train_count=1, subproc_vecenv=True (see the example below).
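For example, the throughput settings above could be combined with the Qbert run from earlier roughly as follows; the exact flag names and syntax may differ, so check python train_rainbow.py --help:

```
python train_rainbow.py --env_name gym:Qbert --batch_size 512 --parallel_envs 64 --train_count 1 --subproc_vecenv True
```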
We are very grateful to the TU Wien DataLab for providing the majority of the compute resources that were necessary to perform the experiments.
Here are some other implementations and resources that were helpful in the completion of this project:
- OpenAI Baselines (especially for preprocessing and Atari wrappers)
- https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn_atari.py
- https://github.com/Kaixhin/Rainbow/
- https://github.com/Kaixhin/Rainbow/wiki/Matteo's-Notes