This is the code for experiments in the paper Learning to Incentivize Other Learning Agents. Baselines are included.
- Python 3.6
- Tensorflow >= 1.12
- OpenAI Gym == 0.10.9
- Clone and `pip install` Sequential Social Dilemma, which is a fork of the original open-source implementation.
- Clone and `pip install` LOLA if you wish to run this baseline.
- Clone this repository and run `$ pip install -e .` from the root.
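
After installation, the version pins listed above can be sanity-checked. The following is a minimal sketch, not part of this repository: the helper names (`parse_version`, `satisfies`, `find_mismatches`) are hypothetical, and the installed versions are passed in as plain strings (for example taken from `tensorflow.__version__` and `gym.__version__`), so the sketch itself does not import either package.

```python
import re

# Version pins copied from the requirements list above.
REQUIRED = {
    "tensorflow": (">=", "1.12"),
    "gym": ("==", "0.10.9"),
}


def parse_version(text):
    """Convert '1.12.0' to (1, 12, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if op == ">=":
        return a >= b
    if op == "==":
        return a == b
    raise ValueError("unsupported operator: " + op)


def find_mismatches(installed_versions):
    """Return one message per requirement that is missing or not met."""
    problems = []
    for name, (op, required) in REQUIRED.items():
        installed = installed_versions.get(name)
        if installed is None:
            problems.append(name + " is not installed")
        elif not satisfies(installed, op, required):
            problems.append(
                "{} {} found, need {} {}".format(name, installed, op, required)
            )
    return problems
```

An empty list from `find_mismatches` means both pins are met; the Python 3.6 requirement is not covered by this sketch.
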

The repository is organized as follows:

- `alg/` - Implementation of LIO and PG/AC baselines
- `env/` - Implementation of the Escape Room game and wrappers around the SSD environment.
- `results/` - Results of training will be stored in subfolders here. Each independent training run will create a subfolder that contains the final Tensorflow model and reward log files. For example, 5 parallel independent training runs would create `results/cleanup/10x10_lio_0`, ..., `results/cleanup/10x10_lio_4` (depending on configurable strings in config files).
- `utils/` - Utility methods
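
The subfolder naming in the `results/` example can be illustrated with a short sketch. It only reproduces the pattern shown above (`results/cleanup/10x10_lio_0` through `results/cleanup/10x10_lio_4`); the function name and its parameters are hypothetical, and the actual names depend on the strings set in the config files.

```python
from pathlib import PurePosixPath


def expected_run_dirs(env_name, run_prefix, n_runs, root="results"):
    """List the per-run subfolders implied by the naming pattern in the example.

    All argument names are illustrative; they are not the repository's
    config keys.
    """
    base = PurePosixPath(root) / env_name
    return [str(base / "{}_{}".format(run_prefix, i)) for i in range(n_runs)]


if __name__ == "__main__":
    for path in expected_run_dirs("cleanup", "10x10_lio", 5):
        print(path)
```
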

To train LIO on the Escape Room game (`er`):

- Set config values in `alg/config_room_lio.py`
- `cd` into the `alg` folder
- Execute training script `$ python train_multiprocess.py lio er`. Default settings conduct 5 parallel runs with different seeds.
- For a single run, execute `$ python train_lio.py er`.
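
The idea of several parallel runs with different seeds can be sketched generically as follows. This is not the code of `train_multiprocess.py`: the function names, the config keys (`seed`, `run_index`), and the base seed are assumptions made for illustration, and `train_fn` stands for any picklable, module-level training function that accepts one config dict.

```python
import multiprocessing as mp


def make_run_configs(n_runs, base_seed):
    """Build one config dict per run, each with its own seed.

    The keys 'seed' and 'run_index' are hypothetical, not the
    repository's config fields.
    """
    return [{"seed": base_seed + i, "run_index": i} for i in range(n_runs)]


def launch_runs(train_fn, configs):
    """Start one process per config and wait for all of them to finish.

    Returns the exit code of each process, in the order of `configs`.
    """
    processes = [mp.Process(target=train_fn, args=(cfg,)) for cfg in configs]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return [p.exitcode for p in processes]
```

When calling `launch_runs` from a script, place the call under an `if __name__ == "__main__":` guard so that platforms which start child processes by re-importing the main module do not launch the runs a second time.
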

To train LIO on the SSD environment (`ssd`):

- Set config values in `alg/config_ssd_lio.py`
- `cd` into the `alg` folder
- Execute training script `$ python train_multiprocess.py lio ssd`.
- For a single run, execute `$ python train_ssd.py`.

@article{yang2020learning,
title={Learning to incentivize other learning agents},
author={Yang, Jiachen and Li, Ang and Farajtabar, Mehrdad and Sunehag, Peter and Hughes, Edward and Zha, Hongyuan},
journal={Advances in Neural Information Processing Systems},
volume={33},
pages={15208--15219},
year={2020}
}
See LICENSE.
SPDX-License-Identifier: MIT