This repo implements the Influence-Aware Memory (IAM) architecture (https://arxiv.org/abs/1911.07643), based on the PyTorch structure of https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail and the paper's source repo https://github.com/INFLUENCEorg/influence-aware-memory
To run the different scenarios, use the following commands:
cd YOUR_PATH/IAM-Reproduce
Run FNN8 (32)
python main.py --env-name warehouse --num-steps 8 --recurrent-policy --log-dir ./log_w/
python main.py --env-name traffic --num-steps 32 --recurrent-policy --num-env-steps 2000000 --num-processes 1 --log-dir ./log_tc/
Run FNN1 (10)
python main.py --env-name warehouse --num-steps 1 --recurrent-policy --log-dir ./log_w/
python main.py --env-name traffic --num-steps 10 --recurrent-policy --num-env-steps 2000000 --num-processes 1 --log-dir ./log_tc/
Run GRU only
python main.py --env-name warehouse --num-steps 8 --log-dir ./log_w/
python main.py --env-name traffic --num-steps 32 --num-env-steps 2000000 --num-processes 1 --log-dir ./log_tc/
Run IAM
python main.py --env-name warehouse --num-steps 8 --IAM --log-dir ./log_w/
python main.py --env-name traffic --num-steps 32 --IAM --num-env-steps 2000000 --num-processes 1 --log-dir ./log_tc/
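The commands above differ only in the environment, the number of steps, and one model flag, so they can be generated from a small table. The following is a minimal sketch of such a helper; `build_command`, `ENV_ARGS`, and `VARIANTS` are hypothetical names that are not part of this repo, and only flags that appear in the commands above are used.

```python
# Hypothetical helper (not part of the repo): rebuilds the commands listed above.

# Per-environment arguments, copied from the commands above.
ENV_ARGS = {
    "warehouse": ["--log-dir", "./log_w/"],
    "traffic": ["--num-env-steps", "2000000", "--num-processes", "1",
                "--log-dir", "./log_tc/"],
}

# Per-variant --num-steps value for each environment, plus the model flag.
VARIANTS = {
    "FNN8": ({"warehouse": "8", "traffic": "32"}, ["--recurrent-policy"]),
    "FNN1": ({"warehouse": "1", "traffic": "10"}, ["--recurrent-policy"]),
    "GRU": ({"warehouse": "8", "traffic": "32"}, []),
    "IAM": ({"warehouse": "8", "traffic": "32"}, ["--IAM"]),
}


def build_command(env, variant):
    """Return the argument list for one environment/variant pair."""
    steps, flags = VARIANTS[variant]
    return ["python", "main.py", "--env-name", env,
            "--num-steps", steps[env], *flags, *ENV_ARGS[env]]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from inside the repo directory, for example to launch all four variants on one environment in sequence.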
NOTE:
- To render the warehouse dynamics, set the variable render_bool to True in warehouse.py, and run with just 1 process (recommended, because otherwise every process will pop up its own window)
- The log_xxx folder stores the monitor files of all processes and a manually written file, mean_rewards_xxx.txt, recording the mean rewards.
The results are saved in ./log (warehouse) and ./log_t (traffic), respectively. To visualize the results, run the following command. The collected data is smoothed with an exponentially weighted moving average (EWMA).
python plot_results.py
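For reference, EWMA smoothing can be sketched as follows. This is a generic illustration and not the code in plot_results.py; the function name `ewma` and the smoothing factor `alpha=0.1` are assumptions.

```python
def ewma(values, alpha=0.1):
    """Exponentially weighted moving average of a sequence of numbers.

    alpha is the weight of the newest sample (assumed value, 0 < alpha <= 1).
    The first output equals the first input; every later output is
    alpha * current + (1 - alpha) * previous_smoothed.
    """
    smoothed = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed
```

A smaller `alpha` gives a smoother curve that reacts more slowly to changes in the mean reward; a larger `alpha` follows the raw data more closely.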
Currently, the mean rewards look like this:
To run Atari 'BreakoutNoFrameskip-v4' without flickering, use:
python main.py --env-name BreakoutNoFrameskip-v4 --num-env-steps 4000000 --num-steps 8 --lr 0.00025 --log-dir ./log_fa/ --IAM
The result:
To run Atari 'BreakoutNoFrameskip-v4' with flickering, use:
python main.py --env-name BreakoutNoFrameskip-v4 --num-env-steps 4000000 --num-steps 32 --lr 0.00025 --log-dir ./log_fa/ --IAM --flicker
Results with flickering are still being tested.
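To illustrate what flickering means, the sketch below blanks an observation with a fixed probability, which is the usual flickering-Atari setup. It is not the code behind the --flicker flag: the function name `flicker`, the nested-list frame representation, and the default probability of 0.5 are all assumptions.

```python
import random


def flicker(frame, rng, p_blank=0.5):
    """Return an all-zero copy of frame with probability p_blank, else frame.

    frame is a 2D list of pixel values; rng is a random.Random instance.
    p_blank=0.5 is an assumed default, not a value read from this repo.
    """
    if rng.random() < p_blank:
        return [[0 for _ in row] for row in frame]
    return frame
```

Because single observations are often blank, the agent has to remember earlier frames to track the ball, which is why this setting is a test of memory.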
This work adds the following customizations:
- Embed three environments (warehouse, traffic control with SUMO, and Atari from Gym) in the PyTorch structure
- Design IAMModel.py for the Influence-Aware Model, used with A2C
- Visualize the results and compare them with the original paper