To run the virtual environment, you need to set up MuJoCo.
- Download the MuJoCo version 2.1 binaries for Linux or OSX.
- Extract the downloaded mujoco210 directory into ~/.mujoco/mujoco210.
- Install and use mujoco-py.
pip install -U 'mujoco-py<2.2,>=2.1'
pip install -e ./mujuco_environment
# (optional) if mujoco dir need to changed
export MUJOCO_PY_MUJOCO_PATH=YOUR_MUJOCO_DIR/.mujoco/mujoco210
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:YOUR_MUJOCO_DIR/.mujoco/mujoco210/bin:/usr/lib/nvidia
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
We highly recommend you to ensure the MuJoCo is indeed working by running testing examples in mujoco-py. In most case, you need to run:
import mujoco_py
import os
mj_path = mujoco_py.utils.discover_mujoco()
xml_path = os.path.join(mj_path, 'model', 'humanoid.xml')
model = mujoco_py.load_model_from_path(xml_path)
sim = mujoco_py.MjSim(model)
Throughout this section, we will use the Blocked Half-cheetah
environment as an example,
for using other environments (including Blocked Ant
, Biased Pendulumn
, Blocked Walker
, and Blocked Swimmer
, please refer to their configs in this dir)
Note that the expert agent is to generate demonstration data (see the step 3 below).
# step in the dir containing the "main" files.
cd ./interface/
# run PPO without knowing the constraint
python train_policy.py ../config/mujuco_BlockedHalfCheetah/train_ppo_HCWithPos-v0.yaml -n 5 -s 123
# run PPO-Lag knowing the ground-truth
python train_policy.py ../config/mujuco_BlockedHalfCheetah/train_ppo_lag_HCWithPos-v0.yaml -n 5 -s 123
Note that:
- you don't need to generate expert demonstrations since they are provided, but if you want to test other types of expert demonstrations, here is the code to start from:
- Since the generation relies on expert agent, we provide an example to you (named
train_ppo_lag_HCWithPos-v0-multi_env-Apr-21-2022-04:49-seed_123
).
# step in the dir containing the "main" files.
cd ./interface/
# run data generation
python generate_data_for_constraint_inference.py -n 5 -mn train_ppo_lag_HCWithPos-v0-multi_env-Apr-21-2022-04:49-seed_123 -tn PPO-Lag-HC -ct no-constraint -rn 1
We use the Blocked Half-Cheetah
environment as an example (also see the notice above).
Note that:
- This is to reproduce the results in the Section 4.2 of our paper.
- The following code uses the random seed '123'. For reproduction, a total of 5 random seeds ('123', '321', '456', '654', '666') are required.
# step in the dir containing the "main" files.
cd ./interface/
# run GACL
python train_gail.py ../config/mujuco_BlockedHalfCheetah/train_GAIL_HCWithPos-v0.yaml -n 5 -s 123
# run BC2L
python train_icrl.py ../config/mujuco_BlockedHalfCheetah/train_Binary_HCWithPos-v0.yaml -n 5 -s 123
# run MECL
python train_icrl.py ../config/mujuco_BlockedHalfCheetah/train_ICRL_HCWithPos-v0.yaml -n 5 -s 123
# run VICRL
python train_icrl.py ../config/mujuco_BlockedHalfCheetah/train_VICRL_HCWithPos-v0.yaml -n 5 -s 123
Note that:
- This is to reproduce the results in the Section 4.3 of our paper.
- The following code uses the random seed '123'. For reproduction, a total of 5 random seeds ('123', '321', '456', '654', '666') are required.
# step in the dir containing the "main" files.
cd ./interface/
# run GACL with 20%/50%/80% sub-optimal trajectories
python train_gail.py ../config/mujuco_HCWithPos-v0/train_GAIL_HCWithPos-v0_sub-2e-1.yaml -n 5 -s 123
python train_gail.py ../config/mujuco_HCWithPos-v0/train_GAIL_HCWithPos-v0_sub-5e-1.yaml -n 5 -s 123
python train_gail.py ../config/mujuco_HCWithPos-v0/train_GAIL_HCWithPos-v0_sub-8e-1.yaml -n 5 -s 123
# run Binary with 20%/50%/80% sub-optimal trajectories
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_Binary_HCWithPos-v0_sub-2e-1.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_Binary_HCWithPos-v0_sub-5e-1.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_Binary_HCWithPos-v0_sub-8e-1.yaml -n 5 -s 123
# run MECL with 20%/50%/80% sub-optimal trajectories
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_ICRL_HCWithPos-v0_sub-2e-1.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_ICRL_HCWithPos-v0_sub-5e-1.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_ICRL_HCWithPos-v0_sub-8e-1.yaml -n 5 -s 123
# run VCIRL with 20%/50%/80% sub-optimal trajectories
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_VICRL_HCWithPos-v0_sub-2e-1.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_VICRL_HCWithPos-v0_sub-5e-1.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_VICRL_HCWithPos-v0_sub-8e-1.yaml -n 5 -s 123
Note that:
- This is to reproduce the results in the Section 4.3 of our paper.
- The following code uses the random seed '123'. For reproduction, a total of 5 random seeds ('123', '321', '456', '654', '666') are required.
# step in the dir containing the "main" files.
cd ./interface/
# run GACL with noise scale N(0,0.001), N(0,0.01), N(0,0.1)
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_GAIL_HCWithPos-v0_noise-1e-3.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_GAIL_HCWithPos-v0_noise-1e-2.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_GAIL_HCWithPos-v0_noise-1e-1.yaml -n 5 -s 123
# run Binary with noise scale N(0,0.001), N(0,0.01), N(0,0.1)
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_Binary_HCWithPos-v0_noise-1e-3.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_Binary_HCWithPos-v0_noise-1e-2.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_Binary_HCWithPos-v0_noise-1e-1.yaml -n 5 -s 123
# run ICRL with noise scale N(0,0.001), N(0,0.01), N(0,0.1)
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_ICRL_HCWithPos-v0_noise-1e-3.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_ICRL_HCWithPos-v0_noise-1e-2.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_ICRL_HCWithPos-v0_noise-1e-1.yaml -n 5 -s 123
# run VICRL with noise scale N(0,0.001), N(0,0.01), N(0,0.1)
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_VICRL_HCWithPos-v0_noise-1e-3.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_VICRL_HCWithPos-v0_noise-1e-2.yaml -n 5 -s 123
python train_icrl.py ../config/mujuco_HCWithPos-v0/train_VICRL_HCWithPos-v0_noise-1e-1.yaml -n 5 -s 123