- Clone the repository:

```bash
git clone https://github.com/generalroboticslab/Pref-GUIDE.git
```
- Install the CREW platform by following the instructions in CREW.
- Activate the conda environment:

```bash
conda activate crew
```
- Download the human feedback dataset from here, and extract it with the following commands:

```bash
tar -xvzf RL_checkpoint.tar.gz
cd RL_checkpoint
python unzip_data.py
cd ../
```
- Process the dataset for reward model training (a sketch of the underlying conversion idea follows below):

```bash
python process_data/process_data.py
```
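For intuition only: preference-based learning needs pairwise comparisons, so at some point in the pipeline the real-time scalar feedback has to be converted into preference pairs. The sketch below illustrates one plausible conversion; every name in it (`observations`, `feedback`, `tie_threshold`) is a hypothetical placeholder, not the actual interface of `process_data.py`.

```python
import numpy as np

def scalar_feedback_to_preferences(observations, feedback,
                                   num_pairs=1000, tie_threshold=0.1,
                                   rng=None):
    """Hypothetical sketch: sample pairs of states and prefer the one
    that received higher scalar human feedback, skipping near-ties."""
    rng = np.random.default_rng() if rng is None else rng
    pairs, n = [], len(feedback)
    while len(pairs) < num_pairs:
        i, j = rng.integers(0, n, size=2)
        diff = feedback[i] - feedback[j]
        if abs(diff) < tie_threshold:   # ambiguous feedback -> no pair
            continue
        # label 1 means the first state of the pair is preferred
        pairs.append((observations[i], observations[j], int(diff > 0)))
    return pairs
```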
- Train the preference-based reward model (a minimal loss sketch follows below):

```bash
cd reward_model_training
bash train_model.sh
```
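Preference-based reward models are commonly fit with a Bradley-Terry style loss that pushes the network to assign higher reward to the preferred state of each pair. The sketch below shows that standard recipe; the network shape, batch format, and hyperparameters are assumptions, not the configuration used by `train_model.sh`.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Small MLP mapping an observation to a scalar reward (assumed shape)."""
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def bradley_terry_loss(model, obs_a, obs_b, prefers_a):
    """Bradley-Terry preference loss; prefers_a is 1.0 when obs_a won."""
    logits = model(obs_a) - model(obs_b)          # r(a) - r(b)
    return nn.functional.binary_cross_entropy_with_logits(logits, prefers_a)

# Hypothetical training step over one batch of preference pairs.
model = RewardModel(obs_dim=64)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
obs_a, obs_b = torch.randn(32, 64), torch.randn(32, 64)
prefers_a = torch.randint(0, 2, (32,)).float()
opt.zero_grad()
loss = bradley_terry_loss(model, obs_a, obs_b, prefers_a)
loss.backward()
opt.step()
```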
- Train the RL agent with the reward model (a relabeling sketch follows below):

```bash
cd CREW/crew-algorithms
bash ddpg.sh
```
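Conceptually, the learned reward model stands in for the human's real-time feedback during RL: each transition is relabeled with the model's reward before the DDPG update. A minimal sketch, assuming the `RewardModel` above and a generic dict-of-tensors replay batch; it does not mirror the actual `crew-algorithms` code.

```python
import torch

@torch.no_grad()
def relabel_batch(reward_model, batch):
    """Replace stored rewards with learned-model rewards.

    The relabeled rewards then feed a standard DDPG critic target:
        y = r_hat + gamma * Q_target(s', pi_target(s')).
    """
    batch = dict(batch)
    batch["reward"] = reward_model(batch["obs"])
    return batch
```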
- Evaluate the trained RL agent (a generic evaluation loop follows below):

```bash
cd CREW/crew-algorithms
bash ddpg_eval.sh
```
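Evaluation typically reports the average episodic return of the frozen policy over several rollouts. A generic sketch under a gym-style environment API, not the specific metrics produced by `ddpg_eval.sh`:

```python
import numpy as np

def evaluate(env, policy, episodes=10):
    """Average undiscounted return of a frozen policy (gym-style API)."""
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            action = policy(obs)                  # deterministic actor
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))
```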
This work is supported by the ARL STRONG program under awards W911NF2320182, W911NF2220113, and W911NF2420215, and by gift support from BMW and OpenAI. We also thank Lingyu Zhang for helpful discussions.
If you find this paper helpful, please consider citing our work:
```bibtex
@misc{ji2025prefguidecontinualpolicylearning,
  title={Pref-GUIDE: Continual Policy Learning from Real-Time Human Feedback via Preference-Based Learning},
  author={Zhengran Ji and Boyuan Chen},
  year={2025},
  eprint={2508.07126},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.07126},
}
```