- Clone the repository:

```bash
git clone https://github.com/generalroboticslab/Pref-GUIDE.git
```
- Install the CREW platform by following the instructions in CREW.
- Activate the conda environment:

```bash
conda activate crew
```
- Download the human feedback dataset from here, and extract it with the following commands:

```bash
tar -xvzf RL_checkpoint.tar.gz
cd RL_checkpoint
python unzip_data.py
cd ../
```
- Process the dataset for reward model training (a sketch of the underlying conversion idea follows below):

```bash
python process_data/process_data.py
```
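For intuition only: preference-based learning needs pairwise comparisons, so at some point in the pipeline the real-time scalar feedback has to be converted into preference pairs. The sketch below illustrates one plausible conversion; every name in it (`observations`, `feedback`, `tie_threshold`) is a hypothetical placeholder, not the actual interface of `process_data.py`.

```python
import numpy as np

def scalar_feedback_to_preferences(observations, feedback,
                                   num_pairs=1000, tie_threshold=0.1,
                                   rng=None):
    """Hypothetical sketch: sample pairs of states and prefer the one
    that received higher scalar human feedback, skipping near-ties."""
    rng = np.random.default_rng() if rng is None else rng
    pairs, n = [], len(feedback)
    while len(pairs) < num_pairs:
        i, j = rng.integers(0, n, size=2)
        diff = feedback[i] - feedback[j]
        if abs(diff) < tie_threshold:   # ambiguous feedback -> no pair
            continue
        # label 1 means the first state of the pair is preferred
        pairs.append((observations[i], observations[j], int(diff > 0)))
    return pairs
```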
- Train the preference-based reward model (a minimal loss sketch follows below):

```bash
cd reward_model_training
bash train_model.sh
```
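Preference-based reward models are commonly fit with a Bradley-Terry style loss that pushes the network to assign higher reward to the preferred state of each pair. The sketch below shows that standard recipe; the network shape, batch format, and hyperparameters are assumptions, not the configuration used by `train_model.sh`.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Small MLP mapping an observation to a scalar reward (assumed shape)."""
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def bradley_terry_loss(model, obs_a, obs_b, prefers_a):
    """Bradley-Terry preference loss; prefers_a is 1.0 when obs_a won."""
    logits = model(obs_a) - model(obs_b)          # r(a) - r(b)
    return nn.functional.binary_cross_entropy_with_logits(logits, prefers_a)

# Hypothetical training step over one batch of preference pairs.
model = RewardModel(obs_dim=64)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
obs_a, obs_b = torch.randn(32, 64), torch.randn(32, 64)
prefers_a = torch.randint(0, 2, (32,)).float()
opt.zero_grad()
loss = bradley_terry_loss(model, obs_a, obs_b, prefers_a)
loss.backward()
opt.step()
```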
- Train the RL agent with the reward model (a relabeling sketch follows below):

```bash
cd CREW/crew-algorithms
bash ddpg.sh
```
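Conceptually, the learned reward model stands in for the human's real-time feedback during RL: each transition is relabeled with the model's reward before the DDPG update. A minimal sketch, assuming the `RewardModel` above and a generic dict-of-tensors replay batch; it does not mirror the actual `crew-algorithms` code.

```python
import torch

@torch.no_grad()
def relabel_batch(reward_model, batch):
    """Replace stored rewards with learned-model rewards.

    The relabeled rewards then feed a standard DDPG critic target:
        y = r_hat + gamma * Q_target(s', pi_target(s')).
    """
    batch = dict(batch)
    batch["reward"] = reward_model(batch["obs"])
    return batch
```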
- Evaluate the trained RL agent (a generic evaluation loop follows below):

```bash
cd CREW/crew-algorithms
bash ddpg_eval.sh
```
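Evaluation typically reports the average episodic return of the frozen policy over several rollouts. A generic sketch under a gym-style environment API, not the specific metrics produced by `ddpg_eval.sh`:

```python
import numpy as np

def evaluate(env, policy, episodes=10):
    """Average undiscounted return of a frozen policy (gym-style API)."""
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            action = policy(obs)                  # deterministic actor
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))
```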
This work is supported by the ARL STRONG program under awards W911NF2320182, W911NF2220113, and W911NF2420215, and by gift support from BMW and OpenAI. We also thank Lingyu Zhang for helpful discussions.
If you find this paper helpful, please consider citing our work:
```bibtex
@misc{ji2025prefguidecontinualpolicylearning,
  title={Pref-GUIDE: Continual Policy Learning from Real-Time Human Feedback via Preference-Based Learning},
  author={Zhengran Ji and Boyuan Chen},
  year={2025},
  eprint={2508.07126},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.07126},
}
```