This repository contains the Python implementation for the paper "Generalized Fitted Q-iteration with Clustered Data". The paper focuses on reinforcement learning (RL) in clustered environments with limited data, a common scenario in healthcare applications. We propose an optimal policy learning algorithm that integrates Generalized Estimating Equations (GEE) into the Bellman equation framework to account for intra-cluster correlations. Our approach not only minimizes the variance of the Q-function estimator but also ensures that the derived policy achieves minimal regret.
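For intuition, here is a minimal sketch of one GEE-weighted fitted Q-iteration step with linear function approximation. It is illustrative only: the function name, feature map, and data layout are hypothetical and do not reflect the actual `GEE_Q` API, and the exchangeable within-cluster correlation `rho` is assumed known.

```python
import numpy as np

def fqi_gee_step(S, A, R, S_next, clusters, theta, gamma=0.9, rho=0.3):
    """One GEE-weighted FQI update with linear features (illustrative sketch).

    S, A, R, S_next : states, integer actions, rewards, next states
    clusters        : integer cluster label for each transition
    theta           : current weights per action, shape (n_actions, d)
    rho             : exchangeable within-cluster correlation (assumed known)
    """
    def phi(s):  # hypothetical feature map
        return np.asarray(s, dtype=float)

    X = np.stack([phi(s) for s in S])        # (n, d)
    Xn = np.stack([phi(s) for s in S_next])

    # Bellman targets: r + gamma * max_a Q(s', a)
    q_next = Xn @ theta.T                    # (n, n_actions)
    y = R + gamma * q_next.max(axis=1)

    # GEE normal equations per action:
    #   sum_c X_c' V_c^{-1} X_c theta_a = sum_c X_c' V_c^{-1} y_c,
    # with V_c an exchangeable working correlation matrix per cluster.
    d = X.shape[1]
    new_theta = theta.copy()
    for a in np.unique(A):
        lhs, rhs = np.zeros((d, d)), np.zeros(d)
        for c in np.unique(clusters):
            idx = (clusters == c) & (A == a)
            if not idx.any():
                continue
            Xc, yc = X[idx], y[idx]
            m = len(yc)
            V = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
            Vinv = np.linalg.inv(V)
            lhs += Xc.T @ Vinv @ Xc
            rhs += Xc.T @ Vinv @ yc
        new_theta[a] = np.linalg.solve(lhs, rhs)
    return new_theta
```

With `rho = 0` the weighting reduces to ordinary least squares, recovering standard FQI; the GEE weighting only changes how observations within the same cluster are combined.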
We illustrate the motivation behind the proposed approach through a simple tabular example where the optimal Q-function is analytically known (see Section 3.1 for details): increasing the variance of the Q-function estimator leads to higher regret in the derived policies.
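This effect is easy to reproduce numerically. The snippet below is a toy one-state example of our own (not the example from Section 3.1): perturbing a known optimal Q-function with increasing noise makes the greedy policy pick suboptimal actions more often, which raises the average regret.

```python
import numpy as np

rng = np.random.default_rng(0)
q_star = np.array([1.0, 0.8, 0.5])           # true Q-values for 3 actions
n_rep = 10_000

for sigma in [0.05, 0.2, 0.5]:
    # Greedy policy derived from a noisy Q-function estimate
    noisy_q = q_star + rng.normal(0.0, sigma, size=(n_rep, 3))
    actions = noisy_q.argmax(axis=1)
    regret = q_star.max() - q_star[actions]  # per-decision regret
    print(f"sigma={sigma:.2f}  mean regret={regret.mean():.3f}")
```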
- Folder `functions/`:
  - `generate_joint_data`: Generates data for the simulations.
  - `GEE_Q`: Implements the generalized fitted Q-iteration (FQI) and the optimal FQI with GEE.
  - `cov_struct`: Contains several within-cluster correlation structures for GEE (see the sketch after this list).
  - `utilities`: Contains helper functions.
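As background on the correlation structures in `cov_struct`, two standard GEE working correlations are exchangeable (constant correlation within a cluster) and AR(1) (correlation decaying with the gap between observations). A minimal NumPy sketch of both, independent of the module's actual interface:

```python
import numpy as np

def exchangeable(m, rho):
    """Equal correlation rho between any two observations in a cluster of size m."""
    return (1 - rho) * np.eye(m) + rho * np.ones((m, m))

def ar1(m, rho):
    """Correlation rho**|i-j| between observations i and j."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

print(exchangeable(3, 0.4))
print(ar1(3, 0.4))
```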
- Folder `simulation/`:
  - `R_autoex`: Runs the generalized FQI and the proposed optimal FQI with different within-cluster correlation structures.
  - `create_r_autoex.sh`: Creates SLURM jobs to run `R_autoex.py` (see the sketch after this list).
  - `Qonline_single.py`: Runs online DQN to approximate the optimal Q-function.
  - `create_online_Q.sh`: Creates SLURM jobs to run `Qonline_single.py`.
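The `create_*.sh` helpers follow the usual pattern of generating one SLURM batch script per configuration and submitting it with `sbatch`. Below is a hypothetical Python version of that pattern; the resource settings and the `--seed`/`--cor` flags are illustrative assumptions, not the actual contents of `create_r_autoex.sh`.

```python
import subprocess
from pathlib import Path

# Hypothetical batch-script template; adjust resources to your cluster.
TEMPLATE = """#!/bin/bash
#SBATCH --job-name=r_autoex_{seed}
#SBATCH --time=04:00:00
#SBATCH --mem=4G

python R_autoex.py --seed {seed} --cor {cor}
"""

for cor in ["independence", "exchangeable", "ar1"]:  # hypothetical flag values
    for seed in range(10):
        script = Path(f"job_{cor}_{seed}.slurm")
        script.write_text(TEMPLATE.format(seed=seed, cor=cor))
        subprocess.run(["sbatch", str(script)], check=True)
```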
- Folder `simulation/regret/`:
  - `value_comparison.py`: Estimates the regret of the optimal Q-function under different noise variances.
  - `create_value_comparison.sh`: Creates SLURM jobs to run `value_comparison.py`.
- Folder `semi/codes/`:
  - `run_individual_rl_ihs.py`: Runs the semisynthetic simulation based on the IHS dataset.
  - `run_online_learning.py`: Runs online policy learning on the semisynthetic dataset.
- Folder `semi/models/`:
  - Includes the transition and reward functions learned from the IHS dataset.