- Obtain access to the MIMIC-CXR-JPG Database on PhysioNet and download the dataset. We recommend downloading from the GCP bucket:
  ```bash
  gcloud auth login
  mkdir MIMIC-CXR-JPG
  gsutil -m rsync -d -r gs://mimic-cxr-jpg-2.0.0.physionet.org MIMIC-CXR-JPG
  ```
- To obtain gender information for each patient, you will also need access to MIMIC-IV. Download `core/patients.csv.gz` and `core/admissions.csv.gz` and place the files in the `MIMIC-CXR-JPG` directory.
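
As a quick sanity check after copying the MIMIC-IV files, the sketch below reads `patients.csv.gz` and builds a patient-to-gender lookup using only the standard library. It is not the repository's preprocessing code: the helper name `load_patient_gender` is hypothetical, and the column names `subject_id` and `gender` are assumed from the MIMIC-IV patients table.

```python
import csv
import gzip


def load_patient_gender(patients_csv_gz):
    """Return a dict mapping subject_id (as a string) to the gender value."""
    genders = {}
    # gzip.open in text mode lets csv.DictReader read the compressed file directly.
    with gzip.open(patients_csv_gz, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumed column names: "subject_id" and "gender".
            genders[row["subject_id"]] = row["gender"]
    return genders
```

For example, `load_patient_gender("MIMIC-CXR-JPG/patients.csv.gz")` should return one entry per patient if the file was placed as described above.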
- Sign up with your email address here.
- Download either the original or the downsampled dataset (we recommend the downsampled version, `CheXpert-v1.0-small.zip`) and extract it.
- Register for an account and download the CheXpert demographics data here.
- In `cxr_fairness/data/Constants.py`, update `image_paths` to point to the two directories that you downloaded, and set `CXP_details` to the path of the CheXpert demographics file.
- Run `python -m cxr_fairness.data.preprocess.preprocess`.
- (Optional) If you are training many models, it may be faster to cache all images on disk as binary 224x224 files. This is especially true if you are using non-downsized versions of the datasets. In this case, update the `cache_dir` path in `cxr_fairness/data/Constants.py` and then run `python -m cxr_fairness.data.preprocess.cache_data`, optionally parallelizing over `--env_id {0, 1}` for speed. To use the cached files, pass `--use_cache` to `train.py`.
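
The optional caching step can be parallelized by starting one process per `--env_id` value. The following is a minimal sketch of that idea, assuming it is run from the repository root in an environment where `cxr_fairness` is importable. The helpers `build_cache_commands` and `run_in_parallel` are hypothetical and not part of the repository; only the module path and the `--env_id` flag come from the instructions above. Which dataset each id corresponds to is not stated here.

```python
import subprocess
import sys


def build_cache_commands(env_ids=(0, 1)):
    """Build one cache_data command line per environment id."""
    return [
        [sys.executable, "-m", "cxr_fairness.data.preprocess.cache_data",
         "--env_id", str(env_id)]
        for env_id in env_ids
    ]


def run_in_parallel(commands):
    """Start every command at once, wait for all of them, and return exit codes."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]
```

Calling `run_in_parallel(build_cache_commands())` launches both caching jobs at once; a non-zero entry in the returned list indicates that the corresponding job failed.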