This repository contains the code for downloading the eICU dataset from PhysioNet and transforming it into the Medical Event Data Standard (MEDS) format.
pip install eICU-MEDS # use `pip install -e .` for local installation in editing mode
export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
MEDS_extract-eICU root_output_dir=data/eicu_meds do_download=False

If you want to convert a large dataset, you can parallelize the MEDS-transforms stage, which is the step that takes the longest.
For local parallelization, first install the hydra-joblib-launcher package:
pip install hydra-joblib-launcher --upgrade
Then set the number of workers as an environment variable:
export N_WORKERS=8

Moreover, you can set the number of subjects per shard to balance the parallelization overhead based on how many subjects you have in your dataset:
export N_SUBJECTS_PER_SHARD=100000
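
As a rough illustration of picking a shard size, the sketch below derives `N_SUBJECTS_PER_SHARD` from a subject count and the number of workers. The subject count (`N_SUBJECTS=200000`) is a hypothetical placeholder, and "one shard per worker" is only an assumed starting heuristic, not a recommendation from this package; smaller shards may balance load better at the cost of more overhead.

```shell
# Hypothetical total number of subjects; replace with the count for your data.
N_SUBJECTS=200000
N_WORKERS=8

# Ceiling division: the smallest shard size that fits all subjects
# into at most N_WORKERS shards (assumed heuristic, one shard per worker).
N_SUBJECTS_PER_SHARD=$(( (N_SUBJECTS + N_WORKERS - 1) / N_WORKERS ))

export N_WORKERS N_SUBJECTS_PER_SHARD
echo "N_SUBJECTS_PER_SHARD=$N_SUBJECTS_PER_SHARD"
```

With these placeholder values, each shard holds 25000 subjects. Run the `MEDS_extract-eICU` command in the same shell session afterwards so the exported variables are visible to it.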