
Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM)


Cardio-AI/ViTiMM


Visual Prompt Engineering for Multimodal and Irregularly Sampled Medical Data

Malte Tölle, Mohamad Scharaf, Samantha Fischer, Christoph Reich, Silav Zeid, Christoph Dieterich, Benjamin Meder, Norbert Frey, Philipp Wild, Sandy Engelhardt

Paper link: https://arxiv.org/abs/2501.18237

Abstract

A multitude of examinations are conducted to assess a patient's health, with each modality contributing unique information that collectively creates a comprehensive understanding. These assessments include temporal data with varying sampling rates, single-value measurements, interventions such as medications, and imaging modalities. While physicians can process such heterogeneous information with ease, neural networks require specific modeling for each modality, complicating the training procedure. We demonstrate that this complexity can be significantly reduced by visualizing all information as images along with unstructured text and subsequently training a conventional vision-text transformer. Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), not only simplifies data preprocessing and modeling but also outperforms current state-of-the-art methods in predicting in-hospital mortality and phenotyping, as evaluated on 6,175 patients from the MIMIC-IV dataset. The modalities include the patients' clinical measurements, medications, X-ray images, and electrocardiography scans. We hope our work inspires advancements in multi-modal medical AI by reducing the training complexity to (visual) prompt engineering, thus lowering entry barriers and enabling no-code solutions for training.

Method

During a hospital stay, a patient typically undergoes multiple examinations, each offering distinct insights into their health status. While physicians have learned to intuitively extract the different pieces of information and assemble them into an overall picture, neural networks require specific modeling of the different modalities and their interactions. Once these challenges are addressed, multi-modal models have demonstrated promising performance. However, a significant challenge persists: how can multi-modal data captured at irregularly sampled time intervals be integrated?


Our primary contribution is a substantial reduction of the modeling complexity for multiple irregularly sampled modalities by transforming each modality into an image representation. Humans are then tasked with visualizing the different modalities in an informative manner, effectively engaging in a form of "visual prompt engineering". For example, laboratory measurements can be represented as line graphs over time to convey trends and patterns (Li et al., 2023). Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), unifies the data processing pipeline and substantially reduces modeling complexity. It not only mimics the way humans interpret diverse data streams but also yields significant improvements across a range of tasks.


Results

Results of ViTiMM on the tasks of in-hospital mortality prediction and phenotyping, compared to MeTra and MedFuse.
We compare the three methods with uni-modal training on clinical measurements (C) and X-ray (X), as well as on a combination of the two, following their original publications.
Extending either baseline to further modalities requires explicit modeling, which is not needed in ViTiMM:
by simply plotting the additional modalities, our method extends straightforwardly to arbitrary modalities.
Per-phenotype results can be found in the Supplementary Table.
The corresponding significance tests (pairwise t-tests) can be found in the Supplementary Tables.

Modalities:

  • C: Clinical measurements
  • X: CXR images
  • M: Medications
  • E: Electrocardiography
| Method | Modalities | AUROC (Mort.) | AUPRC (Mort.) | Bal. Acc. (Mort.) | AUROC (Pheno.) | AUPRC (Pheno.) | Bal. Acc. (Pheno.) |
|---|---|---|---|---|---|---|---|
| MeTra | C | 0.791 | 0.441 | 0.609 | 0.691 | 0.400 | 0.574 |
| MeTra | X | 0.810 | 0.471 | 0.544 | 0.667 | 0.387 | 0.564 |
| MeTra | C\|X | 0.859 | 0.595 | 0.707 | 0.712 | 0.431 | 0.583 |
| MedFuse | C | 0.812 | 0.448 | 0.571 | 0.705 | 0.417 | 0.569 |
| MedFuse | X | 0.662 | 0.264 | 0.500 | 0.640 | 0.349 | 0.538 |
| MedFuse | C\|X | 0.805 | 0.431 | 0.631 | 0.733 | 0.448 | 0.600 |
| ViTiMM (Ours) | C | 0.837 | 0.512 | 0.743 | 0.766 | 0.506 | 0.618 |
| ViTiMM (Ours) | X | 0.826 | 0.494 | 0.758 | 0.730 | 0.460 | 0.589 |
| ViTiMM (Ours) | M | 0.741 | 0.346 | 0.680 | 0.710 | 0.430 | 0.577 |
| ViTiMM (Ours) | E | 0.704 | 0.297 | 0.636 | 0.681 | 0.427 | 0.573 |
| ViTiMM (Ours) | C\|X | 0.875 | 0.615 | 0.776 | 0.778 | 0.530 | 0.636 |
| ViTiMM (Ours) | C\|M\|X\|E | 0.922 | 0.764 | 0.847 | 0.784 | 0.549 | 0.659 |

Usage

After downloading the MIMIC datasets, all plots can be created with the `plot_[labs,ecgs,meds].ipynb` notebooks.

MIMIC-IV: https://physionet.org/content/mimiciv/3.1/

MIMIC-CXR: https://physionet.org/content/mimic-cxr-jpg/2.1.0/

MIMIC-IV-ECG: https://physionet.org/content/mimic-iv-ecg/1.0/

Place the `runs` folder in this directory; the data directory can be at an arbitrary location.

Training can be performed with:

```shell
python main.py \
    --task [inhospital_mortality,phenotyping] \
    --model [swin,vit] \
    --modalities lab med cxr ecg \
    [--with_text] \
    --root PATH_TO_DATA \
    --n_epochs 3 \
    --weight_decay 3e-8 \
    --lrs 1e-5 5e-6 1e-6 \
    --batch_size 4 \
    [--ckpt PATH_TO_CKPT] \
    --seed 0
```

BibTeX

```bibtex
@misc{toelle2025vitimm,
    title={Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers},
    author={T{\"o}lle, Malte and Scharaf, Mohamad and Fischer, Samantha and Reich, Christoph and Zeid, Silav and Dieterich, Christoph and Meder, Benjamin and Frey, Norbert and Wild, Philipp and Engelhardt, Sandy},
    year={2025},
    doi={10.48550/arXiv.2501.18237}
}
```

Contact

Malte Tölle
malte.toelle@med.uni-heidelberg.de
@maltetoelle

Group Artificial Intelligence in Cardiovascular Medicine (AICM)
Heidelberg University Hospital
Im Neuenheimer Feld 410, 69120 Heidelberg, Germany
