
Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM)


Cardio-AI/ViTiMM


Visual Prompt Engineering for Multimodal and Irregularly Sampled Medical Data

Malte Tölle, Mohamad Scharaf, Samantha Fischer, Christoph Reich, Silav Zeid, Christoph Dieterich, Benjamin Meder, Norbert Frey, Philipp Wild, Sandy Engelhardt

Paper link: https://arxiv.org/abs/2501.18237

Abstract

A multitude of examinations are conducted to assess a patient's health, with each modality contributing unique information that collectively creates a comprehensive understanding. These assessments include temporal data with varying sampling rates, single-value measurements, interventions such as medications, and imaging modalities. While physicians can process such heterogeneous information with ease, neural networks require specific modeling for each modality, complicating the training procedure. We demonstrate that this complexity can be significantly reduced by visualizing all information as images along with unstructured text and subsequently training a conventional vision-text transformer. Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), not only simplifies data preprocessing and modeling but also outperforms current state-of-the-art methods in predicting in-hospital mortality and phenotyping, as evaluated on 6,175 patients from the MIMIC-IV dataset. The modalities include the patients' clinical measurements, medications, X-ray images, and electrocardiography scans. We hope our work inspires advancements in multi-modal medical AI by reducing the training complexity to (visual) prompt engineering, thus lowering entry barriers and enabling no-code solutions for training.

Method

During a hospital stay, a patient typically undergoes multiple examinations, each offering distinct insights into their health status. While physicians have learned to intuitively extract the different pieces of information and assemble them into an overall picture, neural networks require specific modeling of the different modalities and their interactions. Once these challenges are addressed, multi-modal models have demonstrated promising performance. However, a significant challenge persists: how can multi-modal data captured at irregularly sampled time intervals be integrated?


Our primary contribution is a substantial reduction of the modeling complexity for multiple irregularly sampled modalities by transforming each modality into an image representation. Humans are then tasked with visualizing the different modalities in an informative manner, effectively engaging in a form of "visual prompt engineering". For example, laboratory measurements can be represented as line graphs over time to convey trends and patterns (Li et al., 2023). Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), unifies the data processing pipeline and substantially reduces modeling complexity. It not only mimics the way humans interpret diverse data streams but also yields significant improvements across a range of tasks.


Results

Results of ViTiMM on the tasks of in-hospital mortality prediction and phenotyping, compared to MeTra and MedFuse.
We compare the three methods with uni-modal training on clinical measurements (C) and X-ray (X), as well as on a combination of the two, following their original publications.
Extending either baseline to further modalities requires explicit modeling, which is not needed in ViTiMM:
by simply plotting the additional modalities, our method extends straightforwardly to arbitrary modalities.
Per-phenotype results can be found in the Supplementary Table.
The corresponding significance tests (pairwise t-tests) can be found in the Supplementary Tables.

Modalities:

  • C: Clinical measurements
  • X: CXR images
  • M: Medications
  • E: Electrocardiography
| Method | Modalities | AUROC (Mort.) | AUPRC (Mort.) | Bal. Acc. (Mort.) | AUROC (Pheno.) | AUPRC (Pheno.) | Bal. Acc. (Pheno.) |
|---|---|---|---|---|---|---|---|
| MeTra | C | 0.791 | 0.441 | 0.609 | 0.691 | 0.400 | 0.574 |
| MeTra | X | 0.810 | 0.471 | 0.544 | 0.667 | 0.387 | 0.564 |
| MeTra | C\|X | 0.859 | 0.595 | 0.707 | 0.712 | 0.431 | 0.583 |
| MedFuse | C | 0.812 | 0.448 | 0.571 | 0.705 | 0.417 | 0.569 |
| MedFuse | X | 0.662 | 0.264 | 0.500 | 0.640 | 0.349 | 0.538 |
| MedFuse | C\|X | 0.805 | 0.431 | 0.631 | 0.733 | 0.448 | 0.600 |
| ViTiMM (Ours) | C | 0.837 | 0.512 | 0.743 | 0.766 | 0.506 | 0.618 |
| ViTiMM (Ours) | X | 0.826 | 0.494 | 0.758 | 0.730 | 0.460 | 0.589 |
| ViTiMM (Ours) | M | 0.741 | 0.346 | 0.680 | 0.710 | 0.430 | 0.577 |
| ViTiMM (Ours) | E | 0.704 | 0.297 | 0.636 | 0.681 | 0.427 | 0.573 |
| ViTiMM (Ours) | C\|X | 0.875 | 0.615 | 0.776 | 0.778 | 0.530 | 0.636 |
| ViTiMM (Ours) | C\|M\|X\|E | 0.922 | 0.764 | 0.847 | 0.784 | 0.549 | 0.659 |

Usage

After downloading the MIMIC datasets, all plots can be created with the `plot_[labs,ecgs,meds].ipynb` notebooks.

MIMIC-IV: https://physionet.org/content/mimiciv/3.1/

MIMIC-CXR: https://physionet.org/content/mimic-cxr-jpg/2.1.0/

MIMIC-IV-ECG: https://physionet.org/content/mimic-iv-ecg/1.0/

Place the `runs` folder in this directory; the data directory can be at an arbitrary location.

Training can be performed with:

```shell
python main.py \
    --task [inhospital_mortality,phenotyping] \
    --model [swin,vit] \
    --modalities lab med cxr ecg \
    [--with_text] \
    --root PATH_TO_DATA \
    --n_epochs 3 \
    --weight_decay 3e-8 \
    --lrs 1e-5 5e-6 1e-6 \
    --batch_size 4 \
    [--ckpt PATH_TO_CKPT] \
    --seed 0
```

BibTeX

```bibtex
@misc{toelle2025vitimm,
    title={Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers},
    author={T{\"o}lle, Malte and Scharaf, Mohamad and Fischer, Samantha and Reich, Christoph and Zeid, Silav and Dieterich, Christoph and Meder, Benjamin and Frey, Norbert and Wild, Philipp and Engelhardt, Sandy},
    year={2025},
    doi={10.48550/arXiv.2501.18237}
}
```

Contact

Malte Tölle
malte.toelle@med.uni-heidelberg.de
@maltetoelle

Group Artificial Intelligence in Cardiovascular Medicine (AICM)
Heidelberg University Hospital
Im Neuenheimer Feld 410, 69120 Heidelberg, Germany
