What about an ECG foundation model?
Cardiovascular diseases are the leading cause of death worldwide, accounting for an estimated 17.9 million deaths annually, which is about 32% of all global deaths. Electrocardiograms (ECGs) play a crucial role in diagnosing these conditions, with over 300 million ECGs performed each year globally.
Despite the widespread use of ECGs, there is a lack of publicly available general-purpose models that can effectively interpret ECG data across diverse populations and conditions. Our work presents D-BETA, an approach that learns general knowledge from ECG signals and their paired textual reports simultaneously, without requiring manual labels during pre-training. D-BETA not only captures subtle details within each modality but also learns how the two connect, yielding a stronger foundation model that makes more accurate decisions on downstream tasks.
Across comprehensive evaluation, D-BETA consistently outperforms strong baselines on multiple cardiac conditions, offering a scalable, self-supervised path toward accurate, label-efficient heart health AI worldwide.
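The joint ECG-text pre-training idea above can be illustrated with a CLIP-style symmetric contrastive loss that pulls matched ECG/report pairs together in a shared embedding space. This is only a generic sketch of cross-modal alignment, not D-BETA's exact objective; the embedding dimension and temperature here are assumptions:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired ECG/report embeddings.

    ecg_emb, text_emb: [batch, dim] outputs of the two encoders.
    """
    ecg_emb = F.normalize(ecg_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = ecg_emb @ text_emb.t() / temperature    # pairwise similarities
    targets = torch.arange(logits.size(0))           # i-th ECG matches i-th report
    loss_e2t = F.cross_entropy(logits, targets)      # ECG -> text direction
    loss_t2e = F.cross_entropy(logits.t(), targets)  # text -> ECG direction
    return (loss_e2t + loss_t2e) / 2

# Toy batch of 4 paired embeddings (random stand-ins for encoder outputs).
ecg_emb = torch.randn(4, 768)
text_emb = torch.randn(4, 768)
loss = contrastive_alignment_loss(ecg_emb, text_emb)
print(loss.item())
```

Because the objective never needs class labels, only which report belongs to which recording, it scales to large unlabeled hospital archives.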
This repository shows how to perform inference with the model and includes a quick example in a zero-shot setting on the CODE-15 test dataset. The structure is as follows:
```
.
├── configs
│   └── config.json
├── data
│   ├── pretrain
│   └── downstream
│       ├── code-test
│       │   └── data
│       ├── annotations
│       └── ecg_tracings.hdf5
├── models
│   ├── modules
│   └── dbeta.py
├── infer.ipynb
└── README.md
```
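Zero-shot classification in this setting works by embedding the ECG and a textual description of each candidate class in the shared space, then ranking classes by cosine similarity. Below is a minimal sketch of that ranking step with random stand-in features; the class prompts are illustrative, and in practice the features would come from the model's ECG and text encoders:

```python
import torch
import torch.nn.functional as F

# Stand-ins for real embeddings: in practice these come from the ECG
# encoder and the text encoder of the pre-trained model.
ecg_features = torch.randn(2, 768)                    # [batch, dim]
class_prompts = ["atrial fibrillation", "normal sinus rhythm", "left bundle branch block"]
text_features = torch.randn(len(class_prompts), 768)  # [num_classes, dim]

# Cosine similarity between each ECG and each class description.
sims = F.normalize(ecg_features, dim=-1) @ F.normalize(text_features, dim=-1).t()
pred = sims.argmax(dim=-1)                            # best-matching class per ECG
for i, p in enumerate(pred.tolist()):
    print(f"ECG {i}: {class_prompts[p]}")
```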
```python
from transformers import AutoModel
import torch

model = AutoModel.from_pretrained("Manhph2211/D-BETA", trust_remote_code=True)
model.eval()

ecgs = torch.randn(2, 12, 5000)  # [batch, leads, length]
with torch.no_grad():
    output = model(ecgs)
ecg_features = output.pooler_output
print(ecg_features.shape)  # (2, 768)
```

Clone the project and prepare the environment:
```shell
git clone https://github.com/manhph2211/D-BETA.git && cd D-BETA
conda create -n dbeta python=3.9
conda activate dbeta
pip install -r requirements.txt
```

Check out the `infer.ipynb` notebook for a quick example of using the model for zero-shot classification on the CODE-15 test dataset. You can also use the encoder directly:
```python
import torch
from models.processor import get_model, get_ecg_feats

model = get_model(config_path='configs/config.json', checkpoint_path='checkpoints/pytorch_model.bin')  # or sample.pt
ecgs = torch.randn(2, 12, 5000)  # [batch, leads, length]
ecg_features = get_ecg_feats(model, ecgs)
print(ecg_features.shape)  # (2, 768)
```

This research was supported by the Google South Asia & Southeast Asia research award.
We are also thankful for the valuable work provided by the open-source repositories this project builds on.
If you find this work useful 😄, please consider citing our paper:
```bibtex
@inproceedings{hung2025boosting,
  title={Boosting Masked {ECG}-Text Auto-Encoders as Discriminative Learners},
  author={Manh Pham Hung and Aaqib Saeed and Dong Ma},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}
```