baeseongsu/awesome-machine-learning-for-healthcare

awesome-machine-learning-for-healthcare

Welcome to my personal repository, a curated collection of cutting-edge research at the intersection of machine learning and healthcare. As an AI researcher with a strong interest in healthcare applications, I've compiled this repository to showcase innovative works mostly in natural language processing (NLP) and multimodal learning within the healthcare domain. While this collection reflects my personal research focus, it aims to serve as a valuable resource for anyone passionate about leveraging machine learning for healthcare. I welcome contributions and discussions, so feel free to share ideas or suggest papers!

Large Language Models

  • (2023/11) MEDITRON-70B: Scaling Medical Pretraining for Large Language Models [paper]
  • (2024/04) Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare [paper]
  • (2024/04) Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks [paper]
  • (2024/01) Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data [paper]
  • (2022/03) MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering [paper]
  • (2023/07) Med-HALT: Medical Domain Hallucination Test for Large Language Models [paper]
  • (2024/01) K-QA: A Real-World Medical Q&A Benchmark [paper]
  • (2024/05) MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain [paper]

Medical Agent

  • (2023/11) MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning [paper]
  • (2024/02) AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis [paper]
  • (2024/02) AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning [paper]
  • (2024/04) Adaptive Collaboration Strategy for LLMs in Medical Decision Making [paper]
  • (2024/05) Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents [paper]
  • (2024/05) AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments [paper]
  • (2024/05) DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge [paper]
  • (2024/06) ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World [paper]
  • (2024/07) MMedAgent: Learning to Use Medical Tools with Multi-modal Agent [paper]
  • (2024/08) MEDCO: Medical Education Copilots Based on A Multi-Agent Framework [paper]
  • (2024/08) Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions [paper]

Synthetic Data Generation

  • (2017/03) Generating Multi-label Discrete Patient Records using Generative Adversarial Networks [paper]
  • (2010/10) Data-driven approach for creating synthetic electronic medical records [paper]
  • (2023/03) EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models [paper]
  • (2023/04) Synthesize High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model [paper]
  • (2023/08) EHR-Safe: generating high-fidelity and privacy-preserving synthetic electronic health records [paper]
  • LLMSYN: Generating Synthetic Electronic Health Records Without Patient-Level Data [paper]

Data Representation and Predictive Modeling

  • (2022/07) GenHPF: General Healthcare Predictive Framework with Multi-task Multi-source Learning [paper]
  • (2024/02) REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models [paper]
  • (2024/06) EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling [paper]
  • (2024/07) EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models [paper]

Multimodal Representation Learning

  • (2022/07) MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images [paper]
  • (2023/05) Learning Missing Modal Electronic Health Records with Unified Multi-modal Data Embedding and Modality-Aware Attention [paper]
  • (2024/06) From Basic to Extra Features: Hypergraph Transformer Pretrain-then-Finetuning for Balanced Clinical Predictions on EHR [paper]
  • (2024/06) FlexCare: Leveraging Cross-Task Synergy for Flexible Multimodal Healthcare Prediction [paper]
  • (2024/07) MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models [paper]
  • Multimodal Patient Representation Learning with Missing Modalities and Labels [paper]

Toward a Natural Language Interface for EHRs

  • (2015) Toward a Natural Language Interface for EHR Questions (AMIA 2015) [paper]
  • (2016) Annotating Logical Forms for EHR Questions (LREC 2016) [paper]
  • (2017) A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms (AMIA 2017) [paper]
  • (2018/09) emrQA: A Large Corpus for Question Answering on Electronic Medical Records (EMNLP 2018) [arxiv]
  • (2019/08) Text-to-SQL Generation for Question Answering on Electronic Medical Records (WWW 2020) [arxiv]
  • (2019) Using FHIR to Construct a Corpus of Clinical Questions Annotated with Logical Forms and Answers (AMIA 2019) [paper]
  • (2020) Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing (LREC 2020) [paper]
  • (2020) Paraphrasing to Improve the Performance of Electronic Health Records Question Answering (AMIA 2020) [paper]
  • (2020/10) Knowledge Graph-based Question Answering with Electronic Health Records (ML4H 2020) [arxiv]
  • (2021) emrKBQA: A Clinical Knowledge-Base Question Answering Dataset (ACL 2021 BioNLP Workshop) [paper]
  • (2021/11) Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture (ML4H 2021) [arxiv]
  • (2022/03) Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records (CHIL 2022) [arxiv]
  • (2022/05) DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries (LREC 2022) [arxiv]
  • (2022/06) Learning to Ask Like a Physician (ACL 2022 Clinical NLP Workshop) [arxiv]
  • (2022) RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports (LREC 2022) [paper]
  • (2023/01) EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records (NeurIPS 2022) [arxiv] [code]
  • (2023/04) LeafAI: query generator for clinical cohort discovery rivaling a human programmer (JAMIA 2023) [arxiv]
  • (2023) Toward a Neural Semantic Parsing System for EHR Question Answering (AMIA 2023) [paper]
  • (2023) quEHRy: a question answering system to query electronic health records (JAMIA 2023) [paper]
  • (2023/06) ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram (NeurIPS 2023) [arxiv]
  • (2023/08) MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records (AAAI 2024) [arxiv]
  • (2023/09) Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization (Nature Medicine 2024) [arxiv]
  • (2023/10) Question Answering for Electronic Health Records: A Scoping Review of Datasets and Models (JMIR 2024) [arxiv]
  • (2023/10) EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images (NeurIPS 2023) [arxiv] [code] [physionet]
  • (2024/01) EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records (EMNLP 2024) [arxiv] [code]
  • (2024/02) EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings (NeurIPS 2024) [arxiv] [code] [physionet]
  • (2024/03) A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries (Preprint) [arxiv]

Fact Checking

  • (2020/10) Explainable Automated Fact-Checking for Public Health Claims (EMNLP 2020) [paper] [code]
  • (2021) Evidence-based Fact-Checking of Health-related Claims (Findings of EMNLP 2021) [paper] [code]
  • (2024) HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking (LREC-COLING 2024) [paper] [code]
  • (2024) DOSSIER: Fact Checking in Electronic Health Records while Preserving Patient Privacy (MLHC 2024) [paper] [code]
  • (2024/06) EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records (NeurIPS 2024) [arxiv] [code] [physionet]
  • (2024/11) FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models (CVPR 2025) [arxiv] [code]
  • (2025/01) VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records (Preprint) [arxiv] [code] [physionet]

Medical Imaging

Medical Imaging Datasets

  • (2019/12) MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports [paper]
  • (2019/01) MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs [paper]
  • (2023/10) Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge [paper]
  • (2024/03) A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities [paper]
  • (2024/04) RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [paper]
  • (2024/06) Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification [paper]
  • (2024/08) MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [paper]

Radiology Report Generation

  • (2024/01) CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation [paper]
  • (2024/05) Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [paper]
  • (2020/04) CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT [paper]
  • (2021/06) RadGraph: Extracting Clinical Entities and Relations from Radiology Reports [paper]
  • (2023/08) RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction [paper]
  • (2023/09) Evaluating Progress in Automatic Chest X-Ray Radiology Report Generation [paper]
  • (2023/11) Radiology-Aware Model-Based Evaluation Metric for Report Generation [paper]
  • (2024/03) Evaluating GPT-V4 (GPT-4 with Vision) on Detection of Radiologic Findings on Chest Radiographs [paper]
  • (2024/04) LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation [paper]
  • (2024/05) GREEN: Generative Radiology Report Evaluation and Error Notation [paper]

Multimodal Large Language Models (MLLMs)

  • (2024/09) MediConfusion: Can You Trust Your AI Radiologist? Probing the Reliability of Multimodal Medical Foundation Models (ICLR 2025) [arxiv]
  • (2024/10) MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models (ICLR 2025) [arxiv]
  • (2025/04) GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning (Preprint) [arxiv]
  • (2025/04) Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence (Preprint) [arxiv]

About

A curated collection of cutting-edge research at the intersection of machine learning and healthcare. This repository will be actively maintained until at least 202X (When is my graduation expected to be 😥?), so feel free to explore and enjoy!
