
Libra: Leveraging Temporal Images for Biomedical Radiology Analysis


This repository hosts Libra, a tool designed to generate radiology reports by leveraging temporal information from chest X-rays taken at different time points.

📢 More Than Radiology: Codebase Features for MLLM Workflows You’ll Love! 🎉

  • LLaVA-Style, LLaMA-3 & Mistral Support: Deploy and train advanced models effortlessly.
  • Resume Training: Resume training from checkpoints at any stage, whether for pre-training or fine-tuning.
  • Validation Dataset: Track model performance in real time on validation datasets during training.
  • Custom Metrics: Go beyond eval_loss with metrics such as BLEU, ROUGE-L, and RadGraph-F1, or define your own criteria on the validation set (see the sketch after this list).
  • Smart Saving: Automatically save the best model based on validation loss or custom evaluation scores.
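
For the custom-metrics feature above, here is a minimal sketch of what such a metric hook could look like, assuming a Hugging Face Trainer-style compute_metrics callback and the evaluate library; the function name and wiring are illustrative only and are not the repository's actual API.

# Hypothetical illustration of a custom validation metric (not Libra's actual API).
# Assumes the Hugging Face `evaluate` library is installed.
import evaluate

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

def compute_metrics(decoded_preds, decoded_refs):
    """Score generated reports against references with BLEU and ROUGE-L."""
    bleu_score = bleu.compute(predictions=decoded_preds,
                              references=[[r] for r in decoded_refs])
    rouge_score = rouge.compute(predictions=decoded_preds,
                                references=decoded_refs)
    return {"BLEU4": bleu_score["bleu"], "ROUGE-L": rouge_score["rougeL"]}

# Toy usage with dummy strings
print(compute_metrics(["no acute cardiopulmonary process"],
                      ["no acute cardiopulmonary process seen"]))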

🔥 News

  • [24 Mar 2025] 🏆 Libra was invited to the ReXrank Challenge — a leading leaderboard for Chest X-ray Report Generation.
  • [11 Feb 2025] 🚨 libra-Llama-3.2-3B-Instruct has been released! A small MLLM 👏.
  • [10 Feb 2025] 🔥 The Libra repo now supports Mistral, Phi-3, and Gemma as LLMs, along with SigLip as the encoder! 🚀
  • [19 Jan 2025] ⚡ The online demo is available at Hugging Face Demo. Welcome to try it out!
  • [07 Jan 2025] 🗂️ The processed data is available at Data Download.
  • [20 Dec 2024] 🚨 Libra-v1.0-7b has been released!

Overview

Radiology report generation (RRG) requires integrating information from medical images acquired at different time points and producing accurate reports. Traditional methods often overlook this temporal information. We introduce Libra, a temporal-aware multimodal large language model (MLLM) for chest X-ray (CXR) report generation. Libra combines a radiology-specific image encoder with an MLLM and uses a Temporal Alignment Connector (TAC) to capture and synthesize temporal information. Experiments show that Libra sets new performance benchmarks on the MIMIC-CXR dataset for the RRG task.

Libra’s Architecture

[Figure: Libra’s architecture]
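
To make the temporal-fusion idea concrete, below is a deliberately simplified, hypothetical sketch of projecting current and prior image features into the LLM embedding space. It is purely conceptual and does not reproduce the actual Temporal Alignment Connector or Layerwise Feature Extractor; all layer sizes are placeholder assumptions.

# Toy conceptual sketch only (NOT the actual TAC architecture):
# fuse current and prior image features, then project them into the LLM space.
import torch
import torch.nn as nn

class ToyTemporalConnector(nn.Module):
    def __init__(self, vision_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.fuse = nn.Linear(2 * vision_dim, vision_dim)  # combine current + prior
        self.proj = nn.Linear(vision_dim, llm_dim)          # map into the LLM space

    def forward(self, current: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # current, prior: (batch, num_patches, vision_dim)
        fused = torch.cat([current, prior], dim=-1)
        return self.proj(torch.relu(self.fuse(fused)))

# Dummy usage with random features
x_cur = torch.randn(1, 196, 768)
x_prev = torch.randn(1, 196, 768)
print(ToyTemporalConnector()(x_cur, x_prev).shape)  # torch.Size([1, 196, 4096])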

Install

We strongly recommend that you create an environment from scratch as follows:

  1. Clone this repository and navigate to the Libra folder
git clone https://github.com/X-iZhang/Libra.git
cd Libra
  2. Install the package
conda create -n libra python=3.10 -y
conda activate libra
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages for training and evaluation
pip install -e ".[train,eval]"
pip install flash-attn --no-build-isolation

Upgrade to the latest code base
git pull
pip install -e .

Libra Weights

Version     Base LLM               Vision Encoder  Checkpoint
Libra v1.0  Meditron-7B            RAD-DINO        X-iZhang/libra-v1.0-7b
Libra v1.0  Llama-3.2-3B-Instruct  RAD-DINO        X-iZhang/libra-Llama-3.2-3B-Instruct


Libra-v1.0-7b achieves state-of-the-art performance on the MIMIC-CXR findings section generation task.

Quick Start

Gradio Web UI

Launch a local or online web demo by running:

python -m libra.serve.app
Specify your model:
python -m libra.serve.app --model-path /path/to/your/model

This launches the Gradio web interface; open it using the URL printed in the terminal. Both the default libra-v1.0 model and your own model appear in the model list, and you can switch between them.

[Figure: Gradio Web UI demo]

CLI Inference

We support running inference using the CLI. To use our model, run:

python -m libra.serve.cli \
    --model-path X-iZhang/libra-v1.0-7b \
    --image-file "./path/to/current_image.jpg" "./path/to/previous_image.jpg"
    # If there is no previous image, only one path is needed.

Script Inference

After installing this repository, you can use the libra_eval function in libra/eval/run_libra.py to run a model trained by us or by yourself, either on a local machine or in Google Colab.

from libra.eval import libra_eval

# Define the model path, which can be a pre-trained model or your own fine-tuned model.
model_path = "X-iZhang/libra-v1.0-7b"  # Or your own model

# Define the paths to the images. The second image is optional for temporal comparisons.
image_files = [
    "./path/to/current/image.jpg", 
    "./path/to/previous/image.jpg"  # Optional: Only include if a reference image is available
]

# Define the prompt to guide the model's response. Add clinical instructions if needed.
prompt = (
    "Provide a detailed description of the findings in the radiology image. "
    "Following clinical context: ..."
)

# Specify the conversational mode, matching the PROMPT_VERSION used during training.
conv_mode = "libra_v1"

# Call the libra_eval function.
libra_eval(
    model_path=model_path,
    image_file=image_files,
    query=prompt,
    temperature=0.9,
    top_p=0.8,
    conv_mode=conv_mode,
    max_new_tokens=512
)
Alternatively, you can use beam search to generate the output.
libra_eval(
    model_path=model_path,
    image_file=image_files,
    query=prompt,
    num_beams=5, 
    length_penalty=2,
    num_return_sequences=2,
    conv_mode=conv_mode,
    max_new_tokens=512
)
Additionally, you can directly use LoRA weights for inference.
libra_eval(
    model_path="./path/to/lora_weights",  # path to LoRA weights
    model_base="./path/to/base_model",  # path to base Libra model
    image_file=image_files,
    query=prompt,
    num_beams=5, 
    length_penalty=2,
    num_return_sequences=2,
    conv_mode=conv_mode,
    max_new_tokens=512
)

Dataset

Prepare Data

All the data we use comes from MIMIC-CXR and its two variants, and we strictly follow the official train/valid/test split.

  • Image Data

All images used for Libra come from the MIMIC-CXR-JPG dataset in .jpg format. DICOM format is also supported and can be found in MIMIC-CXR.

After downloading, the images are automatically organized into the following structure in ./path/to/playground/data (a small path-construction sketch follows this list):

./data/physionet.org/files/mimic-cxr-jpg/2.0.0
└──files
    ├── p10
    │   └── p10000032
    │       └── s50414267
    │           ├── image1.jpg
    │           └── image2.jpg
    ├── p11
    ├── p12
    ├── ...
    └── p19
  • Annotation Data

All annotations used for Libra come from MIMIC-CXR and its two variants. This includes the radiology reports and the related visual question answering data.

Please download the following datasets from the official website: mimic-cxr-reports.zip from MIMIC-CXR, MIMIC-Diff-VQA, and MIMIC-Ext-MIMIC-CXR-VQA.
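
As a quick orientation to the layout above, the sketch below shows how an image path can be assembled from MIMIC-CXR-JPG identifiers; the subject, study, and image IDs are placeholders taken from the example tree, and the root directory is assumed to match your download location.

# Sketch: resolve a MIMIC-CXR-JPG image path from its identifiers.
# The IDs below are placeholders; adjust the root to your own download location.
from pathlib import Path

MIMIC_ROOT = Path("./data/physionet.org/files/mimic-cxr-jpg/2.0.0/files")

def image_path(subject_id: str, study_id: str, image_id: str) -> Path:
    """Build files/pXX/p<subject_id>/s<study_id>/<image_id>.jpg."""
    return (MIMIC_ROOT
            / f"p{subject_id[:2]}"   # top-level group, e.g. p10
            / f"p{subject_id}"       # patient folder
            / f"s{study_id}"         # study folder
            / f"{image_id}.jpg")

# Example with placeholder IDs from the tree above
print(image_path("10000032", "50414267", "image1"))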

Preprocess Data

  • Radiology Report Sections

For free-text radiology reports, we extract the Findings, Impression, Indication, History, Comparison, and Technique sections using the official mimic-cxr repository (a simplified sketch of this kind of section splitting follows this list).

  • Visual Question Answering for Chest X-ray

In Medical-Diff-VQA, the main image is used as the current image, and the reference image is used as the prior image. In MIMIC-Ext-MIMIC-CXR-VQA, all cases use a dummy prior image.
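
Returning to the report sections above, here is a minimal regex-based sketch of that kind of section splitting. It is a simplified illustration, not the official mimic-cxr extraction code, and the headers it matches are assumptions based on typical MIMIC-CXR reports.

# Simplified illustration of report section extraction (not the official
# mimic-cxr code). Headers matched here are typical of MIMIC-CXR reports.
import re

SECTIONS = ["FINDINGS", "IMPRESSION", "INDICATION", "HISTORY", "COMPARISON", "TECHNIQUE"]
HEADER_RE = re.compile(r"^\s*(" + "|".join(SECTIONS) + r")\s*:", re.IGNORECASE | re.MULTILINE)

def split_sections(report_text: str) -> dict:
    """Return {section_name: text} for the headers found in a free-text report."""
    matches = list(HEADER_RE.finditer(report_text))
    out = {}
    for i, m in enumerate(matches):
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report_text)
        out[m.group(1).upper()] = report_text[start:end].strip()
    return out

# Example on a toy report
toy = "INDICATION: Cough.\nFINDINGS: No acute process.\nIMPRESSION: Normal chest."
print(split_sections(toy))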

Data Download

Alignment data files        Split  Size
libra_alignment_train.json  train  780 MiB
libra_alignment_valid.json  valid  79 MiB

Fine-Tuning data files             Split  Size
libra_findings_section_train.json  train  159 MiB
libra_findings_section_valid.json  valid  79 MiB

Evaluation data files              Split  Size
libra_findings_section_eval.jsonl  eval   2 MiB

Meanwhile, here are some bonus evaluation data files.

Evaluation data files                     Split  Size
libra_impressions_section_eval.jsonl      eval   1 MiB
libra_MIMIC-Ext-MIMIC-CXR-VQA_eval.jsonl  eval   4 MiB
libra_MIMIC-Diff-VQA_eval.jsonl           eval   20 MiB

If you want to train or evaluate your own tasks or datasets, please refer to Custom_Data.md.

Train

Libra adopts a two-stage training strategy: (1) visual feature alignment: the visual encoder and LLM weights are frozen, and only the Temporal Alignment Connector is trained; (2) RRG downstream task fine-tuning: LoRA is applied to fine-tune the pre-trained LLM on the Findings section generation task.

Libra is trained on a single A6000 GPU with 48 GB of memory. To train on multiple GPUs, set per_device_train_batch_size and gradient_accumulation_steps accordingly, always keeping the global batch size the same: per_device_train_batch_size x gradient_accumulation_steps x num_gpus.
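
As a quick sanity check of this relationship, the snippet below works through the arithmetic; the per-device batch size and accumulation steps shown are illustrative placeholders, not the values used in the released scripts.

# Illustrative check of the global batch size relationship (placeholder values,
# not the settings from the released training scripts).
def global_batch_size(per_device_train_batch_size: int,
                      gradient_accumulation_steps: int,
                      num_gpus: int) -> int:
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Single A6000: 16 = 4 x 4 x 1
assert global_batch_size(4, 4, 1) == 16
# Two GPUs, same global batch size: 16 = 4 x 2 x 2
assert global_batch_size(4, 2, 2) == 16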

Hyperparameters

We set reasonable hyperparameters based on our device. The hyperparameters used in both pretraining and LoRA finetuning are provided below.

  1. Pretraining

Model          Global Batch Size  Learning rate  Epochs  Max length  Weight decay
Libra-v1.0-7b  16                 2e-5           1       2048        0

  2. LoRA finetuning

Model          Global Batch Size  Learning rate  Epochs  Max length  Weight decay  LoRA rank  LoRA alpha
Libra-v1.0-7b  16                 2e-5           3       2048        0             128        256

Download Meditron checkpoints (automatically)

Our base LLM, Meditron-7B (adapted to the medical domain from Llama-2-7B), will be downloaded automatically when you run our provided training scripts. No action is needed on your part.

Stage 1: visual feature alignment

Pretraining takes approximately 385 hours for Libra-v1.0-7b-pretrain on a single A6000 GPU (48GB) due to device limitations.

For detailed training scripts and guidelines, please refer to pretrain.sh, or to pretrain_xformers.sh for the memory-efficient attention implemented in xFormers.

  • --mm_projector_type TAC: the Temporal Alignment Connector.
  • --vision_tower microsoft/rad-dino: RAD-DINO is a vision transformer for encoding chest X-rays using DINOv2.
  • --mm_vision_select_layer all: Use all image features from the encoder for the Layerwise Feature Extractor.
  • --tune_mm_mlp_adapter True
  • --freeze_mm_mlp_adapter False

Stage 2: RRG downstream task fine-tuning

You may download our pretrained projectors from the mm_tac_projector.bin file. Fine-tuning takes around 213 hours for Libra-v1.0-7b on a single A6000 GPU (48GB) due to device limitations.

For detailed training scripts and guidelines, please refer to: finetune_lora.sh.

  • --tune_mm_mlp_adapter False
  • --freeze_mm_mlp_adapter True

If you have enough GPU memory: Use finetune.sh to fine-tune the entire model. Alternatively, you can replace zero3.json with zero3_offload.json to offload some parameters to CPU RAM, though this will slow down the training speed.

If you are interested in continuing to fine-tune the Libra model on your own task or data, please check out Custom_Data.md.

New Options to Note

  • --mm_projector_type TAC: Specifies the Temporal Alignment Connector for Libra.
  • --vision_tower microsoft/rad-dino: Uses RAD-DINO as the chest X-ray encoder.
  • --mm_vision_select_layer all: Selects specific vision layers (e.g., -1, -2) or "all" for all layers.
  • --validation_data_path ./path/: Path to the validation data.
  • --compute_metrics True: Optionally computes metrics during validation. Note that this can consume significant memory. If GPU memory is insufficient, it is recommended to either disable this option or use a smaller validation dataset.

Evaluation

In Libra-v1.0, we evaluate models on the MIMIC-CXR test split for the findings section generation task. You can download the evaluation data here. To ensure reproducibility and output quality, we evaluate our model using the beam search strategy.

1. Generate Libra responses.

python -m libra.eval.eval_vqa_libra \
    --model-path X-iZhang/libra-v1.0-7b \
    --question-file libra_findings_section_eval.jsonl \
    --image-folder ./physionet.org/files/mimic-cxr-jpg/2.0.0 \
    --answers-file /path/to/answer-file.jsonl \
    --num_beams 10 \
    --length_penalty 2 \
    --num_return_sequences 3 \
    --max_new_tokens 1024 \
    --conv-mode libra_v1

You can evaluate Libra on your custom datasets by converting them to JSONL format and running eval_vqa_libra.py (see the sketch below).
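
As a rough illustration of that conversion, the snippet below writes one JSONL record per case. The field names follow a LLaVA-style convention and the two-image list mirrors the CLI example above; treat both as assumptions and confirm the exact schema in Custom_Data.md.

# Hypothetical sketch of converting a custom dataset to JSONL for evaluation.
# Field names are assumptions; check Custom_Data.md for the exact schema.
import json

cases = [
    {
        "question_id": "case_0001",
        "image": ["./path/to/current_image.jpg", "./path/to/previous_image.jpg"],
        "text": "Provide a detailed description of the findings in the radiology image.",
    },
]

with open("custom_eval.jsonl", "w") as f:
    for case in cases:
        f.write(json.dumps(case) + "\n")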

Additionally, you can execute the evaluation using the command line. For detailed instructions, see libra_eval.sh.

bash ./scripts/eval/libra_eval.sh beam

2. Evaluate the generated report.

In our case, you can directly use libra_findings_section_eval.jsonl and answer-file.jsonl for basic evaluation, using radiology_report.py.

from libra.eval import evaluate_report

references = "libra_findings_section_eval.jsonl"
predictions = "answer-file.jsonl"

results = evaluate_report(references=references, predictions=predictions)

# Evaluation scores
results
{'BLEU1': 51.25,
 'BLEU2': 37.48,
 'BLEU3': 29.56,
 'BLEU4': 24.54,
 'METEOR': 48.90,
 'ROUGE-L': 36.66,
 'Bert_score': 62.50,
 'Temporal_entity_score': 35.34}

Or use the command line to evaluate multiple references and store the results in a .csv file. For detailed instructions, see get_eval_scores.sh.

bash ./scripts/eval/get_eval_scores.sh

Metrics

  • Temporal Entity F1

The $F1_{temp}$ score is computed over common radiology-related keywords associated with temporal change. You can use temporal_f1.py as follows:

from libra.eval import temporal_f1_score

predictions = [
    "The pleural effusion has progressively worsened since previous scan.",
    "The pleural effusion is noted again on the current scan."
]
references = [
    "Compare with prior scan, pleural effusion has worsened.",
    "Pleural effusion has worsened."
]

tem_f1_score = temporal_f1_score(
    predictions=predictions,
    references=references
)

# Temporal Entity F1 score
tem_f1_score
{'f1': 0.500000000075,
 'prediction_entities': [{'worsened'}, set()],
 'reference_entities': [{'worsened'}, {'worsened'}]}
  • Radiology-specific Metrics

Some radiology-specific metrics require package configurations that may conflict with Libra's environment. We recommend following the official guidelines and using separate environments for evaluation: RG_ER, CheXpert-F1, RadGraph-F1, RadCliQ, CheXbert vector.

Acknowledgements 🙏

We sincerely thank the following projects for their contributions to Libra:

  • LLaVA: A Large Language and Vision Assistant, laying the groundwork for multimodal understanding.
  • FastChat: An Open Platform for Training, Serving, and Evaluating Large Language Model based Chatbots.
  • LLaMA: Open and efficient foundation language models that inspired our core language processing capabilities.
  • MEDITRON: Open and efficient medical Large language models.
  • RAD-DINO: An open and efficient biomedical image encoder, enabling robust radiological analysis.

Citation ✒️

If you find our paper and code useful in your research and applications, please cite using this BibTeX:

@misc{zhang2024libraleveragingtemporalimages,
      title={Libra: Leveraging Temporal Images for Biomedical Radiology Analysis}, 
      author={Xi Zhang and Zaiqiao Meng and Jake Lever and Edmond S. L. Ho},
      year={2024},
      eprint={2411.19378},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.19378}, 
}

Intended Use 🧰

Libra is primarily designed to assist clinical practitioners, researchers, and medical students in generating chest X-ray reports. Key applications include:

  • Clinical Decision Support: Providing draft findings that can be refined by a radiologist.
  • Educational Tool: Demonstrating example interpretations and temporal changes for training radiology residents.
  • Research: Facilitating studies on automated report generation and temporal feature learning in medical imaging.

Important: Outputs should be reviewed by qualified radiologists or medical professionals before final clinical decisions are made.

Limitations and Recommendations

  1. Data Bias: The model’s performance may be less reliable for underrepresented demographics or rare pathologies.
  2. Clinical Oversight: Always involve a medical professional to verify the results; Libra is not a substitute for professional judgment.
  3. Temporal Inaccuracies: Despite TAC’s focus on temporal alignment, subtle or uncommon changes may go unrecognized.
  4. Generalization: Libra’s performance on chest X-ray types or conditions not seen during training may be limited.

Ethical Considerations

  • Patient Privacy: Ensure the data is fully de-identified and compliant with HIPAA/GDPR (or relevant privacy regulations).
  • Responsible Use: Deploy Libra’s outputs carefully; they are not guaranteed to be error-free.
  • Accountability: Users and organizations must assume responsibility for verifying clinical accuracy and safety.

Disclaimer

This tool is for research and educational purposes only. It is not FDA-approved or CE-marked for clinical use. Users should consult qualified healthcare professionals for any clinical decisions.