| Section | Description |
|---|---|
| 📂 IU-Xray Dataset | Evaluation, Perturbation Experiments, Visualization, and Results Compilation |

First, generate radiology reports using each of the following RRG models:
- MAIRA2
- Chexpertplus trained on Chexpertplus + MIMIC data
- Chexpertplus trained on MIMIC data
📌 Clone the repository, navigate into it, and create the conda environment:

    git clone https://github.com/nyuolab/RRGEval.git
    cd RRGEval
    conda env create -f environment.yml

📌 Next, update the `.env` file with your API credentials before running any scripts:

    # In the .env file, set the following:
    RRGEVAL_API_KEY="your_api_key_here"
    RRGEVAL_API_URL="your_api_url_here"

📌 Evaluate reports generated from different RRG models:

    sbatch scripts/iuxray_data/maira2.sh
    sbatch scripts/iuxray_data/chexpertplus_mimic.sh
    sbatch scripts/iuxray_data/chexpertplus_chexpertplus_mimic.sh

📌 Modify the following variables in each script as needed:

- `EVAL_SEED`
- `MODEL_SEED`
- `INPUT_CSV` (path to the output file containing the reports generated by the RRG model)
- `OUTPUT_DIR` (path where results are stored)
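A quick pre-flight check can catch a missing credential or input file before a job is queued. The sketch below is a hypothetical helper, not part of the repository: `preflight_check` is an assumed name, and it assumes that `.env` holds plain shell-style `KEY="value"` assignments and that you pass the same path you set as `INPUT_CSV`.

```shell
# Hypothetical helper; not part of the RRGEval repository.
# Usage: preflight_check <env_file> <input_csv>
# Runs in a subshell so sourcing the env file does not touch the caller's environment.
preflight_check() (
    env_file="$1"
    input_csv="$2"
    status=0

    # Make sure "." reads the given file rather than searching PATH.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac

    # Ignore values inherited from the caller; only the env file counts here.
    unset RRGEVAL_API_KEY RRGEVAL_API_URL

    if [ -f "$env_file" ]; then
        # Assumption: the file contains shell-compatible assignments.
        set -a
        . "$env_file"
        set +a
    else
        echo "missing env file: $env_file" >&2
        status=1
    fi

    for var in RRGEVAL_API_KEY RRGEVAL_API_URL; do
        eval "value=\${$var:-}"
        case "$value" in
            ""|your_api_key_here|your_api_url_here)
                echo "$var is unset or still a placeholder" >&2
                status=1
                ;;
        esac
    done

    if [ ! -s "$input_csv" ]; then
        echo "INPUT_CSV not found or empty: $input_csv" >&2
        status=1
    fi

    exit "$status"
)
```

For example, `preflight_check .env path/to/generated_reports.csv && sbatch scripts/iuxray_data/maira2.sh` would submit the job only when the check passes; the CSV path here is a placeholder.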
📌 Results Structure
The results are stored in ${OUTPUT_DIR}/shuffled_ans_choices_data/. Within this directory:
- `gen_reports_as_ref/` and `gt_reports_as_ref/`: Contain all ICARE_GEN and ICARE_GT evaluation results. Each of these directories includes a `mcqa_eval/` subdirectory with the complete set of evaluation scores.
- `mcq_eval_dataset_level_agreement_stats.csv`: Contains dataset-level agreement scores.
- `mcq_eval_report_level_stats.csv`: Contains agreement scores for individual reports.
- `mcq_eval_report_level_stats_aggregated.csv`: Provides aggregated report-level results across the dataset.
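Once a job finishes, the layout described above can be checked from the shell. This is a minimal sketch: `check_results` is a hypothetical helper, not a repository script. Because the exact location of each CSV inside the results tree is not spelled out here, it searches the whole tree with `find` rather than assuming a fixed path.

```shell
# Hypothetical helper; not part of the RRGEval repository.
# Usage: check_results "$OUTPUT_DIR"
check_results() (
    root="$1/shuffled_ans_choices_data"
    status=0

    # Both reference settings should have an mcqa_eval/ subdirectory.
    for dir in gen_reports_as_ref gt_reports_as_ref; do
        if [ ! -d "$root/$dir/mcqa_eval" ]; then
            echo "missing: $root/$dir/mcqa_eval" >&2
            status=1
        fi
    done

    # Look for each summary CSV anywhere under the results root
    # (its exact directory is an assumption left open on purpose).
    for csv in mcq_eval_dataset_level_agreement_stats.csv \
               mcq_eval_report_level_stats.csv \
               mcq_eval_report_level_stats_aggregated.csv; do
        found=$(find "$root" -type f -name "$csv" 2>/dev/null | head -n 1)
        if [ -z "$found" ]; then
            echo "missing: $csv" >&2
            status=1
        else
            echo "found: $found"
        fi
    done

    exit "$status"
)
```

The function returns a non-zero status when anything is missing, so it can gate later steps such as plotting or results compilation.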
Question Categorization and Analysis: follow the steps in the README in `src/question_categorization_and_analysis/`.
Evaluate our approach under word-level perturbation of reports generated from different RRG models:

    sbatch scripts/iuxray_data/maira2_perturbed_word_level.sh
    sbatch scripts/iuxray_data/chexpertplus_mimic_perturbed_word_level.sh
    sbatch scripts/iuxray_data/chexpertplus_chexpertplus_mimic_perturbed_word_level.sh

Evaluate our approach using the `*_perturbed.sh` perturbation scripts on reports generated from different RRG models:

    sbatch scripts/iuxray_data/maira2_perturbed.sh
    sbatch scripts/iuxray_data/chexpertplus_mimic_perturbed.sh
    sbatch scripts/iuxray_data/chexpertplus_chexpertplus_mimic_perturbed.sh

📌 Modify the following variables in each script as needed:

- `EVAL_SEED`
- `MODEL_SEED`
- `INPUT_CSV` (path to the output from the RRG model)
- `OUTPUT_DIR` (path where results are stored)
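The six perturbation jobs above follow one naming pattern, so they can be submitted in a loop. The sketch below is a convenience wrapper under stated assumptions: `submit_perturbation_jobs` and the `DRY_RUN` switch are hypothetical names, while the script paths are the ones listed above. It also assumes the variables inside each script have already been edited.

```shell
# Hypothetical helper; not part of the RRGEval repository.
# Usage: submit_perturbation_jobs [script_dir]
# Set DRY_RUN=1 to print the sbatch commands instead of submitting them.
submit_perturbation_jobs() (
    script_dir="${1:-scripts/iuxray_data}"

    for model in maira2 chexpertplus_mimic chexpertplus_chexpertplus_mimic; do
        for suffix in perturbed_word_level perturbed; do
            script="$script_dir/${model}_${suffix}.sh"
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "sbatch $script"
            elif [ -f "$script" ]; then
                sbatch "$script"
            else
                echo "skipping missing script: $script" >&2
            fi
        done
    done
)
```

Running `DRY_RUN=1 submit_perturbation_jobs` first prints the commands that would be submitted, which makes it easy to confirm the script names before queuing anything.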
To generate plots showing agreement percentage as a function of perturbation intensity:

    sbatch scripts/iuxray_data/plot_agreement_with_perturbation_stats.sh

📂 Results will be stored in:

- `INPUT_DIR/plots/perturbation_char_level`
- `INPUT_DIR/plots/perturbation_word_level`
Run the following notebook to compile all results:

    jupyter notebook src/results_compilation.ipynb