At the time of writing, pydicom requires the gdcm package to read and decompress the provided DICOM files. The easiest way to install it is directly through conda:
conda install conda-forge::gdcm
See https://github.com/conda-forge/gdcm-feedstock for repo documentation
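After installing, it can help to confirm that both packages are visible from the active environment before running any script. The following is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical, and it assumes the GDCM Python bindings are importable under the name `gdcm`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: the GDCM bindings install a top-level module named "gdcm".
    for name in ("pydicom", "gdcm"):
        status = "found" if module_available(name) else "MISSING"
        print(f"{name}: {status}")
```

If `gdcm` is reported as missing, the conda environment is probably not activated or the install step above did not complete.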
Activate the conda environment and install dependencies:
conda activate screws_env
pip install -r requirements.txt
scripts/extract_DICOM_data.py explores DICOM files and extracts their metadata as human-readable JSON. It is useful for understanding DICOM structure and verifying metadata fields.
python scripts/extract_DICOM_data.py \
--patient-dir /path/to/DICOM/study_type/patient_id \
--depth 2
scripts/match_dicom_metadata_to_images.py matches DICOM metadata to annotated JPG images using image similarity (SSIM). This is the primary script for pairing metadata with extracted images when the original extraction order is unknown.
Key features:
- Uses Structural Similarity Index Measure (SSIM) to compare image content
- Handles count mismatches (e.g., 5 DICOMs to 4 JPGs)
- Configurable confidence threshold (default: 0.5)
- Creates placeholder JSON files for low-confidence or failed matches
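To illustrate the idea behind these features, here is a rough sketch of similarity-based matching with a confidence threshold. It is not the script's actual implementation: it uses a simplified single-window SSIM in pure Python (libraries such as scikit-image provide a windowed version), assumes every image is already a flat list of grayscale pixels of the same size, and the names `global_ssim` and `match_images` are hypothetical.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equal-length flat pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def match_images(dicoms, jpgs, threshold=0.5):
    """Greedily pair each JPG with its most similar unused DICOM.

    dicoms, jpgs: dicts mapping a name to a flat pixel list.
    Returns {jpg_name: (dicom_name, score)}; a JPG with no match at or
    above the threshold maps to (None, None), standing in for a placeholder.
    """
    scored = sorted(
        (
            (global_ssim(d_pixels, j_pixels), d_name, j_name)
            for d_name, d_pixels in dicoms.items()
            for j_name, j_pixels in jpgs.items()
        ),
        reverse=True,
    )
    result = {}
    used_dicoms = set()
    for score, d_name, j_name in scored:
        if score < threshold:
            break  # pairs are sorted, so every remaining score is lower
        if j_name in result or d_name in used_dicoms:
            continue
        result[j_name] = (d_name, score)
        used_dicoms.add(d_name)
    for j_name in jpgs:
        result.setdefault(j_name, (None, None))
    return result


if __name__ == "__main__":
    # Made-up 4-pixel "images": three DICOMs but only two JPGs.
    dicoms = {"d1": [0, 50, 100, 150], "d2": [200, 10, 200, 10], "d3": [255, 255, 0, 0]}
    jpgs = {"j1": [0, 52, 98, 150], "j2": [255, 250, 5, 0]}
    for jpg, (dicom, score) in match_images(dicoms, jpgs).items():
        print(jpg, "->", dicom, score)
```

Because each DICOM is used at most once, a surplus DICOM simply stays unmatched, which is one way a count mismatch can be tolerated.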
Dry-run mode (recommended first):
python scripts/match_dicom_metadata_to_images.py \
--images-dir /path/to/full_images_with_masks_batch_1 \
--dicom-dir /path/to/DICOM \
--dry-run
Actual run:
python scripts/match_dicom_metadata_to_images.py \
--images-dir /path/to/full_images_with_masks_batch_1 \
--dicom-dir /path/to/DICOM
Debug mode (detailed similarity scores):
python scripts/match_dicom_metadata_to_images.py \
--images-dir /path/to/full_images_with_masks_batch_1 \
--dicom-dir /path/to/DICOM \
--debug \
--dry-run
Custom confidence threshold:
python scripts/match_dicom_metadata_to_images.py \
--images-dir /path/to/full_images_with_masks_batch_1 \
--dicom-dir /path/to/DICOM \
--confidence-threshold 0.7
scripts/create_metadata_summary_csv.py generates a CSV summary report of all images and their metadata matching status. Run it after match_dicom_metadata_to_images.py to review the results.
Output columns:
- manufacturer, patient_number, filename, relative_file_path
- view_position (AP, LATERAL, etc.)
- similarity_score (0.0-1.0 confidence)
- error (yes/no)
- error_type (no_match, low_confidence, dicom_not_found, etc.)
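With these columns, the report can also be triaged programmatically rather than in a spreadsheet. The sketch below is only an illustration: it assumes the header names match the list above exactly, the sample rows are invented, and the function name `rows_needing_review` is hypothetical.

```python
import csv
import io

# Invented sample data using the column names listed above.
SAMPLE_CSV = (
    "manufacturer,patient_number,filename,relative_file_path,"
    "view_position,similarity_score,error,error_type\n"
    "AcmeCo,001,img_a.jpg,AcmeCo/001/img_a.jpg,AP,0.92,no,\n"
    "AcmeCo,002,img_b.jpg,AcmeCo/002/img_b.jpg,LATERAL,0.41,yes,low_confidence\n"
    "AcmeCo,003,img_c.jpg,AcmeCo/003/img_c.jpg,,,yes,no_match\n"
)


def rows_needing_review(csv_file):
    """Return rows with error == "yes", lowest similarity_score first."""
    reader = csv.DictReader(csv_file)
    flagged = [
        row for row in reader
        if (row.get("error") or "").strip().lower() == "yes"
    ]

    def score(row):
        # A blank or unparsable score sorts first, as the least trustworthy.
        try:
            return float(row.get("similarity_score") or 0.0)
        except ValueError:
            return 0.0

    return sorted(flagged, key=score)


if __name__ == "__main__":
    for row in rows_needing_review(io.StringIO(SAMPLE_CSV)):
        print(row["filename"], row["similarity_score"], row["error_type"])
```

For a real report, pass an open file object instead of the in-memory sample, for example `open("metadata_summary.csv", newline="")`.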
Usage:
python scripts/create_metadata_summary_csv.py \
--images-dir /path/to/full_images_with_masks_batch_1
Custom output filename:
python scripts/create_metadata_summary_csv.py \
--images-dir /path/to/full_images_with_masks_batch_1 \
--output metadata_report.csv
- Explore DICOM structure (optional):
  python scripts/extract_DICOM_data.py --patient-dir /path/to/DICOM/study/patient --depth 2
- Match metadata to images:
  # First, run in dry-run mode to preview
  python scripts/match_dicom_metadata_to_images.py \
  --images-dir /path/to/full_images_with_masks_batch_1 \
  --dicom-dir /path/to/DICOM \
  --dry-run
  # Then run actual matching to create JSON files
  python scripts/match_dicom_metadata_to_images.py \
  --images-dir /path/to/full_images_with_masks_batch_1 \
  --dicom-dir /path/to/DICOM
- Generate summary report:
  python scripts/create_metadata_summary_csv.py \
  --images-dir /path/to/full_images_with_masks_batch_1
- Review results:
  - Open metadata_summary.csv in Excel/Google Sheets
  - Filter by error="yes" to find images needing manual review
  - Sort by similarity_score to prioritize low-confidence matches
- Open