adding code for the modified chexpert-labeler and explanations of how it was used
ricbl · Apr 6, 2022 · 9824bf9
1 parent ceb39c8
Showing 4 changed files with 211 additions and 9 deletions.
diff --git a/.gitignore b/.gitignore
@@ -7,7 +7,6 @@ data_viewer
 *.EDF
 *.mat
 *.m~
-*.csv
 mimic-sample
 *.json
 *.pkl

diff --git a/README.md b/README.md
@@ -6,7 +6,7 @@ This code was used to collect, process, and validate the REFLACX (Reports and Ey
 
 The code is organized in 4 folders. `pre_processing_or_sampling_or_ibm_training`, `interface_src`, and `post_processing_and_dataset_generation` are provided to show how our dataset was collected, and their scripts might need changes to hard-coded code to adapt it to different needs of data collection.
 `examples_and_paper_numbers` is provided to show how to get the numbers used to validate the publicly available dataset and a few examples of how to use it.
-All scripts have to be run from inside their respective folders.
+All scripts have to be run from inside their respective folders and, unless instructed otherwise, should be run in a Python environment satisfying the requirements listed below.
 
 Below we provide a short description of each folder and the recommended order for running scripts. The provided paths are relative to each of the folders.
 
@@ -104,7 +104,10 @@ To calculate the agreement between radiologists in terms of manual labeling, run
 
 To get the statistics for the `image_all_paths.txt` images, run `get_filtered_mimic_statistics.py` and `get_sex_statistics_datasets.py`
 
-To get the temporal correlation graph (Figure 4), run `generate_temporal_correlation_numbers.py`, followed by `draw_graph_temporal_correlation.py`
+To produce the temporal correlation graph (Figure 4), run `generate_temporal_correlation_numbers.py`, followed by `draw_graph_temporal_correlation.py`. To generate the graph, we provide the file `manually_labeled_reports_3.csv`, which contains the manually labeled locations of abnormality mentions in the reports. The modified chexpert-labeler, provided in the folder `chexpert-labeler`, was used to speed up manual labeling as follows:
+- generate the labels of the reports from the REFLACX dataset using the modified chexpert-labeler, running `extract_report.py`;
+- follow the instructions in `chexpert-labeler/README.md` to create a new Python environment and run `python chexpert-labeler/label.py --reports_path=phase_3.csv --output_path=labeled_reports_3.csv`;
+- labels from 200 random cases in the generated `labeled_reports_3.csv` were manually corrected to match the contents of the reports.
 
 The tables used to calculate some of the numbers shown in the paper are in `tables_calculations/`. These tables were modified from the csv files generated by other scripts.
 
@@ -121,7 +124,6 @@ The following scripts need additional data not provided with the public dataset:
 - `create_calibration_table.py`, which depends on the output from `../post_processing_and_dataset_generation/ASC2MAT.py` and was used to calculate the average and maximum error values for the calibrations.
 - `edit_video.py`, used to generate a video showing interface use through all screens of a case, with all portions recorded by the MATLAB interface.
 
-
 ## Requirements
 
 ### Python
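
The README change above says that labels from 200 random cases in `labeled_reports_3.csv` were manually corrected, but it does not say how those cases were drawn. The following is a minimal sketch of one way to draw such a subset reproducibly, using only the standard library. The helper names `sample_cases` and `sample_csv`, the fixed seed, and the output file name in the usage note are assumptions made for illustration and are not part of the commit.

```python
import csv
import random


def sample_cases(rows, n=200, seed=0):
    """Return up to n rows drawn uniformly at random, without replacement.

    The seed is an assumption; it only makes the draw repeatable.
    """
    rng = random.Random(seed)
    if n >= len(rows):
        return list(rows)
    return rng.sample(rows, n)


def sample_csv(in_path, out_path, n=200, seed=0):
    """Copy a random subset of the rows of in_path to out_path.

    Returns the number of rows written (excluding the header).
    """
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    picked = sample_cases(rows, n=n, seed=seed)

    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(picked)
    return len(picked)
```

A call such as `sample_csv("labeled_reports_3.csv", "cases_to_review.csv")` would write the subset to a separate file for manual review; `cases_to_review.csv` is a hypothetical name.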

diff --git a/examples_and_paper_numbers/chexpert-labeler/args/arg_parser.py b/examples_and_paper_numbers/chexpert-labeler/args/arg_parser.py
@@ -20,23 +20,23 @@ def __init__(self):
 
         # Phrases
         parser.add_argument('--mention_phrases_dir',
-                            default='./src/chexpert-labeler/phrases/mention',
+                            default='./chexpert-labeler/phrases/mention',
                             help='Directory containing mention phrases for ' +
                                  'each observation.')
         parser.add_argument('--unmention_phrases_dir',
-                            default='./src/chexpert-labeler/phrases/unmention',
+                            default='./chexpert-labeler/phrases/unmention',
                             help='Directory containing unmention phrases ' +
                                  'for each observation.')
 
         # Rules
         parser.add_argument('--pre_negation_uncertainty_path',
-                            default='./src/chexpert-labeler/patterns/pre_negation_uncertainty.txt',
+                            default='./chexpert-labeler/patterns/pre_negation_uncertainty.txt',
                             help='Path to pre-negation uncertainty rules.')
         parser.add_argument('--negation_path',
-                            default='./src/chexpert-labeler/patterns/negation.txt',
+                            default='./chexpert-labeler/patterns/negation.txt',
                             help='Path to negation rules.')
         parser.add_argument('--post_negation_uncertainty_path',
-                            default='./src/chexpert-labeler/patterns/post_negation_uncertainty.txt',
+                            default='./chexpert-labeler/patterns/post_negation_uncertainty.txt',
                             help='Path to post-negation uncertainty rules.')
 
         # Output parameters.
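
The new defaults in `arg_parser.py` are relative paths, so they resolve against the current working directory rather than the location of `arg_parser.py`. This is consistent with the README command `python chexpert-labeler/label.py ...`, which appears to assume the working directory is `examples_and_paper_numbers`. The sketch below illustrates that behavior with one of the changed arguments; `build_parser` and `resolve_default` are hypothetical helpers written for this illustration, not functions from the labeler.

```python
import argparse
import os


def build_parser():
    """Build a parser with one of the defaults introduced in this commit."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--negation_path',
                        default='./chexpert-labeler/patterns/negation.txt',
                        help='Path to negation rules.')
    return parser


def resolve_default(cwd):
    """Show where the default path points when the script runs from cwd."""
    args = build_parser().parse_args([])
    # A relative path is interpreted relative to the working directory.
    return os.path.normpath(os.path.join(cwd, args.negation_path))
```

Under this assumption, running from any other folder would make the defaults point at files that do not exist, unless the paths are passed explicitly on the command line.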