We present the results of the ISMRM 2020 joint Reproducible Research and Quantitative MR study groups reproducibility challenge on T1 mapping in phantom and human brain. T1 mapping, a widely used quantitative MRI technique, exhibits inconsistent tissue-specific values across protocols, sites, and vendors. The challenge aimed to assess the reproducibility of a well-established inversion recovery T1 mapping technique, with acquisition details published solely as a PDF, on a standardized phantom and in human brains. Participants acquired T1 mapping data on MRI scanners from three manufacturers at 3T, resulting in 39 phantom datasets and 56 datasets from healthy human subjects. The inter-submission variability of T1 was twice as high as the intra-submission variability in both phantoms and human brains, indicating that the acquisition details in the selected paper were insufficient to reproduce a quantitative MRI protocol. This study reports the inherent uncertainty in T1 measures across independent research groups, bringing us one step closer to a practical clinical baseline of T1 variations in vivo. This challenge resulted in the creation of a comprehensive open database of T1 mapping acquisitions, accessible at https://osf.io/ywc9g/.
Figure 2. Dashboard. (a) Welcome page listing all the sites, subject types, and scanners, and the relationships between the three. Row (b) shows two of the phantom dashboard tabs, and row (c) shows two of the human data dashboard tabs. Link: https://rrsg2020.dashboards.neurolibre.org
The conception of this collaborative reproducibility challenge originated from discussions with experts, including Paul Tofts, Joëlle Barral, and Ilana Leppert, who provided valuable insights. Additionally, Kathryn Keenan, Zydrunas Gimbutas, and Andrew Dienstfrey from NIST provided their code to generate the ROI template for the ISMRM/NIST phantom. Dylan Roskams-Edris and Gabriel Pelletier from the Tanenbaum Open Science Institute (TOSI) offered valuable insights and guidance related to data ethics and data sharing in the context of this international multi-center conference challenge. The 2020 RRSG study group committee members who launched the challenge, Martin Uecker, Florian Knoll, Nikola Stikov, Maria Eugenia Caligiuri, and Daniel Gallichan, as well as the 2020 qMRSG committee members, Kathryn Keenan, Diego Hernando, Xavier Golay, Annie Yuxin Zhang, and Jeff Gunter, also played an essential role in making this challenge possible. We would also like to thank the Canadian Open Neuroscience Platform (CONP), the Quebec Bioimaging Network (QBIN), and the Montreal Heart Institute Foundation for their support in creating the NeuroLibre preprint. Finally, we extend our thanks to all the volunteers and individuals who helped with the scanning at each imaging site.
The authors thank the ISMRM Reproducible Research Study Group for conducting a code review of the code (Version 1) supplied in the Data Availability Statement. The scope of the code review covered only the code's ease of download, quality of documentation, and ability to run, but did not consider scientific accuracy or code efficiency.
Lastly, we acknowledge use of ChatGPT (v3), a generative language model, for accelerating manuscript preparation. The co-first authors employed ChatGPT in the initial draft for transforming bullet point sentences into paragraphs, proofreading for typos, and refining the academic tone. ChatGPT served exclusively as a writing aid, and was not used to create or interpret results.
The following section in this document repeats the narrative content exactly as found in the original publication.
Significant challenges exist in the reproducibility of quantitative MRI (qMRI).
Among fundamental MRI parameters, T1 holds significant importance.
In ongoing efforts to standardize T1 mapping methods, researchers have been actively developing quantitative MRI phantoms.
This motivated the 2020 ISMRM reproducibility challenge.
The challenge asked researchers with access to the ISMRM/NIST system phantom to acquire T1 maps of the phantom.
Researchers were also instructed to collect T1 maps in healthy human brains, and were asked to measure a single slice positioned parallel to the anterior commissure - posterior commissure (AC-PC) line. Prior to imaging, the subjects gave their informed consent.
Researchers followed the inversion recovery T1 mapping protocol optimized for the human brain as described in the paper published by Barral et al.
Data submissions for the challenge were handled through a GitHub repository.
Figure 1. A snapshot of the figures (top row) included in the reproducible preprint (https://preprint.neurolibre.org/10.55458/neurolibre.00023) and the dashboard (bottom row, https://rrsg2020.db.neurolibre.org).
A reduced-dimension non-linear least squares (RD-NLS) approach was used to fit the complex general inversion recovery signal equation:

$$S(TI) = a + b\,e^{-TI/T_1}$$

where $a$ and $b$ are complex constants. This approach was developed by Barral et al.
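As an illustration, the signal model above can be fitted with a generic non-linear least squares routine. The sketch below uses SciPy's `curve_fit` on synthetic real-valued data with hypothetical parameter values; the actual pipeline uses the RD-NLS reduction of Barral et al., which restricts the search to the T1 dimension and handles complex data.

```python
import numpy as np
from scipy.optimize import curve_fit

def ir_model(ti, a, b, t1):
    # General inversion recovery signal: S(TI) = a + b * exp(-TI / T1)
    return a + b * np.exp(-ti / t1)

# Synthetic, noiseless example with hypothetical values:
# a = 1000, b = -2000, T1 = 850 ms
ti = np.array([50.0, 400.0, 1100.0, 2500.0])   # inversion times (ms)
signal = ir_model(ti, 1000.0, -2000.0, 850.0)

# Generic non-linear least squares fit (illustrative, not RD-NLS)
popt, _ = curve_fit(ir_model, ti, signal, p0=[500.0, -1000.0, 500.0])
a_fit, b_fit, t1_fit = popt
```

On real magnitude data the polarity of the signal before the null must also be restored, which is one of the reasons the RD-NLS formulation is preferred in practice.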
A data processing pipeline was written using MATLAB/Octave in a Jupyter Notebook. This pipeline downloads every dataset from OSF.io.
The T1 plate (NiCl2 array) of the phantom has 14 spheres that were labeled as regions of interest (ROIs) using a numerical mask template created in MATLAB, provided by NIST researchers (Figure 1-c). To avoid potential edge effects in the T1 maps, the ROI labels were reduced to 60% of the expected sphere diameter. A registration pipeline in Python using the Advanced Normalization Tools (ANTs) was used to align the template with each phantom dataset.
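The 60% shrinking step can be sketched as a circular mask whose radius is scaled down before sampling the T1 map. This is a minimal NumPy illustration with hypothetical geometry; the actual template is the NIST-provided MATLAB mask.

```python
import numpy as np

def circular_roi(shape, center, diameter_px, scale=0.6):
    """Boolean mask of a circle shrunk to `scale` of the nominal diameter.

    Shrinking the ROI (here to 60%) avoids partial-volume and edge
    effects at the sphere boundaries in the T1 maps.
    """
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    radius = 0.5 * diameter_px * scale
    return (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2

# Hypothetical 128x128 slice with a 20-pixel sphere centered at (64, 64)
mask = circular_roi((128, 128), center=(64, 64), diameter_px=20)
# mean T1 in the ROI would then be t1_map[mask].mean()
```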
For human data, manual ROIs were segmented by a single researcher (M.B., 11+ years of neuroimaging experience) using FSLeyes.
Analysis code and scripts were developed and shared in a version-controlled public GitHub repository.
The mean T1 values of the ISMRM/NIST phantom data for each ROI were compared with temperature-corrected reference values and visualized in three different types of plots (linear axes, log-log axes, and error relative to the reference value). Temperature correction involved nonlinear interpolation of the T1 vs. temperature reference tables provided by the phantom manufacturer.
To widely disseminate the challenge results, a web-based dashboard was developed (Figure 2, https://rrsg2020.dashboards.neurolibre.org). The landing page (Figure 2-a) showcases the relationship between the phantom and brain datasets acquired at different sites/vendors. Selecting the Phantom or In Vivo icons and then clicking an ROI will display whisker plots for that region. Additional sections of the dashboard allow for displaying statistical summaries for both sets of data, a magnitude vs. complex data fitting comparison, and hierarchical shift function analyses.
Figure 3 presents a comprehensive overview of the challenge results through violin plots, depicting inter- and intra-submission comparisons in both phantom (a) and human (b) datasets. For the phantom (Figure 3-a), the average inter-submission COV for the first five spheres, which span the expected range of T1 values in the human brain (approximately 500 to 2000 ms), was 6.1%. After excluding outliers from two sites with known issues for sphere 4 (signal null near a TI), the mean inter-submission COV was reduced to 4.1%. One participant (submission 6, Figure 1) measured T1 maps using a consistent protocol at 7 different sites; the mean intra-submission COV across the first five spheres for this submission was 2.9%.
For the human datasets (Figure 3-b), inter-submission COVs for independently-implemented imaging protocols were 5.9% for the genu, 10.6% for the splenium, 16% for cortical GM, and 22% for deep GM. One participant (submission 18, Figure 1) measured a large dataset (13 individuals) on three scanners from two vendors; the intra-submission COVs for this submission were 3.2% for the genu, 3.1% for the splenium, 6.9% for cortical GM, and 7.1% for deep GM. The bimodal appearance for the splenium, deep GM, and cortical GM for the sites used in the inter-site analyses (green) can be explained by an outlier measurement, which can be seen in Figure 4 e-f (site 3.001).
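For reference, the coefficient of variation (COV) reported above is the standard deviation of per-submission mean T1 values relative to their mean. A minimal sketch with hypothetical per-submission values:

```python
import numpy as np

def cov_percent(values):
    # Coefficient of variation: sample standard deviation
    # relative to the mean, expressed in percent
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

# Hypothetical per-submission mean T1 values (ms) for one ROI
genu_t1 = [810.0, 845.0, 790.0, 860.0, 835.0]
inter_submission_cov = cov_percent(genu_t1)
```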
A scatterplot of the T1 data for all submissions and their ROIs is shown in Figure 4 (phantom: a-c; human brain: d-f). The NIST phantom T1 measurements are presented with three axis types (linear, log, and error) to better visualize the results. Figure 4-a shows good agreement between this dataset and the temperature-corrected reference T1 values. However, this trend did not persist for low T1 values (T1 < 100-200 ms), as seen in the log-log plot (Figure 4-b), which was expected because the imaging protocol is optimized for human brain T1 values (T1 > 500 ms). Higher variability is seen at long T1 values (T1 ~ 2000 ms) in Figure 4-a. Errors exceeding 10% are observed in the phantom spheres with T1 values below 300 ms (Figure 4-c), and 3-4 measurements with errors exceeding 10% were observed in the human brain tissue range (~500-2000 ms).
Figure 4 d-f displays the scatter plot data for the human datasets submitted to this challenge, showing mean and standard deviation T1 values for the WM (genu and splenium) and GM (cerebral cortex and deep GM) ROIs. Mean WM T1 values across all submissions were 828 ± 38 ms in the genu and 852 ± 49 ms in the splenium, and mean GM T1 values were 1548 ± 156 ms in the cortex and 1188 ± 133 ms in the deep GM, with less variation overall in WM than in GM, possibly due to better ROI placement and less partial voluming in WM. The lower standard deviations for the ROIs of site 9 in the human database (submission 18, Figure 1; shown in orange in Figure 4 d-f) are due to good slice positioning, cutting through the AC-PC line and the genu for proper ROI placement, particularly for the corpus callosum and deep GM.
This challenge focused on exploring whether different research groups could reproduce T1 maps based on the protocol information reported in a seminal publication.
Overall, our approach did show improvement in the reproducibility of T1 measurements in vivo compared to researchers implementing T1 mapping protocols completely independently (i.e., with no central guidance), as literature T1 values in vivo vary more than reported here (e.g., Bojorquez et al.).
This analysis highlights that more information is needed to unify all the aspects of a pulse sequence across sites, beyond what is routinely reported in a scientific publication. However, in a vendor-specific setting, this is a major challenge given the disparities between proprietary development libraries.
The 2020 Reproducibility Challenge, jointly organized by the Reproducible Research and Quantitative MR ISMRM study groups, led to the creation of a large open database of standard quantitative MR phantom and human brain inversion recovery T1 maps. These maps were measured using independently implemented imaging protocols on MRI scanners from three different manufacturers. All collected data, processing pipeline code, computational environment files, and analysis scripts were shared with the goal of promoting reproducible research practices, and an interactive dashboard was developed to broaden the accessibility and engagement of the resulting datasets (https://rrsg2020.dashboards.neurolibre.org). The differences in stability between independently implemented (inter-submission) and centrally shared (intra-submission) protocols observed both in phantoms and in vivo could help inform future meta-analyses of quantitative MRI metrics.
By providing access and analysis tools for this multi-center T1 mapping dataset, we aim to provide a benchmark for future T1 mapping approaches. We also hope that this dataset will inspire new acquisition, analysis, and standardization techniques that address non-physiological sources of variability in T1 mapping. This could lead to more robust and reproducible quantitative MRI and ultimately better patient care.
Submissions were reviewed by MB and AK. Compliance with the submission guidelines (https://github.com/rrsg2020/data_submission/blob/master/README.md) and the GitHub issue checklist (https://github.com/rrsg2020/data_submission/blob/master/.github/ISSUE_TEMPLATE/data-submission-request.md) was checked. Lastly, the submitted data were passed through the T1 processing pipeline and verified for quality and expected values. Feedback was sent to the authors if their submission did not adhere to the requested guidelines, or if issues were found with the submitted datasets; where possible, these were corrected (e.g., scaling issues between inversion time data points).
Strictly speaking, not all manufacturers operate at 3.0 T. Even though this is the field strength advertised by the system manufacturers, there is some deviation in actual field strength between vendors. The actual center frequencies are typically reported in the DICOM files; these were shared for most datasets and are available in our OSF.io repository (https://osf.io/ywc9g/). From these datasets, the center frequencies imply that participants using GE and Philips scanners were at 3.0 T.
https://mybinder.org/v2/gh/rrsg2020/analysis/master?filepath=analysis
The T1 value vs. temperature tables reported by the phantom manufacturer did not always exhibit a linear relationship. We explored the use of spline fitting on the original data and quadratic fitting on the log-log representation of the data. Both methods yielded good results, and we opted to use the latter in our analyses. The code is found in the analysis repository.
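The quadratic log-log interpolation described above can be sketched as follows. The table values here are hypothetical placeholders; the real tables are the phantom manufacturer's reference data.

```python
import numpy as np

# Hypothetical reference table: temperature (deg C) vs. T1 (ms) for one sphere
temps = np.array([16.0, 18.0, 20.0, 22.0, 24.0, 26.0])
t1_ref = np.array([1820.0, 1860.0, 1900.0, 1945.0, 1990.0, 2040.0])

# Quadratic fit in log-log space smooths the slightly nonlinear relationship
coeffs = np.polyfit(np.log(temps), np.log(t1_ref), deg=2)

def t1_at(temp_c):
    # Evaluate the fitted curve at the measured scan-room temperature
    return float(np.exp(np.polyval(coeffs, np.log(temp_c))))

t1_corrected = t1_at(21.0)  # reference T1 interpolated to 21 deg C
```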
Only T1 maps measured using phantom version 1 were included in this inter-submission COV, as including both sets would have increased the COV due to the differences in reference T1 values. Seven research groups used version 1, and six used version 2.