Robustness and Reproducibility of Radiomics in Magnetic Resonance Imaging - A Phantom Study


OBJECTIVES The aim of this study was to investigate the robustness and reproducibility of radiomic features in different magnetic resonance imaging sequences.

MATERIALS AND METHODS A phantom was scanned on a clinical 3 T system using fluid-attenuated inversion recovery (FLAIR), T1-weighted (T1w), and T2-weighted (T2w) sequences with low and high matrix size. For retest data, scans were repeated after repositioning of the phantom. Test and retest datasets were segmented using a semiautomated approach. Intraobserver and interobserver comparisons were performed. Radiomic features were extracted after standardized preprocessing of images. Test-retest robustness was assessed using concordance correlation coefficients, dynamic range, and Bland-Altman analyses. Reproducibility was assessed by intraclass correlation coefficients.

RESULTS The number of robust features (concordance correlation coefficient and dynamic range ≥ 0.90) was higher for features calculated from FLAIR than from T1w and T2w images. High-resolution FLAIR images provided the highest percentage of robust features (n = 37/45, 81%). No considerable difference in the number of robust features was observed between low- and high-resolution T1w and T2w images (T1w low: n = 26/45, 56%; T1w high: n = 25/45, 54%; T2w low: n = 21/45, 46%; T2w high: n = 24/45, 52%). A total of 15 (33%) of 45 features showed excellent robustness across all sequences and demonstrated excellent intraobserver and interobserver reproducibility (intraclass correlation coefficient ≥ 0.75).

CONCLUSIONS FLAIR delivers the most robust substrate for radiomic analyses. Only 15 of 45 features showed excellent robustness and reproducibility across all sequences. Care must be taken in the interpretation of clinical studies using nonrobust features.
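The test-retest criteria named in the methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The concordance correlation coefficient follows Lin's standard definition. The dynamic range formula (one minus the mean absolute test-retest difference divided by the observed value range) is an assumption, since the abstract does not define it. The Bland-Altman limits use the conventional bias ± 1.96 SD. All function names are hypothetical, and only the 0.90 threshold is taken from the text.

```python
import numpy as np


def concordance_ccc(test, retest):
    """Lin's concordance correlation coefficient for paired measurements."""
    x = np.asarray(test, dtype=float)
    y = np.asarray(retest, dtype=float)
    mx, my = x.mean(), y.mean()
    # Population (ddof=0) variances and covariance, as in Lin's definition.
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)


def dynamic_range(test, retest):
    """Assumed definition: 1 - mean|test - retest| / (max - min)."""
    x = np.asarray(test, dtype=float)
    y = np.asarray(retest, dtype=float)
    span = max(x.max(), y.max()) - min(x.min(), y.min())
    if span == 0:
        return float("nan")  # a constant feature has no usable range
    return 1.0 - np.mean(np.abs(x - y)) / span


def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) of agreement."""
    d = np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)
    bias = d.mean()
    sd = d.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def is_robust(test, retest, threshold=0.90):
    """Robust if both CCC and dynamic range reach the threshold (0.90 in the text)."""
    return bool(
        concordance_ccc(test, retest) >= threshold
        and dynamic_range(test, retest) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical values of one feature across five regions of interest.
    test_vals = [1.0, 2.0, 3.0, 4.0, 5.0]
    retest_vals = [2.0, 3.0, 4.0, 5.0, 6.0]
    print(concordance_ccc(test_vals, retest_vals))
    print(dynamic_range(test_vals, retest_vals))
    print(bland_altman(test_vals, retest_vals))
    print(is_robust(test_vals, retest_vals))
```

In this sketch a constant offset between test and retest lowers the concordance correlation coefficient even though the two series are perfectly correlated. This is why agreement, rather than correlation alone, is the appropriate criterion for robustness.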

Investigative Radiology