Introduction
Disc pathology identification and classification using MR imaging
Machine learning
What this review will add
Methods
Author, date | Name of study | Disc pathology identified | No. of participants | No. of discs/images | Design and algorithm(s) used | MRI characteristics | Design/results comments |
---|---|---|---|---|---|---|---|
Athertya et al., 2019 | Detection of Modic changes in MR images of the spine using local binary patterns | Development study assessing Modic change types 1 and 2 or ‘healthy’ | 100 | 500 | Texture features extracted with local binary patterns. Compared kNN, NB, SVM and RT classifiers, with RF giving the highest accuracy | T1W T2W sagittal plane 1.5 T | No pre-processing, supporting real-world validity, but aggressive data augmentation (10 MC1 cases augmented to 160 for training). MC more frequent in older participants; MC type 1 more prevalent in females. Compared a random train/test split with tenfold cross-validation, and results with and without data augmentation |
Athertya et al., 2021 | Classification of certain vertebral degenerations using MRI image features | Development study assessing Modic change severity (mild, moderate, severe) as well as chronicity (acute or chronic) | 100 | 500 | Texture analysis using different local binary pattern variants. Random forest and SVM were compared, with SVM giving the highest performance | T1W T2W STIRW sagittal plane 1.5 T | Same data as Athertya et al., 2019. Cases were oversampled (data augmentation) with synthetic data generated by the SMOTE technique. Compared a random train/test split with tenfold cross-validation, and results with and without data augmentation |
Beulah et al., 2018 | Disc bulge diagnostic model in axial lumbar MR images using Intervertebral disc Descriptor (IdD) | Development study assessing disc bulge, disc herniation | 93 | 675 | Automatic axial segmentation followed by feature extraction and linear combination of the features. SVM was compared with kNN, decision tree and FFNN for binary disc bulge classification | T2W axial plane 1.5 T | Train/test split with equal numbers of cases and controls to avoid class bias. Disc bulge assessed as a binary state; the model was subsequently extended to assess disc herniation |
Beulah et al., 2021 | Degenerative disc disease diagnosis from lumbar MR images using hybrid features | Development study assessing disc degeneration | 93 | 558 | Automatic sagittal segmentation followed by extraction of different texture features. SVM was compared with kNN, decision tree and FFNN for binary disc degeneration classification | T2W sagittal plane 1.5 T | Image pre-processing by filter application (e.g., maxima, thinning, opening). Hybrid model: training features include MRI signal intensity, invariant moments and Gabor texture features. Tenfold cross-validation used in classification |
Castro-Mateos et al., 2016 | Intervertebral disc classification by its degree of degeneration from T2-weighted magnetic resonance images | Development study assessing Pfirrmann score | 48 | 240 | Semi-automatic segmentation followed by extraction of features representing the Pfirrmann definition (5 levels). NN classification performance compared (but not fully reported) with SVM and logistic regression | T2W sagittal plane 0.4 T | Novel extension of active contour models. Image pre-processing by pixel normalization and fuzzy C-means filtering. The selected NN model includes swarm optimization. Images were divided into training and test sets, preserving the proportion of each Pfirrmann level in both sets |
Ebrahimzadeh et al., 2018 | Toward an automatic diagnosis system for lumbar disc herniation: the significance of local subset feature selection | Development study assessing disc herniation | 30 | 210 | Automatic segmentation, followed by feature extraction and feature selection. Binary herniation classification performed with kNN, SVM and deep learning | T2W sagittal plane 1.5 T | Automatic segmentation based on spinal cord extraction prior to disc boundary definition. Features selected with a decision tree. Tenfold cross-validation used in classification |
Gao et al., 2021 | Automated grading of lumbar disc degeneration using a push–pull regularization network based on MRI | Development study assessing Pfirrmann score | 500 | 2500 | Automatic segmentation and classification of the 5-grade Pfirrmann score using a CNN model | T2W sagittal plane 3.0 T | The CNN consists of convolutional and pooling layers to extract image features and fully connected (FC) layers to perform classification. Push–pull regularization is used to enhance performance. Several CNN models were compared, with ResNet-34 giving the best results |
Ghosh et al., 2011 | Composite features for automatic diagnosis of intervertebral disc herniation from lumbar MRI | Development study assessing disc herniation | 35 | 175 | Automatic segmentation followed by extraction of different feature types. Features are then combined and classified with several classifiers, including kNN, naive Bayes and SVM | T2-SPIR sagittal plane 3.0 T | Automatic segmentation based on probabilistic models. Features include raw, LBP (local binary patterns), Gabor, GLCM (gray-level co-occurrence matrix), intensity and shape features. Feature dimensionality reduction using PCA and linear discriminant analysis (LDA), and their combinations, were compared. Tenfold cross-validation |
Gong et al., 2021 | Axial-SpineGAN: Simultaneous segmentation and diagnosis of multiple spinal structures on axial magnetic resonance imaging images | Development study assessing disc degeneration | 62 | 169 | Automatic axial segmentation of 4 spinal structures followed by a combination of CNN and other NN modules for feature extraction and disc degeneration (normal vs abnormal) classification | T2W sagittal plane 1.5 T | The generator consists of CNN and FC modules that combine spatially overlapping tissue information and extract features; a discriminator enhances generator performance. A diagnostic output is produced for each tissue. Fivefold cross-validation |
Grob et al., 2022 | External validation of the deep learning 'SpineNet' for grading radiological features of degeneration on MRIs of the lumbar spine | External validation study assessing Pfirrmann score | 882 | 4410 | External validation in patients with degenerative spinal disorders from 2 previous trials | T2W sagittal plane, field strength NR | Ground truth generated by a single radiologist. Inter-rater agreement between SpineNet and the radiologist was assessed with weighted kappa, CAA, Spearman's rank and Lin's concordance correlation coefficients, as well as precision (positive predictive value), sensitivity (recall) and specificity |
Han et al., 2018 | Spine-GAN: Semantic segmentation of multiple spinal structures | Development study assessing disc degeneration | 253 | 1818 | Automatic segmentation, combination and binary classification of 3 spinal tissues. DL modules included CNN, atrous CNN and a long short-term memory (LSTM) module. Discriminator similar to Gong et al., 2021 | T1W T2W sagittal plane 1.5 T | The LSTM module integrates information on the different tissues and reduces overfitting. The discriminator helps increase segmentation and classification performance. Fivefold cross-validation. Data collected from different centres; 2 radiologists established the ground truth |
Hashia et al., 2020 | Texture features'-based classification of MR images of normal and herniated intervertebral discs | Development study assessing disc herniation | 99 | NR | Manual segmentation followed by extraction of different texture features. Binary herniation classification was performed with kNN, SVM and a back-propagation neural network (BPNN) on each feature set, and performance was compared | T2W sagittal plane, field strength NR | Texture feature sets comprise features extracted by the GLRLM, GLCM and GLDM texture analysis techniques, respectively. Data were split into training, validation and test sets for performance evaluation |
He et al., 2017 | Automated grading of lumbar disc degeneration via supervised distance metric learning | Development study assessing disc degeneration | 93 | 465 | HOG texture features were extracted from disc images and used to classify discs as normal or as having slight, marked or severe disc degeneration by distance metric learning | T2W sagittal plane 1.5 T | Distance metric learning minimizes the distance between two distributions, in this case the distribution of features and the distribution of grading levels. Feature distance was defined as a Mahalanobis distance. Fivefold cross-validation |
Jamaludin et al., 2016 | Automatic Modic changes classification in spinal MRI | Development study assessing Modic Changes | 444 | 4656 (endplates) | Automatic vertebra segmentation and alignment between T1W and T2W images, followed by histogram feature extraction. Features used to classify Modic changes (no MC, type 1, 2 and 3) using SVM | T1W T2W sagittal plane, field strength NR | Features from T1W and T2W were combined in a spatially binned joint histogram of intensities (SJT). Data augmentation (43 transformations) was applied to 5 samples, and performance was averaged across all transformations. Fivefold cross-validation |
Jamaludin et al., 2017 | SpineNet: Automated classification and evidence visualization in spinal MRIs | Development study assessing Pfirrmann score | 2009 | 12,018 | Automatic vertebra and disc segmentation followed by disc volume (3D) extraction. Pixels in the disc volume are used by a CNN model to classify different spine conditions and scores (including the 5-point Pfirrmann score) | T2W sagittal plane, field strength NR | CNN based on a modified VGG-M architecture. Deeper CNN architectures were tried, with no differences observed. 3D input generally performed better than 2D. Disc-level adjustments were also implemented, with no improvement observed. Different data augmentation strategies were used during training |
Koh et al., 2012 | Disc herniation diagnosis in MRI using a CAD framework and a two-level classifier | Development study assessing disc herniation | 70 | 350 | Manual segmentation and labelling followed by image subdivision and feature extraction. 4 different algorithms (NN, k-means, SVM and a least mean square (linear) classifier) are combined in an ensemble classifier to diagnose herniation (binary) | T1W T2W sagittal plane 3 T | Noise reduction applied to images. Feature vectors include the pairwise ratios of pixels corresponding to the spinal cord, vertebra and disc in each image subdivision. The ensemble classifier is a weighted agreement among the 4 algorithms. Ground truth extracted from medical reports. Leave-out cross-validation used |
Lehnen et al., 2021 | Detection of degenerative changes on MR images of the lumbar spine with a convolutional neural network: A feasibility study | External validation study assessing disc bulge, disc herniation | 146 | 888 | External validation of the Columbo software, which is based on a CNN architecture and can assess different spinal structures and classify different pathologies | T2W axial sagittal planes 1.5 T 3.0 T | Ground truth generated by a single expert reader. Subjects were patients with back pain. Images were acquired at 1.5 T or 3.0 T. Software and reader were compared using the McNemar test and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy |
Lewandrowski et al., 2020 | Feasibility of deep learning algorithms for reporting in routine spine magnetic resonance imaging | Development study assessing disc bulge, disc herniation | 3560 | 17,800 | Automatic axial and sagittal segmentation with 3D reconstruction. Classification into a 4-class grading that includes bulge and herniation was performed using a CNN | T1W T2W sagittal axial transverse planes, field strength NR | Ground truth derived from radiologist reports by a semi-supervised ML model trained on 5000 manually translated reports. Intersecting axial and sagittal MRI slices were stacked and used as input for the CNN model. A final decision tree converts the classification output into written reports. A random train/test split was used |
McSweeney et al., 2022 | External validation of Spine Net a deep learning model for automated grading of lumbar disc degeneration using the North Finland Birth Cohort | External validation study assessing Pfirrmann score, Modic change | 684 | 3420 | SpineNet external validation on a 4-point modified Pfirrmann grade and binary MC | T2W sagittal plane 1.5 T | Participants from a population birth cohort. 3 expert readers graded Pfirrmann and 2 graded MC. A subset of participants with low back pain (LBP), matching the training population profile, was evaluated separately. Lin's CCC, Cohen's κ, MCC, sensitivity, specificity and accuracy were reported |
Niemeyer et al., 2021 | A deep learning model for the accurate and reliable classification of disc degeneration based on MRI data | Development study assessing Pfirrmann score | 1599 | 7948 | Manual disc segmentation and labelling. Disc images were used to train a CNN-based model to predict the Pfirrmann score (5-point and extended 13-point score) | T2W sagittal plane 1.5 T 3.0 T | The CNN was a modified VGG-16 architecture. Ground truth from a single radiologist. Extended fractional Pfirrmann score to account for between-grade presentations; the classical and extended versions were compared. Data augmentation (fourfold) by rotating the training images. Tenfold cross-validation |
Nikravan et al., 2016 | Toward a computer-aided diagnosis system for lumbar disc herniation disease based on MR images analysis | Development study assessing disc herniation | 30 | 210 | Automatic segmentation and disc labelling followed by intensity and shape feature extraction. SVM and neural networks were used for binary herniation classification | T2W sagittal plane 1.5 T | Segmentation is based on Otsu thresholding and extraction of the spinal cord, followed by disc alignment and boundary definition. A random train/test split was used |
Oktay et al., 2014 | Computer-aided diagnosis of degenerative intervertebral disc diseases from lumbar MR images | Development study assessing disc degeneration | 102 | 612 | Automatic disc segmentation followed by extraction of different features. An SVM classifier is used for binary disc degeneration classification | T1W T2W axial sagittal planes 1.5 T | Discs were segmented using active appearance models (AAM). Features include intensity, texture, whole-shape and context features, each tested alone and in combination; using all features gave the best performance. Fivefold cross-validation used |
Pan et al., 2021 | Automatically diagnosing disk bulge and disk herniation with lumbar magnetic resonance images by using deep convolutional neural networks | Method and development study assessing disc bulge | 500 | 3555 | Automatic axial and sagittal segmentation followed by disc classification using CNN modules. Classification levels comprised normal, bulging or herniated disc | T2W axial plane 3.0 T | 3 CNN models were used to locate vertebral bodies, define intervertebral discs and classify the images. Classification used ResNet-101. Classification performance was assessed level-wise. Fourfold cross-validation was performed |
Su et al., 2022 | Automatic grading of disc herniation, central canal stenosis and Nerve root compression in lumbar MRI diagnosis | Development study assessing disc herniation | 1015 | 15,254 | No segmentation used. Feature extraction and subsequent 4-point herniation classification were performed using deep learning architectures; 2 additional pathologies were also assessed. A CNN (ResNet-50) was used for feature extraction and FC layers for classification | T2W axial plane 3.0 T | Ground truth by 2 readers, with a 3rd resolving disagreements. Participants were patients with back pain. Data augmentation by random rotation and cropping of training data. Random training, validation and test splits of the training data |
Su et al., 2022 | (as above) | External validation study assessing disc herniation | 100 | 1273 | External validation of the model using patients from another hospital | T2W axial plane 3.0 T |  |
Sundarsingh et al., 2020 | Diagnosis of disc bulge and disc desiccation in lumbar MRI using concatenated shape and texture features with random forest classifier | Development study assessing disc bulge | 63 | 378 | Automatic segmentation followed by extraction of different features and final disc bulge classification using random forest | T2W sagittal plane 1.5 T | Classes include normal, bulging and desiccated discs. Shape features (HOG) and texture features (LS-RBR) were compared separately and in combination, with the combination performing better. A random train/test split was used |
Tsai et al., 2021 | Lumbar disc herniation automatic detection in MRI based on deep learning | Development study assessing disc herniation | 168 | 714 | Automatic segmentation and 4-grade bulge/herniation classification on normalized MRI using a CNN architecture | T2W sagittal plane 1.5 T | CNN based on the YOLOv3 (DarkNet-50) model. Grades (bulge, protrusion, extrusion and sequestration) were extracted from clinical reports. Different amounts of data augmentation were used to assess over- and underfitting, and their performances were compared. A random train/validation/test split was used. Performance measures did not follow standard definitions |
Zheng et al., 2022 | Deep learning-based high-accuracy quantitation for lumbar intervertebral disc degeneration from MRI | Development study assessing Pfirrmann score | 1051 | 5255 | Automatic segmentation without classification; instead, quantitative signal-intensity and shape measurements were extracted and then regressed against different LDD scores and demographics | T2W sagittal plane 1.5 T | Segmentation based on a CNN (BianqueNet, ResNet-101). From each disc segment, the signal-intensity difference (ΔSI), average disc height (DH), disc-height index (DHI) and disc height-to-diameter ratio (HDR) were calculated. These measurements were then correlated with Pfirrmann scores and with age range, sex and disc level. No AI-based classification was performed |
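
Several development studies above (e.g. Athertya et al., 2019, 2021; Ghosh et al., 2011) use local binary patterns (LBP) as texture features. As an illustration only, the basic LBP operator can be sketched in Python as follows; the 3 × 3 neighbourhood, the "at least as bright" comparison, the 256-bin histogram and the function names `lbp_code` and `lbp_histogram` are assumptions of this sketch and do not reproduce any specific study's variant (rotation-invariant and uniform LBPs are common alternatives).

```python
# Minimal sketch of the basic 8-neighbour local binary pattern (LBP).
# Assumptions: radius 1, 8 neighbours, no rotation invariance; the image
# is a list of lists of grey levels (e.g. a cropped disc region).

# Neighbour offsets, clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Return the 8-bit LBP code of interior pixel (r, c)."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Each neighbour at least as bright as the centre sets one bit.
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    rows, cols = len(img), len(img[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist]


# Usage: a single bright pixel on a flat background.
patch = [
    [10, 10, 10, 10],
    [10, 50, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
features = lbp_histogram(patch)  # 256-dimensional texture feature vector
```

In this toy patch one interior pixel yields code 0 and the other three yield code 255, so the histogram has 0.25 and 0.75 in those bins; such histograms would then be passed to a classifier such as SVM or random forest.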
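
GLCM texture features appear in Ghosh et al. (2011) and Hashia et al. (2020). A minimal sketch of a grey-level co-occurrence matrix and two features derived from it is given below; the single horizontal offset, symmetric counting, pre-quantised grey levels and the names `glcm`, `contrast` and `homogeneity` are illustrative assumptions, and feature definitions vary between toolkits.

```python
# Minimal sketch of a grey-level co-occurrence matrix (GLCM) and two
# texture features. Assumptions: grey levels already quantised to
# 0..levels-1, a single pixel offset, symmetric counting.


def glcm(img, levels, dr=0, dc=1):
    """Symmetric, normalised GLCM for the pixel offset (dr, dc)."""
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                # Count the pair in both directions so the matrix is symmetric.
                m[i][j] += 1
                m[j][i] += 1
    total = sum(sum(row) for row in m)
    return [[v / total for v in row] for row in m]


def contrast(p):
    """Co-occurrence probabilities weighted by squared grey-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """Homogeneity with 1 / (1 + |i - j|) weights (definitions vary)."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))


# Usage: a tiny 2-level image with a horizontal neighbour offset.
img = [[0, 0, 1],
       [0, 1, 1]]
p = glcm(img, levels=2)
print(contrast(p), homogeneity(p))  # 0.5 0.75
```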
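
Nikravan et al. (2016) base their segmentation on Otsu thresholding. The sketch below shows only the generic Otsu step, which picks the grey-level threshold that maximises the between-class variance of the resulting foreground and background; the 8-bit intensity range, the "pixels ≤ t are background" convention and the name `otsu_threshold` are assumptions, and the study's subsequent spinal cord and disc processing is not reproduced.

```python
# Minimal sketch of Otsu's global thresholding on integer grey levels.
# Assumption: 8-bit intensities (0..255); pixels <= t form the background class.


def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximising the between-class variance."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count of the class <= t
    sum0 = 0    # intensity sum of the class <= t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Usage: a bimodal set of intensities.
pixels = [20] * 60 + [22] * 40 + [180] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [v > t for v in pixels]  # True = brighter (foreground) class
```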
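
Athertya et al. (2021) oversampled cases with SMOTE. The core SMOTE idea is to create synthetic minority-class feature vectors by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below assumes Euclidean distance, a random choice of base sample (the original algorithm instead cycles through every minority sample) and a hypothetical `smote` function name; it is not the implementation used in the reviewed study. Synthetic samples are generally generated from training data only, so that evaluation folds contain only real cases.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Assumption: the base sample is drawn at random.
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class (Euclidean).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Usage: three minority-class feature vectors (e.g. rare Modic type 1 discs).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2)
```

Because every synthetic point lies between two real minority samples, all ten new points here fall inside the triangle spanned by the three inputs.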
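
Many studies above report tenfold or fivefold cross-validation rather than a single random train/test split, and Castro-Mateos et al. (2016) preserved the proportion of each Pfirrmann level across sets. A minimal sketch of stratified k-fold splitting is shown below; the round-robin dealing scheme, the seeded shuffle and the name `stratified_kfold` are assumptions for illustration rather than any study's procedure.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Return k (train, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets about 1/k of it.
        for idx in members:
            folds[pos % k].append(idx)
            pos += 1

    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


# Usage: 20 healthy (0) and 10 degenerated (1) discs, fivefold CV.
labels = [0] * 20 + [1] * 10
for train_idx, test_idx in stratified_kfold(labels, k=5):
    pass  # fit on train_idx, evaluate on test_idx, then average the k scores
```

Every disc appears in exactly one test fold, so each case contributes to the performance estimate once, whereas a single random split uses only part of the data for testing.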
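
He et al. (2017) define feature distance as a Mahalanobis distance learned from graded discs. In generic notation (ours, not necessarily theirs), the distance between feature vectors under a positive semidefinite matrix M is given below, followed by one widely used way of learning M from labelled pairs, where S is an assumed set of same-grade disc pairs and D a set of different-grade pairs. A new disc could then be graded by its nearest labelled discs under this distance; this is a sketch of the general technique, not the authors' exact objective.

```latex
d_M(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^{\top} M \, (\mathbf{x}_i - \mathbf{x}_j)}, \qquad M \succeq 0

% M = \Sigma^{-1} (inverse feature covariance) gives the classical Mahalanobis distance;
% M = I gives the ordinary Euclidean distance.

\min_{M \succeq 0} \sum_{(i,j) \in \mathcal{S}} d_M(\mathbf{x}_i, \mathbf{x}_j)^2
\quad \text{subject to} \quad \sum_{(i,j) \in \mathcal{D}} d_M(\mathbf{x}_i, \mathbf{x}_j) \ge 1
```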
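
Lehnen et al. (2021) compared software and reader with the McNemar test and standard diagnostic accuracy measures. The sketch below computes those measures from a 2 × 2 table with the reader as reference; when the reader is the reference, the discordant cells used by McNemar's test are exactly the false positives and false negatives. The continuity-corrected default and the names `diagnostic_metrics` and `mcnemar` are assumptions; the study's exact test variant is not stated here.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures against a reference reader."""
    return {
        "sensitivity": tp / (tp + fn),  # recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def mcnemar(b, c, continuity=True):
    """McNemar chi-square test on the discordant cells of a paired 2x2 table.

    b: positive by the software but negative by the reader
    c: negative by the software but positive by the reader
    Returns (statistic, p-value) with 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Chi-square(1) survival function: P(X > s) = erfc(sqrt(s / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Usage: hypothetical counts for one pathology at disc level.
m = diagnostic_metrics(tp=40, fp=10, fn=5, tn=45)
stat, p = mcnemar(b=10, c=5)  # b = FP, c = FN when the reader is reference
```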
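
The external validation studies of SpineNet (Grob et al., 2022; McSweeney et al., 2022) report agreement on ordinal grades with weighted kappa and Lin's concordance correlation coefficient. A minimal sketch of both is given below; the quadratic default weighting, the 0-based grade coding and the names `weighted_kappa` and `lins_ccc` are assumptions, since the weighting used in each study is not restated in the table.

```python
def weighted_kappa(r1, r2, n_classes, power=2):
    """Cohen's weighted kappa for two raters' ordinal grades 0..n_classes-1.

    power=1 gives linear weights, power=2 quadratic weights.
    """
    n = len(r1)
    obs = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n
    row = [sum(obs[i]) for i in range(n_classes)]
    col = [sum(obs[i][j] for i in range(n_classes)) for j in range(n_classes)]
    observed = expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement weight: 0 on the diagonal, 1 for maximal disagreement.
            w = (abs(i - j) / (n_classes - 1)) ** power
            observed += w * obs[i][j]
            expected += w * row[i] * col[j]
    if expected == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - observed / expected


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient, using 1/n moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Usage: Pfirrmann grades 1..5 recoded to 0..4 for a model and a reader.
model = [0, 1, 2, 3, 4, 2]
reader = [0, 1, 3, 3, 4, 2]
kappa = weighted_kappa(model, reader, n_classes=5)
ccc = lins_ccc(model, reader)
```

Unlike Pearson correlation, Lin's CCC also penalises a systematic offset between raters: grades that are always one level apart correlate perfectly but have a CCC below 1.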
Meta-analysis
Variable | Estimate | 95% CI | SE | Z-value | P-value |
---|---|---|---|---|---|
Sensitivity | 4.239 | (−9.625, 18.104) | 7.074 | 0.599 | 0.549 |
Specificity | 4.946 | (−8.919, 18.812) | 7.074 | 0.699 | 0.484 |
Year of publication | 1.025 | (−12.242, 14.292) | 6.769 | 0.151 | 0.88 |
External validation | −4.169 | (−9.549, 1.21) | 2.745 | −1.519 | 0.129 |
Data augmentation | 0.547 | (0.357, 0.738) | 0.097 | 5.626 | < 0.0001 |
Phenotype classification# | – | – | – | – | 0.027 |
Algorithm# | – | – | – | – | 0.668 |

Variable | Estimate | 95% CI | SE | Z-value | P-value |
---|---|---|---|---|---|
Year of publication | −1.292 | (−6.126, 3.542) | 2.466 | −0.524 | 0.601 |
External validation | −2.359 | (−3.535, −1.183) | 0.6 | −3.931 | < 0.0001 |
Data augmentation | 0.668 | (0.562, 0.776) | 0.055 | 12.251 | < 0.0001 |
Phenotype classification# | – | – | – | – | 0.789 |
Algorithm# | – | – | – | – | 0.824 |
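
The reported columns in these meta-regression tables are consistent with standard Wald inference for each coefficient: Z = estimate / SE, 95% CI = estimate ± 1.96 × SE, and a two-sided normal p-value. The sketch below, which assumes exactly that construction and uses a hypothetical helper name `wald_summary`, recomputes these quantities from the estimate and SE; small differences (e.g. 0.547 / 0.097 ≈ 5.64 vs the reported 5.626) are attributable to rounding of the published SE.

```python
import math


def wald_summary(estimate, se, z_crit=1.959964):
    """Wald z-statistic, two-sided p-value and 95% CI for one coefficient."""
    z = estimate / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return z, p, ci


# Usage: the "Sensitivity" row of the first meta-regression table.
z, p, ci = wald_summary(4.239, 7.074)
print(f"z = {z:.3f}, p = {p:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Applied to the "External validation" row of the second table (estimate −2.359, SE 0.6), the same helper gives Z ≈ −3.93 and p < 0.0001, matching the reported values.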
Results
Types of studies
Magnetic resonance imaging specifications
LDD classifications
Performance metrics and algorithms
Variable | Estimate | 95% CI | SE | Z-value | P-value |
---|---|---|---|---|---|
Year of publication | 1.031 | (−1.293, 3.354) | 1.186 | 0.87 | 0.385 |
External validation | −0.724 | (−1.047, −0.401) | 0.165 | −4.396 | < 0.0001 |
Data augmentation | −0.113 | (−0.581, 0.355) | 0.239 | −0.474 | 0.635 |
Phenotype classification# | – | – | – | – | 0.043* |
Algorithm# | – | – | – | – | 0.92 |