Study cohorts
The derivation cohort was a propensity-matched cohort examining the association between anesthetic exposure and subsequent neurodevelopmental outcomes, including LD and ADHD [6]. In summary, a birth cohort of all children born in Olmsted County, MN from January 1, 1996 to December 31, 2000 was identified. For each child, school enrollment status in the local public school district at age 5 and all episodes of anesthetic exposure before age 3 were identified. The derivation cohort was created by selecting children enrolled in the school district (who thus had survived and remained resident in Olmsted County until at least age 5) based on their propensity to receive general anesthesia, with the propensity score calculated from multiple variables, including information from birth certificates and medical diagnoses. Children were followed up to December 31, 2014.
The validation cohort was generated as part of an unpublished study examining the association between pediatric ICU admission and neurodevelopmental outcomes. A population-based birth cohort included children born in Olmsted County, MN during a 5-year period (1/1/2003–12/31/2007) with an ICU admission prior to age 4. Each child with an ICU admission was matched (based on gender, birth date (± 30 days), maternal age (± 3 years), and education level) with a child who was not admitted to the ICU prior to age 4. These children were followed for up to 11 years after ICU admission (last follow-up date 12/31/2013).
All diagnostic codes from birth were available for all cohort members through the Rochester Epidemiology Project, a population-based medical records linkage system [9]. For each outcome, a master list of all International Classification of Diseases (ICD)-9 codes received by each child during their lifetime was generated in chronologic order. Duplicate ICD-9 codes were then removed from the master list, yielding a list of the distinct ICD-9 codes received by each child for further analysis.
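As a concrete illustration, the deduplication step can be sketched as follows. Python is used here for illustration only (the study's analyses were performed in R), and the codes shown are made up:

```python
# Illustrative sketch: reduce a chronologically ordered master list of ICD-9
# codes to the distinct codes per child, keeping first occurrences in order.
master_list = ["382.9", "314.01", "382.9", "465.9", "314.01"]

seen = set()
# set.add() returns None, so the "or" clause marks a code as seen on first visit
distinct = [c for c in master_list if not (c in seen or seen.add(c))]
# distinct == ["382.9", "314.01", "465.9"]
```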
Classification algorithms
For each outcome (ADHD or LD), we aimed to identify classifier algorithms with optimized predictive ability by comparing algorithm results with the confirmed cases. Machine learning models were developed and trained in the derivation cohort before being applied to the validation cohort.
We considered four methods for classification: Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression, Elastic Net (ENET) logistic regression, classification and regression trees (CART), and Stochastic Gradient Boosting (GBM). Inputs included ICD-9 diagnosis codes. In brief, LASSO and ENET are regression methods that perform both variable selection and regularization to enhance the prediction accuracy and interpretability of the resulting statistical model. CART and GBM are tree-based methods, with the latter using boosting to combine many weak classifiers into a single strong classifier. Each method used internal cross-validation to select tuning parameters governing the complexity of the classifier, based on misclassification error. For each method, we considered both the tuning parameter value minimizing the cross-validated mean misclassification error (MIN) and the value maximizing regularization or pruning while remaining within one standard error of that minimum (1SE), which may reduce overfitting.
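The MIN and 1SE selection rules can be sketched as below. This is an illustrative Python/scikit-learn analogue of the cross-validated penalty selection performed in R, shown for LASSO only; the data, penalty grid, and variable names are all hypothetical, not the study's actual code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Simulated binary ICD-code indicators and an outcome driven by two codes
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 30)).astype(float)
y = (X[:, 0] + X[:, 1] + rng.normal(0, 1, 200) > 1.5).astype(int)

# Candidate L1 penalties (larger C = weaker regularization)
Cs = np.logspace(-2, 1, 10)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
err = np.zeros((len(Cs), cv.get_n_splits()))
for i, C in enumerate(Cs):
    for j, (tr, te) in enumerate(cv.split(X, y)):
        m = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        m.fit(X[tr], y[tr])
        err[i, j] = np.mean(m.predict(X[te]) != y[te])

mean_err = err.mean(axis=1)
se_err = err.std(axis=1, ddof=1) / np.sqrt(err.shape[1])
i_min = int(np.argmin(mean_err))            # "MIN" rule
# "1SE" rule: the strongest regularization (smallest C) whose mean error is
# still within one standard error of the minimum
within = mean_err <= mean_err[i_min] + se_err[i_min]
i_1se = int(np.argmin(np.where(within, Cs, np.inf)))
```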
Because the prevalence of cases in the cohort is relatively low, the number of cases is much smaller than the number of non-cases. When overall misclassification error is minimized, misclassification is roughly equally likely among those predicted/classified as cases and as non-cases; as a result, a higher proportion of true cases will be misclassified than of true non-cases, leading to low sensitivity and high specificity. One approach to overcoming this sensitivity/specificity imbalance is to assign more weight to the cases during model training, thereby increasing the cost of misclassifying a case relative to a non-case. Several case-weight options were considered in the derivation dataset, modeling prior probabilities for cases ranging from 10 to 75% (representing the proportion of LD or ADHD cases after different weights are assigned).
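The case-weighting idea can be illustrated as follows: choose per-observation weights so that the weighted proportion of cases equals a chosen prior. The function name, observed prevalence, and target prior here are hypothetical, not the study's implementation:

```python
def case_weights(y, target_prior):
    """Weight cases so the weighted case proportion equals target_prior;
    non-cases keep weight 1."""
    n_case = sum(y)
    n_non = len(y) - n_case
    w_case = target_prior * n_non / ((1.0 - target_prior) * n_case)
    return [w_case if yi == 1 else 1.0 for yi in y]

y = [1] * 10 + [0] * 90              # 10% observed prevalence
w = case_weights(y, 0.5)             # re-weight cases toward a 50% prior
weighted_prev = sum(wi for wi, yi in zip(w, y) if yi == 1) / sum(w)
```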
For each of these methods, additional factors were considered. The first was whether to include within the classifier all ICD-9 codes or only a subset of codes plausibly related to the diagnosis in question. The former approach makes no presuppositions and uses all of the data, but may increase the possibility of spurious associations or overfitting, especially as the number of events in the dataset is relatively modest and substantially smaller than the number of codes considered (3597). The latter approach may increase specificity, but may also miss important unanticipated associations. For the latter, experts in pediatric neurodevelopment, independently and without access to the data, developed lists of select ICD-9 codes that could conceivably indicate a diagnosis of LD or ADHD, respectively. The LD list included 38 unique ICD-9 codes and the ADHD list included 34 unique codes (Additional file 1).
The second factor was whether the frequency with which an ICD-9 code appears in a child's record is incorporated into the algorithm, or whether only the presence of at least one occurrence of that code is considered. Repeated coding may imply that a code should be weighted more heavily, but the vagaries of the coding process may also intrude. Because incorporating frequency would likely reduce external validity, with results tuned to the coding practices at our institution, we pursued only the approach using an indicator of the presence of each ICD-9 code rather than its frequency.
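The indicator approach can be sketched as a per-child binary design matrix; the children, codes, and records below are invented for illustration:

```python
# Illustrative only: build binary presence indicators for distinct ICD-9 codes.
records = {
    "child_A": ["314.01", "382.9", "314.01", "465.9"],  # 314.01 appears twice
    "child_B": ["315.00", "465.9"],
}

# Vocabulary of all distinct codes observed across children
vocab = sorted({c for codes in records.values() for c in codes})
# 1 if the child ever received the code, 0 otherwise (frequency is ignored)
indicator = {child: [1 if c in set(codes) else 0 for c in vocab]
             for child, codes in records.items()}
# A frequency-based encoding would instead count 314.01 twice for child_A.
```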
An ICD-9 code of 314.XX indicates a diagnosis of ADHD and has been shown in previous studies to have high sensitivity for identifying ADHD cases when only medical records were examined [8, 12]. Therefore, a model containing the single ICD-9 code 314.XX was also evaluateded in this study. A similar analysis was not performed for LD because there are no ICD-9 codes that clearly label LD.
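The single-code model amounts to a one-rule classifier; a minimal sketch (function name and example codes are hypothetical, not from the study):

```python
def adhd_single_code_rule(codes):
    """Classify as ADHD if any ICD-9 code is 314 or a 314.XX subcode."""
    return any(c == "314" or c.startswith("314.") for c in codes)

adhd_single_code_rule(["314.01", "465.9"])   # True
adhd_single_code_rule(["315.00", "382.9"])   # False
```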
Classification metrics for the resulting machine learning models included sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy. Kappa was estimated for overall accuracy, adjusting for the accuracy expected by chance. Concordance was also described, based on the numeric predicted probability of case status. We determined a priori that, to identify children with these disorders from administrative data, a model with high sensitivity and PPV would be preferred.
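These metrics can all be computed from a 2×2 confusion matrix, as sketched below; the counts are invented for illustration and are not study results:

```python
def metrics(tp, fp, fn, tn):
    """Classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    # Accuracy expected by chance, from the marginal totals (used for kappa)
    exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": acc,
        "kappa": (acc - exp) / (1 - exp),
    }

m = metrics(tp=40, fp=10, fn=20, tn=130)
```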
In the derivation cohort, results are presented using an internal non-parametric bootstrap approach. Only select models demonstrating strong sensitivity and PPV in the derivation cohort were carried forward to the validation cohort. Analyses were performed using R statistical software (R version 3.6.1) with the caret package, which acts as a wrapper for functions in the glmnet, gbm, and rpart packages [13–17].
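The non-parametric bootstrap used to summarize performance can be sketched as follows; the data, classifier output, and choice of sensitivity as the summarized metric are all illustrative assumptions, not study results:

```python
import random

random.seed(1)
# Hypothetical labels and classifier predictions (24/30 true cases detected)
y_true = [1] * 30 + [0] * 170
y_pred = [1] * 24 + [0] * 6 + [1] * 15 + [0] * 155

def sensitivity(idx):
    """Sensitivity computed on the resampled indices idx."""
    tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
    pos = sum(1 for i in idx if y_true[i] == 1)
    return tp / pos if pos else float("nan")

# Resample children with replacement and recompute the metric each time
n = len(y_true)
boots = [sensitivity(random.choices(range(n), k=n)) for _ in range(1000)]
boots.sort()
ci = (boots[24], boots[974])   # approximate 95% percentile interval
```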