Background
Child maltreatment, including physical abuse, sexual abuse, emotional abuse, and neglect, affects a significant number of children worldwide [1‐3]. For example, a survey of a nationally representative sample of American children, selected using telephone numbers from 2013 to 2014, found that lifetime rates of maltreatment for children aged 14 to 17 were 18.1% for physical abuse, 23.9% for emotional abuse, 18.4% for neglect, and 14.3% and 6.0% for sexual abuse of girls and boys, respectively [
4]. Child maltreatment is associated with many physical, emotional, and relational consequences across the lifespan, such as developmental delay first seen in infancy; anxiety and mood disorder symptoms and poor peer relationships first seen in childhood; substance use and other risky behaviours often first seen in adolescence; and increased risk in adulthood for personality and psychiatric disorders, relationship problems, and maltreatment of one’s own children [5‐9]. Given the high prevalence and serious potential negative consequences of child maltreatment, clinicians need to be informed about strategies to accurately identify children potentially exposed to maltreatment, a task that “can be one of the most challenging and difficult responsibilities for the pediatrician” [
10]. Two main strategies for identification of maltreatment—screening and case-finding—are often compared to one another in the literature [
11, 12]. Screening involves administering a standard set of questions, or applying a standard set of criteria, to assess for suspicion of child maltreatment in all presenting children (“mass screening”) or in high-risk groups of children (“selective screening”). Case-finding, by contrast, involves providers being alert to the signs and symptoms of child maltreatment and assessing for potential maltreatment exposure in a way that is tailored to the unique circumstances of the child.
A previous systematic review by Bailhache et al. [
13] summarized “evidence on the accuracy of instruments for identifying abused children during any stage of child maltreatment evolution before their death, and to assess if any might be adapted to screening, that is if accurate screening instruments were available.” The authors reviewed 13 studies addressing the identification of physical abuse (7 studies), sexual abuse (4 studies), emotional abuse (1 study), and multiple forms of child maltreatment (1 study). The authors noted in their discussion that the tools were not suitable for screening, as they either identified children too late (i.e., children were already suffering from serious consequences of maltreatment) or the performance of the tests was not adaptable to screening, due to low sensitivity and specificity of the tools [
13].
This review builds upon the work of Bailhache et al. [
13], performing a systematic review with the objective of assessing evidence about the accuracy of instruments for identifying children suspected of having been exposed to maltreatment (neglect, as well as physical, sexual, and emotional abuse). Similar to the review by Bailhache et al. [
13], we investigate both screening tools and other identification tools or strategies that could be adapted into screening tools. In addition to reviewing the sensitivity and specificity of instruments, as was done by Bailhache et al. [
13], for five studies, we have also calculated estimates of false positives and negatives per 100 children, a calculation which can assist providers in making decisions about the use of an instrument [
14]. This review contributes to an important policy debate about the benefits and limitations of using standardized tools (versus case-finding) to identify children exposed to maltreatment. This debate has become increasingly salient with the publication of screening tools for adverse childhood experiences, or tools that address child maltreatment alongside other adverse experiences [
15,
16].
It should be noted here that while “screening” typically implies identifying health problems, screening for child maltreatment is different in that it usually involves identifying risk factors or high-risk groups. As such, while studies evaluating tools that assist with identification of child maltreatment are typically referred to as diagnostic accuracy studies [
17], the word “diagnosis” is potentially misleading. Instead, screening tools for child maltreatment typically codify several risk and clinical indicators of child maltreatment (e.g., caregiver delay in seeking medical attention without adequate explanation). As such, they may more correctly be referred to as tools that identify potential maltreatment, or signs, symptoms and risk factors that have a strong association with maltreatment and may lead providers to consider maltreatment as one possible explanation for the sign, symptom, or risk factor. Assessment by a health care provider should then include consideration of whether there is reason to suspect child maltreatment. If maltreatment is suspected, this would lead to a report to child protection services (CPS) in jurisdictions with mandatory reporting obligations (e.g., Canada, United States) or to child social services for those jurisdictions bound by occupational policy documents (e.g., United Kingdom) [
18]. Confirmation or verification of maltreatment would then occur through an investigation by CPS or a local authority; they, in turn, may seek consultation from one or more health care providers with specific expertise in child maltreatment. Therefore, throughout this review we will refer to identification tools as those that aid in the identification of
potential child maltreatment.
Methods
A protocol for this review is registered in the online systematic review register PROSPERO (PROSPERO 2016:CRD42016039659), and study results are reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (see supplemental file
1). As the review by Bailhache et al. [
13] considered any English or French materials published between 1961 and April 2012 (only English-language materials were retrieved from their search), we searched for English-language materials published between 2012 and July 2, 2019 (when the search was conducted). Additional inclusion criteria are found in Table
1. Inclusion criteria for this review were matched to those in Bailhache et al.’s [
13] review. We included diagnostic accuracy studies [
17] that 1) evaluated a group of children using a test, examination, or other procedure (hereafter referred to as the index test) designed to identify children potentially exposed to maltreatment and also 2) evaluated the same group of children (or ideally a random subsample) with a reference standard (acceptable reference standards are listed in Table
1) that confirmed or refuted potential exposure to maltreatment. We excluded articles that assessed psychometric properties of child maltreatment measures unless diagnostic accuracy data were available in the paper.
Table 1
Inclusion and exclusion criteria
Inclusion criteria
1. Population. Children (0 to <18 years).
2. Intervention (index test). Any instrument that assessed whether children had been exposed to physical abuse, sexual abuse, emotional abuse, or neglect. Studies had to describe the index test with enough information to be replicable.
3. Comparator (reference test). Studies had to have an acceptable reference standard, i.e., “expert assessments, such as child’s court disposition; substantiation by the child protection services or other social services; assessment by a medical, social or judicial team using one or several information sources (caregivers or child interview, child symptoms, child physical examination, and other medical record review)” [13].
4. Outcomes. Studies had to assess at least one of the following outcomes: sensitivity, specificity, positive predictive value, or negative predictive value.
5. Study design. Studies need not include a comparison population (e.g., case series could be included if the intention was to evaluate one of the outcomes listed above).
Exclusion criteria
1. Ineligible population. Studies that only addressed adults’ or children’s exposure to intimate partner violence.
2. Ineligible intervention (index test). Studies that identified a clinical indicator of child maltreatment (e.g., retinal hemorrhage) but not child maltreatment itself, and tools that identified a different population (e.g., general failure to thrive, children’s exposure to intimate partner violence).
3. Ineligible comparator (reference test). Studies that did not have an acceptable reference standard (e.g., parent reports of abuse were ineligible).
4. Ineligible outcomes. Studies that did not set out to evaluate at least one of the following accuracy outcomes: sensitivity, specificity, positive predictive value, or negative predictive value.
5. Ineligible publication types. Studies published only as abstracts were excluded, as not enough information was available to critically appraise the study design. Also excluded were studies published in non-article formats, such as books or theses; the latter were excluded for pragmatic reasons, but recent research suggests that inclusion of these materials may have little impact on results [19].
The searches for the review update were conducted in seven databases: Medline, Embase, PsycINFO, Cumulative Index to Nursing and Allied Health Literature, Sociological Abstracts, the Education Resources Information Center, and Cochrane Libraries (see supplemental file
2 for an example search). Forward and backward citation chaining was also conducted to complement the search. All articles identified by our searches were screened independently by two reviewers at both the title/abstract and full-text levels. A suggestion for inclusion by one screener was sufficient to forward an article to full-text review. Any disagreements at the full-text stage were resolved by discussion.
Data extraction and analysis
For all included studies, one author extracted the following data: study design, the study’s inclusion criteria, form of potential child maltreatment assessed, index tool, sample size, reference standard, and values corresponding to sensitivity and specificity. While our original protocol indicated that we would extract and analyze data about child outcomes (e.g., satisfaction, well-being), service outcomes (e.g., referral rates), and child well-being outcomes (e.g., internalizing symptoms, externalizing symptoms, suicidal ideation) from the studies (e.g., from randomized trials that evaluated screening versus another identification strategy and assessed associated outcomes), no such data were available. Extracted data were verified by a second author by cross-checking the results in all tables with data from the original articles. Disagreements were resolved by discussion.
Sensitivity and specificity are “often misinterpreted and may not reflect well the effects expected in the population of interest” [
14]. Other accuracy measures, such as false positives and false negatives, can be more helpful for making decisions about the use of an instrument [
14], but determining them requires a reasonable estimate of prevalence in the intended sample (in this case of the exposure, child maltreatment) and in the intended setting (e.g., emergency department). Although there are no clear cut-off points for acceptable proportions of false negatives and positives, as acceptable cut-offs depend on the clinical setting and patient-specific factors, linking false positives and negatives to downstream consequences (e.g., proportion of children who will undergo a CPS investigation who should not or who miss being investigated) can assist practitioners in determining acceptable cut-offs for their practice setting.
For those studies where prevalence estimates were available, sensitivity and specificity values were entered into GRADEpro software to calculate true/false positives/negatives per 100 children tested. This free, online software calculates true/false positives/negatives from the sensitivity and specificity of the index test and an estimate of prevalence, across totals of 100, 1000, 100,000, or 1,000,000 patients. We selected a total of 100 patients, as it allows easy conversion to a percentage of children. We also give an example of true/false positives/negatives per 100,000 children tested, which is closer to a population estimate or to numbers across several large emergency departments. To calculate these values, two prevalence rates were used (2 and 10%), based on the range of prevalence of child maltreatment in emergency departments in three high-income country settings [
20], as most of the identified screening tools addressed children in these settings. Use of these prevalence rates allows for a consistent comparison of true/false positives/negatives per 100 children across all applicable studies. For consistency, and to enhance the accuracy of the GRADEpro calculations of true/false positive/negative proportions per 100, where possible all sensitivity and specificity values and confidence intervals for the included studies were recalculated to six decimal places (confidence intervals were calculated as p ± 1.96 × √(p(1 − p)/n)). In GRADEpro, the proportion of false positives is (1 − specificity) × (1 − prevalence) and the proportion of false negatives is (1 − sensitivity) × prevalence; multiplying by 100 gives counts per 100 children tested. As the majority of studies differed in either a) included populations or b) applied index tests, we were unable to pool data statistically across the studies. Instead, we narratively synthesized the results by highlighting the similarities and differences in false positives/negatives across the included studies.
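The arithmetic above can be sketched in a few lines of Python (an illustrative sketch of the calculations GRADEpro performs; the sensitivity and specificity values in the usage example are hypothetical, with prevalence set at the 2% lower bound):

```python
import math

def accuracy_counts(sensitivity, specificity, prevalence, n=100):
    """Expected true/false positives/negatives among n children tested,
    using the proportion formulas given in the text."""
    return {
        "TP": n * sensitivity * prevalence,
        "FN": n * (1 - sensitivity) * prevalence,        # maltreated children missed
        "TN": n * specificity * (1 - prevalence),
        "FP": n * (1 - specificity) * (1 - prevalence),  # non-maltreated children flagged
    }

def wald_ci(p, n_sample):
    """95% confidence interval via p ± 1.96 × √(p(1 − p)/n)."""
    half = 1.96 * math.sqrt(p * (1 - p) / n_sample)
    return max(0.0, p - half), min(1.0, p + half)

# Hypothetical screener: sensitivity 0.80, specificity 0.90, prevalence 2%
counts = accuracy_counts(0.80, 0.90, 0.02)
```

With these hypothetical inputs, roughly 9.8 of every 100 children tested would be false positives and 0.4 false negatives, which shows why even a specificity of 0.90 generates many false alarms at low prevalence.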
For the population estimate, we modeled the effects of the SPUTOVAMO checklist for children with physical abuse or neglect on downstream consequences for children under 8 years of age presenting to the emergency department with any physical injury. We calculated true/false positives/negatives per 100,000 using the lower end of the prevalence range (2%) [
20]. Based on American estimates, we assumed that 17% of children reported to child welfare are considered to have substantiated maltreatment and that, among children with substantiated maltreatment, 62% may receive post-investigation services [
21]. We also modeled downstream consequences of false negatives, based on an estimate that 25 to 50% of children who are exposed to maltreatment need services for mental health symptoms [
22]. We modeled the consequences of false positives by assuming that all suspicions lead to reports, which in turn lead to CPS investigations.
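The modelling described above reduces to arithmetic on the error counts per 100,000 children, which can be sketched as follows (a simplified, hypothetical sketch: the rates are the US estimates cited above, the example input counts are invented, and applying the substantiation rate to all reports is a simplification):

```python
def downstream_consequences(true_pos, false_pos, false_neg):
    """Sketch of the downstream modelling per 100,000 children tested,
    using the US child-welfare estimates cited in the text."""
    reports = true_pos + false_pos             # all suspicions assumed reported
    investigations = reports                   # all reports assumed investigated by CPS
    substantiated = reports * 0.17             # 17% of reports substantiated
    receiving_services = substantiated * 0.62  # 62% of substantiated cases get services
    # 25 to 50% of maltreated children missed by screening need mental health services
    missed_needing_services = (false_neg * 0.25, false_neg * 0.50)
    return investigations, receiving_services, missed_needing_services

# Hypothetical counts per 100,000 children screened
investigations, served, missed = downstream_consequences(2000, 13000, 334)
```

With these invented inputs, 15,000 children would undergo a CPS investigation while between roughly 84 and 167 maltreated children needing mental health services would be missed, illustrating how false positives and false negatives translate into concrete downstream burdens.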
Critical appraisal
One author critically appraised each study using the QUADAS-2 tool [
23] and all data were checked by a second author, with differences resolved through consensus. The QUADAS-2 tool evaluates risk of bias related to a) patient selection, b) index test, c) reference standard, and d) flow and timing. Questions related to “applicability” in QUADAS-2 were not answered because they overlap with questions involved in the GRADE process [
17]. As the developers of QUADAS-2 note [
23], an overall rating of “low” risk of bias is only possible when all domains are assessed as low risk of bias. An answer of “no” to any of the questions indicates that both the domain (e.g., “patient selection”) and the overall risk of bias for the study are high. In this review, a study was rated as “high” risk of bias if one or more domains were ranked as high risk of bias; a study was rated as “low” risk of bias when all domains were rated as low risk of bias; and a study was rated as “unclear” risk of bias otherwise (i.e., when the study had one or more domains ranked as “unclear” risk of bias and no domains ranked as “high” risk of bias).
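The rating rule described above amounts to a simple aggregation over the four domain judgements, which can be sketched as (a hypothetical helper illustrating the rule used in this review, not part of the QUADAS-2 tool itself):

```python
def overall_risk_of_bias(domain_ratings):
    """Aggregate QUADAS-2 domain ratings ('low', 'high', 'unclear')
    into an overall study rating, per the rule used in this review."""
    if "high" in domain_ratings:
        return "high"      # any high-risk domain makes the whole study high risk
    if all(rating == "low" for rating in domain_ratings):
        return "low"       # "low" overall only when every domain is low risk
    return "unclear"       # otherwise at least one domain is unclear, none high
```

For example, a study rated low on three domains and unclear on one would be rated “unclear” overall, while a single high-risk domain would make it “high” regardless of the other three.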
Grading of recommendations, assessment, development and evaluation (GRADE)
Evidence was assessed using GRADE [
17]. GRADE rates our certainty that the effect we present is close to the true effect as high, moderate, low, or very low. A GRADE rating is based on an assessment of five domains: (1) risk of bias (limitations in study designs); (2) inconsistency (heterogeneity) in the direction and/or size of the estimates of effect; (3) indirectness of the body of evidence relative to the populations, interventions, comparators, and/or outcomes of interest; (4) imprecision of results (few participants/events/observations, wide confidence intervals); and (5) indications of reporting or publication bias. For studies evaluating identification tools and strategies, a body of evidence starting with cross-sectional accuracy studies begins as “high” certainty and is then rated down to moderate, low, or very low certainty based on the five domains listed above.
Discussion
This review updates and expands upon the systematic review published by Bailhache et al. [
13] and was conducted to evaluate the effectiveness of strategies for identifying potential child maltreatment. Since the publication of Bailhache et al.’s [
13] systematic review, there have been 18 additional studies published. The included studies reported the sensitivity and specificity of three screening tools (the SPUTOVAMO checklist, the Escape instrument, and a 6-item screening questionnaire for child sex trafficking), as well as the accuracy of an identification tool for medical child maltreatment, “triggers” embedded in an electronic medical record, four clinical prediction tools (Burns Risk Assessment for Neglect or Abuse Tool, Pediatric Brain Injury Research Network clinical prediction rule, Predicting Abusive Head Trauma, and Hymel’s 4- or 5- or 7-variable prediction models), and two predictive symptoms of abusive head trauma (parenchymal brain lacerations and hematocrit levels ≤30% on presentation). As the Bailhache et al. [
13] systematic review identified no screening tools, the creation of the SPUTOVAMO checklist, Escape instrument, and 6-item child sex trafficking screening questionnaire represents a notable development since their publication. The recent creation of an identification tool for child sex trafficking also reflects current efforts to recognize and respond effectively to this increasingly prevalent exposure. Aside from these new developments, many of the other points discussed by Bailhache et al. [
13] were confirmed in this update: it is still difficult to assess the accuracy of instruments to identify potential child maltreatment, as there is no gold standard for identifying child maltreatment; what constitutes “maltreatment” still varies somewhat, as do the behaviours considered abusive or neglectful (e.g., we excluded children’s exposure to intimate partner violence, which is increasingly considered a type of maltreatment); and it is still challenging to identify children early in the evolution of maltreatment (many of the identification tools discussed in this review are not intended to identify children early, and as such, children are often already experiencing significant consequences of maltreatment by the time they are identified).
The studies included in this systematic review provide additional evidence that allow us to assess the effectiveness of strategies for identifying potential exposure to maltreatment. Based on the findings of this review (corresponding with the findings of Bailhache et al.’s [
13] review), we found low-certainty evidence and high numbers of false positives and negatives when instruments are used to screen for potential child maltreatment. Although no studies assessed the effect of screening tools on child well-being outcomes or recurrence rates, based on data about reporting and response rates [
21,
22], we can posit that children who are falsely identified as potentially maltreated by screening tools will likely receive a CPS investigation that could be distressing. Furthermore, maltreated children who are missed by screening tools will not receive or will have delayed access to the mental health services they need.
We identified several published instruments that are not intended for use as screening tools, such as clinical prediction rules for abusive head trauma. Clinical prediction tools or rules, such as Hymel’s variable prediction model, combine medical signs, symptoms, and other factors in order to predict diseases or exposures. While they may be useful for guiding clinicians’ decision-making, and may be more accurate than clinical judgement alone [
59], they are not intended for use as screening tools. Instead, the tools “act as aids or prompts to clinicians to seek further clinical, social or forensic information and move towards a multidisciplinary child protection assessment should more information in support of AHT [abusive head trauma] arise” [
41]. As all identification tools demand clinician time and energy, widespread implementation of any clinical prediction tool is not warranted until it has undergone three stages of testing: derivation (identifying factors that have predictive power), validation (demonstrating evidence of reproducible accuracy), and impact analysis (demonstrating that the clinical prediction tool changes clinician behaviour and improves patient-important outcomes) [
60]. Similar to the findings of a recent systematic review on clinical prediction rules for abusive head trauma [
41], in this review we did not find any clinical prediction rules that had undertaken an impact analysis. However, several recent studies have considered the impact of case identification via clinical prediction rules. This includes assessing if the Predicting Abusive Head Trauma clinical prediction rule alters clinicians’ abusive head trauma probability estimates [
61], emergency clinicians’ experience with using the Burns Risk Assessment for Neglect or Abuse Tool in an emergency department setting [
62], and cost estimates for identification using the Pediatric Brain Injury Research Network clinical prediction rule as compared to assessment as usual [
63]. Additional research on these clinical prediction rules may determine whether such rules are more accurate than a clinician’s intuitive estimation of risk factors for potential maltreatment and how the tools impact patient-important outcomes.
Many of the included studies had limitations in their designs, which lowered our confidence in their reported accuracy parameters. Limitations in this area are not uncommon. A recent systematic review by Saini et al. [
64] assessed the methodological quality of studies assessing child abuse measurement instruments (primarily studies assessing psychometric properties). The authors found that “no instrument had adequate levels of evidence for all criteria, and no criteria were met by all instruments” [
64]. Our review also resulted in similar findings to the original review by Bailhache et al. [
13], in that 1) most studies did not report sufficient information to judge all criteria in the risk of bias tool; 2) most studies did not clearly blind the analysis of the reference standard from the index test (or the reverse); 3) some studies [
26,
36,
37,
39] included the index test as part of the reference standard (incorporation bias), which can overestimate the accuracy of the index test; and 4) some studies used a case-control design [
29,
31,
36], which can overestimate the performance of the index test. A particular challenge, also noted by Bailhache et al. [
13], was the quality of reporting in many of the included studies. Many articles failed to include clear contingency tables when reporting their results, making it challenging for readers to fully appreciate missing values and potentially inflated sensitivity and specificity rates. For example, one study evaluating the SPUTOVAMO checklist reported 7988 completed checklists. However, only a fraction of these were evaluated by the reference standard (193/7988, 2.4%; verification bias, discussed further below), and another reference standard (a local CPS agency) was used to evaluate an additional portion of checklists (246/7988, 3.1%). Moreover, the negative and positive predictive value calculations were based on different confirmed cases. Ideally, missing data and indeterminate values should be reported [
23]. Researchers have increasingly called for diagnostic accuracy studies to report indeterminate results in sensitivity analyses [
65].
Verification bias was a particular study design challenge in the screening studies identified in this review. For example, Dinpanah et al. [
25] examined the accuracy of the Escape instrument, a five-question screener applied in emergency department settings, for identifying children potentially exposed to physical abuse, sexual abuse, emotional abuse, neglect, or intimate partner violence. The authors report a sensitivity and specificity of 100% and 98%, respectively. While the accuracy was high, the study suffered from serious verification bias, as only approximately 137 of 6120 (2.2%) children suspected of having been maltreated received the reference standard. For the children who did not receive the reference standard, there is no way to ascertain how many were potentially maltreated but unidentified (false negatives). Furthermore, as inclusion in this study relied on a convenience sample of children/families who a) gave consent for participation and b) cooperated in filling out the questionnaire, we do not know whether the children in this study were representative of the study population. In addition, unlike screening tools for intimate partner violence [
66,
67], none of the tools for screening for possible maltreatment have been evaluated through randomized controlled trials; as such, we have no evidence about the effectiveness of such tools in reducing recurrence of maltreatment or improving child well-being.
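To illustrate why partial verification inflates accuracy, consider a hypothetical cohort in which maltreated children among the unverified screen-negatives are never counted; apparent sensitivity then exceeds true sensitivity (the numbers below are invented for illustration):

```python
def sensitivity_with_verification_bias(tp, fn_verified, fn_unverified):
    """Apparent sensitivity counts only verified cases; true sensitivity
    also counts maltreated children hidden among unverified screen-negatives."""
    apparent = tp / (tp + fn_verified)
    true = tp / (tp + fn_verified + fn_unverified)
    return apparent, true

# Hypothetical: 20 verified true positives, 0 verified false negatives,
# 5 maltreated children hidden among unverified screen-negatives
apparent, true = sensitivity_with_verification_bias(20, 0, 5)
```

In this invented scenario the study would report a sensitivity of 100% even though the instrument actually detected only 80% of maltreated children, mirroring the concern raised about the Escape study above.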
This review identified one study which evaluated a screening tool that did not suffer from serious verification bias or incorporation bias. Sittig et al. [
27] evaluated the ability of the SPUTOVAMO five-question checklist to identify potential physical abuse or neglect in children under the age of 8 years who presented to an emergency department with any physical injury. While no children exposed to potential physical abuse were missed by this tool, at a population level a large number of children were falsely identified as potentially physically abused (over 13,000 per 100,000); furthermore, many children potentially exposed to neglect were missed by this tool (334 per 100,000). Qualitative research suggests that physicians report having an easier time detecting maltreatment based on physical indicators, such as bruises and broken bones, but more difficulty identifying less overt forms of maltreatment, such as ‘mild’ physical abuse, emotional abuse, and children’s exposure to intimate partner violence [
68]. The authors of this study suggest that the SPUTOVAMO “checklist is not sufficiently accurate and should not replace skilled assessment by a clinician” [
27].
The poor performance of screening tests for identifying children potentially exposed to maltreatment that we found in this review leads to a similar conclusion to that reached for the World Health Organization’s Mental Health Gap Action Programme (mhGAP) update, which states that “there is no evidence to support universal screening or routine inquiry” [
69]. Based on the evidence, the mhGAP update recommends that, instead of screening, health care providers use a case-finding approach to identify children exposed to maltreatment by being “alert to the clinical features associated with child maltreatment and associated risk factors and assess for child maltreatment, without putting the child at increased risk” [
69]. As outlined in the National Institute for Health and Clinical Excellence (NICE) guidance for identifying child maltreatment, indicators of possible child maltreatment include signs and symptoms; behavioural and emotional indicators or cues from the child or caregiver; and evidence-based risk factors that prompt a provider to consider, suspect or exclude child maltreatment as a possible explanation for the child’s presentation [
70]. The NICE guidance includes a full set of maltreatment indicators that have been determined based on the results of their systematic reviews [
70]. This guidance also discusses how providers can move from “considering” maltreatment as one possible explanation for the indicator to “suspecting” maltreatment, which in many jurisdictions invokes a clinician’s mandatory reporting duty. In addition, there are a number of safety considerations that clinicians must address before inquiring about maltreatment, such as ensuring that, for children of an age and developmental stage where asking about exposure to maltreatment is feasible, the inquiry occurs separately from their caregivers, and that systems for referral are in place [
71].
The findings of this review have important policy and practice implications especially since, as noted in the introduction, there is an increasing push to use adverse childhood experiences screening tools in practice [
15,
16]. We are not aware of any diagnostic accuracy studies evaluating adverse childhood experiences screening tools, and it is unclear how these tools are being used in practice, or how they will be used in the future [
72]. For example, does a provider who learns a child has experienced maltreatment via an adverse childhood experiences screener then inform CPS authorities? What services is the child entitled to based on the findings of an adverse childhood experiences screener, if the child indicates they have experienced child maltreatment along with other adverse experiences? The findings of the present review suggest that additional research is needed on various child maltreatment identification tools (further accuracy studies, along with studies that assess acceptability, cost effectiveness, and feasibility) before they are implemented in practice. The findings also suggest the need for more high-quality research about child maltreatment identification strategies, including well-conducted cohort studies that follow a sample of children identified as not maltreated (to reduce verification bias) and randomized controlled trials that assess important outcomes (e.g., recurrence and child well-being outcomes) in screened versus non-screened groups. The results of randomized controlled trials that have evaluated screening in adults experiencing intimate partner violence underscore the need to examine the impacts of screening [
66,
67]. Similar trials in a child population could help clarify risks and benefits of screening for maltreatment. Future systematic reviews that assess the accuracy of tools that attempt to identify children exposed to maltreatment by evaluating parental risk factors (e.g., parental substance use) would also complement the findings of this review.
Strengths and limitations
The strengths of this review include the use of a systematic search to capture identification tools, the use of an established study appraisal checklist, calculations of false positives and negatives per 100 where prevalence estimates were available (which may be more useful for making clinical decisions than sensitivity and specificity rates), and the use of GRADE to evaluate the certainty of the overall evidence base. A limitation is that we included English-language studies only. There are limitations to the evidence base, as studies were rated as unclear or high risk of bias and the overall certainty of the evidence was low. Additional limitations include our reliance on 2 and 10% prevalence rates commonly seen in emergency departments [
20] and our use of American estimates to model potential service outcomes following a positive screen (e.g., the number of children who receive services post-investigation). These prevalence rates may not generalize to countries where local prevalence is unknown. For example, one study evaluated the Escape instrument in an Iranian emergency department. While the authors cite the 2 to 10% prevalence rate in their discussion [
25], we are unaware of any studies estimating the prevalence of child maltreatment in Iranian emergency departments. Where local prevalence is known, practitioners are encouraged to use the formulas in the methods section (or GRADEpro) to estimate false positives and negatives based on the prevalence rates of their setting, as well as known estimates of service responses in their country, in order to make informed decisions about the use of various identification strategies. Furthermore, our modelling of service outcomes assumes that 1) all positive screens will be reported and 2) reports are necessarily stressful/negative. While many of the included studies that used CPS as a reference standard reported all positive screens, it is unclear whether this would be common practice outside a study setting (i.e., does a positive screen trigger one’s reporting obligation?). Further research is needed to determine the likely outcomes of positive screens. It is also important to recognize that while reviews of qualitative research find that caregivers and mandated reporters have negative experiences and perceptions of mandatory reporting (and associated outcomes), there are some instances where reports are viewed positively by both groups [
68,
73]. Finally, because our review followed the inclusion/exclusion criteria of Bailhache et al. [
13] and excluded studies that did not explicitly set out to evaluate sensitivity, specificity, positive predictive value, or negative predictive value, it is possible that additional studies exist from which such information could be calculated.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.