Introduction
Motor development refers to the ability of children to move and interact with the environment and is very important in early childhood [
1]. Proper motor development provides an opportunity for children to explore and participate in the world around them [
2]. Several studies have shown that motor development is closely associated with children’s cognitive ability [
3], language [
4], executive functioning [
5], and quality of life [
6]. Children with poor motor development reportedly have poor academic performance as well as depression and anxiety [
7]. In addition, impaired motor development in early childhood can impact learning abilities, which may persist through adolescence or even later in life [
8]. Motor disorders in children are associated with a lower quality of life in several domains, including physical, cognitive, emotional and social functioning [
6]. Children with motor dyspraxia (developmental disorder) require motor intervention to promote their motor skills and to prevent postural abnormalities [
9]. Therefore, early prediction of motor function is important for further intervention and education [
10]. Many assessment instruments or scales have been developed to accurately and efficiently screen for motor development problems in children [
11,
12]. The Peabody Developmental Motor Scales-2 (PDMS-2) is widely used in paediatric practice and research studies to assess the gross and fine motor skills of children from birth to 6 years of age [
13]. The PDMS-2 has been improved and updated based on reviews of the PDMS, comments and queries from the testers and the authors’ own experiences [
14]. The key changes in the PDMS-2 include the collection of a more representative sample, the introduction of a different test structure and more specific scoring criteria [
15].
The measurement properties of an instrument are described and defined by the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN). According to the COSMIN methodology, reliability, validity and responsiveness are the main domains. Reliability is categorized into test-retest, interrater and intrarater reliability, and validity is categorized into content, construct (structural, cross-cultural, hypothesis testing) and criterion validity [
16]. Since the publication of PDMS-2, many studies have examined the measurement properties of this scale. The measurement properties of the original version have been assessed in English-speaking countries [
17‐
19], while the measurement properties of the translated versions have been assessed in non-English-speaking countries [
20,
21]. Although several studies have found the reliability and validity of the PDMS-2 to be sufficient, there are some contradictory reports. For example, the concurrent validity of the PDMS-2 with the Bayley Scales of Infant Development II Motor Scale (BSID-II) has been reported both as “high correlation” [
22] and “low correlation” [
19]. Despite the heterogeneity of studies on the measurement properties of PDMS-2, no systematic review has addressed this issue. Since PDMS-2 is widely used by clinicians, therapists, psychologists and diagnosticians [
14], establishing consistent evidence on its measurement properties is highly warranted.
The COSMIN methodology is typically employed to evaluate the measurement properties of various tools/scales of a certain field [
23,
24]. Hulteen et al. employed the COSMIN methodology in their systematic review of the measurement properties of several motor assessment scales in children and adolescents [
25]. The COSMIN methodology can also be used to review the measurement properties of a single measurement instrument, such as the Body Image Scale [
26]. As the reported results on the measurement properties (reliability, validity, and responsiveness) of PDMS-2 are inconclusive, the COSMIN methodology could help to resolve this inconsistency. Therefore, we searched for studies that determined the measurement properties of PDMS-2 and employed the COSMIN methodology to conduct a systematic review of these properties. In this review, we summarize the state of research on the measurement properties of PDMS-2 and synthesize the quality of evidence via the COSMIN methodology.
Methods
Literature search strategy
The PubMed, EMBASE, Web of Science, CINAHL and MEDLINE databases were searched through January 2023 for relevant studies that assessed the different measurement properties of PDMS-2. The search terms used to identify the scale (PDMS-2) were “Peabody developmental motor scales-2” OR “PDMS-2” OR “Peabody developmental motor scales-second edition” OR “Peabody developmental motor scales-2nd”. The search term used to identify the measurement properties was a filter developed by the Patient Reported Outcome Measures (PROMs) Group at the University of Oxford, a high-sensitivity search filter that has been validated by Terwee et al. [
27]. For the article search, we followed the latest version of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA, 2020) guidelines [
28]. The full texts of the selected articles were downloaded from the journals’ homepages. When necessary, we contacted our university library or external collaborators to obtain the full-text articles. The study protocol was registered in PROSPERO (
https://www.crd.york.ac.uk/prospero/; CRD42022376335).
Inclusion and exclusion criteria
The included literature met the following criteria: (1) the study was conducted on children aged 0–6 years; (2) the study addressed the evaluation of the PDMS-2 measurement properties; and (3) at least one of the scale’s measurement properties was evaluated in the study. The measurement properties of the PDMS-2 include content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypothesis testing for construct validity, and responsiveness. The collected literature was excluded if it met any of the following criteria: (1) used PDMS-2 to investigate children’s motor development; (2) used PDMS-2 to assess the effectiveness of an intervention; (3) was a review or systematic review; or (4) was available only as an abstract without a full-text article or was not peer reviewed.
Literature selection and data extraction
The literature search, article selection and data extraction were independently performed by two researchers (YZ and JH), and the results were compared with the help of another author (YQ). Any disagreements were resolved by discussion with other review authors (WY and MK). The literature was imported into EndNote, and duplicates were first excluded. Subsequently, the titles and abstracts of the collected articles were read, and irrelevant articles were excluded. The full texts of the remaining articles were subsequently read and screened according to our study criteria.
The following information was extracted from the literature: first author name, year of publication, studied population and source, region, sample size, age and sex of the children, language version of the PDMS-2 used, measurement properties of the PDMS-2 (content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypothesis testing for construct validity, and responsiveness), and data on the measurement properties.
Evaluation of the risk of bias and quality of evidence of the included studies
We used the COSMIN risk of bias checklist [
29] to assess the methodological quality of the studies. The checklist consists of ten sections, including “PROM development, content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypothesis testing for construct validity, and responsiveness”. Appropriate boxes were selected according to the measurement properties of the study. The methodological quality of the studies was assessed as “very good”, “adequate”, “doubtful” or “inadequate” on an item-by-item basis according to the standard score given in the boxes. The overall methodological quality rating of the studies was based on the “worst score principle”. The worst score of the criteria in the box was regarded as the overall methodological quality rating of the study.
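The “worst score principle” described above can be illustrated with a minimal sketch. The function name and the example item ratings are hypothetical and serve only to show the rule; they are not taken from the included studies.

```python
# Ratings ordered from best to worst, as named in the COSMIN checklist.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score(item_ratings):
    """Return the worst rating among the items of one checklist box."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one box of one study.
example_box = ["very good", "adequate", "doubtful", "very good"]
overall = worst_score(example_box)
print(overall)  # doubtful
```

A single “doubtful” item therefore determines the overall rating of the box, regardless of how many items were rated “very good”.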
The quality of evidence was synthesized according to the modified version of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) method [
24]. This method adapts the original version to accommodate the COSMIN method. The evidence levels were categorized as “high”, “moderate”, “low” or “very low” according to the standard. The starting level of evidence for the included studies was “high”, and the evidence was subsequently downgraded according to the characteristics of the included studies. Unlike the original GRADE method, the modified version removes the “publication bias” factor. The quality of evidence was downgraded according to the risk of bias, inconsistency, indirectness, and imprecision.
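The downgrading logic can be sketched as follows. This is a simplified illustration that assumes the reviewer has already decided how many levels to subtract for each of the four factors; the function name and the example values are hypothetical.

```python
# Evidence levels ordered from best to worst.
LEVELS = ["high", "moderate", "low", "very low"]
# The four downgrading factors of the modified GRADE approach.
FACTORS = ("risk_of_bias", "inconsistency", "indirectness", "imprecision")


def grade_evidence(downgrades):
    """Start at "high" and subtract the given number of levels per factor.

    `downgrades` maps a factor name to the number of levels to subtract
    (an assumption of this sketch: the reviewer supplies these numbers).
    """
    unknown = set(downgrades) - set(FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    if any(steps < 0 for steps in downgrades.values()):
        raise ValueError("downgrade steps must not be negative")
    total = sum(downgrades.get(factor, 0) for factor in FACTORS)
    # The level cannot fall below "very low".
    return LEVELS[min(total, len(LEVELS) - 1)]


# Hypothetical example: one level for risk of bias, one for imprecision.
print(grade_evidence({"risk_of_bias": 1, "imprecision": 1}))  # low
```

Publication bias is deliberately absent from the factor list, in line with the modified version described above.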
Overall rating of the measurement properties
The overall rating of each measurement property of the PDMS-2 was assessed according to the user manual of the COSMIN methodology for systematic reviews of PROMs (COSMIN manual) [
30] and the user manual of the COSMIN methodology for assessing the content validity of PROMs [
31]. The items included “content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypothesis testing for construct validity, and responsiveness” (Table
S1). The reported items for each measurement property were rated as “sufficient (+)”, “insufficient (-)”, or “indeterminate (?)” (Table
S2). The overall rating of each measurement property was given as “sufficient (+)”, “insufficient (-)”, “inconsistent (±)”, or “indeterminate (?)”. Inconsistent results were analysed in groups to explore the reasons for this difference.
For reliability, studies were considered sufficient if the Pearson correlation coefficient [
32] or Spearman’s rho correlation coefficient [
33] was ≥ 0.80. Hypothesis testing for construct validity requires the reviewer team to set hypotheses in advance. The hypothesis for this study was as follows: for convergent or concurrent validity, the correlation coefficient with the comparator instrument was expected to be ≥ 0.50 if the comparator measured a construct similar to that measured by the PDMS-2. Construct validity was rated as sufficient (+) if at least 75% of the results were in accordance with the hypotheses, insufficient (−) if at least 75% of the results were not, or indeterminate (?) if no hypotheses were defined.
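The rating rules in this paragraph can be expressed as a short sketch. The function names and example correlations are hypothetical. Returning “±” when neither 75% condition is met is an assumption of this sketch, borrowed from the “inconsistent” category of the overall rating; the text does not define that case for hypothesis testing.

```python
def rate_reliability(coefficient, threshold=0.80):
    """Rate reliability as sufficient (+) if the coefficient is >= 0.80."""
    return "+" if coefficient >= threshold else "-"


def rate_construct_validity(correlations, threshold=0.50,
                            hypotheses_defined=True):
    """Apply the 75% rule to correlations with comparator instruments."""
    if not hypotheses_defined or not correlations:
        return "?"
    n = len(correlations)
    confirmed = sum(1 for r in correlations if r >= threshold)
    # Integer comparison avoids floating-point issues at exactly 75%.
    if 4 * confirmed >= 3 * n:
        return "+"
    if 4 * (n - confirmed) >= 3 * n:
        return "-"
    return "±"  # assumption: neither condition met


# Hypothetical correlations with a comparator instrument.
print(rate_construct_validity([0.62, 0.71, 0.55, 0.48]))  # +
print(rate_reliability(0.85))  # +
```

In the first example, three of the four correlations (75%) reach 0.50, which is just enough for a sufficient rating.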
Discussion
To the best of our knowledge, this is the first systematic review in which the COSMIN methodology was used to assess the measurement properties of PDMS-2. In this study, we evaluated the different properties of PDMS-2, which were reported in 22 articles. According to the COSMIN manual, any measurement instrument or scale with sufficient evidence for content validity (any level quality) or internal consistency (at least low quality) can be categorized as “A” [
30]. Our results showed that the content validity of the PDMS-2 had sufficient moderate-quality evidence, and the internal consistency of the PDMS-2 had sufficient high-quality evidence. These findings indicate that the PDMS-2 can be graded as “A” and can therefore be used in motor development research and in clinical settings. The COSMIN manual further states that the results obtained from any “A” grade scale can be trusted [
30].
According to the COSMIN manual, content validity is the most important property of a measurement instrument or scale [
30]. Burns and Grove stated that content validity is obtained from three sources: literature, patient judgement (judgement of representatives of the relevant populations), and expert judgement [
60]. The most commonly used source of content validity is expert judgement [
61], and the COSMIN method combines patient judgement with expert judgement to assess three parts of content validity: relevance, comprehensiveness, and comprehensibility [
30]. In our assessment, only one study reported the content validity of the PDMS-2 [
59]. However, that study examined the content validity of the PDMS-2 by consulting experts in related fields but not patients/participants [
59]. When using the PDMS-2, patients (children) must complete their movements only following the instructions of the evaluator and do not need to understand the meaning of the PDMS-2 items [
14]. Therefore, although no studies assessing the comprehensibility of PDMS-2 were found, we still consider the content validity of PDMS-2 to be sufficient.
For the assessment of structural validity, the COSMIN quality criteria cover two approaches, namely, classical test theory (CTT) and item response theory (IRT) [
30,
62]. All the studies addressing structural validity in our analyses used the CTT method. Although the CTT easily assesses structural validity, the results from the IRT are said to be more reliable in educational and psychometric fields [
63]. Because of its high accuracy, IRT would be well suited to assessing the structural validity of the PDMS-2 [
63]. However, at present, no study has used IRT to evaluate the structural validity of the PDMS-2, and further IRT-based studies are necessary.
According to the COSMIN manual, cross-cultural validity/measurement invariance has been defined as “the degree to which the performance of the items on a translated or culturally adapted measurement instruments are an adequate reflection of the performance of the items of the original version of the measurement instruments” [
30]. In our analyses, we found no studies that assessed the cross-cultural validity/measurement invariance of the PDMS-2 using the COSMIN-recommended method. We suggest further research on the cross-cultural validity/measurement invariance of the PDMS-2.
The results of the construct validity test demonstrated that the PDMS-2 is well correlated with most of the same-domain measurement instruments. However, the results of the three studies comparing the PDMS-2 with the BSID-II differed, which might be due to differences in sample type. Of these three studies, one study recruited normally developing children [
19], and two studies recruited exceptional children [
22,
57]. The concurrent validity of the PDMS-2 with the BSID-II among normal children was insufficient because of the small sample size (
n = 15, i.e., < 50) [
19]. However, the concurrent or convergent validity among exceptional children was found to be sufficient, with high-quality evidence (sample size 198, i.e., > 100) [
22,
57]. The COSMIN stated that high-quality studies provide stronger evidence than low-quality studies and can be considered decisive in determining the overall rating when ratings are inconsistent [
30]. Overall, our findings indicated that the concurrent validity of the PDMS-2 with the BSID-II was sufficient. Next, we examined the convergent validity of the PDMS-2 with the Movement Assessment Battery for Children (M-ABC) in two studies [
47,
48]; the results were sufficient for the gross motor quotient (GMQ) and inconsistent for the fine motor quotient (FMQ). As the sample size was small and the assessment ratings were inconsistent, the quality of evidence for the convergent validity of the PDMS-2 with the M-ABC was rated as very low.
The risk of bias of reliability and measurement error was not judged according to the retest interval recommended by the COSMIN risk of bias checklist (approximately two weeks) because of the rapid growth rate of children aged 0 to 6 years. Instead, we judged the risk of bias in the studies using the retest interval (approximately one week) described by Lee et al. [
32]. A suitable measurement error requires that the smallest detectable change (SDC) of the measurement instrument is less than the minimal important change (MIC) [
64]. Only one study reported the SDC and MIC [
54]. The MIC is best calculated from multiple studies and using multiple anchors [
65]. Therefore, one study alone is not convincing, and we suggest further studies using multiple anchors to verify the MIC results.
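The comparison between the SDC and the MIC can be sketched as below. The formula SDC = 1.96 × √2 × SEM (with SEM the standard error of measurement) is the conventional definition and is not stated in the text above; the function names and numeric values are hypothetical and do not come from the included study.

```python
import math


def smallest_detectable_change(sem):
    """Conventional SDC at the 95% level: 1.96 * sqrt(2) * SEM."""
    if sem < 0:
        raise ValueError("SEM must not be negative")
    return 1.96 * math.sqrt(2) * sem


def rate_measurement_error(sdc, mic=None):
    """Sufficient (+) if SDC < MIC; indeterminate (?) if no MIC exists."""
    if mic is None:
        return "?"
    return "+" if sdc < mic else "-"


# Hypothetical values, for illustration only.
sdc = smallest_detectable_change(sem=1.5)
print(round(sdc, 2), rate_measurement_error(sdc, mic=5.0))
```

Because the rating depends directly on the MIC, an MIC estimated from a single study and a single anchor makes the resulting rating equally uncertain.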
Responsiveness refers to the ability of a scale to detect change over time in the construct to be measured [
30]. The results of the two included studies [
53,
54] showed sufficient responsiveness of PDMS-2, but the quality of evidence of these two studies was low. There are two reasons for this. First, these two studies did not describe the intervention details. Second, Wang et al. [
53] used a statistical method (Guyatt’s responsiveness ratio), which is not recommended by COSMIN [
30]. According to the COSMIN manual, Guyatt’s responsiveness ratio takes the minimal important change into account [
30]. The minimal important change concerns the interpretation of the change score, not the validity of the change score [
30]. Low-quality evidence is therefore not enough to confirm whether the responsiveness of the PDMS-2 before and after the intervention is sufficient or insufficient.
In addition to the abovementioned outcome measures in COSMIN, interpretability and feasibility are also important variables for evaluating the measurement properties of PDMS-2 [
30]. In our assessment, one study [
54] reported no ceiling or floor effects when using the PDMS-2 to assess the motor development of children. The absence of ceiling and floor effects indicates good interpretability of the PDMS-2. According to the results of previous studies of PDMS-2 [
14], we assumed that the use of PDMS-2 is highly feasible and that a specific environment and/or equipment are not necessary to assess motor development in children.
The synthesized evidence of the measurement properties of PDMS-2 is comparable to that of other well-known similar domain measurement instruments, such as M-ABC, BOT-2, Bayley-III, and BSID-II. For instance, a previous study reported that the interrater reliability, test-retest reliability and content validity of the M-ABC were good, but mixed results were reported for internal consistency and cross-cultural validity [
66]. The BOT-2 scale was reported to have excellent interrater reliability, test-retest reliability, and internal consistency [
66]. Another study reported that the internal consistency and test-retest reliability of the Bayley-III were good [
35]. In addition, the interrater reliability, internal consistency, and test-retest reliability of the BSID-II were reported to be sufficient [
67]. Our findings demonstrate that the PDMS-2 has sufficient content validity, structural validity, internal consistency, reliability and measurement error with moderate to high-quality evidence.
Limitations and future perspectives
Our results could not establish the quality of evidence for the cross-cultural validity of PDMS-2 because no studies have assessed the cross-cultural validity of PDMS-2 via the COSMIN-recommended methodology. For the article search, Cochrane reviews use various additional sources, including dissertations, editorials, and conference proceedings. However, the probability of finding additional relevant articles for systematic reviews from these sources appears to be low [
24]. As we excluded non-peer-reviewed articles from our study, our conclusions are unlikely to have been influenced by such articles; however, we cannot completely exclude this possibility.
To date, no study has addressed the cross-cultural validity of PDMS-2 by the COSMIN-recommended method. In addition, only one study assessed the measurement error of PDMS-2. Therefore, further studies are necessary to assess the cross-cultural validity and measurement error of PDMS-2. These measurement properties can then be used to determine the overall rating and quality of evidence by the COSMIN methodology. We further suggest that future studies on the responsiveness of PDMS-2 use methods recommended by the COSMIN methodology.