Background
The Strengths and Difficulties Questionnaire (SDQ) [16, 17, 19] is widely used to screen for psychosocial problems among adolescents. The questionnaire is valued for several reasons, among them the availability of SDQ versions for adolescents themselves, their parent(s) and their teacher(s), and its focus on both strengths and difficulties, whereas many other questionnaires focus only on problems. The SDQ covers five domains of psychosocial behaviour: emotional difficulties, conduct difficulties, hyperactivity/inattention, social difficulties and prosocial behaviour. The questionnaire is relatively short, especially in comparison to the well-known Child Behavior Checklist (CBCL [1]) and its self-report version, the Youth Self Report (YSR [2]), which contain scales measuring similar concepts.
An individual’s SDQ scale scores are typically interpreted using norms based on the general population. For the SDQ, cutoffs based on these norms are typically set so that the scores of the 10% most extremely scoring individuals (i.e., high on the difficulties scales, low on the prosocial behavior scale) are classified as ‘abnormal’, the scores of the next 10% as ‘borderline’, and the rest as ‘normal’ [16]. Thus, the classifications are based on norms corresponding to the 80th and 90th percentiles for the difficulties scales, and to the 10th and 20th percentiles for the prosocial behavior scale.
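The classification rule can be sketched in Python. The sketch assumes a mid-rank percentile convention (the share of the norm group scoring below a score, plus half of those scoring exactly that score) and uses hypothetical function names; the exact percentile definition behind published SDQ cutoffs may differ. For the prosocial scale, where low scores are the extreme ones, the percentile rank is mirrored before the same 80/90 thresholds are applied.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Mid-rank percentile of `score` within a norm group (assumed convention)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)            # how many score strictly lower
    tied = bisect_right(ordered, score) - below    # how many score exactly the same
    return 100.0 * (below + 0.5 * tied) / len(ordered)


def classify(score, norm_scores, higher_is_worse=True):
    """Map a scale score to 'normal', 'borderline' or 'abnormal'.

    Difficulties scales: higher_is_worse=True (extreme = high scores).
    Prosocial scale: higher_is_worse=False (extreme = low scores), so the
    percentile rank is mirrored and the 10th/20th percentiles act as 90/80.
    """
    pr = percentile_rank(score, norm_scores)
    extremity = pr if higher_is_worse else 100.0 - pr
    if extremity > 90.0:
        return "abnormal"
    if extremity > 80.0:
        return "borderline"
    return "normal"


# Hypothetical norm group: 100 individuals with scores 0..99, one each.
norm = list(range(100))
print(classify(95, norm), classify(85, norm), classify(50, norm))
print(classify(5, norm, higher_is_worse=False), classify(15, norm, higher_is_worse=False))
```

With this invented norm group the first line prints `abnormal borderline normal` and the second `abnormal borderline`. Real SDQ scales have few possible scores and many ties, which is where the choice of percentile convention starts to matter.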
Since its development, norms have been published for the original English SDQ and for several translations of its self-reported and parent-reported versions. The use of these versions for screening purposes is supported by ample evidence for their validity [6, 18, 34, 44, 45, 51]. To understand how useful the norms for these SDQ versions are among adolescents, three aspects are important to consider. The first is the availability of age-specific norms. As the severity of psychosocial problems is known to be related to age [10, 12], norms for adolescents should be calculated from a sample consisting only of adolescents. We found such norms for multiple parent-reported SDQ translations (e.g., Danish [5], Dutch [24], English, USA [20], English, Australia [26], Italian [42], Japanese [29], Swedish [7]) and for several self-reported SDQ translations (e.g., Danish [5], English, UK [19], Hebrew [23]). Only the Swedish parent-reported version has norms per year of age (10 to 13 years). These norms show that the same SDQ scale score corresponds to different percentile ranks across age groups, which suggests that norms per year of age are more appropriate than norms covering wider age ranges.
The second aspect to consider is the national or geographical background of the individuals in the adolescent sample on which the norms were based. For both the parent-reported [5, 7, 24, 42] and the self-reported [5, 19] SDQ versions, the SDQ scale score identified as the cutoff for the ‘abnormal’ classification (90th percentile) differed somewhat across language versions. This suggests that norms may be of limited use in national, cultural or geographical populations other than the one for which they were determined.
The third aspect to consider is whether the available norms are gender-specific or not. Gender-specific norms allow an adolescent’s scores to be compared with the scores of other adolescents of the same gender; applying the ‘abnormal’ cutoffs based on these norms identifies the 10% most extremely scoring adolescents within each gender group. In contrast, joint norms allow an adolescent’s scores to be compared with those of adolescents in general; applying the ‘abnormal’ cutoffs based on these norms identifies the 10% most extremely scoring adolescents overall, potentially flagging relatively more males than females on some subscales, and vice versa on others. The preference for either gender-specific or joint norms depends on whether the SDQ scales measure the intended strengths and difficulties in the same way among male and female adolescents (i.e., whether measurement invariance holds across gender). Joint norms are more appropriate if measurement invariance holds, and gender-specific norms if it does not. Note that even when a measurement invariance analysis [28] yields no evidence against measurement invariance, a lack of invariance cannot be ruled out: if all items within a scale have a different meaning for boys than for girls, there is no way to distinguish between a lack of measurement invariance and a difference in latent means across genders. Underlying this preference for gender-specific versus joint norms is a debate about (a) to what extent the DSM-IV [4] and ICD-10 [54] criteria on which the SDQ items were based are valid for both genders (e.g., SDQ scales were found to be predictive of Attention-Deficit/Hyperactivity Disorder (ADHD) and Autism Spectrum Disorder (ASD) [6, 16, 49], and for both disorders the criteria have been identified as being based on male representations of the disorders [3, 13, 30, 52]), (b) how stereotypes affect the accuracy with which individuals who are key to referral and diagnostic processes recognize and report an adolescent’s problem behavior, and (c) who needs to be identified with the help of SDQ scale scores (e.g., do we want to identify adolescents who manage to compensate for their symptoms or not?).
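The effect of the reference group on who gets flagged can be illustrated with a small Python sketch. The scores below are invented for illustration (they are not SDQ data), the 90th percentile is computed with `statistics.quantiles(..., method="inclusive")`, and a score is flagged when it exceeds that cutoff; all of these are assumptions of the sketch rather than the procedure used in this study.

```python
import statistics


def p90_cutoff(scores):
    """90th percentile with linear interpolation ('inclusive' method)."""
    return statistics.quantiles(scores, n=10, method="inclusive")[-1]


def detection_rate(group, cutoff):
    """Share of a group scoring above the cutoff."""
    return sum(s > cutoff for s in group) / len(group)


# Hypothetical scale scores; girls score somewhat higher on average here.
girls = list(range(2, 22))
boys = list(range(0, 20))

joint_cut = p90_cutoff(girls + boys)
print("joint:", detection_rate(girls, joint_cut), detection_rate(boys, joint_cut))
# joint: 0.15 0.05
print("gender-specific:",
      detection_rate(girls, p90_cutoff(girls)),
      detection_rate(boys, p90_cutoff(boys)))
# gender-specific: 0.1 0.1
```

With these invented scores the joint cutoff flags 15% of girls and 5% of boys (10% overall), whereas gender-specific cutoffs flag 10% within each group. Which of the two is preferable is exactly the measurement-invariance question raised above; the arithmetic alone cannot settle it.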
For Dutch adolescents, norms based on an adolescent sample are available only for the parent-reported SDQ version [24]. These norms are neither age-specific nor gender-specific, and they have two additional weaknesses. The first is that their accuracy may be compromised, because the normative sample was relatively small (n = 395) and potentially not fully representative of the Dutch adolescent population. Consequently, the resulting cutoff scores may be based on biased norm estimates with substantial uncertainty due to sampling fluctuations. The second weakness is that these norms only include scores approximately corresponding to the 90th percentile, thereby identifying only the ‘abnormal’ category; norms for the other categories are lacking. This dichotomization of SDQ scores implies a loss of information and is arguably less useful for mental healthcare professionals. To better support these professionals, it would be useful to (re)determine norms for the Dutch adolescent-reported and parent-reported SDQ versions. The use of these two versions is supported by ample evidence for their construct validity, including convergent and discriminant validity [40, 45, 50, 51], and for their criterion validity [40, 49, 51].
The aim of the current study is to present gender-specific and joint normative data per year of age for the self-reported and parent-reported SDQ versions, for use among 12- to 17-year-old Dutch adolescents. We aim to present accurate norms for the Dutch general adolescent population by calculating them from reasonably large adolescent samples (self-report version: n = 993; parent-report version: n = 736), while accounting for potential problems with sample representativeness regarding gender, socioeconomic status and ethnic background. Additionally, we interpreted the data using the widely used British norms and the published, but potentially only moderately useful, Dutch norms, neither of which are age-specific or gender-specific.
Discussion
The SDQ is widely used to screen for psychosocial problems among adolescents. Norms for interpreting SDQ scale scores are available for multiple language versions of the questionnaire. However, for none of these language versions have both joint and gender-specific norms per year of age been established, even though the occurrence of psychosocial problems is known to be related to age [10, 12] and gender [9, 27, 48]. We addressed this gap by providing such norms for the Dutch self-reported and parent-reported SDQ versions for use among 12- to 17-year-old adolescents. The norms showed age and gender effects in the reported extent to which problems occur.
The Dutch self-reported and parent-reported SDQ versions were introduced in 2003 [45], with UK joint norms available for interpreting SDQ scale scores [16, 19]. In 2019, Dutch norms were published for the parent-reported SDQ version [24]. In our norm groups, cutoffs based on the UK norms and on the pre-existing Dutch norms yielded detection rates substantially different from the intended 10% most extremely scoring adolescents. Compared to the pre-existing UK and Dutch norms, we expect our newly established norms to be more useful for interpreting Dutch adolescents’ scores because they are (a) fairly recent (norms can become outdated [14, 53]), (b) age-specific, (c) available for both the self-reported and the parent-reported SDQ versions, (d) established using regression-based (i.e., continuous) norming, and (e) based on reasonably large samples, with representativeness issues corrected for. Moreover, we provide not only joint norms but also gender-specific norms, thereby facilitating comparison of an adolescent’s scores with different reference groups.
The norms that we provide are so-called relative norms: the resulting norm-referenced test scores express the position of an adolescent relative to his or her peers [25]. This relative norming approach is common in screening practice, as individuals’ scores are typically interpreted relative to problem occurrence rates in a community population. A different approach is the criterion-referenced approach, in which the test score expresses the position of an adolescent in relation to an external criterion or standard [25]. This approach has been applied to obtain norms for the total difficulties scale of the self-reported SDQ version, with students’ subjective school well-being scores as the criterion [33]. The criterion-referenced approach is preferred over the relative approach when a clear, unambiguous external criterion exists for a test and that criterion can be measured reliably. In the absence of such a clear ‘gold standard’ criterion, the relative approach is preferred because of its clear interpretation, i.e., where the adolescent stands in relation to his or her peers.
In this paper, reliability estimates for the SDQ scales were presented. These estimates suggested that the conduct and peer difficulties scales of both the self-reported and the parent-reported SDQ versions, as well as the prosocial behaviour scale of the self-reported version, are insufficiently reliable to warrant their use. While we acknowledge that these scales should be interpreted with some caution, we would also like to point out that criterion validity evidence was found for the conduct difficulties scale of both SDQ versions and for the parent-reported social difficulties scale [49, 51]. These findings suggest that these scales are useful for screening purposes. The apparent contradiction between some of the reliability findings and the criterion validity findings possibly indicates that the scales in question measure sufficiently accurately where it matters, namely among adolescents with more severe problems, but not among adolescents without such problems. Investigating this issue goes beyond the scope of this paper, but it should be examined further, for example using the test information function from the Item Response Theory (IRT) framework.
As a final note, we would like to emphasize that the SDQ is not a diagnostic instrument. SDQ scores are meant to provide mental healthcare professionals with a preliminary indication of the nature and occurrence of the problems an adolescent is experiencing. If the SDQ scores give reason for concern, further clinical assessment is needed to determine how best to help the adolescent. In this process, clinicians should be aware of the risk of stigmatizing the individual. These words of caution are further supported by the fact that we could not identify ‘borderline’ cutoff values for some scales of the self-reported SDQ version, because the scale scores were strongly skewed and the number of possible scores is rather limited. This shows that these scales in particular can only make a crude distinction between adolescents in terms of problem occurrence at the higher levels, and that further assessment of the individual adolescent is needed to better understand the problems measured with these scales.
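Why a ‘borderline’ band can vanish is easy to see with a hypothetical, strongly skewed frequency distribution on a short scale. The Python sketch below assumes one possible operational rule (a score is ‘abnormal’ if at most 10% of the norm group scores at or above it, and ‘borderline’ if at most 20% do); the counts are invented and the rule is an assumption, not necessarily the one applied in this study.

```python
def classify_scores(freq):
    """Label every possible score, given freq: score -> count in the norm group."""
    total = sum(freq.values())
    labels = {}
    for s in sorted(freq):
        # share of the norm group scoring at or above s
        at_or_above = sum(c for t, c in freq.items() if t >= s) / total
        if at_or_above <= 0.10:
            labels[s] = "abnormal"
        elif at_or_above <= 0.20:
            labels[s] = "borderline"
        else:
            labels[s] = "normal"
    return labels


# Hypothetical, strongly skewed counts for a short scale (1000 respondents).
freq = {0: 750, 1: 160, 2: 50, 3: 25, 4: 10, 5: 5}
labels = classify_scores(freq)
print(labels)  # no score falls in the 'borderline' band
```

Because 25% of this hypothetical norm group scores 1 or higher but only 9% scores 2 or higher, the step from one possible score to the next jumps straight over the 80th to 90th percentile range.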
Limitations
The validity of the norms presented in this paper is potentially affected by four aspects. The first is our effort to correct for deviations of the norm groups from the Dutch adolescent population regarding ethnic background and gender by applying weights. To the best of our knowledge, weighting is an acceptable way to deal with these representativeness issues and presumably introduced little bias.
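A minimal sketch of how weights can correct a norm group for over- or under-represented strata (e.g., gender by ethnic background) is post-stratification: each respondent gets the ratio of the stratum’s population share to its sample share, and percentile ranks are then computed from weighted rather than raw counts. The strata, population shares and scores below are hypothetical, and the weighting scheme actually used for these norms may differ in detail.

```python
from collections import Counter


def poststratification_weights(strata, population_shares):
    """Weight = population share of the stratum / sample share of the stratum."""
    n = len(strata)
    counts = Counter(strata)
    return [population_shares[s] / (counts[s] / n) for s in strata]


def weighted_percentile_rank(score, scores, weights):
    """Mid-rank percentile using weighted rather than raw counts."""
    total = sum(weights)
    below = sum(w for x, w in zip(scores, weights) if x < score)
    tied = sum(w for x, w in zip(scores, weights) if x == score)
    return 100.0 * (below + 0.5 * tied) / total


# Hypothetical sample: girls over-represented (3 of 4) versus a 50/50 population.
strata = ["girl", "girl", "girl", "boy"]
scores = [4, 5, 6, 2]
weights = poststratification_weights(strata, {"girl": 0.5, "boy": 0.5})
print(weights)                                                  # girls ~0.667, boy 2.0
print(weighted_percentile_rank(5, scores, [1.0] * 4))           # unweighted: 62.5
print(round(weighted_percentile_rank(5, scores, weights), 6))   # weighted: 75.0
```

Up-weighting the under-represented, lower-scoring stratum moves the same raw score to a higher percentile rank; the weights sum to the sample size, so the effective size of the norm group is unchanged in this simple scheme.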
The second aspect specifically concerns the gender-specific norms. In Dutch, sex and gender are often indicated with the same word. As this word was used in the questionnaires, we do not know whether the responses actually indicate gender. Given the increased prevalence of adolescents whose gender identity does not match their biological sex, calling our norms gender-specific might be somewhat inaccurate, as we cannot be sure that gender rather than sex was reported for adolescents whose biological sex differs from their gender identity.
The third aspect that potentially affects the validity of the norms is that the norm groups were formed by combining three community samples, with the data from the most recent samples gathered four to seven years after the data from the other samples. By handling these data as if they came from one community sample, we assume that the occurrence rate of problems in the community population has not changed over time. We consider this assumption tenable, given the relatively short time span of at most seven years between collecting the data of the three samples.
The fourth aspect that potentially affects the validity of the norms is that our samples do not contain adolescents attending special education for lower cognitive levels (i.e., IQ lower than 55), for language, hearing or vision impairments, or for behavioral problems. This means that adolescents with severe behavioural problems, who typically attend special education, are not represented in this study. We presume that the effect on the representativeness of the data is limited, because the parent and adolescent norm groups seem representative of their respective populations in terms of socioeconomic status, using the mother’s educational level as a proxy.
Besides the self-reported and parent-reported SDQ versions, a Dutch SDQ version that can be completed by teachers is also available. In this paper, we focus on the self-reported and parent-reported versions, because their use is supported by ample validity evidence [40, 44, 49–51]. Such evidence is not available for the Dutch teacher-reported version. A potential reason for its absence is that teachers are less likely than adolescents themselves and their parents to be used as informants, because adolescents, compared to children, spend a very limited amount of time with each of their teachers. That being said, we do not know of any evidence indicating that the teacher version should not be used during adolescence. Therefore, it could be useful to establish norms for the teacher-reported SDQ version as well.
In this paper, norms were established for the eight SDQ scales, but not for the impact scale that was later added to the SDQ [17]. If an adolescent experiences difficulties in any of the domains covered by the SDQ, the impact scale is meant to provide an indication of the chronicity, distress and social impairment for the adolescent, as well as the burden for others. Consequently, we considered norming the impact scale independently to be irrelevant. Although beyond the scope of this paper, it would be useful to establish a method for interpreting impact scale scores in relation to scores on the difficulties scales of the Dutch self-reported and parent-reported SDQ versions.
Conclusions
This study provides joint and gender-specific norms (percentiles) per year of age for all scales of the Dutch adolescent self-reported and parent-reported SDQ versions, including the externalizing and internalizing difficulties scales. We provide percentiles for all possible scores on each SDQ scale, which allows both the ‘classic’ cutoffs (< 80th percentile = ‘normal’, 80th–90th percentile = ‘borderline’, > 90th percentile = ‘abnormal’) and cutoffs corresponding to any other desired percentile to be retrieved. These normative data thus also allow for cross-country and cross-cultural comparisons of adolescents’ psychosocial behavior.
The gender-specific norms yield different results than the joint norms. They confirm that females tend to report more internalizing problems, and that males and their parents tend to report more externalizing problems. The results show that detection rates depend on the reference group used to interpret the SDQ scale scores provided by adolescents and their parents. Note that the results cannot settle the debate on whether norms used in practice should be gender-specific or not. That question can be answered once agreement is reached in the ongoing debate on, among other things, how valid the DSM-IV/ICD-10 criteria are for both genders and whether and how stereotyping affects referral and diagnostic processes. Not knowing what the outcome of that debate will be, we present both types of norms, thereby facilitating comparison of an adolescent’s scores with different reference groups: all other adolescents of similar age, or all adolescents of similar age and the same gender.
In the Netherlands, an individual’s SDQ scale scores are typically interpreted using norms that were established decades ago based on a British sample. These norms are neither age-specific nor gender-specific. Our study shows that using these norms to interpret SDQ scale scores provided by Dutch adolescents and their parents results in much lower detection rates than the intended 10% most extremely scoring adolescents. We strongly advise reconsidering the use of the British norms in Dutch (mental) healthcare practice.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.