Introduction
Depression has become the leading cause of disability and a major contributor to suicide worldwide, posing a heavy health burden on society [1]. With an estimated prevalence of 25% [2], depression is an urgent public health priority. Adolescent depression deserves particular attention because depression tends to have its onset in adolescence [3]. Given that early treatment improves the long-term trajectory of depression, adolescence is an essential period for evaluating and intervening in depression. Recent research estimated the global prevalence of depression among adolescents at more than 25% during the COVID-19 pandemic [4, 5]. Monitoring depression during adolescence to improve early detection and intervention has been recommended in many countries [6, 7]. Recently, China’s National Health Commission and Ministry of Education have also successively recommended incorporating depression screening into students’ health examinations [8, 9]. Screening for depression is the cornerstone of early recognition, diagnosis, and management [10]. There is general consensus that universal depression screening among adolescents, based on appropriate screening tools, should be carried out to ensure early detection and intervention [11].
In depression screening, it is common practice to use questionnaires and to flag individuals with scores above a cutoff threshold as having potential depression. Of all the tools for measuring depression, the Patient Health Questionnaire-9 (PHQ-9) is currently the most popular screener [12]. Developed based on the Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV), the PHQ-9 reflects nine symptoms of Major Depressive Disorder (MDD) [13]. Each item is rated on a 4-point Likert scale (0 = not at all, 3 = nearly every day). The PHQ-9 total score is obtained by summing the item scores and ranges from 0 to 27, with a higher total score indicating more severe depression. A score of 10 or higher is recommended as a reasonable cutoff for potential depression [14, 15]. Owing to its brevity, simple scoring method, satisfactory psychometric properties, and clinical utility, the PHQ-9 has been translated into various languages and is used widely worldwide [16]. It has also shown stable and favorable psychometric properties among Chinese adolescents [17–19]. Moreover, since 2020 the National Health Commission in China has recommended the PHQ-9 for depression screening in medical and health institutions and schools [9].
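The sum scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `phq9_total` and `screens_positive` are hypothetical, and the example responses are invented.

```python
def phq9_total(items):
    """Sum nine item scores (each rated 0-3) into a total between 0 and 27."""
    if len(items) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(items)


def screens_positive(items, cutoff=10):
    """Flag potential depression when the total reaches the cutoff (10 by default)."""
    return phq9_total(items) >= cutoff


# Invented responses for one hypothetical respondent.
responses = [2, 1, 2, 2, 1, 0, 1, 1, 0]
print(phq9_total(responses), screens_positive(responses))
```

The cutoff is kept as a parameter because the text presents 10 as a recommended value rather than a fixed rule.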
However, in situations that emphasize efficiency (e.g., busy clinical practice, large-scale epidemiological studies, or studies in which depression is a secondary outcome rather than the focus of the investigation), measures shorter than the PHQ-9 are even more desirable. To cope with these situations, researchers proposed a short version of the PHQ-9 consisting of two items that evaluate anhedonia and depressed mood [20]. These two symptoms, considered core MDD symptoms in DSM-5, were extracted from the PHQ-9 to form the PHQ-2. The PHQ-2 is usually used in a two-step procedure in which the full PHQ-9 or the remaining PHQ-9 items are administered only after a positive PHQ-2 screen [14, 21]. Combining such an ultra-short version with the PHQ-9 in large-scale depression screening may be a resource-efficient approach, as it can greatly improve screening efficiency and reduce the burden on respondents.
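The two-step procedure can be illustrated with a short Python sketch. Several details are assumptions rather than statements from this paper: the PHQ-2 cutoff of 3 is a commonly used default exposed as a parameter, the two PHQ-2 items are taken to be the first two PHQ-9 items (standard item order), and `two_step_screen` and `answer_item` are hypothetical names.

```python
PHQ2_POSITIONS = (0, 1)  # assumed zero-based positions of anhedonia and depressed mood


def two_step_screen(answer_item, phq2_cutoff=3, phq9_cutoff=10):
    """Ask the two PHQ-2 items first; ask the remaining seven only after a positive prescreen.

    answer_item(i) is a hypothetical callback returning the 0-3 score for PHQ-9 item i.
    """
    scores = {i: answer_item(i) for i in PHQ2_POSITIONS}
    phq2 = sum(scores.values())
    if phq2 < phq2_cutoff:
        # Negative prescreen: stop after two questions.
        return {"phq2": phq2, "phq9": None, "items_asked": 2, "positive": False}
    for i in range(9):
        if i not in scores:
            scores[i] = answer_item(i)
    phq9 = sum(scores.values())
    return {"phq2": phq2, "phq9": phq9, "items_asked": 9, "positive": phq9 >= phq9_cutoff}


# Invented response patterns for two hypothetical respondents.
low = [0, 1, 3, 3, 3, 3, 3, 3, 3]
high = [2, 2, 1, 1, 1, 1, 1, 1, 0]
print(two_step_screen(low.__getitem__))
print(two_step_screen(high.__getitem__))
```

The first respondent illustrates the trade-off: the prescreen saves seven questions, but a respondent whose symptoms fall mainly outside the two prescreen items is never asked about them.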
Although some studies have validated the utility of the PHQ-2, its items may need to be reconsidered when the aim is to provide a primary measure for depression screening among adolescents. Several reasons justify this reconsideration. First, the designation of anhedonia and depressed mood as ‘core symptoms’ was based mainly on clinical experience with adults seeking or undergoing treatment, but the manifestation of depression symptoms in adolescents may differ from that in adults. For instance, by comparing the presentation of DSM-IV depression symptoms in adolescents and adults with MDD, researchers found that somatic symptoms (e.g., loss of energy, appetite change) were more common in adolescent MDD than in adult MDD, and that loss of energy was associated with the highest probability of adolescent MDD [22]. However, the existing PHQ-2 does not include items reflecting somatic symptoms, as both anhedonia and depressed mood belong to the affective/cognitive domain. Not assessing somatic symptoms such as energy loss in adolescents may result in potential depression cases being missed. Second, the screening ability of the original PHQ-9 algorithm, which emphasizes anhedonia and depressed mood, is unsatisfactory [23, 24]. Following the diagnostic criteria of DSM-IV, the PHQ-9 initially suggested the following algorithm: depression can be considered present if five or more items are scored 2 or higher (more than half the days) and at least one of these items is anhedonia or depressed mood. Although this algorithm follows the rules of DSM-IV more closely, it is no more accurate than the simple sum scoring (summing up item scores) that is now more commonly used [24]. This implies that the importance of at least one of the two items (anhedonia and depressed mood) may be overestimated, or that the significance of other items may be underestimated.
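To make the contrast between the two scoring rules concrete, the sketch below implements the diagnostic algorithm as worded in the paragraph above, next to sum scoring. It assumes that anhedonia and depressed mood are the first two items (standard PHQ-9 order). The function names and response patterns are invented for illustration.

```python
CORE_POSITIONS = (0, 1)  # assumed zero-based positions of anhedonia and depressed mood


def algorithm_positive(items):
    """Five or more items scored >= 2, at least one of them a DSM 'core' item."""
    endorsed = [i for i, score in enumerate(items) if score >= 2]
    return len(endorsed) >= 5 and any(i in CORE_POSITIONS for i in endorsed)


def sum_positive(items, cutoff=10):
    """Sum scoring: total score at or above the cutoff."""
    return sum(items) >= cutoff


# Invented response patterns.
somatic_heavy = [1, 1, 2, 2, 2, 2, 2, 0, 0]   # five endorsed items, neither core item
classic = [2, 2, 2, 2, 2, 0, 0, 0, 0]          # five endorsed items, both core items
few_but_severe = [3, 3, 3, 3, 0, 0, 0, 0, 0]   # only four endorsed items

for pattern in (somatic_heavy, classic, few_but_severe):
    print(algorithm_positive(pattern), sum_positive(pattern))
```

The first and third patterns reach a total of 12 and are flagged by sum scoring, yet the algorithm rejects them, one for lacking a core item and the other for having too few endorsed items.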
Notably, by aggregating findings from network analyses in clinical and population studies, a recent systematic review found that fatigue and depressed mood were the most critical MDD symptoms across studies, with anhedonia being slightly less central in networks of MDD [25]. From the emerging perspective of network analysis, a mental disorder is conceptualized as a complex dynamic network composed of interacting symptoms [26, 27]. In other words, the connections between symptoms constitute the disorder, rather than the symptoms being caused by the disorder. Different symptoms (called nodes in the network) differ in their importance to the network. Nodes with more or stronger connections to other nodes are considered central nodes (or core nodes). Central nodes are presumed to play a more prominent role in the occurrence and development of mental disorders because the activation of central nodes might directly affect other nodes [27]. Therefore, items measuring core symptoms identified by network analysis may be more suitable for depression screening, as the presence of core symptoms implies a high risk of developing more severe depression. Additionally, studies have found that the network structure of psychopathology symptoms changed to some extent after the outbreak of COVID-19 [28–30], and the node centrality of each symptom in the network might have changed as well. Consequently, updated data are needed to analyze the core symptoms of depression and provide an up-to-date reference as the pandemic continues. Collectively, emerging evidence suggests that there may be a better ultra-short form than the PHQ-2, at least for Chinese adolescents.
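One common way to quantify "more or stronger connections" is node strength, the sum of the absolute weights of the edges attached to a node. The sketch below computes it for a toy four-symptom network. The edge weights are invented purely for illustration and are not estimates from any study, and `node_strength` is a hypothetical helper name.

```python
def node_strength(weights):
    """Return, for each node, the sum of absolute edge weights to all other nodes."""
    n = len(weights)
    return [sum(abs(weights[i][j]) for j in range(n) if j != i) for i in range(n)]


labels = ["anhedonia", "depressed mood", "sleep", "fatigue"]

# Invented symmetric weight matrix (e.g., partial correlations); zeros on the diagonal.
weights = [
    [0.0, 0.3, 0.1, 0.1],
    [0.3, 0.0, 0.2, 0.4],
    [0.1, 0.2, 0.0, 0.3],
    [0.1, 0.4, 0.3, 0.0],
]

strengths = node_strength(weights)
ranked = [label for _, label in sorted(zip(strengths, labels), reverse=True)]
print(ranked)
```

In practice the weights would come from a network estimated on item-level data, and the ranking would be checked for stability before any node is treated as central.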
Against this background, this study analyzed data from Chinese adolescent samples to identify the core items of the PHQ-9 by network analysis and to combine these core items into a new short version. The reliability, validity, cutoff, sensitivity, and specificity of the new short version were calculated and compared with those of the PHQ-2. The study provides empirical evidence about the core items of the PHQ-9 and may offer a new ultra-short version of the PHQ-9 for rapid depression screening among Chinese adolescents.
Discussion
Using two separate data sources obtained from Chinese adolescents in two cities with different economic levels, we identified fatigue and depressed mood as the two core items of the PHQ-9. The two items were combined to form the PHQ-2 N. The PHQ-2 N displayed satisfactory internal consistency reliability and criterion validity. With the PHQ-9 as the reference, the PHQ-2 N displayed better sensitivity and/or specificity than the PHQ-2. A score of 2 or 3 would be the optimal cutoff for the PHQ-2 N.
Based on node strength from the network analysis, we identified depressed mood and fatigue as the core items. Despite differences in PHQ scores between males and females, the network analysis yielded similar results for both genders. These results are consistent with previous network analyses that also used the PHQ-9 to measure depression in adolescents [42, 43]. Notably, the finding does not seem to be limited to adolescent samples. A systematic review synthesizing results from network analyses of depression symptoms [25] highlighted the critical role of depressed mood and fatigue. Additionally, findings from a recent randomized clinical trial (mean participant age 40.18 years) also suggested that depressed mood and fatigue seemed to be the most central MDD symptoms and thus may be viable targets for antidepressant interventions [26]. Network analysis examines the connections between symptoms, and symptoms that are closely connected to other symptoms are regarded as central. Central depression symptoms such as depressed mood and fatigue are assumed to have a widespread impact on the development of depression (which often begins in adolescence or early adulthood) because their activation may trigger other symptoms. Although more studies are needed to determine the root-cause symptom (the symptom that appears first and activates other symptoms), this study, along with previous findings from network analysis, suggests that depressed mood and fatigue are at the core of the network of depression symptoms and that adolescents who score higher on these two symptoms face a higher risk of depression. Hence, when developing a prescreening scale for depression among adolescents, assessing depressed mood and fatigue may be particularly important.
Moreover, the PHQ-2 N covers more comprehensive content than the PHQ-2. MDD symptoms are reflected in affective, cognitive, and somatic aspects [44]. Individuals diagnosed with MDD may have different symptom profiles [22, 45]. For example, phenotypic heterogeneity has been recognized in the manifestation of depression symptomatology in adults and adolescents, and fatigue was more likely to be endorsed as a symptom by adolescents [22]. Correspondingly, the specific symptoms measured by the PHQ-9 can also be divided into cognitive-affective and somatic dimensions [46–48]. Across studies, both depressed mood and anhedonia are consistently regarded as belonging to the cognitive-affective dimension, while fatigue pertains to the somatic dimension [46, 49, 50]. Hence, compared with the PHQ-2, which contains only cognitive/affective items, an ultra-short form such as the PHQ-2 N that covers both cognitive/affective and somatic symptoms is more comprehensive and may be more suitable for screening adolescent depression.
In addition, with the PHQ-9 as the reference, the PHQ-2 N displayed higher sensitivity and specificity. In other words, compared with the PHQ-2, the PHQ-2 N had a lower proportion of false positives and false negatives and thereby distinguished better between depressed and non-depressed adolescents. This adds evidence for the importance of measuring fatigue and depressed mood, as discussed above. The PHQ-2 N would detect more cases (PHQ-9 ≥ 10) and avoid more false positives. As shown in Table 4, relative to the PHQ-2, the PHQ-2 N yielded fewer positive screens at a cutoff of 2 or 3 and thus requires fewer adolescents to undergo the full PHQ-9 or other treatment, reducing the burden on respondents involved in the screening. The results support the view that the PHQ-2 N may be a better ultra-short version than the PHQ-2.
In line with previous studies examining the optimal cutoff of the PHQ-2 [14, 20], the current study suggested that the PHQ-2 N had balanced sensitivity and specificity at cutoff scores of 2 and 3. Sensitivity and specificity differed between the two thresholds: as the cut-point increased, specificity improved at the inevitable expense of reduced sensitivity (Table 3). Therefore, the cutoff should be chosen according to the purpose of use. Specifically, if the goal is to improve the detection rate as much as possible, a cutoff of 2 would be the prudent choice, giving greater certainty that all those with a PHQ-9 total score meeting the threshold are detected.
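The trade-off between the two cutoffs can be reproduced on toy data. The sketch below scores a two-item short form against the PHQ-9 ≥ 10 reference and reports sensitivity and specificity at each cutoff. The item positions (depressed mood as item 2 and fatigue as item 4, zero-based 1 and 3) assume the standard PHQ-9 item order, the six respondents are invented, and the function names are hypothetical. The output says nothing about the values in Table 3.

```python
PHQ2N_POSITIONS = (1, 3)  # assumed zero-based positions of depressed mood and fatigue


def sensitivity_specificity(respondents, positions, cutoff, reference_cutoff=10):
    """Compare a short form (sum of the chosen items) with the PHQ-9 total as reference."""
    tp = fp = tn = fn = 0
    for items in respondents:
        is_case = sum(items) >= reference_cutoff
        flagged = sum(items[i] for i in positions) >= cutoff
        if is_case and flagged:
            tp += 1
        elif is_case:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Invented PHQ-9 response patterns for six hypothetical respondents.
respondents = [
    [2, 2, 1, 2, 1, 1, 1, 0, 0],
    [1, 1, 2, 1, 2, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0, 0, 0],
    [3, 3, 2, 3, 2, 1, 1, 1, 0],
]

for cutoff in (2, 3):
    print(cutoff, sensitivity_specificity(respondents, PHQ2N_POSITIONS, cutoff))
```

On this toy sample, raising the cutoff from 2 to 3 removes the one false positive but misses one reference case, which is the pattern the paragraph describes.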
Some strengths and implications of this study are worth mentioning. First, we used two independent samples of adolescents from cities with different economic levels, which strengthens the robustness of the results. Second, both samples were large and balanced in gender distribution, which allowed us to conduct gender-stratified analyses that take gender differences in depression into account. Third, we generated normative data for the three PHQ scales. Because our data were collected after the outbreak of the COVID-19 pandemic, which had a negative impact on adolescents’ mental health and led to increased depression [51], and because the pandemic is still ongoing, these normative data can offer a more up-to-date reference. Fourth, all measures used in the study have been tested for reliability and validity. As far as the authors can determine, this study is the first to abbreviate the PHQ-9 through a statistical procedure. Although our samples include only Chinese adolescents, we provide a simple and effective screening tool (the PHQ-2 N) for rapid, large-scale depression screening in Chinese adolescents.
This study is exploratory in nature and has limitations that need to be addressed in future studies. First, since the primary purpose of this study was to establish a preliminary screening scale, it included only adolescents from the general population and lacked diagnostic measures to evaluate the criterion validity of the PHQ-2 N. Consequently, the findings may not be generalizable to clinical populations. Although a systematic review of depression networks suggested that the sample type (clinical vs. population-based settings) did not confound the result that fatigue and depressed mood are the most central symptoms [25], future studies are encouraged to add diagnostic gold standards in adolescent samples to verify or modify the findings of this study. Moreover, this study analyzed data only from adolescent respondents recruited from two Chinese cities, and it is unclear whether the findings can be generalized to other samples of adolescents or to adults in other countries. Because the same item may carry different nuances depending on translation, which can lead to different interpretations of the symptom content across cultures and contexts, we suggest that future studies examine the psychometric properties of the PHQ-2 N in a wider range of populations and areas to confirm or refute the findings. Given that the PHQ-2 has more published evidence of its reliability and validity, further research comparing the PHQ-2 and PHQ-2 N is warranted.