Introduction
Cancer is a leading cause of morbidity and mortality worldwide, and the number of newly diagnosed cancer cases per year is increasing (Sung et al.
2021). Effective diagnosis and prevention are key points for improving patient prognosis. Although new cancer screening methods are available, including low-dose computed tomography (LDCT) for lung cancer scanning and gastrointestinal endoscopy to detect gastroenteric tumors, the detection of tumors and survival rates remain unsatisfactory (Ajani et al.
2022; Benson et al.
2021; Blandin Knight et al.
2017). Their usage has been limited due to factors such as radiation exposure, invasion, high false-positive rates, and high costs. Therefore, there is an urgent need to develop a reliable, non-invasion, accurate, and cost-effective approach for detecting cancers.
Cell-free DNA (cfDNA) is a DNA fragment released into the bloodstream by cell apoptosis or necrosis (Kustanovich et al.
2019). In cancer patients, a fraction of cfDNA is released by tumor cells, which is termed circulating tumor DNA (ctDNA) (Aggarwal et al.
2021). Accumulating evidence suggests that circulating DNA fragments do not undergo a random fragmentation process. They carry genetic and epigenetic information from the cell and tissue of origin, which can be associated with tissue sources, disease status, chromatin accessibility, and nuclease activities (Chan et al.
2016; Jiang et al.
2018; Lo et al.
2021; Sun et al.
2015). As a result, cfDNA analysis, as a non-invasive approach, is increasingly finding applications in the fields of tumor diagnosis and treatment. The most prevalent method for cfDNA detection is somatic mutation sequencing. However, the sensitivity of mutation-based approaches may be compromised in patients with a limited number of recurrent mutations and the presence of non-tumor mutations resulting from clonal hematopoiesis of indeterminate potential (Cescon et al.
2020; Genovese et al.
2014). Additionally, several methylation-based cfDNA assays have been developed. Nevertheless, a significant challenge hindering its clinical implementation is the current incapability of existing detection technologies to simultaneously achieve high sensitivity, low cost, and deep sequencing coverage (Luo et al.
2021).
Recently, several cfDNA fragmentation-based approaches, including fragment size, end motifs, and nucleosome footprints, etc., have become new multi-omics technologies after ctDNA mutation and methylation (Garcia-Pardo et al.
2022; Wang et al.
2019). For example, tumor-secreted DNA tends to possess a shorter length compared to non-tumor-secreted DNA. Thus, detecting cfDNA fragment size can help distinguish cancer patients from healthy subjects (Chabon et al.
2020; Mouliere et al.
2018). However, the sensitivities of the approaches based on fragment size features may not be sufficient for clinical use in some tumors with low ctDNA shedding rates (Cristiano et al.
2019; Mouliere et al.
2018). Alternatively, analysis of cfDNA fragmentation features, including nucleosome position, occupancy and spacing (referred to as nucleosome footprints analysis) can reveal the tumor tissue-of-origin (Snyder et al.
2016; Vanderstichele et al.
2022). However, current approaches for cfDNA nucleosome remain challenging due to a lack of robust computational methods (Doebley et al.
2022). The profile of cfDNA end motifs represents a distinct type of plasma DNA fragmentation signature, revealing a large number of tumor derived changes (Jiang et al.
2020). As demonstrated by previous research, the hepatocellular carcinoma (HCC)-derived DNA fragments carry a different distribution of end motifs compared to non-tumoral DNA (Jiang et al.
2018). Moreover, a recent study revealed that patients with HCC exhibited a preferential pattern of cfDNA 4-mer end motifs compared to non-HCC subjects, with an AUC of 0.86 (Jiang et al.
2020). Such preferred end motifs were also observed among other cancer types (Jiang et al.
2020; Wang et al.
2023a). More importantly, the tumor-associated preferred end motifs are more pervasive, hence more readily detectable, and may therefore serve as an emergent class of ctDNA signatures. These findings collectively suggested that the end motif approach could outperform other fragment measures in identifying a variety of cancers (Cristiano et al.
2019). However, the utility of cfDNA end motifs in multiple cancer types still needs verification. Thus, we hypothesized that differences in cfDNA end motifs could enhance sensitivity for detecting cancer, as demonstrated by previous studies (Jiang et al.
2020,
2018) and facilitate non-invasive genomic analysis of cancer. In this study, we utilized 4 bp-end-motif profile to establish a robust model for detection multiple cancers. This model, based on 4 bp-end-motif, could also predict the response to immunotherapy. Our findings demonstrate that the proposed approach can aid in cancer detection and guide treatment.
Methods
Patient cohorts and sample collection
This study enrolled 106 individuals with histologically confirmed cancer and 100 healthy volunteers from the Affiliated Geriatric Hospital of Nanjing Medical University, China. The cancer cohort consisted of various cancer types, including lung cancer (33), gastric cancer (21), colon cancer (18), esophageal carcinoma (4), breast cancer (2), hepatocellular carcinoma (4), duodenal carcinoma (2), cholangiocarcinoma (2), renal cancer (4), bladder cancer (2), pancreatic cancer (3), ovarian cancer (3), endometrial cancer (1), mediastinal tumor (1), tongue cancer (1), thyroid cancer (1), cholangiocarcinoma (2), prostatic cancer, lymphoma (1), metastatic hepatic carcinoma (1), and lymph node metastasis carcinoma (1). Clinical characteristics, including immunotherapy information, were collected. The clinical information of individuals with cancer and volunteers without cancer is listed in Supplementary Table
S2, 3. We performed plasma sample collection, shipping, and storage, cfDNA extraction, library preparation, and whole-genome sequencing (WGS) analysis uniformly as described in Supplementary Materials and Methods. In brief, the blood draw of the participants was performed from January 2022 to June 2022. The steps of cfDNA extraction, library preparation, and WGS were performed immediately after each other in batches by the College of American Pathologists (CAP)-accredited clinical laboratory (Beijing GenePlus Technology Inc., China). The study was approved by the ethics committee at the Affiliated Geriatric Hospital of Nanjing Medical University (approval no.014) and complied with the Declaration of Helsinki. All participants signed written informed consent forms.
cfDNA extraction and whole-genome sequencing
We performed plasma sample collection, cfDNA extraction followed by WGS, as described in Supplementary Materials and Methods. Briefly, the venous blood samples were collected during routine physical checks (healthy volunteers) or on the day of therapy, prior to the first treatment (cancer patients). All samples were collected, shipped, and processed uniformly. A total of 5–10 ng of plasma cfDNA per sample was subject to PCR-free WGS library construction with the VAHTS® Universal DNA Library Prep Kit for Illumina V3 (Vazyme). The libraries underwent paired-end sequencing on DNBSEQ-T7. To minimize bias, the sample operating team was blinded to the case or control status of the samples during the whole process.
Raw sequencing data processing was carried out as described in Supplementary Materials and Methods. The libraries in this study had a mean sequencing depth 5×. We extracted 4 bp-end-motif of cfDNA fragments from the WGS data for model construction. The cfDNA 4 bp-end-motif referred to the 5′ end 4 bp sequences, as reported by Jiang et al. (
2020). The proportion and frequency of each 4 bp-end-motif over the total motifs (256, 4
4) was calculated for each sample. Random forest (RF) models incorporating variable importance ranking were constructed and evaluated based on the training cohort using fivefold cross-validation as the resampling method to avoid overfitting of the model to new data. The predictive model’s performance was validated on the testing set.
Statistical analysis
For statistical analysis, the receiver operating characteristic (ROC) curves were generated using the pROC package (1.18.4). Based on true positive (TP), true negative (TN), false positive (FP), and false negative (FN) of cancer prediction, we calculated the sensitivity [TP/(TP + FN)], specificity [TN/(TN + FP)], positive (PPV) [TP/(TP + FP)] and negative predictive values (NPV) [TN/(TN + FN)], accuracy [ (TP + TN)/(TP + FP + TN + FN)], as well as their corresponding 95% confidence intervals. Heatmap clustering analysis was generated using the pheatmap package (1.0.12) in R. The Mann–Whitney test, Fisher's exact test, and ANOVA analysis were performed using SPSS, and the Wilcoxon test was conducted using R.
Discussion
In this report, we analyzed the 4 bp-end-motif feature derived from the WGS data, which identifies as the first 4-nucleotide sequence on each 5′ fragment end of plasma DNA after alignment to the reference genome (Jiang et al.
2020). Our findings demonstrated that profiling the distribution and sequences of cfDNA end motifs can effectively differentiate tumors from non-tumoral samples and can predict immunotherapy response.
In our study, the 4 bp-end-motif machine learning model, using the random forest algorithm, achieved an AUC of 0.983 (95% CI 0.96–1.00) and exhibited a sensitivity of 0.90 at a specificity of 0.87 in the testing cohort. In addition, the model showed excellent performance across various cancer stages, with high AUC values in both stage II-III (AUC: 0.97, 95% CI 0.94–1.0) and stage IV (AUC: 0.96, 95% CI 0.94–0.97). This performance surpassed that of other reported prediction models based on cfDNA end-motif profiles (Guo et al.
2022; Jiang et al.
2020; Wang et al.
2023a,
b). Moreover, the model’s performance varied across different cancer types, with gastrointestinal cancer group showing the highest AUC (0.97, 95% CI 0.94–0.99), followed by lung cancer group (AUC: 0.92, 95% CI 0.86–0.97) and finally the other cancer group (AUC: 0.86, 95% CI 0.78–0.94). We noted an upward trend in the distribution of cancer scores from stage II to stage IV. Taken together these findings that the possible reason of our model outperform other existing models might be related to the association with late tumor stage and cancer-related heterogeneity (Bettegowda et al.
2014; Liu et al.
2020; van der Pol and Mouliere
2019). It’s worth noting that our model exhibited some biases in detecting different tumor types, likely due to an uneven sample proportion used in the model construction.
Previous study has suggested that the proportion of tumor-derived DNA fragment size is enriched in certain cancer types (e.g., liver, colorectal, lung, and breast) that shed more ctDNA into the bloodstream, while detection rates are lower in low-shedding cancer types like renal, bladder, pancreatic, and glioma (Mouliere et al.
2018). Notably, our study encompassed some cancers with ‘low-shedding rates’ classified within gastrointestinal cancer and other cancer types. Remarkably, the prediction model, constructed using 4 bp-end-motif, yielded high AUC values across various cancer types, indicating its consistent capacity to distinguish patients with cancer from non-cancer subjects. As mentioned earlier, cfDNA methylation-based methods have been used for multi-cancer detection, exhibited sensitivities for stage II–IV ranging from 0.43 to 0.93 (Liu et al.
2020). The sensitivity of our model was 1.0 in stage II–III, and 0.97 in stage IV for all cancer types, which is significantly better than their report. Furthermore, methylation-based methods usually rely on immunoprecipitation enrichment or targeted enrichment, limiting their ability to simultaneously achieve high sensitivity, low cost, and deep sequencing coverage (Luo et al.
2021). In contrast, our approach, focusing on plasma DNA end motifs, offers the advantage of achieving maximal diagnostic power with a relatively small number of DNA molecules analyzed (Jiang et al.
2020). These findings indicate that the detection of cfDNA end motifs is both highly sensitive and broad-spectrum, making it suitable for multiple cancer diagnosis (Jiang et al.
2020).
Recently, research has highlighted the role of cfDNA in predicting tumor recurrence and guiding treatment. For instance, a study by Y. Wang et al. revealed that cfDNA fragmentomics can potentially predict the response to neoadjuvant chemoradiotherapy (nCRT) in locally advanced rectal cancer (AUC: 0.96) (Wang et al.
2023b). Powles et al. reported that patients with urothelial carcinoma who tested positive for ctDNA were more sensitive to adjuvant atezolizumab (Powles et al.
2021). Additionally, our study reanalyzed 44 patients who received immunotherapy, and found that the baseline plasma 4 bp-end-motif based model exhibited high sensitivity in distinguishing responders from non-responders, achieving a sensitivity of 0.846 at a specificity of 0.742. These findings highlight the potential of utilizing pre-treatment cfDNA end motifs to identify patients likely to benefit from immunotherapy. Notably, our study is the first to demonstrate the potential of cfDNA end motif profiles as therapeutic biomarkers for immunotherapy in multi-cancer patients. However, the underlying mechanisms behind these observations remain poorly understood. Recent data have suggested that variations in plasma DNA profiles were associated with the deletion of the deoxyribonuclease 1-like 3 (DNASE1L3) gene. Serpas et al. showed that genetically inactivating the DNASE1L3 gene led to changes in the relative frequencies of cfDNA 4-mer end motifs (Serpas et al.
2019). Patients with DNASE1L3 gene deletion exhibited aberrations in size and a reduction of a ‘CC’ end motif of plasma DNA (Chan et al.
2020). Furthermore, many human cancers exhibit down-regulation of DNASE1L3 expression, and DNASE1L3 deficiency in mice led to delayed tissue recovery, increased chronic inflammation, immune cell dysfunction, impaired antitumor immunity, and ultimately affected the antitumor immune responses (Li et al.
2023; Liu et al.
2021). These findings suggest that the ability of end motif profiling to predict immunotherapy response may be linked to immune function dysregulation mediated by aberrant DNASE1L3 expression. Several biomarkers used to guide therapy selection, such as programmed death-ligand 1 (PD-L1) expression, tumor mutation burden (TMB), and microsatellite instability (MSI), often rely on invasive tumor tissue biopsies (Anagnostou et al.
2022). Additionally, minimal residual disease (MRD) has gained attention as a ctDNA-based detection approach primarily used for monitoring recurrence and guiding adjuvant immunotherapy in early-stage disease (Chaudhuri et al.
2017; Powles et al.
2021; Tie et al.
2022).However, there are limited reports on its role as a predictive marker for immunotherapy in advanced cancers. Collectively, our study demonstrates that cfDNA end motif profiles offer significant promise for non-invasive assessment and provide a new avenue for predicting clinical outcomes.
Several limitations exist in this study. Firstly, despite the remarkable performance in detecting multi-cancers and predicting immune response, the underlying mechanism of cfDNA end motifs remains not fully understood, necessitating further research into their mechanisms. Furthermore, the imbalance of baseline characteristics (age/gender/BMI) and the limited size of our sample population may introduce bias. Expanding the sample size and equilibrating the demographics in future studies will enhance the statistical power and provide more accurate estimates of the model’s predictive performance. Additionally, our study lacks a benign lesion group of comparison, which means our results do not offer insights into the model’s ability to discriminate between cancer and benign lesions. Further validation with independent and large-scale cohorts is necessary to address this limitation.
In conclusion, our study supports the nation that detecting the 4 bp-end-motif feature holds promising clinical application for cancer detection and provides profound insights for the design of personalized treatment strategies.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.