Background
Acute lymphoblastic leukemia (ALL) is the most common malignancy in children, accounting for approximately 30% of all childhood cancers and approximately 80% of all childhood leukemia. Of these incidences of childhood leukemia, 80% are derived from B-cell progenitors (B-ALL) [
1]. B-ALL has one of the highest cure rates of pediatric malignancies [
2,
3]. However, although most patients can be cured, higher performance therapies for ALL are required to improve the unfavorable outcomes. Thus, it is necessary to achieve an improved understanding of its potential pathogenesis.
Transcriptome profile analysis is an advanced method that has been applied to reveal the etiology underlying the specific biological processes of ALL. It benefits from ribose nucleic acid sequencing (RNA-seq), which is the most extensively used technology for detecting transcriptomes [
4]. Previous studies comparing the pathophysiologies of ALL to chronic lymphocytic leukemia (CLL) or comparing the molecular pathogenesis features of adult and pediatric ALL, have focused on analyzing the expression profiles of microRNAs (miRNAs), messenger RNA (mRNA), and miRNA/mRNA networks [
5‐
7]. Long non-coding RNAs (lncRNAs), which are one of the most studied non-coding RNA types, are over 200 nucleotides in length. LncRNAs can act as gene expression modulators at the epigenetic, transcriptional, and post-transcriptional levels. While the roles of lncRNAs in ALL and their underlying mechanisms have been continuously investigated, related data of research articles remain scarce. Fernando et al. demonstrated that the lncRNA CASC15 is fundamental for the cellular survival and proliferation of ALL cells, as it regulates SOX4 expression [
8]. Another study has suggested that the lncRNA BALR-6 may have an effect on proliferation and apoptosis in ALL cell lines [
9]. In addition, in vitro studies have shown that the lncRNA MALAT1 affects ALL cell proliferation and apoptosis via regulating the miR-205-PTK7 axis. Furthermore, the lncRNA CRNDE has been shown to upregulate CREB expression by suppressing miR-345-5p, promoting cell proliferation, and reducing cell apoptosis in ALL [
10,
11]. All of these above studies indicate that lncRNAs could affect the expression of mRNAs via co-expression networks. Nevertheless, the role of lncRNA-mRNA networks in the pathogenesis of B-ALL is not fully understood. This study analyzed the expression profiles of lncRNAs and mRNAs in B-ALL, compared with non-leukemic blood disease controls, by using Next Generation Sequencing (NGS). Specific focus was given to differential mRNA and lncRNA expression levels and lncRNA-mRNA networks to identify the potential underlying mechanism of B-ALL. This approach may provide novel insights into further functional studies, and may reveal new biomarkers for the diagnosis of B-ALL.
Materials and methods
Patients and samples
Bone marrow samples were obtained at diagnosis from ten pediatric B-ALL patients. Further samples from ten pediatric donors with common blood diseases (CBDs), diagnosed as anemia and agranulocytosis; these were used as controls to lessen the potential variation. All samples were collected from patients at the Affiliated Wuxi Children’s Hospital of Nanjing Medical University. The pediatric B-ALL patients included in present study were diagnosed in accordance with the revised French-American-British classification and by analysis of leukemic cells with respect to morphology, immunophenotype, and cytogenetics. Bone marrow samples were submitted to cell separation using Ficoll-Paque (GE Healthcare, USA) and the isolated mononuclear cell fraction was used for the subsequent extraction of the total RNA. The study protocol was approved by the institutional review Board of the Affiliated Wuxi Children’s Hospital of Nanjing Medical University (WXCH 2020–08-004). This trail was conducted basing on the written informed consent of all patients, and particularized information is summarized in Supplementary Table S
1.
RNA extraction, strand-specific library construction and sequencing
Total RNA from mononuclear cells was extracted using a Trizol reagent kit (Invitrogen, Carlsbad, CA, USA) consistent with the manufacturer’s instructions. RNA quality was appraised using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). After total RNA was extracted, rRNAs were removed to retain mRNAs and non-coding RNAs (ncRNAs). The enriched mRNAs and lncRNAs were fragmented into short pieces using fragmentation buffer, and were then reverse transcribed into complementary deoxyribose nucleic acid (cDNA) using random primers. Second-strand cDNA were synthesized using DNA polymerase I, RNase H, dNTP (dUTP instead of dTTP), and buffer. Next, the cDNA fragments were purified using a QiaQuick polymerase chain reaction (PCR) extraction kit (Qiagen, Venlo, The Netherlands). They were then end repaired, poly (A) added, and bound to Illumina sequencing adapters. Then, the digested second-strand cDNA using Uracil-N-Glycosylase (UNG) were size selected by programming steps including agarose gel electrophoresis, PCR amplified, and sequenced using Illumina HiSeqTM 4000 (or other platforms) by Gene Denovo Biotechnology Co. (Guangzhou, China). The datasets used during our study are available through HARVARD Dataverse Persistent ID doi:
https://doi.org/10.7910/DVN/LK9T4Z.
Filtering of clean reads
All following data processing and analysis were performed according to the schematic illustration in Supplementary Fig. S
1. Reads obtained from the sequencing machines included raw reads containing adapters or, which could have affected the following assembly and analysis. To procure high quality clean reads, raw reads including low quality bases or adapters that may interfere subsequent analysis were filtered using fastp (version 0.18.0). The parameters were as follows: 1) removing reads containing adapters, 2) removing reads containing more than 10% of unknown nucleotides (N), and 3) removing low quality reads containing more than 50% of low quality (Q-value≤20) bases.
Differentially expressed transcripts (DETs) analysis
The DETs of coding RNAs and lncRNAs were analyzed respectively. DESeq2 software was used to analyze differential expression of mRNAs and lncRNAs between two different groups (CBD group versus B-ALL group). Those transcripts with the parameters of a false discovery rate (FDR) of < 0.05 and an absolute fold change (FC) of ≥2 were considered to be differentially expressed. Enrichment analysis of Gene Ontology (GO) functions, Kyoto Encyclopedia of Genes (KEGG) pathways, Disease Ontology (DO), Reactome Enrichment Analysis, and Gene Set Enrichment Analysis (GESA) were performed by choosing the differentially expressed coding RNAs.
Functional enrichment analysis
GO enrichment analysis was performed basing on three terms including biological processes, cellular components and molecular functions (
http://www.geneontology.org/). KEGG enrichment analysis distinguished significantly enriched biological pathways. DO enrichment analysis identified significantly enriched human disease DO terms in DETs. Reactome enrichment analysis identified significantly enriched reactions including signaling, innate and acquired immune function, transcriptional regulation, translation, apoptosis, and classical intermediary metabolism in DETs. GESA analysis was performed using the software packages GSEA and MSigDB to identify genes in specific GO terms\pathways\DO terms showing significant differences between the two groups. All above analyses meeting the condition of
p < 0.05 were considered to be significant.
LncRNA-mRNA association analysis
The lncRNA-mRNA association analysis was based on lncRNA trans-regulation analysis to reveal the interaction between lncRNA and mRNA and identify target genes of lncRNAs.
Validation of RNA-seq results
A number of mRNAs and lncRNAs between the paired groups were selected to validate the RNA-seq data by quantitative real time PCR (qRT-PCR). After determining the best annealing temperatures, qRT-PCR was performed on the ABI 7500 system (Applied Biosystems, Foster City, CA,USA) to detect the relative expression levels of mRNA and lncRNA using SYBR Premix ExTaq II (DRR820; TaKaRa). The relative expression levels of each mRNA/lncRNA were calculated using comparative cycle threshold method (∆∆Ct). Standardization was performed based on the internal reference of GAPDH. All data are presented herein as the average of three independent experiments.
Statistical analysis
Hypergeometric Test was used to calculate the enrichment of GO terms, KEGG pathways, and DO enrichment analysis to show the significance of enrolled transcripts. Quantitative data was assessed with the use of two independent samples t-test. Data are presented herein as the mean ± standard deviation. The edgeR software package (version 3.28.1) was used to identify differentially expressed lncRNAs and mRNAs between paired groups. p < 0.05 was considered to indicate a statically significant difference.
Discussion
Previous studies regarding the transcript results of B-ALL patients have identified a number of lncRNAs and vital protein–coding genes that are associated with the etiology or clinical outcome of B-ALL [
17‐
19]. To the best of the authors’ knowledge, although some studies have performed transcriptome profile analysis to investigate the role of lncRNA in the pathogenesis of B-ALL, the small sample sizes of parallel controls and the restricted numbers of isolated cell types limited the general conclusions of these studies [
20‐
22]. All above related reports were summarized in Supplementary Table S
6. Moreover, microarray data prevented the identification of novel transcripts. In the present study, a comprehensive analysis of lncRNA and mRNA profiling data was conducted between B-ALL patients and CBD controls by NGS. Several differently expressed lncRNAs and mRNAs were identified, in which some were validated by qRT-PCR; functional annotations were also carried out. Furthermore, lncRNA-mRNA network analyses were preformed to extract candidate lncRNAs interacting with the aberrantly expressed mRNAs, revealing that five key lncRNAs were co–expressed with at least seven mRNAs in the network, bases on a very high threshold of 0.99 for the Pearson’s correlation. These findings could promote the understanding of the functional mechanism through which lncRNA interacts with mRNA, thus playing an important role in B–ALL pathogenesis.
The biological functions and potential pathways of mRNAs associated with differentially expressed lncRNAs were preliminarily predicted by GO and KEGG pathway analyses. Extraordinarily, significantly enriched pathways were closely associated with signal transduction, metabolic process, infections, and the immune system, which all play important roles in B-ALL development and disease treatment. The KEGG and co-expression analysis results showed that the Rap1 signaling pathway was exclusively highly enriched in B-ALL, when compared to controls. It was demonstrated that the Rap1 signaling pathway is associated with leukemia cell adhesion and migration, suggesting that Rap1 could participate the process of notch activation and leukemogenicity of T-ALL. This suggests that the Rap1 signal may function in the notch-dependent B-ALL development and its progression [
14,
23,
24]. Infante et al. reported that Rap1b is a subtype of Rap1, and that its depletion could be used to reduce tissue invasion in T-ALL. Furthermore, another integrative network analysis of pediatric acute lymphoblastic leukemia revealed that gene expression and methylation consistently targeted the Rap1 signaling pathway [
25]. In addition, the direct activation of Rap1 can lessen the cellular actions of leptin and correct glucose imbalance in obese mouse, which suggests that Rap1 is associated with energy metabolism [
26]. VEGFA and FLT1, two differentially expressed genes, presented the highest correlation score of 0.99 by STRING in the Rap1 signaling pathway. Previous studies have shown that increased levels of Helios Treg cells promoted angiogenesis in the bone marrow of ALL mice via the VEGFA/VEGF receptor 2 pathway [
27]. Furthermore, FLT-1 is generally expressed in pediatric ALL and VEGFA/FLT-1 signaling enhances the migration and survival of leukemia [
28,
29]. FLT-1 activation on ALL cells results in cell migration and proliferation in vitro. In addition, FLT-1 neutralization affects leukemia localization, increases leukemia apoptosis, and impedes the exit of ALL cells, thus prolonging the survival of inoculated mice. All of the above results indicate that the VEGFA/FLT-1 interaction could play important roles in regulating the development of B-ALL. PRKCZ is a calcium- and diacylglycerol-independent serine/threonine protein kinase that functions in the phosphatidylinositol 3-kinase (PI3K) pathway and the mitogen-activated protein (MAP) kinase cascade. It is also involved in NF-kappa-B activation, mitogenic signaling, cell proliferation, and inflammatory response. AKT, PI3K2CB, and some other genes also feature in the pathway upstream of PRKCZ. The PI3K/Akt and NF-κB signaling pathways have been extensively demonstrated to participate in the regulation of migration, and in the viability of leukemic cells, both in vivo and in vitro [
30‐
32]. It has also been reported that the knockdown of PRKCZ leads to more rapid losses of DNA mismatch repair enzyme (MSH2) proteins, resulting in significant reductions in DNA mismatch repair (MMR) and increased resistance to thiopurines for ALL [
33]. Arsenic trioxide (ATO) has been demonstrated to interrupt the function of the PI3K/Akt pathway in ALL; PRKCZ may be responsible for the provocation of resistance to ATO [
34]. The most aberrant upregulated lncRNA, ENST00000473898, showed a high correlation score of 0.95 with PRKCZ, suggesting that an in-depth study should be conducted to elucidate its potential regulatory mechanism.
The lncRNA-mRNA network constructed in the present study revealed that some lncRNAs may exert significant effects in B-ALL by regulating downstream mRNAs. Affinito et al. specially constructed a lncRNAs-mRNAs co-expression network in childhood B-ALL, based on previous RNA-sequencing experiments [
35]. In said experiments, gene expression profiling using RNA-seq of leukemic cells purified from three pediatric B-ALL patients (compared with that observed in mature B cells from the peripheral blood of three healthy donors) highlighted that 24 key lncRNAs and their co-expressed mRNAs may play important roles in B-ALL pathogenesis [
36]. Unlike this study, however, here larger sample sizes of B-ALL patients were used alongside quantity-matched controls. Furthermore, the control group used the same cell type of mononuclear cells derived from bone marrow. Nevertheless, there is a potential limitation for our study. Considering the difficulty in obtaining the healthy control specimens, we previously ignored the real problem that both healthy and CBD bone marrow mononuclear cells are a mixture of different categories of cells including monocytes, lymphocytes, hematopoietic stem cells, and progenitor cells. Hence, the comparison we performed, in fact, is to identify the differential expression profiles of leukemic B-cell precursors versus a heterogeneous population of cells. To some extent, CBD in our study represented the average lncRNA and mRNAs expression profiles of mononuclear cell system.
In addition, trans-patterns were used to predict the co-expression network of lncRNAs with mRNAs, unlike the guilt–by–association approach of the aforementioned previous study. The present study also obtained more high thresholds of FDR (< 0.01) and high Pearson correlation scores (0.99). Here, five novel lncRNAs were identified, which interacted with at least seven mRNAs, including FAT1, SOX11, and EYA4. The functional roles of these lncRNAs remain unknown. Nevertheless, according to the alternative mRNAs in coexpression networks, these lncRNAs may exert exclusive characters by regulating downstream mRNAs. Among them, FAT1 was highly expressed in a large proportion of cases of T-ALL and B-ALL was implicated in Wnt signaling and hippo signaling. FAT1 was also found to cooperate with NOTCH in driving T-ALL in vivo; this suggests that it might be a potential biomarker in carcinogenic roles [
37,
38]. Moreover, previous studies have shown SOX11 to be a useful diagnostic marker for mantle cell lymphoma [
39]. It has been reported that the high expression of SOX11 leads to alterations of gene expression that are typically associated with cell adhesion, migration, and differentiation. Furthermore, its expression marks a group of patients with good outcomes [
40]. Furthermore, Huang et al. demonstrated that AML1-ETO-fused protein triggers the epigenetic silencing of the EYA4 gene, contributing to leukemogenesis in t (8;21) AML. This suggests that the EYA4 gene might be a novel therapeutic target for AML. In addition, although here the co–expression lncRNA–mRNA pairs were captured from RNA–Seq data, it is well established that co–expression correlation does not imply causation. Thus, gene perturbation experimental data would be necessary to gain insight into the possible regulatory relationships, or to infer causal gene co–expression patterns.
Conclusions
In conclusion, the present study identified differentially expressed lncRNAs and mRNAs between CBD and B-ALL, via NGS. Some of these genes were randomly selected and assessed by qRT-PCR in vitro. Go and KEGG analyses were performed, and the obtained co-expression network indicated that the lncRNAs MSTRG.27994.3, MSTRG.21740.1, ENST00000456341, MSTRG.14224.1, and MSTRG.20153.1 may have bioactive effects in B-ALL. Therefore, the present study may provide pathological sagacity into the mechanisms underlying B-ALL. More importantly, these results provide a basis for further studies into the function and mechanism of special mRNAs and lncRNAs in B-ALL. However, further functional investigations are required to confirm these results.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.