Identification of pathogenic genes and transcription factors in respiratory syncytial virus

Background Respiratory syncytial virus (RSV) is a major cause of acute lower respiratory infections in children, especially bronchiolitis. Our study aimed to identify the key genes and upstream transcription factors in RSV. Methods To screen for RSV pathogenic genes, an integrated analysis was performed using RSV microarray datasets in GEO. Functional annotation and potential pathways of differentially expressed genes (DEGs) were further explored by GO and KEGG enrichment analysis. We constructed an RSV-specific transcriptional regulatory network to identify key transcription factors for DEGs in RSV. Results From three GEO datasets, we identified 1059 DEGs (493 up-regulated and 566 down-regulated genes, FDR < 0.05 and |Combined.ES| > 0.8) between RSV patients and normal controls. GO and KEGG analysis revealed that ‘response to virus’ (FDR = 7.13E-15), ‘mitochondrion’ (FDR = 1.39E-14) and ‘Asthma’ (FDR = 1.28E-06) were significantly enriched terms for the DEGs. The expression of IFI27, IFI44, IFITM3, FCER1A, and ISG15 was shown to be associated with the pathogenesis of RSV. Conclusions We concluded that IFI27, IFI44, IFITM3, FCER1A, and ISG15 may play a role in RSV. Our findings may contribute to the development of new potential biomarkers, reveal the underlying pathogenesis and also identify novel therapeutic targets for RSV.

Similar to other respiratory viruses, RSV infects airway epithelial cells, alveolar macrophages, and intraepithelial dendritic cells; the infection induces direct antiviral responses through cytokines and chemokines and initiates adaptive immune responses [6]. The severity of RSV infection is partly explained by known risk factors, including underlying medical conditions and young age [7]. However, most infants hospitalized for RSV infection have reportedly been previously healthy, with no risk factors for severe illness [8,9]. Therefore, currently known risk factors do not fully explain the considerable variability in disease severity.
Thus, it is important to identify biomarkers for the diagnosis of RSV.
In our study, we performed an integrated analysis of three gene expression datasets to identify the differentially expressed genes (DEGs) and transcription factors (TFs) associated with RSV. Functional annotation and protein-protein interaction (PPI) network construction were performed to explore the biological functions of the DEGs. Our purpose is to provide clues to the underlying mechanism of RSV and to support the development of new diagnostic and therapeutic approaches for RSV.

Microarray expression profiling in GEO and identification of DEGs in RSV
Gene expression profiles of children with RSV were obtained from the GEO database using the following search terms: ("respiratory syncytial viruses"[MeSH Terms] OR Respiratory syncytial virus [All Fields]) AND "Homo sapiens"[porgn] AND "gse" [Filter]. Datasets meeting the following criteria were included in our study: (1) whole-genome mRNA expression profiling by array; (2) data derived from blood samples of patients with RSV and normal controls; (3) either normalized or original (raw) data available.
After downloading the selected datasets, we removed undetectable genes, i.e., genes with an expression value below 0 in more than 20% of the samples. There were 8834 genes in the intersection of the three datasets. Each dataset was log2-transformed and scale-standardized. MetaMA was applied to obtain the DEGs, and genes with FDR < 0.05 and an absolute combined effect size |Combined.ES| > 0.8 were selected as DEGs.
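The filtering and DEG-selection rules above can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the authors' pipeline: metaMA's own effect-size combination is built on moderated effect sizes, whereas this sketch computes plain Hedges' g per dataset, pools the estimates with a fixed-effect inverse-variance model, converts the pooled z-score to a two-sided p-value, and applies Benjamini-Hochberg FDR before the FDR < 0.05 and |Combined.ES| > 0.8 cut-offs. The function names (`filter_undetectable`, `hedges_g`, `combine_effects`, `benjamini_hochberg`, `select_degs`) and the dict-of-lists data layout are hypothetical.

```python
import math
import statistics


def filter_undetectable(expr, max_frac=0.2):
    """Keep genes whose value is < 0 in at most max_frac of samples.

    expr maps gene symbol -> list of per-sample expression values.
    """
    return {g: v for g, v in expr.items()
            if sum(x < 0 for x in v) / len(v) <= max_frac}


def hedges_g(case, control):
    """Bias-corrected standardized mean difference and its variance."""
    n1, n2 = len(case), len(control)
    pooled_var = ((n1 - 1) * statistics.variance(case)
                  + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
    d = (statistics.mean(case) - statistics.mean(control)) / math.sqrt(pooled_var)
    g = (1 - 3 / (4 * (n1 + n2) - 9)) * d  # small-sample correction
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g


def combine_effects(effects):
    """Fixed-effect inverse-variance pooling; returns (combined ES, two-sided p)."""
    weights = [1 / var for _, var in effects]
    es = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)
    z = es / math.sqrt(1 / sum(weights))
    return es, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(datasets, fdr_cut=0.05, es_cut=0.8):
    """datasets: list of (case_expr, control_expr) dicts sharing gene keys.

    Returns gene -> (combined ES, FDR) for genes passing both cut-offs;
    a positive ES means up-regulated in RSV, a negative ES down-regulated.
    """
    genes = sorted(set.intersection(*(set(case) for case, _ in datasets)))
    results = [combine_effects([hedges_g(case[g], ctrl[g]) for case, ctrl in datasets])
               for g in genes]
    fdrs = benjamini_hochberg([p for _, p in results])
    return {g: (es, q) for g, (es, _), q in zip(genes, results, fdrs)
            if q < fdr_cut and abs(es) > es_cut}
```

Because the pooling model differs from metaMA's, effect sizes and FDRs from this sketch should not be expected to reproduce the 1059 DEGs reported here; it only shows how the two thresholds act together.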

Functional annotation of DEGs and PPI network construction
GeneCoDis3 was employed to perform GO and KEGG pathway enrichment analysis, with FDR < 0.05 considered significant. The top 50 up- and down-regulated DEGs were searched in BioGRID, and the PPI network was constructed with Cytoscape software.
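GeneCoDis3's exact statistics are not reproduced here. As a hedged illustration of what a term-level enrichment test does, the sketch below scores each GO or KEGG term with a one-sided hypergeometric test (is the term over-represented among the DEGs relative to a background gene list?) and applies Benjamini-Hochberg FDR, mirroring the FDR < 0.05 threshold. The function names, the choice of background (for example, the 8834 genes shared by the three datasets) and the toy annotations are assumptions for illustration.

```python
import math


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(math.comb(successes, i) * math.comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / math.comb(population, draws)


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(degs, background, annotations):
    """Rank annotation terms by over-representation among DEGs.

    annotations maps term -> set of annotated genes. Returns a list of
    (term, overlap, p-value, FDR) tuples sorted by p-value.
    """
    background = set(background)
    degs = set(degs) & background
    rows = []
    for term, genes in annotations.items():
        annotated = set(genes) & background
        overlap = len(annotated & degs)
        p = hypergeom_sf(overlap, len(background), len(annotated), len(degs))
        rows.append((term, overlap, p))
    fdrs = bh_fdr([p for _, _, p in rows])
    ranked = [(t, k, p, q) for (t, k, p), q in zip(rows, fdrs)]
    return sorted(ranked, key=lambda row: row[2])
```

In a toy background of 10 genes with 3 DEGs, a term annotating 4 genes that contains all 3 DEGs gets p = 4/120 ≈ 0.033; with two terms tested, its FDR doubles to about 0.067 and would miss the 0.05 cut-off, which shows why multiple-testing correction matters when hundreds of terms are scored.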

Construction of TF regulatory network
The promoter sequences of the top 20 up-regulated and down-regulated DEGs were retrieved from UCSC Genome Bioinformatics (http://genome.ucsc.edu). Transcription factors (TFs) regulating these DEGs were predicted with the Match tool in TRANSFAC. The transcriptional regulatory network was visualized using Cytoscape software.
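Targets in such a network are typically ranked by node degree (the Discussion reports, for example, FCER1A with degree = 10). A minimal sketch, assuming the TRANSFAC Match predictions have already been reduced to (TF, target gene) pairs; the helper names are hypothetical, the TF names in any test data are placeholders, and the tab-separated output follows Cytoscape's simple interaction format (SIF) so it could be imported for visualization.

```python
from collections import Counter


def degree_table(edges):
    """Count distinct edges per TF (out-degree) and per target gene (in-degree)."""
    unique = set(edges)  # duplicate predictions for the same pair count once
    tf_degree = Counter(tf for tf, _ in unique)
    target_degree = Counter(target for _, target in unique)
    return tf_degree, target_degree


def top_targets(edges, k=10):
    """Target genes sorted by degree (ties broken alphabetically), top k."""
    _, target_degree = degree_table(edges)
    return sorted(target_degree.items(), key=lambda item: (-item[1], item[0]))[:k]


def to_sif(edges, interaction="regulates"):
    """Tab-separated Cytoscape SIF lines: source, interaction type, target."""
    return "\n".join(f"{tf}\t{interaction}\t{target}" for tf, target in sorted(set(edges)))
```

For example, with placeholder edges TF_A, TF_B and TF_C each predicted to bind the FCER1A promoter, `top_targets` ranks FCER1A first with degree 3. Counting each TF-target pair once avoids inflating degrees when Match reports several binding sites for the same TF in one promoter.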

qRT-PCR confirmation
We collected blood samples from three RSV patients and three healthy children; RNA was isolated from these samples, and the expression levels of candidate genes were verified by qRT-PCR. The clinical characteristics of the individuals included in this study are shown in Table S1. Written informed consent was obtained from every participant, and the study was approved by the ethics committee of The Affiliated Hospital of Qingdao University (QYFY W2LL25724). Human 18S rRNA was used as the endogenous control in the analysis.
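The text does not state how relative expression was calculated. A common choice with a single endogenous control such as 18S rRNA is the 2^-ΔΔCt (Livak) method, sketched below under that assumption; it also assumes roughly equal, near-100% amplification efficiency for the target and reference genes. The function name and the example Ct values are hypothetical.

```python
import statistics


def ddct_fold_change(case_target, case_ref, ctrl_target, ctrl_ref):
    """Relative expression (RSV vs control) by the 2^-ddCt method.

    Each argument is a list of Ct values, one per sample, with target and
    reference (e.g. 18S rRNA) values paired by sample.
    """
    # dCt = Ct(target) - Ct(reference), averaged within each group
    dct_case = statistics.mean([t - r for t, r in zip(case_target, case_ref)])
    dct_ctrl = statistics.mean([t - r for t, r in zip(ctrl_target, ctrl_ref)])
    ddct = dct_case - dct_ctrl
    return 2 ** (-ddct)  # > 1 means higher expression in RSV samples
```

With hypothetical target Ct values of 20, 21 and 22 in RSV samples versus 23, 24 and 25 in controls (18S rRNA at Ct 10 throughout), ΔΔCt = -3 and the fold change is 8, i.e. the target would appear up-regulated in RSV.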
Validation in the GEO datasets and receiver operating characteristic (ROC) analysis
GSE34205, GSE38900, GSE42026 and GSE105450 were downloaded from the GEO database. GSE34205 (platform GPL570) includes 22 healthy controls and 51 RSV patients; GSE38900 (GPL10558) includes 8 healthy controls and 28 RSV patients; GSE42026 (GPL6947) includes 33 healthy controls and 22 RSV patients; and GSE105450 (GPL10558) includes 38 healthy controls and 89 RSV patients. These four datasets were processed in the same way as the datasets in the integrated analysis, and the expression levels of selected DEGs were validated in them. We then performed ROC analysis with the pROC package in R to assess the diagnostic value of the DEGs and calculated the area under the curve (AUC).
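To make the ROC analysis concrete, the following is a minimal sketch of an empirical ROC curve and its AUC; it is not pROC itself. It relies on the standard identity that the trapezoidal area under an empirical ROC curve equals the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control (ties counted as one half). Higher expression is assumed to indicate RSV, so for a down-regulated gene such as FCER1A the scores would be negated first. All function names are hypothetical.

```python
def roc_points(cases, controls):
    """Empirical ROC curve as (1 - specificity, sensitivity) pairs.

    Higher scores are treated as indicating RSV; negate scores for
    down-regulated genes.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(cases) | set(controls), reverse=True):
        sensitivity = sum(c >= t for c in cases) / len(cases)
        false_pos_rate = sum(h >= t for h in controls) / len(controls)
        points.append((false_pos_rate, sensitivity))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_mann_whitney(cases, controls):
    """P(case score > control score), counting ties as 1/2."""
    wins = sum(1.0 if c > h else 0.5 if c == h else 0.0
               for c in cases for h in controls)
    return wins / (len(cases) * len(controls))
```

For hypothetical scores of 0.9, 0.8 and 0.4 in cases and 0.3, 0.5 and 0.1 in controls, 8 of the 9 case-control pairs are ordered correctly, so both functions give AUC = 8/9 ≈ 0.89.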

Differential expression analysis of genes in RSV
After filtering, three datasets (GSE103842, GSE80179 and GSE77087) were retained for the analysis; details of these three datasets are shown in Figure S1. The integrated analysis identified 1059 DEGs (493 up- and 566 down-regulated) in RSV with FDR < 0.05 and |Combined.ES| > 0.8. Among them, IFI27 and MEGF6 were the most up- and down-regulated genes, respectively (Table 2). The heatmap of the top 100 up- and down-regulated DEGs produced by cluster analysis is shown in Fig. 1.

Functional annotation
In Fig

Discussion
RSV is the most common viral pathogen causing acute lower respiratory tract infections in infants, children and older people [10]. In this study, we performed an integrated analysis of data obtained from the GEO database. GO, KEGG and other bioinformatics databases, together with R analysis tools, were used to analyze the DEGs. We obtained 1059 DEGs in RSV (493 up-regulated and 566 down-regulated). We also identified enriched GO terms and KEGG pathways that may be relevant to RSV pathogenesis, such as 'response to virus' and 'Asthma'. In addition, a TF regulatory network was constructed from the promoter sequences of the DEGs obtained from UCSC, using the Match tool in TRANSFAC to predict the corresponding TFs.
IFI27 is a hydrophobic mitochondrial protein of 122 amino acids [11] and belongs to a group of small interferon-stimulated genes (ISGs) [12,13]. Rosebeck et al. and Leaman et al. reported that IFI27 maintains low basal expression in various mammalian cells and participates in a variety of biological processes, including apoptosis and innate immunity [14,15]. IFI27 expression is elevated in psoriatic lesions, uterine fibroids, ovarian cancer, and other diseases [16,17], and IFI27 has been shown to have a direct antiviral effect against certain viruses [18]. Hans-Olav Fjaerli et al. reported that IFI27 is up-regulated in the whole blood of infants hospitalised with RSV [19]. In our study, up-regulated IFI27 was among the top 20 differentially expressed mRNAs and was enriched in the GO term 'mitochondrion' (FDR = 1.39E-14).
IFI44, also termed interferon-inducible protein 44, p44, or microtubule-associated protein 44, is a member of the type I interferon-inducible gene family and has been reported to be antiproliferative [20]. It aggregates to form microtubular structures, and its promoter region contains IFN-α-stimulated response elements that can mediate type I IFN-inducible gene expression [21]. Jacqueline U. McDonald et al. identified IFI44 as a potential target for future investigation in RSV disease [22]. In our study, IFI44 was up-regulated and among the top 20 differentially expressed mRNAs, consistent with previous research. Furthermore, IFI44 was enriched in the GO term 'response to virus' (FDR = 7.13E-15).
Fig. 8 The ROC curves of DEGs in RSV. The ROC curves show the diagnostic ability of the selected DEGs in terms of sensitivity (the true positive rate) and 1-specificity (the false positive rate). The x-axis shows 1-specificity and the y-axis shows sensitivity. a IFI27, b IFI44, c IFITM3, d FCER1A, e EEF2, f ISG15
IFITM3 is a member of the interferon-inducible transmembrane protein family, which plays a role in regulating antiviral signaling, inflammation, and somatogenesis [23]. Studies in IFITM3 knockout mice have reported that IFITM3 inhibits RSV infection of cells and limits disease pathogenesis [24]. In our integrated analysis, IFITM3 was up-regulated and among the top 20 differentially expressed mRNAs.
FCER1A (Fc fragment of IgE, high affinity I, receptor for; alpha polypeptide) is the alpha subunit of the high-affinity IgE receptor and is encoded in humans by the FCER1A gene [25]. The high-affinity IgE receptor plays an important role in allergic disease: it binds IgE on mast cells, and cross-linking by allergens triggers inflammation and immediate allergic reactions, which are characteristic of diseases such as hay fever and asthma. Infants with severe RSV infection are at increased risk of developing asthma later in childhood [26]. In the KEGG analysis, the 'Asthma' pathway (FDR = 1.28E-06) was significantly enriched, and the down-regulated FCER1A was enriched in this pathway. In addition, FCER1A was among the top 20 differentially expressed mRNAs. Moreover, in the TF regulatory network, FCER1A (degree = 10) was among the 10 target genes with the highest degree.
IFN-stimulated genes (ISGs) establish an antiviral state and play an important role in shaping host innate and adaptive immune responses [27]. ISG15, one of the most highly induced genes in the IFN response, encodes a 17 kDa ubiquitin-like (UBL) protein that is covalently conjugated to cellular proteins and mediates a large number of antiviral responses [28,29]. Rubén González-San et al. found that ISG15 is up-regulated in RSV-infected pseudostratified respiratory epithelial cells and in nasopharyngeal lavage fluid from RSV-infected infants [30]. In our results, ISG15 was up-regulated and was the hub protein of the PPI network.

Conclusion
In conclusion, five DEGs (IFI27, IFI44, IFITM3, FCER1A, and ISG15) were identified as potentially involved in RSV. From the three GEO datasets analyzed, we identified 1059 DEGs (493 up-regulated and 566 down-regulated genes) between RSV patients and normal controls. Our findings may contribute to the identification of new potential biomarkers, reveal the underlying pathogenesis and help identify novel therapeutic targets for the treatment of RSV. Our study also has limitations. Blood samples were used to study an infection that is largely confined to the respiratory mucosa, and no functional experiments were performed to validate the results. To confirm the functions of the biomarkers found in this study, more samples will be collected and in-depth functional experiments will be included in our future work.
Additional file 1: Figure S1. PCA of three datasets used in this study.
Additional file 2: Table S1. Clinical features of patients with RSV and controls.