Integrative Meta-Analysis of Multiple Gene Expression Profiles in Acquired Gemcitabine-Resistant Cancer Cell Lines to Identify Novel Therapeutic Biomarkers

In molecular-targeted cancer therapy, acquired resistance to gemcitabine is a major clinical problem that reduces its effectiveness, resulting in recurrence and metastasis of cancers. In spite of great efforts to reveal the overall mechanism of acquired gemcitabine resistance, no definitive genetic factors have been identified that are absolutely responsible for the resistance process. Therefore, we performed a cross-platform meta-analysis of three publically available microarray datasets for cancer cell lines with acquired gemcitabine resistance, using the R-based RankProd algorithm, and were able to identify a total of 158 differentially expressed genes (DEGs; 76 up- and 82 down-regulated) that are potentially involved in acquired resistance to gemcitabine. Indeed, the top 20 up- and down-regulated DEGs are largely associated with a common process of carcinogenesis in many cells. For the top 50 up- and down-regulated DEGs, we conducted integrated analyses of a gene regulatory network, a gene co-expression network, and a protein-protein interaction network. The identified DEGs were functionally enriched via Gene Ontology hierarchy and Kyoto Encyclopedia of Genes and Genomes pathway analyses. By systemic combinational analysis of the three molecular networks, we could condense the total number of DEGs to final seven genes. Notably, GJA1, LEF1, and CCND2 were contained within the lists of the top 20 up- or down-regulated DEGs. Our study represents a comprehensive overview of the gene expression patterns associated with acquired gemcitabine resistance and theoretical support for further clinical therapeutic studies. DNA damage, reduced apoptosis, and altered metabolism of drugs, and (iii) increased energy-dependent efflux of hydrophobic drugs.


Introduction
DNA damage, reduced apoptosis, and altered metabolism of drugs, and (iii) increased energy-dependent efflux of hydrophobic drugs.
Gemcitabine (2, 2´-difluorodeoxycytidine) is a deoxycytidine analog with a broad spectrum that was established as the first-line chemotherapeutic treatment for locally advanced and metastatic pancreatic cancer in the late 1990s; it has generally been used to treat a variety of solid tumors, including breast, ovarian, pancreatic and non-small cell lung carcinoma, especially in combination with the platinum-based drugs, cisplatin and carboplatin (Toschi et al., 2005;Mini et al., 2006;Toschi and Cappuzzo, 2009;Zhang et al., 2013;de Sousa Cavalcante and Monteiro, 2014). Gemcitabine is first transported into the cancer cell by nucleoside transporters, which include the concentrative and equilibrative transporters. When phosphorylated by deoxycytidine kinase (DCK) to generate its active forms, the diphosphate and triphosphate, gemcitabine interferes with DNA replication and inhibits cancer cell growth, by modulating dNTP pools via the inhibition of ribonucelotide reductase (composed of RRM1 and RRM2 polypeptides). Several mechanisms of acquired gemcitabine resistance (AGR) in cancers have been reported; these are anatomical (e.g., desmoplasia, epithelial-mesenchymal transition (EMT), and inherent cancer stem cell resistance), pathophysiological (e.g., abnormal tumor growth, tumor angiogenesis, altered cancer cell survival, and anti-apoptosis), or pharmacological (e.g., the necessity of phosphorylation for prodrug activation) (Nakano et al., 2007;Toschi and Cappuzzo, 2009;Tufman and Huber, 2010;Hung et al., 2012;de Sousa Cavalcante and Monteiro, 2014). However, none of these mechanisms was confirmed as the etiology of AGR, and the fundamental reason for-and exact process of-AGR is still being studied, in many different ways. Despite gemcitabine's relatively broad and frequent use, the acquired resistance to it during cancer treatment is a common phenomenon that is caused by the establishment of complex crosstalk between many different cellular pathways. High-throughput microarray technologies, based on genome-wide probe selection and regression analysis, are widely used to elucidate global gene expression in complex diseases such as cancer. By applying these technologies, researchers have improved understanding of the cellular and molecular changes occurring during the development of acquired drug resistance in cancers. In three previous studies that used microarrays, various differentially expressed genes (DEGs) were identified as candidate factors that may influence AGR (Tooker et al., 2007;Saiki et al., 2012). However, published lists of identified genes show large inconsistencies because of the small sample size, low sample quality, and different laboratory protocol and platform in each individual study. In order to overcome these limitations, we identified DEGs that consistently appeared in all pairwise samples by meta-analysis of multiple microarray datasets, and performed integrative analysis of systemic molecular networks at the gene and/or protein level, in order to establish a theoretical framework for prospective molecular biological and clinical experiments. To our knowledge, we are the first to perform a cross-platform meta-analysis of the gene expression profiles associated with AGR in different cancer cell lines.

Selection of microarray datasets eligible for meta-analysis
We conducted a narrow search of microarray datasets for meta-analysis, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines published in 2009. We collated data from microarray gene expression studies related to acquired-gemcitabine-resistant cancer cell lines from the PubMed, National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/), and European Bioinformatics Institute ArrayExpress (http://www. ebi.ac.uk/arrayexpress/) databases. For objective assessment, two independent reviewers extracted data from the original studies; any discrepancies that arose between these reviewers were resolved either by consensus or by consultation with a third reviewer. The keywords "gemcitabine," "gemcitabine cancer," "acquired gemcitabine resistance (resistant)," "acquired drug resistance (resistant)," and "gene and/or expression and/or profile" were used in the search for studies. We included a study in the analysis if it contained the following: (i) gene expression profiling of gemcitabine-resistant cancer cell lines or gemcitabine-resistant derivatives of cancer cell lines that had been generated by stepwise selection, and (ii) sufficient data and the correct platform to facilitate meta-analysis. Studies that reported non-human data or used intrinsically drug-resistant cells were excluded from the meta-analysis.

Meta-analysis of multiple microarray datasets
We carried out a cross-platform meta-analysis of gene expression profiles in the selected microarray datasets using the rank product method (RankProd package in the R software, http://www.r-project.org/) implemented in the web-based INMEX tool for meta-analysis of expression data (http://www.inmex.ca/INMEX/) (Xia et al., 2013;Song et al., 2014;Toro-Dominguez et al., 2014). Before meta-analysis of the three datasets, all genes or probe IDs from each dataset were annotated as Entrez database IDs, for data consistency, and the expression values for corresponding genes in the samples were log 2 -transformed and quantile-normalized so that they had zero mean and unit variance. According to the RankProd algorithm non-parametric permutation test, which considers all possible pair-wise comparisons, the DEGs that appeared consistently in whole datasets were assigned to a higher rank, depending on the percentage of false-positives predicted in a given number of replicates, multiplied across the different microarray datasets.

GO hierarchy and KEGG pathway enrichment analysis
In order to discern biological attributes of the identified DEGs in the acquired-gemcitabine-resistant cancer cell lines, functional enrichment via Gene Ontology (GO) hierarchy and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was performed using Database for Annotation, Visualization, and Integrated Discovery (DAVID) bioinformatics resources (http://david.abcc. ncifcrf.gov/), with a significance threshold of p<0.05.

Gene regulatory network analysis
In order to determine the gene regulatory network of the identified DEGs, we carried out enrichment analysis for potential transcription factors and microRNAs, based on a comparison of upstream DNA sequences with an assembly of previously discovered gene sets retrieved from the Molecular Signatures database (MSigDB, http:// www.broadinstitute.org/gsea/msigdb/index.jsp), with a significance threshold of p<0.05 (Shi et al., 2014). The hypergeometric algorithm and Benjamini-Hochberg adjustment were used for statistical processing and multiple-test revision of the network analysis, respectively.

Gene co-expression network analysis
In order to predict the biological activity of the identified DEGs at the gene level, we constructed a Integrative Meta-analysis of Multiple Gene Expression Profiles in Acquired Gemcitabine-Resistant Cancer Cell Lines gene co-expression network of the top 50 up-and downregulated DEGs using the GeneMANIA web server (http://www.genemania.org/) (Warde-Farley et al., 2010;Molina-Navarro et al., 2014). The relationships between genes in the network were determined by GO term (biological process)-based weighting, and filtered by including only gene co-expression relationships with a significance threshold for weight value of p>0.05. Within the network, distinct modules were identified based on the fast-greedy HEN (G) algorithm, using the Community Clusters GLay plug-in (http://cytoscape.wodaklab.org/ wiki/CommunityStructureLayout) in the Cytoscape software (http://www.cytoscape.org/) (Gupta et al., 2012;Firoz et al., 2014). Overrepresented biological functions within each module were examined using the functional enrichment analyses in the DAVID and g:Profiler (http:// biit.cs.ut.ee/gprofiler/) programs.

Protein-protein interaction network analysis
In order to predict the biological activity of the identified DEGs at the protein level, the top 50 (20) up-and down-regulated DEGs were imported into a protein-protein interaction (PPI) network downloaded from the Biological General Repository for Interaction Datasets (BioGRID, http://thebiogrid.org/, (Pan, 2012). The PPI network was screened on a genome-wide scale using the Cytoscape software. Within the PPI network, we identified the hub proteins of distinct protein clusters using the Cytoscape plug-in, ClusterONE (http://apps. cytoscape.org/apps/clusterone) (Nepusz et al., 2012). Overrepresented biological functions within each protein cluster surrounding a hub protein were examined by the functional enrichment analyses in the DAVID and g:Profiler online programs.

Selection of microarray datasets related to AGR for meta-analysis
From microarray datasets in the NCBI GEO database, we extracted 22 GEO samples within three GEO series (GSEs) related to AGR that met our criteria (see Materials and Methods; Figure 1A). All three GSEs were solely derived from cancer cell lines that had acquired drug resistance by stepwise treatment with gemcitabine, such as lung adenocarcinoma, pancreatic cancer, and epidermoid carcinoma (Table 1). With regard to the microarray platforms used, GSE 35141 was obtained using Illumina BeadChip (Illumina, San Diego, CA) and the other two datasets (GSE 6914 and GSE 3344) were obtained using Affymetrix GeneChip (Affymetrix, Santa Clara, CA).

Identification of up-or downregulated DEGs in the metaanalysis
From cross-platform microarray meta-analysis based on the RankProd algorithm, we identified a total of 158 DEGs, including 76 up-and 82 down-regulated genes with a significance threshold of p<0.05. The top 20 upand down-regulated genes among the total DEGs are listed in order of significance (by p value) in Table 2. The up-regulated DEGs with the lowest value of p<1.0×10 -10 were CALB1, ADAM28, TRIM22, MSMB, TLE4, INHBB, ADH1C, IL1R2, and TRIM21. The down-regulated DEGs with the lowest observed value of p=0.00250 were ARHGAP29 and PTX3. The up-and down-regulated DEGs with the largest mean log 2 fold change were CALB1 (calbindin 1, 28 kDa) and ARHGAP29 (Rho GTPase activating protein 29), respectively. In order to interpret the biological significance of the identified DEGs in different cancer cell lines with AGR, we attempted the systemic approach of using various in silico analyses that might identify gene regulation, gene co-expression, and PPI networks, accompanied by functional enrichment analysis ( Figure 1B).

Functional and pathway enrichment analysis of all the identified DEGs
A total of 158 DEGs identified by the meta-analysis were classified according to GO hierarchy functional category (biological process, molecular function,  and cellular component) and KEGG pathway, with a significance threshold of p<0.05 (Table 3). The most overrepresented GO terms under biological process were enriched in the following descending order: "Defense response" (GO 0006952), "Response to extracellular stimulus" (GO 0009991), and "Response to drug" (GO 0042493). The most enriched GO terms under molecular function and cellular component were "Tumor necrosis factor receptor superfamily binding" (GO 0032813) and "Extracellular region (GO 0005576). The most enriched KEGG pathway terms were (descending order): "Cytokine-cytokine receptor interaction" (Hsa 04060), "Metabolism of xenobiotics by cytochrome P450" (Hsa 00980), and "Regulation of actin cytoskeleton" (Hsa 04810).

Gene regulation network analysis of the top 50 up-and down-regulated DEGs
In order to identify the network regulating gene expression of the top 50 up-and down-regulated DEGs, which might directly influence AGR, we analyzed potential regulatory elements that target the DEGs depending on their upstream DNA sequence (Table 4). The target sites of the following transcription factors were significantly enriched in the DEGs: JUN, LEF1, NFAT, MAZ, MLLT4, and TCF1. The target sites of the following microRNAs were also significantly enriched in the DEGs: miR-200B/200C/429, miR-19A/19B, miR-520G/520H, miR-524, miR-23A/23B, miR-153, miR-409, miR145, miR-9, and miR-129.

Gene co-expression network analysis of the top 50 up-and down-regulated DEGs
We constructed a co-expression network for the top 50 up-and down-regulated DEGs, with gene-correlation interactions consisting of 143 nodes and 764 edges, by mapping the DEGs onto a massive database of functional-interaction datasets in the GeneMANIA web  (Figure 2). The network was further subdivided into five functional modules that were closely connected by >20 nodes, using the fast-greedy HEN (G) algorithm of the Cytoscape GLay plug-in, followed by functional enrichment analysis according to GO hierarchy and KEGG pathway. For example, "Module 1," of maximum size in the network, was significantly enriched using biological terms of the GO hierarchy such as "Programmed cell death" (GO 0012501) and "Response to organic substance" (GO 0010033).
The smallest module, "Module 4," was significantly enriched using biological terms such as "Cell proliferation" (GO 0008283) and "Calcium signaling pathway" (Hsa 04020), with regard to GO hierarchy and KEGG pathway, respectively.

Protein-protein interaction network analysis of the top 50 up-and down-regulated DEGs
We constructed a PPI network for the top 50 up-and down-regulated DEGs, composed of 1004 nodes and 1186 edges, by mapping them onto a very large database of PPI datasets downloaded from the BioGRID web server (Figure 3). From fifteen distinct protein clusters surrounding hub proteins, with >15 nodes, twelve functional hub cluster proteins were specifically identified, based on their p value and node density derived using the Cytoscape ClusterONE plug-in followed by functional enrichment analysis by GO hierarchy, as follows: SYT1, FHL1, CCND2, GJA1, SORBS2, LEF1, SATB1, RRM1, FGF2, DCK, BATF, and SERPINB9. While six of these hub proteins (SYT1, GJA1, LEF1, SATB1, RRM1, and BATF) represented up-regulated DEGs, the other six (FHL1, CCND2, SORBS2, FGF2, DCK, and SERPINB9) represented down-regulated DEGs. By constructing a PPI network for the top20 up-and down-regulated DEGs, we also observed the whole schematic diagram that the two kinds of DEGs interact directly or indirectly in the network (Figure 4).

Discussion
Gemcitabine is a molecular-targeted cancer drug for the standard chemotherapeutic treatment of patients with various solid tumors, but its clinical impact is limited by the high degree of inherent or acquired drug resistance; no definitive genetic factors have been reported to be solely responsible for the AGR of cancers. Investigating alterations in gene expression, which characterize the response of cancer cells to gemcitabine treatment during the process of AGR, would help us to understand the underlying mechanisms of drug resistance and improve the efficacy of therapeutic strategies for this deadly disease. In this context, we performed a cross-platform meta-analysis of three independent microarray datasets and attempted an integrative analysis of three systemic molecular networks (gene regulation network, gene co-expression network, and PPI network) for the identified DEGs. In the case of the top 20 up-and down-regulated DEGs, most of the genes have been reported to be involved in carcinogenesis of many tumor and cancer types, suggesting that they may be potential key factors in the AGR process. Functional enrichment analysis of all the identified DEGs revealed that they were mainly classified as related to biological functions such as the cell cycle, homeostasis, immune response, apoptosis, replication, and signal transduction that are associated with the general process of carcinogenesis. In particular, the following GO and KEGG enrichment terms had direct relevance to the mechanisms by which cancers acquire their gemcitabine-resistant property: the GO terms, "Response to drug," "Regulation of programmed cell death," "Regulation of secretion," and "Regulation of transcription, DNA-dependent," and the KEGG terms, "Metabolism of xenobiotics by cytochrome P450," "Drug metabolism," "Pyrimidine metabolism," "MAPK and Wnt signaling pathway," "Endocytosis," and "Pathways in cancer" (Nakano et al., 2007;Toschi and Cappuzzo, 2009;Tufman and Huber, 2010;Hung et al., 2012;de Sousa Cavalcante and Monteiro, 2014). In evaluating the biological significance of the identified DEGs in the complex process of AGR, comprehensive information on the topological positions of the DEGs within a network at the transcriptome level is no less valuable than the fold-change and p values of individual DEGs. Systemic analysis of the gene regulation network of the identified DEGs and their potential regulatory elements (targets of transcription factors and microRNAs) may facilitate a macroscopic view of AGR, by looking into regulatory mechanisms governing the expression and the function of (many different) genes and cellular processes, respectively, in cancers that acquire gemcitabine resistance. In practice, most of regulatory

Figure 4. The Constructed PPI Network of the Top 20
Up-and Down-regulated DEGs. The PPI network of the top 20 up-and down-regulated DEGs was constructed by mapping them into massive database of BIOGRID program. The node and edge of each hub cluster stand for protein encoded by genes with the identified DEGs and interaction of the proteins, respectively. The color of node signifies proteins that are encoded by the following DEGs: Light blue-up-regulated DEGs, Light red-down-regulated DEGs, and Light brown-additional genes in BioGRID Integrative Meta-analysis of Multiple Gene Expression Profiles in Acquired Gemcitabine-Resistant Cancer Cell Lines elements enriched by the top 50 up-and down-regulated DEGs were reported to be involved as oncogenes or tumor suppressors in a variety of human cancers including colon, gastric, prostate, lung, pancreatic, and breast cancer. For example, among transcription factors, it was reported that activation of the JUN-JNK complex was required for development of AGR in lung cancer H1299 cells, and increased expression of NFAT was correlated with tumor cell survival against apoptosis in drug-resistant pancreatic cancer (Teraishi et al., 2005;Griesmann et al., 2013). LEF1 (lymphoid enhancer-binding factor 1) was identified as a mediator of the Wnt/β-catenin signaling pathway during metastasis of lung adenocarcinoma (Bleckmann et al.). In the case of microRNAs, it was reported that the miR-200 family could serve as regulators of EMT in metastasis of ovarian, breast, and pancreatic cancer, and the attenuated expression of miR-200b/200c was found in gemcitabine-resistant pancreatic cancer cells (Ali et al., 2010). The microRNA, miR-145, is known to be a novel regulator of MUC13 that is highly involved in the progression of pancreatic cancer, and miR-19a was discovered as a prognostic factor for poor outcome in patients with non-small cell lung cancer (Lin et al., 2013;Khan et al., 2014). In another systemic approach for identifying expression patterns of DEGs that may be related to AGR in cancers, we evaluated the functional enrichment of DEGs into distinct modules, where they are co-located and form functional interactions with each other, within the network that was constructed by mapping the DEGs onto the massive gene co-expression database of the GeneMANIA online resource. Five modules, composed of the top 50 up-and down-regulated DEGs, were identified; genes within these modules and other genes already known to be in the network were largely involved in processes that are representative of AGR, including indefinite cell proliferation, for example, via abnormal apoptosis (modules 1, 3, and 4), intercellular membrane transport of small molecules (modules 2 and 4), deregulated transcription (modules 3 and 4), and arrested replication escape by reactivation of DNA synthesis (module 3). In parallel with the two above-mentioned network analyses, clustering and enrichment of functional hub clusters were analyzed at the protein level within the PPI network in order to identify hub proteins with a high degree of interaction. In many studies, hub nodes have been found to be necessary factors for the specific function that is executed by their corresponding network in an organic system, and play important functions in maintaining that network within the system. The twelve functional hub DEGs identified in the PPI network were enriched using GO terms for biological processes with a close relationship to the AGR process, such as abnormal apoptosis (hub clusters 3, 4, 5, 9, and 12), membrane transport of small molecules (hub cluster 1), deregulated transcription (hub clusters 2, 6, 8, and 11), and arrested replication escape by reactivation of DNA synthesis (hub clusters 7, 8, and 10). By comparing four lists of DEGs, for the gene regulation network, gene co-expression network, PPI network, and the top 50 up-and down-regulated DEGs, seven DEGs were shortlisted as AGR candidate genes from the total of 158 DEGs identified by meta-analysis; these included four up-regulated DEGs (SYT1, GJA1, LEF1, and SATB1) and three down-regulated DEGs (FHL1, CCND2, and SORBS2) that appeared in all four lists. In particular, GJA1, LEF1, and CCND2 were affiliated to the lists of the top 20 up-and down-regulated DEGs more likely to be crucial for the etiology of AGR.
LEF1 belongs to a family that shares conserved amino acid sequence homology with high-mobility group protein 1 (Nguyen et al., 2009;Bleckmann et al., 2013). Many previous studies showed that a LEF1/ TCF4 complex is closely associated with poor survival in patients with different cancers (colon, gastric, breast, lung, and pancreatic), via regulation of cell proliferation, migration, invasion, and metastasis, as a transcription factor that mediates the Wnt/β-catenin signaling pathway in the nucleus. Another upregulated DEG, GJA1 (encoding gap junction protein, alpha 1, 43 kDa), is a member of the connexin gene family that encode components of gap junctions, which act as intercellular channels for the diffusion of low molecular weight materials from cell to cell (McLachlan et al., 2006;Li et al., 2007). GJA1, better known as connexin 43 (Cx43), was reported to function as a tumor suppressor that inhibits tumor growth, via regulation of EMT and angiogenesis, in breast and prostate cancer. Gene expression analysis of cisplatinsensitive and -resistant ovarian cancer cells showed that GJA1 expression was highly elevated in cisplatin-resistant ovarian cancer cells, by whole-genome oligonucleotide microarray analysis, and by immunoblotting and immunofluorescence analyses using a Cx43-specific antibody. CCND2 (cyclin D2), identified by the downregulated DEG, is known to function as a regulatory subunit of cyclin-dependent kinases that are involved in the G1/S transition in the mitotic cell cycle (Koyama-Nasu et al., 2013). Alteration of CCND2 gene expression promoted phosphorylation and subsequent inactivation of the retinoblastoma tumor suppressor protein, RB1, which causes dysregulation of the G1/S transition as a common event in the tumorigenesis of many cancers. There was no choice but to exclude DCK (deoxycytidine kinase) and RRM1, already known as molecular targets in cytotoxic mechanisms of gemcitabine, from the final gene lists; although they were contained in module 3 of the gene co-expression network and hub clusters 8 and 13 of the PPI network (highly related to AGR), we could not analyze their potential regulatory elements in the gene regulation network, owing to computational limitation using databases containing only experimentally discovered relationships.
In conclusion, by performing a cross-platform meta-analysis of three microarray datasets for different cancer cell lines with AGR, we have identified a total of 158 candidate DEGs that have a high probability of being involved in the molecular mechanism of AGR. We have also provided a comprehensive overview of the gene expression pattern of the AGR-related DEGs by attempting integrated in silico analysis of three molecular networks. This topological approach of integrative network analysis could help to provide new insights into the complex nature of AGR and may be useful for studying prospective chemotherapeutic strategies.