Evolution of the Mir-155 Family and Possible Targets in Cancers and the Immune System

Guang-Bing Xie 1, Wei-Jia Liu 1, Zhi-Jun Pan 1, Tian-Yin Cheng 2*, Chao Luo 1*
1 College of Orient Science and Technology, 2 College of Veterinary Medicine, Hunan Agriculture University, Changsha, China. *For correspondence: superlluo@163.com

Introduction
MicroRNAs (miRNAs) are endogenous, non-coding RNAs of approximately 22 nt that play important roles in post-transcriptional repression (Bartel, 2004). Generally, miRNA genes are transcribed by RNA polymerase II to form primary transcripts (pri-miRNAs). The pri-miRNAs are then cleaved by the RNase III enzyme Drosha with the help of DGCR8 (Landthaler et al., 2004; Price et al., 2011), releasing a hairpin-shaped RNA molecule called the miRNA precursor (pre-miRNA). After Exportin 5 transports the pre-miRNA into the cytoplasm, the Dicer enzyme cleaves it to form an imperfectly paired duplex (miRNA:miRNA* duplex) consisting of one mature (3' arm) and one immature (5' arm) sequence (Yi et al., 2003; Zhang et al., 2004; MacRae et al., 2006). The mature strand then enters the RNA-induced silencing complex (RISC), while the other strand is degraded or, in some species, retained (Schwarz et al., 2003; Bartel, 2004). Sequence analysis of the pre-miRNA and the miRNA:miRNA* duplex in different species can help us understand the distribution and evolution of a miRNA family. In addition, identifying the regulatory functions of miRNAs in wide-ranging signaling cascades, and their roles as modulators of cell-type- and context-dependent activities, has become a hot topic (Farooqi et al., 2014).
It is commonly believed that microRNAs are evolutionarily conserved among different organisms (Xu et al., 2012). However, a considerable number of studies have reported that most microRNAs, especially novel ones, evolve dynamically. For example, miR-941 was found to have undergone rapid regulatory evolution in the human lineage, which contributes to a diversity of evolutionary patterns (Hu et al., 2012). Many current studies focus either on cluster evolution or on the local duplication patterns of a microRNA family, for example the mir-2, let-7, mir-17, mir-395 and mir-548 families (Guddeti et al., 2005; Concepcion et al., 2012; Hertel et al., 2012; Liang et al., 2012; Marco et al., 2012a). However, how an independent, single-member microRNA family evolves has seldom been studied. Moreover, current studies usually emphasize the construction of protein-protein interaction (PPI) networks and pay little attention to miRNA regulatory networks.
MiRNA-155 (mir-155) is an important regulatory factor that participates in many physiological and pathological processes (O'Connell et al., 2012; Elton et al., 2013), such as malignant growth (Babar et al., 2012; Mattiske et al., 2012), viral infections (Vargova et al., 2011; Wang et al., 2011) and cardiovascular diseases (Corsten et al., 2012). Further, the mir-155 family is an independent, single-member family whose evolutionary patterns are unclear. Therefore, in the current study we extracted all the available information about the mir-155 family in different species and performed multiple sequence alignment and comparative genomics. We expect the study to supply new insights into the evolutionary patterns of independent, single-member miRNA families and into the miRNA regulatory network.

Data source
First, the Taxonomy Tools of the NCBI (http://www.ncbi.nlm.nih.gov/taxonomy/) were used to determine the distribution of the mir-155 family. Then all the members of the mir-155 family, together with their corresponding species, annotations, deep sequencing data, miRNA/miRNA* complexes and pre-miRNA sequences from different animal species, were extracted from the miRBase database (Release 20, http://mirbase.org/index.shtml) (Kozomara and Griffiths-Jones, 2011).

Construction of phylogenetic trees
To identify the ancestral sequence, phylogenetic trees of the mir-155 family were reconstructed with MEGA 4.0 (Tamura et al., 2007) using the Neighbor-Joining (NJ) method with 1,000 bootstrap resamplings. It has been reported that mir-100 is the only microRNA shared by all bilateria (Grimson et al., 2008). Therefore, we also analyzed the relationship between mir-100 and mir-155 using the NJ method. After that, we screened the mir-100 family for the species that contain the ancestral sequences of mir-155. We then compared the mir-155 and mir-100 sequences of the same species to determine the origin of the mir-155 family.
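The two ingredients of a bootstrapped distance tree, a pairwise distance matrix and column resampling of the alignment, can be sketched as follows. This is only an illustration: the text does not state which distance model was selected in MEGA, so the simple p-distance used here is an assumption, and the toy alignment and sequence names are hypothetical. The tree-joining step itself is left to dedicated software.

```python
import random

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Pairwise p-distances for a {name: aligned_sequence} dictionary."""
    names = sorted(alignment)
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

# Hypothetical toy alignment, not real mir-155 precursors.
toy = {"seq_a": "ACGUACGU", "seq_b": "ACGUACGA", "seq_c": "AC-UUCGA"}
rng = random.Random(0)
replicates = [distance_matrix(bootstrap_replicate(toy, rng)) for _ in range(1000)]
```

Each of the 1,000 replicate matrices would be passed to an NJ implementation, and the fraction of replicate trees that contain a given clade gives its bootstrap support.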

Relative Arm Usage
To quantify the relative amount of mature miRNA products from each arm of the same miRNA hairpin precursor, we used the "relative arm usage" measure described previously (Marco et al., 2010). Relative arm usage was calculated as follows: Relative arm usage = log2(N5'/N3'), where N5' is the number of reads mapped to the 5' arm of the hairpin precursor and N3' is the number of reads mapped to the 3' arm. Positive values indicate a bias toward 5' arm usage, while negative values indicate a bias toward the 3' arm. Zero means that mature sequences are produced at equal levels from both arms. In addition, miRDB is an online database system for microRNA target prediction and functional annotation (Wang, 2008). In the current study, we used miRDB to predict the target sites of mir-155.
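The measure defined above can be written as a small function. This is a minimal sketch: the read counts in the example are hypothetical, and the optional pseudocount is an added assumption for arms with zero reads, not part of the definition given in the text.

```python
import math

def relative_arm_usage(n5, n3, pseudocount=0.0):
    """Return log2(N5'/N3') for reads mapped to the 5' and 3' arms.

    Positive values mean a bias toward the 5' arm, negative values a bias
    toward the 3' arm, and zero means equal usage of both arms.
    """
    n5_adj = n5 + pseudocount
    n3_adj = n3 + pseudocount
    if n5_adj <= 0 or n3_adj <= 0:
        raise ValueError("both arms need a positive (adjusted) read count")
    return math.log2(n5_adj / n3_adj)

# Hypothetical read counts, for illustration only.
print(relative_arm_usage(800, 100))   # 5' arm dominant: positive value
print(relative_arm_usage(100, 100))   # equal usage: zero
print(relative_arm_usage(100, 800))   # 3' arm dominant: negative value
```

Without a pseudocount the function raises an error when either arm has no reads, which corresponds to cases where no usable data are available for one arm.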

Prediction of the target genes and the target mRNAs
BioMart is a freely available, open source, federated database system that provides unified access to disparate, geographically distributed data sources. The Ensembl and Ensembl Genomes BioMarts provide a single point of access to high-quality gene annotation, variation data, functional and regulatory annotation, and evolutionary relationships from genomes spanning the taxonomic space (Kinsella et al., 2011). To predict the expressed target mRNAs of mir-155, we retrieved the 3'UTR sequences of the whole genome from Ensembl (Release 72) via BioMart. Here we used only the canonical seed to predict target sites in the 3'UTRs (Bartel, 2009), and the target sites were analyzed with miRecords (Xiao et al., 2009) and miRTarBase (Hsu et al., 2011).
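A canonical seed search of this kind can be sketched as a scan of each 3'UTR for the reverse complement of miRNA positions 2-8. The sketch below makes several assumptions: it treats the canonical site as a 7-nt match to positions 2-8 (other site types are not considered), the mature sequence is given as an illustration of hsa-miR-155-5p and should be checked against miRBase before any real use, and the 3'UTR string is a made-up example rather than Ensembl data.

```python
def seed_site(mature_rna):
    """DNA 7-mer that pairs with miRNA positions 2-8 (the seed)."""
    seed = mature_rna.upper().replace("T", "U")[1:8]
    complement = {"A": "T", "U": "A", "G": "C", "C": "G"}
    # Reverse complement, written in the DNA alphabet used for 3'UTR sequences.
    return "".join(complement[base] for base in reversed(seed))

def find_sites(utr_dna, site):
    """0-based start positions of every (possibly overlapping) site match."""
    utr = utr_dna.upper().replace("U", "T")
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions

# Illustrative inputs (assumptions, see the note above).
mir155_5p = "UUAAUGCUAAUCGUGAUAGGGGU"
toy_utr = "GGGAGCATTACCCAGCATTAAA"

site = seed_site(mir155_5p)
print(site, find_sites(toy_utr, site))
```

In a whole-genome run, `find_sites` would be applied to every 3'UTR record retrieved from BioMart, and genes with at least one match would be kept as candidate targets.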

Functional and pathway enrichment analysis
Molecule Annotation System (MAS 3.0, http://bioinfo.capitalbio.com/mas3/) is a convenient analysis management tool for identifying significantly enriched Gene Ontology (GO) and pathway terms, with a user-friendly web interface and a stable, fast analysis procedure. The selected target genes were queried using MAS to obtain enriched functions and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Based on the hypergeometric distribution, the pathways or GO terms with p<0.01 and q<0.01 were chosen for further analysis.
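The hypergeometric test behind such an enrichment analysis can be illustrated with a few lines of standard-library Python. This is a generic sketch of the upper-tail p-value, not a reproduction of the MAS implementation; the function name and the example numbers are hypothetical, and the multiple-testing correction that yields q-values is not shown because the text does not specify the method.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the target list, k: target genes annotated to the term.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical example: 3 target genes drawn from 10, 4 of which are annotated.
print(hypergeom_upper_tail(2, 3, 4, 10))
```

A term would pass the filter described above only if both this p-value and its corrected q-value fall below 0.01.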

Construction of an miRNA-target gene-cancer regulatory network
Cytoscape is a popular bioinformatics package for biological network visualization and data integration (Smoot et al., 2011). We selected the cancer-related pathways from the enriched KEGG pathways and then constructed a mir-155-pathway-target gene regulatory network with Cytoscape 2.8.
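One way to prepare such a network for Cytoscape is to write the miRNA-gene and gene-pathway relations as a tab-delimited edge list in the simple interaction format (SIF), which Cytoscape can import. The sketch below is an illustration only: the relation labels and helper names are hypothetical, and the input mapping is a toy example in which GENE_X is a placeholder rather than a real target.

```python
from collections import Counter

def build_edges(mirna, gene_to_pathways):
    """List of (source, relation, target) edges for a miRNA-gene-pathway network."""
    edges = []
    for gene in sorted(gene_to_pathways):
        edges.append((mirna, "targets", gene))
        for pathway in sorted(gene_to_pathways[gene]):
            edges.append((gene, "member_of", pathway))
    return edges

def to_sif(edges):
    """Tab-delimited SIF text: one 'source<TAB>relation<TAB>target' line per edge."""
    return "\n".join(f"{s}\t{r}\t{t}" for s, r, t in edges)

def pathway_degree(edges):
    """Number of pathways each gene belongs to, to spot hub genes."""
    return Counter(s for s, r, _ in edges if r == "member_of")

# Toy input; GENE_X is a hypothetical placeholder.
toy_map = {
    "CCND1": {"Colorectal cancer", "Pancreatic cancer"},
    "GENE_X": {"Colorectal cancer"},
}
edges = build_edges("mir-155", toy_map)
print(to_sif(edges))
print(pathway_degree(edges).most_common())
```

Tabs are used as delimiters so that pathway names containing spaces stay intact as single node names. Ranking genes by the number of pathways they belong to is one simple way to identify genes involved in most of the cancer-related pathways.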

Construction of an mir-155-target gene-protein regulatory network
The Human Protein Reference Database (HPRD) is an object-oriented database that integrates a great deal of information relevant to the function of human proteins in health and disease (Prasad et al., 2009). In the present study, we chose the genes that were regulated by mir-155 and enriched in immune-related functions after functional enrichment analysis. We then constructed and analyzed the mir-155-target gene-protein regulatory network using Cytoscape 2.8. All the protein interaction data used here had previously been verified by experiment.

The mir-155 family is distributed only in vertebrates
The annotation in the miRBase database (Release 20) showed that the mir-155 family had 20 members in total. Based on the results of the NCBI taxonomy browser, we found that all members of the mir-155 family were distributed in vertebrates (Figure 1).

Mir-155 is a conserved microRNA family in vertebrates
According to the multiple alignment of all precursor sequences (Figure 2a), the 5' arm of the family precursor was highly conserved, whereas the 3' arm showed some nucleotide changes. In many microRNA families the highly conserved nucleotides are located in the 3' arm, but in the mir-155 family they were located in the 5' arm instead (Figure 2a). Therefore, we speculated that arm switching had occurred in the mir-155 family.
Asian Pacific Journal of Cancer Prevention, Vol 15, 2014. DOI: http://dx.doi.org/10.7314/APJCP.2014.15.18.7547

The ancestral sequence and rare local duplication of the mir-155 family
To identify the ancestral sequences of the mir-155 family, we constructed an NJ phylogenetic tree from all the precursor sequences. According to the phylogenetic tree (Figure 3), the ancestral sequence of the mir-155 family might have diverged in two directions: one toward fish, the other toward amphibians and then further toward mammals and quadrumana. The sequences from zebrafish, Ictalurus punctatus and other species that formed the first branch to separate from the ancestral mir-155 gene might be the most similar to the ancestor. The ancestral sequences of the mir-155 family were DRE-mir-155 and IPU-mir-155, and they might have arisen in one of two ways: either spontaneously, or as a branch of an ancestral miRNA. Intriguingly, the NJ tree showed that the two ancestral sequences of mir-155 shared the same nucleotides (Figure 4a, b). The only difference between them was that the sequence from D. rerio was longer than the sequence from I. punctatus (Figure 4c). Data have shown that mir-100 is the only microRNA shared by all bilateria (Grimson et al., 2008; Griffiths-Jones et al., 2011). Based on the comparison between the mir-100 and mir-155 families in the same species, we found that the mir-155 family originated at the time when the same sequences arose in the mir-100 family. The origins of the mir-100 and mir-155 families were parallel, and the mir-155 family came from a lineage-specific gain of miRNAs rather than from the original miRNAs (Figure 4a).

The mir-155 family is highly conserved in function
To verify whether arm switching had occurred in mir-155, we further calculated the relative arm usage (Table 1). The 5' arm was highly expressed compared with the 3' arm, which suggested that the dominant mature microRNA was generated from the 5' arm. Based on the miRDB predictions, the 5' arm had more target sites than the 3' arm, which also supported the inference of arm switching in the mir-155 family. The seed sequence, nucleotides 2-8 of the mature microRNA, plays a crucial role in transcript targeting (Lewis et al., 2003; Nahvi et al., 2009). Multiple sequence alignment of the mature sequences showed that the seed sequence of the 5' arm in the mir-155 family was 100% conserved, while that of the 3' arm showed nucleotide diversity (Figure 4c). Therefore, the function of the mir-155 family was highly conserved.

The target gene-pathway-cancer regulatory network of mir-155
In total, 738 target genes regulated by mir-155 were identified in the current study. All the targets were queried for KEGG pathway and GO enrichment using MAS. The KEGG pathway analysis showed that mir-155 was involved in a diversity of pathways, especially those of various cancers. Of the 88 enriched pathways, 11 were related to various cancers (Table 2), such as colorectal cancer and pancreatic cancer. Therefore, we constructed the target gene-pathway-cancer regulatory network of mir-155 (Figure 5) to uncover the relationship between mir-155 and cancers. The CCND1 and EGFR genes were involved in most of the cancer-related pathways.

The construction of mir-155-target gene-protein regulatory network
Based on the GO analysis, the target genes were enriched in 366 molecular function terms, 533 biological process terms and 172 cellular component terms. According to HPRD, a total of 379 proteins, such as CD31, CRKL and Fgr, had interaction data with the mir-155 target genes that were enriched in immune-related GO terms. The mir-155-target gene-protein regulatory network is shown in Figure 6. The target genes cebpb and VCAM1 and the protein SMAD2 might be the key factors in the immune activities regulated by mir-155.

Discussion
Mir-155 has been reported to play important roles in inhibition of malignant growth, viral infections and even in attenuating the progression of cardiovascular diseases (Wang et al., 2011; Zhi-Yong et al., 2011; Corsten et al., 2012; Mattiske et al., 2012). Moreover, mir-155 has been regarded as a critical regulator of immune cell development, function, and disease (O'Connell et al., 2009); however, the evolution of the mir-155 family is still unclear. In the current study, we found that, unlike the mir-2 family, the mir-155 family was highly conserved in the 3' arm and no gene copies had arisen in this family, so nucleotide divergence emerged only among species. The KEGG pathway enrichment analysis showed that the CCND1 and EGFR genes might be potential targets of mir-155 in the progression of cancers. Moreover, the target genes cebpb and VCAM1 and the protein SMAD2 might be the key factors in the immune activities regulated by mir-155.
Table 1 note: The relative arm usage is defined as log2(N5'/N3'). Positive values indicate a bias toward 5' arm usage and negative values a bias toward the 3' arm. Zero means that mature sequences are produced at equal levels from both arms (Marco et al., 2010). Targets were predicted by miRDB (Wienholds et al., 2005), and "-" indicates that no data were available.
MicroRNAs evolve even more slowly than the most conserved genes in the metazoan genome (Grimson et al., 2008; Wheeler et al., 2009). Therefore, microRNA families can be used to trace the evolutionary origin of the most ancient animal lineages. In addition, miRNAs have experienced various evolutionary patterns during their history. The accepted evolutionary patterns of microRNAs include duplication (both local and non-local), loss, rearrangement, arm switching, new hairpin formation, hairpin shifting, seed shifting and editing (Blow et al., 2006; Maher et al., 2006; de Wit et al., 2009; Marco et al., 2010; Griffiths-Jones et al., 2011; Marco et al., 2012b; Marco et al., 2013). Of these, duplication is the leading evolutionary pattern and usually results in clustered changes in most lineages, for instance in the mir-2 and mir-17 families. However, the mir-155 family is very different in these respects: duplications have rarely occurred in the mir-155 family, and it does not form a family cluster. Further, the mir-155 family was highly conserved in the 3' arm, while the 5' arm was highly expressed because of arm switching. Therefore, the dominant mature microRNA was produced from the 5' arm, which was consistent with the miRDB predictions. The mature microRNAs comprise canonical and non-canonical mature miRNAs. In the current study, we found that the seed sequence of the non-canonical mature miRNA has undergone slight changes; the seed sequence comprises nucleotides 2-8 of the mature microRNA and plays a crucial role in transcript targeting (Lewis et al., 2003; Nahvi et al., 2009). Therefore, we speculate that the mir-155 family is undergoing slow evolution and that the non-canonical mature miRNA might have the potential to change the mir-155 family.
MiRNAs are well preserved in a range of specimen types and are much more sensitive than proteins; therefore, miRNAs have potential as biomarkers for various molecular diagnostic applications (Pritchard et al., 2012). Moreover, mir-155 is a vital small regulatory RNA whose function is widely involved in immune regulation (Vigorito et al., 2013) and a variety of cancers (Mattiske et al., 2012). Therefore, we constructed the target gene-pathway-cancer regulatory network of mir-155 and found that the CCND1 and EGFR genes might be the key targets of mir-155 in cancer-related pathways. In our study, CCND1 was enriched in the colorectal cancer, pancreatic cancer, endometrial cancer, melanoma, bladder cancer, non-small cell lung cancer, prostate cancer, small cell lung cancer and thyroid cancer pathways. Moreover, it has been reported that CCND1 amplification also occurs in breast, esophageal, hepatocellular, and head and neck cancer (Croce, 2008). The CCND1 gene controls the cell cycle by regulating cyclin-dependent kinase 4 (CDK4) or CDK6. Mutation, amplification and overexpression of this gene alter cell cycle progression, disturb normal mitosis, and may even contribute to tumorigenesis. The EGFR gene is related to many epithelial cancers. Therefore, we speculate that mir-155 may have a regulatory effect on many cancers, exerted mainly through control of the cell cycle by the CCND1 gene.
Furthermore, as shown in Figure 6, SMAD2 plays a very important role in the regulatory network of mir-155 target genes and immune-related proteins. SMAD2 is a signal transducer and transcriptional modulator that mediates multiple signaling pathways and is responsible for transmitting extracellular signals from ligands of the transforming growth factor beta (TGF-β) superfamily into the cell nucleus. TGF-β is a pleiotropic cytokine that takes part in fibrosis, angiogenesis, apoptosis and immunosuppression (Louafi et al., 2010; Das et al., 2014). The TGF-β receptor-Smad2/3 pathway is essential for cell apoptosis and proliferation (Zhi-Yong et al., 2011; Das et al., 2014). Moreover, it has been indicated that a miR-155 antisense oligonucleotide can induce cell apoptosis and inhibit cell proliferation, and miR-155 may therefore be a therapeutic target in breast carcinoma (Zheng et al., 2013). That report also indicated that SMAD2 was a predicted target of mir-155. Therefore, the regulatory effect of mir-155 on immune function is mainly exerted through the SMAD protein.
In conclusion, the mir-155 family is a vertebrate-specific, conserved family that is undergoing slow evolution. In addition, the mir-155 family is distinctive in its arm usage, in that the 5' arm is functionally dominant. The CCND1 gene and the SMAD protein might be the key targets of mir-155 in diverse cancer pathways and in the immune system.