• Title/Summary/Keyword: exon-intron boundary

Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction

  • Bari, A.T.M. Golam;Reaz, Mst. Rokeya;Choi, Ho-Jin;Jeong, Byeong-Soo
    • Interdisciplinary Bio Central, v.4 no.4, pp.14.1-14.6, 2012
  • Splice site prediction in DNA sequences is a basic search problem: finding the exon/intron and intron/exon boundaries. Removing the introns and joining the exons together forms the mRNA sequence, which is the input to translation. Splicing is therefore a necessary step in the central dogma of molecular biology. The main task of splice site prediction is first to find the GT- and AG-ended candidate sequences, and then to distinguish the true splice sites from the false ones among those candidates. In this paper, we survey research on splice site prediction based on support vector machines (SVMs). The basic differences among these works lie in the nucleotide encoding technique and the SVM kernel selection. Some methods encode the DNA sequence sparsely, whereas others encode it probabilistically. The encoded sequences serve as input to the SVM, whose task is to classify them using its learned model. Classification accuracy depends largely on selecting a proper kernel for sequence data and on selecting the kernel parameters. We examine each encoding technique and group the techniques by similarity. We then discuss kernels and their parameter selection. Our survey provides a basic understanding of encoding approaches and proper SVM kernel selection for splice site prediction.
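
The two steps the abstract describes (locating GT/AG candidates, then encoding them for a classifier) can be illustrated with a minimal Python sketch. It is not taken from the surveyed papers: the function names, the 4-bit one-hot scheme, and the `flank` window size are assumptions chosen for illustration, and the SVM training step itself is omitted.

```python
# Illustrative sketch only: names, the one-hot scheme, and the window size
# are assumptions, not the method of any specific surveyed paper.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def candidate_sites(seq, dinuc):
    """Return 0-based start positions of every occurrence of dinuc ('GT' or 'AG')."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]


def sparse_encode(window):
    """Sparse (one-hot) encoding: 4 values per nucleotide, all zeros for unknown bases."""
    vec = []
    for base in window.upper():
        vec.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return vec


def encode_candidates(seq, dinuc, flank=3):
    """Encode a fixed-size window around each candidate site.

    Candidates closer than `flank` bases to either end are skipped so that
    every feature vector has the same length.
    """
    encoded = []
    for pos in candidate_sites(seq, dinuc):
        start, end = pos - flank, pos + 2 + flank
        if start >= 0 and end <= len(seq):
            encoded.append((pos, sparse_encode(seq[start:end])))
    return encoded
```

Each returned vector would then be labeled as a true or false splice site and passed to a classifier; a probabilistic encoding would replace `sparse_encode` with position-specific frequencies estimated from training data.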

Genomic Structure Analyses of Five Kinds of Human Sialyltransferase Gene (5종류의 인간유래 시알산전이효소 유전자들의 게놈구조 분석)

  • Kang Nam-Young;Kim Sang-Wan;Kim Cheorl-Ho;Lee Young-Choon
    • Journal of Life Science, v.14 no.6 s.67, pp.1009-1017, 2004
  • Sialyltransferases cloned so far show remarkable tissue-specific expression, which is correlated with the existence of cell type-specific sialylated sugar structures in glycoconjugates. In previous studies, we found various mRNA isoforms of human sialyltransferases generated by alternative splicing and alternative promoter utilization. To understand the regulatory mechanisms underlying the specific expression of human sialyltransferase genes and the production of their mRNA isoforms, in this study we isolated and characterized five human sialyltransferase genes: hST3Gal II, hST8Sia II, hST8Sia III, hST8Sia IV, and hST8Sia V. The hST3Gal II gene is composed of six exons, which span over 17 kb, with exons ranging in size from 46 to over 1017 bp. The hST8Sia III gene spans over 10 kb and consists of only four exons, making it much smaller and simpler than other human sialyltransferase genes. In contrast, three genes (hST8Sia II, hST8Sia IV, and hST8Sia V) span more than 70 kb and comprise five or more exons. All exon-intron boundaries follow the GT-AG rule. In particular, the sialylmotif L, a highly conserved region in all cloned sialyltransferases, was found in a single exon of hST8Sia III, whereas this motif is encoded by discrete exons in the other human sialyltransferases. The exon structures of these sialyltransferase genes show structural diversity, as found in other human sialyltransferase genes reported so far. We determined the transcription start site of the hST3Gal II gene by 5'-RACE and cap site hunting experiments.
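
The GT-AG rule mentioned above states that an intron begins with GT and ends with AG on the DNA sense strand. As a hedged illustration, the following Python sketch checks that rule given exon coordinates; the helper names and the 0-based half-open coordinate convention are assumptions, and the toy sequence is invented rather than taken from any sialyltransferase gene.

```python
# Illustrative sketch only: helper names, coordinate convention, and the
# example sequence are assumptions made for this demonstration.
def introns_from_exons(genomic, exons):
    """Return the intron sequences lying between consecutive exons.

    `exons` is a list of (start, end) pairs, 0-based and half-open,
    sorted by position on the sense strand.
    """
    return [
        genomic[prev_end:next_start]
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


def follows_gt_ag(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Toy gene: exon 1, one intron, exon 2.
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
exons = [(0, 6), (18, 24)]
all_canonical = all(follows_gt_ag(i) for i in introns_from_exons(genomic, exons))
```

Here `all_canonical` is `True` because the single intron is bounded by GT and AG; a non-canonical intron (for example one starting with GC) would make the check fail.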

Characteristics of Gene Structure of Bovine Ghrelin and Influence of Aging on Plasma Ghrelin

  • Kita, K.;Harada, K.;Nagao, K.;Yokota, H.
    • Asian-Australasian Journal of Animal Sciences, v.18 no.5, pp.723-727, 2005
  • Ghrelin is a novel growth-hormone-releasing acylated peptide that was purified and identified from rat stomach. In the present study, the full-length sequence of bovine ghrelin cDNA was cloned by RT-PCR. The bovine ghrelin cDNA sequence included a 348 bp open reading frame and a 137 bp 3' UTR. The putative amino acid sequence of bovine prepro-ghrelin consisted of 116 amino acids, which contained the 27-amino acid ghrelin. Sequence analysis of the bovine ghrelin gene revealed that an intron exists between Gln$^{13}$ and Arg$^{14}$ of ghrelin. This exon-intron boundary matched the GT-AG rule of the splicing mechanism. Compared with rats, which have two tandem CAG sequences at the 3' end of the intron, the bovine ghrelin gene has only one CAG sequence. Therefore, although rats can produce 28-amino acid ghrelin and 27-amino acid des-Gln$^{14}$-ghrelin by alternative splicing, ruminant species, including bovines, might be able to produce only one type of ghrelin peptide, des-Gln$^{14}$-ghrelin. The influence of aging on plasma ghrelin concentration was also examined. Plasma ghrelin concentration increased after birth until approximately 600 days of age and then remained constant.

Mutational Analysis of Prohibitin - A Highly Conserved Gene in Indian Female Breast Cancer Cases

  • Najm, Mohammad Zeeshan;Akhtar, Md. Salman;Ahmad, Istaq;Sadaf, Sadaf;Mallick, Mohd Nasar;Kausar, Mohd Adnan;Chattopadhyay, Shilpi;Ahad, Amjid;Zaidi, Shuaib;Husain, Syed Akhtar;Siddiqui, Waseem Ahmad
    • Asian Pacific Journal of Cancer Prevention, v.13 no.10, pp.5113-5117, 2012
  • Prohibitin (PHB) is a chaperone protein that is highly conserved evolutionarily. It shows significant homology with the Drosophila cc gene, which is considered important for the development and differentiation of Drosophila melanogaster. Investigations have revealed an involvement of PHB in cellular proliferation and development, apoptosis, signal transduction, mitochondrial function, and regulation of the estrogen and androgen receptors. We therefore conducted the present study to analyze mutations in the highly conserved region in Indian female breast cancer patients. Conventional PCR-SSCP and automated DNA sequencing were performed on a total of 105 breast cancer samples along with adjacent normal tissue. Of the total, 14.2% (15/105) carried a prohibitin mutation. We identified a novel missense mutation (Thr>Ser), a novel deletion of a T nucleotide in an intron adjacent to an intron-exon boundary, and a previously reported missense mutation (Val>Ala). A statistically significant correlation was obtained, suggesting that prohibitin may be associated with tumor development and/or progression in at least some proportion of breast cancers.

Identification of Causal and/or Rare Genetic Variants for Complex Traits by Targeted Resequencing in Population-based Cohorts

  • Kim, Yun-Kyoung;Hong, Chang-Bum;Cho, Yoon-Shin
    • Genomics & Informatics, v.8 no.3, pp.131-137, 2010
  • Genome-wide association studies (GWASs) have greatly contributed to the identification of common variants responsible for numerous complex traits. There are, however, unavoidable limitations in detecting causal and/or rare variants for traits with this approach, which depends on an LD-based tagging SNP microarray chip. In an effort to detect potential causal and/or rare variants for complex traits, such as type 2 diabetes (T2D) and triglycerides (TGs), we conducted targeted resequencing of loci identified by the Korea Association REsource (KARE) GWAS. The target regions for resequencing comprised whole exons, exon-intron boundaries, and regulatory regions of genes that appeared within 1 Mb of the GWA signal boundary. From 124 individuals selected from population-based cohorts, a total of 0.7 Mb of target regions was captured by the NimbleGen sequence capture 385K array. Subsequent sequencing, carried out on the Roche 454 Genome Sequencer FLX, generated about 110,000 sequence reads per individual. Sequence reads were mapped to the human reference genome using the SSAHA2 program. An average of 62.2% of total reads was mapped to targets, with an average 22-fold coverage. A total of 5,983 SNPs (average 846 SNPs per individual) were called and annotated by GATK software, with 96.5% accuracy as estimated by comparison with Affymetrix 5.0 genotype data from the same individuals. About 51% of total SNPs were singletons that can be considered possible rare variants in the population. Among SNPs that appeared in exons, which account for about 20% of total SNPs, 304 nonsynonymous singletons were tested with Polyphen to predict the protein damage caused by mutation. In total, we were able to detect 9 and 6 potentially functional rare SNPs for T2D and triglycerides, respectively, warranting a further step of replication genotyping in independent populations to prove their bona fide relevance to the traits.

Increasing Splicing Site Prediction by Training Gene Set Based on Species

  • Ahn, Beunguk;Abbas, Elbashir;Park, Jin-Ah;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS), v.6 no.11, pp.2784-2799, 2012
  • Biological data have increased exponentially in recent years, and analyzing these data using data mining tools has become one of the major issues in the bioinformatics research community. This paper focuses on the protein construction process in higher organisms, in which the deoxyribonucleic acid (DNA) sequence is filtered. In this process, "unmeaningful" DNA sub-sequences (called introns) are removed, and their meaningful counterparts (called exons) are retained. Accurate recognition of the boundaries between these two classes of sub-sequences, however, is known to be a difficult problem. Conventional approaches to recognizing these boundaries have sought solely to enhance machine learning techniques, while the inherent nature of the data themselves has been overlooked. In this paper, we present an approach that makes use of data attributes inherent to species in order to increase the accuracy of boundary recognition. For experimentation, we took data sets for four different species from the University of California Santa Cruz (UCSC) data repository, divided the data sets by species, and then trained neural network (NN)-based and support vector machine (SVM)-based classifiers on a preprocessed version of the data sets. As a result, we observed that each species has its own specific features related to splice sites, which implies that there are related distances among species. To conclude, dividing the training data set by species would increase the accuracy of splice junction prediction and offer new insight for biological research.

A Case of Lethal Neonatal Type Carbamoyl Phosphate Synthetase 1 Deficiency with Novel Mutation of CPS1 (새로운 CPS1 유전자 돌연변이에 의한 신생아형 carbamoyl phosphate synthetase 1 결핍 1례)

  • Suh, Seung-hyun;Kim, Yoo-Mi;Byun, Shin Yun;Son, Seung Kook;Kim, Seong Heon;Kim, Hyung Tae;Kim, Gu-Hwan;Yoo, Han-Wook
    • Journal of The Korean Society of Inherited Metabolic disease, v.16 no.2, pp.109-114, 2016
  • Carbamoyl phosphate synthetase 1 (CPS1) deficiency is an autosomal recessive urea cycle disorder that causes hyperammonemia. CPS1 is the first enzyme in the urea cycle, and most patients present with symptoms during the neonatal period. We report a case of CPS1 deficiency in a boy who developed symptoms including lethargy and seizures at 3 days of age. The ammonia level rose to 2,325 μmol/L, and sodium benzoate (250 mg/kg/d) and high-calorie dextrose and lipid were promptly administered. Central access by an experienced pediatric surgeon and emergent continuous hemodialysis by a pediatric nephrologist were performed within 3 hours, and the ammonia level was below 100 μmol/L at 5 days of age. He has since shown an excellent response to treatments including scavenging drugs and a low-protein diet. Despite diffusely increased signal intensity in the cerebral white matter and basal ganglia on brain MRI, his development and weight gain were good at the last follow-up at 11 months of age. Molecular analysis of the CPS1 gene demonstrated that the patient was compound heterozygous for c.1529del (p.Gly510Alafs*5) in exon 14 and c.3142-1G>C (IVS25(-1)G>C) at the boundary of intron 25 and exon 26. The splicing mutation was novel and was inherited from the patient's mother. Here, we report a patient with lethal neonatal type CPS1 deficiency carrying a novel mutation.
