Detection and Prediction of Alternative Splicing with One-leaf One-node Tree

  • 박민서 (Department of Computer Science, University of Massachusetts)
  • Received : 2010.08.23
  • Accepted : 2010.10.25
  • Published : 2010.10.28

Abstract

Alternative splicing is an important process in gene expression, and it can give rise to mutations and, in turn, disease. Most studies detect alternatively spliced genes using ESTs (Expressed Sequence Tags). However, relying on ESTs to predict alternative splicing has several weaknesses. The libraries in which ESTs are stored are often poorly organized and annotated, so erroneous ESTs can be selected. It is also difficult to predict whether alternative splicing occurs in genes for which no ESTs are available. To address these issues and to improve the quality of detection and prediction of alternative splicing, we propose the One-leaf One-node Tree algorithm, which operates on pre-mRNAs. The algorithm uses codons (nucleotide triplets) as attributes and was applied to each chromosome of Arabidopsis thaliana. The resulting decision tree shows that, in every chromosome, alternative and normal splicing follow different patterns with respect to triplet nucleotides. Based on these patterns, alternative splicing of unlabeled genes can also be predicted.
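
The abstract does not say how the triplet attributes are computed. As a minimal sketch only, the following Python snippet shows one plausible way to turn a pre-mRNA sequence into 64 triplet-frequency attributes of the kind a decision tree could split on. The function name `triplet_frequencies`, the `step` parameter, the choice of reading frame, and the toy sequence are all assumptions made for illustration. This is not the One-leaf One-node Tree algorithm itself.

```python
from collections import Counter
from itertools import product

# All 64 possible triplets over the RNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGU", repeat=3)]


def triplet_frequencies(sequence, step=3):
    """Return the relative frequency of each triplet in `sequence`.

    Hypothetical helper: `step=3` reads non-overlapping triplets starting
    at position 0 (an assumed reading frame); `step=1` would use a
    sliding window instead.
    """
    seq = sequence.upper().replace("T", "U")  # accept DNA or RNA letters
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, step))
    # Only the 64 unambiguous triplets are counted; anything containing
    # other symbols (for example N) is ignored.
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return {t: 0.0 for t in TRIPLETS}
    return {t: counts[t] / total for t in TRIPLETS}


# Toy sequence, made up for illustration only.
features = triplet_frequencies("AUGGCUAUGGCUUAA")
print(features["AUG"], features["GCU"], features["UAA"])
```

Each gene would then be represented by one such attribute vector, with its splicing label (alternative or normal) as the class to be learned.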
