Comparison of External Information Performance Predicting Subcellular Localization of Proteins

  • Sang-Mun Chi (지상문), School of Computer Science, Kyungsung University
  • Received: 2010.05.11
  • Accepted: 2010.09.03
  • Published: 2010.11.15

Abstract

Because the subcellular location of a protein is closely related to its biological function, predicting subcellular localization can provide information about that function. To improve prediction performance, many studies actively exploit external information beyond the amino acid sequence itself. This paper compares the predictive power contained in five sources of information: amino acid sequence similarity, protein profiles, Gene Ontology (GO) terms, motifs, and textual (literature) information. In experiments on the PLOC dataset, whose proteins share less than 80% sequence similarity, methods using sequence similarity and Gene Ontology were effective, achieving a classification accuracy of 94.8%. In experiments on the BaCelLo IDS dataset, whose proteins share less than 30% sequence similarity, Gene Ontology gave the best prediction accuracies, substantially improving performance to 93.2% for animals and 86.6% for fungi.
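Of the information sources compared above, Gene Ontology is the easiest to illustrate compactly. The following is a minimal sketch, assuming each protein is represented by a set of GO term identifiers and that a query protein inherits the location of its most similar annotated proteins (a nearest-neighbour rule with Jaccard similarity). It is not necessarily the classifier used in the paper; the function names `jaccard` and `predict_location` and the term identifiers `GO_A` to `GO_E` are hypothetical placeholders.

```python
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| between two sets of GO term IDs."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def predict_location(query_terms: set[str],
                     training: list[tuple[set[str], str]],
                     k: int = 1) -> str:
    """Majority location among the k training proteins most similar to the query."""
    if not training:
        raise ValueError("training set is empty")
    # Sort annotated proteins by GO-term similarity to the query, most similar first.
    # Python's sort is stable, so ties keep their original training order.
    ranked = sorted(training, key=lambda item: jaccard(query_terms, item[0]), reverse=True)
    # Vote among the k nearest neighbours; on a tie, the label seen first wins.
    votes = Counter(location for _, location in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data: hypothetical GO term IDs paired with known locations.
training = [
    ({"GO_A", "GO_B"}, "nucleus"),
    ({"GO_C", "GO_D"}, "mitochondrion"),
    ({"GO_B", "GO_E"}, "nucleus"),
]

print(predict_location({"GO_A", "GO_E"}, training))  # nucleus
print(predict_location({"GO_C"}, training))          # mitochondrion
```

Replacing `jaccard` with an alignment-based score (for example, percent identity from a BLAST search) would turn the same rule into a sequence-similarity predictor; the sketch only shows how an external information source can be turned into a similarity between proteins.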

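The accuracies quoted in the abstract (94.8%, 93.2%, 86.6%) are classification accuracies. Assuming the common definition, overall accuracy is the number of test proteins assigned to their annotated location divided by the total number of test proteins; localization studies sometimes also report per-location accuracies or other measures, and the abstract does not say which variants the paper uses. This sketch computes both on toy labels; the function names `accuracy` and `per_class_accuracy` are hypothetical.

```python
from collections import Counter


def _check(true_labels: list[str], predicted: list[str]) -> None:
    """Reject empty or mismatched label lists."""
    if not true_labels or len(true_labels) != len(predicted):
        raise ValueError("label lists must be non-empty and of equal length")


def accuracy(true_labels: list[str], predicted: list[str]) -> float:
    """Fraction of proteins whose predicted location matches the annotated one."""
    _check(true_labels, predicted)
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    return correct / len(true_labels)


def per_class_accuracy(true_labels: list[str], predicted: list[str]) -> dict[str, float]:
    """Accuracy restricted to the proteins annotated with each location."""
    _check(true_labels, predicted)
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted) if t == p)
    return {loc: hits[loc] / n for loc, n in totals.items()}


y_true = ["nucleus", "nucleus", "cytoplasm", "mitochondrion"]
y_pred = ["nucleus", "cytoplasm", "cytoplasm", "mitochondrion"]
print(accuracy(y_true, y_pred))            # 0.75
print(per_class_accuracy(y_true, y_pred))  # {'nucleus': 0.5, 'cytoplasm': 1.0, 'mitochondrion': 1.0}
```

Because overall accuracy weights every protein equally, it is dominated by the largest location classes; per-location accuracy exposes weak performance on small classes.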
