A novel method for predicting protein subcellular localization based on pseudo amino acid composition

  • Ma, Junwei (School of Control Science and Engineering, Dalian University of Technology)
  • Gu, Hong (School of Control Science and Engineering, Dalian University of Technology)
  • Received : 2010.05.24
  • Accepted : 2010.08.12
  • Published : 2010.10.31

Abstract

This paper introduces ELM-PCA, a novel approach to predicting protein subcellular localization. First, protein samples are represented by their pseudo amino acid composition (PseAAC). Second, principal component analysis (PCA) is employed to extract the essential features. Finally, an Elman recurrent neural network (RNN) is used as a classifier to assign the protein sequences to their subcellular locations. The results demonstrate that the proposed approach is effective and practical.
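The first two stages of the pipeline described in the abstract (PseAAC encoding followed by PCA) can be illustrated with a minimal sketch. This is not the authors' implementation: the single physicochemical property used here (Kyte-Doolittle hydropathy), the parameters `lam` and `w`, the helper names `pseaac` and `pca_project`, and the toy sequences are all assumptions made for illustration. The Elman network classifier is omitted because the abstract does not specify its configuration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as an assumed stand-in for
# the physicochemical properties in the PseAAC correlation terms.
KD = {"A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4,
      "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5,
      "P": -1.6, "Q": -3.5, "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2,
      "W": -0.9, "Y": -1.3}

# Standardize the property over the 20 amino acids (zero mean, unit std).
_vals = np.array([KD[a] for a in AMINO_ACIDS])
H = dict(zip(AMINO_ACIDS, (_vals - _vals.mean()) / _vals.std()))


def pseaac(seq, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional PseAAC vector for one sequence.

    lam (number of correlation tiers) and w (weight of the sequence-order
    terms) are illustrative values, not taken from the paper.
    """
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    # Conventional amino acid composition: 20 occurrence frequencies.
    freq = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float) / len(seq)
    # k-th tier correlation factor: mean squared property difference
    # between residues that are k positions apart.
    h = np.array([H[r] for r in seq])
    theta = np.array([np.mean((h[:-k] - h[k:]) ** 2) for k in range(1, lam + 1)])
    # Joint normalization so that the whole vector sums to 1.
    denom = freq.sum() + w * theta.sum()
    return np.concatenate([freq, w * theta]) / denom


def pca_project(X, n_components):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T              # scores in the reduced space


# Toy sequences for demonstration only; they are not from the paper's data set.
seqs = ["MKTAYIAKQRQISFVKSHFSRQ", "MSLLTEVETPIRNEWGCRCNDSSD",
        "MAHHHHHHVDDDDKMLVPRGS", "MGSSHHHHHHSSGLVPRGSH"]
X = np.array([pseaac(s) for s in seqs])          # PseAAC feature matrix
Z = pca_project(X, n_components=2)               # reduced features
print(X.shape, Z.shape)
```

In a full pipeline, the rows of `Z` would be the inputs passed to the classifier; the number of retained components is a tuning choice that the abstract does not state.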
