Multi-Label Combination for Prediction of Protein Subcellular Localization

  • Chi, Sang-Mun (School of Computer Science and Engineering, Kyungsung University)
  • Received : 2014.05.24
  • Accepted : 2014.06.27
  • Published : 2014.07.31

Abstract

Knowledge of where a protein resides in the cell provides important information about its function. This paper proposes an improved label power-set multi-label classification method for predicting the subcellular locations of proteins that exist at multiple locations simultaneously. Among multi-label classification methods, the label power-set method can effectively model the correlations between the subcellular locations of proteins that perform a particular biological function. The proposed method represents each multi-label as a linear combination of other multi-labels, obtains the combination weights through constrained optimization, and uses these weights to combine the prediction probabilities of several multi-labels into the final prediction. In experiments on a human protein dataset, the proposed method achieved higher performance than other methods for predicting protein subcellular localization. This is because the proposed method successfully enriches the prediction probabilities of multi-labels by exploiting the overlapping information among the multi-labels used in the label power-set method.
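The abstract describes the method only in outline, so the following Python sketch is an illustration under stated assumptions, not the paper's implementation. The label power-set transform (one class per distinct set of locations) is the standard one. Everything else is assumed for illustration: the function names `label_powerset`, `combination_weights`, and `combine`, the use of nonnegativity as the only constraint, the projected-gradient solver (the paper cites CVX instead), and the rule of adding weighted probabilities and renormalizing.

```python
def label_powerset(label_sets):
    """Label power-set transform: each distinct label set becomes one class."""
    classes = sorted({frozenset(s) for s in label_sets}, key=sorted)
    index = {c: i for i, c in enumerate(classes)}
    return classes, [index[frozenset(s)] for s in label_sets]


def indicator(label_set, locations):
    """Binary vector marking which locations belong to a multi-label."""
    return [1.0 if loc in label_set else 0.0 for loc in locations]


def combination_weights(target, others, steps=2000, lr=0.05):
    """Assumed formulation: nonnegative w minimizing
    ||target - sum_j w[j] * others[j]||^2, solved by projected gradient."""
    w = [0.0] * len(others)
    for _ in range(steps):
        # Residual of the current linear combination, per location.
        residual = [
            sum(wj * v[d] for wj, v in zip(w, others)) - target[d]
            for d in range(len(target))
        ]
        for j, v in enumerate(others):
            grad = 2.0 * sum(r * vd for r, vd in zip(residual, v))
            w[j] = max(0.0, w[j] - lr * grad)  # project onto w >= 0
    return w


def combine(probs, vectors):
    """Assumed combination rule: add to each class probability the weighted
    probabilities of the other classes, then renormalize to sum to 1."""
    scores = []
    for c, target in enumerate(vectors):
        other_ids = [j for j in range(len(vectors)) if j != c]
        w = combination_weights(target, [vectors[j] for j in other_ids])
        scores.append(probs[c] + sum(wj * probs[j] for wj, j in zip(w, other_ids)))
    total = sum(scores)
    return [s / total for s in scores]


# Toy data: four proteins annotated with one or two locations.
locations = ["cytoplasm", "nucleus"]
annotations = [{"cytoplasm"}, {"nucleus"}, {"cytoplasm", "nucleus"}, {"nucleus"}]
classes, class_ids = label_powerset(annotations)
vectors = [indicator(c, locations) for c in classes]

# Hypothetical classifier output over the three power-set classes.
combined = combine([0.3, 0.4, 0.3], vectors)
```

In this toy case the class {cytoplasm, nucleus} is exactly the sum of the two single-location classes, so its score is reinforced by both of their probabilities. That is one way to read the abstract's claim about exploiting overlapping information between multi-labels.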
