Extraction of Protein-Protein Interactions based on Convolutional Neural Network (CNN)

  • Received : 2016.10.31
  • Accepted : 2016.12.14
  • Published : 2017.03.15

Abstract

In this paper, we propose an extended Deep Convolutional Neural Network (DCNN) model for automatically extracting Protein-Protein Interactions (PPIs) from the scientific literature. The model extends a CNN originally designed for relation extraction, which relies only on simple lexical features, by additionally applying various global features, and this extension improves performance. In experiments on AIMed, the benchmark collection most widely used to evaluate PPI extraction, the proposed model achieves an F-score of 78.0%, which is 8.3% higher than the best performance reported to date and is therefore the state of the art in this domain. The results also show that a CNN with word embeddings can achieve superior PPI extraction (PPIE) performance without any feature engineering based on complex language processing.
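The abstract describes the architecture only at a high level. As a rough illustration, the sketch below shows one common way to build a CNN relation classifier of the kind the abstract extends, loosely in the style of Zeng et al. (2014): each token is represented by a word embedding concatenated with embeddings of its relative distance to the two protein mentions, a 1-D convolution runs over token windows, max-pooling over time yields a fixed-size sentence vector, and a separate vector of "global" features is concatenated before the softmax output layer. Everything here (the class name `ToyPPICNN`, all dimensions, the position features, and the example global features) is a hypothetical assumption, not the authors' model. The weights are random and untrained, so the printed probabilities carry no meaning.

```python
import math
import random

# All sizes below are illustrative assumptions, not the paper's settings.
WORD_DIM = 4      # word-embedding size
POS_DIM = 2       # size of each relative-position embedding
WINDOW = 3        # convolution window width (odd, so padding is symmetric)
N_FILTERS = 5     # number of convolution filters
MAX_DIST = 10     # relative distances are clipped to [-MAX_DIST, MAX_DIST]
LABELS = ("no_interaction", "interaction")


class ToyPPICNN:
    """Untrained, randomly initialised CNN relation classifier (hypothetical)."""

    def __init__(self, vocab, n_global, seed=0):
        rng = random.Random(seed)

        def vec(n):
            return [rng.uniform(-0.5, 0.5) for _ in range(n)]

        self.word_emb = {w: vec(WORD_DIM) for w in sorted(set(vocab))}
        self.unk = vec(WORD_DIM)  # shared vector for out-of-vocabulary tokens
        # One position table per protein mention, keyed by clipped distance.
        self.pos_emb = [
            {d: vec(POS_DIM) for d in range(-MAX_DIST, MAX_DIST + 1)}
            for _ in range(2)
        ]
        in_dim = WINDOW * (WORD_DIM + 2 * POS_DIM)
        self.conv_w = [vec(in_dim) for _ in range(N_FILTERS)]
        self.conv_b = vec(N_FILTERS)
        self.out_w = [vec(N_FILTERS + n_global) for _ in LABELS]
        self.out_b = vec(len(LABELS))
        self.n_global = n_global

    def _token_vectors(self, tokens, p1, p2):
        """Concatenate each word embedding with positions relative to both proteins."""
        vecs = []
        for i, tok in enumerate(tokens):
            d1 = max(-MAX_DIST, min(MAX_DIST, i - p1))
            d2 = max(-MAX_DIST, min(MAX_DIST, i - p2))
            vecs.append(self.word_emb.get(tok, self.unk)
                        + self.pos_emb[0][d1] + self.pos_emb[1][d2])
        return vecs

    def forward(self, tokens, p1, p2, global_feats):
        """Return a label -> probability dict for the protein pair at (p1, p2)."""
        if not tokens or len(global_feats) != self.n_global:
            raise ValueError("need a non-empty sentence and n_global features")
        vecs = self._token_vectors(tokens, p1, p2)
        half = WINDOW // 2
        pad = [0.0] * len(vecs[0])
        padded = [pad] * half + vecs + [pad] * half
        # Convolution over every window, followed by max-pooling over time.
        pooled = [-math.inf] * N_FILTERS
        for start in range(len(vecs)):
            window = [x for v in padded[start:start + WINDOW] for x in v]
            for f in range(N_FILTERS):
                score = self.conv_b[f] + sum(
                    w * x for w, x in zip(self.conv_w[f], window))
                pooled[f] = max(pooled[f], score)
        # Pooled (local) sentence features plus the extra global features.
        hidden = [math.tanh(v) for v in pooled] + list(global_feats)
        logits = [b + sum(w * h for w, h in zip(row, hidden))
                  for row, b in zip(self.out_w, self.out_b)]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return {label: e / total for label, e in zip(LABELS, exps)}


# Usage: PROT1 and PROT2 are placeholder protein mentions at positions 0 and 3.
sentence = "PROT1 binds to PROT2 in vitro".split()
model = ToyPPICNN(vocab=sentence, n_global=2)
# Hypothetical global features: [interaction verb present, normalised distance].
probs = model.forward(sentence, p1=0, p2=3, global_feats=[1.0, 3 / len(sentence)])
print(probs)
```

Training (for example, minimising cross-entropy by backpropagation) is omitted here; a practical implementation would normally use a deep-learning framework rather than pure Python loops. The global-feature vector is deliberately kept separate from the convolution so that sentence-wide signals can be added without changing the local feature extractor, which mirrors the extension the abstract describes only in spirit.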

Acknowledgement

Grant: Development of Big Data Technology Based on Ultra-High-Performance Computing for a Healthy Aging Society

Supported by: Korea Institute of Science and Technology Information (KISTI)