Utilizing Various Natural Language Processing Techniques for Biomedical Interaction Extraction

  • Park, Kyung-Mi (Dept. of Computer Science and Engineering, Korea University)
  • Cho, Han-Cheol (Dept. of Computer Science and Engineering, Korea University)
  • Rim, Hae-Chang (Dept. of Computer Science and Engineering, Korea University)
  • Received : 2010.11.25
  • Accepted : 2011.04.12
  • Published : 2011.09.30


The vast body of biomedical literature is an important source for discovering biomedical interaction information. However, obtaining interaction information from it is difficult because most of it is not easily machine-readable. In this paper, we present a method for extracting biomedical interaction information, assuming that the biomedical Named Entities (NEs) are already identified. The proposed method labels all possible pairs of the given biomedical NEs as INTERACTION or NO-INTERACTION using a Maximum Entropy (ME) classifier. The features used by the classifier are obtained by applying various NLP techniques such as POS tagging, base phrase recognition, parsing, and predicate-argument recognition. In particular, specific verb predicates (activate, inhibit, diminish, etc.) and their biomedical NE arguments are very useful features for identifying interacting NE pairs. Based on this observation, we devised a two-step method: 1) an interaction verb extraction step to find biomedically salient verbs, and 2) an argument relation identification step to generate partial predicate-argument structures between the extracted interaction verbs and their NE arguments. In the experiments, we analyzed how much each applied NLP technique improves performance. Overall, the proposed method improves on the baseline method by more than 2%. The use of external contextual features, which are obtained from outside the NEs, is crucial for this improvement. We also compare the proposed method against co-occurrence-based and rule-based methods. The results demonstrate that the proposed method considerably improves performance.
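
The pair-labeling setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature set, the toy sentence, and the three-verb list (taken from the examples named in the abstract) are all assumptions made for illustration. The sketch only enumerates candidate NE pairs and builds simple features; the resulting feature dictionaries would then be passed to an ME classifier, which is not shown.

```python
from itertools import combinations

# Hypothetical verb list: only the three verbs named in the abstract.
INTERACTION_VERBS = {"activate", "inhibit", "diminish"}

def candidate_pairs(entity_positions):
    """Return every unordered pair of NE token positions in one sentence."""
    return list(combinations(sorted(entity_positions), 2))

def pair_features(tokens, lemmas, i, j):
    """Build toy features for the NE pair at token positions i < j."""
    between = lemmas[i + 1:j]
    verbs = [w for w in between if w in INTERACTION_VERBS]
    return {
        "ne1": tokens[i],
        "ne2": tokens[j],
        "distance": j - i,
        "has_interaction_verb_between": bool(verbs),
        "verb_between": verbs[0] if verbs else "NONE",
    }

# Toy sentence (illustrative only); NE positions are assumed to be given,
# as the abstract assumes NEs are already identified.
tokens = ["IL-2", "activates", "STAT5", "but", "not", "STAT3"]
lemmas = ["il-2", "activate", "stat5", "but", "not", "stat3"]
entity_positions = [0, 2, 5]

pairs = candidate_pairs(entity_positions)
features = [pair_features(tokens, lemmas, i, j) for i, j in pairs]

for pair, feats in zip(pairs, features):
    print(pair, feats)
```

Each feature dictionary corresponds to one candidate pair to be labeled INTERACTION or NO-INTERACTION. A surface feature such as "a verb occurs between the two NEs" fires for the pair (IL-2, STAT3) even though the sentence negates that interaction, which suggests why features derived from predicate-argument structure could help.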


Biomedical Interaction Extraction;Natural Language Processing;Interaction Verb Extraction;Argument Relation Identification


