PubMine: An Ontology-Based Text Mining System for Deducing Relationships among Biological Entities

  • Kim, Tae-Kyung (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB))
  • Oh, Jeong-Su (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB))
  • Ko, Gun-Hwan (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB))
  • Cho, Wan-Sup (Department of Management Information System, Chungbuk National University)
  • Hou, Bo-Kyeng (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB))
  • Lee, Sang-Hyuk (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB))
  • Received : 2011.04.08
  • Accepted : 2011.04.25
  • Published : 2011.06.30


Background: Published manuscripts are the main source of biological knowledge. Because manual examination is nearly impossible given the huge volume of literature (approximately 19 million abstracts in PubMed), intelligent text mining systems are of great utility for knowledge discovery. However, most current text mining tools have limited applicability because they i) provide abstract-based rather than sentence-based search, ii) use ontology terms improperly or not at all, iii) are designed for specific subjects, or iv) respond too slowly for web services and real-time applications.

Results: We introduce an advanced text mining system called PubMine that supports intelligent knowledge discovery based on diverse bio-ontologies. PubMine improves query accuracy and flexibility with advanced search capabilities: fuzzy search, wildcard search, proximity search, range search, and Boolean combinations of these. Furthermore, PubMine allows users to extract multi-dimensional relationships among genes, diseases, and chemical compounds by using OLAP (On-Line Analytical Processing) techniques. The HUGO gene symbols and the MeSH ontology for diseases, chemical compounds, and anatomy have been included in the current version of PubMine, which is freely available at

Conclusions: PubMine is a unique bio-text mining system that provides flexible searches and analysis of biological entity relationships. We believe that PubMine will serve as a key bioinformatics utility because its rapid response enables web services for the community and its flexible design accommodates general ontologies.
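
To make the idea of sentence-based, multi-dimensional relationship extraction more concrete, the following is a minimal Python sketch and not PubMine's actual implementation. It assumes tiny hand-written term dictionaries in place of the HUGO and MeSH vocabularies, a naive sentence splitter, and a plain counter in place of a real OLAP engine; the names `GENES`, `DISEASES`, `CHEMICALS`, `build_cube`, and `roll_up` are hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy dictionaries; a real system would load HUGO and MeSH terms.
GENES = {"TP53", "BRCA1"}
DISEASES = {"breast cancer", "lung cancer"}
CHEMICALS = {"cisplatin", "tamoxifen"}


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def find_terms(sentence, terms):
    """Return the dictionary terms that occur as whole words (case-insensitive)."""
    return {
        term
        for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", sentence, flags=re.IGNORECASE)
    }


def build_cube(abstracts):
    """Count (gene, disease, chemical) combinations that share one sentence.

    A dimension with no match in a sentence is recorded as None.
    """
    cube = Counter()
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            genes = find_terms(sentence, GENES) or {None}
            diseases = find_terms(sentence, DISEASES) or {None}
            chemicals = find_terms(sentence, CHEMICALS) or {None}
            for key in product(sorted(genes), sorted(diseases), sorted(chemicals)):
                if any(value is not None for value in key):
                    cube[key] += 1
    return cube


def roll_up(cube, keep):
    """Aggregate counts over the dimensions not listed in `keep` (indices 0-2)."""
    rolled = Counter()
    for key, count in cube.items():
        reduced = tuple(key[i] for i in keep)
        if None not in reduced:  # ignore sentences missing a kept dimension
            rolled[reduced] += count
    return rolled


ABSTRACTS = [
    "TP53 mutations are frequent in breast cancer. Cisplatin was not studied.",
    "BRCA1 and TP53 modulate the response of breast cancer to tamoxifen.",
]

cube = build_cube(ABSTRACTS)
gene_disease = roll_up(cube, keep=(0, 1))
gene_chemical = roll_up(cube, keep=(0, 2))
```

In this toy input, cisplatin appears in the same abstract as TP53 but in a different sentence, so the sentence-level count does not link them; an abstract-level co-occurrence count would. Rolling the cube up to two dimensions gives pairwise views such as gene-disease or gene-chemical from the same underlying counts.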


