PubMine: An Ontology-Based Text Mining System for Deducing Relationships among Biological Entities

Kim, Tae-Kyung; Oh, Jeong-Su; Ko, Gun-Hwan; Cho, Wan-Sup; Hou, Bo-Kyeng; Lee, Sang-Hyuk

doi:10.4051/ibc.2011.3.2.0007

Interdisciplinary Bio Central

Volume 3 Issue 2 / Pages 7.1-7.6 / 2011 / 2005-8543 (eISSN)

Korean Society for Bioinformatics


Kim, Tae-Kyung (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB));
Oh, Jeong-Su (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB));
Ko, Gun-Hwan (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB));
Cho, Wan-Sup (Department of Management Information System, Chungbuk National University);
Hou, Bo-Kyeng (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB));
Lee, Sang-Hyuk (Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB))

Received : 2011.04.08
Accepted : 2011.04.25
Published : 2011.06.30


Abstract

Background: Published manuscripts are the main source of biological knowledge. Because manual examination is practically impossible given the volume of literature (approximately 19 million abstracts in PubMed), intelligent text mining systems are of great utility for knowledge discovery. However, most current text mining tools have limited applicability because they i) provide abstract-based rather than sentence-based search, ii) use ontology terms improperly or not at all, iii) are designed for specific subjects, or iv) respond too slowly for web services and real-time applications.

Results: We introduce PubMine, an advanced text mining system that supports intelligent knowledge discovery based on diverse bio-ontologies. PubMine improves query accuracy and flexibility through advanced search capabilities: fuzzy search, wildcard search, proximity search, range search, and Boolean combinations of these. Furthermore, PubMine allows users to extract multi-dimensional relationships among genes, diseases, and chemical compounds by using OLAP (On-Line Analytical Processing) techniques. The current version of PubMine includes the HUGO gene symbols and the MeSH ontology for diseases, chemical compounds, and anatomy, and is freely available at http://pubmine.kobic.re.kr.

Conclusions: PubMine is a unique bio-text mining system that provides flexible searches and analysis of relationships among biological entities. We believe that PubMine can serve as a key bioinformatics utility because its rapid response enables web services for the community and its flexible design can accommodate general ontologies.
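To make the contrast between abstract-based and sentence-based search concrete, the following is a minimal Python sketch of sentence-level co-occurrence counting followed by an OLAP-style roll-up. It is an illustration only and not PubMine's implementation: the term sets `GENES` and `DISEASES`, the `DISEASE_PARENT` mapping, the naive sentence splitter, and the two sample abstracts are all hypothetical stand-ins for the HUGO and MeSH resources and the PubMed data named in the abstract.

```python
from collections import Counter
from itertools import product
import re

# Hypothetical mini-dictionaries (assumption); the abstract names HUGO gene
# symbols and MeSH terms as the real vocabularies.
GENES = {"TP53", "BRCA1"}
DISEASES = {"breast cancer", "lung cancer"}

# Hypothetical parent level standing in for one step up an ontology hierarchy.
DISEASE_PARENT = {"breast cancer": "neoplasms", "lung cancer": "neoplasms"}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrences(abstracts):
    """Count (gene, disease) pairs that appear within the same sentence."""
    counts = Counter()
    for text in abstracts:
        for sentence in split_sentences(text):
            lowered = sentence.lower()
            # Gene symbols are matched case-sensitively on word boundaries.
            genes = {g for g in GENES
                     if re.search(rf"\b{re.escape(g)}\b", sentence)}
            # Disease names are matched as lowercase substrings.
            diseases = {d for d in DISEASES if d in lowered}
            counts.update(product(sorted(genes), sorted(diseases)))
    return counts


def roll_up(counts):
    """Aggregate disease-level counts to the parent level (a roll-up)."""
    rolled = Counter()
    for (gene, disease), n in counts.items():
        rolled[(gene, DISEASE_PARENT[disease])] += n
    return rolled


# Made-up sample text for illustration only.
abstracts = [
    "TP53 is frequently mutated in lung cancer. BRCA1 was not examined.",
    "BRCA1 and TP53 are both associated with breast cancer.",
]
pair_counts = cooccurrences(abstracts)
rolled_counts = roll_up(pair_counts)
```

In the first sample abstract, BRCA1 and "lung cancer" occur in different sentences, so the sentence-level count does not link them, whereas an abstract-level count would. The roll-up then merges the disease-level counts under the hypothetical parent term, which is the kind of multi-dimensional aggregation that OLAP techniques generalize.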
