• Title/Summary/Keyword: bio text mining

A bio-text mining system using keywords and patterns in a grid environment

  • Kwon, Hyuk-Ryul;Jung, Tae-Sung;Kim, Kyoung-Ran;Jahng, Hye-Kyoung;Cho, Wan-Sup;Yoo, Jae-Soo
    • Proceedings of the Korea Society for Industrial Systems Conference / 2007.02a / pp.48-52 / 2007
  • As a huge amount of literature containing biological data has been generated in the post-genome era, it has become difficult for researchers to find useful knowledge in biological databases. Bio-text mining and related natural language processing techniques are key to intelligent knowledge retrieval from biological databases. We propose a bio-text mining technique for biologists seeking knowledge in this vast literature. First, a web robot extracts and transforms relevant literature from remote databases. To improve retrieval speed, we generate an inverted file for the keywords in the literature. Then, a text mining system extracts the given knowledge patterns and keywords. Finally, we construct a grid computing environment to guarantee text mining speed even for huge literature databases. In an experiment on 10,000 biological papers, the system achieved 95% precision and 98% recall.
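
The inverted file mentioned above maps each keyword to the documents that contain it, so a keyword query only touches the matching postings instead of scanning every document. The following Python sketch assumes simple lowercase alphanumeric tokenization and a Boolean AND query; the function names and sample documents are hypothetical, and the authors' grid-based implementation is not described in the abstract.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text and keep runs of letters/digits as keywords.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[str]], keywords: list[str]) -> set[str]:
    """Boolean AND query: IDs of documents containing every keyword."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


# Hypothetical sample documents.
docs = {
    "d1": "BRCA1 mutation increases breast cancer risk",
    "d2": "EGFR signaling in lung cancer",
    "d3": "BRCA1 and EGFR expression in breast tumors",
}
index = build_inverted_index(docs)
print(sorted(search_all(index, ["BRCA1", "breast"])))  # ['d1', 'd3']
```

Only the postings for the query keywords are intersected, which is what makes the inverted file faster than a full scan as the collection grows.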

PubMine: An Ontology-Based Text Mining System for Deducing Relationships among Biological Entities

  • Kim, Tae-Kyung;Oh, Jeong-Su;Ko, Gun-Hwan;Cho, Wan-Sup;Hou, Bo-Kyeng;Lee, Sang-Hyuk
    • Interdisciplinary Bio Central / v.3 no.2 / pp.7.1-7.6 / 2011
  • Background: Published manuscripts are the main source of biological knowledge. Because manual examination is almost impossible given the huge volume of literature (approximately 19 million abstracts in PubMed), intelligent text mining systems are of great utility for knowledge discovery. However, most current text mining tools have limited applicability because they i) provide abstract-based rather than sentence-based search, ii) use ontology terms improperly or not at all, iii) are designed for specific subjects, or iv) have slow response times that hamper web services and real-time applications. Results: We introduce an advanced text mining system called PubMine that supports intelligent knowledge discovery based on diverse bio-ontologies. PubMine improves query accuracy and flexibility with advanced search capabilities: fuzzy search, wildcard search, proximity search, range search, and Boolean combinations. Furthermore, PubMine allows users to extract multi-dimensional relationships among genes, diseases, and chemical compounds using OLAP (On-Line Analytical Processing) techniques. The current version of PubMine includes the HUGO gene symbols and the MeSH ontology for diseases, chemical compounds, and anatomy, and is freely available at http://pubmine.kobic.re.kr. Conclusions: PubMine is a unique bio-text mining system that provides flexible searches and analysis of relationships among biological entities. We believe that PubMine can serve as a key bioinformatics utility because its rapid response enables community web services and it is flexible enough to accommodate general ontologies.
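
PubMine's sentence-based search with proximity and wildcard operators can be illustrated with a small stand-alone sketch: it splits an abstract into sentences and keeps those in which two patterns occur within k tokens of each other. Wildcards are emulated here with Python's standard `fnmatch` module; the sentence splitter, function names, and sample text are assumptions for illustration, and PubMine's actual query engine, fuzzy search, and range search are not shown.

```python
import fnmatch
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens(sentence: str) -> list[str]:
    # Lowercased word tokens; hyphens are kept inside tokens.
    return re.findall(r"[a-z0-9-]+", sentence.lower())


def near(sentence: str, pat_a: str, pat_b: str, k: int) -> bool:
    """True if a token matching pat_a and a different token matching pat_b
    occur at most k token positions apart. Patterns use shell-style wildcards."""
    toks = tokens(sentence)
    pos_a = [i for i, t in enumerate(toks) if fnmatch.fnmatchcase(t, pat_a.lower())]
    pos_b = [i for i, t in enumerate(toks) if fnmatch.fnmatchcase(t, pat_b.lower())]
    return any(i != j and abs(i - j) <= k for i in pos_a for j in pos_b)


def sentence_search(text: str, pat_a: str, pat_b: str, k: int) -> list[str]:
    """Return the sentences (not whole abstracts) that satisfy the proximity query."""
    return [s for s in split_sentences(text) if near(s, pat_a, pat_b, k)]


# Hypothetical abstract text.
abstract = ("BRCA1 loss promotes breast cancer. EGFR is not discussed here. "
            "Tumors with BRCA2 defects respond to PARP inhibitors.")
print(sentence_search(abstract, "brca*", "cancer", 4))
```

Returning sentences rather than whole abstracts is the point of the design: a hit tells the user exactly where the two terms co-occur.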

Inferring Undiscovered Public Knowledge by Using Text Mining Analysis and Main Path Analysis: The Case of the Gene-Protein 'brings_about' Chains of Pancreatic Cancer (텍스트마이닝과 주경로 분석을 이용한 미발견 공공 지식 추론 - 췌장암 유전자-단백질 유발사슬의 경우 -)

  • Ahn, Hyerim;Song, Min;Heo, Go Eun
    • Journal of the Korean BIBLIA Society for library and Information Science / v.26 no.1 / pp.217-231 / 2015
  • This study aims to infer the gene-protein 'brings_about' chains of pancreatic cancer reported in pancreatic cancer research by constructing a gene-protein interaction network of pancreatic cancer. Such chains can help uncover undiscovered public knowledge that could motivate empirical studies of the causes of pancreatic cancer. We applied a novel approach that grafts text mining and main path analysis onto Swanson's ABC model, expanding intermediate concepts to multiple levels and extracting the most significant path. We carried out text mining on the full texts of pancreatic cancer research papers published over the last ten years and extracted gene-protein entities and relations. The 'brings_about' network was built from biological relations expressed by bio verbs, and main path analysis was then applied to it. We found the main direct 'brings_about' path of pancreatic cancer, which includes 14 nodes and 13 arcs. Nine arcs were confirmed as actual relations reported in related research, while the other four arose during the network transformation required for main path analysis. We believe that combining text mining with main path analysis can be a useful tool for inferring undiscovered knowledge when either the starting point or the ending point is unknown.
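
Main path analysis typically weights each arc of an acyclic network by a traversal count and then follows the heaviest arcs. The sketch below uses the Search Path Count (SPC) weight with a greedy forward search, one common variant; the abstract does not say which weight or search strategy the authors used, so treat this as an illustrative assumption. It also assumes the 'brings_about' network has already been transformed into a directed acyclic graph, and the node names are placeholders rather than real genes or proteins.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

Edge = tuple[str, str]


def spc_weights(edges: list[Edge]):
    """Search Path Count (SPC): number of source-to-sink paths through each arc."""
    succ: dict[str, list[str]] = defaultdict(list)
    pred: dict[str, list[str]] = defaultdict(list)
    nodes: set[str] = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    # TopologicalSorter takes node -> predecessors and raises CycleError on cycles,
    # so the input must already be a DAG.
    order = list(TopologicalSorter({n: pred[n] for n in nodes}).static_order())
    into: dict[str, int] = {}  # paths from any source that end at this node
    for n in order:
        into[n] = sum(into[p] for p in pred[n]) if pred[n] else 1
    out: dict[str, int] = {}  # paths from this node that end at any sink
    for n in reversed(order):
        out[n] = sum(out[s] for s in succ[n]) if succ[n] else 1
    spc = {(u, v): into[u] * out[v] for u, v in edges}
    return spc, succ, pred


def forward_main_path(edges: list[Edge]) -> list[str]:
    """Greedy forward search: start at the source arc with the highest SPC, then
    follow the highest-SPC outgoing arc until a sink. Ties go to the arc listed first."""
    spc, succ, pred = spc_weights(edges)
    start = max((e for e in edges if not pred[e[0]]), key=lambda e: spc[e])
    path = list(start)
    while succ[path[-1]]:
        here = path[-1]
        path.append(max(succ[here], key=lambda v: spc[(here, v)]))
    return path


# Toy DAG with placeholder node names (not real gene/protein data).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("A", "E"), ("E", "F")]
print(forward_main_path(edges))  # ['A', 'B', 'C', 'D']
```

In the toy graph, arc A→B lies on two of the three source-to-sink paths, so the search starts there; the later tie at B is broken by input order.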

BIOLOGY ORIENTED TARGET SPECIFIC LITERATURE MINING FOR GPCR PATHWAY EXTRACTION (GPCR 경로 추출을 위한 생물학 기반의 목적지향 텍스트 마이닝 시스템)

  • Kim, Eun-Ju;Jung, Seol-Kyoung;Yi, Eun-Ji;Lee, Gary-Geunbae;Park, Soo-Jun
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.86-94 / 2003
  • Electronically available biological literature has accumulated exponentially over time, so research on automatically acquiring knowledge from these data through text mining has become increasingly active. However, most previous work is technology-oriented rather than focused on practical extraction targets, which results in low performance and makes the systems inconvenient for biologists to use. In this paper, we propose a more biology-oriented, target-domain-specific text mining system, the POSTECH bio-text mining system (POSBIOTM), for signal transduction pathway extraction, especially G protein-coupled receptor (GPCR) pathways. To incorporate more domain knowledge, we specify a concrete target for pathway extraction and define a minimal pathway domain ontology. Under this conceptual model, POSBIOTM extracts pathway entities and interactions from full biological articles using a machine-learning-based extraction method and visualizes the pathways with the JDesigner module provided in the Systems Biology Workbench (SBW) [14].

Discovering the anti-cancer phytochemical rutin against breast cancer through the methodical platform based on traditional medicinal knowledge

  • Jungwhoi Lee;Jungsul Lee;WooGwang Sim;Jae-Hoon Kim;Chulhee Choi;Jongwook Jeon
    • BMB Reports / v.56 no.11 / pp.594-599 / 2023
  • A number of therapeutic drugs have been developed from functional chemicals found in plants. Knowledge of medicinal plants has historically been transmitted by word of mouth or through literature. The aim of the present study is to provide a systematic platform for developing lead compounds against breast cancer based on a traditional medical text. To verify this systematic approach, we demonstrated an integrated process consisting of text mining of traditional medical texts, 3-D virtual docking screening, and in vitro and in vivo experimental validation. Our text analysis system identified rutin as a specific phytochemical traditionally used for cancer treatment, and 3-D virtual screening predicted that rutin could block EGFR signaling. We then validated significant anti-cancer effects of rutin against breast cancer cells through blockade of the EGFR signaling pathway in vitro, and demonstrated its anti-cancer effects in vivo using breast cancer recurrence models. In summary, our innovative approach may be suitable for discovering new phytochemical lead compounds designed to block malignant neoplasms, including breast cancer.

A Study on the Countermeasures of the Korean Pharmaceutical/Bio Industry to the EU Corporate Sustainability Due Diligence Directive, by using Text Mining (텍스트 마이닝을 활용한 국내 제약·바이오 업종의 EU 공급망 실사법 대응 방안 연구)

  • Sori Kim;Joonhak Ki
    • Information Systems Review / v.26 no.1 / pp.93-117 / 2024
  • In February 2022, the EU announced a draft Corporate Sustainability Due Diligence Directive requiring due diligence and disclosure of information on environmental and human rights risks in corporate supply chains. This study evaluated the ability of 13 Korean pharmaceutical/bio companies to respond to the EU's supply chain due diligence requirements and compared them with 13 leading global pharmaceutical/bio companies regarded as strong in environmental and human rights risk management. For the comparison, text mining was performed in R: basic word frequencies and co-occurring words were analyzed, and topic modeling was performed with Latent Dirichlet Allocation (LDA). The analysis found that, compared with the leading companies, domestic pharmaceutical and bio companies lack systems for reporting and identifying negative issues and processes for implementing supply chain due diligence, and need to advance their data management for disclosing environmental and human rights information. Accordingly, domestic pharmaceutical and bio companies need to prepare differentiated support measures that go beyond simple financial support to systematically identify and reduce risks in the supply chains of small and medium-sized businesses. It is also desirable for the government to mandate Korea's own supply chain environmental and human rights due diligence system, while helping domestic pharmaceutical and bio companies strengthen their ability to respond to due diligence through measures such as expert consulting and financial support.
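
The word-frequency and co-occurrence steps described above reduce to counting tokens and document-level word pairs. The study itself used R; the following Python sketch is only an illustration under assumed choices (a tiny hypothetical stop-word list, co-occurrence counted per document, hypothetical sample snippets), and it does not reproduce the LDA topic modeling step.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical minimal stop-word list; a real analysis would use a fuller list.
STOPWORDS = {"the", "and", "of", "in", "to", "a", "for", "on"}


def tokenize(text: str) -> list[str]:
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def word_frequencies(docs: list[str]) -> Counter:
    """Total occurrences of each word across all documents."""
    return Counter(t for doc in docs for t in tokenize(doc))


def cooccurrences(docs: list[str]) -> Counter:
    """Number of documents in which each unordered word pair appears together."""
    pairs: Counter = Counter()
    for doc in docs:
        vocab = sorted(set(tokenize(doc)))  # sorted so each pair has one canonical order
        pairs.update(combinations(vocab, 2))
    return pairs


# Hypothetical report snippets.
docs = [
    "Supply chain due diligence and human rights risk",
    "Human rights policy for the supply chain",
    "Environmental risk disclosure",
]
print(word_frequencies(docs).most_common(3))
print(cooccurrences(docs)[("chain", "supply")])  # 2
```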

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics / v.17 no.2 / pp.17.1-17.3 / 2019
  • Text mining has become an important research method in biology, originally aimed at extracting biological entities such as genes, proteins, and phenotypic traits from scientific papers to extend knowledge. However, few thorough studies of text mining and application development have been performed for plant molecular biology data, especially for rice, so few datasets are available for named-entity recognition tasks in this species. Because benchmarks for rice are rare, we faced various difficulties in applying advanced machine learning methods to analyze the rice literature accurately. To evaluate several approaches to automatically extracting information about gene/protein entities, we built a new benchmark dataset for rice. It consists of titles and abstracts of scientific papers on the rice species, downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task on rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using this dataset, enabling open comparison and evaluation of different approaches.
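
Benchmarks like the one described are typically consumed by sequence-labeling NER models, which need span annotations converted into per-token tags. The sketch below shows one common conversion, BIO (Begin/Inside/Outside) tagging, assuming annotations are given as character offsets with an exclusive end; this is not the dataset's published format, and the sentence, the gene name OsGENE1, and the labels are hypothetical.

```python
import re

Span = tuple[int, int, str]  # (begin, end, label); end is exclusive


def bio_tags(text: str, spans: list[Span]) -> list[tuple[str, str]]:
    """Convert character-offset entity spans into token-level BIO tags."""
    tagged = []
    # Tokens: runs of word characters/hyphens, or single punctuation marks.
    for m in re.finditer(r"[\w-]+|[^\w\s]", text):
        start, end = m.start(), m.end()
        tag = "O"
        for begin, stop, label in spans:
            if begin <= start and end <= stop:  # token lies fully inside the span
                tag = ("B-" if start == begin else "I-") + label
                break
        tagged.append((m.group(), tag))
    return tagged


# Hypothetical sentence and annotations (OsGENE1 is a made-up name).
text = "Knockout of OsGENE1 reduced binding protein 2 levels."
g = text.index("OsGENE1")
p = text.index("binding protein 2")
spans = [(g, g + len("OsGENE1"), "Gene"), (p, p + len("binding protein 2"), "Protein")]
for token, tag in bio_tags(text, spans):
    print(token, tag)
```

Tokens that only partially overlap a span are tagged O here; a real pipeline would need a policy for such tokenization mismatches.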

Text Mining Driven Content Analysis of Ebola on News Media and Scientific Publications (텍스트 마이닝을 이용한 매체별 에볼라 주제 분석 - 바이오 분야 연구논문과 뉴스 텍스트 데이터를 이용하여 -)

  • An, Juyoung;Ahn, Kyubin;Song, Min
    • Journal of the Korean Society for Library and Information Science / v.50 no.2 / pp.289-307 / 2016
  • Infectious diseases such as Ebola virus disease become social issues that draw public attention and become major topics in news and research. As a result, many studies have examined infectious diseases using text mining techniques. However, no research has analyzed and compared the content of these two media channels, which have distinct characteristics. Accordingly, in this study we conduct topic analysis on news articles (representing a social perspective) and academic research papers (representing the perspective of bio-professionals). As text mining techniques, topic modeling is applied to extract topics from each type of material, and word co-occurrence maps based on selected bio entities are used to compare the perspectives of the materials in detail. For network analysis, topic maps are built with Gephi. These approaches revealed differences in topics and characteristics between the two types of material. In the word co-occurrence maps, however, most entities are shared by both. These results indicate that social and academic materials have both differences and commonalities.
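
An entity-restricted co-occurrence map, as used above to compare news and research papers, can be sketched by linking selected entities that appear in the same sentence and then comparing the edge sets of the two corpora. The sentence-level window, single-word entity matching, and sample sentences are assumptions; the paper's exact windowing and its Gephi-based visualization are not reproduced.

```python
import re
from collections import Counter
from itertools import combinations


def entity_cooccurrence(sentences: list[str], entities: set[str]) -> Counter:
    """Weighted edges between selected entities that appear in the same sentence.
    Assumes single-word entities matched case-insensitively."""
    wanted = {e.lower() for e in entities}
    edges: Counter = Counter()
    for s in sentences:
        found = sorted({t for t in re.findall(r"[a-z0-9]+", s.lower()) if t in wanted})
        edges.update(combinations(found, 2))
    return edges


def compare_maps(map_a: Counter, map_b: Counter) -> dict[str, set]:
    """Shared and distinct nodes/edges of two co-occurrence maps."""
    nodes_a = {n for pair in map_a for n in pair}
    nodes_b = {n for pair in map_b for n in pair}
    return {
        "shared_entities": nodes_a & nodes_b,
        "shared_edges": set(map_a) & set(map_b),
        "only_a_edges": set(map_a) - set(map_b),
        "only_b_edges": set(map_b) - set(map_a),
    }


# Hypothetical entity list and sentences.
entities = {"Ebola", "vaccine", "fever", "glycoprotein", "antibody"}
news = ["Ebola vaccine trial begins.", "Fever cases rise as Ebola spreads."]
papers = ["Ebola glycoprotein binds the antibody.", "A vaccine targeting Ebola glycoprotein."]
result = compare_maps(entity_cooccurrence(news, entities), entity_cooccurrence(papers, entities))
print(sorted(result["shared_entities"]))  # ['ebola', 'vaccine']
```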

ManBIF: a Program for Mining and Managing Biobank Impact Factor Data

  • Yu, Ki-Jin;Nam, Jung-Min;Her, Yun;Chu, Min-Seock;Seo, Hyung-Seok;Kim, Jun-Woo;Jeon, Jae-Pil;Park, Hye-Kyung;Park, Kie-Jung
    • Genomics & Informatics / v.9 no.1 / pp.37-38 / 2011
  • The Biobank Impact Factor (BIF), a very effective criterion for evaluating the activity of biobanks, can be estimated from the citation information of biobanks in scientific papers. We have developed a program, ManBIF, to extract citation information from PDF files of the literature. The program maintains a dictionary of expressions referring to biobanks and their resources, mines citation information by converting PDF files to text files and searching them with the dictionary, and produces a statistical report file. It can serve as an important tool for biobanks.
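
The dictionary-based mining step described above can be sketched as matching each biobank's known expressions against a paper's text and tallying how many papers mention each biobank. This is an illustration, not ManBIF's code: it assumes the PDF-to-text conversion has already been done (no particular converter is implied), uses hypothetical dictionary entries (including the abbreviation "NBK") and texts, and stops at raw paper counts because the abstract does not give the BIF formula.

```python
import re
from collections import Counter

# Hypothetical dictionary: canonical biobank name -> expressions that refer to it.
BIOBANK_DICT = {
    "National Biobank of Korea": ["National Biobank of Korea", "NBK"],
    "UK Biobank": ["UK Biobank"],
}


def biobanks_cited(text: str, dictionary: dict[str, list[str]]) -> set[str]:
    """Canonical names whose expressions appear in the text (whole words, any case)."""
    cited = set()
    for name, expressions in dictionary.items():
        for expr in expressions:
            if re.search(r"\b" + re.escape(expr) + r"\b", text, flags=re.IGNORECASE):
                cited.add(name)
                break  # one match is enough to count this paper once
    return cited


def citation_report(papers: dict[str, str], dictionary: dict[str, list[str]]) -> Counter:
    """Number of papers citing each biobank; assumes PDFs were already converted to text."""
    report: Counter = Counter()
    for text in papers.values():
        report.update(biobanks_cited(text, dictionary))
    return report


# Hypothetical extracted texts keyed by file name.
papers = {
    "p1.pdf": "Samples were provided by the National Biobank of Korea (NBK).",
    "p2.pdf": "Data were obtained from the UK Biobank resource.",
    "p3.pdf": "Serum from NBK cohorts was analysed.",
}
for name, count in citation_report(papers, BIOBANK_DICT).most_common():
    print(f"{name}\t{count}")
```

Counting each paper at most once per biobank avoids inflating the tally when a paper repeats a name or uses both its full form and abbreviation.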

A Study on the Characteristic Analysis of Local Informatization in Chungcheongbuk-do: Focus on text mining (충청북도의 지역정보화 특성 분석에 관한 연구: 텍스트마이닝 중심)

  • Lee, Junghwan;Park, Soochang;Lee, Euisin
    • The Journal of the Korea Contents Association / v.21 no.10 / pp.67-77 / 2021
  • This study conducted topic modeling, association analysis, and sentiment analysis based on text mining in order to reflect regional characteristics when establishing an informatization plan for Chungcheongbuk-do. The analysis confirmed that educational activities to bridge the information gap account for a relatively high proportion in Chungcheongbuk-do, and that the province is interested in improving infrastructure to provide contactless (non-face-to-face) administrative services and to bridge the gap between urban and rural areas. It is also worth noting the positive evaluation of combining bio and IT in the region's strategic industries and of examples of ICT innovation services. Smart cities were found to carry high expectations for establishing various cooperation systems with IT companies, but continuous crisis management is needed so that these efforts do not become entangled in political issues. It is hoped that the results of this study can serve as one method for concretely reflecting regional changes in the informatization process.