• Title/Summary/Keyword: Biomedical Interaction Extraction

Utilizing Various Natural Language Processing Techniques for Biomedical Interaction Extraction

  • Park, Kyung-Mi;Cho, Han-Cheol;Rim, Hae-Chang
    • Journal of Information Processing Systems
    • /
    • v.7 no.3
    • /
    • pp.459-472
    • /
    • 2011
  • The vast amount of biomedical literature is an important source for discovering biomedical interaction information. However, obtaining interaction information from it is complicated because most of it is not easily machine-readable. In this paper, we present a method for extracting biomedical interaction information, assuming that the biomedical Named Entities (NEs) are already identified. The proposed method labels all possible pairs of given biomedical NEs as INTERACTION or NO-INTERACTION by using a Maximum Entropy (ME) classifier. The features used for the classifier are obtained by applying various NLP techniques such as POS tagging, base phrase recognition, parsing, and predicate-argument recognition. In particular, specific verb predicates (activate, inhibit, diminish, etc.) and their biomedical NE arguments are very useful features for identifying interactive NE pairs. Based on this, we devised a two-step method: 1) an interaction verb extraction step to find biomedically salient verbs, and 2) an argument relation identification step to generate partial predicate-argument structures between extracted interaction verbs and their NE arguments. In the experiments, we analyzed how much each applied NLP technique improves the performance. Overall, the proposed method improves performance by more than 2% compared to the baseline method. The use of external contextual features, which are obtained from outside the NEs, is crucial for the performance improvement. We also compare the performance of the proposed method against co-occurrence-based and rule-based methods. The results demonstrate that the proposed method considerably improves the performance.
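
The pair-labeling setup described in this abstract can be illustrated with a minimal sketch. It only enumerates candidate NE pairs and derives one toy feature (whether an interaction verb appears between the two entities); it does not reproduce the paper's Maximum Entropy classifier or its parser-based features. The function names, the verb stem list (built from the three verbs the abstract mentions), and the example sentence are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical interaction-verb stems, based on the verbs named in the abstract.
INTERACTION_VERB_STEMS = ("activat", "inhibit", "diminish")


def candidate_pairs(entities):
    """Return every unordered pair of the given named entities."""
    return list(combinations(entities, 2))


def verb_between(tokens, first, second):
    """Return the first interaction verb found between two entity tokens, or None."""
    i, j = sorted((tokens.index(first), tokens.index(second)))
    for token in tokens[i + 1:j]:
        # str.startswith accepts a tuple of prefixes.
        if token.lower().startswith(INTERACTION_VERB_STEMS):
            return token
    return None


def extract_features(tokens, entities):
    """Map each candidate pair to a small feature dictionary."""
    features = {}
    for first, second in candidate_pairs(entities):
        verb = verb_between(tokens, first, second)
        features[(first, second)] = {
            "has_interaction_verb": verb is not None,
            "verb": verb,
        }
    return features


# Toy sentence with placeholder entity names (not real data).
tokens = "ProtA activates ProtB but ProtC was unaffected".split()
entities = ["ProtA", "ProtB", "ProtC"]
features = extract_features(tokens, entities)
for pair, feats in features.items():
    print(pair, feats)
```

In this toy sentence the surface feature also fires for the pair (ProtA, ProtC), because "activates" lies between them. That kind of false signal is one reason an approach like the one in the abstract would pair verbs with their actual arguments rather than rely on word position alone.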

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics
    • /
    • v.2 no.2
    • /
    • pp.99-106
    • /
    • 2004
  • In this paper we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts their interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from related public databases, to infer more interactions from the original interactions. The interactions inferred by this analysis and the native interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan;Guo, Xuchao;Bai, Zhao;Diao, Lei;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.3
    • /
    • pp.771-791
    • /
    • 2022
  • Protein-protein interaction (PPI) extraction from original text is important for revealing the molecular mechanisms of biological processes. With the rapid growth of biomedical literature, manually extracting PPIs has become more time-consuming and laborious. Therefore, automatic PPI extraction from the raw literature through natural language processing technology has attracted the attention of the majority of researchers. We propose a PPI extraction model based on a large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and adversarial perturbations are applied to the embedding layer to improve the robustness of the model. Experimental results showed that the proposed model achieved the highest F1 scores (83.93% and 90.31%) on two corpora with large sample sizes, namely AIMed and BioInfer, respectively, compared with previous methods. It also achieved comparable performance on three corpora with small sample sizes, namely HPRD50, IEPA, and LLL.
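
The abstract does not say how the adversarial perturbation on the embedding layer is computed. As a hedged illustration only, the sketch below assumes one common choice, an L2-normalized gradient step of size epsilon (r = epsilon * g / ||g||), applied to a single embedding vector. The function names, the epsilon value, and the toy numbers are hypothetical, and the gradient is supplied by hand rather than obtained from a trained model.

```python
import math


def adversarial_perturbation(gradient, epsilon=1.0):
    """Scale a loss gradient to L2 norm epsilon: r = epsilon * g / ||g||."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        # No gradient signal, so no perturbation direction exists.
        return [0.0 for _ in gradient]
    return [epsilon * g / norm for g in gradient]


def perturb_embedding(embedding, gradient, epsilon=1.0):
    """Return the embedding shifted along the normalized gradient direction."""
    delta = adversarial_perturbation(gradient, epsilon)
    return [e + d for e, d in zip(embedding, delta)]


# Toy values (assumptions): one 3-dimensional embedding and its loss gradient.
embedding = [0.2, -0.1, 0.4]
gradient = [3.0, 0.0, 4.0]
perturbed = perturb_embedding(embedding, gradient, epsilon=0.5)
print(perturbed)
```

In adversarial training of this general kind, the model is typically trained on both the original and the perturbed embeddings, so that small worst-case shifts in the input representation do not change the prediction.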

An Experimental Study on the Relation Extraction from Biomedical Abstracts using Machine Learning (기계 학습을 이용한 바이오 분야 학술 문헌에서의 관계 추출에 대한 실험적 연구)

  • Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.50 no.2
    • /
    • pp.309-336
    • /
    • 2016
  • This paper introduces a relation extraction system that identifies and classifies semantic relations between biomedical entities in scientific texts using machine learning methods such as Support Vector Machines (SVM). The proposed system includes many useful functions for extracting various linguistic features from sentences that contain a pair of biomedical entities and for applying them to the training of relation extraction models to maximize their performance. Three globally representative collections in biomedical domains were used in the experiments, which demonstrate the system's superiority across these domains. As a result, the intensive experimental study conducted in this paper is likely to provide a meaningful foundation for research on bio-text analysis based on machine learning.

A Study on the Semiautomatic Construction of Domain-Specific Relation Extraction Datasets from Biomedical Abstracts - Mainly Focusing on a Genic Interaction Dataset in Alzheimer's Disease Domain - (바이오 분야 학술 문헌에서의 분야별 관계 추출 데이터셋 반자동 구축에 관한 연구 - 알츠하이머병 유관 유전자 간 상호 작용 중심으로 -)

  • Choi, Sung-Pil;Yoo, Suk-Jong;Cho, Hyun-Yang
    • Journal of Korean Library and Information Science Society
    • /
    • v.47 no.4
    • /
    • pp.289-307
    • /
    • 2016
  • This paper introduces a software system and process model for constructing domain-specific relation extraction datasets semi-automatically. The system takes a set of terms such as genes, proteins, diseases, and so forth as input and then, by exploiting a massive biological interaction database, generates a set of term pairs that are used as queries for retrieving sentences containing the pairs from scientific databases. To assess the usefulness of the proposed system, this paper applies it to the construction of a genic interaction dataset for the Alzheimer's disease domain, extracting 3,510 interaction-related sentences by using 140 gene names in the area. In conclusion, the results of the case study performed in this paper indicate that the system and process could greatly boost the efficiency of dataset construction in various subfields of biomedical research.
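
The pipeline outlined in this abstract (input terms, pair generation against an interaction database, sentence retrieval) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the function names, the quoted `AND` query syntax, the in-memory "database" of known pairs, and the placeholder gene names are all hypothetical.

```python
from itertools import combinations


def interacting_pairs(terms, known_interactions):
    """Keep only the term pairs that appear in the (toy) interaction database."""
    known = {frozenset(pair) for pair in known_interactions}
    return [
        pair
        for pair in combinations(sorted(terms), 2)
        if frozenset(pair) in known
    ]


def build_queries(pairs):
    """Turn each term pair into a conjunctive search query string."""
    return [f'"{a}" AND "{b}"' for a, b in pairs]


def matching_sentences(sentences, pair):
    """Return the sentences that mention both terms of the pair."""
    a, b = pair
    return [s for s in sentences if a in s and b in s]


# Placeholder inputs (assumptions, not real biological findings).
terms = {"geneA", "geneB", "geneC"}
known_interactions = [("geneB", "geneA"), ("geneC", "geneZ")]
sentences = [
    "geneA was shown to bind geneB in this assay.",
    "geneC expression was unchanged.",
]

pairs = interacting_pairs(terms, known_interactions)
queries = build_queries(pairs)
hits = {pair: matching_sentences(sentences, pair) for pair in pairs}
print(queries)
print(hits)
```

Filtering pairs through a known-interaction resource before querying keeps the number of retrieval requests far below the number of all possible term pairs, which is presumably part of what makes the construction semi-automatic rather than exhaustive.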

TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries

  • Song, Min
    • Journal of Information Science Theory and Practice
    • /
    • v.2 no.1
    • /
    • pp.6-21
    • /
    • 2014
  • This paper proposes a novel knowledge extraction system, TAKES (Two-step Approach for Knowledge Extraction System), which integrates advanced techniques from Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). In particular, TAKES adopts a novel keyphrase extraction-based query expansion technique to collect promising documents. It also uses a Conditional Random Field-based machine learning technique to extract important biological entities and relations. TAKES is applied to biological knowledge extraction, particularly retrieving promising documents that contain Protein-Protein Interaction (PPI) and extracting PPI pairs. TAKES consists of two major components: DocSpotter, which is used to query and retrieve promising documents for extraction, and a Conditional Random Field (CRF)-based entity extraction component known as FCRF. The present paper investigated research problems addressing the issues with a knowledge extraction system and conducted a series of experiments to test our hypotheses. The findings from the experiments are as follows: First, the author verified, using three different test collections to measure the performance of our query expansion technique, that DocSpotter is robust and highly accurate when compared to Okapi BM25 and SLIPPER. Second, the author verified that our relation extraction algorithm, FCRF, is highly accurate in terms of F-Measure compared to four other competitive extraction algorithms: Support Vector Machine, Maximum Entropy, Single POS HMM, and Rapier.

Natural language processing techniques for bioinformatics

  • Tsujii, Jun-ichi
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2003.10a
    • /
    • pp.3-3
    • /
    • 2003
  • With biomedical literature expanding so rapidly, there is an urgent need to discover and organize knowledge extracted from texts. Although factual databases contain crucial information, the overwhelming amount of new knowledge remains in textual form (e.g. MEDLINE). In addition, new terms are constantly coined as the relationships linking new genes, drugs, proteins, etc. As the size of biomedical literature is expanding, more systems are applying a variety of methods to automate the process of knowledge acquisition and management. In my talk, I focus on the GENIA project of our group at the University of Tokyo, the objective of which is to construct an information extraction system for protein-protein interactions from MEDLINE abstracts. The talk includes (1) techniques we use for named entity recognition: (1-a) SOHMM (Self-organized HMM), (1-b) Maximum Entropy Model, (1-c) Lexicon-based Recognizer; (2) treatment of term variants and acronym finders; (3) event extraction using a full parser; and (4) linguistic resources for text mining (GENIA corpus): (4-a) Semantic Tags, (4-b) Structural Annotations, (4-c) Co-reference tags, (4-d) GENIA ontology. I will also talk about a possible extension of our work that links the findings of molecular biology with clinical findings, and claim that textual based or conceptual based biology would be a viable alternative to systems biology, which tends to emphasize the role of simulation models in bioinformatics.

Effect of Chlorella Growth Factor on the Proliferation of Human Skin Keratinocyte

  • Yong-Ho Kim;Yoo-Kyeong Hwang;Yu-Yon Kim;Su-Mi Ko;Jung-Min Hwang;Yong-Woo Lee
    • Biomedical Science Letters
    • /
    • v.8 no.4
    • /
    • pp.229-234
    • /
    • 2002
  • Chlorella is rich in chlorella growth factor (CGF). The literature describes CGF as improving Th1-based immunity and as having anticancer, antioxidant, and antibacterial activity, growth-promoting and wound-healing effects, and so on, but its effect on the metabolism and proliferation of human skin keratinocytes has not been studied. The aim of this study was to examine the effect of CGF on the metabolism and proliferation of human skin keratinocytes in vitro. CGF was extracted from dried chlorella with an autoclaving method, which is a modified hot-water extraction method, and confirmed by an absorbance of 0.22 at 260 nm. We measured the extracellular acidification rate (ECAR) induced by CGF with a Cytosensor$^{\circledR}$ Microphysiometer and evaluated the dose-dependent responsiveness of HaCaT cells. The ECAR for CGF concentrations of 0.15, 1.5, 15, and 150 $\mu$g/ml increased to 103.6, 128.2, 149.0, and 423.9%, respectively, compared to control (0.0 $\mu$g/ml, 100% ECAR). The ECAR of HaCaT cells in which ErbB1 tyrosine kinase was inhibited by 4-anilinoquinazolines ($\mathrm{C_{16}H_{14}BrN_3O_2 \cdot HCl}$) was compared between 10 $\mu$g/ml of CGF and 100 $\mu$g/ml of rhEGF. The conclusion of the study is that CGF might increase human epidermal keratinocyte proliferation through its interaction with the epidermal growth factor receptor.

Evaluation of Cardiac Function Analysis System Using Magnetic Resonance Images

  • Tae, Ki-Sik;Suh, Tae-Suk;Choe, Bo-Young;Lee, Hyoung-Koo;Shinn, Kyung-Sub;Jung, Seung-Eun;Lee, Jae-Moon
    • Progress in Medical Physics
    • /
    • v.10 no.3
    • /
    • pp.159-168
    • /
    • 1999
  • Cardiac disease is one of the leading causes of death in Korea. In the quantitative analysis of cardiac function and morphological information by three-dimensional reconstruction of magnetic resonance images, the left ventricle plays an important role functionally and physiologically. However, existing procedures mostly rely on extensive human interaction and are seldom evaluated in clinical applications. In this study, we developed a system that performs automatic extraction of epicardial and endocardial contours and analysis of cardiac function, and we evaluated the reliability and stability of each system by comparison with the results of the ARGUS system offered with the 1.5T Siemens MRI system and with the manual method performed by clinicians. We investigated the reliability of each system by comparing the left ventricular contour, end-diastolic volume (EDV), end-systolic volume (ESV), stroke volume (SV), ejection fraction (EF), cardiac output (CO), and wall thickness (WT). Compared with the manual method, both the developed process, which uses the minimum error threshold (MET) method to extract contours automatically from cardiac MR images, and the ARGUS system achieved a 90% success rate in contour extraction. When cardiac function parameters were calculated from the endocardial and epicardial contours extracted with MET and compared using correlation coefficient analysis, the values from the automatic and ARGUS methods agreed with the manual values within ±3% average error. This demonstrates that the automatic method using a threshold technique has high potential for assessing each parameter with relatively high reliability compared with the manual method. Because it relies on a simple threshold technique, the method developed in this study could also reduce processing time compared with the ARGUS and manual methods. This method is useful for the diagnosis of cardiac disease and for simulating the physiological function and blood flow of the left ventricle. In addition, it could be valuable in developing automatic systems that apply to other deformable image models.
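
The cardiac function parameters named in this abstract are related by standard definitions: SV = EDV - ESV, EF = SV / EDV, and CO = SV x heart rate. The sketch below computes them from volumes and also shows a percent-error comparison against a manual reference, in the spirit of the agreement figure the abstract reports. The abstract does not give its exact formulas or any measured volumes, so the function names, the heart-rate input, and all numeric values are illustrative assumptions.

```python
def stroke_volume(edv_ml, esv_ml):
    """Stroke volume (ml): end-diastolic minus end-systolic volume."""
    return edv_ml - esv_ml


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction (%): stroke volume as a share of end-diastolic volume."""
    return 100.0 * stroke_volume(edv_ml, esv_ml) / edv_ml


def cardiac_output(edv_ml, esv_ml, heart_rate_bpm):
    """Cardiac output (L/min): stroke volume times heart rate, ml converted to L."""
    return stroke_volume(edv_ml, esv_ml) * heart_rate_bpm / 1000.0


def percent_error(automatic, manual):
    """Signed percent difference of an automatic value from the manual reference."""
    return 100.0 * (automatic - manual) / manual


# Made-up volumes for illustration only (not values from the study).
edv, esv, heart_rate = 120.0, 50.0, 70.0
print("SV (ml):", stroke_volume(edv, esv))
print("EF (%):", round(ejection_fraction(edv, esv), 2))
print("CO (L/min):", cardiac_output(edv, esv, heart_rate))
print("EDV error (%):", round(percent_error(122.0, edv), 2))
```

Because EF and CO are both derived from EDV and ESV, an error in contour extraction propagates into every downstream parameter, which is why agreement on the two volumes matters most.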

A Systemic Review of Pulse Contour Analysis and Fourier Spectrum Analysis on the Photoplethysmography of Digit (지첨용적맥파의 파형분석과 주파수분석에 대한 문헌적 연구)

  • Nam, Tong-Hyun;Park, Young-Bae;Park, Young-Jae;Shin, Sang-Hoon
    • The Journal of the Society of Korean Medicine Diagnostics
    • /
    • v.11 no.1
    • /
    • pp.48-60
    • /
    • 2007
  • Palpation of the pulse has been used in Korean traditional medicine since ancient times to assess physical health. The pulse wave contour may be obtained by measuring arterial pressure or the blood volume change of the skin. The latter is called photoplethysmography (PPG) or digital volume pulse (DVP). The PPG signal is measured by a device comprising an infrared light source and a photodetector. Although less widely used, this technique deserves further consideration because of its simplicity and ease of use. The contour of the PPG is formed as a result of a complex interaction between the left ventricle and the systemic circulation. It usually exhibits an early systolic peak and an early diastolic peak. The first peak is formed mainly by pressure transmitted along a direct path from the left ventricle to the finger. The second peak is formed in part by pressure transmitted along the aorta and large arteries to sites of impedance mismatch in the lower body. The contour of the PPG is sensitive to changes in arterial tone and is influenced by ageing and large artery stiffness. Measurements taken directly from the PPG or from its second derivative can be used to assess these properties. In some mathematical approaches, the extraction of periodic components using frequency analysis has been tried for the analysis of the PPG. However, it is not yet understood which factors in the cardiovascular system or human body are related to the specific Fourier components of the PPG. This review describes the background to measurement principles, representative contour, contour analysis and frequency domain analysis of PPG, and current and future.
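
The two analysis routes mentioned in this abstract, the second derivative of the PPG and its Fourier components, can be illustrated on a synthetic signal. The sketch below is a generic numerical example, not a method taken from the review: it uses a central finite difference for the second derivative and a direct discrete Fourier transform for the magnitudes. The sampling rate, the 1 Hz fundamental with a weaker 2 Hz harmonic, and the function names are assumptions.

```python
import cmath
import math


def second_derivative(samples, dt):
    """Central-difference estimate of the second derivative at interior points."""
    return [
        (samples[i + 1] - 2.0 * samples[i] + samples[i - 1]) / dt ** 2
        for i in range(1, len(samples) - 1)
    ]


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..n/2, normalized by the number of samples."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))) / n
        for k in range(n // 2 + 1)
    ]


# Synthetic "pulse": 1 Hz fundamental plus a weaker 2 Hz harmonic (assumed values).
fs = 50.0   # sampling rate in Hz
n = 100     # two seconds of samples, so each DFT bin spans fs / n = 0.5 Hz
signal = [
    math.sin(2 * math.pi * 1.0 * t / fs) + 0.5 * math.sin(2 * math.pi * 2.0 * t / fs)
    for t in range(n)
]

magnitudes = dft_magnitudes(signal)
dominant_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
print("dominant frequency (Hz):", dominant_bin * fs / n)
print("second derivative, first values:", second_derivative(signal, 1.0 / fs)[:3])
```

A real PPG recording would need filtering and beat segmentation before either step; the synthetic signal only shows how harmonics of the pulse rate appear as separate spectral components.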
