• Title/Summary/Keyword: protein-protein interaction extraction

Extraction of Protein-Protein Interactions based on Convolutional Neural Network (CNN) (Convolutional Neural Network (CNN) 기반의 단백질 간 상호 작용 추출)

  • Choi, Sung-Pil
    • KIISE Transactions on Computing Practices / v.23 no.3 / pp.194-198 / 2017
  • In this paper, we propose a revised Deep Convolutional Neural Network (DCNN) model to extract Protein-Protein Interactions (PPIs) from the scientific literature. The proposed method improves performance by applying various global features in addition to the simple lexical features used in conventional relation extraction approaches. In experiments on AIMed, the best-known collection used for PPI extraction, the proposed model achieves a state-of-the-art F-score of 78.0, the best performance reported so far in this domain. The paper also shows that convolutional neural networks with embeddings can achieve superior PPI extraction (PPIE) performance without feature engineering based on complicated language processing.

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan;Guo, Xuchao;Bai, Zhao;Diao, Lei;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.771-791 / 2022
  • Protein-protein interaction (PPI) extraction from original text is important for revealing the molecular mechanisms of biological processes. With the rapid growth of biomedical literature, manually extracting PPIs has become increasingly time-consuming and laborious. Automatic PPI extraction from the raw literature through natural language processing has therefore attracted the attention of many researchers. We propose a PPI extraction model based on a large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and applies adversarial perturbations to the embedding layer to improve the robustness of the model. Experimental results showed that, compared with previous methods, the proposed model achieved the highest F1 scores (83.93% and 90.31%) on the two corpora with large sample sizes, AIMed and BioInfer, respectively. It also achieved comparable performance on three corpora with small sample sizes, namely HPRD50, IEPA, and LLL.
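The abstract above does not say how the adversarial perturbation is computed. As a minimal sketch, assuming the fast gradient method (a common choice for perturbing embeddings, not confirmed by the source), the perturbation is the loss gradient rescaled to a fixed L2 norm. The function names and the `epsilon` value are illustrative only.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Scale the loss gradient w.r.t. an embedding to L2 norm epsilon.

    Assumption: fast-gradient-method style perturbation; the paper may use
    a different scheme.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient gives no direction to perturb in.
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


def perturb_embedding(embedding, grad, epsilon=1.0):
    """Return the embedding shifted along the normalized gradient direction."""
    delta = fgm_perturbation(grad, epsilon)
    return [e + d for e, d in zip(embedding, delta)]
```

In adversarial training of this kind, the model is typically trained on both the clean and the perturbed embeddings, so that small shifts in the input representation do not change the predicted relation.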

Prediction of Protein-Protein Interactions from Sequences using a Correlation Matrix of the Physicochemical Properties of Amino Acids

  • Kopoin, Charlemagne N'Diffon;Atiampo, Armand Kodjo;N'Guessan, Behou Gerard;Babri, Michel
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.41-47 / 2021
  • Detection of protein-protein interactions (PPIs) remains essential for the development of therapies against diseases. Experimental methods for detecting PPIs are time-consuming and expensive. With the growing availability of PPI data, several computational models for predicting PPIs have been proposed. One of the major challenges in this task is feature extraction, because the information produced by some extraction techniques is of limited relevance. In this work, we propose an extraction method based on correlations between the physicochemical properties of amino acids. The method builds a correlation matrix from the hydrophobicity and hydrophilicity properties and integrates it into the calculation of the bigram. We then use the SVM algorithm to detect the presence of an interaction between two given proteins. Experimental results show that the proposed method performs better than approaches in the literature, obtaining 94.75% accuracy, 95.12% precision, and 96% sensitivity on human HPRD protein data.
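For readers comparing the reported figures, the three metrics follow the standard confusion-matrix definitions. The sketch below uses hypothetical counts, not data from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and sensitivity (recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all pairs classified correctly
        "precision": tp / (tp + fp),      # share of predicted interactions that are real
        "sensitivity": tp / (tp + fn),    # share of real interactions that were found
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=8, fp=2, tn=7, fn=3)
```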

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics / v.2 no.2 / pp.99-106 / 2004
  • In this paper we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes, and extracts the interactions between them described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from related public databases, to infer more interactions from the original ones. Both the inferred interactions and the original interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.

Effect of Alcohol Addition on Back-Extraction of BSA and Cytochrome c Using AOT Reverse Micellar System

  • Lee, Seong Sik;Lee, Bong Guk;Choe, Jin Seong;Lee, Jong Pal
    • Bulletin of the Korean Chemical Society / v.22 no.8 / pp.897-902 / 2001
  • The protein back-extraction processes were discussed from the viewpoint of the micelle-micelle interaction. Bovine serum albumin (BSA), which suppresses the cluster formation of reverse micelles (positive value of $\beta_{pr}$), has a high back-extracted fraction ($E_b$), whereas cytochrome c, which enhances the formation of reverse micelles (negative value of $\beta_{pr}$), has a relatively low back-extracted fraction. We have also quantitatively examined the effects of alcohol addition and protein solubilization on the percolation process of reverse micelles. The alcohols that suppress the formation of micellar clusters (high values of $\beta_{t}$) remarkably improved the back-extraction rates of BSA and cytochrome c. The values of $\beta_{t}$, defined by the variation of the percolation process, show a good linear correlation with the back-extraction behavior of the proteins. These results indicate that the micelle-micelle interaction, or micellar clustering, plays an important role in the back-extraction process of proteins.

Performance Enhancement of Tree Kernel-based Protein-Protein Interaction Extraction by Parse Tree Pruning and Decay Factor Adjustment (구문 트리 가지치기 및 소멸 인자 조정을 통한 트리 커널 기반 단백질 간 상호작용 추출 성능 향상)

  • Choi, Sung-Pil;Choi, Yun-Soo;Jeong, Chang-Hoo;Myaeng, Sung-Hyon
    • Journal of KIISE:Software and Applications / v.37 no.2 / pp.85-94 / 2010
  • This paper introduces a novel way to leverage the convolution parse tree kernel to extract interaction information between two proteins in a sentence without multiple features, clues, or complicated kernels. Our approach needs only the parse tree of a candidate sentence containing a pair of protein names that potentially interact. The contribution of this paper is twofold. First, by comparison with the latest existing approaches, we show that for PPI extraction it is imperative to prune the parse tree, removing context information that is unnecessary for deciding whether the sentence expresses an interaction between proteins. Second, we show that the tree kernel decay factor can play a pivotal role in improving extraction performance under identical learning conditions. Consequently, contrary to what previous research has argued, multiple kernels with multiple parsers do not always perform better than each kernel alone for PPI extraction: our experimental results outperform the two existing methods by 19.8% and 14%, respectively.
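To make the role of the decay factor concrete, here is a minimal sketch of a convolution (subset) tree kernel in the Collins and Duffy style. It is an illustration under stated assumptions, not the authors' implementation: trees are nested tuples `(label, child, ...)`, words are plain strings that appear only under pre-terminal nodes, and all function names are hypothetical.

```python
def production(node):
    """A node's production: its label plus the labels (or words) of its children."""
    label, *children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))


def is_preterminal(node):
    """True when every child of the node is a word."""
    return all(isinstance(c, str) for c in node[1:])


def nodes(tree):
    """Yield every non-word node of the tree."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)


def delta(n1, n2, decay):
    """Decay-weighted count of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if is_preterminal(n1):
        return decay
    result = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str) or isinstance(c2, str):
            continue  # words contribute nothing beyond the production match
        result *= 1.0 + delta(c1, c2, decay)
    return result


def tree_kernel(t1, t2, decay=0.5):
    """Sum delta over all node pairs; a smaller decay down-weights larger fragments."""
    return sum(delta(a, b, decay) for a in nodes(t1) for b in nodes(t2))
```

With `decay=1.0` the kernel simply counts shared fragments, so large matching subtrees dominate the score. Lowering the decay shrinks the contribution of deep fragments, which is why tuning it can change extraction performance even when everything else is held fixed.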

Protein Motif Extraction via Feature Interval Selection

  • Sohn, In-Suk;Hwang, Chang-Ha;Ko, Jun-Su;Chiu, David;Hong, Dug-Hun
    • Journal of the Korean Data and Information Science Society / v.17 no.4 / pp.1279-1287 / 2006
  • The purpose of this paper is to present a new algorithm for extracting the consensus pattern, or motif, from sequences belonging to the same family. Two methods are considered for feature interval partitioning: equal probability and equal width interval partitioning. C2H2 zinc finger protein and epidermal growth factor protein sequences are used to demonstrate the effectiveness of the proposed algorithm for motif extraction. For both protein families, the equal width interval partitioning method performs better than the equal probability interval partitioning method.
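The two partitioning schemes named in the abstract can be sketched as follows. This is a generic illustration of equal width versus equal probability (equal frequency) binning, not the paper's algorithm; the function names and the nearest-rank quantile rule are assumptions.

```python
def equal_width_bins(values, k):
    """Split the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def equal_probability_bins(values, k):
    """Choose edges so that each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Edge i sits at the i/k empirical quantile (nearest rank, no interpolation).
    edges = [ordered[min(n - 1, (i * n) // k)] for i in range(k)]
    edges.append(ordered[-1])
    return edges


def assign(value, edges):
    """Return the index of the interval that contains value."""
    for i in range(len(edges) - 2):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2
```

The schemes differ most on skewed data: with an outlier, equal width bins leave most values in a single interval, while equal probability bins keep the intervals evenly populated at the cost of very unequal widths.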

TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries

  • Song, Min
    • Journal of Information Science Theory and Practice / v.2 no.1 / pp.6-21 / 2014
  • This paper proposes a novel knowledge extraction system, TAKES (Two-step Approach for Knowledge Extraction System), which integrates advanced techniques from Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). In particular, TAKES adopts a novel keyphrase extraction-based query expansion technique to collect promising documents. It also uses a Conditional Random Field-based machine learning technique to extract important biological entities and relations. TAKES is applied to biological knowledge extraction, particularly retrieving promising documents that contain Protein-Protein Interactions (PPIs) and extracting PPI pairs. TAKES consists of two major components: DocSpotter, which is used to query and retrieve promising documents for extraction, and a Conditional Random Field (CRF)-based entity extraction component known as FCRF. The paper investigates research problems concerning knowledge extraction systems and reports a series of experiments that test the stated hypotheses. The findings are as follows. First, using three different test collections to measure the performance of the query expansion technique, the author verified that DocSpotter is robust and highly accurate compared to Okapi BM25 and SLIPPER. Second, the author verified that the relation extraction algorithm, FCRF, is highly accurate in terms of F-Measure compared to four other competitive extraction algorithms: Support Vector Machine, Maximum Entropy, Single POS HMM, and Rapier.

Studies on the Nuclei Adduction and Expression of c-myc Gene by Benzo(a)pyrene and Doxorubicin in Human NC-37 Cells (사람 NC-37 세포에서 Benzo(a)pyrene과 Doxorubicin에 의한 Nuclei내전과 c-myc 유전자의 발현에 대한 연구)

  • 김호찬;정인철;조무연
    • Journal of Life Science / v.8 no.4 / pp.400-409 / 1998
  • Formation of adducts was studied in benzo(a)pyrene (BP)- and doxorubicin (Dx)-treated human NC-37 cells and isolated nuclei. Major adducts formed were determined by fluorescence absorption spectrophotometry and DNA-linked protein assay. When isolated nuclei were exposed to the carcinogens BP and DMBA and the anticancer drugs m-AMSA, ellipticine, and Dx, varying degrees of adduct formation occurred between the DNA-protein complex and these drugs. When the mixture was centrifuged with 1.7 M sucrose solution, binding of BP and DMBA appeared to be similar between the sediment and the supernatant. When the sediment was centrifuged again with 0.35% polymin-P, the amount of BP bound was 2-fold greater in the protein fraction ($1077 \pm 55$ cpm) than in the DNA fraction ($470 \pm 20$ cpm), whereas that of DMBA was 1.6-fold greater in the DNA than in the protein fraction. In the case of m-AMSA, ellipticine, and Dx, the amount of binding was slightly greater in the supernatant than in the sediment after centrifugation with 1.7 M sucrose, and more than 3 times greater in the DNA than in the protein fraction after centrifugation with 0.35% polymin-P. DNA fractions associated with a subset of nonhistone chromosomal proteins were isolated from NC-37 cells exposed to $^{3}$H-BP and $^{14}$C-Dx. They were separated into two distinct components, DNA-S and DNA-P, by centrifugation with 2 M NaCl chromatin extraction. The results indicated that the amount of $^{3}$H-BP bound was 6.0-fold greater in DNA-P than in DNA-S, while that of $^{14}$C-Dx binding appeared to be 6.2-fold greater in the DNA-S than in the DNA-P fraction. When $^{3}$H-BP binding was determined in the presence of cold Dx, the amount of binding was reduced only in the DNA-P fraction, indicating that the interaction between DNA and protein is decreased. Gene expression was increased in BP-treated cells compared with normal cells but was reduced by treatment with BP-Dx. These results suggest that the protein moiety tightly bound to the DNA-P fraction may play an important role in the regulation of gene expression.

Extraction of higher yeast protein-protein interaction with hierarchical clustering from textual data (계층적 군집화를 통한 이스트(Yeast) 단백질의 고차 상호작용 추출)

  • 엄재홍;장병탁
    • Proceedings of the Korean Information Science Society Conference / 2002.10d / pp.364-366 / 2002
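  • This paper presents a method for extracting the key relations between proteins that characterize a particular phenomenon: binary relations between the major proteins of an organism are extracted from textual literature data on that organism and then hierarchically clustered by feature. The binary relations between proteins are extracted from the text data in the form of association rules using basic data mining techniques. For the experiments, we used MEDLINE abstracts retrieved from PubMed that contain relations between major yeast proteins, together with several public databases. In addition to extracting previously known single relations between proteins, such as those involving SH3, clustering based on these relations confirmed that relations between major proteins acting in a common phenomenon are grouped together. Furthermore, by examining the relations among simple rules at a higher level through clustering, rather than simple binary relations alone, we were able to consider higher-order relations that do not appear in the literature data used to extract the binary relations. The paper describes the overall rule extraction process along with each component of the extraction system and the data used.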
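As a rough illustration of extracting binary protein relations as association rules, the sketch below counts protein co-occurrence per document and reports the support and confidence of each directed rule. The input format, the threshold, and the protein names in the usage example are hypothetical; the paper's actual mining procedure is not specified here.

```python
from collections import Counter
from itertools import combinations


def pair_rules(documents, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for protein pairs.

    documents: list of lists of protein names mentioned in each abstract.
    """
    n = len(documents)
    single = Counter()
    pair = Counter()
    for proteins in documents:
        unique = sorted(set(proteins))       # count each protein once per document
        single.update(unique)
        pair.update(combinations(unique, 2))
    rules = []
    for (a, b), count in pair.items():
        support = count / n                  # share of documents mentioning both
        if support >= min_support:
            rules.append((a, b, support, count / single[a]))  # rule a -> b
            rules.append((b, a, support, count / single[b]))  # rule b -> a
    return rules


# Hypothetical toy corpus: each inner list is one abstract's protein mentions.
docs = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["B"]]
rules = pair_rules(docs, min_support=0.5)
```

Rules that pass the threshold could then serve as the items to be clustered, for example by representing each rule through the documents that support it.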
