• Title/Abstract/Keywords: Protein prediction

Prediction Accuracy Evaluation of Domain and Domain Combination Based Prediction Methods for Protein-Protein Interaction

  • Han, Dong-Soo; Jang, Woo-Hyuk
    • Bioinformatics and Biosystems / Vol. 1, No. 2 / pp. 128-133 / 2006
  • This paper compares a domain combination based protein-protein interaction prediction method with a domain based method. The prediction accuracy and reliability of the two methods are compared using the same prediction technique and the same interaction data. According to the comparison, the domain combination based method showed higher prediction accuracy than the domain based method for protein pairs whose domains fully overlap with those of protein pairs in the learning sets. Because the domain combination based method effectively assigns a weight to each domain interaction, this implies that the prediction accuracy of currently available domain or domain combination based methods could be improved further by developing more advanced weight assignment techniques. Several other significant findings from the comparative study are also described in this paper.
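
The two basic units being compared can be enumerated in a minimal Python sketch. It assumes that a domain combination is any non-empty subset of a protein's domains, and the papers' exact definition may differ. The domain labels D1-D3 are hypothetical. The sketch only lists the units and does not reproduce either prediction method.

```python
from itertools import combinations, product

def domain_pairs(domains_a, domains_b):
    """Unordered single-domain pairs, one domain taken from each protein."""
    return {tuple(sorted(p)) for p in product(set(domains_a), set(domains_b))}

def domain_combinations(domains):
    """All non-empty subsets of a protein's distinct domains (assumed definition)."""
    ds = sorted(set(domains))
    return [frozenset(c) for r in range(1, len(ds) + 1) for c in combinations(ds, r)]

def dc_pairs(domains_a, domains_b):
    """Unordered domain combination pairs, one combination from each protein."""
    return {
        frozenset((ca, cb))
        for ca, cb in product(domain_combinations(domains_a), domain_combinations(domains_b))
    }

print(sorted(domain_pairs(["D1", "D2"], ["D3"])))  # [('D1', 'D3'), ('D2', 'D3')]
print(len(dc_pairs(["D1", "D2"], ["D3"])))         # 3
```

Take a protein with domains D1 and D2 paired with a protein containing D3. The domain based view yields two units. The combination view yields three, one of which is {D1, D2} paired with {D3}. That unit keeps the co-occurrence context that no single domain pair carries.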

Development and Application of Protein-Protein Interaction Prediction System, PreDIN (Prediction-oriented Database of Interaction Network)

  • 서정근
    • Korean Society for Bioinformatics: Conference Proceedings / Korean Society for Bioinformatics and Systems Biology, 1st Workshop 2002 / pp. 5-23 / 2002
  • Motivation: Protein-protein interactions play a critical role in biological processes. Identifying interacting proteins by bioinformatic methods can provide new leads in the functional study of uncharacterized proteins without extensive experiments. Results: Protein-protein interactions are predicted by a computational algorithm based on a weighted scoring system for domain interactions between interacting protein pairs. We propose that potential interaction domain (PID) pairs can be extracted from a data set of experimentally identified interacting protein pairs, where one protein contains one domain of the pair and its interacting partner contains the other. All PID combinations are summarized in a matrix table, termed the PID matrix, which is proposed for use in predicting interactions. The Database of Interacting Proteins (DIP) is used as the source of interacting protein pairs. InterPro, an integrated database of protein families, domains and functional sites, is used to define the domains in the interacting pairs. A statistical scoring system, named the "PID matrix score", is designed and applied as a measure of the interaction probability between domains. Cross-validation with subsets of the DIP data was performed to evaluate the prediction accuracy of the PID matrix. The prediction system gives about 50% sensitivity and 98% specificity. Based on the PID matrix, we developed a system, named PreDIN (Prediction-oriented Database of Interaction Network). It provides several interaction information-finding services on the Internet, including interacting domain finding and interacting protein finding services. We demonstrate that the genome-wide interaction network can be mapped using the PreDIN system. The system can also be used as a new tool for predicting the function of unknown proteins.
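
The PID matrix idea can be sketched in Python as a counter over unordered domain pairs, collected from known interacting protein pairs. The abstract does not give the formula for the PID matrix score. The `pid_score` function below therefore simply sums the counts of a candidate pair's domain pairs. That scoring rule, the function names and the toy protein and domain IDs are all hypothetical.

```python
from collections import Counter
from itertools import product

def pid_candidates(a, b, domain_map):
    """Unordered domain pairs with one domain from protein a and one from protein b."""
    return {tuple(sorted(p)) for p in product(domain_map.get(a, ()), domain_map.get(b, ()))}

def build_pid_matrix(interacting_pairs, domain_map):
    """Count how many known interacting protein pairs contain each PID pair."""
    matrix = Counter()
    for a, b in interacting_pairs:
        # A set per protein pair, so one pair contributes at most 1 to each PID.
        matrix.update(pid_candidates(a, b, domain_map))
    return matrix

def pid_score(a, b, matrix, domain_map):
    """Hypothetical score: sum of PID counts over the candidate pair's domain pairs."""
    return sum(matrix[p] for p in pid_candidates(a, b, domain_map))

# Toy data: protein and domain IDs are made up.
domain_map = {"P1": {"D1"}, "P2": {"D2"}, "P3": {"D1", "D3"}, "P4": {"D2"}}
matrix = build_pid_matrix([("P1", "P2"), ("P3", "P4")], domain_map)
print(matrix[("D1", "D2")])                       # 2
print(pid_score("P3", "P2", matrix, domain_map))  # 3
```

In practice, a candidate pair would be predicted to interact when its score clears a threshold tuned by cross-validation. The threshold is omitted here because the abstract does not specify one.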

A Domain Combination Based Probabilistic Framework for Protein-Protein Interaction Prediction

  • Han, Dong-Soo; Seo, Jung-Min; Kim, Hong-Soog; Jang, Woo-Hyuk
    • Korean Society for Bioinformatics: Conference Proceedings / Korean Society for Bioinformatics and Systems Biology, Proceedings of the 2nd Annual Conference 2003 / pp. 7-16 / 2003
  • In this paper, we propose a probabilistic framework for predicting the interaction probability of proteins. The notions of domain combination and domain combination pair are newly introduced. To overcome the limitations of conventional domain pair based prediction systems, the prediction model in the framework takes the domain combination pair as the basic unit of protein interaction. The framework consists largely of a prediction preparation stage and a prediction service stage. In the preparation stage, two appearance probability matrices are constructed, holding the appearance frequencies of domain combination pairs in the interacting and non-interacting sets of protein pairs. Based on these matrices, a probability equation is devised that maps a protein pair to a real number in the range 0 to 1. Using the equation, two distributions are obtained: one for the interacting set and one for the non-interacting set of protein pairs. In the service stage, the interaction probability of a protein pair is predicted using these distributions and the equation. The validity of the prediction model is evaluated on the interacting set of protein pairs in yeast and on an artificially generated non-interacting set of protein pairs. When 80% of the interacting protein pairs in the DIP database are used as the learning set, the framework achieves a high sensitivity (86%) and a specificity of 56%.
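
The preparation stage can be sketched in Python under several assumptions. A domain combination is taken to be any non-empty subset of a protein's domains. Each appearance probability (AP) matrix stores, for each dc-pair, the fraction of protein pairs in its set that contain it. The mapping to [0, 1] is a simple hypothetical ratio of the summed interacting and non-interacting appearance probabilities; it is NOT the paper's equation, which the abstract does not give. All names and toy IDs are illustrative.

```python
from collections import Counter
from itertools import combinations, product

def dc_pairs(domains_a, domains_b):
    """Unordered domain combination pairs (a combination = non-empty domain subset)."""
    def combos(ds):
        ds = sorted(set(ds))
        return [frozenset(c) for r in range(1, len(ds) + 1) for c in combinations(ds, r)]
    return {frozenset((ca, cb)) for ca, cb in product(combos(domains_a), combos(domains_b))}

def appearance_probability(pairs, domain_map):
    """AP matrix sketch: fraction of protein pairs in `pairs` containing each dc-pair."""
    counts = Counter()
    for a, b in pairs:
        counts.update(dc_pairs(domain_map[a], domain_map[b]))
    return {key: n / len(pairs) for key, n in counts.items()}

def interaction_probability(a, b, ap_int, ap_non, domain_map):
    """Hypothetical mapping of a protein pair to [0, 1]; not the paper's formula."""
    keys = dc_pairs(domain_map[a], domain_map[b])
    s_int = sum(ap_int.get(k, 0.0) for k in keys)
    s_non = sum(ap_non.get(k, 0.0) for k in keys)
    if s_int + s_non == 0:
        return 0.0  # no evidence either way; returning 0.0 is an assumption
    return s_int / (s_int + s_non)

# Toy data: protein and domain IDs are made up.
domain_map = {"A": {"D1"}, "B": {"D2"}, "C": {"D3"}, "E": {"D4"}}
ap_int = appearance_probability([("A", "B")], domain_map)
ap_non = appearance_probability([("C", "E")], domain_map)
print(interaction_probability("A", "B", ap_int, ap_non, domain_map))  # 1.0
```

The framework then applies the equation to both training sets to obtain two score distributions and interprets a new pair's score against them. That step is left out of this sketch.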

Functional Prediction Method for Proteins by using Modified Chi-square Measure

  • 강태호; 유재수; 김학용
    • The Journal of the Korea Contents Association / Vol. 9, No. 5 / pp. 332-336 / 2009
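  • An important part of genome analysis is predicting the functions of unknown proteins whose functions have not been characterized. Analyzing protein-protein interaction networks makes the functions of unknown proteins easier to predict, and various studies have attempted to predict them from such networks. The chi-square method is a representative approach among these network-based function prediction studies. However, because the chi-square method does not reflect the network topology, its prediction accuracy degrades depending on the network size. In this paper, we therefore propose a new function prediction method that improves accuracy by modifying the chi-square method. To this end, we collected and analyzed data from public protein interaction databases such as MIPS, DIP and SGD. To demonstrate the advantage of the proposed method, we made predictions on each database with both the chi-square method and the proposed modified chi-square method, and evaluated their accuracy.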
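
For reference, a common formulation of the baseline chi-square method scores each function f for a protein as (n_f - e_f)^2 / e_f. Here n_f is the number of its interaction partners annotated with f, and e_f is the count expected from f's overall frequency among annotated proteins. The minimal Python sketch below makes two assumptions: it uses only direct partners (radius 1), and it keeps only over-represented functions (n_f > e_f). The paper's modified measure is not reproduced, and the toy proteins and labels are hypothetical.

```python
from collections import Counter

def chi_square_scores(protein, adjacency, annotations):
    """Chi-square scores of candidate functions for `protein`, from its direct partners.

    adjacency: dict protein -> set of interaction partners
    annotations: dict protein -> set of known function labels
    """
    # Background frequency pi_f of each function among annotated proteins.
    annotated = [p for p, funcs in annotations.items() if funcs]
    freq = Counter(f for p in annotated for f in annotations[p])
    pi = {f: c / len(annotated) for f, c in freq.items()}

    partners = adjacency.get(protein, set())
    # n_f: number of partners annotated with function f.
    observed = Counter(f for q in partners for f in annotations.get(q, ()))
    scores = {}
    for f, n_f in observed.items():
        e_f = len(partners) * pi[f]  # expected count if partners were drawn at random
        if n_f > e_f:                # keep only over-represented functions
            scores[f] = (n_f - e_f) ** 2 / e_f
    return scores

adjacency = {"X": {"A", "B", "C"}, "A": {"X"}, "B": {"X"}, "C": {"X"}, "D": set()}
annotations = {"A": {"kinase"}, "B": {"kinase"}, "C": {"transport"}, "D": {"transport"}}
print(chi_square_scores("X", adjacency, annotations))  # {'kinase': 0.16666666666666666}
```

The score depends only on partner counts and global frequencies, not on how the partners are connected to one another. This is the topology blindness the abstract identifies as the method's weakness.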

Protein Disorder Prediction Using Multilayer Perceptrons

  • Oh, Sang-Hoon
    • International Journal of Contents / Vol. 9, No. 4 / pp. 11-15 / 2013
  • The "Protein Folding Problem" is considered one of the "Great Challenges of Computer Science", and prediction of disordered proteins is an important part of it. Machine learning models can predict the disordered structure of a protein through their ability to "learn from examples". Among the many machine learning models, we investigate the multilayer perceptron (MLP) as a predictor of protein disorder. The investigation covers a single hidden layer MLP, a multi hidden layer MLP and a hierarchical structure of MLPs. In addition, a target node cost function that deals with imbalanced data is used as the training criterion for the MLPs. Based on the results, we argue that MLPs should have deep architectures to improve protein disorder prediction performance.
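
The two ingredients can be shown in a dependency-free Python sketch. The first is a forward pass through an MLP with any number of hidden layers, the deep setting the abstract argues for. The second is a class-weighted cross-entropy loss. That loss is a swapped-in, commonly used way to handle class imbalance; it is not the paper's target node cost function. The layer sizes and the 21-value input window are hypothetical, and training (backpropagation) is omitted.

```python
import math
import random

def init_mlp(layer_sizes, seed=0):
    """Random weights and zero biases per layer; layer_sizes lists the input size first."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]

def forward(layers, x):
    """Apply a sigmoid layer for every (weights, biases) pair; returns the output vector."""
    for weights, biases in layers:
        x = [
            1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
            for row, b in zip(weights, biases)
        ]
    return x

def weighted_bce(p, y, pos_weight):
    """Class-weighted binary cross-entropy for prediction p of label y (1 = disordered).

    Errors on the rare positive class cost pos_weight times more.
    """
    eps = 1e-12  # guards against log(0)
    return -(pos_weight * y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

shallow = init_mlp([21, 30, 1])       # one hidden layer
deep = init_mlp([21, 30, 30, 30, 1])  # several hidden layers
window = [0.0] * 21                   # hypothetical per-residue input features
print(forward(deep, window))          # one untrained output value in (0, 1)
```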

Minimally Complex Problem Set for an Ab initio Protein Structure Prediction Study

  • Kim RyangGug; Choi Cha-Yong
    • Biotechnology and Bioprocess Engineering: BBE / Vol. 9, No. 5 / pp. 414-418 / 2004
  • A 'minimally complex problem set' for ab initio protein structure prediction has been proposed. The set consists of non-redundant, crystallographically determined, high-resolution protein structures without disulphide bonds, modified residues, unusual connectivities or heteromolecules. More importantly, it is a collection of protein structures with a high probability of having the same structure in the crystal form as in solution. To our knowledge, this is the first attempt at this kind of dataset. Considering the lattice constraints in crystals and the possible flexibility in solution of crystallographically determined protein structures, our dataset is thought to be the safest starting point for an ab initio protein structure prediction study.
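
The structural exclusions listed in the abstract translate naturally into a per-entry predicate. This minimal Python sketch assumes hypothetical record fields and a 2.0 Å resolution cut-off; the abstract does not state one. The non-redundancy step and the crystal-versus-solution criterion, which is the set's most important property, need separate analysis and are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StructureEntry:
    entry_id: str                # hypothetical identifier
    method: str                  # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: Optional[float]  # in angstroms; None when not applicable
    n_disulfides: int
    n_modified_residues: int
    has_unusual_links: bool      # non-standard covalent connectivity
    n_heteromolecules: int       # ligands, ions and other hetero groups (solvent excluded)

def passes_filters(e: StructureEntry, max_resolution: float = 2.0) -> bool:
    """Apply the structural exclusions listed in the abstract (cut-off is assumed)."""
    return (
        e.method == "X-RAY DIFFRACTION"
        and e.resolution is not None
        and e.resolution <= max_resolution
        and e.n_disulfides == 0
        and e.n_modified_residues == 0
        and not e.has_unusual_links
        and e.n_heteromolecules == 0
    )

entries = [
    StructureEntry("ex1", "X-RAY DIFFRACTION", 1.6, 0, 0, False, 0),
    StructureEntry("ex2", "X-RAY DIFFRACTION", 1.6, 2, 0, False, 0),  # has disulphides
    StructureEntry("ex3", "SOLUTION NMR", None, 0, 0, False, 0),       # not crystallographic
]
print([e.entry_id for e in entries if passes_filters(e)])  # ['ex1']
```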

Reviving GOR method in protein secondary structure prediction: Effective usage of evolutionary information

  • Lee, Byung-Chul; Lee, Chang-Jun; Kim, Dong-Sup
    • Korean Society for Bioinformatics: Conference Proceedings / Korean Society for Bioinformatics and Systems Biology, Proceedings of the 2nd Annual Conference 2003 / pp. 133-138 / 2003
  • Protein secondary structure prediction has been an important bioinformatics tool and is an essential component of template-based protein tertiary structure prediction. Predicted secondary structure information is known to improve both fold recognition performance and alignment accuracy. In this paper, we describe several novel ideas that may improve prediction accuracy. The main idea is motivated by the observation that a protein's structural information, especially when combined with evolutionary information, significantly improves the accuracy of the predicted tertiary structure. From a non-redundant set of protein structures, we derive 'potential' parameters for secondary structure prediction that contain the structural information of proteins. The derivation follows a procedure similar to the one used to build the directional information table of the GOR method. These potential parameters are combined with the frequency matrices obtained by running PSI-BLAST to construct feature vectors, which are used to train support vector machines (SVMs) as secondary structure classifiers. Moreover, the problem of huge model file size, a known shortcoming of SVMs, is partially overcome by reducing the size of the training data. Redundancy is filtered out not only at the protein level but also at the feature vector level. A preliminary result, measured by the average three-state prediction accuracy, is encouraging.
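
As background for the GOR analogy, the classic GOR directional information table can be estimated as in this minimal Python sketch. For each window offset m in -8..+8, residue type R and state S in {H, E, C}, the table holds I(S; R at j+m) = log[P(S | R at j+m) / P(S)]. Each residue is then assigned the state with the largest information summed over its window. The pseudocount, the 3-letter state alphabet and the function names are assumptions. The paper's PSI-BLAST profile features and SVM classifiers are not shown. The three-state accuracy (Q3) used to report results is included.

```python
import math
from collections import Counter

STATES = "HEC"
HALF_WINDOW = 8  # classic GOR window: 8 residues on each side

def directional_info(sequences, structures, pseudo=1.0):
    """Return info(m, r, s) = log P(S=s | residue r at offset m) / P(S=s)."""
    pair = Counter()   # (offset, residue, state) counts
    res = Counter()    # (offset, residue) counts
    state = Counter(s for ss in structures for s in ss)
    total = sum(state.values())
    for seq, ss in zip(sequences, structures):
        for j, s in enumerate(ss):
            for m in range(-HALF_WINDOW, HALF_WINDOW + 1):
                if 0 <= j + m < len(seq):
                    pair[(m, seq[j + m], s)] += 1
                    res[(m, seq[j + m])] += 1

    def info(m, r, s):
        p_s_given_r = (pair[(m, r, s)] + pseudo) / (res[(m, r)] + pseudo * len(STATES))
        p_s = (state[s] + pseudo) / (total + pseudo * len(STATES))
        return math.log(p_s_given_r / p_s)

    return info

def predict(seq, info):
    """Assign each residue the state with the largest summed information over its window."""
    out = []
    for j in range(len(seq)):
        window = [m for m in range(-HALF_WINDOW, HALF_WINDOW + 1) if 0 <= j + m < len(seq)]
        out.append(max(STATES, key=lambda s: sum(info(m, seq[j + m], s) for m in window)))
    return "".join(out)

def q3(predicted, observed):
    """Three-state accuracy: fraction of residues whose state is predicted correctly."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

info = directional_info(["AAAAGGGG"], ["HHHHCCCC"])  # toy training data
print(predict("AAAAGGGG", info))
print(q3("HHEC", "HHCC"))  # 0.75
```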

Prediction of Transmembrane Protein Topology Using Position-specific Modeling of Context-dependent Structural Regions

  • Chi, Sang-Mun
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 3 / pp. 683-693 / 2005
  • This paper presents a new transmembrane protein topology prediction method that attempts to model the topological rules governing the topogenesis of transmembrane proteins. Context-dependent structural regions of a transmembrane protein are used as the basic modeling units, in order to represent their topogenic roles during transmembrane protein assembly. These modeling units are modeled by a tied-state hidden Markov model, which can express the position-specific effect of amino acids during transmembrane protein assembly. Prediction performance improves with these modeling approaches. In particular, the marked improvement in orientation prediction supports the validity of the proposed modeling. The proposed method is available at http://bioroutine.com/TRAPTOP.
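
To illustrate HMM-based topology decoding in general, here is a minimal Viterbi sketch in Python over a toy four-state model. The states are inside (i), membrane crossing from inside to outside, outside (o), and membrane crossing from outside to inside, so consecutive membrane segments must alternate orientation. This is not TRAPTOP's tied-state, position-specific model. The states, transition probabilities and two-level hydrophobicity emission are all hypothetical.

```python
import math

HYDROPHOBIC = set("AILMFVWC")
STATES = ["i", "Mio", "o", "Moi"]  # inside, membrane in->out, outside, membrane out->in
# Hypothetical transition probabilities; membrane states force alternating orientation.
TRANS = {
    "i":   {"i": 0.9, "Mio": 0.1},
    "Mio": {"Mio": 0.9, "o": 0.1},
    "o":   {"o": 0.9, "Moi": 0.1},
    "Moi": {"Moi": 0.9, "i": 0.1},
}
START = {"i": 0.5, "o": 0.5}

def log(p):
    return math.log(p) if p > 0 else float("-inf")

def emission(state, residue):
    """Toy emission: membrane states favour hydrophobic residues."""
    p_hydrophobic = 0.8 if state.startswith("M") else 0.3
    return p_hydrophobic if residue in HYDROPHOBIC else 1.0 - p_hydrophobic

def viterbi(seq):
    """Most probable state path in log space, reported as i/M/o per residue."""
    v = [{s: log(START.get(s, 0.0)) + log(emission(s, seq[0])) for s in STATES}]
    back = []
    for r in seq[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev, score = max(
                ((p, v[-1][p] + log(TRANS[p].get(s, 0.0))) for p in STATES),
                key=lambda t: t[1],
            )
            col[s] = score + log(emission(s, r))
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last residue
        state = ptr[state]
        path.append(state)
    path.reverse()
    return "".join("M" if s.startswith("M") else s for s in path)

print(viterbi("K" * 10 + "L" * 20 + "K" * 10))  # the 20 hydrophobic residues decode as M
```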

A Domain Combination-based Probabilistic Framework for Protein-Protein Interaction Prediction

  • Han, Dong-Soo; Seo, Jung-Min; Kim, Hong-Soog; Jang, Woo-Hyuk
    • Journal of KIISE: Computing Practices and Letters / Vol. 10, No. 4 / pp. 299-308 / 2004
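  • As vast amounts of protein and domain data have recently been published and accumulated on the Internet, the need for systems that predict protein-protein interactions has grown. In this paper, we use such data to propose a new protein interaction prediction system that computationally predicts the interaction probability of proteins based on domain combination pairs. To overcome the limitations of conventional domain pairs, the proposed system newly introduces the concepts of domain combination and domain combination pair. It treats the domain combination pair (dc-pair) as the basic unit of protein interaction for prediction. The system consists largely of a prediction preparation stage and a prediction service stage. In the preparation stage, domain combination information and its appearance frequencies are extracted from a set of protein pairs known to interact and from a set of protein pairs presumed not to interact. The extracted information is stored in a matrix structure called the Appearance Probability Matrix (AP matrix). Based on the AP matrix, we devise a probability equation, PIP (Primary Interaction Probability), for predicting protein-protein interactions. We then use it to generate the distributions of probability values for the set of protein pairs known to interact and the set presumed not to interact. In the service stage, the interaction probability of an arbitrary protein pair is computed using the distributions and the equation obtained in the preparation stage. The validity of the model was verified using a set of yeast protein pairs reported to interact and a set of protein pairs presumed not to interact. The learning set was 80% of the yeast protein pairs known to interact in DIP (Database of Interacting Proteins). With it, the system achieved 86% sensitivity and 56% specificity, a higher prediction accuracy than existing domain-based prediction systems. We attribute this improvement to the adoption of the dc-pair as the basic unit of interaction and to the effectiveness of the newly devised PIP equation used for classification.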
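
The sensitivity and specificity figures follow the standard definitions, sketched below in Python. The sketch also includes a simple random 80/20 split of the known interacting pairs into a learning set and a held-out set. The split function, its fixed seed and the 0.5 threshold in the example are illustrative assumptions. The paper's own decision rule, based on the PIP value distributions, is not reproduced.

```python
import random

def split_learning_set(pairs, fraction=0.8, seed=0):
    """Shuffle known interacting pairs and split them into learning and held-out sets."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * fraction)
    return pairs[:cut], pairs[cut:]

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    scores: predicted interaction probabilities in [0, 1]
    labels: True for pairs known to interact, False for presumed non-interacting pairs
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

learn, held_out = split_learning_set(range(10))
print(len(learn), len(held_out))  # 8 2
print(sensitivity_specificity([0.9, 0.8, 0.3, 0.2, 0.6],
                              [True, True, True, False, False], 0.5))  # (0.6666666666666666, 0.5)
```

Raising the threshold trades sensitivity for specificity. Two systems' sensitivity/specificity pairs are therefore only directly comparable at a stated operating point.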