• Title/Summary/Keyword: protein sequence


Protein Disorder/Order Region Classification Using EPs-TFP Mining Method (EPs-TFP 마이닝 기법을 이용한 단백질 Disorder/Order 지역 분류)

  • Lee, Heon Gyu;Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / v.17 no.6 / pp.59-72 / 2012
  • A protein exhibits its specific function when a disordered region of its sequence transitions to an ordered region in the course of a biological reaction, so separating disordered and ordered regions in sequence data is urgently needed for predicting the protein's three-dimensional structure and characteristics. To classify disordered and ordered regions efficiently, this paper proposes a sequence-based classification/prediction method that avoids bias toward any specific characteristic of the protein and improves classification speed. The emerging-pattern-based EPs-TFP method uses only the essential emerging patterns, with redundant emerging patterns removed. It finds sequence patterns that occur frequently in disordered regions but relatively infrequently in ordered regions. To improve the performance of the proposed algorithm, we extend the TFP method, which is built on the P-tree and T-tree concepts, into a classification/prediction method. We evaluated the EPs-TFP technique on DisProt 4.9 and CASP 7 data; the order/disorder classification results show a sensitivity of 73.6, a specificity of 69.51, and an accuracy of 74.2.
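As a rough illustration of the emerging-pattern idea described in this abstract, the sketch below counts fixed-length subsequences (k-mers) in two sets of toy sequences and keeps those whose support is much higher in the disorder set than in the order set. The sequences, the value of `k`, the thresholds, and the function names are all hypothetical. This is not the EPs-TFP algorithm itself, which relies on P-tree and T-tree structures that are not shown here.

```python
from collections import Counter

def support(seqs, k):
    """Fraction of sequences in which each k-mer occurs at least once."""
    counts = Counter()
    for s in seqs:
        # A set ensures each k-mer is counted once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return {kmer: c / len(seqs) for kmer, c in counts.items()}

def emerging_patterns(disorder, order, k=2, min_growth=2.0, min_support=0.5):
    """k-mers frequent in `disorder` whose support ratio versus `order` is >= min_growth."""
    sup_d = support(disorder, k)
    sup_o = support(order, k)
    result = {}
    for kmer, sd in sup_d.items():
        if sd < min_support:
            continue
        so = sup_o.get(kmer, 0.0)
        # Growth rate: support in the target class over support in the other class.
        growth = float("inf") if so == 0 else sd / so
        if growth >= min_growth:
            result[kmer] = growth
    return result

# Toy, made-up fragments (not taken from DisProt or CASP)
disorder_set = ["SPSEQ", "PSEEK", "GSPSE"]
order_set = ["LIVAL", "VALSP", "ILVAF"]
patterns = emerging_patterns(disorder_set, order_set)
```

A pattern that never occurs in the order set receives an infinite growth rate here; a real classifier would also need a rule for scoring unseen sequences against the retained patterns.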

Computational Approaches for Structural and Functional Genomics

  • Brenner, Steven-E.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.17-20 / 2000
  • Structural genomics aims to provide a good experimental structure or computational model of every tractable protein in a complete genome. Underlying this goal is the immense value of protein structure, especially in permitting recognition of distant evolutionary relationships for proteins whose sequence analysis has failed to find any significant homolog. A considerable fraction of the genes in all sequenced genomes have no known function, and structure determination provides a direct means of revealing homology that may be used to infer their putative molecular function. The solved structures will be similarly useful for elucidating the biochemical or biophysical role of proteins that have previously been ascribed only phenotypic functions. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways. We use computational methods to select families whose structures cannot be predicted and which are likely to be amenable to experimental characterization. The methods employed include modern sequence analysis and clustering algorithms. A critical component is consultation of the PRESAGE database for structural genomics, which records the community's experimental work underway and computational predictions. The protein families are ranked according to several criteria, including taxonomic diversity and known functional information. Individual proteins, often homologs from hyperthermophiles, are selected from these families as targets for structure determination. The solved structures are examined for structural similarity to other proteins of known structure. Homologous proteins in sequence databases are computationally modeled to provide a resource of protein structure models complementing the experimentally solved protein structures.


Nonspecific Association of a 17 kDa Isoform of the Myelin Basic Protein with the Postsynaptic Density Fraction

  • Moon, Il-Soo
    • BMB Reports / v.33 no.3 / pp.276-278 / 2000
  • The postsynaptic density (PSD), a large protein complex beneath the postsynaptic membrane, is notorious for its 'stickiness'. In order to understand the molecular composition of the PSD fraction, a 17 kDa protein band was isolated by electroelution from SDS gels, and its partial amino acid sequence was determined from HPLC-purified tryptic peptides of the protein. Surprisingly, the amino acid sequence was identical to that of the previously reported 17 kDa isoform of the myelin basic protein (MBP), an essential protein in CNS myelin formation. Since the protein band represented ~2% of the total proteins in the 1% n-octyl glucoside-insoluble PSD fraction, these results indicate that a significant amount of the 17 kDa isoform of MBP is tightly associated with the PSD during preparation of the PSD fraction.


Kinetic analysis of Drosophila Vnd protein containing homeodomain with its target sequence

  • Yoo, Si-Uk
    • BMB Reports / v.43 no.6 / pp.407-412 / 2010
  • The homeodomain (HD) is a highly conserved DNA-binding domain containing a helix-turn-helix motif. Drosophila Vnd (Ventral nervous system defective), which contains an HD, acts as a regulator that either enhances or suppresses gene expression upon binding to its target sequence. In this study, kinetic analysis of Vnd binding to DNA was performed. The results demonstrate that the DNA-binding affinity of the recombinant protein containing the HD and the NK2-specific domain (NK2-SD) was higher than that of full-length Vnd. To assess whether phosphorylation sites within the HD and NK2-SD affect the interaction of the protein with the target sequence, alanine substitutions were introduced. The results show that the S631A mutation within NK2-SD does not contribute significantly to the DNA-binding affinity. However, the S571A and T600A mutations within the HD showed lower affinity for DNA binding. In addition, DNA-binding analysis using embryonic nuclear protein demonstrates that Vnd interacts with other nuclear proteins, suggesting that Vnd exists as a complex.

The 52 kD Protein Gene of Odontoglossum Ringspot Virus Containing RNA-Dependent RNA Polymerase Motifs and Comparisons with Other Tobamoviruses

  • Park, Won-Mok
    • Journal of Plant Biology / v.38 no.2 / pp.129-136 / 1995
  • Complementary DNA of the genomic RNA of odontoglossum ringspot virus Cymbidium strain (ORSV-Cy) was synthesized from polyadenylated viral RNA and cloned. Selected clones containing the RNA-dependent RNA polymerase gene of the virus were sequenced with an automated sequencing system. The complete nucleotide sequence of the open reading frame is 1377 base pairs in length and encodes a protein of 458 amino acids with a mass of about 52,334 Da. The 52 kD protein of ORSV shares four sequence motifs characteristic of viral RNA-dependent RNA polymerases. Comparison of the ORSV 52 kD protein sequence with those of five other viruses in the tobamovirus group showed 60.7 to 76.0% homology at the amino acid level and conservation of the four motifs between the viruses.
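The figures in this abstract can be cross-checked with simple arithmetic: an open reading frame of 1377 bp contains 459 codons, and if the last one is a stop codon (an assumption, since the abstract does not say so explicitly), 458 codons remain to encode amino acids. The variable names below are illustrative only.

```python
orf_length_bp = 1377      # ORF length reported in the abstract
protein_mass_da = 52334   # protein mass reported in the abstract

codons = orf_length_bp // 3          # total codons in the ORF
coding_codons = codons - 1           # assumes the final codon is a stop codon
mean_residue_mass = protein_mass_da / coding_codons  # average mass per residue, in Da

print(codons, coding_codons, round(mean_residue_mass, 1))  # prints: 459 458 114.3
```

The resulting average of roughly 114 Da per residue is in the range typically seen for proteins, so the reported length and mass are mutually consistent.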


A new method to predict the protein sequence alignment quality (단백질 서열정렬 정확도 예측을 위한 새로운 방법)

  • Lee, Min-Ho;Jeong, Chan-Seok;Kim, Dong-Seop
    • Bioinformatics and Biosystems / v.1 no.1 / pp.82-87 / 2006
  • The most popular protein structure prediction method is comparative modeling. For comparative modeling to be accurate, the sequence alignment between a query protein and a template must be accurate. Choosing the best template based on protein sequence alignments is critical for accurate fold recognition in comparative modeling, but the quality of the sequence alignment is even more critical. In contrast to the considerable attention paid to methods for choosing the best template, prediction of alignment accuracy has not attracted much interest. Here, we develop a method for predicting the shift score, a recently proposed measure of alignment quality. We apply support vector regression (SVR) to predict the shift score. The alignment between a query protein and a template protein of length n in our own library is transformed into an input vector of length n + 2. Structural alignments are assumed to be the best alignments, and the SVR is trained to predict the shift score between the structural alignment and the profile-profile alignment of a query protein to a template protein. Performance is assessed by the Pearson correlation coefficient. The trained SVR predicts the shift score with a correlation of 0.80 between observed and predicted shift scores.
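The evaluation step described in this abstract, correlating observed and predicted shift scores, can be sketched as follows. The Pearson coefficient is computed from scratch; the score lists are made-up placeholders rather than data from the paper, and the SVR training itself is not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical observed vs. predicted shift scores, for illustration only
observed = [0.90, 0.75, 0.40, 0.10, 0.60]
predicted = [0.85, 0.70, 0.50, 0.20, 0.55]
r = pearson(observed, predicted)
```

A value of `r` near 1 means the predicted scores rank and scale the alignments much as the observed scores do; the function divides by zero if either input is constant, which a production version would need to guard against.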


An Approach for a Substitution Matrix Based on Protein Blocks and Physicochemical Properties of Amino Acids through PCA

  • You, Youngki;Jang, Inhwan;Lee, Kyungro;Kim, Heonjoo;Lee, Kwanhee
    • Interdisciplinary Bio Central / v.6 no.4 / pp.3.1-3.10 / 2014
  • Amino acid substitution matrices are essential tools for protein sequence analysis, homology searches in protein databases, and multiple sequence alignment. The PAM matrix was the first widely used amino acid substitution matrix, and the BLOSUM series later succeeded it. Most substitution matrices were developed from the statistical frequency of substitution between amino acids in blocks representing groups of protein families or related proteins. However, amino acid substitution depends on the similarity of the physicochemical properties of the amino acids. In this study, a new approach was used to obtain the major physicochemical properties in multiple sequence alignment. The frequencies of amino acid substitution in a multiple sequence alignment database were merged with selected amino acid attributes from a physicochemical properties database. Principal component analysis of the merged data revealed the major physicochemical properties. Using factor analysis, the four principal components were interpreted as flexibility of electronic movement, polarity, negative charge, and structural flexibility. Applying these four components, BAPS was constructed and validated for accuracy. In a comparison of receiver operating characteristic ($ROC_{50}$) values, BAPS scored slightly lower than BLOSUM and PAM. However, when accuracy was evaluated by comparing multiple sequence alignment results with the structural alignments of two test data sets of known three-dimensional structure from the homologous structure alignment database, BAPS performed comparably to or better than prior matrices, including PAM, Gonnet, Identity, and Genetic code matrices.
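The frequency-based construction mentioned in this abstract is commonly expressed as a log-odds score: the observed frequency of an aligned pair divided by the frequency expected if the two residues were paired by chance. The sketch below shows that general form for an ordered pair (a, b). It does not reproduce the PCA-based construction of BAPS, and the frequencies are hypothetical values chosen only for illustration.

```python
import math

def log_odds(q_ab, p_a, p_b):
    """Log-odds score (in bits) for the ordered aligned pair (a, b)."""
    return math.log2(q_ab / (p_a * p_b))

# Hypothetical frequencies, not taken from any published matrix
p = {"L": 0.10, "I": 0.06, "D": 0.05}            # background frequencies
q = {("L", "I"): 0.012, ("L", "D"): 0.0025}      # observed aligned-pair frequencies

score_LI = log_odds(q[("L", "I")], p["L"], p["I"])   # pair seen more often than chance
score_LD = log_odds(q[("L", "D")], p["L"], p["D"])   # pair seen less often than chance
```

A positive score marks a substitution observed more often than chance would predict, and a negative score marks one observed less often.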

Nucleotide Sequence of a Bacteriolytic Enzyme Gene from Alkalophilic Bacillus sp.

  • Jung, Myeong-Ho;Ohk, Seung-Ho;Yum, Do-Young;Kong, In-Soo;Bai, Dong-Hoon
    • Journal of Microbiology and Biotechnology / v.3 no.2 / pp.73-77 / 1993
  • The nucleotide sequences of the Bacillus sp. bacteriolytic enzyme gene, lytP, and its flanking regions were determined. A unique open reading frame for a protein of Mw 27,000 and a putative terminator sequence were found downstream of a consensus ribosome binding site located 8 nt upstream of the ATG start codon. The primary amino acid sequence deduced from the nucleotide sequence revealed a putative protein of 255 amino acid residues with an Mw of 27,420. No significant homology was found between the amino acid sequence of the Bacillus sp. bacteriolytic enzyme and those of other cell wall hydrolases.


Orphan G Protein-coupled Receptors in Post-Genome Era

  • Im, Dong-Soon
    • Proceedings of the PSK Conference / 2002.10a / pp.131-133 / 2002
  • In 'Nature', Dixon et al. reported the first cloned mammalian G protein-coupled receptor sequence (1). The DNA sequence, from a hamster, encodes the $\beta_2$-adrenergic receptor. In the same year, 1986, Kubo et al. published the muscarinic acetylcholine receptor sequence (M$_1$) from a rat in the same journal (2). Both groups purified the receptor proteins and identified the DNA sequences (1, 2). (omitted)


Isolation and Characterization of a cDNA Encoding a Protein Homologous to the Mouse 70 kDa Heat Shock Protein (생쥐 섬유아세포에서 70 kDa 고온충격 단백질의 cDNA 클로닝과 염기서열 분석)

  • 김창환;정선미;최준호
    • The Korean Journal of Zoology / v.35 no.2 / pp.203-210 / 1992
  • Hsp70, a 70 kDa protein, is the major protein expressed when cells are heat-shocked. A cDNA library from mouse ID13 cells was screened with the human hsp70 gene as a probe, and a positive clone was obtained. The positive clone was subcloned into pUC19, and a precise restriction map was obtained. The cDNA was sequenced by the Sanger dideoxy chain-termination method. A single open reading frame coding for a protein of 70 kDa was found. The DNA sequence of the cloned mouse cDNA shows high homology (66-90%) with other mouse hsp70 genes and somewhat lower homology (50%) with the E. coli hsp70 gene (dnaK). With the exception of one amino acid, the protein sequence deduced from the cDNA is identical to that of the mouse heat shock cognate protein 70 (hsc70), which is constitutively expressed at normal temperature. This result suggests that the cloned cDNA encodes a member of the hsc70 family rather than of the heat-inducible family.
