• Title/Summary/Keyword: Protein sequence search

Prediction of Protein Secondary Structure Using the Weighted Combination of Homology Information of Protein Sequences (단백질 서열의 상동 관계를 가중 조합한 단백질 이차 구조 예측)

  • Chi, Sang-mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.9 / pp.1816-1821 / 2016
  • Protein secondary structure is important for studying the evolution, structure, and function of proteins, which play crucial roles in most biological processes. This paper attempts to extract protein secondary structure information effectively from a large protein structure database in order to predict the secondary structure of a query protein sequence. To find more remote homologs of the query sequence in the protein database, we used PSI-BLAST, which performs gapped iterative searches and uses profiles built from homologous protein sequences of the query. The secondary structures of the homologous sequences are combined into the prediction, each weighted by its relative degree of similarity to the query sequence. When the homologous sequences were used together with a neural network predictor, the accuracies were higher than those of current state-of-the-art techniques, achieving a Q3 accuracy of 92.28% and a Q8 accuracy of 88.79%.
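
The weighted combination described in this abstract can be illustrated with a minimal sketch. It assumes each homolog has already been aligned to the query and assigned a similarity weight; the function names, the three-state alphabet (H, E, C), and the coil fallback are illustrative assumptions, not the paper's implementation, and the neural network predictor is omitted.

```python
from collections import defaultdict

def combine_homolog_structures(query_len, homologs):
    """Weighted per-position vote over homolog secondary structures.

    homologs: list of (weight, aligned_ss) pairs, where aligned_ss has
    length query_len over 'H', 'E', 'C', with '-' where the homolog has
    no residue aligned to the query (hypothetical input format).
    """
    prediction = []
    for i in range(query_len):
        votes = defaultdict(float)
        for weight, aligned_ss in homologs:
            state = aligned_ss[i]
            if state != "-":
                votes[state] += weight
        # Assumption: fall back to coil when no homolog covers the position.
        prediction.append(max(votes, key=votes.get) if votes else "C")
    return "".join(prediction)

def q3_accuracy(predicted, observed):
    """Fraction of positions whose three-state label matches."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Toy data: weights stand in for similarity to the query.
homologs = [
    (0.9, "HHHH-CCEE"),
    (0.5, "HHCCCCEEE"),
    (0.2, "CCCCCCCEE"),
]
predicted = combine_homolog_structures(9, homologs)
accuracy = q3_accuracy(predicted, "HHHHCCEEE")
```

In this toy input the most similar homolog dominates wherever it covers the query, and the lower-weight homologs decide the position it leaves uncovered.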

Molecular Identification and Expression of Myosin Light Chain in Shortspine Spurdog (Squalus mitsukurii)

  • Kim, Soo Cheol;Sumi, Kanij Rukshana;Sharker, Md Rajib;Kho, Kang Hee
    • Journal of Marine Life Science / v.3 no.1 / pp.1-8 / 2018
  • Myosin is a vital motor protein in vertebrates and invertebrates. The present study was conducted to characterize myosin in the dogfish (Squalus mitsukurii). We isolated one clone containing a 979 bp cDNA sequence, which consisted of a complete coding sequence of 453 bp and a deduced sequence of 150 amino acids from the open reading frame; the molecular weight, isoelectric point, and aliphatic index were 16.72 kDa, 4.49, and 78.00, respectively. It contained a 428 bp 3' UTR with a single potential polyadenylation signal (AATAAA). Predicted EF-hand Ca2+-binding domains were identified at residues 6-41, 83-118, and 133-150. A BLAST search indicated that this protein is strongly similar to whale shark (Rhincodon typus) MLC3 (91% identical) and to house mouse (Mus musculus) MLC isoform 3f (81% identical). Phylogenetic analysis revealed that this protein is an MLC3 isoform-like protein. The protein also shares highly conserved regions with other myosin proteins. Homology modeling of the S. mitsukurii protein was performed using the crystal structure of Gallus gallus skeletal muscle myosin II, based on high similarity. Reverse transcription-polymerase chain reaction (RT-PCR) and quantitative PCR results showed that the dogfish myosin protein is highly expressed in muscle tissue.

An Approach for a Substitution Matrix Based on Protein Blocks and Physicochemical Properties of Amino Acids through PCA

  • You, Youngki;Jang, Inhwan;Lee, Kyungro;Kim, Heonjoo;Lee, Kwanhee
    • Interdisciplinary Bio Central / v.6 no.4 / pp.3.1-3.10 / 2014
  • Amino acid substitution matrices are essential tools for protein sequence analysis, homology search in protein databases, and multiple sequence alignment. The PAM matrix was the first widely used amino acid substitution matrix. The BLOSUM series then succeeded the PAM matrix. Most substitution matrices were developed from the statistical frequency of substitution between amino acids in blocks representing groups of protein families or related proteins. However, substitution of amino acids depends on the similarity of the physicochemical properties of each amino acid. In this study, a new approach was used to obtain the major physicochemical properties in multiple sequence alignment. The frequency of amino acid substitution in a multiple sequence alignment database and selected attributes of amino acids from a physicochemical properties database were merged. Principal component analysis of the merged data revealed the major physicochemical properties. Using factor analysis, the resulting four principal components were interpreted as flexibility of electronic movement, polarity, negative charge, and structural flexibility. Applying these four components, BAPS was constructed and validated for accuracy. When comparing receiver operating characteristic ($ROC_{50}$) values, BAPS scored slightly lower than BLOSUM and PAM. However, when accuracy was evaluated by comparing multiple sequence alignment results with the structural alignments of two test data sets of known three-dimensional structure from the homologous structure alignment database, BAPS performed comparably to or better than prior matrices including PAM, Gonnet, Identity, and Genetic code matrices.

A Simple and Fast Web Alignment Tool for Large Amount of Sequence Data

  • Lee, Yong-Seok;Oh, Jeong-Su
    • Genomics & Informatics / v.6 no.3 / pp.157-159 / 2008
  • Multiple sequence alignment (MSA) is the most important step in many biological sequence analyses, homology searches, and protein structural assignments. However, large amounts of data make it difficult for biologists to perform MSA analyses, and aligning many sequences requires considerable computational time. Here, we have developed a simple and fast web alignment tool for aligning, editing, and visualizing large amounts of sequence data. We used a cluster server running ClustalW-MPI, accessed through web services and the message passing interface (MPI). The tool also enables users to edit multiple sequence alignments manually and to download the input data and results such as alignments and phylogenetic trees.

Human Proteome Data Analysis Protocol Obtained via the Bacterial Proteome Analysis

  • Kwon, Kyung-Hoon;Park, Gun-Wook;Kim, Jin-Young;Lee, Jeong-Hwa;Kim, Seung-Il;Yoo, Jong-Shin
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.91-95 / 2005
  • In the multidimensional protein identification technology of high-throughput proteomics, the sample is separated by one-dimensional gel electrophoresis and then by two-dimensional liquid chromatography, and is analyzed by tandem mass spectrometry. In this study, we analyzed the proteins of Pseudomonas putida KT2440. For protein identification, the protein database was combined with its reversed sequence database. Using peptide selections with an error rate below 1%, the SEQUEST database search of the tandem mass spectral data identified 2,045 proteins. For each protein, we compared the molecular weight calibrated from the 1D-gel band position with the theoretical molecular weight computed from the amino acid sequence, by defining a variable MW$_{corr}$. Since the bacterial proteome is simpler than the human proteome in complexity and modifications, the proteome analysis result for Pseudomonas putida KT2440 could suggest a guideline for building a protocol to analyze human proteome data.
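
The reversed-sequence database and the 1% error-rate cut-off mentioned in this abstract can be sketched as follows. The function names and the `REV_` prefix are hypothetical, and the estimator (decoy hits divided by target hits above a score threshold) is one common convention assumed here; the abstract does not state which formula the authors used.

```python
def reversed_decoys(proteins):
    """Build decoy entries by reversing each target sequence."""
    return {f"REV_{name}": seq[::-1] for name, seq in proteins.items()}

def estimate_error_rate(matches, threshold):
    """matches: list of (score, is_decoy) peptide hits.

    Assumed estimator: decoy hits / target hits at or above the threshold.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets

def lowest_passing_threshold(matches, max_rate=0.01):
    """Lowest score cut-off whose estimated error rate is within max_rate."""
    for threshold in sorted({score for score, _ in matches}):
        if estimate_error_rate(matches, threshold) <= max_rate:
            return threshold
    return None

# Toy scores; real input would come from the search engine's output.
matches = [(5.0, False), (4.5, False), (4.0, False),
           (3.0, True), (2.5, False), (2.0, True)]
cutoff = lowest_passing_threshold(matches)
```

Searching the combined target and decoy database lets the decoy hits act as an estimate of how many target hits at the same score are likely to be false.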

Structure-based Functional Discovery of Proteins: Structural Proteomics

  • Jung, Jin-Won;Lee, Weon-Tae
    • BMB Reports / v.37 no.1 / pp.28-34 / 2004
  • The discovery of biochemical and cellular functions of unannotated gene products begins with a database search for proteins with structure/sequence homologues based on known genes. Very recently, a number of frontier groups in structural biology proposed a new paradigm to predict the biological functions of an unknown protein on the basis of its three-dimensional structure on a genomic scale. Structural proteomics (genomics), a research area for structure-based functional discovery, aims to complete the protein-folding universe of all gene products in a cell. It would lead us to a complete understanding of a living organism from protein structure. Two major complementary experimental techniques, X-ray crystallography and NMR spectroscopy, combined with recently developed high-throughput methods, have played a central role in structural proteomics research; however, integrating these methodologies with comparative modeling and electron microscopy would speed progress toward completing a full dictionary of protein folding space in the near future.

Anti-Apoptosis Engineering Using a Gene of Bombyx mori

  • Kim, Eun-Jeong;Park, Tae-Hyeon
    • Proceedings of the Korean Society for Biotechnology and Bioengineering Conference / 2002.04a / pp.62-65 / 2002
  • We have previously shown that the addition of silkworm hemolymph to a culture medium increases the longevity of insect and mammalian cells by inhibiting apoptosis. This indicates that silkworm hemolymph contains a component that inhibits apoptosis. The apoptosis-inhibiting component was isolated from silkworm hemolymph and characterized in our previous study. A database search using the N-terminal amino acid sequence of this component as a template showed 95% homology with a low molecular weight lipoprotein, the so-called '30K protein' of unknown function. In this study, the 30K protein gene was expressed in mammalian and insect cells to confirm the apoptosis-inhibiting effect. Overexpression of the 30K protein in mammalian cells inhibited staurosporine-induced apoptosis by preventing the activation of caspase 3. Using an Autographa californica nuclear polyhedrosis virus (AcNPV) system, the 30K protein was also overexpressed in insect cells. Expression of the 30K protein increased the longevity of baculovirus-infected insect cells by inhibiting apoptosis. These results suggest that the 30K protein is a novel anti-apoptotic protein.

Proteome Data Analysis of Hairy Root of Panax ginseng : Use of Expressed Sequence Tag Data of Ginseng for the Protein Identification (인삼 모상근 프로테옴 데이터 분석 : 인삼 EST database와의 통합 분석에 의한 단백질 동정)

  • Kwon, Kyung-Hoon;Kim, Seung-Il;Kim, Kyung-Wook;Kim, Eun-A;Cho, Kun;Kim, Jin-Young;Kim, Young-Hwan;Yang, Deok-Chun;Hur, Cheol-Goo;Yoo, Jong-Shin;Park, Young-Mok
    • Journal of Plant Biotechnology / v.29 no.3 / pp.161-170 / 2002
  • For the hairy root of Panax ginseng, we obtained mass spectra from MALDI/TOF/MS analysis and tandem mass spectra from ESI/Q-TOF/MS analysis. Whereas a mass spectrum provides the molecular weights of peptide fragments digested by a protease such as trypsin, a tandem mass spectrum yields the amino acid sequences of the digested peptides. Each amino acid sequence can be used as a query in a BLAST search to identify proteins. For specimens of animals or plants whose genome sequences are known, expressed proteins can easily be identified from mass spectra with high accuracy. However, for other specimens such as ginseng, it is difficult to identify proteins accurately because not all protein sequences are available yet. Here we compared the mass spectra and the peptide amino acid sequences with the ginseng expressed sequence tag (EST) DB. The matched EST sequence was used as a query in a BLAST search for protein identification. Sequence alignment with the EST sequences provided the correct protein information. 90% of the peptide sequences from ESI/Q-TOF/MS matched EST sequences. Compared with the 68% of the same sequences that matched the NCBI nr database, the ginseng EST search yielded 22 percentage points more matches. In peptide mass fingerprinting from MALDI/TOF/MS, only about 19% (9 proteins of 47 spots) of the peptide matches from the nr DB were correlated with the ginseng EST DB. From these results, we suggest that amino acid sequencing using tandem mass spectrum analysis may be necessary for protein identification in ginseng proteome analysis.

Identification and Characterization of Bombyx mori LDH Gene through Bioinformatics Approaches

  • Zhu, Minfeng;Chen, Keping;Yao, Qin
    • International Journal of Industrial Entomology and Biomaterials / v.15 no.2 / pp.137-143 / 2007
  • Lactate dehydrogenase (LDH) is a ubiquitous enzyme that plays a significant role in the clinical diagnosis of pathologic processes. Discovery of the LDH (BmLDH) gene in B. mori may shed light on its role in the biology of Lepidoptera species and afford further understanding of the function of the enzyme. In this study, we used bioinformatics tools to identify the LDH gene in B. mori. Sequence analysis showed that the BmLDH cDNA contains a 996 bp open reading frame, encoding a protein of 331 amino acids, and that the gene has seven introns. Compared with hHLDH (human heart LDH), BmLDH contained the same key active sites. Domain search and protein fold recognition analyses provide compelling evidence that the deduced protein is an LDH. Using the computer program MEGA3, we conducted a search for homologs of BmLDH among many eukaryotic species and confirmed that BmLDH was conserved in all organisms investigated. This gene has been registered in GenBank under the accession number EU000385.
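
The relation between the reported 996 bp open reading frame and the 331-residue protein is simple arithmetic, sketched below. The sketch assumes the reported ORF length includes the stop codon, which is consistent with the figures given; the function name is hypothetical.

```python
def protein_length_from_orf(orf_bp, includes_stop=True):
    """Number of encoded amino acids for an ORF of orf_bp nucleotides.

    Assumption: when includes_stop is True, the final codon is a stop
    codon and does not contribute a residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons

bmldh_length = protein_length_from_orf(996)  # 332 codons, one of them a stop
```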

Feature Selection and Classification of Protein CDS Using n-Block substring weighted Linear Model (N-Block substring 가중 선형모형을 이용한 단백질 CDS의 특징 추출 및 분류)

  • Choi, Seong-Yong;Kim, Jin-Su;Han, Seung-Jin;Choi, Jun-Hyeog;Rim, Kee-Wook;Lee, Jung-Hyun
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.5 / pp.730-736 / 2009
  • Analyzing huge genomic data sets is increasingly important in bioinformatics. Here we present a novel data mining approach to predict structure and function using only a protein's primary structure. We propose not only an n-Block substring search algorithm that effectively reduces the enormous search space for feature selection, but also a weighted linear algorithm for predicting the structure and function of a protein from its primary structure. We show that the approach is efficient for protein domain characterization and classification by calculating a weight value that determines the domain association of each selected substring, and that more efficient results are obtained by using the calculated model score to infer the degree of association of each CDS (coding sequence) with a domain.
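
One way to read the idea of scoring a sequence by weighted substrings is sketched below. The abstract does not give the algorithm's details, so this is only an illustration under stated assumptions: blocks are taken as all overlapping length-n substrings, the score is a plain weighted sum of block counts, and the domain names and weights are invented for the example.

```python
from collections import Counter

def n_block_substrings(sequence, n):
    """Count every overlapping length-n substring of the sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))

def linear_score(sequence, weights, n):
    """Weighted sum of block counts; blocks without a weight contribute 0."""
    counts = n_block_substrings(sequence, n)
    return sum(weights.get(block, 0.0) * count
               for block, count in counts.items())

def classify(sequence, domain_weights, n):
    """Return the domain whose weight table gives the highest score."""
    return max(domain_weights,
               key=lambda domain: linear_score(sequence, domain_weights[domain], n))

# Hypothetical weight tables, one per domain.
domain_weights = {
    "domain_A": {"GKG": 2.0, "KGS": 1.0},
    "domain_B": {"CPC": 2.0},
}
score_a = linear_score("MGKGSCPA", domain_weights["domain_A"], 3)
best = classify("MGKGSCPA", domain_weights, 3)
```

Restricting the weight tables to a small set of selected substrings is what keeps the search space manageable in this reading; how those substrings and weights are chosen is left open here.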