• Title/Summary/Keyword: sequence homology


Protein Sequence Search based on N-gram Indexing

  • Hwang, Mi-Nyeong;Kim, Jin-Suk
    • Bioinformatics and Biosystems / v.1 no.1 / pp.46-50 / 2006
  • With advances in experimental techniques in molecular biology, genomic and protein sequence databases are growing exponentially in size, and mean sequence lengths are also increasing. As these databases grow, it becomes difficult to search them for sequences with significant homology to a query sequence. In this paper, we present an N-gram indexing method for retrieving similar sequences quickly and precisely. The method treats a protein sequence as text written in a language of 20 amino acid codes and adopts fixed-length N-gram tokens as the indexing scheme for sequence strings. Once such tokens have been indexed for all the sequences in the database, sequences can be searched with information retrieval algorithms. Using this method, we have developed a protein sequence search system named ProSeS (PROtein Sequence Search). ProSeS is a protein sequence analysis system that provides overall analysis results such as similar sequences with significant homology, predicted subcellular locations of the query sequence, and major keywords extracted from the annotations of similar sequences. We show experimentally that the N-gram indexing approach significantly reduces retrieval time and that it is as accurate as the popular search tool BLAST.
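
The indexing idea in this abstract can be illustrated with a small inverted index. This is only a minimal sketch, not the ProSeS implementation: the token length `n=3`, the function names, the shared-token ranking, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def ngrams(sequence, n=3):
    """Return the overlapping fixed-length N-gram tokens of a sequence."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_index(database, n=3):
    """Map each N-gram token to the set of sequence IDs that contain it."""
    index = defaultdict(set)
    for seq_id, sequence in database.items():
        for token in ngrams(sequence, n):
            index[token].add(seq_id)
    return index


def search(index, query, n=3):
    """Rank sequences by how many distinct query tokens they share."""
    hits = Counter()
    for token in set(ngrams(query, n)):
        for seq_id in index.get(token, ()):
            hits[seq_id] += 1
    return hits.most_common()


# Hypothetical toy database; these sequences are made up for illustration.
database = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYLAKQR",
    "seqC": "GGGSSGGGSS",
}
index = build_index(database)
results = search(index, "MKTAYIAKQR")
print(results)
```

Because lookup only touches the posting sets of the query's tokens, unrelated sequences such as `seqC` are never scored. A real system would presumably weight tokens and verify candidates with an alignment step, which this sketch leaves out.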


A New Approach to Find Orthologous Proteins Using Sequence and Protein-Protein Interaction Similarity

  • Kim, Min-Kyung;Seol, Young-Joo;Park, Hyun-Seok;Jang, Seung-Hwan;Shin, Hang-Cheol;Cho, Kwang-Hwi
    • Genomics & Informatics / v.7 no.3 / pp.141-147 / 2009
  • Existing proteome-scale ortholog and paralog prediction methods are based mainly on sequence similarity. However, it is known that the closest BLAST hit is often not the closest neighbor. For this reason, we added conserved interaction information to the search for orthologs. We propose a genome-scale, automated ortholog prediction method named OrthoInterBlast, which is based on both sequence and interaction similarity. When we applied this method to fly and yeast, 17% of the ortholog candidates differed from the results of Inparanoid. By adding protein-protein interaction information, proteins with low sequence similarity, which cannot be easily detected by sequence homology alone, can still be selected as orthologs.

Molecular Cloning of the nahC Gene Encoding 1,2-Dihydroxynaphthalene Dioxygenase from Pseudomonas fluorescens

  • KIM, YEO-JUNG;NA-RI LEE;SOON-YOUNG CHOI;KYUNG-HEE MIN
    • Journal of Microbiology and Biotechnology / v.12 no.1 / pp.172-175 / 2002
  • The complete nucleotide sequence of the nahC gene from Pseudomonas fluorescens, the structural gene for 1,2-dihydroxynaphthalene (1,2-DHN) dioxygenase, was determined. The 1,2-DHN dioxygenase is an extradiol ring-cleavage enzyme that cleaves the first ring of 1,2-dihydroxynaphthalene. The amino acid sequence of the dioxygenase deduced from the nucleotide sequence suggested that the holoenzyme consists of eight identical subunits with a molecular weight of approximately 34,200. The amino acid sequence of 1,2-DHN dioxygenase showed more than 90% homology with those of the dioxygenases of other Pseudomonas strains. However, sequence similarity with those of the Sphingomonas species was less than 60%. The nahC gene of P. fluorescens was moderately expressed in E. coli NM522, as determined by enzymatic activity.

Analysis of Small-Subunit rDNA Sequences Obtained from Korean Peridinium bipes f. occultatum (Dinophyceae)

  • Ki, Jang-Seu;Cho, Soo-Yeon;Han, Myung-Soo
    • ALGAE / v.20 no.1 / pp.25-30 / 2005
  • To resolve confusion over the identification of Korean Peridinium species, genotypic analysis was performed using their SSU rDNA sequences. PCR was used to amplify partial SSU rDNA from Peridinium isolates collected from three Korean water bodies (Juam, Sang-sa and Togyo Reservoirs). The PCR products were sequenced directly, yielding 942 bp of rDNA sequence from each isolate. Analyses of the rDNA sequences showed that all the Korean isolates had the same genotype (100% sequence homology) and were nearly identical to a Japanese strain of P. bipes f. occultatum (NIES 364; 99.8% sequence similarity). The sequence-based comparisons clearly resolved the identity of P. bipes f. occultatum isolated from the three Korean water bodies.
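
Percentages such as those quoted in this abstract are typically computed as the share of identical positions in an alignment. The sketch below assumes two pre-aligned sequences of equal length with no gaps; the function name and the synthetic 942-character sequences are hypothetical. The abstract does not state how many positions actually differed.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Synthetic 942-character reference (not real rDNA) and a copy with
# substitutions at positions 0 and 500.
reference = "ACGT" * 235 + "AC"
variant = "T" + reference[1:500] + "G" + reference[501:]

print(len(reference))
print(round(percent_identity(reference, variant), 1))
```

With this synthetic pair, two mismatches over 942 positions round to 99.8%, which shows the scale of difference such a figure represents.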

Nucleotide Sequence Analysis of Movement Protein Gene from Tobacco Mosaic Virus Korean Pepper (TMV-KP) Strain

  • 이재열;정동수;장무웅;최장경
    • Korean Journal Plant Pathology / v.11 no.1 / pp.87-90 / 1995
  • Complementary DNA of the movement protein (MP) gene of the tobacco mosaic virus Korean pepper strain (TMV-KP) was synthesized from purified TMV-KP RNA by reverse transcription and polymerase chain reaction (PCR). The synthesized double-stranded cDNA was cloned into the plasmid pUC9 and transformed into Escherichia coli JM110. The MP gene of TMV-KP in the selected clones was sequenced by Sanger's dideoxy chain termination method. The complete sequence of the viral MP gene from the TMV-KP strain was 807 nucleotides long. The MP gene of TMV-KP differs by thirteen and two nucleotides from those of the TMV vulgare (TMV-OM) and Korean (TMV-K) strains, respectively. Thus, the nucleotide sequence of the TMV-KP MP gene showed the higher homology, 99%, with that of the TMV-K MP gene.
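
As a rough check of these figures, assume the MP genes are compared over the same 807 positions with substitutions only (the abstract does not say whether gaps were involved). Percent identity for $d$ nucleotide differences is then

$$\text{identity} = \frac{807 - d}{807} \times 100\%$$

With $d = 2$ (TMV-K) this gives $805/807 \approx 99.75\%$, and with $d = 13$ (TMV-OM) it gives $794/807 \approx 98.39\%$, so the TMV-K comparison is the closer of the two under this assumption.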


A Simple and Fast Web Alignment Tool for Large Amount of Sequence Data

  • Lee, Yong-Seok;Oh, Jeong-Su
    • Genomics & Informatics / v.6 no.3 / pp.157-159 / 2008
  • Multiple sequence alignment (MSA) is the most important step in many biological sequence analyses, homology searches, and protein structural assignments. However, large amounts of data make it difficult for biologists to perform MSA analyses, and aligning many sequences requires considerable computational time. Here, we have developed a simple and fast web alignment tool for aligning, editing, and visualizing large amounts of sequence data. We used a cluster server with ClustalW-MPI installed, accessed through web services and the message passing interface (MPI). The tool also enables users to edit multiple sequence alignments manually and to download the input data and results, such as alignments and phylogenetic trees.

Physiological and Molecular Characterization of NAD(P)H-Nitroreductase from Stenotrophomonas sp. OK-5

  • Ho Eun-Mi;Kahng Hyung-Yeel;Oh Kye-Heon
    • Korean Journal of Microbiology / v.40 no.3 / pp.183-188 / 2004
  • Stenotrophomonas sp. OK-5, which is capable of degrading TNT, was found in a previous study to have three nitroreductase (NTR) fractions, designated NTR fractions I, II, and III. This study was undertaken to reveal the physiological and molecular characteristics of NTR fractions I, II, and III in strain OK-5. Several chemicals (e.g., EDTA, NaCl, dithiothreitol, $\beta$-mercaptoethanol) were tested for their effect on the enzyme activity of the NTRs, demonstrating that the activities of NTR fractions I, II, and III from OK-5 were inhibited in the presence of $\beta$-mercaptoethanol. Substrate specificity tests showed that NTR fractions I, II, and III all had over 70% enzyme activity with nitrobenzene or RDX as a substrate. The N-terminal amino acid sequence of NTR fraction I from Stenotrophomonas sp. OK-5 was $^{1}$MSDLLNADAVVQLFRTARDS$^{20}$ and exhibited 70% sequence homology with that of the NTR from Xanthomonas campestris. The NTR I gene from Stenotrophomonas sp. OK-5 (SmOK5nrI) shared extensive homology, in the deduced amino acid sequence of the PCR product, with NTRs from Xanthomonas campestris (81%), X. axonopodis (75%), and Streptomyces avermitilis (30%), whereas it had low homology with that from P. putida KT2440 (pnrB) (16%).

Nucleotide Sequence Determination of the Bacillus stearothermophilus Acetylxylan Esterase Gene (estI)

  • 이정숙;최용진
    • Microbiology and Biotechnology Letters / v.25 no.1 / pp.23-29 / 1997
  • The nucleotide sequence of the estI gene encoding acetylxylan esterase I of Bacillus stearothermophilus was determined and analyzed. The estI gene was found to consist of an 810 base pair open reading frame coding for a polypeptide of 270 amino acids with a deduced molecular weight of 30 kDa. This was in good agreement with the molecular weight (29 kDa) estimated by SDS-PAGE of the purified esterase. The coding sequence was preceded by a putative ribosome binding site 10 bp upstream of the ATG codon. A further 53 bp upstream, transcription initiation signals were identified. The putative -10 sequence (TCCAAT) and -35 sequence (TTGAAT) corresponded closely to the respective consensus sequences for the Bacillus subtilis major RNA polymerase. The G+C content of the coding region of estI was 51%, whereas that of the third position of the codons was 60.2%. The N-terminal amino acid sequence of EstI deduced from the nucleotide sequence perfectly matched the corresponding region of the purified esterase described previously. Comparison with the amino acid sequences of other esterases and lipases reported so far allowed us to identify a sequence, GLSMG, at positions 123 to 127 of EstI, which has been reported to be the highly conserved active site sequence for those enzymes. The nucleotide sequence of estI showed 55.7% homology to that of xylC, which codes for the acetylxylan esterase of Caldocellum saccharolyticum.
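
The two G+C figures in this abstract, for the whole coding region and for third codon positions, can be computed from a coding sequence as sketched below. The function names and the 12-nucleotide example are hypothetical; the example is not the estI sequence, which is not reproduced here.

```python
def gc_content(dna):
    """Percentage of G and C bases in a DNA string."""
    dna = dna.upper()
    if not dna:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


def gc3_content(coding_sequence):
    """G+C percentage at the third position of each complete codon."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    third_positions = coding_sequence[2:usable:3]
    return gc_content(third_positions)


# Made-up reading frame with four codons: ATG GCC AAG TAA
orf = "ATGGCCAAGTAA"

print(len(orf) // 3)            # number of complete codons
print(round(gc_content(orf), 1))
print(gc3_content(orf))
```

Slicing with a step of 3 starting at index 2 picks out the third base of every codon, so the sketch assumes the input starts in frame at the first codon.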


Characterizations of Tobacco Mosaic Virus Isolated from Chinese Foxglove (Rehmannia glutinosa Libosch)

  • 박준식;최민경;유강열;이귀재
    • Korean Journal of Plant Resources / v.16 no.3 / pp.230-237 / 2003
  • This study was conducted to investigate the occurrence and characteristics of tobacco mosaic virus (TMV) in Chinese foxglove collected from fields in Chonbuk province (Jinan, Jangsu, Jeongeup). TMV was detected in all three regions, and positive reactions were confirmed by ELISA. In the host range test, Chenopodium amaranticolor, Nicotiana glutinosa, N. tabacum cv. 'Bright yellow', N. tabacum cv. 'KY-57', and Datura stramonium were locally infected with the virus. The virus produced mosaic symptoms on inoculated leaves of N. tabacum cv. 'Samson'. However, Chenopodium quinoa, Glycine max, Raphanus sativus, Cucumis sativus, Cucurbita moschata, Brassica rapa and Lycopersicon esculentum did not show any symptoms. Transmission electron microscopy (TEM) revealed the TMV particles as stiff rods measuring 300 nm in length and 18 nm in diameter. Total RNA was extracted from symptomatic leaves infected with TMV, and reverse transcription-polymerase chain reaction (RT-PCR) with specific primers yielded a 531 bp DNA product. The capsid protein of TMV-RE showed higher amino acid sequence homology with TMV-To (97.7%) than with TMV-P (72.2%). The capsid protein of TMV-152 showed the same amino acid sequence homology with TMV-F. A comparison of nucleotide sequence homology between the TMV-RE strain and other TMV strains showed 94% homology with all except TMV-P (67.3%) and TMV-C (68.6%).

Construction of a full-length cDNA library from Pinus koraiensis and analysis of EST dataset

  • Kim, Joon-Ki;Im, Su-Bin;Choi, Sun-Hee;Lee, Jong-Suk;Roh, Mark S.;Lim, Yong-Pyo
    • Korean Journal of Agricultural Science / v.38 no.1 / pp.11-16 / 2011
  • In this study, we report the generation and analysis of a total of 1,211 expressed sequence tags (ESTs) from Pinus koraiensis. A cDNA library was generated from young leaf tissue, and a total of 1,211 cDNAs were partially sequenced. EST and unigene sequence quality was determined by computational filtering, manual review, and BLAST analyses. In all, 857 ESTs were acquired after removal of the vector sequence and filtering for a minimum length of 50 nucleotides. A total of 411 unigenes, consisting of 89 contigs and 322 singletons, were identified after assembly. We also identified 77 new microsatellite-containing sequences among the unigenes and classified them according to their repeat unit. According to a homology search with BLASTX against the NCBI database, 63.1% of the ESTs were homologous to sequences of known function and 22.2% matched sequences of putative or unknown function. The remaining 14.6% of the ESTs showed no significant similarity to any protein sequence in the public database. Gene ontology (GO) classification showed that the most abundant GO terms were transport, nucleotide binding, and plastid in terms of biological process, molecular function, and cellular component, respectively. The sequence data will be used to characterize the potential roles of new genes in Pinus and will provide a useful genetic resource.
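
Microsatellite-containing sequences such as those mentioned in this abstract are commonly found by scanning for short tandem repeats. The sketch below is one simple way to do that with a regular expression and is not the procedure used in the paper: the thresholds (repeat unit of 2 to 6 nucleotides, at least 4 copies), the function name, and the example sequence are assumptions.

```python
import re

# Assumed thresholds: a 2-6 nt unit followed by at least 3 further copies.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_microsatellites(dna):
    """Return (start, repeat_unit, copy_number) for each tandem repeat found."""
    results = []
    for match in SSR_PATTERN.finditer(dna.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        results.append((match.start(), unit, copies))
    return results


# Made-up sequence containing an (AC)5 and an (AGC)4 repeat.
sequence = "TTG" + "AC" * 5 + "GGCTT" + "AGC" * 4 + "TC"
found = find_microsatellites(sequence)
print(found)
```

Grouping the hits by the length of `repeat_unit` gives a classification into di-, tri-, and longer repeats. Note that this simple pattern would also report a long single-base run as a two-base unit, so a real pipeline would need to handle mononucleotide repeats separately.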