• Title/Summary/Keyword: Biological Sequence


Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents / v.3 no.2 / pp.18-24 / 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of hundreds of frequent items or more. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed efficient mining of sequential patterns. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands the sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. In recent years, the MacosVSpan algorithm was proposed, based on the idea of the prefixSpan algorithm, to significantly reduce its recursive process. However, the algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
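
To make the terms in this abstract concrete, the following is a minimal brute-force sketch of what a "maximal frequent contiguous sequence" is. It is not the fixed-length spanning-tree method the paper proposes, nor prefixSpan or MacosVSpan; the function names, the toy sequences, and the definition of support as "number of input sequences containing the substring" are assumptions made for illustration.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support):
    """Map each contiguous substring to its support if support >= min_support.

    Support is assumed here to mean the number of input sequences that
    contain the substring at least once.
    """
    frequent = {}
    length = 1
    while True:
        counts = defaultdict(int)
        for seq in sequences:
            # Collect distinct substrings so each sequence counts only once.
            seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
            for sub in seen:
                counts[sub] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            # Every substring of a frequent substring is itself frequent,
            # so an empty level means no longer frequent substring exists.
            break
        frequent.update(level)
        length += 1
    return frequent


def maximal_frequent_contiguous(sequences, min_support):
    """Return frequent substrings not contained in any longer frequent one."""
    frequent = frequent_contiguous(sequences, min_support)
    maximal = [
        s for s in frequent
        # Checking extensions by one symbol is enough: any longer frequent
        # superstring contains a frequent superstring of length len(s) + 1.
        if not any(len(t) == len(s) + 1 and s in t for t in frequent)
    ]
    return sorted(maximal, key=lambda s: (-len(s), s))
```

Because this sketch grows candidates one symbol at a time starting from length 1, it illustrates the inefficiency the abstract attributes to length-1 expansion on long sequences.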

Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules (BSML 기반 능동 트리거 규칙을 이용한 염기서열정보관리시스템의 구현)

  • Park Sung Hee;Jung Kwang Su;Ryu Keun Ho
    • Journal of KIISE:Databases / v.32 no.1 / pp.24-42 / 2005
  • Biological data, including genome sequences, are heterogeneous and varied. Although the need for genome sequence management systems that reflect biological characteristics has been raised, most current biological databases provide only restricted functions as repositories for biological data. Therefore, this paper describes a management system for nucleotide sequences at the level of biological laboratories. It includes format transformation, editing, storage, and retrieval for nucleotide sequences collected from public databases, and handles sequences produced by experiments. It uses BSML, which is based on XML, as a common format in order to extract data fields and convert heterogeneous sequence formats. To manage sequences and their changes, a version management system for the original DNA is required so as to detect the appearance of new, changed sequences and trigger database updates. Our experimental results show that applying active trigger rules to manage sequence changes can automatically store those changes in the database.
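
The idea of an active trigger rule that records sequence changes automatically can be sketched with an ordinary database trigger. The example below uses SQLite through Python's standard `sqlite3` module as a stand-in; the paper's own system is BSML-based and its rule language is not described in the abstract, so the table names, column names, and helper functions here are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one table of current sequences, one table of changes.
SCHEMA = """
CREATE TABLE sequences (
    accession TEXT PRIMARY KEY,
    residues  TEXT NOT NULL
);
CREATE TABLE sequence_history (
    accession    TEXT NOT NULL,
    old_residues TEXT NOT NULL,
    new_residues TEXT NOT NULL,
    changed_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Event: residues updated. Condition: the value really changed.
-- Action: store the old and new sequence in the history table.
CREATE TRIGGER log_sequence_change
AFTER UPDATE OF residues ON sequences
FOR EACH ROW
WHEN OLD.residues <> NEW.residues
BEGIN
    INSERT INTO sequence_history (accession, old_residues, new_residues)
    VALUES (OLD.accession, OLD.residues, NEW.residues);
END;
"""


def open_sequence_db():
    """Create an in-memory database with the trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def update_sequence(conn, accession, residues):
    """Overwrite a stored sequence; the trigger records any real change."""
    conn.execute(
        "UPDATE sequences SET residues = ? WHERE accession = ?",
        (residues, accession),
    )
```

The application code only issues the update; the event-condition-action rule lives in the database, which is the property the abstract attributes to active trigger rules.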

Cloning and molecular characterization of a new fungal xylanase gene from Sclerotinia sclerotiorum S2

  • Ellouze, Olfa Elleuch;Loukil, Sana;Marzouki, Mohamed Nejib
    • BMB Reports / v.44 no.10 / pp.653-658 / 2011
  • The fungus Sclerotinia sclerotiorum has three endoxylanases induced by wheat bran. In the first part, a partial xylanase gene sequence (90 bp) corresponding to the catalytic domains (${\beta}5$ and ${\beta}6$ strands of this protein) was isolated by PCR. The high homology of this sequence with the xylanase of Botryotinia fuckeliana permitted, in the second part, amplification of the XYN1 gene. Sequence analysis of DNA and cDNA revealed an ORF of 746 bp interrupted by a 65 bp intron, thus encoding a predicted protein of 226 amino acids. The mature enzyme (20.06 kDa) consists of 188 amino acids (pI 9.26). XYN1 belongs to the G/11 glycosyl hydrolase family, with a conserved catalytic domain containing the $E_{86}$ and $E_{178}$ residues. Bioinformatics analysis revealed that there was no Asn-X-Ser/Thr motif required for N-linked glycosylation in the deduced sequence; however, five O-glycosylation sites could be involved in the different folding of the xylanase isoforms and in their secretory pathway.
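
The figures in this abstract can be checked with simple arithmetic. The sketch below assumes that the reported 746 bp includes both the 65 bp intron and the stop codon; the abstract does not say so explicitly, and the function name is illustrative.

```python
def protein_length(orf_bp, intron_bp=0, has_stop_codon=True):
    """Number of amino acids encoded by an ORF of the given length in bp.

    Assumes orf_bp counts any intron and (by default) the stop codon.
    """
    coding_bp = orf_bp - intron_bp
    if coding_bp <= 0 or coding_bp % 3 != 0:
        raise ValueError("coding length must be a positive multiple of 3")
    codons = coding_bp // 3
    # The stop codon occupies one codon but adds no amino acid.
    return codons - 1 if has_stop_codon else codons
```

Under these assumptions, `protein_length(746, intron_bp=65)` gives (746 - 65) / 3 = 227 codons, that is, 226 amino acids, which agrees with the predicted protein size stated above.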

Characterization of Korean Erwinia carotovora Strains from Potato and Chinese Cabbage

  • Seo, Sang-Tae;Koo, Jun-Hak;Hur, Jang-Hyun;Lim, Chun-Keun
    • The Plant Pathology Journal / v.20 no.4 / pp.283-288 / 2004
  • Four Erwinia carotovora strains isolated from potatoes showing blackleg symptoms and from rotted Chinese cabbage were analysed by biochemical tests and sequence analysis of 16S rDNA and 16S-23S rRNA intergenic spacer (IGS) regions, and the data were compared to related E. carotovora strains. Based on the results of the biochemical tests and sequence analysis, 2 of the 4 strains were identified as E. carotovora subsp. carotovora (Ecc), whereas the remaining strains were distinct from Ecc. The last two strains, HCC3 and JEJU, were biochemically similar to E. carotovora subsp. atroseptica (Eca). However, the results of sequence analysis and Eca-specific PCR assays showed that the strains were distinct from Eca. On the basis of 16S rDNA sequence analysis, the HCC3 and JEJU strains were placed in E. carotovora subsp. odorifera and E. carotovora subsp. wasabiae, respectively. The results of sequence analysis and the specific PCR assay for Eca indicated that Asian Eca strains were distinct from European Eca strains, although they were phenotypically homogeneous.

Generation of Protein Lineages with new Sequence Spaces by Functional Salvage Screen

  • Kim, Geun-Joong;Cheon, Young-Hoon;Park, Min-Soon;Park, Hee-Sung;Kim, Hak-Sung
    • Proceedings of the Korean Society for Applied Microbiology Conference / 2001.06a / pp.77-80 / 2001
  • A variety of different methods to generate diverse proteins, including random mutagenesis and recombination, are currently available, and most of them accumulate the mutations on the target gene of a protein, whose sequence space remains unchanged. On the other hand, a pool of diverse genes, which is generated by random insertions, deletions, and exchange of the homologous domains with different lengths in the target gene, would present the protein lineages resulting in new fitness landscapes. Here we report a method to generate a pool of protein variants with different sequence spaces by employing green fluorescent protein (GFP) as a model protein. This process, designated functional salvage screen (FSS), comprises the following procedures: a defective GFP template expressing no fluorescence is firstly constructed by genetically disrupting a predetermined region(s) of the protein, and a library of GFP variants is generated from the defective template by incorporating the randomly fragmented genomic DNA from E. coli into the defined region(s) of the target gene, followed by screening of the functionally salvaged, fluorescence-emitting GFPs. Two approaches, sequence-directed and PCR-coupled methods, were attempted to generate the library of GFP variants with new sequences derived from the genomic segments of E. coli. The functionally salvaged GFPs were selected and analyzed in terms of the sequence space and functional property. The results demonstrate that the functional salvage process not only can be a simple and effective method to create protein lineages with new sequence spaces, but also can be useful in elucidating the involvement of a specific region(s) or domain(s) in the structure and function of protein.


Cloning, Sequencing and Baculovirus-based Expression of Fusion-Glycoprotein D Gene of Herpes Simplex Virus Type 1 (F)

  • Uh, Hong-Sun;Choi, Jin-Hee;Byun, Si-Myung;Kim, Soo-Young;Lee, Hyung-Hoan
    • BMB Reports / v.34 no.4 / pp.371-378 / 2001
  • The glycoprotein D (gD) gene of the HSV-1 strain F was cloned, sequenced, recombined into the HcNPV (Hyphantria cunea nuclear polyhedrosis virus) expression vector, and expressed in insect cells. The gD gene was located in the 6.43 kb BamHI fragment of strain F. The open reading frame (ORF) of the gD gene was 1,185 bp and encodes 394 amino acid residues. Recombinant baculoviruses, GD-HcNPVs, expressing the gD protein were constructed. Spodoptera frugiperda cells infected with the recombinant virus synthesized a mature gX-gD fusion protein with an approximate molecular weight of 54 kDa and secreted the gD proteins into the culture media, as shown by an immunoprecipitation assay. The fusion gD protein was localized on the membrane of the insect cells, as seen by an immunofluorescence assay. The deduced amino acid sequence presents additional characteristics compatible with the structure of a viral glycoprotein: a signal peptide, putative glycosylation sites, and a long C-terminal transmembrane sequence. These results indicate the utility of the HcNPV-insect cell system for producing and characterizing eukaryotic proteins.


Cloning and Sequence Analysis of Two Catechol-degrading Gene Clusters from a Phenol-utilizing Bacterium Pseudomonas putida SM25

  • Jung, Young-Hee;Ka, Jong-Ok;Cheon, Choong-Ill;Lee, Myeong-Sok;Song, Eun-Sook;Cho, Daeho;Park, Sang-Ho;Ha, Kwon-Soo;Park, Young-Mok
    • Journal of Microbiology / v.41 no.2 / pp.102-108 / 2003
  • A 6.1 kb Sph I fragment from the genomic DNA of Pseudomonas putida SM25 was cloned into the vector pUC19. The open reading frame of catB was found to consist of 1,122 nucleotides. The sequence alignment of the catB gene products from different kinds of bacteria revealed an overall identity ranging from 40 to 98%. The catC gene contained an open reading frame of 96 codons, from which a protein with a molecular mass of about 10.6 kDa was predicted. The amino acids in the proposed active-site region of CatC were found to be almost entirely conserved, including the charged residues. Since the catBC genes in P. putida SM25 were tightly linked, they could be regulated under coordinate transcription and transcribed from a single promoter located upstream of the catB gene, as in P. putida RBI.
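
As a rough plausibility check on the predicted CatC mass, the sketch below applies the common rule of thumb of about 110 Da per amino acid residue. This is an approximation rather than the method used in the paper, which would normally compute the mass from the actual residue composition; the function name and default value are assumptions.

```python
def estimate_mass_kda(n_residues, avg_residue_da=110.0):
    """Approximate protein mass in kDa from its residue count.

    avg_residue_da is a rule-of-thumb average, not a measured value.
    """
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues * avg_residue_da / 1000.0
```

Treating all 96 codons as residues gives 96 x 110 Da = 10.56 kDa, close to the reported value of about 10.6 kDa. If the 96 codons include the stop codon, the estimate drops slightly to 10.45 kDa.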

Identification and Phylogenetic Analysis of SINE-R Retroposon Family in cDNA Library of Human Fetal Brain

  • Yi, Joo-Mi;Shin, Kyung-Mi;Lee, Ji-Won;Paik, In-Ho;Jang, Kyung-Lib;Kim, Heui-Soo
    • Animal cells and systems / v.5 no.3 / pp.231-236 / 2001
  • SINE-R retroposons have been derived from the human endogenous retrovirus HERV-K family and found to be hominoid specific. Both SINE-R retroposons and the HERV-K family are potentially capable of affecting the expression of closely located genes. From a cDNA library of human fetal brain, we identified seven SINE-R retroposons and compared them with sequences derived from the GenBank database. The SINE-R retroposons from human fetal brain showed 85-97% sequence similarity with the human-specific retroposon SINE-R.C2. They also showed 88-96% sequence similarity with the sequence of the schizo-cDNA clone derived from postmortem frontal cortex tissue of a schizophrenic patient. Phylogenetic analysis using the neighbor-joining method revealed that the seven new SINE-R retroposons from the cDNA library of the human fetal brain have proliferated independently during human evolution. The data indicate that such SINE-R retroposons are expressed in human fetal brain and deserve further investigation as potential leads to the understanding of neuropsychiatric diseases.
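
Percentage figures such as those above are typically derived from a pairwise alignment. The sketch below computes percent identity for two sequences that are already aligned to the same length. The abstract does not state which similarity measure or gap convention the authors used, so skipping gap columns is an assumption, as are the function name and the example strings.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped (one of several
    conventions in use; others count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

For example, two aligned 10-base strings that differ at one position, such as `"ACGTACGTAC"` and `"ACGTACGTAA"`, score 90% under this convention.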


Characterization of an Extracellular Lipase in Burkholderia sp. HY-10 Isolated from a Longicorn Beetle

  • Park, Doo-Sang;Oh, Hyun-Woo;Heo, Sun-Yeon;Jeong, Won-Jin;Shin, Dong-Ha;Bae, Kyung-Sook;Park, Ho-Yong
    • Journal of Microbiology / v.45 no.5 / pp.409-417 / 2007
  • Burkholderia sp. HY-10, isolated from the digestive tract of the longicorn beetle Prionus insularis, produced an extracellular lipase with a molecular weight of 33.5 kDa as estimated by SDS-PAGE. The lipase was purified from the culture supernatant to near electrophoretic homogeneity by a one-step adsorption-desorption procedure using a polypropylene matrix, followed by a concentration step. The purified lipase exhibited its highest activity at pH 8.5 and $60^{\circ}C$. A broad range of lipase substrates, from $C_4$ to $C_{18}$ p-nitrophenyl esters, were hydrolyzed efficiently by the lipase. The most efficient substrate was p-nitrophenyl caproate ($C_6$). A 2485 bp DNA fragment, isolated by PCR amplification and chromosomal walking, encoded two polypeptides of 364 and 346 amino acids, identified as a lipase and a lipase foldase, respectively. The N-terminal amino acid sequence of the purified lipase and nucleotide sequence analysis predicted that the precursor lipase was proteolytically modified during the secretion step to produce a catalytically active 33.5 kDa protein. The deduced amino acid sequence of the lipase shared extensive similarity with those of lipase family 1.2 lipases from other bacteria. The deduced amino acid sequence contained two cysteine residues forming a disulfide bond in the molecule and three well-conserved amino acid residues, $Ser^{131}$, $His^{330}$, and $Asp^{308}$, which compose the catalytic triad of the enzyme.

Basic Concept of Gene Microarray (Gene Microarray의 기본개념)

  • Hwang, Seung Yong
    • Korean Journal of Biological Psychiatry / v.8 no.2 / pp.203-207 / 2001
  • The genome sequencing project has generated and will continue to generate enormous amounts of sequence data, including 5 eukaryotic and about 60 prokaryotic genomes. Given these ever-increasing amounts of sequence information, new strategies are necessary to efficiently pursue the next phase of the genome project: the elucidation of gene expression patterns and gene product function on a whole-genome scale. In order to assign functional information to the genome sequence, DNA chip (or gene microarray) technology was developed to efficiently identify the differential expression patterns of independent biological samples. The DNA chip provides a new tool for genome expression analysis that may revolutionize many aspects of biotechnology, including new drug discovery and disease diagnostics.
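
Identifying differential expression between two samples is often summarized as a log2 ratio of signal intensities per gene. The sketch below is one simple way to express that idea and is not taken from the article; the pseudocount, the threshold of a two-fold change, and the function and gene names are illustrative assumptions.

```python
import math


def log2_fold_changes(sample, reference, pseudocount=1.0):
    """log2(sample / reference) for every gene present in both inputs.

    The pseudocount keeps the ratio finite for zero intensities.
    """
    shared = sample.keys() & reference.keys()
    return {
        gene: math.log2(
            (sample[gene] + pseudocount) / (reference[gene] + pseudocount)
        )
        for gene in shared
    }


def differentially_expressed(sample, reference, threshold=1.0):
    """Genes whose absolute log2 fold change meets the threshold, sorted."""
    changes = log2_fold_changes(sample, reference)
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)
```

A threshold of 1.0 on the log2 scale corresponds to at least a two-fold increase or decrease. Real microarray analyses usually add normalization and statistical testing across replicates, which this sketch omits.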
