• Title/Abstract/Keywords: Biological sequence

Search results: 1,437 items (processing time: 0.024 s)

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents
    • /
    • Vol. 3, No. 2
    • /
    • pp.18-24
    • /
    • 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items, and their frequent contiguous sequences often span hundreds of items or more. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed efficient sequential pattern mining, and most existing methods are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, because it grows sequential patterns starting from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of prefixSpan to significantly reduce its recursive process. However, that algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
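
To make the term "maximal frequent contiguous sequence" concrete, here is a minimal Python sketch. It is a brute-force, level-wise illustration and not the fixed-length spanning-tree method of the paper or MacosVSpan. The function names, the `seed_len` parameter, and the definition of support as "number of input sequences containing the substring" are assumptions made for this example.

```python
from collections import defaultdict


def frequent_contiguous(seqs, min_support, seed_len=3):
    """Return {substring: support} for every contiguous substring of
    length >= seed_len that occurs in at least min_support sequences.

    Hypothetical helper for illustration; support counts sequences,
    not occurrences.
    """
    frequent = {}
    length = seed_len
    while True:
        containing = defaultdict(set)
        for idx, s in enumerate(seqs):
            for i in range(len(s) - length + 1):
                sub = s[i:i + length]
                # Pruning: a substring can only be frequent if its
                # prefix (one item shorter) was frequent at the last level.
                if length > seed_len and sub[:-1] not in frequent:
                    continue
                containing[sub].add(idx)
        level = {sub: len(ids) for sub, ids in containing.items()
                 if len(ids) >= min_support}
        if not level:
            break
        frequent.update(level)
        length += 1
    return frequent


def maximal_only(frequent):
    """Keep only substrings not contained in a longer frequent substring."""
    return {s: c for s, c in frequent.items()
            if not any(s != t and s in t for t in frequent)}


if __name__ == "__main__":
    toy = ["ACGTACGT", "TTACGTAA", "GGACGTAC"]
    print(maximal_only(frequent_contiguous(toy, min_support=3)))
```

On the toy input, every frequent substring such as `ACG` or `CGTA` is absorbed by the longer `ACGTA`, which is why only maximal patterns are reported. The level-wise loop also shows the cost the abstract points to: a pattern of length n needs about n passes when growth starts from short seeds.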

Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules

  • 박성희;정광수;류근호
    • 한국정보과학회논문지:데이타베이스 (Journal of KISS: Databases)
    • /
    • Vol. 32, No. 1
    • /
    • pp.24-42
    • /
    • 2005
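  • Biological information, including genome sequences, changes continuously and is heterogeneous and diverse. A management system that reflects these characteristics is needed, yet most existing biological databases serve only as repositories for biological data. This paper therefore presents a sequence information management system that performs file format conversion, editing, storage, and retrieval of nucleotide sequence data produced by sequencing experiments at the biology laboratory level or collected from various public databases. XML-based BSML is used as the common format for converting files between heterogeneous sequence formats. For sequence storage management, sequence versions are defined to store changes in the sequence composition of the same DNA fragment, and active trigger rules are used to detect and generate the change information. We show that the trigger facility allows sequence change information to be stored and managed automatically in the database, and we evaluate its performance.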
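
The idea of recording sequence versions through a trigger can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the DBMS or the schema, so SQLite (through Python's standard `sqlite3` module) and the table, column, and trigger names below are all hypothetical.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a version-tracking trigger.

    The schema is an assumption for illustration, not the paper's design.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sequence (
            id       INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            residues TEXT NOT NULL,
            version  INTEGER NOT NULL DEFAULT 1
        );
        CREATE TABLE sequence_history (
            seq_id   INTEGER NOT NULL,
            version  INTEGER NOT NULL,
            residues TEXT NOT NULL
        );
        -- Fires only when the residues of a fragment actually change:
        -- archive the old version, then bump the version number.
        CREATE TRIGGER trg_sequence_version
        AFTER UPDATE OF residues ON sequence
        WHEN OLD.residues <> NEW.residues
        BEGIN
            INSERT INTO sequence_history (seq_id, version, residues)
            VALUES (OLD.id, OLD.version, OLD.residues);
            UPDATE sequence SET version = OLD.version + 1 WHERE id = OLD.id;
        END;
    """)
    return conn


def demo():
    """Insert one fragment, edit it once, return (current version, history)."""
    conn = build_db()
    conn.execute("INSERT INTO sequence (name, residues) VALUES (?, ?)",
                 ("frag1", "ACGT"))
    conn.execute("UPDATE sequence SET residues = ? WHERE name = ?",
                 ("ACGA", "frag1"))
    version = conn.execute("SELECT version FROM sequence").fetchone()[0]
    history = conn.execute(
        "SELECT seq_id, version, residues FROM sequence_history").fetchall()
    conn.close()
    return version, history


if __name__ == "__main__":
    print(demo())
```

The application code issues a plain `UPDATE`; the version bookkeeping happens inside the database, which is the property the abstract attributes to active trigger rules.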

Cloning and molecular characterization of a new fungal xylanase gene from Sclerotinia sclerotiorum S2

  • Ellouze, Olfa Elleuch;Loukil, Sana;Marzouki, Mohamed Nejib
    • BMB Reports
    • /
    • Vol. 44, No. 10
    • /
    • pp.653-658
    • /
    • 2011
  • The fungus Sclerotinia sclerotiorum has three endoxylanases induced by wheat bran. In the first part, a partial xylanase gene sequence (90 bp) corresponding to the catalytic domain (${\beta}5$ and ${\beta}6$ strands of the protein) was isolated by PCR. In the second part, the high homology of this sequence with the xylanase of Botryotinia fuckeliana made it possible to amplify the XYN1 gene. Sequence analysis of DNA and cDNA revealed an ORF of 746 bp interrupted by a 65 bp intron, thus encoding a predicted protein of 226 amino acids. The mature enzyme (20.06 kDa) consists of 188 amino acids (pI 9.26). XYN1 belongs to the G/11 glycosyl hydrolase family, with a conserved catalytic domain containing the $E_{86}$ and $E_{178}$ residues. Bioinformatics analysis revealed no Asn-X-Ser/Thr motif required for N-linked glycosylation in the deduced sequence; however, five O-glycosylation sites could be involved in the different folding of xylanase isoforms and in their secretory pathway.
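
The ORF figures in this abstract can be checked with a few lines of Python. The sketch assumes that the reported 746 bp includes both the intron and the stop codon; the abstract does not say so explicitly, and the function name is hypothetical.

```python
def predicted_protein_length(orf_bp, intron_bp=0):
    """Amino acids encoded by an ORF, assuming orf_bp includes the
    introns and a terminal stop codon (an assumption, see lead-in)."""
    coding_bp = orf_bp - intron_bp
    if coding_bp % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    codons = coding_bp // 3
    return codons - 1  # the final codon is the stop codon


if __name__ == "__main__":
    # 746 bp ORF with a 65 bp intron, as reported in the abstract
    print(predicted_protein_length(746, 65))
```

Under that assumption, 746 - 65 = 681 coding bases give 227 codons, that is, 226 amino acids plus a stop codon, which agrees with the predicted protein length stated in the abstract.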

Characterization of Korean Erwinia carotovora Strains from Potato and Chinese Cabbage

  • Seo, Sang-Tae;Koo, Jun-Hak;Hur, Jang-Hyun;Lim, Chun-Keun
    • The Plant Pathology Journal
    • /
    • Vol. 20, No. 4
    • /
    • pp.283-288
    • /
    • 2004
  • Four Erwinia carotovora strains isolated from potatoes showing blackleg symptoms and from rotted Chinese cabbage were analysed by biochemical tests and sequence analysis of 16S rDNA and 16S-23S rRNA intergenic spacer (IGS) regions, and the data were compared to related E. carotovora strains. Based on the results of the biochemical tests and sequence analysis, 2 of the 4 strains were identified as E. carotovora subsp. carotovora (Ecc), whereas the remaining strains were distinct from Ecc. These two strains, HCC3 and JEJU, were biochemically similar to E. carotovora subsp. atroseptica (Eca). However, the results of sequence analysis and Eca-specific PCR assays showed that the strains were distinct from Eca. On the basis of 16S rDNA sequence analysis, the HCC3 and JEJU strains were placed in E. carotovora subsp. odorifera and E. carotovora subsp. wasabiae, respectively. The results of sequence analysis and the specific PCR assay for Eca indicated that Asian Eca strains were distinct from European Eca strains, although they were phenotypically homogeneous.

Generation of Protein Lineages with new Sequence Spaces by Functional Salvage Screen

  • Kim, Geun-Joong;Cheon, Young-Hoon;Park, Min-Soon;Park, Hee-Sung;Kim, Hak-Sung
    • Korean Society for Microbiology and Biotechnology: Conference Proceedings
    • /
    • Korean Society for Microbiology and Biotechnology, 2001: Proceedings of 2001 International Symposium
    • /
    • pp.77-80
    • /
    • 2001
  • A variety of methods for generating diverse proteins, including random mutagenesis and recombination, are currently available, and most of them accumulate mutations on the target gene of a protein whose sequence space remains unchanged. On the other hand, a pool of diverse genes generated by random insertions, deletions, and exchange of homologous domains of different lengths in the target gene would present protein lineages resulting in new fitness landscapes. Here we report a method to generate a pool of protein variants with different sequence spaces, using green fluorescent protein (GFP) as a model protein. This process, designated functional salvage screen (FSS), comprises the following procedures: a defective GFP template expressing no fluorescence is first constructed by genetically disrupting a predetermined region(s) of the protein, and a library of GFP variants is generated from the defective template by incorporating randomly fragmented genomic DNA from E. coli into the defined region(s) of the target gene, followed by screening of the functionally salvaged, fluorescence-emitting GFPs. Two approaches, sequence-directed and PCR-coupled methods, were attempted to generate the library of GFP variants with new sequences derived from the genomic segments of E. coli. The functionally salvaged GFPs were selected and analyzed in terms of sequence space and functional properties. The results demonstrate that the functional salvage process not only can be a simple and effective method to create protein lineages with new sequence spaces, but also can be useful in elucidating the involvement of a specific region(s) or domain(s) in the structure and function of a protein.

Cloning, Sequencing and Baculovirus-based Expression of Fusion-Glycoprotein D Gene of Herpes Simplex Virus Type 1 (F)

  • Uh, Hong-Sun;Choi, Jin-Hee;Byun, Si-Myung;Kim, Soo-Young;Lee, Hyung-Hoan
    • BMB Reports
    • /
    • Vol. 34, No. 4
    • /
    • pp.371-378
    • /
    • 2001
  • The glycoprotein D (gD) gene of HSV-1 strain F was cloned, sequenced, recombined into the HcNPV (Hyphantria cunea nuclear polyhedrosis virus) expression vector, and expressed in insect cells. The gD gene was located in the 6.43 kb BamHI fragment of strain F. The open reading frame (ORF) of the gD gene was 1,185 bp and encodes 394 amino acid residues. Recombinant baculoviruses, GD-HcNPVs, expressing the gD protein were constructed. Spodoptera frugiperda cells infected with the recombinant virus synthesized a mature gX-gD fusion protein with an approximate molecular weight of 54 kDa and secreted the gD proteins into the culture media, as shown by an immunoprecipitation assay. The fusion gD protein was localized on the membrane of the insect cells, as seen by an immunofluorescence assay. The deduced amino acid sequence presents additional characteristics compatible with the structure of a viral glycoprotein: a signal peptide, putative glycosylation sites, and a long C-terminal transmembrane sequence. These results indicate the utility of the HcNPV-insect cell system for producing and characterizing eukaryotic proteins.

Cloning and Sequence Analysis of Two Catechol-degrading Gene Clusters from a Phenol-utilizing Bacterium Pseudomonas putida SM25

  • Jung, Young-Hee;Ka, Jong-Ok;Cheon, Choong-Ill;Lee, Myeong-Sok;Song, Eun-Sook;Cho, Daeho;Park, Sang-Ho;Ha, Kwon-Soo;Park, Young-Mok
    • Journal of Microbiology
    • /
    • Vol. 41, No. 2
    • /
    • pp.102-108
    • /
    • 2003
  • A 6.1 kb Sph I fragment from the genomic DNA of Pseudomonas putida SM25 was cloned into the vector pUC19. The open reading frame of catB was found to consist of 1,122 nucleotides. Sequence alignment of the catB gene products from different kinds of bacteria revealed an overall identity ranging from 40 to 98%. The catC gene contained an open reading frame of 96 codons, from which a protein with a molecular mass of about 10.6 kDa was predicted. The amino acids in the proposed active-site region of CatC, including the charged residues, were found to be almost entirely conserved. Since the catBC genes in P. putida SM25 were tightly linked, they could be coordinately regulated and transcribed from a single promoter located upstream of the catB gene, as in P. putida RBI.

Identification and Phylogenetic Analysis of SINE-R Retroposon Family in cDNA Library of Human Fetal Brain

  • Yi, Joo-Mi;Shin, Kyung-Mi;Lee, Ji-Won;Paik, In-Ho;Jang, Kyung-Lib;Kim, Heui-Soo
    • Animal cells and systems
    • /
    • Vol. 5, No. 3
    • /
    • pp.231-236
    • /
    • 2001
  • SINE-R retroposons are derived from the human endogenous retrovirus HERV-K family and have been found to be hominoid specific. Both SINE-R retroposons and the HERV-K family are potentially capable of affecting the expression of closely located genes. From a cDNA library of human fetal brain, we identified seven SINE-R retroposons and compared them with sequences from the GenBank database. The SINE-R retroposons from human fetal brain showed 85 to 97% sequence similarity with the human-specific retroposon SINE-R.C2. They also showed 88 to 96% sequence similarity with the sequence of the schizo-cDNA clone derived from postmortem frontal cortex tissue of a schizophrenic patient. Phylogenetic analysis using the neighbor-joining method revealed that the seven new SINE-R retroposons from the cDNA library of human fetal brain have proliferated independently during human evolution. The data indicate that such SINE-R retroposons are expressed in human fetal brain and deserve further investigation as potential leads to understanding neuropsychiatric diseases.

Characterization of an Extracellular Lipase in Burkholderia sp. HY-10 Isolated from a Longicorn Beetle

  • Park, Doo-Sang;Oh, Hyun-Woo;Heo, Sun-Yeon;Jeong, Won-Jin;Shin, Dong-Ha;Bae, Kyung-Sook;Park, Ho-Yong
    • Journal of Microbiology
    • /
    • Vol. 45, No. 5
    • /
    • pp.409-417
    • /
    • 2007
  • Burkholderia sp. HY-10, isolated from the digestive tract of the longicorn beetle Prionus insularis, produced an extracellular lipase with a molecular weight of 33.5 kDa as estimated by SDS-PAGE. The lipase was purified from the culture supernatant to near electrophoretic homogeneity by a one-step adsorption-desorption procedure using a polypropylene matrix, followed by a concentration step. The purified lipase exhibited its highest activity at pH 8.5 and $60^{\circ}C$. A broad range of lipase substrates, from $C_4$ to $C_{18}$ p-nitrophenyl esters, were hydrolyzed efficiently by the lipase. The most efficient substrate was p-nitrophenyl caproate ($C_6$). A 2485 bp DNA fragment was isolated by PCR amplification and chromosomal walking; it encoded two polypeptides of 364 and 346 amino acids, identified as a lipase and a lipase foldase, respectively. The N-terminal amino acid sequence of the purified lipase and nucleotide sequence analysis predicted that the precursor lipase was proteolytically modified during secretion to produce a catalytically active 33.5 kDa protein. The deduced amino acid sequence of the lipase shared extensive similarity with those of family 1.2 lipases from other bacteria. The deduced amino acid sequence contained two cysteine residues forming a disulfide bond in the molecule and three well-conserved amino acid residues, $Ser^{131}$, $His^{330}$, and $Asp^{308}$, which compose the catalytic triad of the enzyme.

Basic Concept of Gene Microarray

  • 황승용
    • Korean Journal of Biological Psychiatry
    • /
    • Vol. 8, No. 2
    • /
    • pp.203-207
    • /
    • 2001
  • The genome sequencing project has generated and will continue to generate enormous amounts of sequence data, including 5 eukaryotic and about 60 prokaryotic genomes. Given these ever-increasing amounts of sequence information, new strategies are necessary to efficiently pursue the next phase of the genome project: the elucidation of gene expression patterns and gene product function on a whole-genome scale. In order to assign functional information to the genome sequence, DNA chip (or gene microarray) technology was developed to efficiently identify the differential expression patterns of independent biological samples. The DNA chip provides a new tool for genome expression analysis that may revolutionize many aspects of biotechnology, including new drug discovery and disease diagnostics.
