• Title/Summary/Keyword: Sequence Databases

A Cloning of Novel Esterase from a Metagenomic Library

  • Yoon, Sang-Young;Kim, Seung-Bum;Ryu, Yeon-Woo
    • Korean Society for Biotechnology and Bioengineering: Conference Proceedings
    • /
    • 2005.10a
    • /
    • pp.243-246
    • /
    • 2005
  • A novel esterase showing high enantioselectivity toward (S)-ketoprofen ethyl ester was selected from a fosmid environmental DNA library provided by the Microbial Genomic & Applications Center. A BLAST search showed that the gene had not yet been registered in GenBank. The conserved esterase motif, G-X-S-X-G, was not found,$^{4)}$ and the sequence is similar to beta-lactamase. The DNA sequence of the cloned esterase includes an open reading frame of 1170 bp, designated EST-Y29, encoding a protein of 389 amino acids with a molecular mass of about 42.8 kDa. Amino acid sequence analysis revealed only low identity (28%) to the known esterases/lipases in the databases that contain the conserved sequence motifs of esterases/lipases. Compared with other esterases, this enzyme appears to be a new member of the esterase family. EST-Y29 was functionally overexpressed in a soluble form in E. coli, with the maximum conversion yield of (S)-ketoprofen at $65^{\circ}C$. This study demonstrates that functional screening, combined with the sequential use of restriction enzymes to exclude already known enzymes, is a useful approach for isolating novel enzymes from a metagenome.
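
The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes the 1170 bp open reading frame includes one stop codon and uses the common rule of thumb of roughly 110 Da per amino acid residue; that average is an assumption, not a value taken from the abstract.

```python
# Rule-of-thumb average residue mass (assumption, not from the abstract).
AVG_RESIDUE_MASS_DA = 110.0


def orf_protein_length(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose final codon is a stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # drop the stop codon


def approx_mass_kda(n_residues: int) -> float:
    """Rough molecular mass estimate in kDa."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


residues = orf_protein_length(1170)
mass_kda = round(approx_mass_kda(residues), 1)
print(residues, mass_kda)  # 389 residues, about 42.8 kDa
```

Both results agree with the 389 amino acids and approximately 42.8 kDa reported above.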

Computational Approaches for Structural and Functional Genomics

  • Brenner, Steven-E.
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2000.11a
    • /
    • pp.17-20
    • /
    • 2000
  • Structural genomics aims to provide a good experimental structure or computational model of every tractable protein in a complete genome. Underlying this goal is the immense value of protein structure, especially in permitting recognition of distant evolutionary relationships for proteins whose sequence analysis has failed to find any significant homolog. A considerable fraction of the genes in all sequenced genomes have no known function, and structure determination provides a direct means of revealing homology that may be used to infer their putative molecular function. The solved structures will be similarly useful for elucidating the biochemical or biophysical role of proteins that have previously been ascribed only phenotypic functions. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways. We use computational methods to select families whose structures cannot be predicted and which are likely to be amenable to experimental characterization. The methods employed include modern sequence analysis and clustering algorithms. A critical component is consultation of the PRESAGE database for structural genomics, which records the community's experimental work underway and computational predictions. The protein families are ranked according to several criteria, including taxonomic diversity and known functional information. Individual proteins, often homologs from hyperthermophiles, are selected from these families as targets for structure determination. The solved structures are examined for structural similarity to other proteins of known structure. Homologous proteins in sequence databases are computationally modeled to provide a resource of protein structure models complementing the experimentally solved protein structures.

Identification Based on Computational Analysis of rpoB Sequence of Bacillus anthracis and Closely Related Species (Bacillus anthracis와 그 유연종의 rpoB 유전자 컴퓨터 분석을 통한 동정)

  • Kim, Kyu-Kwang;Kim, Han-Bok
    • Korean Journal of Microbiology
    • /
    • v.44 no.4
    • /
    • pp.333-338
    • /
    • 2008
  • In this study, partial rpoB gene sequences (777 bp) were analyzed computationally to identify B. anthracis and its closely related species B. cereus and B. thuringiensis. Sequence data for 17 B. anthracis strains, 9 B. cereus strains, and 7 B. thuringiensis strains were obtained by searching databases. The sequences were aligned and used for further computational analysis. B. anthracis strains were identified by in silico restriction enzyme digestion. B. cereus and B. thuringiensis could not be separated by this method; sequencing and a BLAST search were required to distinguish the two. In actual identification tests, B. anthracis strains could be identified by PCR-RFLP, and B. cereus and B. thuringiensis strains were distinguished by BLAST search with a reliable e-value. The study yielded a fast and accurate method for identifying the three Bacillus species, together with an identification flow chart.
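
In silico restriction digestion, as mentioned in this abstract, can be sketched in a few lines. The abstract does not name the enzyme used, so the example below assumes the EcoRI site (GAATTC, cut after the first G) and two made-up toy sequences purely for illustration; the `digest` helper is hypothetical and treats the DNA as linear.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at each site.

    cut_offset is the position inside the recognition site where the cut falls.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy sequences (not real rpoB data): they differ by one base inside the site.
with_site = "ATGCGAATTCGGCC"
without_site = "ATGCGAGTTCGGCC"

print(digest(with_site, "GAATTC", 1))     # two fragments
print(digest(without_site, "GAATTC", 1))  # one uncut fragment
```

A single-base difference that creates or destroys a recognition site changes the fragment pattern, which is the property that lets a digestion pattern separate one species from the others.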

Analysis of Expressed Sequence Tags from the Antarctic Psychrophilic Green Algae, Pyramimonas gelidicola

  • Jung, Woongsic;Lee, Sung Gu;Kang, Se Won;Lee, Yong Seok;Lee, Jun Hyuck;Kang, Sung-Ho;Jin, Eon Seon;Kim, Hak Jun
    • Journal of Microbiology and Biotechnology
    • /
    • v.22 no.7
    • /
    • pp.902-906
    • /
    • 2012
  • Expressed sequence tags (ESTs) from the Antarctic green alga Pyramimonas gelidicola were analyzed to obtain molecular information on cold acclimation in psychrophilic microorganisms. A total of 2,112 EST clones were sequenced, generating 222 contigs and 219 singletons from the control condition ($4^{\circ}C$) and 200 contigs and 391 singletons from the cold-shock condition ($-2^{\circ}C$). The complete EST sequences were deposited in the DDBJ EST database (http://www.ddbj.nig.ac.jp/index-e.html), and the nucleotide sequences reported in this study are available in DDBJ/EMBL/GenBank. These EST databases of Antarctic green algae can be used in a wide range of studies on psychrophilic genes expressed by polar microorganisms.

A Database Retrieval Model for Efficient Gene Sequence Alignment (효율적인 유전자 서열 비고를 위한 데이타베이스 검색 모델)

  • 김민준;임성화;김재훈;이원태;정진원
    • Journal of KIISE:Databases
    • /
    • v.31 no.3
    • /
    • pp.243-251
    • /
    • 2004
  • Most bioinformatics programs provide biochemists and biologists with retrieval and analysis services over gene and protein databases. Because these services search the database separately for each incoming user request, they take a long time and increase both server load and response time. In this paper, we exploit the database retrieval patterns of bioinformatics sequence alignment programs and propose a grouping method that shares a single database retrieval among many requests. We also propose a carpool method that reduces response time and improves system scalability by combining newly arriving requests with requests already in progress. The performance of the two proposed schemes is verified by mathematical analysis and simulation.
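
The grouping idea described above, one database pass shared by several pending requests, can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the `score` function is a toy stand-in for a real alignment score, and the database and queries are invented.

```python
def score(query: str, entry: str) -> int:
    """Toy similarity: number of matching positions (stand-in for alignment)."""
    return sum(a == b for a, b in zip(query, entry))


def grouped_search(database, queries):
    """Scan the database once, scoring every pending query against each entry."""
    best = {q: (None, -1) for q in queries}
    entries_read = 0
    for entry in database:  # single pass shared by the whole group
        entries_read += 1
        for q in queries:
            s = score(q, entry)
            if s > best[q][1]:
                best[q] = (entry, s)
    return best, entries_read


database = ["ACGT", "AGGT", "TTTT"]
queries = ["ACGA", "TTTA"]
best, entries_read = grouped_search(database, queries)
print(best, entries_read)
```

Serving the two queries separately would read six entries in total; the grouped scan reads three, which is the saving the grouping method targets.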

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2005.09a
    • /
    • pp.407-411
    • /
    • 2005
  • The KUGI (Korean UniGene Information) database contains annotation information for cDNA sequences obtained from samples of diseases prevalent in Koreans. A total of about 157,000 high-throughput 5'-EST sequences, collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients, were clustered into about 35,000 contigs. From each cluster, a representative clone having the longest high-quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database, together with the results of running BLAST against the RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine for the KUGI database with two types of user interface: attribute-based search and sequence similarity search. Attribute-based search uses DBMS technology, while similarity search uses BLAST, which supports various similarity search options. The search system allows not only multiple queries but also various query types. The results include: 1) information on clones and libraries, 2) accession keys, genome location, gene ontology, and pathway links to public databases, 3) links to external programs, and 4) sequence information for contigs and the 5'-ends of clones. We believe that the KUGI database and search system may provide very useful information for studies elucidating the causes of diseases that are prevalent in Koreans.

A DNA Index Structure using Frequency and Position Information of Genetic Alphabet (염기문자의 빈도와 위치정보를 이용한 DNA 인덱스구조)

  • Kim Woo-Cheol;Park Sang-Hyun;Won Jung-Im;Kim Sang-Wook;Yoon Jee-Hee
    • Journal of KIISE:Databases
    • /
    • v.32 no.3
    • /
    • pp.263-275
    • /
    • 2005
  • In large DNA databases, indexing techniques are widely used for rapid approximate sequence searching. However, most indexing techniques require more space than the original database and are difficult to integrate seamlessly with a DBMS. In this paper, we propose a space-efficient, disk-based indexing and query processing algorithm for approximate DNA sequence searching, specifically exact match queries, wildcard match queries, and k-mismatch queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts a signature from it by considering the occurrence frequency of each nucleotide. It then stores the set of signatures in a multi-dimensional index such as an R*-tree. In particular, by assigning a weight to each position of the window, it prevents signatures from being concentrated around a few spots in the index space. Our query processing algorithm converts a query sequence into a multi-dimensional rectangle and searches the index for signatures that overlap the rectangle. Experiments with real biological data sets showed that the proposed method is at least three times faster, two times faster, and several orders of magnitude faster than the suffix-tree-based method for exact match, wildcard match, and k-mismatch queries, respectively.
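
The signature extraction step can be sketched as below. The abstract does not give the exact signature formula or weighting scheme, so this example makes assumptions: a signature is a four-dimensional vector (one coordinate per nucleotide, in the order A, C, G, T) and each coordinate is the sum of the weights of the positions where that nucleotide occurs. The linear weights 1..w are hypothetical.

```python
def signature(window: str, weights=None) -> tuple:
    """Weighted nucleotide-frequency vector for one window, ordered A, C, G, T."""
    if weights is None:
        weights = [1] * len(window)  # plain occurrence counts
    sig = dict.fromkeys("ACGT", 0)
    for base, w in zip(window, weights):
        sig[base] += w  # raises KeyError on non-ACGT characters
    return tuple(sig[b] for b in "ACGT")


def window_signatures(seq: str, w: int, weights=None) -> list:
    """Signatures of a sliding window placed at every position of seq."""
    return [signature(seq[i:i + w], weights) for i in range(len(seq) - w + 1)]


seq = "ACGTAC"
print(window_signatures(seq, 4))                # unweighted: identical points
print(window_signatures(seq, 4, [1, 2, 3, 4]))  # weighted: distinct points
```

In this toy case all three windows have the same plain frequency vector, while the position weights map them to three different points, which illustrates why weighting helps keep signatures from piling up in a few spots of the index space.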

Optimization of Post-Processing for Subsequence Matching in Time-Series Databases (시계열 데이터베이스에서 서브시퀀스 매칭을 위한 후처리 과정의 최적화)

  • Kim, Sang-Uk
    • The KIPS Transactions:PartD
    • /
    • v.9D no.4
    • /
    • pp.555-560
    • /
    • 2002
  • Subsequence matching, which consists of index searching and post-processing steps, is an operation that finds, in a time-series database, those subsequences whose changing patterns are similar to that of a given query sequence. This paper discusses the optimization of post-processing for subsequence matching. The common problem in the post-processing of previous methods is that each candidate subsequence is compared with the query sequence, to discard false alarms, as soon as it appears during index searching. As a result, a sequence containing candidate subsequences is read from disk multiple times, and a candidate subsequence is compared with the query sequence multiple times. These redundancies seriously degrade the performance of subsequence matching. In this paper, we propose a new optimal method that resolves the problem. The proposed method stores all the candidate subsequences returned by index searching in a binary search tree and performs post-processing in batch after index searching finishes. In this way, the redundancies mentioned above are eliminated completely. To verify the performance improvement, we perform extensive experiments using a real-life stock data set. The results show that the proposed method achieves a 55-fold to 156-fold speedup over the previous methods.
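
The batch post-processing idea can be illustrated with a small sketch. It is not the paper's implementation: a dictionary of sets with sorted iteration stands in for the binary search tree, Euclidean distance is assumed as the similarity measure, and the `store`, `load_sequence`, and tolerance names are hypothetical.

```python
import math
from collections import defaultdict


def batch_post_process(candidates, load_sequence, query, eps):
    """Verify candidates after index searching, reading each sequence once.

    candidates: iterable of (seq_id, offset) pairs, possibly with repeats.
    """
    by_seq = defaultdict(set)
    for seq_id, offset in candidates:
        by_seq[seq_id].add(offset)  # the set removes repeated candidates
    matches = []
    for seq_id in sorted(by_seq):  # ordered traversal, as a tree would give
        data = load_sequence(seq_id)  # one disk access per sequence
        for offset in sorted(by_seq[seq_id]):
            sub = data[offset:offset + len(query)]
            if len(sub) == len(query) and math.dist(sub, query) <= eps:
                matches.append((seq_id, offset))
    return matches


# Toy in-memory "disk" that records how often each sequence is loaded.
store = {"s1": [1, 2, 3, 4, 5], "s2": [10, 10, 10, 10]}
loads = []


def load_sequence(seq_id):
    loads.append(seq_id)
    return store[seq_id]


candidates = [("s1", 1), ("s2", 0), ("s1", 1), ("s1", 2)]
matches = batch_post_process(candidates, load_sequence, [2, 3, 4], 0.5)
print(matches, loads)
```

Although `s1` appears three times among the candidates, it is loaded once and its duplicate candidate is compared once, which are the two redundancies the abstract describes.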

An Efficient Approach for Single-Pass Mining of Web Traversal Sequences (단일 스캔을 통한 웹 방문 패턴의 탐색 기법)

  • Kim, Nak-Min;Jeong, Byeong-Soo;Ahmed, Chowdhury Farhan
    • Journal of KIISE:Databases
    • /
    • v.37 no.5
    • /
    • pp.221-227
    • /
    • 2010
  • Web access sequence mining can discover the web pages that users access frequently. Utility-based web access sequence mining handles non-binary occurrences of web pages and extracts more useful knowledge from web logs. However, the existing utility-based approach considers web access sequences from the very beginning of the web logs, so it is not suitable for mining data streams, where the volume of data is huge and unbounded. Nor can it adaptively detect recent changes of knowledge in data streams. The existing approach has other limitations as well: it considers only forward references in web access sequences, it relies on a level-wise candidate generation-and-test methodology, and it needs several database scans. In this paper, we propose a new approach for high-utility web access sequence mining over data streams using a sliding window method. Our approach can handle large-scale data and can also efficiently discover recently generated information in data streams. Moreover, it overcomes the other limitations of the existing algorithm. Extensive performance analyses show that our approach is very efficient and outperforms the existing algorithm.
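
The sliding window over a stream can be sketched in simplified form. The example below is a deliberate simplification and not the proposed algorithm: it tracks the total utility of individual pages rather than of access sequences, keeps only the most recent sessions, and uses integer utilities. The class name and the sample sessions are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowUtility:
    """Per-page utility totals over the most recent sessions of a stream."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.window = deque()
        self.totals = Counter()

    def add(self, session):
        """session: list of (page, utility) pairs with integer utilities."""
        self.window.append(session)
        for page, utility in session:
            self.totals[page] += utility
        if len(self.window) > self.window_size:
            # Evict the oldest session so only recent data is reflected.
            for page, utility in self.window.popleft():
                self.totals[page] -= utility
                if self.totals[page] == 0:
                    del self.totals[page]

    def high_utility_pages(self, min_utility: int):
        return sorted(p for p, u in self.totals.items() if u >= min_utility)


stream = SlidingWindowUtility(window_size=2)
stream.add([("a", 5), ("b", 1)])
stream.add([("a", 2), ("c", 4)])
before = stream.high_utility_pages(4)
stream.add([("b", 6)])  # the first session now leaves the window
after = stream.high_utility_pages(4)
print(before, after)
```

Each session is processed once on arrival and once on eviction, so the stream never has to be rescanned from the beginning, and the result changes as older sessions drop out of the window.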

A Study on Web Services for Sequence Similarity search in the Workflow Environment (워크플로우 환경에서의 대규모 서열 유사성 검색 웹 서비스에 관한 연구)

  • Jun, Jin-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.13 no.6
    • /
    • pp.41-49
    • /
    • 2008
  • In recent years, research on biological phenomena using workflow management tools has been active in bioinformatics. A workflow management tool is a foundation that lets researchers collaborate by reusing and sharing services, and a variety of such tools, including the MyGrid project's Taverna, Kepler, and BioWMS, have been developed and released as open source. Built on web service technology, these tools can model and automate services hosted at geographically distant sites within a single workspace. Many bioinformatics tools and databases are provided as web services and used within workflow management tools. In this situation, developing web services for sequence similarity search, a basic operation in bioinformatics, and offering them stably is essential to the field. In this paper, the speed of similarity search over biological sequence data was improved using a Linux cluster, and by implementing the search as a web service linked to a workflow management tool, sequence similarity searches could be completed in a short time.
