• Title/Summary/Keyword: sequence databases

Expressed Sequence Tag Analysis of Antarctic Hairgrass Deschampsia antarctica from King George Island, Antarctica

  • Lee, Hyoungseok;Cho, Hyun Hee;Kim, Il-Chan;Yim, Joung Han;Lee, Hong Kum;Lee, Yoo Kyung
    • Molecules and Cells / v.25 no.2 / pp.258-264 / 2008
  • Deschampsia antarctica is the only monocot that thrives in the tough conditions of the Antarctic region. It is an invaluable resource for the identification of genes associated with tolerance to various environmental pressures. To identify genes that are differentially regulated between greenhouse-grown and Antarctic field-grown plants, we initiated a detailed gene expression analysis. Antarctic plants were collected and greenhouse plants served as controls. Two different cDNA libraries were constructed from these plants. A total of 2,112 cDNA clones were sequenced and grouped into 1,199 unigene clusters consisting of 243 consensus and 956 singleton sequences. Using similarity searches against several public databases, we constructed a functional classification of the ESTs into categories such as genes related to responses to stimuli, as well as photosynthesis and metabolism. Real-time PCR analysis of various stress-responsive genes revealed different patterns of regulation in the different environments, suggesting that these genes are involved in responses to specific environmental factors.

A Subsequence Matching Technique that Supports Time Warping Efficiently (타임 워핑을 지원하는 효율적인 서브시퀀스 매칭 기법)

  • Park, Sang-Hyun;Kim, Sang-Wook;Cho, June-Suh;Lee, Hoen-Gil
    • Journal of Industrial Technology / v.21 no.A / pp.167-179 / 2001
  • This paper discusses an index-based subsequence matching method that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In earlier work, we suggested an efficient method for whole matching under time warping. This method constructs a multidimensional index on a set of feature vectors, which are invariant to time warping, extracted from data sequences. For filtering in the feature space, it also applies a lower-bound function, which consistently underestimates the time warping distance and satisfies the triangle inequality. In this paper, we incorporate the prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multidimensional index using the feature vectors as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even with a large database. We also prove that our approach does not incur false dismissals. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves significant speedup with real-world S&P 500 stock data and with very large synthetic data.
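
The filtering idea in this abstract (a cheap lower bound on the time warping distance, computed from warping-invariant features) can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: the time warping distance is taken in its L-infinity form (the largest element-wise gap along the best warping path), and the four features (first, last, greatest, smallest element) are chosen here for illustration only. The function names are hypothetical.

```python
def warping_distance(s, q):
    """Time warping distance with an L-infinity cost (an assumption):
    the smallest achievable value of the largest |s[i] - q[j]| over
    all warping paths that align s with q."""
    n, m = len(s), len(q)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = max(abs(s[i - 1] - q[j - 1]), best_prev)
    return d[n][m]


def features(seq):
    """Illustrative warping-invariant features: stretching a sequence
    along the time axis changes none of these four values."""
    return (seq[0], seq[-1], max(seq), min(seq))


def lower_bound(s, q):
    """Largest feature-wise gap; never exceeds warping_distance(s, q),
    so pruning with it causes no false dismissals."""
    return max(abs(a - b) for a, b in zip(features(s), features(q)))


def filter_then_refine(data_sequences, query, epsilon):
    """Skip the expensive distance whenever the lower bound already
    exceeds the tolerance epsilon."""
    return [
        s for s in data_sequences
        if lower_bound(s, query) <= epsilon
        and warping_distance(s, query) <= epsilon
    ]
```

In an index-based method the feature tuples would be stored in a multidimensional index rather than scanned linearly; the linear scan above only shows why the filter step is safe.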

Functional Diversity of Cysteine Residues in Proteins and Unique Features of Catalytic Redox-active Cysteines in Thiol Oxidoreductases

  • Fomenko, Dmitri E.;Marino, Stefano M.;Gladyshev, Vadim N.
    • Molecules and Cells / v.26 no.3 / pp.228-235 / 2008
  • Thiol-dependent redox systems are involved in regulation of diverse biological processes, such as response to stress, signal transduction, and protein folding. The thiol-based redox control is provided by mechanistically similar, but structurally distinct families of enzymes known as thiol oxidoreductases. Many such enzymes have been characterized, but identities and functions of the entire sets of thiol oxidoreductases in organisms are not known. Extreme sequence and structural divergence makes identification of these proteins difficult. Thiol oxidoreductases contain a redox-active cysteine residue, or its functional analog selenocysteine, in their active sites. Here, we describe computational methods for in silico prediction of thiol oxidoreductases in nucleotide and protein sequence databases and identification of their redox-active cysteines. We discuss different functional categories of cysteine residues, describe methods for discrimination between catalytic and noncatalytic and between redox and non-redox cysteine residues and highlight unique properties of the redox-active cysteines based on evolutionary conservation, secondary and three-dimensional structures, and sporadic replacement of cysteines with catalytically superior selenocysteine residues.

Expression Analysis of ESTs Derived from the Leaf of Chunpoong (Panax ginseng C.A. Meyer)

  • In, Jun-Gyo;Lee, Bum-Soo;Yang, Deok-Chun
    • Proceedings of the Plant Resources Society of Korea Conference / 2003.04a / pp.122-122 / 2003
  • Expressed sequence tags (ESTs) help to quickly identify the functions of expressed genes and to understand the complexity of gene expression. To analyze gene expression during leaf development in Panax ginseng, one of the most important medicinal plants, an EST analysis was carried out. We constructed a cDNA library using the immature leaf of Chunpoong. Partial sequences were obtained from 3,170 clones. The ESTs could be clustered into 1,624 (56.1%) non-redundant groups. A similarity search of the non-redundant ESTs against public non-redundant protein and DNA databases indicated that 1,137 groups show similarity to genes of known function. These EST clones were divided into sixteen categories according to gene function. The most abundant transcripts in immature ginseng leaf were photosynthesis-related proteins, such as chlorophyll a/b binding protein LHCII type I (128), chlorophyll a/b binding protein (53), ribulose-1,5-bisphosphate carboxylase (41), and photosystem I psaH (26). The EST data from immature leaf generated in this study are useful for dissecting gene expression in the leaf organ of ginseng.

Analysis of Expressed Sequence Tags from the Embryogenic Callus of Korean Ginseng (Panax ginseng C.A. Meyer)

  • In, Jun-Gyo;Lee, Bum-Soo;Park, Yong-Eui;Yang, Deok-Chun
    • Proceedings of the Plant Resources Society of Korea Conference / 2003.04a / pp.123-123 / 2003
  • To study genes expressed during embryo development, we constructed a cDNA library from embryogenic callus induced from the cotyledon of Korean ginseng and generated expressed sequence tags (ESTs) from 3,359 randomly selected clones. The ESTs could be clustered into 1,910 (59.1%) non-redundant groups. A similarity search of the non-redundant ESTs against public non-redundant protein and DNA databases indicated that 2,217 groups show similarity to genes of known function. These EST clones were divided into eighteen categories according to gene function. The most abundant transcripts were ribosomal protein small subunit 28 kDa (40), tumor-related protein (35), metallothionein (31), small heat-shock protein class 18.6K (24), and cyclophilin (20). No useful information has been available on gene expression during embryo development in Korean ginseng. These results could help in understanding embryo development in Korean ginseng.

Expression Analysis of ESTs Derived from the Four-Year Root of Chunpoong (Panax ginseng C.A. Meyer)

  • Yang, Deok-Chun;In, Jun-Gyo;Lee, Bum-Soo
    • Proceedings of the Plant Resources Society of Korea Conference / 2003.04a / pp.121-121 / 2003
  • Expressed sequence tags (ESTs) help to quickly identify the functions of expressed genes and to understand the complexity of gene expression. To assist the genetic study of root development in Panax ginseng, one of the most important medicinal plants, an EST analysis was carried out. We constructed a cDNA library using the 4-year Chunpoong root. Partial sequences were obtained from 3,841 clones. The ESTs could be clustered into 2,056 (64%) non-redundant groups. A similarity search of the non-redundant ESTs against public non-redundant protein and DNA databases indicated that 1,498 groups show similarity to genes of known function. These EST clones were divided into eighteen categories according to gene function. The most abundant transcripts were major latex protein (41), ribonuclease 2 (36), and metallothionein 2 (35). Our extensive EST analysis of genes expressed in 4-year Chunpoong root not only contributes to the understanding of the dynamics of genome expression patterns in root organ development but also adds data to the repertoire of all genomic genes.

Functional Annotation and Analysis of Korean Patented Biological Sequences Using Bioinformatics

  • Lee, Byung Wook;Kim, Tae Hyung;Kim, Seon Kyu;Kim, Sang Soo;Ryu, Gee Chan;Bhak, Jong
    • Molecules and Cells / v.21 no.2 / pp.269-275 / 2006
  • A recent report of the Korean Intellectual Property Office (KIPO) showed that the number of biological sequence-based patents is rapidly increasing in Korea. We present biological features of Korean patented sequences through bioinformatic analysis. The analysis is divided into two steps. The first is an annotation step in which the patented sequences were annotated with the Reference Sequence (RefSeq) database. The second is an association step in which the patented sequences were linked to genes, diseases, pathways, and biological functions. We used the Entrez Gene, Online Mendelian Inheritance in Man (OMIM), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases. Through the association analysis, we found that nearly 2.6% of human genes were associated with Korean patenting, compared to 20% of human genes in U.S. patents. The association between the biological functions and the patented sequences indicated that genes whose products act as hormones on defense responses in extracellular environments were the most highly targeted for patenting. The analysis data are available at http://www.patome.net

A Visualization Tool for Ranked Subsequence Matching in Time-Series Databases (시계열 데이터베이스에서 순위를 지원하는 서브시퀀스 매칭 방법을 위한 시각화 툴)

  • Lee, Sung-Jin;Lee, Jinsoo;Cho, Hune;Han, Wook-Shin
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.787-788 / 2009
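  • Time-series data are sequences of real values obtained by sampling continuous data at fixed time intervals. Examples include music and video data, electrocardiogram data, and stock charts. Time-series data are further classified into data sequences, which are stored in a database, and query sequences, which are supplied by the user. Ranked subsequence matching in a time-series database is the problem of finding, given a data sequence and a query sequence, the top k subsequences most similar to the query sequence among the subsequences of the data sequence whose length equals that of the query sequence. The goal of this paper is to develop a visualization tool that improves usability, so that users can work with an existing console-based matching program more easily even if they have little familiarity with the matching method. Specifically, we implemented a user interface that provides five visualization functions. The implemented user interface helps users operate the existing matching program more easily and conveniently.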
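
The problem statement in this abstract (find the k subsequences of a data sequence, each as long as the query, that are closest to the query) can be written down as a brute-force baseline. This is a sketch of the problem definition only, not the matching method the tool wraps; Euclidean distance is an assumption, since the abstract does not name a similarity measure, and the function name is hypothetical.

```python
import heapq
import math


def top_k_subsequences(data, query, k):
    """Return the k best (distance, offset) pairs, closest first.

    Every window of len(query) consecutive values in `data` is compared
    with `query` using Euclidean distance (an assumed measure).
    """
    m = len(query)
    candidates = (
        (math.dist(data[i:i + m], query), i)
        for i in range(len(data) - m + 1)
    )
    # nsmallest keeps only k candidates in memory at a time.
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
    for distance, offset in top_k_subsequences(series, [1, 2, 3], k=2):
        print(f"offset={offset} distance={distance:.3f}")
```

The scan costs time proportional to the data length times the query length; index-based methods aim to avoid examining every window.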

MBR-Safe Transform: Lower-Dimensional Transformation of High-Dimensional MBRs in Similar Sequence Matching (MBR-Safe 변환 : 유사 시퀀스 매칭에서 고차원 MBR의 저차원 변환)

  • Moon, Yang-Sae
    • Journal of KIISE:Databases / v.33 no.7 / pp.693-707 / 2006
  • To improve performance using a multidimensional index in similar sequence matching, we transform a high-dimensional sequence to a low-dimensional sequence, and then construct a low-dimensional MBR that contains multiple transformed sequences. In this paper we propose a formal method that transforms a high-dimensional MBR itself to a low-dimensional MBR, and show that this method significantly reduces the number of lower-dimensional transformations. To achieve this goal, we first formally define the new notion of MBR-safe. We say that a transform is MBR-safe if the low-dimensional MBR obtained by transforming a high-dimensional MBR contains the low-dimensional image of every high-dimensional sequence inside that MBR. We then propose two MBR-safe transforms based on DFT and DCT, the most representative lower-dimensional transformations. For this, we prove that the traditional DFT and DCT are not MBR-safe, and define new transforms, called mbrDFT and mbrDCT, by extending DFT and DCT, respectively. We also formally prove that mbrDFT and mbrDCT are MBR-safe. Moreover, we show that mbrDFT (or mbrDCT) is optimal among the DFT-based (or DCT-based) MBR-safe transforms that directly convert a high-dimensional MBR itself into a low-dimensional MBR. Analytical and experimental results show that the proposed mbrDFT and mbrDCT reduce the number of lower-dimensional transformations drastically, and improve performance significantly compared with the naïve transforms. These results indicate that our MBR-safe transforms provide a useful framework for a variety of applications that require the lower-dimensional transformation of high-dimensional MBRs.
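
The MBR-safe property defined in this abstract can be made concrete with a small sketch. It is not the paper's mbrDFT or mbrDCT: a tiny real-valued matrix stands in for the DFT/DCT coefficients, and the safe variant shown is plain interval arithmetic, one way (assumed here for illustration) to obtain an MBR-safe linear transform. All function names are hypothetical.

```python
def transform(seq, matrix):
    """Apply a linear lower-dimensional transform, one output per row."""
    return [sum(c * x for c, x in zip(row, seq)) for row in matrix]


def naive_mbr_transform(low, high, matrix):
    """Transform only the two corner points of the MBR.
    Not MBR-safe in general when the matrix has negative coefficients."""
    return transform(low, matrix), transform(high, matrix)


def safe_mbr_transform(low, high, matrix):
    """Interval-arithmetic transform of an MBR [low, high].
    For the lower bound, a positive coefficient takes the low value and a
    negative coefficient takes the high value; the upper bound does the
    opposite. The result encloses the image of every point in the MBR."""
    new_low, new_high = [], []
    for row in matrix:
        new_low.append(sum(c * (l if c >= 0 else h)
                           for c, l, h in zip(row, low, high)))
        new_high.append(sum(c * (h if c >= 0 else l)
                            for c, l, h in zip(row, low, high)))
    return new_low, new_high


def contains(mbr, point):
    """True if the point lies inside the MBR in every dimension."""
    low, high = mbr
    return all(l <= v <= h for l, v, h in zip(low, point, high))


if __name__ == "__main__":
    matrix = [[0.5, 0.5], [0.5, -0.5]]      # average and half-difference
    low, high = [0.0, 0.0], [2.0, 2.0]      # MBR in the original space
    image = transform([2.0, 0.0], matrix)   # a sequence inside the MBR
    print(contains(naive_mbr_transform(low, high, matrix), image))
    print(contains(safe_mbr_transform(low, high, matrix), image))
```

With this matrix the corner-only transform collapses the second dimension to a single value and loses the sequence [2, 0], while the interval version keeps it, which is the containment that the MBR-safe definition requires.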

Analysis of Partial cDNA Sequence from Human Fetal Liver

  • Kim, Jae-Wha;Song, Jae-Chan;Lee, In-Ae;Lee, Young-Hee;Nam, Myoung-Soo;Hahn, Yoon-Soo;Chung, Jae-Hoon;Choe, In-Seong
    • BMB Reports / v.28 no.5 / pp.402-407 / 1995
  • Single-run partial cDNA sequencing was conducted on 1,592 randomly selected human fetal liver cDNA clones of Korean origin to isolate novel genes related to liver functions. Each partial cDNA sequence determined was analyzed by comparing it with the GenBank, Protein Information Resource (PIR), and SWISS-PROT Protein Sequence Data Bank databases. From the set of 1,592 cDNA clones reported here, 1,433 (90.0% of the total) were informative cDNA sequences. The other 159 clones were identified as DNA sequences that had originated from the cloning vector. Among the 1,433 informative partial cDNA sequences, 851 (59.3%) clones were found to be identical to known human genes. These known genes have been classified into 225 different kinds of genes. In addition, 340 clones (23.7%) showed various degrees of homology to previously known human genes. Ninety-four (6.6%) clones contained various repeated sequences. Twenty-four (1.7%) partial cDNA sequences were found to have considerable homology to known genes from evolutionarily distant organisms such as yeast, rice, Arabidopsis, mouse, and rat, based on database matches, whereas 124 (8.7%) had no significant matches. Human homologues of functionally characterized genes from different organisms could be classified as candidates for novel human genes of similar functions. Information from the partial cDNA sequences in this study may facilitate the analysis of genes expressed in human fetal liver.