• Title/Summary/Keyword: sequence databases

Algorithm for Predicting Functionally Equivalent Proteins from BLAST and HMMER Searches

  • Yu, Dong Su;Lee, Dae-Hee;Kim, Seong Keun;Lee, Choong Hoon;Song, Ju Yeon;Kong, Eun Bae;Kim, Jihyun F.
    • Journal of Microbiology and Biotechnology / v.22 no.8 / pp.1054-1058 / 2012
  • To predict biologically significant attributes such as function from protein sequences, searching large databases for homologous proteins is common practice. In particular, BLAST and HMMER are widely used in a variety of biological fields. However, sequence-homologous proteins determined by BLAST and proteins having the same domains predicted by HMMER are not always functionally equivalent, even though their sequences align with high similarity. Thus, accurate assignment of functionally equivalent proteins from aligned sequences remains a challenge in bioinformatics. We have developed the FEP-BH algorithm to predict functionally equivalent proteins from protein-protein pairs identified by BLAST and from protein-domain pairs predicted by HMMER. When examined against domain classes of the Pfam-A seed database, FEP-BH showed 71.53% accuracy, whereas BLAST and HMMER showed 57.72% and 36.62%, respectively. We expect that the FEP-BH algorithm will be effective in predicting functionally equivalent proteins from BLAST and HMMER outputs and will also suit biologists who want to find functionally equivalent proteins among sequence-homologous proteins.

Gene Expression Profiling of Eukaryotic Microalga, Haematococcus pluvialis

  • EOM HYUNSUK;PARK SEUNGHYE;LEE CHOUL-GYUN;JIN EONSEON
    • Journal of Microbiology and Biotechnology / v.15 no.5 / pp.1060-1066 / 2005
  • Under environmental stress, such as strong irradiance or nitrogen deficiency, unicellular green algae of the genus Haematococcus accumulate secondary carotenoids, i.e. astaxanthin, in the cytosol. The induction and regulation of astaxanthin biosynthesis in microalgae has recently received considerable attention owing to the increasing use of secondary carotenoids as a source of pigmentation in fish aquaculture and, as free-radical quenchers, as potential drugs for cancer prevention. Accordingly, this study generated expressed sequence tags (ESTs) from a library constructed from astaxanthin-induced Haematococcus pluvialis. Partial sequences were obtained from the 5' ends of 1,858 individual cDNAs, and then grouped into 1,025 non-overlapping sequences, among which 708 sequences were singletons, while the remainder fell into 317 clusters. Approximately $63\%$ of the EST sequences showed similarity to previously described sequences in public databases. A relatively high percentage of the H. pluvialis sequences were genes involved in genetic information processing ($15\%$) and metabolism ($11\%$), whereas a relatively low percentage were involved in signal transduction ($3\%$), structure ($2\%$), and environmental information processing ($3\%$). In addition, a relatively large fraction of H. pluvialis sequences was classified as genes involved in photosynthesis ($9\%$) and cellular processes ($9\%$). Based on this EST analysis, the full-length cDNA sequence for superoxide dismutase (SOD) of H. pluvialis was cloned, and the expression of this gene was investigated. The abundance of SOD changed substantially in response to different culture conditions, indicating the possible regulation of this gene in H. pluvialis.

Cloning and Molecular Characterization of ${\beta}$-1,3-Glucan Synthase from Sparassis crispa

  • Yang, Yun Hui;Kang, Hyeon-Woo;Ro, Hyeon-Su
    • Mycobiology / v.42 no.2 / pp.167-173 / 2014
  • A ${\beta}$-glucan synthase gene was isolated from the genomic DNA of the polypore mushroom Sparassis crispa, which reportedly produces unusually high amounts of soluble ${\beta}$-1,3-glucan (${\beta}$-glucan). Sequencing and subsequent open reading frame analysis of the isolated gene revealed that the gene (5,502 bp) consisted of 10 exons separated by nine introns. The predicted mRNA encoded a ${\beta}$-glucan synthase protein consisting of 1,576 amino acid residues. Comparison of the predicted protein sequence with multiple fungal ${\beta}$-glucan synthases indicated that the isolated gene contained a complete N-terminus but lacked approximately 70 amino acid residues at the C-terminus. Fungal ${\beta}$-glucan synthases are integral membrane proteins containing two catalytic and two transmembrane domains. The missing C-terminal part of the S. crispa ${\beta}$-glucan synthase was estimated to include catalytically insignificant transmembrane ${\alpha}$-helices and loops. Sequence analysis of 101 fungal ${\beta}$-glucan synthases, obtained from public databases, revealed that the ${\beta}$-glucan synthases of various fungal origins were categorized into the corresponding fungal groups in the classification system. Interestingly, mushrooms belonging to the class Agaricomycetes were found to contain two distinct types (Type I and II) of ${\beta}$-glucan synthases with type-specific sequence signatures in the loop regions. The S. crispa ${\beta}$-glucan synthase in this study belonged to the Type II family, meaning that a Type I ${\beta}$-glucan synthase is expected to be discovered in S. crispa. The high productivity of soluble ${\beta}$-glucan was not explained, but detailed biochemical studies on the catalytic loop domain of the S. crispa ${\beta}$-glucan synthase will provide better explanations.

A Direct Approach for Finding Functional Lipolytic Enzymes from the Paenibacillus polymyxa Genome

  • JUNG, YEO-JIN;KIM, HYUNG-KWOUN;KIM, JIHYUN F.;PARK, SEUNG-HWAN;OH, TAE-KWANG;LEE, JUNG-KEE
    • Journal of Microbiology and Biotechnology / v.15 no.1 / pp.155-160 / 2005
  • A direct approach was used to retrieve active lipases from Paenibacillus polymyxa genome databases. Twelve putative lipase genes were tested using a typical lipase sequence rule built on the basis of a consensus sequence of a catalytic triad and oxyanion hole. Among them, six genes satisfied the sequence rule and had similarity (about 25%) with known bacterial lipases. To obtain the six lipase proteins, the lipase genes were expressed in E. coli cells, and lipolytic activities were measured using tributyrin plates and p-nitrophenyl caproate. One of them, contig 160-26, was expressed in a soluble and active form in E. coli cells. After purification on a Ni-NTA column, its detailed biochemical properties were characterized. It had maximum hydrolytic activity at $30^{\circ}C$ and pH 7-8, and was stable up to $40^{\circ}C$ and in the range of pH 5-8. It hydrolyzed pNPC$_6$ most rapidly among various pNP esters. The other contigs were expressed more or less in soluble forms, although no lipolytic activities were detected. As they share many conserved regions with lipase 160-26 as well as other bacterial lipases throughout their sequences, they are suggested to be true lipase genes.

Estimation of Substring Selectivity in Biological Sequence Database (생물학 서열 데이타베이스에서 부분 문자열의 선적도 추정)

  • 배진욱;이석호
    • Journal of KIISE:Databases / v.30 no.2 / pp.168-175 / 2003
  • Until now, substring selectivities have been estimated in two steps. The first step is to build a count-suffix tree, which holds statistical information about substrings, and the second step is to estimate substring selectivity using it. However, it is practically impossible to build a count-suffix tree from biological sequences because they are too long. So, this paper proposes a novel data structure, the count q-gram tree, consisting of fixed-length substrings. The count q-gram tree retains the exact counts of all substrings whose lengths are equal to or less than q, and this tree is generated in O(N) time and in a size that does not depend on the total length of all sequences, N. This paper also presents an estimation technique, k-MO. k-MO can choose the overlapping length of the substrings split from a query string, and this choice affects the accuracy of the selectivity estimate and the query processing time. Experiments show that k-MO can estimate very accurately.
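
The idea of counting all substrings up to length q and then estimating longer queries from overlapping pieces can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's count q-gram tree or its k-MO procedure: a flat `Counter` stands in for the tree, the estimate is a generic Markov-style chaining of pieces that overlap in `k` characters, and the names `build_qgram_counts` and `estimate_count` are hypothetical.

```python
from collections import Counter


def build_qgram_counts(sequences, q):
    """Count every substring of length 1..q across all sequences.

    Assumption: a flat Counter replaces the tree structure for brevity.
    """
    counts = Counter()
    for seq in sequences:
        for length in range(1, q + 1):
            for i in range(len(seq) - length + 1):
                counts[seq[i:i + length]] += 1
    return counts


def estimate_count(query, counts, q, k):
    """Estimate occurrences of `query` by chaining pieces of length <= q
    whose neighbours overlap in k characters (hypothetical helper)."""
    if not 0 <= k < q:
        raise ValueError("k must satisfy 0 <= k < q")
    if len(query) <= q:
        # Short queries are answered exactly from the stored counts.
        return float(counts[query])
    # Total number of symbols, used as the denominator when k == 0.
    total_symbols = sum(c for gram, c in counts.items() if len(gram) == 1)
    estimate = float(counts[query[:q]])
    start = q - k
    while start + k < len(query):
        piece = query[start:start + q]
        overlap = query[start:start + k]
        denom = counts[overlap] if k > 0 else total_symbols
        if denom == 0:
            return 0.0
        # Multiply by the conditional frequency of the piece given the overlap.
        estimate *= counts[piece] / denom
        start += q - k
    return estimate


counts = build_qgram_counts(["ACGTACGT"], q=3)
print(estimate_count("ACGT", counts, q=3, k=2))
```

A larger `k` means more pieces and more lookups per query, while a smaller `k` treats the pieces as closer to independent, which is one way to read the accuracy and processing-time trade-off the abstract mentions.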

VDCluster : A Video Segmentation and Clustering Algorithm for Large Video Sequences (VDCluster : 대용량 비디오 시퀀스를 위한 비디오 세그멘테이션 및 클러스터링 알고리즘)

  • Lee, Seok-Ryong;Lee, Ju-Hong;Kim, Deok-Hwan;Jeong, Jin-Wan
    • Journal of KIISE:Databases / v.29 no.3 / pp.168-179 / 2002
  • In this paper, we investigate video representation techniques that form the foundation for subsequent video processing such as video storage and retrieval. A video data set is a collection of video clips, each of which is a sequence of video frames and is represented by a multidimensional data sequence (MDS). An MDS is partitioned into video segments by considering the temporal relationship among frames, and then similar segments of the clip are grouped into video clusters. Thus, the video clip is represented by a small number of video clusters. The video segmentation and clustering algorithm proposed in this paper, VDCluster, guarantees clustering quality to such an extent that it satisfies predefined conditions. The experiments show that our algorithm performs very effectively with respect to various video data sets.

Mining Approximate Sequential Patterns in a Large Sequence Database (대용량 순차 데이터베이스에서 근사 순차패턴 탐색)

  • Kum Hye-Chung;Chang Joong-Hyuk
    • The KIPS Transactions:PartD / v.13D no.2 s.105 / pp.199-206 / 2006
  • Sequential pattern mining is an important data mining task with broad applications. However, conventional methods may meet inherent difficulties in mining databases with long sequences and noise. They may generate a huge number of short and trivial patterns but fail to find interesting patterns shared by many sequences. In this paper, to overcome these problems, we propose the theme of approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. The proposed method works in two steps: one is to cluster target sequences by their similarities, and the other is to find consensus patterns that are similar to the sequences in each cluster directly through multiple alignment. For this purpose, a novel structure called a weighted sequence is presented to compress the alignment result, and the longest consensus pattern that represents each cluster is generated from its weighted sequence. Finally, the effectiveness of the proposed method is verified by a set of experiments.
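
The second step described above (compressing an alignment into per-position counts and reading a consensus pattern from them) might look roughly like the sketch below. It is a simplification under stated assumptions, not the paper's method: the sequences are assumed to be already clustered and aligned, each position holds a single item rather than an itemset, `-` marks a gap, and `weighted_sequence`, `consensus_pattern`, and `min_strength` are hypothetical names.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol in the aligned sequences


def weighted_sequence(aligned):
    """Compress an alignment into one Counter of items per column."""
    return [
        Counter(item for item in column if item != GAP)
        for column in zip(*aligned)
    ]


def consensus_pattern(aligned, min_strength):
    """Keep the most frequent item of each column if it appears in at least
    a `min_strength` fraction of the aligned sequences."""
    n = len(aligned)
    pattern = []
    for counts in weighted_sequence(aligned):
        if not counts:
            continue  # column contains only gaps
        item, count = counts.most_common(1)[0]
        if count / n >= min_strength:
            pattern.append(item)
    return pattern


cluster = ["AB-D", "ABCD", "A-CD", "XB-D"]
print(consensus_pattern(cluster, min_strength=0.75))
```

Lowering `min_strength` admits items shared by fewer sequences, so the threshold controls how approximate the reported pattern is allowed to be.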

Strategic construction of mRNA vaccine derived from conserved and experimentally validated epitopes of avian influenza type A virus: a reverse vaccinology approach

  • Leana Rich Herrera-Ong
    • Clinical and Experimental Vaccine Research / v.12 no.2 / pp.156-171 / 2023
  • Purpose: The development of vaccines that confer protection against multiple avian influenza A (AIA) virus strains is necessary to prevent the emergence of highly infectious strains that may result in more severe outbreaks. Thus, this study applied a reverse vaccinology approach to strategically construct a messenger RNA (mRNA) vaccine construct against avian influenza A (mVAIA) to induce cross-protection while targeting diverse AIA virulence factors. Materials and Methods: Immunoinformatics tools and databases were utilized to identify conserved, experimentally validated AIA epitopes. CD8+ epitopes were docked with dominant chicken major histocompatibility complexes (MHCs) to evaluate complex formation. Conserved epitopes were adjoined in the optimized mVAIA sequence for efficient expression in Gallus gallus. A signal sequence for targeted secretory expression was included. Physicochemical properties, antigenicity, toxicity, and potential cross-reactivity were assessed. The tertiary structure of its protein sequence was modeled and validated in silico to investigate the accessibility of the adjoined B-cell epitope. Potential immune responses were also simulated in C-ImmSim. Results: Eighteen experimentally validated epitopes were found to be conserved (Shannon index <2.0) in the study. These include one B-cell (SLLTEVETPIRNEWGCR) and 17 CD8+ epitopes, adjoined in a single mRNA construct. The CD8+ epitopes docked favorably with the MHC peptide-binding groove, which was further supported by the acceptable ∆Gbind (-28.45 to -40.59 kJ/mol) and Kd (<1.00) values. The incorporated Sec/SPI (secretory/signal peptidase I) cleavage site was also recognized with a high probability (0.964814). The adjoined B-cell epitope was found within the disordered and accessible regions of the vaccine. Immune simulation results projected cytokine production, lymphocyte activation, and memory cell generation after the 1st dose of mVAIA. Conclusion: Results suggest that mVAIA possesses stability, safety, and immunogenicity. In vitro and in vivo confirmation in subsequent studies is anticipated.
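
The conservation criterion quoted above (Shannon index <2.0) can be illustrated with a per-position Shannon entropy over aligned epitope sequences. This is a minimal sketch under assumptions that the abstract does not confirm: entropy is taken in bits (log base 2), an epitope counts as conserved only if every aligned position stays below the threshold, and `shannon_entropy`, `column_entropies`, and `is_conserved` are hypothetical helper names.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (in bits, an assumed base) of the residues in one column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def column_entropies(aligned_sequences):
    """Entropy of each position across equally long, aligned sequences."""
    return [shannon_entropy(col) for col in zip(*aligned_sequences)]


def is_conserved(aligned_sequences, threshold=2.0):
    """Assumed rule: every position must have entropy below the threshold."""
    return all(h < threshold for h in column_entropies(aligned_sequences))


variants = ["SLLTEV", "SLLTEV", "SLITEV", "SLLTEI"]
print(column_entropies(variants))
print(is_conserved(variants))
```

A position where all strains carry the same residue has entropy 0, and the value grows as the residues become more evenly spread across alternatives.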

An Efficient Sequence Matching Method for XML Query Processing (XML 질의 처리를 위한 효율적인 시퀀스 매칭 기법)

  • Seo, Dong-Min;Song, Seok-Il;Yoo, Jae-Soo
    • Journal of KIISE:Databases / v.35 no.4 / pp.356-367 / 2008
  • As XML is gaining unqualified success in being adopted as a universal data representation and exchange format, particularly on the World Wide Web, the problem of querying XML documents poses interesting challenges to database researchers. Several structural XML query processing methods, including XISS and XR-tree, have been proposed in past years for fast query processing. However, structural XML query processing requires expensive join costs for twig path queries. Recently, sequence matching based XML query processing methods, including ViST and PRIX, have been proposed to solve this problem of structural XML query processing methods. Sequence matching based XML query processing methods match structured queries against structured data as a whole, without breaking down the queries into sub-queries of paths or nodes and relying on join operations to combine their results. However, ViST determines structural relationships incorrectly because its numbering scheme is not optimized, and PRIX requires much processing time for matching the LPS and NPS of XML data trees and queries. Therefore, in this paper, we propose an efficient sequence matching method using bottom-up query processing for efficient XML query processing. Also, to verify the superiority of our index structure, we compare our sequence matching method with ViST and PRIX in terms of query processing with linear paths or twig paths including wild-cards ('*' and '//').

Conjunctive Boolean Query Optimization based on Join Sequence Separability in Information Retrieval Systems (정보검색시스템에서 조인 시퀀스 분리성 기반 논리곱 불리언 질의 최적화)

  • 박병권;한욱신;황규영
    • Journal of KIISE:Databases / v.31 no.4 / pp.395-408 / 2004
  • A conjunctive Boolean text query refers to a query that searches for text documents containing all of the specified keywords, and is the most frequently used query form in information retrieval systems. Typically, the query specifies a long list of keywords for better precision, and in this case, the order of keyword processing has a significant impact on the query speed. Currently known approaches to this ordering are based on heuristics and, therefore, cannot guarantee an optimal ordering. We can use a systematic approach by leveraging a database query processing algorithm such as dynamic programming, but it is not suitable for a text query with a typically long list of keywords because of the algorithm's exponential run-time ($O(n2^{n-1})$) for n keywords. Considering these problems, we propose a new approach based on a property called join sequence separability. This property states that the optimal join sequence is separable into two subsequences of different join methods under a certain condition on the joined relations, and this property enables us to find a globally optimal join sequence in $O(n2^{n-1})$. In this paper, we describe the property formally, present an optimization algorithm based on the property, prove that the algorithm finds an optimal join sequence, and validate our approach through simulation using an analytic cost model. Comparison with the heuristic text query optimization approaches shows up to 100 times faster query processing, and comparison with the dynamic programming approach shows exponentially faster query optimization (e.g., 600 times for a 10-keyword query).
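
To see why keyword processing order matters, the sketch below evaluates a conjunctive query over a toy inverted index by intersecting posting sets from the shortest to the longest. This is one common heuristic ordering of the kind the abstract contrasts itself with, not the join sequence separability algorithm; the index layout (keyword mapped to a set of document ids) and the name `conjunctive_query` are assumptions made for illustration.

```python
def conjunctive_query(index, keywords):
    """Return ids of documents containing every keyword.

    Heuristic (assumed, not the paper's method): intersect the shortest
    posting sets first so intermediate results stay small.
    """
    postings = [index.get(word, set()) for word in keywords]
    if not postings:
        return set()
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break  # no document can match once the intersection is empty
    return result


# Toy inverted index: keyword -> set of document ids (hypothetical data).
index = {
    "sequence": {1, 2, 3, 4, 5, 6},
    "database": {2, 3, 5, 7},
    "join": {3, 5},
}
print(conjunctive_query(index, ["sequence", "database", "join"]))
```

Starting from the rarest keyword bounds every later intersection by the smallest posting set, which is why ordering alone can change query cost even though the final answer is the same.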