• Title/Summary/Keyword: hmmer

Algorithm for Predicting Functionally Equivalent Proteins from BLAST and HMMER Searches

  • Yu, Dong Su;Lee, Dae-Hee;Kim, Seong Keun;Lee, Choong Hoon;Song, Ju Yeon;Kong, Eun Bae;Kim, Jihyun F.
    • Journal of Microbiology and Biotechnology / v.22 no.8 / pp.1054-1058 / 2012
  • To predict biologically significant attributes such as function from protein sequences, searching large databases for homologous proteins is common practice. In particular, BLAST and HMMER are widely used in a variety of biological fields. However, sequence-homologous proteins determined by BLAST and proteins having the same domains predicted by HMMER are not always functionally equivalent, even when their sequences align with high similarity. Thus, accurate assignment of functionally equivalent proteins from aligned sequences remains a challenge in bioinformatics. We have developed the FEP-BH algorithm to predict functionally equivalent proteins from protein-protein pairs identified by BLAST and from protein-domain pairs predicted by HMMER. When examined against domain classes of the Pfam-A seed database, FEP-BH showed 71.53% accuracy, whereas BLAST and HMMER showed 57.72% and 36.62%, respectively. We expect that the FEP-BH algorithm will be effective in predicting functionally equivalent proteins from BLAST and HMMER outputs and will also suit biologists who want to find functionally equivalent proteins among sequence-homologous proteins.
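The abstract does not describe the rules FEP-BH applies, so the following is not that algorithm. It is only a naive illustration, under assumed inputs, of the general idea of combining protein-protein pairs (as from BLAST) with protein-domain assignments (as from HMMER): a pair is kept only when both proteins carry the same non-empty set of domains. The function name, the data layout, and the protein and domain identifiers are all hypothetical.

```python
def same_domain_pairs(blast_pairs, domains_by_protein):
    """Keep protein pairs whose two members share an identical domain set.

    blast_pairs: iterable of (query_id, subject_id) tuples (assumed layout).
    domains_by_protein: dict mapping protein id -> set of domain names
        (assumed layout). Proteins with no known domains never match.
    """
    kept = []
    for query, subject in blast_pairs:
        query_domains = frozenset(domains_by_protein.get(query, ()))
        subject_domains = frozenset(domains_by_protein.get(subject, ()))
        # Require at least one domain and an exact match of the two sets.
        if query_domains and query_domains == subject_domains:
            kept.append((query, subject))
    return kept


# Hypothetical toy data, for illustration only.
pairs = [("protA", "protB"), ("protA", "protC"), ("protD", "protE")]
domains = {
    "protA": {"dom1", "dom2"},
    "protB": {"dom1", "dom2"},
    "protC": {"dom1"},
    # protD and protE have no domain assignments.
}
print(same_domain_pairs(pairs, domains))
```

An exact set match is a deliberately strict choice; a real method would likely also weigh alignment scores, coverage, and domain order.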

In silico Prediction of Angiogenesis-related Genes in Human Hepatocellular Carcinoma

  • Kang, Seung-Hui;Park, Jeong-Ae;Hong, Soon-Sun;Kim, Kyu-Won
    • Genomics & Informatics / v.2 no.3 / pp.134-141 / 2004
  • Hepatocellular carcinoma (HCC) is one of the most common malignancies worldwide and a typical hypervascular tumor. Therefore, it is important to find factors related to angiogenesis in the process of HCC malignancy. To find angiogenesis-related factors in HCC, we combined in silico prediction with an experimental assay. We analyzed 1457 genes extracted from cDNA microarrays of HCC patients by text mining, sequence similarity search, and domain analysis. As a result, we predicted that 16 genes were likely to be involved in angiogenesis, and the effects of these genes were then confirmed by a hypoxia response element (HRE)-luciferase assay. For instance, we classified osteopontin as a potent angiogenic factor and coagulation factor XII as a significant anti-angiogenic factor. Collectively, we suggest that a combination of in silico prediction and experimental approaches can identify HCC-specific angiogenesis-related factors effectively and rapidly.

Metagenome Analysis of Protein Domain Collocation within Cellulase Genes of Goat Rumen Microbes

  • Lim, SooYeon;Seo, Jaehyun;Choi, Hyunbong;Yoon, Duhak;Nam, Jungrye;Kim, Heebal;Cho, Seoae;Chang, Jongsoo
    • Asian-Australasian Journal of Animal Sciences / v.26 no.8 / pp.1144-1151 / 2013
  • In this study, protein domains with cellulase activity in goat rumen microbes were investigated using metagenomic and bioinformatic analyses. After the complete genome of goat rumen microbes was obtained using a shotgun sequencing method, 217,892,109 paired reads were filtered, retaining only those with 70% identity, 100-bp matches, and E-value thresholds below $10^{-10}$ using METAIDBA. These filtered contigs were assembled and annotated using blastN against the NCBI nucleotide database. As a result, a microbial community structure with 1431 species was analyzed, among which Prevotella ruminicola 23 and Butyrivibrio proteoclasticus B316 were the dominant groups. In parallel, 201 sequences related to cellulase activity (EC 3.2.1.4) were obtained through BLAST searches using the enzyme.dat file provided by the NCBI database. After translating the nucleotide sequences into protein sequences using Interproscan, 28 protein domains with cellulase activity were identified using the HMMER package with threshold E-values below $10^{-5}$. Cellulase activity protein domain profiling showed that major protein domains such as lipase GDSL, cellulase, and Glyco hydro 10 were present in bacterial species with strong cellulase activities. Furthermore, correlation plots clearly displayed a strong positive correlation between some protein domain groups, which was indicative of microbial adaptation in the goat rumen based on feeding habits. This is the first metagenomic analysis of cellulase activity protein domains from the goat rumen using bioinformatics.
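The E-value cutoff applied to HMMER results can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the hits were written in HMMER's whitespace-delimited per-target table format (as produced with the `--tblout` option), where the first field is the target name, the third is the query name, and the fifth is the full-sequence E-value. The sample rows, sequence identifiers, and scores are invented for illustration.

```python
def filter_tblout(lines, max_evalue=1e-5):
    """Return (target, query, evalue) for hits with E-value below max_evalue.

    Assumes HMMER --tblout layout: field 0 = target name, field 2 = query
    name, field 4 = full-sequence E-value. Comment lines start with '#'.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue < max_evalue:
            hits.append((target, query, evalue))
    return hits


# Invented sample rows (not real results), following the assumed layout.
SAMPLE_TBLOUT = """\
# target name   accession  query name  accession  E-value  score  bias ...
Cellulase       -  seq_001  -  1.2e-30  105.3  0.1  2.0e-30  104.6  0.1  1.0  1  0  0  1  1  1  1  -
Lipase_GDSL     -  seq_002  -  3.0e-03   12.4  0.0  4.1e-03   11.9  0.0  1.1  1  0  0  1  1  0  0  -
Glyco_hydro_10  -  seq_003  -  4.5e-12   48.0  0.2  6.0e-12   47.6  0.2  1.0  1  0  0  1  1  1  1  -
"""

for target, query, evalue in filter_tblout(SAMPLE_TBLOUT.splitlines()):
    print(f"{target}\t{query}\t{evalue:.1e}")
```

In practice the same cutoff can often be applied at search time instead of by post-filtering; parsing the table afterwards simply makes it easy to try several thresholds on one set of results.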

A Study on Construction of Integrated Prokaryotes Gene Prediction System (통합형 미생물 유전자 예측 시스템의 구축에 관한 연구)

  • Chang Jong-won;Ryoo Yoon-kyu;Ku Ja-hyo;Yoon Young-woo
    • Journal of the Institute of Convergence Signal Processing / v.6 no.1 / pp.27-32 / 2005
  • As genome sequencing has advanced at a surprising speed in a short period, an automatic genome annotation process has become a prerequisite. The most difficult step in genome annotation is finding the protein-coding genes within a genome. The two main targets of gene prediction are eukaryotes and prokaryotes; their genes have different structures, so their gene prediction methods also differ. To date, 200 of the 231 species with sequenced genomes are prokaryotes, so prokaryotes may be more appropriate than eukaryotes for biotechnology studies based on comparative genomics. Moreover, prokaryotic genes lack introns, which makes gene prediction easier. Earlier prokaryotic gene prediction methods have shown 80% to 90% accuracy, and recent studies aim at 100% accuracy. In this paper, gene prediction accuracy for the E. coli K-12 and S. typhi genomes was 98.5% and 98.7%, respectively, an improvement over the earlier GLIMMER.

In silico genome wide identification and expression analysis of the WUSCHEL-related homeobox gene family in Medicago sativa

  • Yang, Tianhui;Gao, Ting;Wang, Chuang;Wang, Xiaochun;Chen, Caijin;Tian, Mei;Yang, Weidi
    • Genomics & Informatics / v.20 no.2 / pp.19.1-19.15 / 2022
  • Alfalfa (Medicago sativa) is an important food and feed crop that is rich in minerals. The WUSCHEL-related homeobox (WOX) gene family plays important roles in plant development, and identifying putative gene families, their structure, and their potential functions is a primary step not only for understanding the genetic mechanisms behind various biological processes but also for genetic improvement. A variety of computational tools, including MAFFT, HMMER, hidden Markov models, Pfam, SMART, MEGA, ProtTest, BLASTn, and BRAD, among others, were used. Based on a systematic analysis of the alfalfa genome, we identified 34 MsWOX genes spread across eight chromosomes. This is an expansion of the gene family, which we attribute to observed chromosomal duplications. Sequence alignment analysis revealed 61 conserved proteins containing a homeodomain. Phylogenetic analysis revealed five evolutionary clades with 15 motif distributions. Gene structure analysis revealed various exon, intron, and untranslated structures that are consistent among genes from similar clades. Functional prediction of promoter regions revealed various transcription factor binding sites for key growth, development, and stress-responsive transcription factor families such as MYB, ERF, AP2, and NAC, which are spread across the genes. Most of the genes are predicted to localize to the nucleus. There are also duplication events in some genes, which explain the expansion of the family. The present research provides clues to the potential roles of MsWOX family genes that will be useful for further understanding their functions in alfalfa.