Search | Korea Science

Computational Detection of Prokaryotic Core Promoters in Genomic Sequences

Kim Ki-Bong;Sim Jeong Seop
- Journal of Microbiology / v.43 no.5 / pp.411-416 / 2005
The high-throughput sequencing of microbial genomes has resulted in the relatively rapid accumulation of an enormous amount of genomic sequence data. In this context, the problem of detecting promoters in genomic DNA sequences via computational methods has attracted considerable research attention in recent years. This paper addresses the development of a predictive model, known as the dependence decomposition weight matrix model (DDWMM), which was designed to detect the core promoter region, including the -10 region and the transcription start sites (TSSs), in prokaryotic genomic DNA sequences. This is an issue of some importance with regard to genome annotation efforts. Our predictive model captures the most significant dependencies between positions (allowing for nonadjacent as well as adjacent dependencies) via the maximal dependence decomposition (MDD) procedure, which iteratively decomposes data sets into subsets based on the significant dependence between positions in the promoter region to be modeled. Such dependencies may be intimately related to biological and structural concerns, since promoter elements are present in a variety of combinations, which are separated by various distances. In this respect, the DDWMM may prove to be appropriate for the detection of core promoter regions and TSSs in long microbial genomic contigs. To demonstrate the effectiveness of our predictive model, we performed 10-fold cross-validation experiments on 607 experimentally verified promoter sequences, which showed good performance in terms of sensitivity.

Computational Analysis of Neighboring Genes on Arabidopsis thaliana Chromosomes 4 and 5: Their Genomic Association as Functional Subunits

Goh, Sung-Ho;Kim, Tae-Hyung;Kim, Jee-Hyub;Nam, DouGu;Choi, Doil;Hur, Cheol-Goo
- Genomics & Informatics / v.1 no.1 / pp.40-49 / 2003
Genes related to specific events or pathways in bacteria are frequently located close to their neighbors in the genome, as in the structures known as operons, whereas eukaryotic genes seem to be independent of their neighbors and are dispersed randomly throughout genomes. Although such cases are rare, the discovery of structures similar to prokaryotic operons in the nematode genome, and the clustering of housekeeping genes in the human genome, led us to assess the genomic association of genes as functional subunits. We evaluated the genomic association of neighboring genes on chromosomes 4 and 5 of Arabidopsis thaliana, with and without consideration of the scaffold/matrix-attached region (S/MAR) loci. The observed numbers of functionally identical bigrams and trigrams were significantly higher than expected, and these results were verified statistically by calculating p-values for weighted random distributions. The observed frequencies of functionally identical bigrams and trigrams were much higher in chromosome 4 than in chromosome 5, but the frequencies with and without consideration of the S/MAR in each chromosome were similar. This study suggests a genomic association among functionally related neighboring genes in Arabidopsis thaliana.

Prediction of Core Promoter Region with Dependency-Reflecting Decomposition Model (의존성 반영 분해모델에 의한 유전자의 핵심 프로모터 영역 예측)

김기봉;박기정;공은배
- Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.379-387 / 2003
Many microbial genome projects have been completed, producing an enormous amount of genomic sequence data. In this context, the problem of identifying promoters in genomic DNA sequences by computational methods has attracted considerable research attention in recent years. In this paper, we propose a new model of the prokaryotic core promoter region, including the -10 region and the transcription initiation site: the Dependency-Reflecting Decomposition Model (DRDM), which captures the most significant biological dependencies between positions (allowing for non-adjacent as well as adjacent dependencies). DRDM performed well in performance tests and could be employed effectively in predicting promoters in long microbial genomic contigs.

Introduction to Gene Prediction Using HMM Algorithm

Kim, Keon-Kyun;Park, Eun-Sik
- Journal of the Korean Data and Information Science Society / v.18 no.2 / pp.489-506 / 2007
Gene structure prediction, which predicts the protein-coding regions in a given nucleotide sequence, is the most important process in annotating genes and greatly affects gene analysis and genome annotation. As eukaryotic genes have more complicated structures in DNA sequences than prokaryotic genes, analysis programs for eukaryotic gene structure prediction have more diverse and more complicated computational models. Gene prediction methods for eukaryotic genes include ab initio, similarity-based, and ensemble methods, each of which uses various algorithms. This paper introduces how to predict genes using the hidden Markov model (HMM) algorithm and presents the process of gene prediction with well-known gene prediction programs.

Conserved Genes and Metabolic Pathways in Prokaryotes of the Same Genus (동일한 속 원핵생물들의 보존 유전자와 대사경로)

Lee, Dong-Geun;Lee, Sang-Hyeon
- Journal of Life Science / v.29 no.1 / pp.123-128 / 2019
The use of 16S rDNA is commonplace in the determination of prokaryotic species. However, it has limitations, and there are few studies at the genus level. We investigated conserved genes and metabolic pathways at the genus level in 28 strains of 13 genera of prokaryotes using the COG database (conserved genes) and MetaCyc database (metabolic pathways). The ratio of conserved genes to total genes (core genome) at the genus level ranged from 27.62% (Nostoc genus) to 71.76% (Spiribacter genus), with an average of 46.72%. A lower core-genome ratio means a higher ratio of genes peculiar to each prokaryote, suggesting that specific biological activities or habitats may vary. The ratio of common metabolic pathways at the genus level was higher than the ratio of core genomes, from 58.79% (Clostridium genus) to 96.31% (Mycoplasma genus), with an average of 75.86%. When compared among other genera, members of the same genus were positioned in the closest nodes to each other. Interestingly, Bacillus and Clostridium genera were positioned in closer nodes than those of the other genera. Archaebacterial genera were grouped together in the ortholog and metabolic pathway nodes in a phylogenetic tree. The genera Granulicella, Nostoc, and Bradyrhizobium of the Acidobacteria, Cyanobacteria, and Proteobacteria phyla, respectively, were grouped in an ortholog content tree. The results of this study can be used for (i) the identification of common genes and metabolic pathways at each phylogenetic level and (ii) the improvement of strains through horizontal gene transfer or site-directed mutagenesis.
https://doi.org/10.5352/JLS.2019.29.1.123

Synonymy of Micropolyspora internatus and Saccharomonospora viridis and Emended Description of Saccharomonospora viridis

Shin, Su-Kyoung;Cho, Yong-Joon;Yi, Hana
- Journal of Microbiology and Biotechnology / v.27 no.1 / pp.149-154 / 2017
Transfer of Micropolyspora internatus into the genus Saccharomonospora as "Saccharomonospora internatus comb. nov." was proposed by Kurup and Greiner-Mai, but the nomenclatural change has not been validly published. Although the inclusion of M. internatus in the genus Saccharomonospora has not been established, the synonymy of "Saccharomonospora internatus" and S. viridis was proposed by Greiner-Mai. A number of recent publications regarded M. internatus as a synonym of S. viridis, but the name M. internatus is still used in some cases instead of S. viridis. This is because of the complicated history of M. internatus and S. viridis, but it is different from the generally accepted view of prokaryotic taxonomy. To clearly verify the synonymy of M. internatus and S. viridis, a literature review and experimental verification were conducted in this study. Based on the genomic and phenotypic characteristics obtained in this study, the synonymy of the two species was obvious. The emended description of S. viridis is given.
https://doi.org/10.4014/jmb.1610.10033

Developing a Bioinformatics Tool for Peptide Nucleic Acid (PNA) antisense Technique Utilizing Parallel Computing System (Peptide Nucleic Acid(PNA)를 이용한 antisense 기법에 적용할 병렬 컴퓨팅용 Bioinformatics tool 개발)

Kim Seong-Jo;Jeon Ho-Sang;Hong Seung-Pyo;Kim Hyon-Chang;Kim Han-Jip;Min Churl-K
- Proceedings of the Korean Information Science Society Conference / 2006.06a / pp.43-45 / 2006
Unlike RNA interference, whose usage is limited to eukaryotic cells, the Peptide Nucleic Acid (PNA) technique is applicable to both eukaryotic and prokaryotic cells. PNA has been proven to be an effective agent for blocking gene expression and has several advantages over other antisense techniques. Here we developed parallel computing software that provides the ideal sequences for designing PNA oligos that avoid off-target effects. We applied a new approach in our location-finding algorithm, which finds a target gene in the whole genome sequence. Message Passing Interface (MPI) was used to perform parallel computing in order to reduce the calculation time. The software will help biologists design more accurate and effective antisense PNA by minimizing the chance of off-target effects.

Sequencing of the RSDA Gene Encoding Raw Starch-Digesting $\alpha$-Amylase of Bacillus circulans F-2: Identification of Possible Two Domains for Raw Substrate-Adsorption and Substrate-Hydrolysis

Kim, Cheorl-Ho
- Journal of Microbiology and Biotechnology / v.2 no.1 / pp.56-65 / 1992
The complete nucleotide sequence of the Bacillus circulans F-2 RSDA gene, coding for raw starch-digesting $\alpha$-amylase (RSDA), has been determined. The RSDA structural gene consists of an open reading frame of 2508 bp. Six bp upstream of the translational start codon of the RSDA is a typical gram-positive Shine-Dalgarno sequence, and the RSDA encodes a preprotein of 836 amino acids with an Mr of 96,727. The gene was expressed from its own regulatory region in E. coli, and two putative consensus promoter sequences were identified upstream of a ribosome binding site and an ATG start codon. Confirmation of the nucleotide sequence was obtained and the signal peptide cleavage site was identified by comparing the predicted amino acid sequence with that derived by N-terminal analysis of the purified RSDA. The deduced N-terminal region of the RSDA conforms to the general pattern for the signal peptides of secreted prokaryotic proteins. The complete amino acid sequence was deduced and homology with other enzymes was compared. The results suggested that the Thr-Ser-rich hinge region and the non-catalytic domain are necessary for efficient adsorption onto raw substrates, and the catalytic domain (60 kDa) is necessary for the hydrolysis of substrates, as suggested in previous studies (8, 9).

Phylogenetic Analysis of 680 Prokaryotes by Gene Content (유전자 보유 계통수를 이용한 원핵생물 680종의 분석)

Lee, Dong-Geun;Lee, Sang-Hyeon
- Journal of Life Science / v.26 no.6 / pp.711-720 / 2016
To determine the degree of common genes and the phylogenetic relationships among 680 genome-sequenced prokaryotes, the similarities in the presence/absence of 4,631 clusters of orthologous groups of proteins (COGs) and gene content trees were analyzed. The number of COGs was in the range of 103–2,199 (mean 1,377.1) among the 680 prokaryotes. Candidatus Nasuia deltocephalinicola str. NAS-ALF, an obligate symbiont of insects, had the fewest COGs, while Pseudomonas aeruginosa PAO1, an opportunistic pathogen, had the most. The similarities between two prokaryotes were 49.30–99.78% (mean 72.65%). Methanocaldococcus jannaschii DSM 2661 (hyperthermophilic and autotrophic, Euryarchaeota phylum) and Mesorhizobium loti MAFF303099 (mesophilic and symbiotic, alpha-Proteobacteria class) had the lowest similarity. As gene content may represent the potential for an organism to adapt to each habitat, this may represent the history of prokaryotic evolution or the range of prokaryotic habitats at present on earth. The COG content trees showed the following. First, two members of the Chloroflexi phylum (Dehalogenimonas lykanthroporepellens BL-DC-9 and Dehalococcoides mccartyi 195) showed a closer relationship with Archaea than with other Eubacteria. Second, members of the same phylum or class in the 16S rRNA gene tree were separated in the COG content tree. Finally, delta- and epsilon-Proteobacteria were in different lineages from other Proteobacteria classes in neighbor-joining (NJ) and maximum likelihood (ML) trees. The results of this study should be valuable for identifying the origins of organisms, functional relationships, and useful genes.
https://doi.org/10.5352/JLS.2016.26.6.711

Investigation of COGs (Clusters of Orthologous Groups of proteins) in 1,309 Species of Prokaryotes (원핵생물 1,309종에 분포된 COGs (Clusters of Orthologous Groups of proteins) 연구)

Lee, Dong-Geun;Lee, Sang-Hyeon
- Journal of Life Science / v.31 no.9 / pp.834-839 / 2021
The authors previously reported the results of analyses of COGs (Clusters of Orthologous Groups of proteins) in 711 prokaryotes. The COG data were significantly updated for 2020 using 1,309 prokaryotic genomes. Here, we report the results of analyses of 3,455,853 proteins comprising 4,877 updated COGs in terms of COGs and prokaryotes. The number of COGs in each prokaryote ranged from 97 to 2,281, with an average of 1,430.0 and a standard deviation of 414.2. Mean numbers of COGs at the phylum level ranged from a minimum of 497.86 for Mollicutes to a maximum of 1,642.90 for Cyanobacteria. The top 10 species with the highest COG retention numbers were all Proteobacteria, and 9 out of the bottom 10 were those that could not be cultured in vitro. The number of proteins belonging to each COG ranged from 2 to 22,048, with each of the top 11 COGs containing over 12,000 proteins. Five of the top 11 were COGs that bind to DNA and are involved in gene expression, indicating the importance of regulating gene expression in prokaryotes in a changing environment. COG data are expected to be widely utilized, as they can be used for the identification of genes included in a genome and the selection of genes for strain improvement.
https://doi.org/10.5352/JLS.2021.31.9.834

