Correlation between Expression Level of Gene and Codon Usage

  • Hwang, Da-Jung (Department of Computer Science and Engineering, Pohang University of Science and Technology)
  • Han, Joon-Hee (Department of Computer Science and Engineering, Pohang University of Science and Technology)
  • Raghava, G P S (Department of Computer Science and Engineering, Pohang University of Science and Technology, Bioinformatics Centre, Institute of Microbial Technology)
  • Published : 2004.11.04

Abstract

In this study, we analyzed the gene expression data of Saccharomyces cerevisiae obtained from Holstege et al. (1998) to understand the relationship between the expression level and the nucleotide sequence of a gene. First, the correlation between gene expression and the percent composition of each type of nucleotide was computed. Nucleotides 'G' and 'C' show a positive correlation (r ${\geq}$ 0.15), 'A' shows a negative correlation (r ${\approx}$ -0.21), and 'T' shows no correlation (r ${\approx}$ 0.00) with gene expression. 'G+C'-rich genes were also found to be expressed at higher levels than 'A+T'-rich genes. We observed an inverse correlation between the composition of a nucleotide at the genome level and the level of gene expression. We then computed the correlation between dinucleotide (e.g. AA, AT, GC) composition and gene expression and observed a wide variation in correlation (from r = -0.45 for AT to r = 0.35 for GT). Dinucleotides that contain 'T' show a wide range of correlation with gene expression: for example, GT and CT have a high positive correlation, whereas AT has a high negative correlation. We also computed the correlation between trinucleotide (codon) composition and gene expression and again observed a wide range of correlation (from r = -0.45 for ATA to r = 0.45 for GGT). The major codons of a large number of amino acids show a positive correlation with expression level, but there are a few amino acids whose major codons show a negative correlation. These observations clearly indicate a relationship between nucleotide composition and expression level. We also demonstrate that codon composition can be used to predict the expression of a gene in a given condition. Software has been developed for calculating the correlation between gene expression and codon usage.
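
The codon-level analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' software, whose implementation is not described here. It assumes that r is the Pearson correlation coefficient (the abstract does not name the coefficient) and that composition means the percentage of in-frame codons in a coding sequence. The function names `codon_composition`, `pearson_r`, and `codon_expression_correlation` are hypothetical, as is the input format of (coding sequence, expression level) pairs.

```python
from collections import Counter
from math import sqrt


def codon_composition(cds):
    """Percent composition of each in-frame codon in a coding sequence.

    Assumption: the sequence starts in frame; trailing bases that do not
    form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons)
    if total == 0:
        return {}
    return {codon: 100.0 * n / total for codon, n in Counter(codons).items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, or None if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined without variation
    return cov / sqrt(var_x * var_y)


def codon_expression_correlation(genes):
    """Correlate each codon's percent composition with expression level.

    genes: list of (coding_sequence, expression_level) pairs (hypothetical
    input format). Returns a dict mapping codon -> r.
    """
    compositions = [codon_composition(cds) for cds, _ in genes]
    expression = [level for _, level in genes]
    all_codons = sorted(set().union(*compositions))
    result = {}
    for codon in all_codons:
        # A codon absent from a gene contributes 0 percent for that gene.
        percents = [comp.get(codon, 0.0) for comp in compositions]
        result[codon] = pearson_r(percents, expression)
    return result
```

Calling `codon_expression_correlation` on a list of genes returns one coefficient per codon, which can then be sorted to find the codons most positively or negatively associated with expression. The same approach would extend to mononucleotide or dinucleotide composition by changing the counting function.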

Keywords