Analyzing Exon Structure with PCA and ICA of Short-Time Fourier Transform

  • Hwang Changha (Dept. of Statistical Information, Catholic University of Daegu) ;
  • Sohn Insuk (Dept. of Statistics, Korea University)
  • Published : 2004.11.01

Abstract

We use principal component analysis (PCA) to identify exons of a gene and further analyze their internal structures. The PCA is conducted on the short-time Fourier transform (STFT) based on the 64 codon sequences and the 4 nucleotide sequences. By comparing to independent component analysis (ICA), we can differentiate between the exon and intron regions, and how they are correlated in terms of the square magnitudes of STFTs. The experiment is done on the gene F56F11.4 in the chromosome III of C. elegans. For this data, the nucleotide based PCA identifies the exon and intron regions clearly. The codon based PCA reveals a weak internal structure in some exon regions, but not the others. The result of ICA shows that the nucleotides thymine (T) and guanine (G) have almost all the information of the exon and intron regions for this data. We hypothesize the existence of complex exon structures that deserve more detailed analysis.

Keywords