Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering

  • Chang, Jeong-Ho (Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University)
  • Chi, Sung Wook (Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University)
  • Zhang, Byoung Tak (Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University)
  • Published : 2003.09.01


We present a latent variable model-based approach to the analysis of gene expression patterns, coupled with topographic clustering. The aspect model, a latent variable model for dyadic data, is applied to extract latent patterns underlying complex variations in gene expression levels. Topographic clustering is then performed to find coherent groups of genes, based on the extracted latent patterns as well as individual gene expression behaviors. Applied to cell cycle-regulated genes of the yeast Saccharomyces cerevisiae, the proposed method discovered biologically meaningful patterns related to characteristic expression behavior in particular cell cycle phases. In addition, displaying the variation in the composition of these latent patterns on the cluster map made the resulting cluster structure easier to interpret. From this, we argue that latent variable models, coupled with topographic clustering, are a promising tool for exploratory analysis of gene expression data.
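
The latent pattern extraction step described above can be illustrated with a minimal sketch of an aspect model fitted by EM. This is not the authors' implementation: it assumes the standard aspect model factorization P(g, c) = sum_z P(z) P(g|z) P(c|z) over genes g and experimental conditions c, and it assumes the expression matrix has been made nonnegative so that entries can be treated as co-occurrence weights. The function names `fit_aspect_model` and `aspect_posteriors`, the toy data, and all parameter values are hypothetical. The topographic clustering step is not shown.

```python
import numpy as np


def fit_aspect_model(counts, n_aspects, n_iter=100, seed=0):
    """Fit P(g, c) = sum_z P(z) P(g|z) P(c|z) by EM.

    counts: nonnegative (genes x conditions) array, treated here as
    co-occurrence weights (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_conds = counts.shape

    # Random initialisation of the multinomial parameters.
    p_z = np.full(n_aspects, 1.0 / n_aspects)
    p_g_z = rng.random((n_aspects, n_genes))
    p_g_z /= p_g_z.sum(axis=1, keepdims=True)
    p_c_z = rng.random((n_aspects, n_conds))
    p_c_z /= p_c_z.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: posterior P(z | g, c), shape (aspects, genes, conditions).
        joint = p_z[:, None, None] * p_g_z[:, :, None] * p_c_z[:, None, :]
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)

        # M-step: re-estimate parameters from the expected weights.
        weighted = counts[None, :, :] * post
        mass = weighted.sum(axis=(1, 2)) + eps
        p_g_z = weighted.sum(axis=2) / mass[:, None]
        p_c_z = weighted.sum(axis=1) / mass[:, None]
        p_z = mass / mass.sum()

    return p_z, p_g_z, p_c_z


def aspect_posteriors(p_z, p_g_z):
    """Return P(z | g): each gene's composition over latent patterns."""
    joint = p_z[:, None] * p_g_z  # shape (aspects, genes)
    return (joint / joint.sum(axis=0, keepdims=True)).T


# Toy data: 30 hypothetical genes measured under 8 hypothetical conditions.
toy = np.random.default_rng(1).random((30, 8)) + 0.1
p_z, p_g_z, p_c_z = fit_aspect_model(toy, n_aspects=3, n_iter=50)
mix = aspect_posteriors(p_z, p_g_z)  # one row of mixing proportions per gene
```

Each row of `mix` gives a gene's mixing proportions over the latent patterns, which is the kind of per-gene composition that could be displayed on a cluster map.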


