A Review of Cluster Analysis for Time Course Microarray Data

  • Sohn In-Suk (Department of Statistics, Korea University)
  • Lee Jae-Won (Department of Statistics, Korea University)
  • Kim Seo-Young (Researcher, Research Institute for Basic Science, Chonnam National University)
  • Published: 2006.03.01

Abstract

Biologists are attempting to group genes by the temporal pattern of their expression levels. Many methods have been proposed for clustering microarray data, and most comparative studies so far have focused on the clustering methods themselves. However, clustering results depend on which genes are first selected as showing a meaningful change in expression, so the gene selection step also deserves careful attention. This paper therefore presents a broad comparative study of cluster analysis for time course microarray data that considers three factors: the choice of gene selection method, the choice of clustering method, and the choice of cluster validation method.
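The sketch below runs one path through a pipeline with these three factors, using made-up data. It is a minimal pure-Python illustration under stated assumptions and is not the procedure used in the paper. The toy profiles in `DATA` are invented. The range filter in `select_genes` is a hypothetical stand-in for a proper test of significant change over time. k-means with Euclidean distance and the average silhouette width are only one possible choice of clustering method and validation index.

```python
import math
import random

# Hypothetical toy time-course data: gene -> expression at 5 time points.
DATA = {
    "up1":   [0.0, 1.0, 2.0, 3.0, 4.0],
    "up2":   [0.0, 1.2, 2.1, 2.9, 4.2],
    "down1": [4.0, 3.0, 2.0, 1.0, 0.0],
    "down2": [4.1, 2.8, 2.2, 0.9, 0.1],
    "flat1": [1.0, 1.1, 0.9, 1.0, 1.05],
    "flat2": [2.0, 2.05, 1.95, 2.0, 2.1],
}


def select_genes(data, min_range):
    """Step 1 (gene selection): keep genes whose range over time is >= min_range.

    A crude, hypothetical filter standing in for a real test of significant change.
    """
    return {g: p for g, p in data.items() if max(p) - min(p) >= min_range}


def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(profiles, k, n_iter=100, seed=0):
    """Step 2 (clustering): Lloyd's k-means; returns one cluster label per profile."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(profiles, k)]  # k random genes as seeds
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in profiles]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(profiles, labels):
    """Step 3 (validation): average silhouette width; near 1 = compact, well separated."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(profiles)

    def mean_dist(i, c):
        # Mean distance from profile i to the other members of cluster c.
        ds = [dist(profiles[i], profiles[j]) for j in range(n) if j != i and labels[j] == c]
        return sum(ds) / len(ds) if ds else 0.0

    total = 0.0
    for i in range(n):
        if labels.count(labels[i]) == 1:
            continue  # a gene alone in its cluster contributes 0 by convention
        a = mean_dist(i, labels[i])
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


if __name__ == "__main__":
    # Same clustering and validation, with and without the selection step.
    for name, data in [("all genes", DATA), ("selected genes", select_genes(DATA, 1.0))]:
        profiles = list(data.values())
        labels = kmeans(profiles, k=2)
        print(name, dict(zip(data, labels)), round(silhouette(profiles, labels), 3))
```

The script clusters the full toy set and the filtered set with the same k. Comparing the printed labels and silhouette values is one simple way to see how the selection step alone can change the outcome. Swapping in a different filter, distance or validation index changes only the corresponding function.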
