Clustering Approaches to Identifying Gene Expression Patterns from DNA Microarray Data

Do, Jin Hwan; Choi, Dong-Kug

Molecules and Cells

Volume 25 Issue 2 / Pages 279-288 / 2008 / 1016-8478 (pISSN) / 0219-1032 (eISSN)

Korean Society for Molecular and Cellular Biology (한국분자세포생물학회)

Do, Jin Hwan (Bio-Food and Drug Research Center, Konkuk University) ;
Choi, Dong-Kug (Department of Biotechnology, Konkuk University)

Received: 2007.07.13
Accepted: 2007.11.05
Published: 2008.04.30

KSCI

Abstract

The analysis of microarray data is essential for making sense of the large amounts of gene expression data that these experiments generate. In this review we focus on clustering techniques. The biological rationale for this approach is that many co-expressed genes are co-regulated, so identifying co-expressed genes can aid in the functional annotation of novel genes, the de novo identification of transcription factor binding sites, and the elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same dataset may vary considerably depending on the algorithm and the dissimilarity metric used, as well as on user-selectable parameters such as the desired number of clusters and the initial values. Therefore, biologists who want to interpret microarray data should be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering DNA microarray data, from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps to more complex algorithms such as fuzzy clustering.
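
As a minimal illustration of why the choice of dissimilarity metric matters, the following Python sketch compares Euclidean distance with a Pearson correlation-based distance on three toy expression profiles. The profiles, the gene names and the helper functions (`euclidean`, `pearson_distance`, `nearest`) are hypothetical and are not taken from the review; they only show that the same data can yield different nearest neighbours, and hence potentially different clusters, under different metrics.

```python
import math

def euclidean(x, y):
    """Euclidean distance: sensitive to the magnitude of expression."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """1 - Pearson correlation: sensitive to the shape of the profile only."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)

# Hypothetical expression profiles over four conditions (illustrative only).
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [10.0, 20.0, 30.0, 40.0],  # same shape as gene_a, larger scale
    "gene_c": [2.0, 1.0, 4.0, 3.0],      # similar scale, different shape
}

def nearest(name, metric):
    """Return the name of the profile closest to `name` under `metric`."""
    others = [other for other in profiles if other != name]
    return min(others, key=lambda other: metric(profiles[name], profiles[other]))

if __name__ == "__main__":
    print("Euclidean nearest to gene_a:", nearest("gene_a", euclidean))
    print("Correlation nearest to gene_a:", nearest("gene_a", pearson_distance))
```

Under the Euclidean metric, gene_a is closest to the profile of similar magnitude, whereas under the correlation-based metric it is closest to the profile with the same shape. This is one concrete way in which the metric alone can change which genes are grouped as co-expressed.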

Acknowledgement

Supported by: Korea Research Foundation
