Validation Measures of Bicluster Solutions

  • Lee, Young-Rok (Department of Industrial and Management Engineering, Pohang University of Science and Technology)
  • Lee, Jeong-Hwa (Department of Industrial and Management Engineering, Pohang University of Science and Technology)
  • Jun, Chi-Hyuck (Department of Industrial and Management Engineering, Pohang University of Science and Technology)
  • Received : 2009.05.04
  • Accepted : 2009.05.28
  • Published : 2009.06.30

Abstract

Biclustering extracts subsets of objects and subsets of features from a dataset such that the selected objects share a characteristic pattern over the selected features. In contrast to traditional clustering algorithms, which group objects that are similar across the whole feature set, biclustering methods find groups of objects that have similar values or patterns on only some of the features. In both clustering and biclustering, assessing how informative and reliable the result is remains a very important task. Whereas validation methods for cluster solutions have been studied actively, only a few measures exist for validating bicluster solutions. Furthermore, the existing validation methods for bicluster solutions have critical problems that limit their use in general cases. In this paper, we review several well-known validation measures for cluster and bicluster solutions and discuss their limitations. We then propose several improved validation indices as modified versions of existing ones.
