Fuzzy Cluster Analysis of Gene Expression Profiles Using Evolutionary Computation and Adaptive ${\alpha}$-cut based Evaluation

  • Han-Saem Park (Department of Computer Science, Yonsei University) ;
  • Sung-Bae Cho (Department of Computer Science, Yonsei University)
  • Published : 2006.08.01

Abstract

Clustering is one of the most widely used methods for analyzing gene expression profiles: it groups thousands of genes by the similarity of their expression levels and has been used to identify the functions of genes. Fuzzy clustering, one category of clustering, assigns each sample to multiple groups according to its degrees of membership. It is better suited to gene expression profiles because a single gene may be involved in multiple genetic functions. Clustering methods, however, are sensitive to initialization and can become trapped in local optima. To address these problems, this paper proposes an evolutionary fuzzy clustering method. Its fitness evaluation uses an adaptive ${\alpha}$-cut based evaluation, which applies criteria chosen according to the characteristics of each dataset and thereby overcomes a limitation of the Bayesian validation method, which applies the same criterion to all datasets. We conducted experiments with the SRBCT and yeast cell-cycle datasets and analyzed the results to confirm the usefulness of the proposed method.

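Clustering of gene data divides a vast amount of gene information into similar groups according to expression level, and it has been used to analyze the functions of genes. Fuzzy clustering, one type of clustering, lets a single sample belong to several groups at once according to its degree of membership; because a single gene can carry several kinds of genetic information, it is a more appropriate method for analyzing gene expression data. Ordinary clustering methods, however, are sensitive to initial values and can fall into local optima. To resolve these drawbacks, this paper proposes a fuzzy clustering method that uses evolutionary computation. For fitness evaluation, it uses an adaptive ${\alpha}$-cut based evaluation method determined by considering the characteristics of the data, which improves on the drawback of the Bayesian validation method of applying the same criterion to all data. Experiments were conducted with the SRBCT data and the yeast cell-cycle data, and the results were analyzed to confirm the usefulness of the proposed method.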
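
The two ideas the abstract relies on, degrees of membership and an ${\alpha}$-cut over them, can be illustrated with a minimal sketch. This is not the paper's method: it uses the standard fuzzy c-means membership formula with fixed cluster centers and a fixed, hand-picked threshold, whereas the evolutionary search and the adaptive choice of ${\alpha}$ are not specified in the abstract. The function names `fuzzy_memberships` and `alpha_cut`, the fuzzifier `m`, and the toy data are all assumptions made for illustration.

```python
import math


def fuzzy_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means memberships (assumed formula, not taken from the paper).

    Returns one row per point; row[k] is the degree to which the point
    belongs to cluster k, and each row sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centers:
            # share full membership among those centers only.
            hits = [d == 0.0 for d in dists]
            row = [1.0 / sum(hits) if h else 0.0 for h in hits]
        else:
            row = [
                1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
                for d_k in dists
            ]
        memberships.append(row)
    return memberships


def alpha_cut(memberships, alpha):
    """For each sample, list the clusters whose membership is at least alpha."""
    return [[k for k, u in enumerate(row) if u >= alpha] for row in memberships]


if __name__ == "__main__":
    # Hypothetical one-dimensional "expression profiles" and two fixed centers.
    points = [(1.0, 0.0), (2.0, 0.0), (3.5, 0.0)]
    centers = [(0.0, 0.0), (4.0, 0.0)]
    u = fuzzy_memberships(points, centers)
    for row in u:
        print([round(x, 2) for x in row])
    print(alpha_cut(u, 0.4))
```

In this toy setting the middle point is equidistant from both centers, so a threshold of 0.4 keeps it in both clusters while the outer points each fall into a single cluster. A lower or higher threshold changes how many samples are shared between clusters, which is why a dataset-dependent choice of ${\alpha}$ can matter.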
