Analysis and Subclass Classification of Microarray Gene Expression Data Using Computational Biology

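  • ChangKyoo Yoo (Department of Chemical Engineering, Pohang University of Science and Technology) ;
  • Min Young Lee (Department of Chemical Engineering, Pohang University of Science and Technology) ;
  • Young Hwang Kim (Department of Chemical Engineering, Pohang University of Science and Technology) ;
  • In-Beum Lee (Department of Chemical Engineering, Pohang University of Science and Technology)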
  • Published : 2005.10.01


Microarray technologies, which simultaneously monitor the expression patterns of thousands of individual genes in different biological systems, have tremendously increased the amount of available gene expression data and have provided new insights into gene expression during drug development, within disease processes, and across species. There is a great need for data mining methods that allow straightforward interpretation, visualization, and analysis of the relevant information contained in gene expression profiles. In particular, classifying biological samples into known classes or phenotypes is an important practical application of microarray gene expression profiles; profiles obtained from patient tissue samples thus allow cancer classification. In this research, molecular classification of microarray gene expression data is applied to multi-class cancer using computational biology methods such as gene selection, principal component analysis, and fuzzy clustering. The proposed method was applied to microarray data from leukemia patients; specifically, it was used to interpret the gene expression patterns and to analyze the leukemia subtypes whose expression profiles correlated with four cases of acute leukemia gene expression. A basic introduction to microarray data analysis is also given.
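The three steps named in the abstract (gene selection, principal component analysis, fuzzy clustering) can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation: the abstract does not state the selection criterion, so variance ranking is assumed here, the clustering is a textbook fuzzy c-means, and the function names, parameters, and synthetic data are all hypothetical.

```python
import numpy as np

def select_genes(X, k):
    """Keep the k genes (columns) with the largest variance across samples.
    Variance ranking is an assumed criterion, not necessarily the paper's."""
    idx = np.argsort(X.var(axis=0))[::-1][:k]
    return X[:, idx], idx

def pca_scores(X, n_components):
    """Project mean-centered samples onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means: returns memberships (n x c) and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        W = U ** m                             # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Synthetic stand-in for an expression matrix: 40 samples x 200 genes,
# four groups, each with five up-shifted genes (illustrative data only).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 10)
X = rng.normal(size=(40, 200))
for g in range(4):
    X[labels == g, g * 5:(g + 1) * 5] += 4.0

X_sel, kept = select_genes(X, k=20)
scores = pca_scores(X_sel, n_components=3)
U, centers = fuzzy_c_means(scores, c=4)
print(U.argmax(axis=1))                        # hard label = largest membership
```

Each row of `U` holds one sample's degree of membership in every cluster, so a sample lying between two subtypes appears with split memberships rather than a forced single label.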


