Cancer-Subtype Classification Based on Gene Expression Data

Cancer Type Classification Method Using Gene Expression Data

  • Published : 2004.12.01

Abstract

Gene expression data, a product of high-throughput technology, have recently become widely available, and the studies built on them (so-called bioinformatics) now occupy an important position in biological and medical research. The microarray is a technology that allows several thousand genes to be monitored simultaneously, giving insight at the molecular level into phenomena in the human body such as the mechanism of cancer progression. Extracting useful information from such gene expression measurements requires appropriate analysis techniques. However, the high dimensionality of the data causes problems such as the curse of dimensionality and singular matrices in matrix computations, which makes conventional data analysis methods difficult to apply. Developing methods that can handle such data effectively has therefore become a challenging issue in computational biology. This research focuses on gene selection and classification for cancer subtype discrimination based on gene expression (microarray) data.
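
The singularity problem mentioned in the abstract can be illustrated with a minimal sketch on synthetic data. It is not the authors' method: the sample and gene counts, the random data, and all variable names are assumptions chosen only for illustration. When there are far fewer samples than genes, the sample covariance matrix of the genes cannot have full rank, so methods that need its inverse cannot be applied directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: far fewer samples than genes, as is typical for microarrays.
n_samples, n_genes = 20, 500

# Synthetic stand-in for an expression matrix (rows = samples, columns = genes).
X = rng.normal(size=(n_samples, n_genes))

# Center each gene, then form the n_genes x n_genes sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (n_samples - 1)

# The covariance matrix has the same rank as the centered data matrix,
# which is at most n_samples - 1 because centering removes one degree of freedom.
rank = np.linalg.matrix_rank(Xc)

print("covariance shape:", cov.shape)
print("rank upper bound:", n_samples - 1, "observed rank:", rank)
```

Because the rank is far below the number of genes, the covariance matrix is singular. This is one reason why gene selection or dimensionality reduction is usually applied before a conventional classifier.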
