Local Linear Logistic Classification of Microarray Data Using Orthogonal Components

  • Baek, Jang-Sun (Department of Statistics, College of Natural Sciences, Chonnam National University) ;
  • Son, Young-Sook (Department of Statistics, College of Natural Sciences, Chonnam National University)
  • Published: 2006.11.30

Abstract

In microarray data the number of variables exceeds the number of samples, which creates a high-dimension, small-sample problem for discriminant analysis. To address it, we propose a nonparametric local linear logistic classification procedure that uses orthogonal components as new feature variables. The proposed method is based on the local likelihood and can be applied to multi-class classification. The orthogonal components considered are principal component analysis (PCA), partial least squares (PLS), and factor analysis components. We applied the method with each type of component to two well-known real microarray data sets, the leukemia data and the colon data, and compared its performance with that of conventional statistical classification procedures. The proposed method outperformed the conventional procedures for each type of component, and among the three, PLS components gave the best performance.
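The idea of combining orthogonal components with a local likelihood fit can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the two-class case with PCA components, and the Gaussian kernel, the bandwidth value, the small ridge penalty, the function names `pca_scores` and `local_linear_logistic`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, n_components):
    """Return scores of the centered rows of X on the leading principal axes."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered matrix works even when variables outnumber samples.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def local_linear_logistic(Z, y, z0, bandwidth, n_iter=25, ridge=1e-2):
    """Estimate P(y = 1 | z0) from a kernel-weighted (local) logistic likelihood.

    The Gaussian kernel and the ridge term are assumptions of this sketch;
    the ridge keeps the Newton steps finite when the local data are separable.
    """
    # Local linear design: intercept plus component scores centered at z0.
    D = np.hstack([np.ones((Z.shape[0], 1)), Z - z0])
    w = np.exp(-0.5 * np.sum((Z - z0) ** 2, axis=1) / bandwidth ** 2)
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):  # Newton-Raphson on the penalized local likelihood
        eta = np.clip(D @ beta, -30.0, 30.0)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = D.T @ (w * (y - p)) - ridge * beta
        hess = (D * (w * p * (1.0 - p))[:, None]).T @ D + ridge * np.eye(D.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    # Because the scores are centered at z0, the intercept is the fit at z0.
    return 1.0 / (1.0 + np.exp(-beta[0]))


# Synthetic stand-in for expression data: 40 samples, 200 variables,
# with the second class shifted in the first 10 variables.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :10] += 3.0

Z = pca_scores(X, n_components=2)
p0 = local_linear_logistic(Z, y, Z[0], bandwidth=3.0)   # query at a class-0 sample
p1 = local_linear_logistic(Z, y, Z[-1], bandwidth=3.0)  # query at a class-1 sample
print(p0, p1)
```

A sample would be assigned to the second class when the estimated probability exceeds 0.5. Swapping `pca_scores` for PLS or factor analysis scores would leave the local fitting step unchanged, and a multi-class version would replace the binary logistic likelihood with a multinomial one.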
