Exploratory Analysis of Gene Expression Data Using Biplot

  • 박미라 (Department of Premedicine, Eulji University School of Medicine)
  • Published : 2005.07.01

Abstract

Genome sequencing and microarray technology produce ever-increasing amounts of complex data that need statistical analysis. Visualization is an effective analytic technique that exploits the ability of the human brain to process large amounts of data. In this study, the biplot approach is applied to microarray data to examine the relationships between genes and samples. A supplementary data method for classifying a new sample into a known category is also suggested. The methods are validated by applying them to well-known microarray data sets: Golub et al. (1999), Alizadeh et al. (2000), and Ross et al. (2000). The results are compared with those of several clustering methods. A modified graph that combines a partitioning method with the biplot is also suggested.

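In microarray experiments, visualization of gene expression data is widely used to help understand gene function and gene interactions. Because a biplot displays genes and samples simultaneously, it is especially useful for examining clusters of genes or samples and gene-sample associations. This paper introduces a way to examine gene clustering and association in microarray experiments using the biplot, and proposes a way to classify new samples using the supplementary point technique. Its usefulness is examined on the leukemia data of Golub et al. (1999), the lymphoma data of Alizadeh et al. (2000), and the NCI60 tumor tissue data of Ross et al. (2000). The results are compared with those of other techniques such as hierarchical clustering and k-means clustering, and ways of linking these techniques to the biplot are considered.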
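The abstract gives no formulas, so the following is only a minimal sketch of how a biplot with supplementary points could be computed, assuming the SVD-based construction of Gabriel (1971). The layout (samples as rows, genes as columns), the function names `fit_biplot`, `project_supplementary`, and `classify_by_nearest_centroid`, the `alpha` scaling parameter, and the nearest-centroid rule are all illustrative assumptions, not the procedure used in the paper.

```python
import numpy as np

def fit_biplot(X, n_components=2, alpha=1.0):
    """Biplot markers from the SVD of the gene-centered matrix.

    X is samples x genes (an assumed layout). Sample markers are
    U * d**alpha and gene markers are V * d**(1 - alpha), so their
    product approximates the centered data.
    """
    X = np.asarray(X, dtype=float)
    gene_means = X.mean(axis=0)
    Y = X - gene_means
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    k = n_components
    U, d, Vt = U[:, :k], d[:k], Vt[:k, :]
    return {
        "gene_means": gene_means,
        "d": d,
        "Vt": Vt,
        "alpha": alpha,
        "sample_points": U * d**alpha,
        "gene_points": Vt.T * d**(1.0 - alpha),
    }

def project_supplementary(model, x_new):
    """Place new sample(s) on the existing axes without refitting."""
    y = np.atleast_2d(np.asarray(x_new, dtype=float)) - model["gene_means"]
    # Y V = U D, hence U D**alpha = Y V D**(alpha - 1).
    return (y @ model["Vt"].T) * model["d"] ** (model["alpha"] - 1.0)

def classify_by_nearest_centroid(model, labels, x_new):
    """Hypothetical rule: assign the class whose centroid in the
    biplot plane is closest to the supplementary point."""
    labels = np.asarray(labels)
    point = project_supplementary(model, x_new)[0]
    classes = np.unique(labels)
    centroids = np.array(
        [model["sample_points"][labels == c].mean(axis=0) for c in classes]
    )
    distances = np.linalg.norm(centroids - point, axis=1)
    return classes[np.argmin(distances)]
```

With `alpha=1.0` the sample markers are the principal component scores, so a supplementary sample is simply the centered expression vector multiplied by the gene loadings. Projecting a sample that was part of the fit returns its original coordinates, which is a convenient sanity check.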

References

  1. 최용석 (1999). <Understanding and Application of Biplots>, Pusan National University Press, Busan
  2. 허명희 (1999). <Multivariate Quantification>, 자유아카데미, Seoul
  3. Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X., Powell, J.I., Yang, L., Marti, G.E., Moore, T., Hudson, J. Jr., Lu, L., Lewis, D.B., Tibshirani, R., Sherlock, G., Chan, W.C., Greiner, T.C., Weisenburger, D.D., Armitage, J.O., Warnke, R., Levy, R., Wilson, W., Grever, M.R., Byrd, J.C., Botstein, D., Brown, P.O., and Staudt, L.M. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403, 503-511 https://doi.org/10.1038/35000501
  4. Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Natl. Acad. Sci., 96, 6745-6750
  5. Alter, O., Brown, P.O., Botstein, D. (2000). Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci., 97, 10101-10106
  6. Brown, P.O. and Botstein, D. (1999). Exploring the new world of the genome with DNA microarrays, Nature Genetics, 21(Suppl. 1), 33-37 https://doi.org/10.1038/4462
  7. Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P.O., Herskowitz, I. (1998). The transcriptional program of sporulation in budding yeast, Science, 282, 699-705 https://doi.org/10.1126/science.282.5389.699
  8. Dudoit, S., Fridlyand, J., and Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, 77-87 https://doi.org/10.1198/016214502753479248
  9. Fellenberg, K., Hauser, N. C., Brors, B., Neutzner, A., Hoheisel, J. (2001). Correspondence analysis applied to microarray data, Proc. Natl. Acad. Sci., 98, 10781-10786
  10. Gabriel, K.R. (1971). The biplot graphic display of matrices with application to principal component analysis, Biometrika, 58, 453-466 https://doi.org/10.1093/biomet/58.3.453
  11. Getz, G., Levine, E. and Domany, E. (2000). Coupled two-way clustering analysis of gene microarray data, Proc. Natl. Acad. Sci., 97, 12079-12084
  12. Golub, TR., Slonim, DK., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, JP., Coller, H., Loh, ML., Downing, JR., Caligiuri, MA., Bloomfield, CD., and Lander, ES. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286, 531-537 https://doi.org/10.1126/science.286.5439.531
  13. Householder, A.S., Young, G. (1938). Matrix approximation and latent roots, American Mathematical Monthly, 45, 165-171 https://doi.org/10.2307/2302980
  14. Holter, N. S., Mitra, M., Maritan, A., Cieplak, M., Banavar, J. R., Fedoroff, N. V. (2000). Fundamental patterns underlying gene expression profiles: Simplicity from complexity, Proc. Natl. Acad. Sci., 97, 8409-8414
  15. Jain, A. K., Murty, M. N. and Flynn, P. J. (1999). Data clustering: a review, ACM Computing Surveys, 31(3), 264-323
  16. Kohonen, T. (1995). Self-Organizing Maps, Springer-Verlag, Berlin
  17. Kishino, H. and Waddell, P. (2000). Correspondence analysis of genes and tissue types and finding genetic links from microarray data, Genome Informatics, 11, 83-95
  18. Landgrebe, J., Wurst, W. and Welzl, G. (2002). Permutation-validated principal components analysis of microarray data, Genome Biology, 3, research0019.1-0019.11
  19. Lebart, L., Morineau, A. and Warwick, K. (1984). Multivariate Descriptive Statistical Analysis, John Wiley & Sons, New York
  20. Lipshutz, R.J., Fodor, S., Gingeras, T., and Lockhart, D. (1999). High-density synthetic oligonucleotide arrays, Nature Genetics, 21(Suppl.), 20-24 https://doi.org/10.1038/4447
  21. Raychaudhuri, S., Stuart, J.M., Altman, R. B. (2000). Principal components analysis to summarize microarray experiments: Application to sporulation time series, Pac. Symp. Biocomput., 455-466
  22. Ross, DT., Scherf, U., Eisen, MB., Perou, CM., Rees, C., Spellman, P., Iyer, V., Jeffrey, SS., Van de Rijn, M., Waltham, M., Pergamenschikov, A., Lee, JC., Lashkari, D., Shalon, D., Myers, TG., Weinstein, JN., Botstein, D., and Brown, PO. (2000). Systematic variation in gene expression patterns in human cancer cell lines, Nature Genetics, 24(3), 227-235
  23. Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P.O., Botstein, D., Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Mol. Biol. Cell, 9, 3273-3297 https://doi.org/10.1091/mbc.9.12.3273
  24. Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E.S., Golub, T.R. (1999). Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation, Proc. Natl. Acad. Sci., 96, 2907-2912
  25. Tibshirani, R., Hastie, T., Eisen, M., Ross, D., Botstein, D., Brown, P. (1999). Clustering methods for the analysis of DNA microarray data, Technical Report, Dept. of Health Research and Policy, Stanford Univ., www.stat.stanford.edu/~tibs/lab/publications.html
  26. Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P.O., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R. B. (2001). Missing value estimation methods for DNA microarrays, Bioinformatics, 17, 520-525 https://doi.org/10.1093/bioinformatics/17.6.520