A Method for Microarray Data Analysis based on Bayesian Networks using an Efficient Structural learning Algorithm and Data Dimensionality Reduction

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 29, Issue 11, Pages 775-784, 2002, pISSN 1229-6848

Korean Institute of Information Scientists and Engineers (한국정보과학회)

효율적 구조 학습 알고리즘과 데이타 차원축소를 통한 베이지안망 기반의 마이크로어레이 데이타 분석법

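Kyu-Baek Hwang (황규백), School of Computer Science and Engineering, Seoul National University;
Jeong-Ho Chang (장정호), School of Computer Science and Engineering, Seoul National University;
Byoung-Tak Zhang (장병탁), School of Computer Science and Engineering, Seoul National University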

Published: 2002.12.01

Abstract

Microarray data, obtained with DNA chip technology, measure the expression levels of thousands of genes in a cell or tissue simultaneously. They are used for tasks such as cancer diagnosis and gene function prediction based on gene expression patterns. Among the many data analysis methods, the Bayesian network represents the relationships among data attributes as a graph structure. This property makes it useful for discovering relations among genes and tissue characteristics (e.g., the cancer type) through microarray data analysis. However, most current microarray data sets are so sparse that general analysis methods, including Bayesian networks, are difficult to apply directly. In this paper, we combine an efficient structural learning algorithm with data dimensionality reduction in order to analyze microarray data using Bayesian networks. The proposed method was applied to a real microarray data set, the NCI60 data set, and its usefulness was evaluated by how accurately the Bayesian networks learned from the data represent known biological facts.
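The two ingredients named in the abstract, dimensionality reduction followed by Bayesian network structure learning, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a variance filter as a stand-in for the reduction step, median binarization, and greedy edge addition under a BIC score with a bound on the number of parents as a stand-in for the structural learning step. The data are synthetic, and all function names are hypothetical.

```python
import math
import random
from itertools import product

def top_variance_columns(rows, k):
    """Stand-in reduction step: indices of the k highest-variance columns (genes)."""
    n = len(rows)
    def variance(j):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        return sum((v - mean) ** 2 for v in col) / n
    return sorted(sorted(range(len(rows[0])), key=variance, reverse=True)[:k])

def binarize(rows, cols):
    """Discretize each kept column to 0/1 around its (upper) median."""
    out = []
    for j in cols:
        median = sorted(r[j] for r in rows)[len(rows) // 2]
        out.append([1 if r[j] >= median else 0 for r in rows])
    return [list(sample) for sample in zip(*out)]

def family_bic(data, child, parents):
    """BIC contribution of one binary node given its parent set."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, [0, 0])[row[child]] += 1
    loglik = 0.0
    for pair in counts.values():
        total = sum(pair)
        loglik += sum(c * math.log(c / total) for c in pair if c)
    n_params = 2 ** len(parents)  # one free parameter per parent configuration
    return loglik - 0.5 * math.log(len(data)) * n_params

def creates_cycle(parents, src, dst):
    """True if adding src -> dst would close a cycle (dst is src or an ancestor of src)."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def greedy_structure(data, max_parents=2):
    """Repeatedly add the single edge with the largest BIC gain until none helps."""
    n_vars = len(data[0])
    parents = {v: set() for v in range(n_vars)}
    score = {v: family_bic(data, v, ()) for v in range(n_vars)}
    while True:
        best = None
        for src, dst in product(range(n_vars), repeat=2):
            if src in parents[dst] or len(parents[dst]) >= max_parents:
                continue
            if creates_cycle(parents, src, dst):
                continue
            new = family_bic(data, dst, sorted(parents[dst] | {src}))
            gain = new - score[dst]
            if gain > 1e-9 and (best is None or gain > best[0]):
                best = (gain, src, dst, new)
        if best is None:
            return parents
        _, src, dst, new = best
        parents[dst].add(src)
        score[dst] = new

# Synthetic stand-in for expression data: 40 samples, 30 "genes".
# Gene 1 closely follows gene 0; the remaining genes are low-variance noise.
rng = random.Random(0)
samples = []
for _ in range(40):
    g0 = rng.gauss(0.0, 2.0)
    samples.append([g0, g0 + rng.gauss(0.0, 0.05)]
                   + [rng.gauss(0.0, 0.1) for _ in range(28)])

kept = top_variance_columns(samples, k=5)        # reduce 30 genes to 5
network = greedy_structure(binarize(samples, kept), max_parents=2)
for child, pars in network.items():
    print(f"gene {kept[child]} <- {[kept[p] for p in sorted(pars)]}")
```

Bounding the number of parents keeps each conditional probability table small, which matters when the number of samples is low relative to the number of variables. The reduction step shrinks the space of candidate edges before any structure search begins.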

