Gene Expression Data Analysis Using Parallel Processor based Pattern Classification Method

병렬 프로세서 기반의 패턴 분류 기법을 이용한 유전자 발현 데이터 분석

  • Choi, Sun-Wook (Dept. of Information Engineering, Inha University) ;
  • Lee, Chong-Ho (Dept. of Information Engineering, Inha University)
  • 최선욱 (인하대학교 정보공학과) ;
  • 이종호 (인하대학교 정보공학과)
  • Published: 2009.11.25

Abstract

Diagnosing disease from gene expression data obtained with microarray chips is an active research area. Because such data are difficult to analyze directly, the task has generally been handled with machine learning algorithms. However, recent studies indicate that gene expression analysis must take interactions between genes into account, which implies that analysis with conventional machine learning algorithms has limitations. In this paper, we classify gene expression data using the hypernetwork model, which can take higher-order correlations between features into account, and compare its classification accuracy with that of conventional machine learning algorithms. We also propose a new hypernetwork model that addresses the drawbacks of the existing model, and compare the processing performance of the existing hypernetwork model running on a general sequential processor with that of the improved model implemented on parallel processors. Experimental results show that the proposed model achieves classification performance competitive with conventional machine learning methods, and more stable and improved classification performance than the existing hypernetwork model. We also show that processing performance is maximized when the hypernetwork model is implemented on parallel processors.
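To make the idea of classifying with higher-order feature combinations concrete, the following is a minimal sketch of a hypernetwork-style classifier. It is not the authors' implementation. It assumes binarized expression values, hyperedges built by randomly sampling a fixed number of features from each training sample, and prediction by majority vote over matching hyperedges. The names `sample_hyperedges`, `classify`, `order`, and `per_sample` are hypothetical and chosen only for this illustration.

```python
import random
from collections import Counter


def sample_hyperedges(X, y, order, per_sample, rng):
    """Build a hyperedge library from binarized training data (illustrative).

    Each hyperedge stores `order` randomly chosen feature indices, the values
    the source sample has at those indices, and the sample's class label.
    """
    n_features = len(X[0])
    library = []
    for row, label in zip(X, y):
        for _ in range(per_sample):
            idx = tuple(sorted(rng.sample(range(n_features), order)))
            vals = tuple(row[i] for i in idx)
            library.append((idx, vals, label))
    return library


def classify(library, row):
    """Predict a label by majority vote of the hyperedges that match `row`."""
    votes = Counter()
    for idx, vals, label in library:
        # A hyperedge matches only if every one of its features agrees.
        if all(row[i] == v for i, v in zip(idx, vals)):
            votes[label] += 1
    if not votes:
        return None  # no hyperedge matched, so no prediction is made
    return votes.most_common(1)[0][0]


# Toy XOR data: the label depends on the combination of both features,
# not on either feature alone.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
rng = random.Random(0)
library = sample_hyperedges(X, y, order=2, per_sample=3, rng=rng)
print([classify(library, row) for row in X])  # [0, 1, 1, 0]
```

In this toy case each single feature value occurs equally often in both classes, so order-1 hyperedges carry no class information, while order-2 hyperedges separate the classes exactly. The match test for each hyperedge is independent of the others, which is the kind of work that can be spread across parallel processing elements; how the paper's parallel processor organizes this is not described in the abstract.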
