Applying Particle Swarm Optimization for Enhanced Clustering of DNA Chip Data

  • Minsoo Lee (Department of Computer Science and Engineering, Ewha Womans University)
  • Received : 2009.07.09
  • Accepted : 2010.04.13
  • Published : 2010.06.30

Abstract

DNA chips have made gene-related experiments and research far more convenient, and the experiments that use them produce large amounts of data. DNA chip data can be represented as a two-dimensional matrix in which one axis represents genes and the other represents samples. Clustering such data quickly and with good quality improves the accuracy and efficiency of the classification step that follows. In this paper, we propose an efficient clustering technique for large volumes of DNA chip data based on Particle Swarm Optimization (PSO), a bio-inspired algorithm, and show through experiments that PSO-based clustering compares favorably with existing clustering algorithms in both execution speed and clustering quality.
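
As an illustration only, the Python sketch below shows one common way PSO can be applied to clustering a genes-by-samples matrix. It is not necessarily the algorithm evaluated in this paper. The encoding (each particle holds k candidate centroids), the fitness function (sum of squared distances from each gene to its nearest centroid), the parameter values (w, c1, c2, swarm size), the toy `expression` matrix, and all function names are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(data, centroids):
    """Fitness: sum of squared distances from each row to its nearest centroid."""
    return sum(min(sq_dist(row, c) for c in centroids) for row in data)


def assign(data, centroids):
    """Label each row with the index of its nearest centroid."""
    labels = []
    for row in data:
        dists = [sq_dist(row, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def pso_cluster(data, k, n_particles=20, n_iters=100,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Generic PSO clustering sketch (assumed setup, not the paper's method)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Each particle is a candidate set of k centroids, seeded from random rows.
    pos = [[list(rng.choice(data)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [sse(data, p) for p in pos]
    g = pbest_fit.index(min(pbest_fit))
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + cognitive + social terms.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(data, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return gbest, assign(data, gbest), gbest_fit


# Toy matrix (made-up values): rows are genes, columns are samples.
expression = [
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.0, 0.3, 0.2],
    [5.1, 4.9, 5.0], [5.0, 5.2, 4.8], [4.9, 5.1, 5.2],
]
centroids, labels, error = pso_cluster(expression, k=2)
print(labels, round(error, 4))
```

Here each row of `expression` is one gene's profile across samples, so the sketch clusters genes. Clustering samples instead would only require transposing the matrix. Real DNA chip data has thousands of genes, so a vectorized implementation would be preferable in practice.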
