Implementation of the MKSV System for Gene Disease Analysis Using Data Mining

  • 정유정 (SW Convergence Education Institute, Chosun University) ;
  • 최광미 (SW Convergence Education Institute, Chosun University)
  • Received : 2019.06.30
  • Reviewed : 2019.08.15
  • Published : 2019.08.31

Abstract

To meet the goals of research such as disease studies, which deal with a wide range of biological phenomena, the big data obtained from these studies must be processed so that it yields practical value. The MKSV algorithm proposed in this paper estimates an optimal probability distribution to determine the input pattern and then classifies that pattern with data mining techniques; as a result, it achieved an efficient computational cost and recognition rate. By simulating the probabilistic flow of gene data and classifying the data through a big-data mining process, the MKSV algorithm shows fast and effective performance improvement, and should therefore be useful for studying the relationship between genes and the diseases that are rapidly increasing in today's society.

Fig. 1. Euclidean transformation table

Fig. 2. Gene expression image of classified samples for K = 4
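For readers unfamiliar with the K-means baseline that the figures compare against, the following is a minimal sketch of plain K-means with Euclidean distance and K = 4. It is not the MKSV algorithm, whose details are not given on this page. The toy expression profiles, the function names `euclidean` and `kmeans`, and the `iters` and `seed` parameters are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k=4, iters=50, seed=0):
    """Plain K-means (illustrative only, not MKSV).

    points: list of tuples, one expression profile per sample.
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct samples as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each sample goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-feature expression profiles forming four loose groups.
points = [
    (0.1, 0.2), (0.2, 0.1), (0.0, 0.0),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (0.1, 5.0), (0.0, 5.2), (0.2, 4.8),
    (5.1, 0.1), (4.9, 0.0), (5.0, 0.2),
]
labels, centroids = kmeans(points, k=4)
```

Because the result depends on the random initial centroids, K-means can settle in a local optimum; in practice it is usually run from several seeds and the best partition kept.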

Fig. 3. Sensitivity and specificity of K-means, SOM, and MKSV
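The two metrics named in Fig. 3 have standard definitions: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below computes both from binary labels. The function name `sensitivity_specificity` and the example labels are hypothetical and are not taken from the paper's data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): fraction of true positives detected.
    specificity = TN / (TN + FP): fraction of true negatives detected.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against division by zero when a class is absent.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical labels: 1 = disease-related sample, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
# Here TP = 3, FN = 1, TN = 5, FP = 1, so sens = 0.75 and spec = 5/6.
```

Reporting the two values together, as the figure does, shows whether a classifier gains sensitivity only by giving up specificity.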
