Implementation of an MKSV System for Gene Disease Analysis Using Data Mining

  • 정유정 (SW Convergence Education Institute, Chosun University)
  • 최광미 (SW Convergence Education Institute, Chosun University)
  • Received : 2019.06.30
  • Accepted : 2019.08.15
  • Published : 2019.08.31


Disease research today deals with a wide range of biological phenomena, and to meet its goals the large volumes of data these studies produce must be turned into practically useful information. The MKSV algorithm proposed in this paper estimates an optimal probability distribution and uses it to determine the input pattern. Classifying the data through data mining in this way makes it possible to obtain an efficient computational cost and recognition rate. By simulating the probabilistic flow of gene data and classifying it quickly and effectively through a big-data mining process, the MKSV algorithm is useful for studying the relationship between diseases and genes.

Fig. 1. Euclidean transformation table

Fig. 2. Gene expression image of the classified samples for K = 4
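The K = 4 setting in Fig. 2 and the Euclidean table in Fig. 1 can be illustrated with a plain K-means baseline, one of the methods compared against MKSV. The following is only a minimal sketch under stated assumptions: it is not the MKSV algorithm, and the function names, the toy expression values, and the seed are all hypothetical.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=4, iterations=100, seed=0):
    """Plain K-means (baseline only, not MKSV). Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct samples drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each sample goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-feature expression profiles, for illustration only.
samples = [
    [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.1], [5.2, 4.9],
    [0.1, 5.0], [0.3, 5.2],
    [5.1, 0.2], [4.9, 0.1],
]
labels, centroids = kmeans(samples, k=4)
print(labels)
```

Because the initial centroids are drawn at random, the cluster indices, and in unlucky cases the grouping itself, depend on the seed; practical use would normally repeat the run with several initialisations.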

Fig. 3. Sensitivity and specificity of K-means, SOM, and MKSV
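Sensitivity and specificity, the two measures compared in Fig. 3, follow directly from the counts of a binary confusion matrix: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below shows that calculation; the function name and the label vectors are hypothetical and are not results from the paper.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes so the ratios never divide by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels: 1 = disease-associated, 0 = not associated.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

For a multiclass result such as the K = 4 case, the same function can be applied one class at a time by passing that class as `positive`.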

