A study of methodology for identification models of cardiovascular diseases based on data mining

  • Lee, Bum Ju (Digital Health Research Division, Korea Institute of Oriental Medicine)
  • Received : 2022.05.24
  • Accepted : 2022.07.02
  • Published : 2022.07.31

Abstract

Cardiovascular diseases are among the leading causes of death worldwide. The objectives of this study were to build models for identifying hypertension and dyslipidemia from sociodemographic variables, combining three variable selection methods with seven machine learning algorithms, and to compare the predictive power of the resulting models. For both diseases, when models were built on the full variable set or on variables chosen by correlation-based feature subset selection, models using naive Bayes performed slightly better than models using the other machine learning algorithms. With wrapper-based feature subset selection, models using logistic regression performed slightly better than all other models. These findings may serve as reference data for building future cardiovascular disease identification and prediction models for the Korean population in telemedicine and public health, as well as in machine learning research.
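Correlation-based feature subset selection (CFS), one of the variable selection methods named in the abstract, rates a subset of k variables by Hall's merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean variable-to-class correlation and r_ff the mean variable-to-variable correlation. The Python sketch below illustrates that idea only; it is not the study's implementation, which is not described here. Assumptions: the absolute Pearson correlation serves as the correlation measure (CFS is also commonly run with other measures such as symmetrical uncertainty), a simple greedy forward search replaces the best-first search CFS is often paired with, variables are numerically coded, and the variable names and toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, features, target):
    """Hall's CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff), using |Pearson r| as the
    correlation measure (an assumption; other measures are also used with CFS)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean variable-to-class correlation.
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    # Mean variable-to-variable correlation (no pairs for a single variable, so 0).
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(features, target):
    """Greedy forward search: repeatedly add the variable giving the highest merit,
    stopping as soon as no addition improves on the current merit."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        cand = max(remaining, key=lambda f: cfs_merit(selected + [f], features, target))
        merit = cfs_merit(selected + [cand], features, target)
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected, best


# Hypothetical toy data, not from the study.
features = {
    "age":   [1, 2, 3, 4, 5, 6, 7, 8],  # hypothetical coded age group
    "noise": [5, 3, 8, 1, 7, 2, 6, 4],  # hypothetical uninformative variable
}
target = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical labels: 1 = has the disease
selected, merit = cfs_forward_select(features, target)
print(selected)  # ['age']
```

On this toy data the variable strongly related to the class is kept, and the weakly related one is rejected because adding it lowers the merit. The denominator penalizes subsets whose variables are correlated with each other, which is how CFS discards redundant predictors without training any classifier.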

심혈관 질환은 전 세계적으로 주요 사망원인들 중 하나이다. 본 연구는 보다 우수한 심혈관질환 판별 모델을 생성하기 위한 방법에 대한 연구로써, 3가지 변수 선택법과 7가지 머신러닝 알고리즘을 바탕으로 사회인구학적 변수들을 이용하여 고혈압과 이상지질혈증 판별모델들을 생성하고, 생성된 모델들의 성능을 비교 평가한다. 본 연구의 결과에서는 두 가지 질병 모두에서, 전체변수 및 correlation-based feature subset selection 메소드 기반 모델들에서는 naive Bayes 모델이 다른 머신러닝을 이용한 모델들보다 다소 우수한 판별 성능이 있는 것으로 나타났고, wrapper 메소드 기반 변수 선택법에서는 logistic regression 모델이 다른 모든 모델보다 성능이 다소 우수한 것으로 나타났다. 본 연구의 결과는 원격의료 및 대중보건 분야에서 향후 한국인의 심혈관질환 판별 및 예측 모델 생성을 위한 참고자료로 활용될 수 있을 것으로 기대된다.
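In a wrapper-based method, each candidate variable subset is scored by the cross-validated performance of the classifier that will use it. The Python sketch below shows that loop under stated assumptions; it is not the study's pipeline. For brevity the wrapped classifier is a count-based categorical naive Bayes with add-one smoothing, swapped in for the logistic regression that the study reports performing best under wrapper selection. The 4-fold interleaved cross-validation, greedy forward search, accuracy as the score, and the variable names `smoker` and `region` with their toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def nb_fit(rows, labels, cols):
    """Count-based categorical naive Bayes restricted to the variables in cols."""
    prior = Counter(labels)
    counts = defaultdict(Counter)  # (variable, label) -> Counter of observed values
    values = defaultdict(set)      # variable -> set of values seen in training
    for row, y in zip(rows, labels):
        for c in cols:
            counts[(c, y)][row[c]] += 1
            values[c].add(row[c])
    return prior, counts, values


def nb_predict(model, row, cols):
    """Return the label with the highest log posterior (ties go to the smallest label)."""
    prior, counts, values = model
    n = sum(prior.values())
    best_label, best_score = None, -math.inf
    for y, ny in sorted(prior.items()):
        score = math.log(ny / n)
        for c in cols:
            # Add-one (Laplace) smoothing over the values seen for variable c.
            score += math.log((counts[(c, y)][row[c]] + 1) / (ny + len(values[c])))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


def cv_accuracy(rows, labels, cols, k=4):
    """k-fold cross-validated accuracy; fold i holds every k-th record starting at i."""
    n, correct = len(rows), 0
    for fold in range(k):
        test_idx = range(fold, n, k)
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        model = nb_fit([rows[i] for i in train], [labels[i] for i in train], cols)
        correct += sum(nb_predict(model, rows[i], cols) == labels[i] for i in test_idx)
    return correct / n


def wrapper_forward_select(rows, labels, all_cols, k=4):
    """Greedy forward wrapper: add the variable that most improves CV accuracy."""
    selected = []
    best = cv_accuracy(rows, labels, selected, k)  # baseline: class prior only
    remaining = list(all_cols)
    while remaining:
        cand = max(remaining, key=lambda c: cv_accuracy(rows, labels, selected + [c], k))
        acc = cv_accuracy(rows, labels, selected + [cand], k)
        if acc <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = acc
    return selected, best


# Hypothetical toy records, not from the study.
rows = [
    {"smoker": "yes", "region": "A"}, {"smoker": "yes", "region": "B"},
    {"smoker": "yes", "region": "A"}, {"smoker": "yes", "region": "B"},
    {"smoker": "no", "region": "A"}, {"smoker": "no", "region": "B"},
    {"smoker": "no", "region": "A"}, {"smoker": "no", "region": "B"},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical: 1 = has the disease
chosen, acc = wrapper_forward_select(rows, labels, ["smoker", "region"])
print(chosen, acc)  # ['smoker'] 1.0
```

Because the score comes from the classifier itself, a wrapper can pick different variables for different algorithms, which is one reason results under wrapper selection need not match those under full variables or CFS. The cost is one cross-validation run per candidate subset.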

Acknowledgement

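This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) in 2021 (No. 2021-0-00104, Development of a digital health service platform for non-face-to-face cardiovascular health management).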