Comparison of Feature Selection Methods Applied on Risk Prediction for Hypertension

  • Received: 2021.06.30
  • Reviewed: 2021.08.19
  • Published: 2022.03.31

Abstract

In this paper, we improve hypertension risk prediction by applying feature selection to the Korea National Health and Nutrition Examination Survey (KNHANES) database of the Korea Centers for Disease Control and Prevention. The study also identifies various risk factors correlated with chronic hypertension. The method has three stages. First, the data preprocessing stage removes missing values and applies a z-transformation. Second, the feature selection (FS) stage applies factor analysis (FA) to the dataset and compares multicollinearity analysis (MC) and feature importance (FI) as the basis for FS. Finally, the predictive analysis stage uses the selected features to detect and predict the risk of hypertension. We compare the accuracy, F-score, area under the ROC curve (AUC), and mean squared error (MSE) of each classification model. In the experiments, the proposed MC-FA-RF model achieved the highest accuracy, 80.12%, with an MSE of 0.106, an F-score of 83.49%, and an AUC of 85.96%. These results indicate that the proposed MC-FA-RF method outperforms the other methods compared for hypertension risk prediction.
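The preprocessing and evaluation stages summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`drop_missing`, `z_transform`, `evaluate`), the toy values, and the 0.5 decision threshold are hypothetical, and computing the MSE on predicted probabilities rather than on hard labels is an assumption. The feature selection stage (MC, FA, FI) and the classifier itself are not shown.

```python
import math


def drop_missing(rows):
    """Keep only rows that contain no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def z_transform(rows):
    """Standardize each column to zero mean and unit (population) variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / n)
        for c, m in zip(cols, means)
    ]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def evaluate(y_true, y_prob, threshold=0.5):
    """Accuracy, F1 score, and MSE for binary labels (1 = hypertension).

    Assumption: MSE is taken between the true labels and the predicted
    probabilities; accuracy and F1 use labels thresholded at `threshold`.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    denom = 2 * tp + fp + fn
    return {
        "accuracy": correct / len(y_true),
        "f1": 2 * tp / denom if denom else 0.0,
        "mse": sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
    }


# Hypothetical toy records: two made-up features per row, one row incomplete.
raw = [[120.0, 22.5], [None, 30.1], [140.0, 27.5], [160.0, 32.5]]
clean = drop_missing(raw)
scaled = z_transform(clean)

# Hypothetical true labels and predicted probabilities from some classifier.
metrics = evaluate([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.8, 0.6])
```

AUC is left out to keep the sketch short. In practice a library implementation of these steps would normally be preferred over hand-written versions.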

Funding Information

This research was financially supported by the Ministry of Trade, Industry, and Energy (MOTIE) and Korea Institute for Advancement of Technology (KIAT) through the National Innovation Cluster R&D program (Cooperative Regional Industry Development Program with the relocated Public Institutes, P0002072).
