Asian Ethnic Group Classification Model Using Data Mining

  • Kim, Yoon Geon (Department of Applied Statistics, Yonsei University) ;
  • Lee, Ji Hyun (Department of Forensic Medicine, Seoul National University College of Medicine) ;
  • Cho, Sohee (Institute of Forensic Science, Seoul National University College of Medicine) ;
  • Kim, Moon Young (Institute of Forensic Science, Seoul National University College of Medicine) ;
  • Lee, Soong Deok (Department of Forensic Medicine, Seoul National University College of Medicine) ;
  • Ha, Eun Ho (Department of Information and Statistics, Yonsei University) ;
  • Ahn, Jae Joon (Department of Information and Statistics, Yonsei University)
  • Received: 2017.05.01
  • Accepted: 2017.05.22
  • Published: 2017.05.31

Abstract

Beyond identifying genetic differences between target populations, it is also important to assess the significance of those differences for each population. Cases that require this kind of analysis have become increasingly common in recent years, which calls for a wider range of statistical methods. In this study, we collected genetic data from Southeast and Southwest Asian populations and evaluated several statistical approaches on Y-chromosome short tandem repeat (Y-STR) data. To build a more accurate and practical classification model, we applied gradient boosting and ensemble techniques. For distinguishing between Southeast and Southwest Asian populations, these classification models performed better overall than the decision tree and regression models used in previous studies. These results suggest that additional statistical approaches, such as data mining techniques, can provide more useful interpretations in forensic analysis. This work is expected to serve as a basis for further studies that extend the target region to the entire Asian continent and incorporate additional markers, such as mitochondrial genes.
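The abstract gives no implementation details, so the following is only a minimal sketch of gradient boosting for this kind of two-population classification. It assumes each sample is encoded as a vector of Y-STR repeat counts, with Southeast Asia labeled 0 and Southwest Asia labeled 1. Each round fits a one-split regression tree (a stump) to the pseudo-residuals y − p of the logistic loss and adds it, scaled by a learning rate, to the running log-odds. The class name `GradientBoostedStumps`, the three loci (DYS19, DYS390, DYS392), the hyperparameters, and the repeat counts are all hypothetical. They are not the authors' data or settings. Setting each leaf to the mean residual instead of a Newton step is a deliberate simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """One-split regression tree: left value if x[feature] <= threshold, else right."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x):
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X, residuals):
    """Choose the split that minimises the squared error of the residuals."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for t in values[:-1]:  # the largest value would leave the right side empty
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = Stump(j, t, left_mean, right_mean), sse
    if best is None:  # every feature is constant: fall back to a single mean value
        mean = sum(residuals) / len(residuals)
        best = Stump(0, float("inf"), mean, mean)
    return best


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class GradientBoostedStumps:
    """Hypothetical binary classifier: gradient boosting on the logistic loss."""

    def __init__(self, n_rounds=50, learning_rate=0.3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate

    def fit(self, X, y):
        p = sum(y) / len(y)  # assumes both classes are present
        self.base = math.log(p / (1 - p))  # initial log-odds
        self.stumps = []
        scores = [self.base] * len(X)
        for _ in range(self.n_rounds):
            # Negative gradient of the log loss with respect to the log-odds score
            residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            scores = [s + self.learning_rate * stump.predict(x) for s, x in zip(scores, X)]
        return self

    def predict_proba(self, x):
        z = self.base + sum(self.learning_rate * st.predict(x) for st in self.stumps)
        return sigmoid(z)

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0


# Toy repeat counts at three Y-STR loci (DYS19, DYS390, DYS392), invented for illustration.
X = [
    [15, 23, 11], [15, 24, 11], [16, 23, 12], [15, 22, 11], [14, 23, 11], [14, 22, 12],
    [17, 23, 13], [16, 24, 13], [15, 25, 14], [16, 23, 13], [17, 24, 14], [15, 24, 13],
]
y = [0] * 6 + [1] * 6  # 0 = Southeast Asia, 1 = Southwest Asia (hypothetical labels)

model = GradientBoostedStumps(n_rounds=50, learning_rate=0.3).fit(X, y)
accuracy = sum(model.predict(x) == yi for x, yi in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")  # prints "training accuracy: 1.00"
```

In practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would normally be used, together with held-out data or cross-validation for evaluation. The hand-rolled version only makes the boosting loop explicit.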

Acknowledgement

Supported by: National Research Foundation (NRF)