• Title/Summary/Keyword: Predictive decision tree

Search Result 115, Processing Time 0.019 seconds

A Study on the Effect of the Document Summarization Technique on the Fake News Detection Model (문서 요약 기법이 가짜 뉴스 탐지 모형에 미치는 영향에 관한 연구)

  • Shim, Jae-Seung;Won, Ha-Ram;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.201-220
    • /
    • 2019
  • Fake news has emerged as a significant issue over the last few years, igniting discussions and research on how to solve this problem. In particular, studies on automated fact-checking and fake news detection using artificial intelligence and text analysis techniques have drawn attention. Fake news detection research entails a form of document classification; thus, document classification techniques have been widely used in this type of research. However, document summarization techniques have been inconspicuous in this field. At the same time, automatic news summarization services have become popular, and a recent study found that the use of news summarized through abstractive summarization has strengthened the predictive performance of fake news detection models. Therefore, the need to study the integration of document summarization technology in the domestic news data environment has become evident. In order to examine the effect of extractive summarization on the fake news detection model, we first summarized news articles through extractive summarization. Second, we created a summarized news-based detection model. Finally, we compared our model with the full-text-based detection model. The study found that BPN(Back Propagation Neural Network) and SVM(Support Vector Machine) did not exhibit a large difference in performance; however, for DT(Decision Tree), the full-text-based model demonstrated a somewhat better performance. In the case of LR(Logistic Regression), our model exhibited the superior performance. Nonetheless, the results did not show a statistically significant difference between our model and the full-text-based model. Therefore, when the summary is applied, at least the core information of the fake news is preserved, and the LR-based model can confirm the possibility of performance improvement. This study features an experimental application of extractive summarization in fake news detection research by employing various machine-learning algorithms. The study's limitations are, essentially, the relatively small amount of data and the lack of comparison between various summarization technologies. Therefore, an in-depth analysis that applies various analytical techniques to a larger data volume would be helpful in the future.

Prediction of Landslides and Determination of Its Variable Importance Using AutoML (AutoML을 이용한 산사태 예측 및 변수 중요도 산정)

  • Nam, KoungHoon;Kim, Man-Il;Kwon, Oil;Wang, Fawu;Jeong, Gyo-Cheol
    • The Journal of Engineering Geology
    • /
    • v.30 no.3
    • /
    • pp.315-325
    • /
    • 2020
  • This study was performed to develop a model to predict landslides and determine the variable importance of landslides susceptibility factors based on the probabilistic prediction of landslides occurring on slopes along the road. Field survey data of 30,615 slopes from 2007 to 2020 in Korea were analyzed to develop a landslide prediction model. Of the total 131 variable factors, 17 topographic factors and 114 geological factors (including 89 bedrocks) were used to predict landslides. Automated machine learning (AutoML) was used to classify landslides and non-landslides. The verification results revealed that the best model, an extremely randomized tree (XRT) with excellent predictive performance, yielded 83.977% of prediction rates on test data. As a result of the analysis to determine the variable importance of the landslide susceptibility factors, it was composed of 10 topographic factors and 9 geological factors, which was presented as a percentage for each factor. This model was evaluated probabilistically and quantitatively for the likelihood of landslide occurrence by deriving the ranking of variable importance using only on-site survey data. It is considered that this model can provide a reliable basis for slope safety assessment through field surveys to decision-makers in the future.

Taxonomy of Performance Shaping Factors for Human Error Analysis of Railway Accidents (철도사고의 인적오류 분석을 위한 수행도 영향인자 분류)

  • Baek, Dong-Hyun;Koo, Lock-Jo;Lee, Kyung-Sun;Kim, Dong-San;Shin, Min-Ju;Yoon, Wan-Chul;Jung, Myung-Chul
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.31 no.1
    • /
    • pp.41-48
    • /
    • 2008
  • Enhanced machine reliability has dramatically reduced the rate and number of railway accidents but for further reduction human error should be considered together that accounts for about 20% of the accidents. Therefore, the objective of this study was to suggest a new taxonomy of performance shaping factors (PSFs) that could be utilized to identify the causes of a human error associated with railway accidents. Four categories of human factor, task factor, environment factor, and organization factor and 14 sub-categories of physical state, psychological state, knowledge/experience/ability, information/communication, regulation/procedure, specific character of task, infrastructure, device/MMI, working environment, external environment, education, direction/management, system/atmosphere, and welfare/opportunity along with 131 specific factors was suggested by carefully reviewing 8 representative published taxonomy of Casualty Analysis Methodology for Maritime Operations (CASMET), Cognitive Reliability and Error Analysis Method (CREAM), Human Factors Analysis and Classification System (HFACS), Integrated Safety Investigation Methodology (ISIM), Korea-Human Performance Enhancement System (K-HPES), Rail safety and Standards Board (RSSB), $TapRoot^{(R)}$, and Technique for Retrospective and Predictive Analysis of Cognitive Errors (TRACEr). Then these were applied to the case of the railway accident occurred between Komo and Kyungsan stations in 2003 for verification. Both cause decision chart and why-because tree were developed and modified to aid the analyst to find causal factors from the suggested taxonomy. The taxonomy was well suited so that eight causes were found to explain the driver's error in the accident. The taxonomy of PSFs suggested in this study could cover from latent factors to direct causes of human errors related with railway accidents with systematic categorization.

Affected Model of Indoor Radon Concentrations Based on Lifestyle, Greenery Ratio, and Radon Levels in Groundwater (생활 습관, 주거지 주변 녹지 비율 및 지하수 내 라돈 농도 따른 실내 라돈 농도 영향 모델)

  • Lee, Hyun Young;Park, Ji Hyun;Lee, Cheol-Min;Kang, Dae Ryong
    • Journal of health informatics and statistics
    • /
    • v.42 no.4
    • /
    • pp.309-316
    • /
    • 2017
  • Objectives: Radon and its progeny pose environmental risks as a carcinogen, especially to the lungs. Investigating factors affecting indoor radon concentrations and models thereof are needed to prevent exposure to radon and to reduce indoor radon concentrations. The purpose of this study was to identify factors affecting indoor radon concentration and to construct a comprehensive model thereof. Methods: Questionnaires were administered to obtain data on residential environments, including building materials and life style. Decision tree and structural equation modeling were applied to predict residences at risk for higher radon concentrations and to develop the comprehensive model. Results: Greenery ratio, impermeable layer ratio, residence at ground level, daily ventilation, long-term heating, crack around the measuring device, and bedroom were significantly shown to be predictive factors of higher indoor radon concentrations. Daily ventilation reduced the probability of homes having indoor radon concentrations ${\geq}200Bq/m^3$ by 11.6%. Meanwhile, a greenery ratio ${\geq}65%$ without daily ventilation increased this probability by 15.3% compared to daily ventilation. The constructed model indicated greenery ratio and ventilation rate directly affecting indoor radon concentrations. Conclusions: Our model highlights the combined influences of geographical properties, groundwater, and lifestyle factors of an individual resident on indoor radon concentrations in Korea.

Performance Comparison of Machine Learning based Prediction Models for University Students Dropout (머신러닝 기반 대학생 중도 탈락 예측 모델의 성능 비교)

  • Seok-Bong Jeong;Du-Yon Kim
    • Journal of the Korea Society for Simulation
    • /
    • v.32 no.4
    • /
    • pp.19-26
    • /
    • 2023
  • The increase in the dropout rate of college students nationwide has a serious negative impact on universities and society as well as individual students. In order to proactive identify students at risk of dropout, this study built a decision tree, random forest, logistic regression, and deep learning-based dropout prediction model using academic data that can be easily obtained from each university's academic management system. Their performances were subsequently analyzed and compared. The analysis revealed that while the logistic regression-based prediction model exhibited the highest recall rate, its f-1 value and ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) value were comparatively lower. On the other hand, the random forest-based prediction model demonstrated superior performance across all other metrics except recall value. In addition, in order to assess model performance over distinct prediction periods, we divided these periods into short-term (within one semester), medium-term (within two semesters), and long-term (within three semesters). The results underscored that the long-term prediction yielded the highest predictive efficacy. Through this study, each university is expected to be able to identify students who are expected to be dropped out early, reduce the dropout rate through intensive management, and further contribute to the stabilization of university finances.