• Title/Summary/Keyword: Ensemble models

A Recommending System for Care Plan(Res-CP) in Long-Term Care Insurance System (데이터마이닝 기법을 활용한 노인장기요양급여 권고모형 개발)

  • Han, Eun-Jeong;Lee, Jung-Suk;Kim, Dong-Geon;Ka, Im-Ok
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1229-1237 / 2009
  • In the long-term care insurance (LTCI) system, how to provide the most appropriate care has become a major issue for the elderly, their families, and policy makers. To help beneficiaries use LTC services according to their care needs, the National Health Insurance Corporation (NHIC) provides them with an individualized care plan, named the Long-term Care User Guide, which includes recommendations for each beneficiary's most appropriate type of care. The purpose of this study is to develop a recommending system for care plan (Res-CP) in the LTCI system. We used the data set for the Long-term Care User Guide from the 3rd long-term care insurance pilot programs. To develop the model, we tested four models: a decision-tree model from data mining, a logistic regression model, and bagging and boosting techniques as ensemble models. The decision-tree model was selected to describe the Res-CP because its algorithm is easy to explain to the working groups. Res-CP might be useful for evidence-based care planning in the LTCI system and may contribute to the efficient use of LTC services.

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE / v.43 no.10 / pp.1085-1093 / 2016
  • A community-based Question Answering system provides answers to questions from the documents uploaded on web communities. To enhance question analysis, earlier methods either developed specific rules suited to a target domain or applied machine learning to only part of the process. However, these methods incur an excessive cost when expanding to new fields, or lead to a system that is overfitted to a specific field. This paper proposes a multiple machine learning method that automates the overall process by applying an appropriate machine learning technique at each step of a community-based Question Answering system. The system is divided into a question analysis part and an answer selection part. The question analysis part consists of the question focus extractor, which identifies the focused phrases in questions using conditional random fields, and the question type classifier, which classifies the topics of questions using a support vector machine. In the answer selection part, we train the weights used by the similarity estimation models through an artificial neural network. In addition, there are a number of cases in which the results of morphological analysis are not reliable for data uploaded on web communities. Therefore, we suggest a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the former system, achieving a Mean Average Precision of 0.765 and an R-Precision of 0.872.
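
The two evaluation measures reported above can be illustrated with a minimal sketch. It uses the standard textbook definitions of average precision and R-Precision; the function names and the toy document IDs are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at this rank
    return total / len(relevant)


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision over the top R results, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-question average precision values."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Toy example (hypothetical IDs): relevant answers are "a" and "c".
ranking = ["a", "b", "c", "d"]
gold = {"a", "c"}
ap = average_precision(ranking, gold)
rp = r_precision(ranking, gold)
```

For this toy ranking the average precision is (1/1 + 2/3) / 2, about 0.833, and the R-Precision is 0.5, since only one of the top two results is relevant.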

A Study on Fault Classification by EEMD Application of Gear Transmission Error (전달오차의 EEMD적용을 통한 기어 결함분류연구)

  • Park, Sungho;Choi, Joo-Ho
    • Journal of the Computational Structural Engineering Institute of Korea / v.30 no.2 / pp.169-177 / 2017
  • In this paper, classification of spall and crack faults of gear teeth is studied by applying ensemble empirical mode decomposition (EEMD) to the gear transmission error (TE). Finite element models of the gears with the two faults are built, and the TE is obtained by simulating the gears under loaded contact. EEMD is applied to the residuals of the TE, which are the difference between the normal and faulty signals. From the result, the difference between spall and crack faults is clearly identified by the intrinsic mode functions (IMFs). A simple test bed consisting of a motor, a brake, and a pair of spur gears is installed to illustrate the approach. Two gears are employed to obtain the TE for the normal, spalled, and cracked gears, and the types of the faults are separated by the same EEMD application process. To quantify the results, crest factors are applied to each IMF. The characteristics of spall and crack are well represented by the crest factors of the first and the third IMFs, which are used as the feature signals. The classification is carried out with Bayes decision theory, using the feature signals acquired through the experiments.
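
As a rough illustration of the feature used above, the crest factor of a signal is commonly defined as its peak absolute amplitude divided by its root-mean-square value. The sketch below assumes that definition and operates on a plain list of samples; the EEMD step that would produce each IMF is not shown, and the function name is hypothetical.

```python
import math
from typing import Sequence


def crest_factor(signal: Sequence[float]) -> float:
    """Peak absolute amplitude divided by the RMS value of the signal."""
    if len(signal) == 0:
        raise ValueError("signal must contain at least one sample")
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    if rms == 0.0:
        raise ValueError("crest factor is undefined for an all-zero signal")
    peak = max(abs(x) for x in signal)
    return peak / rms


# One full period of a sine wave: the crest factor is close to sqrt(2).
n = 1000
sine = [math.sin(2.0 * math.pi * k / n) for k in range(n)]
cf_sine = crest_factor(sine)

# A square wave has a crest factor of 1; impulsive signals score higher.
cf_square = crest_factor([1.0, -1.0, 1.0, -1.0])
```

A signal with sharp, isolated peaks yields a larger crest factor than a smooth one with the same energy, which is why the measure is often used to summarize impulsive fault signatures.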

Development of Stochastic Downscaling Method for Rainfall Data Using GCM (GCM Ensemble을 활용한 추계학적 강우자료 상세화 기법 개발)

  • Kim, Tae-Jeong;Kwon, Hyun-Han;Lee, Dong-Ryul;Yoon, Sun-Kwon
    • Journal of Korea Water Resources Association / v.47 no.9 / pp.825-838 / 2014
  • The stationary Markov chain model has been widely used as a daily rainfall simulation model. A main assumption of the stationary Markov model is that statistical characteristics do not change over time and do not have any trends. In other words, the stationary Markov chain model for daily rainfall simulation essentially cannot incorporate any changes in mean or variance. Here we develop a stochastic downscaling scheme based on a non-stationary hidden Markov chain model (NHMM) for simulating daily rainfall sequences, using general circulation models (GCMs) as inputs. It has been acknowledged that GCMs perform well with respect to annual and seasonal variation at large spatial scales, and they stand as one of the primary sources for obtaining forecasts. The proposed model is applied to daily rainfall series at three stations in the Nakdong watershed. The model showed better performance in reproducing most of the statistics associated with daily and seasonal rainfall. In particular, the proposed model provided a significant improvement in reproducing the extremes. It was confirmed that the proposed model could be used as a downscaling model for generating plausible daily rainfall scenarios if elaborate GCM forecasts can be used as a predictor. The proposed NHMM can also be applied to climate change studies if GCM-based climate change scenarios are used as inputs.
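
The stationary baseline described above can be sketched as a two-state (dry/wet) first-order Markov chain with fixed transition probabilities. This is only an illustration of the stationarity assumption, not the NHMM proposed in the paper; the function names and the probability values are hypothetical.

```python
import random
from typing import List


def simulate_occurrence(n_days: int, p_wet_given_dry: float, p_wet_given_wet: float,
                        seed: int = 0, start_wet: bool = False) -> List[bool]:
    """Simulate wet (True) / dry (False) days with time-invariant transition probabilities."""
    rng = random.Random(seed)
    wet = start_wet
    sequence = []
    for _ in range(n_days):
        # The probability of rain today depends only on yesterday's state.
        p_wet = p_wet_given_wet if wet else p_wet_given_dry
        wet = rng.random() < p_wet
        sequence.append(wet)
    return sequence


def stationary_wet_fraction(p_wet_given_dry: float, p_wet_given_wet: float) -> float:
    """Long-run fraction of wet days implied by the fixed transition probabilities."""
    return p_wet_given_dry / (1.0 + p_wet_given_dry - p_wet_given_wet)


# Hypothetical parameters: 20% chance of rain after a dry day, 60% after a wet day.
days = simulate_occurrence(365, 0.2, 0.6, seed=42)
long_run = stationary_wet_fraction(0.2, 0.6)
```

Because the two probabilities never change, the long-run wet-day fraction is fixed (one third for these values), which is the limitation a non-stationary model is meant to remove by letting the transition behaviour depend on external predictors such as GCM output.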

Bayesian networks-based probabilistic forecasting of hydrological drought considering drought propagation (가뭄의 전이 현상을 고려한 수문학적 가뭄에 대한 베이지안 네트워크 기반 확률 예측)

  • Shin, Ji Yae;Kwon, Hyun-Han;Lee, Joo-Heon;Kim, Tae-Woong
    • Journal of Korea Water Resources Association / v.50 no.11 / pp.769-779 / 2017
  • As droughts have recently been occurring more often, reliable drought forecasting is required for drought mitigation and proactive management of water resources. This study developed a probabilistic hydrological drought forecasting method, named the Propagated Bayesian Networks Drought Forecasting (PBNDF) model, which uses Bayesian networks and the drought propagation relationship to estimate future drought together with the forecast uncertainty. The proposed PBNDF model is composed of four nodes: past, current, and multi-model ensemble (MME) forecasted information, and the drought propagation relationship. Using the Palmer Hydrological Drought Index (PHDI), the PBNDF model was applied to forecast the hydrological drought condition at 10 gauging stations in the Nakdong River basin. Receiver operating characteristic (ROC) curve analysis was applied to measure the forecast skill of the forecast mean values. The root mean squared error (RMSE) and skill score (SS) were employed to compare the forecast performance with previously developed forecast models (persistence forecast, Bayesian network drought forecast). We found that the PBNDF model showed better forecast skill, with a low RMSE and a high SS of 0.1~0.15. Overall, the results indicate that the PBNDF model has good potential for probabilistic drought forecasting.
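
The two comparison metrics named above can be sketched as follows. The RMSE is standard; for the skill score, the sketch assumes the common mean-squared-error form, SS = 1 - MSE(forecast) / MSE(reference), which may differ from the exact definition used in the paper. All names and sample values are hypothetical.

```python
import math
from typing import Sequence


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between observations and predictions."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(mse(observed, predicted))


def skill_score(observed: Sequence[float], forecast: Sequence[float],
                reference: Sequence[float]) -> float:
    """Assumed MSE-based skill score: positive when the forecast beats the reference."""
    return 1.0 - mse(observed, forecast) / mse(observed, reference)


# Hypothetical drought-index values: observations, a model forecast,
# and a reference forecast (for example, persistence).
obs = [1.0, 2.0, 3.0]
model = [1.0, 2.0, 4.0]
ref = [2.0, 2.0, 2.0]
ss = skill_score(obs, model, ref)
```

Under this definition a score of 0 means no improvement over the reference forecast and a score of 1 means a perfect forecast; the toy values above give 0.5.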

Doubly-robust Q-estimation in observational studies with high-dimensional covariates (고차원 관측자료에서의 Q-학습 모형에 대한 이중강건성 연구)

  • Lee, Hyobeen;Kim, Yeji;Cho, Hyungjun;Choi, Sangbum
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.309-327 / 2021
  • Dynamic treatment regimes (DTRs) are decision-making rules designed to provide personalized treatment to individuals in multi-stage randomized trials. Unlike classical methods, in which all individuals are prescribed the same type of treatment, DTRs prescribe patient-tailored treatments that take into account individual characteristics that may change over time. The Q-learning method, one of the regression-based algorithms for finding optimal treatment rules, has become more popular because it can be easily implemented. However, the performance of the Q-learning algorithm relies heavily on the correct specification of the Q-function for the response, especially in observational studies. In this article, we examine a number of doubly-robust weighted least-squares estimating methods for Q-learning in high-dimensional settings, where treatment models for the propensity score and penalization for sparse estimation are also investigated. We further consider flexible ensemble machine learning methods for the treatment model to achieve double robustness, so that the optimal decision rule can be correctly estimated as long as at least one of the outcome model and the treatment model is correct. Extensive simulation studies show that the proposed methods work well with practical sample sizes. The practical utility of the proposed methods is demonstrated with a real data example.

A Vision Transformer Based Recommender System Using Side Information (부가 정보를 활용한 비전 트랜스포머 기반의 추천시스템)

  • Kwon, Yujin;Choi, Minseok;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.119-137 / 2022
  • Recent recommendation system studies apply various deep learning models to better represent user and item interactions. One noteworthy study is ONCF (Outer product-based Neural Collaborative Filtering), which builds a two-dimensional interaction map via the outer product and employs a CNN (Convolutional Neural Network) to learn high-order correlations from the map. However, ONCF has limitations in recommendation performance due to problems with the CNN and the absence of side information. ONCF using a CNN has an inductive bias problem that causes poor performance on data whose distribution does not appear in the training data. This paper proposes to employ a Vision Transformer (ViT) instead of the vanilla CNN used in ONCF, because ViT has shown better results than state-of-the-art CNNs in many image classification cases. In addition, we propose a new architecture to reflect side information that ONCF did not consider. Unlike previous studies that reflect side information in a neural network using simple input combination methods, this study uses an independent auxiliary classifier to reflect side information more effectively in the recommender system. ONCF used a single latent vector each for the user and the item, but in this study a channel is constructed from multiple vectors so that the model can learn more diverse representations and obtain an ensemble effect. The experiments showed that our deep learning model improved recommendation performance compared to ONCF.
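
The interaction map mentioned above can be sketched with plain Python lists: the outer product of a user latent vector and an item latent vector gives a two-dimensional map, and using several vector pairs gives one map per channel. This is a minimal illustration under assumed names and toy values; it does not reproduce the embedding layers, the ViT, or the auxiliary classifier described in the abstract.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def interaction_map(user_vec: Sequence[float], item_vec: Sequence[float]) -> Matrix:
    """Outer product: entry (i, j) is user_vec[i] * item_vec[j]."""
    return [[u * v for v in item_vec] for u in user_vec]


def interaction_channels(user_vecs: Sequence[Sequence[float]],
                         item_vecs: Sequence[Sequence[float]]) -> List[Matrix]:
    """One interaction map per (user vector, item vector) pair, stacked as channels."""
    if len(user_vecs) != len(item_vecs):
        raise ValueError("need the same number of user and item vectors")
    return [interaction_map(u, v) for u, v in zip(user_vecs, item_vecs)]


# Toy latent vectors (hypothetical values).
single = interaction_map([1.0, 2.0], [3.0, 4.0])
channels = interaction_channels([[1.0, 2.0], [0.5, 0.0]],
                                [[3.0, 4.0], [2.0, 2.0]])
```

The diagonal of each map equals the element-wise product used in plain matrix factorization, while the off-diagonal entries expose pairwise interactions between different latent dimensions, which is what an image-style model can then learn from.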

A Method of Machine Learning-based Defective Health Functional Food Detection System for Efficient Inspection of Imported Food (효율적 수입식품 검사를 위한 머신러닝 기반 부적합 건강기능식품 탐지 방법)

  • Lee, Kyoungsu;Bak, Yerin;Shin, Yoonjong;Sohn, Kwonsang;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.139-159 / 2022
  • As interest in health functional foods has increased since COVID-19, the importance of imported food safety inspections is growing. However, while imports of health functional foods increase every year, the budget and manpower required for import and export inspections are reaching their limit. Hence, the purpose of this study is to propose a machine learning model that efficiently detects nonconforming food and suits the characteristics of the imported-food data held by government offices. First, the components of food import/export inspection data that affect the judgment of nonconformity were examined, and derived variables were newly created. Second, to select features for machine learning, class imbalance and nonlinearity were considered when performing exploratory analysis on the imported food-related data. Third, we compared the performance and interpretability of each model by applying various machine learning techniques. In particular, the ensemble model performed best, and it was confirmed that the derived variables and models proposed in this study can be helpful to the system used in import/export inspections.

Water Level Prediction on the Golok River Utilizing Machine Learning Technique to Evaluate Flood Situations

  • Pheeranat Dornpunya;Watanasak Supaking;Hanisah Musor;Oom Thaisawasdi;Wasukree Sae-tia;Theethut Khwankeerati;Watcharaporn Soyjumpa
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.31-31 / 2023
  • During December 2022, the northeast monsoon, which dominates the south and the Gulf of Thailand, brought significant rainfall that impacted the lower southern region, causing flash floods, landslides, blustery winds, and rivers overflowing their banks. The Golok River, located in Narathiwat, which forms the border between Thailand and Malaysia, was also affected by the rainfall. In flood management, instruments for measuring precipitation and water level have become important for assessing and forecasting the trend of situations and areas of risk. However, because such regions are international borders, the installed telemetry system cannot measure the rainfall and water level of the entire area. This study aims to predict the water level 72 hours ahead and evaluate the situation, providing information to support the government in making water management decisions, publicizing them to relevant agencies, and warning citizens during crisis events. This research applies machine learning (ML) to water level prediction for the Golok River at the Lan Tu Bridge area, Sungai Golok Subdistrict, Su-ngai Golok District, Narathiwat Province, which is one of the major monitored rivers. The eXtreme Gradient Boosting (XGBoost) algorithm, a tree-based ensemble machine learning algorithm, was used to predict hourly water levels in the R programming language. Model training and testing were carried out using observed hourly rainfall from the STH010 station and hourly water level data from the X.119A station between 2020 and 2022 as the main prediction inputs. Furthermore, the model takes hourly spatial rainfall forecasting data from the Weather Research and Forecasting and Regional Ocean Model System models (WRF-ROMs), provided by the Hydro-Informatics Institute (HII), as input, allowing it to predict the hourly water level in the Golok River. Evaluation of the predictions with statistical performance metrics gave an R-square of 0.96, which supports the results as robust forecasting outcomes. The results show that the predicted water level at the X.119A telemetry station (Golok River) is in a steady decline, which corresponds to the decrease in the predicted 72-hour rainfall input from WRF-ROMs. In short, the relationship between input and result can be used to evaluate flood situations. The data are contributed as operational support to the Special Water Resources Management Operation Center in Southern Thailand for flood preparedness and response, to support intelligent decisions on water management during crisis occurrences and to help prepare for and prevent loss and harm to citizens.

Assessing habitat suitability for timber species in South Korea under SSP scenarios (SSP 시나리오에 따른 국내 용재수종의 서식지 적합도 평가)

  • Hyeon-Gwan Ahn;Chul-Hee Lim
    • Korean Journal of Environmental Biology / v.40 no.4 / pp.567-578 / 2022
  • Various social and environmental problems have recently emerged due to global climate change. In South Korea, coniferous forests in the highlands are decreasing due to climate change, whereas the distribution of subtropical species is gradually increasing. This study aims to respond to changes in the distribution of forest species in South Korea due to climate change. It predicts changes in future suitable areas for Pinus koraiensis, Cryptomeria japonica, and Chamaecyparis obtusa, which are cultivated as timber species, based on climate, topography, and environment. Occurrence coordinates were collected only for natural forests in the National Forest Inventory, in consideration of climate suitability. Future climate data used the SSP scenarios provided by KMA. Species distribution models were ensembled to predict suitable habitat areas for the base year (2000-2019), near future (2041-2060), and distant future (2081-2100). In the baseline period, the highly suitable habitat for Pinus koraiensis accounted for approximately 13.87% of the country. However, in the distant future (2081-2100), it decreased to approximately 0.11% under SSP5-8.5. For Cryptomeria japonica, the habitat for the base year was approximately 7.08%, and it increased to approximately 18.21% under SSP5-8.5 in the distant future. For Chamaecyparis obtusa, the habitat for the base year was approximately 19.32%, and it increased to approximately 90.93% under SSP5-8.5 in the distant future. Pinus koraiensis, which had been planted nationwide, gradually moved north due to climate change, with suitable habitats in South Korea decreasing significantly. After the near future, Pinus koraiensis was no longer suitable for afforestation as a timber species in South Korea. Chamaecyparis obtusa can serve as a replacement in most areas, and Cryptomeria japonica was assessed as a possible replacement in part of the southern and central regions.