• Title/Summary/Keyword: Backward Elimination

A Development of Suicidal Ideation Prediction Model and Decision Rules for the Elderly: Decision Tree Approach (의사결정나무 기법을 이용한 노인들의 자살생각 예측모형 및 의사결정 규칙 개발)

  • Kim, Deok Hyun;Yoo, Dong Hee;Jeong, Dae Yul
    • The Journal of Information Systems / v.28 no.3 / pp.249-276 / 2019
  • Purpose: The purpose of this study is to develop a prediction model and decision rules for suicidal ideation among the elderly, based on Korean Welfare Panel survey data. Using these data, we obtained many decision rules for predicting suicidal ideation in the elderly. Design/methodology/approach: This study used classification analysis based on the decision tree technique to derive the decision rules. Weka 3.8 was used as the data mining tool, with the J48 decision tree algorithm, also known as C4.5. The data were split into learning (training) data and verification data using a 66.6% split. We considered all possible variables identified in previous studies on predicting suicidal ideation of the elderly; in the end, 99 variables, including the target variable, were used. Classification analysis was performed after applying backward elimination and a sampling technique for data balancing. Findings: There were significant differences between the data sets, and the selected data sets yielded different decision trees and several rules. Based on the decision tree method, we derived rules for suicide prevention. The decision tree yields rules for suicidal ideation not only in the depressed group but also in the non-depressed group. In addition, in developing the predictive model, the over-fitting problem caused by data imbalance was directly identified through the application of data balancing. We conclude that the data must be balanced on the target variable in order to perform correct classification analysis without over-fitting. Moreover, even with data balancing applied, the prediction rate is not inferior to that of a biased prediction model.
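The abstract above does not say which balancing technique was applied. As one possible illustration, the sketch below uses random undersampling, which is an assumption here, not the authors' documented method. The function name `undersample`, the field name `ideation`, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def undersample(rows, label_of, seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 10 positive records and 90 negative ones.
records = [{"id": i, "ideation": i < 10} for i in range(100)]
balanced = undersample(records, label_of=lambda r: r["ideation"])
print(len(balanced), sum(r["ideation"] for r in balanced))
```

Undersampling discards majority-class records; oversampling the minority class is the usual alternative when data are scarce.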

Factors Related to Diabetic Patients' Quality of Life: The 8th Korean National Health and Nutrition Examination (1st Year, 2019) (당뇨 환자의 삶의 질 관련 요인: 제 8기 1차년도(2019년) 국민건강영양조사)

  • Woo, Sang Jun;Kim, Eun A
    • The Journal of Korean Society for School & Community Health Education / v.23 no.2 / pp.51-64 / 2022
  • Objectives: This study aims to examine diabetic patients' quality of life using data from the 8th Korea National Health and Nutrition Examination Survey (1st year, 2019), to identify related factors, and to provide basic data for interventions that can improve diabetic patients' quality of life. Methods: From the total sample of 8,110 participants in the 8th Korea National Health and Nutrition Examination Survey, this study extracted 624 patients who had been diagnosed with diabetes by a doctor. SPSS (version 25.0) was used to analyze the collected data. Multiple regression analysis with backward elimination, applied to the complex sample design, was used to examine the factors related to quality of life in the final model. Results: Diabetic patients' quality of life was related to gender, age, occupation, restriction of activity, and subjective health status. The final model explained 35.7% of the variance (Wald F=28.210, p<.001). Conclusions: To improve the quality of life of diabetic patients, it would be desirable to provide differentiated management by developing customized intervention strategies that take gender, age, and occupation into account. When managing diabetic patients, the state, local governments, and hospitals should include content on preventing and coping with activity restrictions that may arise from the disease. In addition, a strategy is needed to encourage patients to perceive their own health status positively.

Update on the risk factors for opisthorchiasis and cholangiocarcinoma in Thailand

  • Sattrachai Prasopdee;Thittinan Rojthongpond;Yanwadee Chitkoolsamphan;Montinee Pholhelm;Siraphatsorn Yusuk;Junya Pattaraarchachai;Kritiya Butthongkomvong;Jutharat Kulsantiwong;Teva Phanaksri;Anthicha Kunjantarachot;Smarn Tesana;Thanakrit Sathavornmanee;Veerachai Thitapakorn
    • Parasites, Hosts and Diseases / v.61 no.4 / pp.463-470 / 2023
  • This study aimed to identify the recent risk factors for Opisthorchis viverrini infection and cholangiocarcinoma (CCA) to improve disease prevention. The participants were divided into 3 groups based on their health status: healthy controls (nonOV and nonCCA), those with O. viverrini infection (OV), and those with CCA. A questionnaire was used to explore their lifestyle and behaviors. Multivariate logistic regression with backward elimination was used to identify the significant risk factors. The results showed that the significant risk factors for both O. viverrini infection and CCA were age >50 years (odds ratio (OR)=8.44, P<0.001, 95% confidence interval (CI) 2.98-23.90 and OR=43.47, P=0.001, 95% CI 14.71-128.45, respectively) and raw fish consumption (OR=8.48, P<0.001, 95% CI 3.18-22.63 and OR=3.15, P=0.048, 95% CI 1.01-9.86, respectively). A history of O. viverrini infection was identified as an additional risk factor for CCA (OR=20.93, P=0.011, 95% CI 2.04-215.10). This study provides an update on the risk factors for O. viverrini infection and CCA. Asymptomatic patients with O. viverrini infection, particularly those >50 years old, should be carefully monitored to prevent CCA.
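To make the OR and 95% CI notation above concrete, here is a minimal sketch of an unadjusted odds ratio with a Wald confidence interval from a 2x2 table. It is a simplification: the study reports adjusted estimates from multivariate logistic regression, which this does not reproduce. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed cases,   b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    z=1.96 corresponds to a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald (Woolf) approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 5, 25)
print(f"OR={or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level.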

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • Many studies have long been conducted in academia on predicting the success of customer campaigns, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded with the rapid growth of online activity, companies are running campaigns of many types at a scale that cannot be compared to the past. However, customers tend to perceive campaigns as spam as fatigue from duplicate exposure increases. From a corporate standpoint, the effectiveness of campaigns is also decreasing: investment costs are rising while the actual campaign success rate remains low. Accordingly, various studies are ongoing to improve the effectiveness of campaigns in practice. The ultimate purpose of a campaign system is to increase the success rate of campaigns by collecting and analyzing customer-related data and using them for campaigns. In particular, there have been recent attempts to predict campaign responses using machine learning. Because campaign data have many features, selecting appropriate features is very important. If all of the input data are used when classifying a large amount of data, learning takes a long time as the classification classes expand, so a minimal input data set must be extracted from the entire data. In addition, when a model is trained with too many features, prediction accuracy may be degraded due to overfitting or correlation between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step in analyzing a high-dimensional data set. Among the greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), and SFFS (Sequential Floating Forward Selection) are widely used as traditional feature selection techniques. However, when there are many risks and many features, these methods are limited by poor classification performance and long learning times. Therefore, in this study, we propose an improved feature selection algorithm to enhance campaign effectiveness. The purpose of this study is to improve the existing SFFS sequential method for searching feature subsets, which are the basis for improving machine learning model performance, by using the statistical characteristics of the data processed in the campaign system. Features that strongly influence performance are derived first, features that have a negative effect are removed, and then the sequential method is applied, which increases search efficiency and enables generalized prediction. The proposed model showed better search and prediction performance than the traditional greedy algorithm. Campaign success prediction was higher than with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, the improved feature selection algorithm was found to help in analyzing and interpreting prediction results by providing the importance of the derived features. These included features such as age, customer rating, and sales, whose importance was already known statistically. Unexpectedly, features that campaign planners had rarely used to select campaign targets, such as the combined product name, the average 3-month data consumption rate, and the last 3-month wireless data usage, were also selected as important for the campaign response. It was confirmed that base attributes can also be very important features depending on the type of campaign. This makes it possible to analyze and understand the important characteristics of each campaign type.
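For readers unfamiliar with SBS, the following is a minimal sketch of plain Sequential Backward Selection, the traditional baseline named in the abstract. It is not the improved SFFS-based algorithm the paper proposes. The scoring function `toy_score` and the feature names are hypothetical; in practice the score would typically be a cross-validated model metric.

```python
def sequential_backward_selection(features, score, min_features=1):
    """Greedy SBS: repeatedly drop the feature whose removal scores best.

    `score` maps a list of feature names to a number (higher is better).
    Stops when every single removal would lower the score.
    """
    selected = list(features)
    best_score = score(selected)
    while len(selected) > min_features:
        # Evaluate each subset obtained by removing exactly one feature.
        candidates = [
            (score([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        cand_score, drop = max(candidates, key=lambda pair: pair[0])
        if cand_score < best_score:
            break  # any removal hurts, so keep the current subset
        selected.remove(drop)
        best_score = cand_score
    return selected, best_score


# Hypothetical score: reward informative features, penalize subset size.
INFORMATIVE = {"age", "customer_rating", "sales"}


def toy_score(subset):
    return sum(f in INFORMATIVE for f in subset) - 0.1 * len(subset)


selected, best = sequential_backward_selection(
    ["age", "customer_rating", "sales", "noise_1", "noise_2"], toy_score
)
print(selected, round(best, 2))
```

Ties are resolved in favor of removal, so the sketch prefers the smaller subset when two subsets score equally.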

The Relationship between Innovation Capability of R&D and the Firm's Performance : Comparing Regional Strategy Industry with Non-Regional Strategy Industry in Daegu (R&D 혁신역량과 기업성과 간의 관계: 대구지역 전략산업과 비전략산업 간 비교분석)

  • Shin, Jin-Kyo;Jo, Jeong-Il
    • Management & Information Systems Review / v.30 no.2 / pp.211-235 / 2011
  • We examined the relationship between R&D innovation capability and firm performance, mainly by comparing regional strategy industries with non-regional strategy industries. We also compared this relationship across the individual regional strategy industries. For the purpose of this study, we divided R&D innovation capability into input, process, and output. The first main result was that regional strategy industries scored significantly higher than non-regional strategy industries in R&D innovation capability, with the exception of the CEO's mindset toward technological innovation. However, we found no significant difference in firm performance. Second, when R&D innovation capability and firm performance were compared across regional strategy industries, the electronic-information equipment industry was significantly superior to the other industries. Third, the relationship between R&D innovation capability and firm performance differed by regional strategy industry. In addition, R&D manpower and R&D process were more significant factors affecting firm performance than R&D input and output.

Development of the Standard Blood Inventory Level Decision Rule in Hospitals (병원의 표준 혈액재고량 산출식 개발)

  • Kim, Byoung-Yik
    • Journal of Preventive Medicine and Public Health / v.21 no.1 s.23 / pp.195-206 / 1988
  • The two major issues in blood bank management are quality assurance and inventory control. Recently, blood donation has become increasingly popular in Korea, allowing considerable improvement in quality assurance with respect to blood collection, transportation, storage, component preparation skills, and hematological tests. Nevertheless, inventory control, the other issue of blood bank management, has been neglected so far. Because the supply of blood by donation barely meets the demand, blood bank policy on inventory control has been 'the more the better.' Shortage by no means makes inventory control unnecessary; in fact, in spite of the shortage, no small amount of blood becomes outdated. Efficient blood inventory control makes it possible to economize blood usage in the practice of state-of-the-art medical care. For efficient blood inventory control in Korean hospitals, this study aims to develop formulae for forecasting the standard blood inventory level and to suggest a set of policies for improving blood inventory control. For this study, information on inventory control of $A^+$ whole blood and packed cells was collected from a university hospital and the Central Blood Bank of the Korean Red Cross. Using this information, 1,461 daily blood inventory records were compiled. Forty-eight varieties of blood inventory control environment were identified on the basis of selected combinations of 4 inventory control variables: crossmatch, transfusion, in-hospital donation, and age of blood from external supply. To decide the optimal blood inventory level for each environment, simulation models were designed to calculate the measures of performance of each environment. After the 48 optimal blood inventory levels were determined, stepwise multiple regression analysis was performed, with the 4 inventory control variables as independent variables and the optimal inventory level of each environment as the dependent variable. Finally, the standard blood inventory level decision rule was developed using the backward elimination procedure to select the best regression equation. Effective alternatives for the issuing policy and crossmatch release period were suggested according to the measures of performance under the standard blood inventory level. The results of this study were as follows: 1. The formula for the standard blood inventory level ($S^*$) was $S^* = 2.8617 \times d^{0.9342}$, where $d$ is the mean daily crossmatch (demand) for a blood type. 2. The measures of performance (outdate rate, average period of storage, mean age of transfused blood, and mean daily available inventory level) improved when the standard inventory level was maintained, in comparison with the present system. 3. A First In-First Out (FIFO) issuing policy decreased the outdate rate, while Last In-First Out (LIFO) decreased the mean age of transfused blood. Shortening the crossmatch release period reduced the outdate rate and the mean age of transfused blood.
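The decision rule reported above is a power function of mean daily demand, so it is easy to evaluate directly. The sketch below reads the `X` in the printed formula as multiplication, which is an assumption. The function name and the sample demand values are hypothetical.

```python
def standard_inventory_level(mean_daily_crossmatch):
    """Evaluate S* = 2.8617 * d**0.9342 from the abstract.

    `mean_daily_crossmatch` is d, the mean daily crossmatch (demand)
    for one blood type.
    """
    if mean_daily_crossmatch < 0:
        raise ValueError("mean daily crossmatch must be non-negative")
    return 2.8617 * mean_daily_crossmatch ** 0.9342


# Hypothetical demand levels, for illustration only.
for d in (1, 5, 10, 20):
    print(d, round(standard_inventory_level(d), 2))
```

Because the exponent is below 1, the recommended level grows slightly less than proportionally with demand.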

Prevalence and risk factors of subclinical bovine mastitis in some dairy farms of Sylhet district of Bangladesh

  • Kahir, Md. Abdul;Islam, Md. Mazharul;Rahman, A.K.M. Anisur;Nahar, A.;Rahman, Md. Siddiqur;Son, Hee-Jong
    • Korean Journal of Veterinary Service / v.31 no.4 / pp.497-504 / 2008
  • A cross-sectional study was undertaken to report the prevalence and identify the risk factors of subclinical mastitis in dairy cattle in the Sylhet district of Bangladesh. Of the 325 dairy farms in the district, 12 farms (3.7%) were selected by convenience sampling. All the dairy cows on the 12 farms were selected for sample collection. Fresh milk samples from each selected cow were collected aseptically in separate sterilized test tubes for the RF, RH, LF, and LH quarters of the udder. The rapid modified White Side Test (WST) was used to detect subclinical mastitis (SCM). The WST results and the data from the completed questionnaires were entered in Microsoft Excel 2003 and transferred to Stata®, version 8.0/Intercooled (Stata Corporation, Texas, USA, 2003). The overall prevalence of SCM, its distribution across categories of cow-level variables, and the exact binomial 95% confidence intervals were calculated in Stata®. Simple bivariable associations among independent variables were investigated with the $\chi^2$ test in Stata®. Multiple logistic regression analysis with the backward elimination method was used to identify risk factors for SCM. To identify significant variation in quarter-level SCM, linear regression analysis was performed after arcsine transformation of the data. The overall prevalence of SCM found in this study was 54%. Dairy cows with teat lesions had significantly increased SCM (OR=12342, P value=0.000, 95% CI=762, 199798) compared with those without teat lesions. The Holstein Friesian X Jersey X Sahiwal breed had significantly decreased SCM (OR=0.18, p=0.03, 95% CI 0.04, 0.85) compared with other breeds. The prevalence of SCM found in this study is in agreement with other reports. Injury to the teat increases the probability of infection with microbes and thereby of mastitis. If the prevalence of teat lesions can be decreased, the probability of subclinical mastitis will also decrease. The negative association with the Holstein Friesian X Jersey X Sahiwal breed may help in planning mastitis control programs if this finding can be validated by a more powerful case-control or cohort study design.

A Study on Relapse Predictors in Korean Alcohol-Dependent Patients - A 24 Weeks Follow up Study - (24주 추적 조사를 통한 한국인 알코올 의존 환자의 재발 예측 인자 규명 연구)

  • Kim, Cheol Min;Kim, Sung Gon;NamKoong, Kee;Cho, Dong Hwan;Lee, Byung Ook;Choi, Ihn Geun;Kim, Min Jeong
    • Korean Journal of Biological Psychiatry / v.14 no.4 / pp.249-255 / 2007
  • Objectives: The aim of this prospective study is to investigate predictors of relapse in Korean alcohol-dependent patients using variables such as alcohol history, drinking craving, treatment motivation, and insight. Methods: Alcohol-dependent patients (N=48) who completed questionnaires on sociodemographic variables and drinking history, the Timeline Follow-Back (TLFB), the Obsessive-Compulsive Drinking Scale (OCDS), the Alcohol Urge Questionnaire (AUQ), the Pennsylvania Alcohol Craving Scale (PACS), the University of Rhode Island Change Assessment (URICA), and the Hanil Alcohol Insight Scale (HAIS) were followed up for 24 weeks. Subjects who drank heavily (5 or more standard drinks per day) or who were lost to follow-up were classified as the relapse group. We used logistic regression analysis with backward elimination in SPSS PC+ 11.5 to investigate predictors of relapse. Results: The average amount of alcohol consumed per drinking day over the last year and the HAIS score were predictors of relapse in alcohol-dependent patients. Conclusions: Our findings suggest that therapists should give more attention to alcohol-dependent patients who consumed more drinks per drinking day over the last year and who have a lower level of insight.

Development of Field Scale Model for Estimating Garlic Growth Based on UAV NDVI and Meteorological Factors

  • Na, Sang-Il;Min, Byoung-keol;Park, Chan-Won;So, Kyu-Ho;Park, Jae-Moon;Lee, Kyung-Do
    • Korean Journal of Soil Science and Fertilizer / v.50 no.5 / pp.422-433 / 2017
  • Unmanned aerial vehicles (UAVs) have several advantages over conventional remote sensing techniques. They can acquire high-resolution images quickly and repeatedly, and with their comparatively low flight altitude they can obtain good-quality images even in cloudy weather. In this paper, we developed a field-scale model for estimating garlic growth in major cultivation regions. We used $NDVI_{UAV}$, which reflects crop conditions, and seven meteorological elements for 3 major cultivation regions from 2015 to 2017. For this study, UAV imagery was taken in the Taean, Changnyeong, and Hapcheon regions nine times from early February to late June during the garlic growing season. Four plant growth parameters, plant height (P.H.), leaf number (L.N.), plant diameter (P.D.), and fresh weight (F.W.), were measured for twenty plants per plot in each field campaign. Multiple linear regression models were developed using backward elimination and stepwise selection to choose the independent variables. As a result, the cold-type model explained 82.1%, 65.9%, 64.5%, and 61.7% of the variation in P.H., F.W., L.N., and P.D., with root mean square errors (RMSE) of 7.98 cm, 5.91 g, 1.05, and 3.43 cm, respectively. The warm-type model explained 92.9%, 88.6%, 62.8%, and 54.6% of the variation in P.H., P.D., L.N., and F.W., with RMSEs of 16.41 cm, 9.08 cm, 1.12, and 19.51 g, respectively. The spatial distribution map of garlic growth was in strong agreement with the field measurements in terms of field variation and relative numerical values when $NDVI_{UAV}$ was applied to the multiple linear regression models. These results will also be useful for determining the UAV multi-spectral imagery necessary to estimate growth parameters of garlic.

Illegal Cash Accommodation Detection Modeling Using Ensemble Size Reduction (신용카드 불법현금융통 적발을 위한 축소된 앙상블 모형)

  • Lee, Hwa-Kyung;Han, Sang-Bum;Jhee, Won-Chul
    • Journal of Intelligence and Information Systems / v.16 no.1 / pp.93-116 / 2010
  • An ensemble approach is applied to detection modeling of illegal cash accommodation (ICA), a well-known type of fraudulent credit card usage in Far East nations that has not been addressed in the academic literature. The performance of a fraud detection model (FDM) suffers from the imbalanced data problem, which can be remedied to some extent by using an ensemble of many classifiers. It is generally accepted that ensembles of classifiers produce better accuracy than a single classifier, provided there is diversity in the ensemble. Furthermore, recent research reveals that it may be better to combine selected classifiers rather than all of the classifiers at hand. For effective detection of ICA, we adopt an ensemble size reduction technique that prunes the ensemble of all classifiers using accuracy and diversity measures. Diversity in an ensemble manifests itself as disagreement or ambiguity among members. The data imbalance intrinsic to FDM affects our approach to ICA detection in two ways. First, we suggest a training procedure with over-sampling methods to obtain diverse training data sets. Second, we use variants of the accuracy and diversity measures that focus on the fraud class. We also calculate the diversity measure dynamically (Forward Addition and Backward Elimination). In our experiments, Neural Networks, Decision Trees, and Logit Regressions are the base models used as ensemble members, and the performance of homogeneous ensembles is compared with that of heterogeneous ensembles. The experimental results show that the reduced-size ensemble is, on average over the data sets tested, as accurate as the non-pruned version, which provides benefits in terms of application efficiency and reduced ensemble complexity.