• Title/Summary/Keyword: Variables selection

Two-stage imputation method to handle missing data for categorical response variable

  • Jong-Min Kim;Kee-Jae Lee;Seung-Joo Lee
    • Communications for Statistical Applications and Methods / v.30 no.6 / pp.577-587 / 2023
  • Conventional categorical data imputation techniques, such as mode imputation, often encounter issues related to overestimation. If a variable has too many categories, multinomial logistic regression imputation may be computationally infeasible. To address these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify significant variables for the target categorical variable. In the second stage, we use the selected variables as predictors: logistic regression imputes missing data in binary variables, polytomous regression imputes missing data in categorical variables, and predictive mean matching imputes missing data in quantitative variables. Through analysis of both asymmetric and non-normal simulated and real data, we demonstrate that the two-stage imputation method outperforms imputation methods lacking variable selection, as evidenced by accuracy measures. In the analysis of real survey data, we also demonstrate that the proposed two-stage imputation method surpasses the current imputation approach in terms of accuracy.
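
The predictive mean matching step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single predictor already chosen in the first stage, fits an ordinary least squares line, and omits the parameter draws that full multiple-imputation software adds. The names `fit_line`, `pmm_impute`, and the toy data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def pmm_impute(xs, ys, k=3, seed=0):
    """Fill None entries in ys by predictive mean matching on predictor xs."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    # Predicted mean and observed value for every donor (row with a response).
    donors = [(a + b * x, y) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)
            continue
        target = a + b * x
        # The k donors whose predicted means are closest to the target's.
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        # Impute an actually observed value, drawn from those donors.
        completed.append(rng.choice(nearest)[1])
    return completed

# Hypothetical toy data: two responses are missing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, None, 4.2, None, 6.1]
filled = pmm_impute(xs, ys)
```

Because every imputed value is copied from a donor, the completed variable never contains values outside the observed range, which is the usual reason for preferring this method with non-normal quantitative data.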

A New Variable Selection Method Based on Mutual Information Maximization by Replacing Collinear Variables for Nonlinear Quantitative Structure-Property Relationship Models

  • Ghasemi, Jahan B.;Zolfonoun, Ehsan
    • Bulletin of the Korean Chemical Society / v.33 no.5 / pp.1527-1535 / 2012
  • Selection of the most informative molecular descriptors from the original data set is a key step in developing quantitative structure-activity/property relationship models. Recently, mutual information (MI) has gained increasing attention in feature selection problems. This paper presents an effective mutual information-based feature selection approach, named mutual information maximization by replacing collinear variables (MIMRCV), for nonlinear quantitative structure-property relationship models. The proposed variable selection method was applied to three different QSPR datasets: soil degradation half-life of 47 organophosphorus pesticides, GC-MS retention times of 85 volatile organic compounds, and water-to-micellar cetyltrimethylammonium bromide partition coefficients of 62 organic compounds. The results show that using MIMRCV as the feature selection method improves the predictive quality of the developed models compared to conventional MI-based variable selection algorithms.
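
As a rough illustration of the conventional MI-based ranking that this abstract uses as its baseline, the sketch below scores discrete features by their mutual information with a target. It does not implement MIMRCV or its collinearity replacement step, and it assumes continuous descriptors have already been binned. The function names and toy data are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written in terms of counts.
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi

def rank_by_mi(features, target):
    """Return feature names sorted from most to least informative."""
    scores = {name: mutual_information(col, target)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: one feature copies the target, one is independent.
target = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "informative": [0, 0, 1, 1, 0, 1, 0, 1],
    "noise": [0, 1, 0, 1, 0, 0, 1, 1],
}
ranking = rank_by_mi(features, target)
```

A ranking of this kind ignores redundancy among the selected features, which is the gap that collinearity-aware variants aim to close.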

Effect of Menu Reliability on Consumer Satisfaction at Rice Cake Cafe using Domestically Grown Agricultural Products - Focus on Jeonju Hanok Village - (국내산 농산물을 사용한 떡 카페 메뉴에 대한 신뢰가 소비자 만족에 미치는 영향 - 전주 한옥마을을 중심으로 -)

  • Kim, Su In
    • Journal of the East Asian Society of Dietary Life / v.25 no.5 / pp.922-931 / 2015
  • To investigate the influence of trust on consumer satisfaction with rice cake cafe menus that use domestically grown food ingredients, this study divided the selection attributes of rice cake cafe menus into safety, nutrition, ethicality, and marketability through an exploratory factor analysis and analyzed reliability and correlation among these variables. As a result, these four factors were adopted as selection factors, and the correlation analysis showed that the four factors were statistically correlated with trust and customer satisfaction. The validity and reliability testing on consumer trust showed that the menus were regarded as reliable and trustworthy because they had been made using domestically grown agricultural products. Analysis of how cafe selection attributes affect trust showed that these variables had a significantly positive influence on trust in the order of safety, marketability, nutrition, and ethicality. The influence of the selection attributes on customer satisfaction was statistically significant, and the independent variables had a significantly positive influence on trust in the order of marketability, ethicality, safety, and nutrition. In verifying the mediation effect of trust on customer satisfaction, the four factors of rice cake cafe selection attributes had statistically significant mediation effects.

Variable selection in the kernel Cox regression

  • Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society / v.22 no.4 / pp.795-801 / 2011
  • In machine learning and statistics, it is often the case that some variables are not important, while some variables are more important than others. We propose a novel algorithm for selecting the relevant variables in kernel Cox regression. We employ a weighted version of ANOVA decomposition kernels to choose an optimal subset of relevant variables. Experimental results are then presented that indicate the performance of the proposed method.

Variable Selection with Regression Trees

  • Chang, Young-Jae
    • The Korean Journal of Applied Statistics / v.23 no.2 / pp.357-366 / 2010
  • Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer from a loss of prediction accuracy when there are many noise variables. To handle this problem, we propose the multi-step GUIDE, a regression tree algorithm with a variable selection process. The multi-step GUIDE performs better than some well-known algorithms such as Random Forest and MARS. The results of a simulation study show that the multi-step GUIDE outperforms other algorithms in terms of variable selection and prediction accuracy. It generally selects the important variables correctly while including relatively few noise variables, and as a result gives good prediction accuracy.

A study on the prediction of optimized injection molding conditions and the feature selection using the Artificial Neural Network(ANN) (인공신경망을 통한 사출 성형조건의 최적화 예측 및 특성 선택에 관한 연구)

  • Yang, Dong-Cheol;Kim, Jong-Sun
    • Design & Manufacturing / v.16 no.3 / pp.50-57 / 2022
  • The quality of products produced by injection molding is strongly influenced by the process variables of the injection molding machine set by the engineer. It is very difficult to predict the quality of an injection molded product given the stochastic nature of the manufacturing process, since the processing conditions have a complex impact on product quality. Artificial neural networks (ANN) are recognized as capable of mapping the intricate relationship between input and output variables very accurately. Therefore, many studies are being conducted to predict the relationship between product results and process variables using ANN. However, when only a small number of data sets is available, the predictive performance and robustness of an ANN model can be reduced by too many input variables. In the present study, an ANN model that predicts the length of the injection molded product for multiple combinations of process variables was developed. The accuracy of the ANN model was then compared between all 8 process variables and the 4 important process inputs determined by feature selection. Based on the comparison, it was verified that the performance of the ANN model increased when only the 4 important variables were applied.

Analyzing empirical performance of correlation based feature selection with company credit rank score dataset - Emphasis on KOSPI manufacturing companies -

  • Nam, Youn Chang;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.21 no.4 / pp.63-71 / 2016
  • This paper applies an efficient data mining method that improves the score calculation and model building performance of a credit ranking score system. The main idea is to accomplish these objectives by applying Correlation based Feature Selection, which can also be used to quickly verify the appropriateness of existing rank scores. This study selected 2047 manufacturing companies on the KOSPI market during the period 2009 to 2013 that have their own credit rank scores given by the NICE information service agency. Regarding the relevant financial variables, a total of 80 variables were collected from KIS-Value and DART (Data Analysis, Retrieval and Transfer System). If correlation based feature selection can select the more important variables, then the required information and cost will be reduced significantly. Through analysis, this study shows that the proposed correlation based feature selection method improves the selection and classification process of the credit rank system, so that accuracy and credibility increase while the cost of building the system decreases.
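
For orientation, the sketch below computes the subset merit commonly used in correlation based feature selection, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-target correlation and r_ff the mean feature-feature correlation. It assumes absolute Pearson correlation as the correlation measure and makes no claim about the exact configuration used in this paper. The function names and toy columns are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def cfs_merit(columns, target):
    """Merit of a feature subset: high relevance, low redundancy."""
    k = len(columns)
    # Mean absolute correlation between each feature and the target.
    r_cf = sum(abs(pearson(c, target)) for c in columns) / k
    if k == 1:
        return r_cf
    # Mean absolute correlation between every pair of features.
    pairs = list(combinations(columns, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

# Hypothetical toy columns: f2 is an exact multiple of f1 (fully redundant).
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f1 = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
f2 = [2.0, 4.0, 6.0, 8.0, 10.0, 14.0]
merit_single = cfs_merit([f1], target)
merit_pair = cfs_merit([f1, f2], target)
```

In this toy case the two merits coincide, because a fully redundant feature adds nothing to the subset; that is the property that lets the method shrink a large pool of financial variables without losing predictive information.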

Selection Attributes and Pursuit Benefits of Processed Fishery Products (수산물가공식품의 선택속성 및 추구혜택에 관한 연구)

  • Kim, Jong-Sung;Ha, Kyu-Soo
    • Journal of the Korean Society of Food Culture / v.25 no.5 / pp.516-524 / 2010
  • Consumers are highly interested in processed fishery products that are healthy and superior in terms of convenience, nourishment, and taste. However, current domestic research on processed fishery products is limited. We systematically analyzed consumer consumption patterns and their relationship to pursuit benefits, selection attributes, satisfaction levels, and reasons for purchase. Consumers considered product information the most important selection attribute, whereas convenience scored highest among pursuit benefits. Furthermore, the influences of selection attributes and pursuit benefits on satisfaction level and the reason for purchasing an item were analyzed using demographic properties as control variables. The variables that affected satisfaction level were residential district (region: B=-0.268, p<0.05), recipe (B=0.098, p<0.05), nutrients (B=0.124, p<0.05), convenience (B=0.283, p<0.001), and economics (B=0.138, p<0.05). The variables affecting the reason for purchasing were nutrients (B=0.173, p<0.001), convenience (B=0.277, p<0.001), and satisfaction level (B=0.163, p<0.001). Pursuit intention had significant effects on purchase intention; however, selection attributes had no significant effect on purchase intention. Therefore, consumer satisfaction had a significant effect on purchase intention. This result showed that if consumers were satisfied, they intended to repurchase. Efforts to increase repurchases should therefore focus on fulfilling consumer satisfaction. These data can be utilized as a fundamental reference for sales promotions.

Variable selection in censored kernel regression

  • Choi, Kook-Lyeol;Shim, Jooyong
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.201-209 / 2013
  • For censored regression, it is often the case that some input variables are not important, while some input variables are more important than others. We propose a novel algorithm for selecting the important input variables for censored kernel regression. It is based on penalized regression with a weighted quadratic loss function for the censored data, where the weight is computed from the empirical survival function of the censoring variable. We employ a weighted version of ANOVA decomposition kernels to choose an optimal subset of important input variables. Experimental results are then presented that indicate the performance of the proposed variable selection method.

Brand Selection of shirts and Jeans Relating to Consumers' Characteristics: A Comparative Study between Domestic and Foreign Brand (셔츠 및 청바지의 상표선택과 소비자 특성에 관한 연구)

  • 이명희
    • Journal of the Korean Home Economics Association / v.35 no.1 / pp.263-276 / 1997
  • The objectives of this study were to examine differences in brand selection motives between purchasers of domestic and foreign brands of shirts and jeans, and to identify the relationships between brand selection and consumers' characteristics, such as demographic variables, sociability, and superiority. The sample consisted of 262 college women in Seoul, Korea. The data were analyzed using the t-test, paired t-test, χ2-test, and discriminant analysis. The results were as follows. 1. Purchasers of foreign brands were influenced by 'quality', 'wearing of others', 'reputation of brand', and 'possibility of credit card use' more than purchasers of domestic brands, while purchasers of domestic brands were influenced by price. 2. Purchasers of foreign brands were more likely than purchasers of domestic brands to decide in advance which brand to buy. 3. Six brand selection motives, together with consumers' income and sociability, contributed to discriminating between the domestic and foreign brand purchase groups for shirts. The accuracy of group prediction by these 8 variables was 75.95%. Consumers high in sociability and income belonged to the foreign brand purchase group. 4. Six brand selection motives, together with consumers' age and superiority, contributed to discriminating between the domestic and foreign brand purchase groups for jeans. The accuracy of group prediction by these 8 variables was 72.52%. Younger consumers and those high in superiority belonged to the foreign brand purchase group.
