• Title/Summary/Keyword: Model selection

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.2
    • /
    • pp.149-161
    • /
    • 2019
  • In this article, we propose the following approach to simultaneous variable selection and outlier detection. First, we identify candidate outliers using properties of an intercept estimator in a difference-based regression model, and we incorporate this information into the multiple regression model by adding mean-shift parameters. Second, we select the best model from those that include the outlier candidates as predictors, using stochastic search variable selection. Finally, we evaluate the method using simulations and real data analysis, with promising results. Further work is needed to make the estimates robust. We will also extend the method to the nonparametric regression model for simultaneous outlier detection and variable selection.

A Decision-making Model for SCM System Selection (SCM 시스템 선정을 위한 의사 결정 모델)

  • Seo Kwang-Kyu
    • Journal of the Korea Safety Management & Science
    • /
    • v.7 no.4
    • /
    • pp.165-177
    • /
    • 2005
  • A Supply Chain Management (SCM) system is a critical investment that can affect the future competitiveness and performance of a company, so selecting the right SCM system is a critical issue. This paper identifies the characteristic factors of SCM system selection and presents an SCM system evaluation and selection model based on the Analytic Hierarchy Process (AHP). The proposed model systematically structures the objectives of SCM system selection so that they support the business goals. An empirical example demonstrates the feasibility of the proposed model and shows that it can help a company make better decisions when selecting an SCM system.
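
The AHP step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the criteria, the pairwise judgments, and the candidate scores below are all hypothetical, and the priority weights are approximated with the row geometric-mean method rather than the principal eigenvector.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical criteria; pairwise[i][j] says how much more important
# criterion i is than criterion j (reciprocal matrix).
criteria = ["cost", "functionality", "vendor support"]
pairwise = [
    [1.0, 1 / 3, 2.0],
    [3.0, 1.0, 4.0],
    [1 / 2, 1 / 4, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical per-criterion scores of two candidate SCM systems.
scores = {
    "System A": [0.6, 0.3, 0.5],
    "System B": [0.4, 0.7, 0.5],
}

def overall(name):
    """Weighted sum of a candidate's criterion scores."""
    return sum(w * s for w, s in zip(weights, scores[name]))

ranking = sorted(scores, key=overall, reverse=True)
print(dict(zip(criteria, weights)), ranking)
```

With these made-up judgments, functionality receives the largest weight, so the candidate that scores best on functionality ranks first.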

Ensemble Gene Selection Method Based on Multiple Tree Models

  • Mingzhu Lou
    • Journal of Information Processing Systems
    • /
    • v.19 no.5
    • /
    • pp.652-662
    • /
    • 2023
  • Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression profile data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods are based on a single-tree model, which is often not robust to ultra-high-dimensional microarray datasets, resulting in the loss of useful information and unsatisfactory classification accuracy. Motivated by the limitations of single-tree-based gene selection, in this study, ensemble gene selection methods based on multiple-tree models were studied to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects the top-n genes from the microarray dataset based on its intrinsic mechanism. Subsequently, three ensemble gene selection methods were investigated, namely multiple-tree model intersection, multiple-tree module union, and multiple-tree module cross-union. Experimental results on five benchmark public microarray gene expression datasets showed that the multiple-tree module union is significantly superior in classification accuracy to gene selection based on a single tree model and to other competitive gene selection methods.
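
The intersection and union ensembles described above can be sketched as set operations on each model's top-n genes. This is only an illustration: the gene names and importance scores are invented, and the cross-union variant is left out because the abstract does not define it.

```python
def top_n(scores, n):
    """Return the set of n gene names with the highest importance scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])

# Hypothetical importance scores from three tree models.
id3_scores = {"g1": 0.90, "g2": 0.70, "g3": 0.20, "g4": 0.10, "g5": 0.05}
rf_scores = {"g1": 0.80, "g2": 0.10, "g3": 0.60, "g4": 0.30, "g5": 0.05}
gbdt_scores = {"g1": 0.70, "g2": 0.20, "g3": 0.10, "g4": 0.60, "g5": 0.05}

# Each model contributes its own top-2 genes.
selections = [top_n(s, 2) for s in (id3_scores, rf_scores, gbdt_scores)]

# Intersection keeps genes every model agrees on; union keeps any selected gene.
intersection = set.intersection(*selections)
union = set.union(*selections)

print(sorted(intersection))  # genes chosen by all three models
print(sorted(union))         # genes chosen by at least one model
```

The union retains genes that only one model ranks highly, which is one plausible reading of why it loses less information than a single-tree selection.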

Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

  • Ryu, Nahyeon;Kim, Hyungseok;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.42 no.5
    • /
    • pp.314-326
    • /
    • 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve the accuracy and efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, which is one of the most commonly used regression models in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. Based on experiments with 49 regression data sets, we find that GA yields the lowest error rates, while lasso reduces the number of variables the most. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
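
Why lasso removes variables while ridge regression only shrinks them can be seen in a small sketch. It assumes an orthonormal design matrix, in which case the lasso solution of ½‖y − Xβ‖² + λ‖β‖₁ is the soft-thresholded least-squares coefficient and the ridge solution of ½‖y − Xβ‖² + (λ/2)‖β‖² is the coefficient divided by 1 + λ. The coefficient values and λ are made up, and this is not the experimental setup of the paper.

```python
def soft_threshold(b, lam):
    """Lasso coefficient under an orthonormal design: shrink by lam, clip at 0."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    """Ridge coefficient under an orthonormal design: proportional shrinkage."""
    return b / (1.0 + lam)

# Hypothetical least-squares coefficients and penalty strength.
ols = [2.5, -0.4, 0.1, -1.8]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

print("lasso:", lasso)  # small coefficients become exactly zero
print("ridge:", ridge)  # every coefficient stays nonzero
```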

Behrens-Fisher Problem from a Model Selection Point of View

  • Jeon, Jong-Woo;Lee, Kee-Won
    • Journal of the Korean Statistical Society
    • /
    • v.20 no.2
    • /
    • pp.99-107
    • /
    • 1991
  • The Behrens-Fisher problem is viewed from a model selection approach. The normal distribution is regarded as an approximating model. A criterion, called TIC, is derived and compared with selection criteria such as AIC and a bootstrap estimator. Stochastic approximation is used since no closed-form expression is available for the bootstrap estimator.

A study of selection operator using distance information between individuals in genetic algorithm

  • Ito, Minoru;Sugisaka, Masanori
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2003.10a
    • /
    • pp.1521-1524
    • /
    • 2003
  • In this paper, we propose a "Distance Correlation Selection operator (DCS)" as a new selection operator. Many improvements have been proposed for the Genetic Algorithm (GA). The MGG (Minimal Generation Gap) model proposed by Satoh et al. shows good performance. The MGG model has all the advantages of conventional models and is able to avoid premature convergence and suppress evolutionary stagnation. Generally, GA has two types of selection operators: "selection for reproduction", which chooses individuals for crossover, and "selection for survival", which chooses the individuals that survive to the next generation. The proposed method extends the selection-for-reproduction operator of the original MGG model by using distance information between individuals. With this extension, the proposed method aims to expand the search area and improve the ability to find solutions. The performance of the proposed method is examined with several standard test functions. The experimental results show better performance than the original MGG model.

Formwork System Selection Model for Tall Building Construction Using the Adaboost Algorithm

  • Shin, Yoon-Seok
    • Journal of the Korea Institute of Building Construction
    • /
    • v.11 no.5
    • /
    • pp.523-529
    • /
    • 2011
  • In tall building construction with reinforced concrete structures, the selection of an appropriate formwork system is a crucial factor for the success of the project, because it affects the entire construction duration and cost, as well as subsequent construction activities. However, in practice, the selection of an appropriate formwork system has depended mainly on the intuitive and subjective opinion of working-level employees with limited experience. Therefore, in this study, a formwork system selection model using the Adaboost algorithm is proposed to support the selection of a formwork system that is suitable for the construction site conditions. To validate the applicability of the proposed model, the Adaboost and ANN selection models were both applied to actual case data of tall building construction in Korea. The Adaboost model showed slightly better accuracy than the ANN model. The Adaboost model can assist engineers in determining the appropriate formwork system at the inception of future projects.

Project Selection & Evaluation System Design and Implementation-Literature Review and Case Study- (연구과제 선정.평가 체계설계에 관한 연구)

  • 용세중;최덕출;한종우;정용훈;이원영
    • Journal of Technology Innovation
    • /
    • v.2 no.1
    • /
    • pp.116-141
    • /
    • 1994
  • This paper presents a model for R&D project selection and evaluation system design developed through a literature review. The model emphasizes the fit among the five elements of the system: evaluation phase and purpose, personnel and organization, evaluation criteria and decision model, evaluation form and procedure, and projects. The model was applied in a real situation as a test case. The important findings are that a good project selection and evaluation model contributes only partially to the effectiveness of project selection, and that system development and implementation is a dynamic and multifaceted learning process.

System Trading using Case-based Reasoning based on Absolute Similarity Threshold and Genetic Algorithm (절대 유사 임계값 기반 사례기반추론과 유전자 알고리즘을 활용한 시스템 트레이딩)

  • Han, Hyun-Woong;Ahn, Hyun-Chul
    • The Journal of Information Systems
    • /
    • v.26 no.3
    • /
    • pp.63-90
    • /
    • 2017
  • Purpose: This study proposes a novel system trading model using case-based reasoning (CBR) based on an absolute similarity threshold. The proposed model is designed to optimize the absolute similarity threshold, feature selection, and instance selection of CBR by using a genetic algorithm (GA). With these mechanisms, it enables us to yield higher returns from stock market trading. Design/Methodology/Approach: The proposed CBR model uses an absolute similarity threshold varying from 0 to 1, which serves as a criterion for selecting appropriate neighbors in the nearest neighbor (NN) algorithm. Since it determines the nearest neighbors on an absolute basis, it fails to select appropriate neighbors from time to time. In system trading, this is interpreted as a 'hold' signal. That is, the system trading model proposed in this study makes trading decisions such as 'buy' or 'sell' only if the model produces a clear signal for stock market prediction. Also, in order to improve the prediction accuracy and the rate of return, the proposed model adopts optimal feature selection and instance selection, which are known to be very effective in enhancing the performance of CBR. To validate the usefulness of the proposed model, we applied it to the index trading of KOSPI200 from 2009 to 2016. Findings: Experimental results showed that the proposed model with optimal feature or instance selection could yield higher returns compared to the benchmark as well as the various comparison models (including logistic regression, multiple discriminant analysis, artificial neural network, support vector machine, and traditional CBR). In particular, the proposed model with optimal instance selection showed the best rate of return among all the models. This implies that the application of CBR with the absolute similarity threshold as well as the optimal instance selection may be effective in system trading from the perspective of returns.
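
The role of the absolute similarity threshold can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the similarity measure (1 / (1 + Euclidean distance)), the majority vote, the tie rule, and the stored cases are all assumptions, and the GA-based optimization of the threshold and of feature and instance selection is not shown.

```python
import math

def similarity(a, b):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def trade_signal(query, cases, threshold):
    """Vote among stored cases whose similarity to the query meets the threshold.

    Returns 'hold' when no case qualifies, or on a tie (the tie rule is an
    assumption of this sketch).
    """
    votes = [label for features, label in cases
             if similarity(query, features) >= threshold]
    if not votes:
        return "hold"  # no sufficiently similar past case: no clear signal
    ups = votes.count("up")
    downs = len(votes) - ups
    if ups == downs:
        return "hold"
    return "buy" if ups > downs else "sell"

# Hypothetical case base: (feature vector, next-period market direction).
cases = [
    ((0.10, 0.20), "up"),
    ((0.15, 0.25), "up"),
    ((0.90, 0.80), "down"),
]

print(trade_signal((0.12, 0.22), cases, threshold=0.9))
print(trade_signal((0.50, 0.50), cases, threshold=0.9))
print(trade_signal((0.88, 0.82), cases, threshold=0.9))
```

Unlike a fixed-k nearest neighbor rule, which always returns k neighbors, the absolute threshold can return none, and that empty result is what maps to 'hold'.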

Korean women wage analysis using selection models (표본 선택 모형을 이용한 국내 여성 임금 데이터 분석)

  • Jeong, Mi Ryang;Kim, Mijeong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.5
    • /
    • pp.1077-1085
    • /
    • 2017
  • In this study, we identify the major factors that affect Korean women's wages by analysing data from the 2015 Korea Labor Panel Survey (KLIPS). In general, wage data is difficult to analyze because random sampling is infeasible. The Heckman sample selection model is the most widely used method for analysing data with sample selection. Heckman proposed two kinds of selection models: one is the model with the maximum likelihood method and the other is the Heckman two-stage model. The Heckman two-stage model is known to be robust to the normal assumption of bivariate error terms. Recently, Marchenko and Genton (2012) proposed the Heckman selection-t model, which generalizes the Heckman two-stage model, and concluded that the Heckman selection-t model is more robust to the error assumptions. Employing the two models, we analysed the data and compared the results.