• Title/Summary/Keyword: Variables selection

Landslide susceptibility assessment using feature selection-based machine learning models

  • Liu, Lei-Lei;Yang, Can;Wang, Xiao-Mi
    • Geomechanics and Engineering
    • /
    • v.25 no.1
    • /
    • pp.1-16
    • /
    • 2021
  • Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce computational efficiency and make data collection more difficult. Feature selection addresses this problem by selecting the most important features among all factors to reduce the number of input variables. However, two important questions need to be answered: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods from the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, feature selection methods are used to obtain the importance of the conditioning factors and to create feature subsets, based on which 13 FS-ML models are constructed. For each of the machine learning models, the best optimized FS-ML model is selected according to the area under the curve value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results show that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the random forest (RF) optimized by recursive feature elimination (RFE), and RFE is an optimal method for feature selection.

Selection of Variables for Response Surface Experiments with Mixtures

  • Park, Sung H.
    • Journal of the Korean Statistical Society
    • /
    • v.6 no.2
    • /
    • pp.103-115
    • /
    • 1977
  • A strategy for selecting subsets of variables from a given linear model in a mixture system is discussed. The purpose is to achieve better fitting surfaces for estimation of the response in an experimental region of interest. A criterion is proposed for screening variables and illustrated with an example.


A Study on the Selection of Dependent Variables of Momentum Equations in the General Curvilinear Coordinate System for Computational Fluid Dynamics (전산유체역학을 위한 일반 곡률좌표계에서 운동량 방정식의 종속변수 선정에 관한 연구)

  • Kim, Won-Kap;Choi, Young Don
    • Transactions of the Korean Society of Mechanical Engineers B
    • /
    • v.23 no.2
    • /
    • pp.198-209
    • /
    • 1999
  • This study reports the selection of dependent variables for momentum equations in general curvilinear coordinates. Cartesian, covariant and contravariant velocity components were examined as the dependent variable. The focus of the present study is confined to the staggered grid system. Each dependent variable selected for the momentum equations is tested on several flow fields. Results show that the selection of Cartesian and covariant velocity components intrinsically cannot satisfy mass conservation of the control volume unless additional converting processes are used. Also, the Cartesian component can only be used for flow fields in which the main-flow direction does not change significantly. The convergence rate for the covariant velocity component decreases quickly as the non-orthogonality of the grid system increases. In contrast, the selection of the contravariant velocity component rapidly reduces the total mass residual of the discretized equations to the limit of machine accuracy, and the solutions are insensitive to the main-flow direction.

Comparison of model selection criteria in graphical LASSO (그래프 LASSO에서 모형선택기준의 비교)

  • Ahn, Hyeongseok;Park, Changyi
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.4
    • /
    • pp.881-891
    • /
    • 2014
  • Graphical models can be used as an intuitive tool for modeling a complex stochastic system with a large number of interrelated variables, because the conditional independence between random variables can be visualized as a network. The graphical least absolute shrinkage and selection operator (LASSO) is considered effective in avoiding overfitting in the estimation of Gaussian graphical models for high-dimensional data. In this paper, we consider the model selection problem in graphical LASSO. In particular, we compare various model selection criteria via simulations and analyze a real financial data set.

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.40 no.2
    • /
    • pp.138-145
    • /
    • 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary in order to reliably construct a multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Because this problem is NP-complete in nature, it becomes more difficult to find the optimal solution as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, these two methods have shortcomings. The tabu search method requires a large amount of computing time, and the hybrid algorithm produces a less accurate solution. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of neighborhood moves and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of computing time and solution quality.
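A plain tabu search for fixed-size subset selection can be sketched as follows. This is only an illustrative baseline, not the improved algorithm proposed in the paper: the swap neighborhood, the tabu tenure, the residual sum of squares objective, and all function names and the synthetic data are assumptions made here for the example.

```python
import random

def rss(X, y, subset):
    """Residual sum of squares of an OLS fit (with intercept) on the chosen columns."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    p = len(rows[0])
    # Augmented normal equations [Z'Z | Z'y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting (assumes non-collinear columns).
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= f * M[col][k]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][k] * b[k] for k in range(i + 1, p))) / M[i][i]
    return sum((yi - sum(bi * zi for bi, zi in zip(b, r))) ** 2
               for r, yi in zip(rows, y))

def tabu_search(X, y, k, iters=20, tenure=2, seed=0):
    """Search for the k-variable subset with the smallest RSS using swap moves."""
    rng = random.Random(seed)
    m = len(X[0])
    current = set(rng.sample(range(m), k))
    best, best_rss = set(current), rss(X, y, sorted(current))
    tabu = {}  # variable index -> last iteration at which it is still tabu
    for it in range(iters):
        moves = []
        for out in current:
            for inn in set(range(m)) - current:
                if tabu.get(out, -1) >= it or tabu.get(inn, -1) >= it:
                    continue  # skip moves that touch a recently swapped variable
                cand = (current - {out}) | {inn}
                moves.append((rss(X, y, sorted(cand)), out, inn, cand))
        if not moves:
            continue
        score, out, inn, cand = min(moves, key=lambda mv: mv[0])
        current = cand  # accept the best non-tabu move, even if it is worse
        tabu[out] = tabu[inn] = it + tenure
        if score < best_rss:
            best, best_rss = set(cand), score
    return sorted(best), best_rss

# Synthetic data (not from the paper): y depends only on columns 0 and 2.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(30)]
y = [2 * row[0] - 3 * row[2] + data_rng.gauss(0, 0.1) for row in X]
best_subset, best_value = tabu_search(X, y, k=2)
```

The tabu list is what lets the search accept a worse move without immediately undoing it on the next iteration. An aspiration criterion, which would allow a tabu move that beats the best solution so far, is omitted here for brevity.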

The Relationships among Selection Attribute, Trust, Experiential Value, and Recommendation for Sport Center Consumers (스포츠센터 이용객들의 레스토랑선택속성이 신뢰, 경험가치, 그리고 추천의도에 미치는 영향)

  • Kim, Hwa-Young;Park, Hea-Bin;Park, Joung-Mi;Lee, Sang-Mook
    • Culinary science and hospitality research
    • /
    • v.23 no.4
    • /
    • pp.66-73
    • /
    • 2017
  • This study was performed to verify the relationships among selection attributes, restaurant trust, experiential value, and recommendation, focusing on sport center consumers. The data were collected from visitors who had been registered for more than three months at a sport center in South Korea. A total of 500 surveys were distributed, and responses from 330 participants were used for further statistical analysis. SPSS 23.0 and AMOS 21.0 for Windows were used for statistical analysis. Five selection attribute factors (menu, interior, exterior, staff, convenience) were extracted and measured using 15 questions. According to the results of this study, the interior, exterior, and staff factors have positive effects on restaurant trust, and interior and menu were significant predictors of experiential value. In addition, the present study confirmed the theoretical relationship among trust, experiential value, and recommendation intention as perceived by sport center visitors. Although many studies have demonstrated the relationships between selection attributes and other outcome variables, little research has explained the relationships among these variables for sport center consumers. Therefore, this study provides meaningful results and some practical implications for both academia and the related foodservice industry.

A Study on Teenagers' Selection Attribute and Satisfaction of Fast Food Menu (청소년의 패스트푸드 메뉴 선택 속성과 만족도에 관한 연구)

  • JeGal, Yun-Hee;Hong, Ki-Woon;Ryoo, Kyung-Min
    • Culinary science and hospitality research
    • /
    • v.15 no.2
    • /
    • pp.108-120
    • /
    • 2009
  • This study examines teenagers' selection attributes for, and satisfaction with, fast food menus. The sample was drawn from high school students in Changwon, Kyungnam province; a total of 400 questionnaires were distributed, of which 306 valid samples were used for analysis. After data coding, the answers were processed with SPSS 12.0. In the IPA analysis, the 1st quadrant included 8 variables, the 2nd quadrant 6 variables, the 3rd quadrant 5 variables, and the 4th quadrant 8 variables. In addition, the factor analysis of menu selection attributes extracted 5 factors (atmosphere, food, staff, comfort, compensation). Only food was found to have a statistically significant impact on satisfaction with fast food.


Identifying Factors for Corn Yield Prediction Models and Evaluating Model Selection Methods

  • Chang Jiyul;Clay David E.
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.50 no.4
    • /
    • pp.268-275
    • /
    • 2005
  • Early predictions of crop yields can provide information that helps producers take advantage of market opportunities, helps assess national food security, and provides early warning of food shortages. The objectives of this study were to identify the most useful parameters for estimating yields and to compare two model selection methods for finding the 'best' model developed by multiple linear regression. This research was conducted in two 65 ha corn/soybean rotation fields located in east central South Dakota. Data used to develop the models were small temporal variability information (STVI: elevation, apparent electrical conductivity ($EC_a$), slope), large temporal variability information (LTVI: inorganic N, Olsen P, soil moisture), and remote sensing information (green, red, and NIR bands, normalized difference vegetation index (NDVI), and green normalized difference vegetation index (GDVI)). Second-order Akaike's Information Criterion (AICc) and stepwise multiple regression were used to develop the best-fitting equations in each system (information group). Models with $\Delta_i\leq2$ were selected, giving 22 and 37 models at Moody and Brookings, respectively. The most useful variables for estimating corn yield differed between the fields. Elevation and $EC_a$ were consistently the most useful variables in both fields and in most of the systems. Model selection differed between the fields, and different numbers of variables were selected in each. These results may be attributed to the different landscapes and management histories of the study fields. The variables most commonly selected by AICc and by stepwise regression were different. In validation, stepwise regression was slightly better than AICc at Moody, whereas AICc was slightly better than stepwise regression at Brookings. Results suggest that the AICc approach can be used to identify the most useful information and to select the 'best' yield models for production fields.
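The AICc screening described in this abstract can be illustrated with a minimal sketch. It assumes the standard least-squares form AIC = n ln(RSS/n) + 2k with the small-sample correction 2k(k+1)/(n-k-1), and takes $\Delta_i$ to be each model's AICc minus the smallest AICc in the candidate set. The function names, the candidate models, and their RSS values are hypothetical and are not taken from the study.

```python
import math

def aicc(rss, n, k):
    """Second-order AIC for a least-squares fit.

    rss: residual sum of squares; n: sample size;
    k: number of estimated parameters (here counted as coefficients plus error variance).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def select_by_delta(candidates, n, threshold=2.0):
    """Return (name, delta) pairs with delta = AICc - min(AICc) <= threshold, best first."""
    scores = {name: aicc(rss, n, k) for name, (rss, k) in candidates.items()}
    best = min(scores.values())
    kept = [(name, s - best) for name, s in scores.items() if s - best <= threshold]
    return sorted(kept, key=lambda pair: pair[1])

# Hypothetical candidates: model name -> (RSS, number of parameters).
candidates = {
    "elevation": (120.0, 3),
    "elevation+ECa": (100.0, 4),
    "elevation+ECa+slope": (98.0, 5),
    "NDVI": (150.0, 3),
}
selected = select_by_delta(candidates, n=50)
selected_names = [name for name, _ in selected]
```

With these made-up numbers the three-variable model fits slightly better than the two-variable one, but the penalty for its extra parameter leaves it within 2 AICc units of the best model, so both are retained while the poorer fits are dropped.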

Input Variables Selection of Artificial Neural Network Using Mutual Information (상호정보량 기법을 적용한 인공신경망 입력자료의 선정)

  • Han, Kwang-Hee;Ryu, Yong-Jun;Kim, Tae-Soon;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association
    • /
    • v.43 no.1
    • /
    • pp.81-94
    • /
    • 2010
  • Input variable selection is one of several techniques for improving the performance of artificial neural networks. In this study, mutual information, rather than the widely used correlation coefficient, is applied as the input variable selection technique. Among the 152 output variables of RDAPS (Regional Data Assimilation and Prediction System), input variables for the artificial neural network are chosen by computing the mutual information between rainfall records and the RDAPS variables. First, the rainfall forecast variable of RDAPS, namely APCP, is included as an input variable, and the other input variables are selected according to their rank by mutual information and by correlation coefficient. The input variables selected using mutual information are mostly variables related to wind velocity, such as D300, U925, etc. Several statistical error estimates show that the results from mutual information are generally more accurate than those from previous research and from the correlation coefficient. In addition, the artificial neural network using input variables selected by mutual information can effectively reduce the relative errors associated with high rainfall events.
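Ranking candidate inputs by mutual information with the target can be sketched as below. The abstract does not say how mutual information was estimated, so this example assumes a simple plug-in estimate on equal-width bins. The function names, the bin count, and the toy series are all hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of raw counts.
    return sum((c / n) * math.log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def bin_values(values, n_bins=3):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def rank_inputs(inputs, target, n_bins=3):
    """Return (name, MI) pairs sorted from most to least informative."""
    t = bin_values(target, n_bins)
    scores = {name: mutual_information(bin_values(col, n_bins), t)
              for name, col in inputs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy series (not RDAPS data): input_a tracks the target, input_b does not.
target = [0, 1, 2, 3, 4, 5, 6, 7, 8]
inputs = {
    "input_a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "input_b": [5, 1, 9, 5, 1, 9, 5, 1, 9],
}
ranking = rank_inputs(inputs, target)
```

Unlike a correlation coefficient, this score also responds to nonlinear dependence, which is the usual motivation for preferring it. The estimate is sensitive to the choice of bins, so in practice the bin count is a tuning decision.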

Input variables selection using genetic algorithm in training an artificial neural network (인공신경망 학습단계에서의 Genetic Algorithm을 이용한 입력변수 선정)

  • 이재식;차봉근
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 1996.10a
    • /
    • pp.27-30
    • /
    • 1996
  • The determination of input variables for an artificial neural network (ANN) depends entirely on the judgement of the modeller. As the number of input variables increases, the training time for the resulting ANN increases exponentially. Moreover, a larger number of input variables does not guarantee better performance. In this research, we employ a Genetic Algorithm to select the input variables that yield the best performance when training the resulting ANN.
