• Title/Abstract/Keywords: Selection and Elimination


다중선형회귀모형에서의 변수선택기법 평가 (Evaluating Variable Selection Techniques for Multivariate Linear Regression)

  • 류나현;김형석;강필성
    • 대한산업공학회지 / Vol. 42, No. 5 / pp.314-326 / 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve the accuracy and efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, one of the most commonly used regression models in practice. The techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. Experiments on 49 regression data sets show that GA yields the lowest error rates, while lasso reduces the number of variables most significantly. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
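
Forward selection, the first technique named in the abstract above, can be sketched as a greedy loop that repeatedly adds the variable giving the largest drop in residual sum of squares. This is a minimal illustration and not the paper's procedure: the function names, the pure-Python least-squares solver, and the `min_gain` stopping rule are all assumptions made here (the abstract does not state which entry criterion was used).

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sse(X: Sequence[Sequence[float]], y: Sequence[float], cols: List[int]) -> float:
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def forward_selection(X, y, min_gain: float = 1e-6) -> List[int]:
    """Greedily add the column that lowers the SSE most.

    min_gain is a hypothetical stopping threshold chosen for this sketch.
    """
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best = sse(X, y, selected)  # intercept-only model
    while remaining:
        score, col = min((sse(X, y, selected + [c]), c) for c in remaining)
        if best - score < min_gain:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected
```

Backward elimination would run the same loop in reverse, starting from all columns and dropping the one whose removal raises the SSE least.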

연결강도분석을 이용한 통합된 부도예측용 신경망모형 (An Integrated Neural Network Model for Bankruptcy Prediction Using Link Weight Analysis)

  • 이웅규;임영하
    • 한국정보시스템학회:학술대회논문집 / 한국정보시스템학회 2002년도 추계학술대회 / pp.289-312 / 2002
  • This study proposes a link weight analysis approach for choosing input variables and an integrated model for more accurate bankruptcy prediction. The link weight analysis approach chooses input variables by analyzing each input node's link weight, defined as the absolute value of the link weights between an input node and the hidden layer. The approach comprises two methods: weak-linked neurons elimination and strong-linked neurons selection. The integrated model is a combined model that adapts the bagging method, using the average output of four models: the optimal weak-linked neurons elimination method, the optimal strong-linked neurons selection method, a decision tree model, and MDA. The proposed methods (the optimal strong-linked neurons selection method, the optimal weak-linked neurons elimination method, and the integrated model) show much higher accuracy than MDA and the decision tree model. In particular, the integrated model also shows slightly higher accuracy than the optimal weak-linked neurons elimination method and the optimal strong-linked neurons selection method.
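
The link weight idea in the abstract above can be illustrated roughly as follows. The sketch assumes that an input node's strength is the sum of the absolute weights on its links to the hidden layer; the abstract only says "absolute value of link weight", so the aggregation, the function names, and the `n_drop`/`n_keep` parameters are hypothetical.

```python
from typing import List, Sequence


def input_link_strengths(weights: Sequence[Sequence[float]]) -> List[float]:
    """weights[i][j] is the weight from input node i to hidden node j.

    Returns one strength per input node (sum of absolute weights, an assumption).
    """
    return [sum(abs(w) for w in row) for row in weights]


def eliminate_weak_inputs(weights: Sequence[Sequence[float]], n_drop: int) -> List[int]:
    """Weak-linked elimination: drop the n_drop inputs with the smallest strength."""
    strengths = input_link_strengths(weights)
    order = sorted(range(len(strengths)), key=lambda i: strengths[i])
    dropped = set(order[:n_drop])
    return [i for i in range(len(strengths)) if i not in dropped]


def select_strong_inputs(weights: Sequence[Sequence[float]], n_keep: int) -> List[int]:
    """Strong-linked selection: keep the n_keep inputs with the largest strength."""
    strengths = input_link_strengths(weights)
    order = sorted(range(len(strengths)), key=lambda i: strengths[i], reverse=True)
    return sorted(order[:n_keep])
```

In practice the weights would come from a trained network, and the number of inputs to drop or keep would be tuned, which is presumably what "optimal" refers to in the abstract.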


다중회귀모형에서 전진선택과 후진제거의 기하학적 표현 (Geometrical description based on forward selection & backward elimination methods for regression models)

  • 홍종선;김명진
    • Journal of the Korean Data and Information Science Society / Vol. 21, No. 5 / pp.901-908 / 2010
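  • We propose a graphical method that geometrically represents the forward selection and backward elimination procedures, two of the variable selection methods for multiple regression models. The forward selection process is drawn in the first quadrant of a semicircle of radius 1, and the backward elimination process in the second quadrant. At each step, the regression sum of squares is represented as a vector, and the extra sum of squares or the coefficient of partial determination is represented as the angle between vectors. When the tips of the vectors are connected, a dotted line is used where the step is statistically significant, so that the results of the partial hypothesis tests can be read from the plot. With this method, the final models obtained by forward selection and backward elimination can be compared, and the overall goodness of fit of the model can be assessed.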

유전자 선택을 위해 속성 삭제에 기반을 둔 최적화된 분류기 설계 (A Design of an Optimized Classifier based on Feature Elimination for Gene Selection)

  • 이병관;박석규;유슬리나 티파니
    • 한국정보전자통신기술학회논문지 / Vol. 8, No. 5 / pp.384-393 / 2015
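  • This paper proposes an Optimized Classifier based on Feature Elimination (OCFE) for gene selection, which combines two feature elimination methods, ReliefF and SVM-RFE. ReliefF is a filter feature selection algorithm that ranks the data according to their importance. SVM-RFE is a wrapper feature selection algorithm that ranks the data based on feature weights. With the two techniques combined, the average error rate is slightly lower: 0.3096779 for SVM-RFE versus 0.3016138 for OCFE. The proposed technique is also slightly more accurate: 69% for SVM-RFE versus 70% for OCFE.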

개별 속성의 선택 및 제거효과 순위를 이용한 사례기반 추론의 속성 선정 (Feature Selection for Case-Based Reasoning using the Order of Selection and Elimination Effects of Individual Features)

  • 이재식;이혁희
    • 지능정보연구 / Vol. 8, No. 2 / pp.117-137 / 2002
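  • Case-based reasoning is a branch of machine learning that answers new cases on the basis of past cases. Past cases are stored in a case base in a fixed format, and that format is determined by the features. Features are chosen to best express the characteristics of a case, and similar cases are retrieved by deriving the similarity between feature values. The performance of case-based reasoning therefore depends on the features used. In this study, we first measured the selection effect of each feature by performing case-based reasoning with only that feature, and measured the elimination effect of each feature by performing case-based reasoning with only that feature removed. Case-based reasoning built on feature subsets constructed from these measurements gave a system with better performance and efficiency than one using all features.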
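
The two measurements described in the abstract above can be sketched as follows. This is an illustration under assumptions that are not in the source: cases are numeric vectors, retrieval is 1-nearest-neighbor with squared Euclidean distance, and performance is leave-one-out accuracy. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def loo_accuracy(cases: Sequence[Sequence[float]], labels: Sequence[int],
                 features: List[int]) -> float:
    """Leave-one-out accuracy of 1-nearest-neighbor retrieval on the given features."""
    correct = 0
    for i, query in enumerate(cases):
        # Retrieve the most similar stored case, excluding the query itself.
        nearest = min(
            (j for j in range(len(cases)) if j != i),
            key=lambda j: sum((query[f] - cases[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(cases)


def feature_effects(cases, labels) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Return (selection, elimination) effects per feature.

    selection[f]   = accuracy when only feature f is used
    elimination[f] = accuracy when only feature f is removed
    """
    all_features = list(range(len(cases[0])))
    selection = {f: loo_accuracy(cases, labels, [f]) for f in all_features}
    elimination = {
        f: loo_accuracy(cases, labels, [g for g in all_features if g != f])
        for f in all_features
    }
    return selection, elimination
```

A feature with a high selection effect and a low accuracy after its removal is a strong candidate for the final subset; how the paper combines the two rankings is not specified in the abstract.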


지형자료의 계층화를 이용한 하계망 일반화 (Generalization of the Stream Network by the Geographic Hierarchy of Landform Data)

  • 김남신
    • 대한지리학회지 / Vol. 40, No. 4 / pp.441-453 / 2005
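  • The purpose of this study is to generalize stream networks by developing a hierarchy algorithm for landform data. Because a stream network has a hierarchical structure, generalization requires a hierarchy of landform data for linear features. The generalization procedure consisted of building the hierarchy of the stream network, selection and elimination by order, and application of the algorithm. The hierarchy was built by determining flow direction from the elevation of the stream network, ordering stroke segments, and assigning Strahler orders. Selection and elimination of linear features were handled through queries on the geographic data, using order and line length as criteria. The improved Simoo algorithm was effective in lowering the curvature of linear features and smoothing them. The results are expected to improve the generalization of features that have spatially diverse hierarchical structures.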
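
The Strahler ordering and the order/length query mentioned in the abstract above can be sketched on a toy network. The Strahler rule itself is standard (a source segment has order 1; where two or more tributaries of the highest order meet, the order increases by one). The data layout, the function names, and the way the two thresholds are combined are assumptions made for this illustration.

```python
from typing import Dict, List


def strahler_orders(upstream: Dict[str, List[str]]) -> Dict[str, int]:
    """upstream maps each segment id to the segments that flow directly into it."""
    orders: Dict[str, int] = {}

    def visit(segment: str) -> int:
        child_orders = [visit(child) for child in upstream.get(segment, [])]
        if not child_orders:
            order = 1  # headwater segment
        else:
            top = max(child_orders)
            # The order rises only when at least two tributaries share the top order.
            order = top + 1 if child_orders.count(top) >= 2 else top
        orders[segment] = order
        return order

    tributaries = {child for children in upstream.values() for child in children}
    for segment in upstream:
        if segment not in tributaries:  # outlet: nothing downstream of it
            visit(segment)
    return orders


def select_segments(orders: Dict[str, int], lengths: Dict[str, float],
                    min_order: int, min_length: float) -> List[str]:
    """Keep segments meeting both thresholds (combining them with AND is an assumption)."""
    return sorted(
        s for s in orders
        if orders[s] >= min_order and lengths[s] >= min_length
    )
```

Segments that fail the query are the ones eliminated before a line simplification step is applied to the remaining features.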

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods / Vol. 22, No. 6 / pp.575-587 / 2015
  • Advances in data collection techniques have produced high dimensional data sets, in which discrimination is an important and commonly encountered problem that is crucial to resolve when the data are heterogeneous (classes do not share a common variance-covariance structure). An example is the classification of microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important for studying evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high dimensional heterogeneous data sets, based on the recently introduced regularized elimination for variable selection in partial least squares (rePLS) and the heterogeneous classification procedure quadratic discriminant analysis (QDA). The proposed and existing methods are compared on a simulated data set; in addition, the proposed procedure is applied to classify microbial habitat preferences by their codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized, and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory, ranging from 89.1% to 100% on test data. Interesting codon/bi-codon usages and their mutual interactions that influence the respective habitat preferences are identified. The results of the proposed method also concur with known biological characteristics, which will help researchers better understand the divergence of species.

잡음추측을 이용한 자동적인 에지검출 문턱값 선택과 그 응용 (Automatic threshold selection for edge detection using a noise estimation scheme and its application)

  • 김형수;오승준
    • 한국통신학회논문지 / Vol. 21, No. 3 / pp.553-563 / 1996
  • Edge detection is an essential problem in image analysis. An edge in an image is a boundary or contour at which a significant change in image intensity occurs. Edge detection has been studied in many applications such as image segmentation, robot vision, and image compression. In this paper, we propose an automatic threshold selection scheme for edge detection and show its application to noise elimination. The scheme applies the statistical properties of the noise estimated from a noisy image to threshold selection. Since the selected threshold depends not on the characteristics of the original image but on the statistical features of the added noise, we can avoid ad hoc ways of selecting the threshold and determine its value theoretically. Furthermore, the scheme can reduce the number of edge pixels either generated or lost because of noise. An application of the scheme to noise elimination is also shown. Noise in the input image can be eliminated by considering the direction of each edge pixel on the edge map obtained with the proposed threshold selection scheme. Because the results improve significantly in terms of SNR as well as subjective quality, we conclude that the suggested method works well.


초기 탐색 위치의 효율적 선택에 의한 고속 움직임 추정 (Fast Motion Estimation Using Efficient Selection of Initial Search Position)

  • 남수영;김석규;임채환;김남철
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2000년도 추계종합학술대회 논문집(4) / pp.167-170 / 2000
  • In this paper, we present a fast motion estimation algorithm based on the efficient selection of an initial search position. The initial search position is selected using the motion vector from the subsampled images, the predicted motion vector from the neighboring blocks, and the (0,0) motion vector. While searching the candidate blocks, we use a spiral search pattern with the successive elimination algorithm (SEA) and partial distortion elimination (PDE). Experimental results show that the proposed algorithm is about 2 to 3 times faster than the three-step search (TSS), with a PSNR loss of just 0.05 to 0.1 dB relative to the full search algorithm. The search complexity can be reduced further, at the cost of some PSNR loss, by controlling the depth of the spiral search pattern.
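
The two pruning techniques named in the abstract above can be sketched for a single block. SEA skips a candidate when the difference between the block sums already reaches the best sum of absolute differences (SAD) found so far, since that difference is a lower bound on the SAD. PDE stops accumulating a candidate's SAD once the partial sum reaches the current best. The function names and data layout are assumptions; the candidates are assumed to arrive already ordered (for example, along a spiral from the chosen initial position), which this sketch does not implement.

```python
from typing import List, Optional, Sequence, Tuple

Block = Sequence[Sequence[int]]


def sad_with_pde(block: Block, candidate: Block, limit: float) -> Optional[int]:
    """Row-wise SAD with partial distortion elimination.

    Returns None as soon as the partial sum reaches `limit`.
    """
    total = 0
    for row_b, row_c in zip(block, candidate):
        total += sum(abs(p - q) for p, q in zip(row_b, row_c))
        if total >= limit:
            return None  # this candidate cannot beat the current best
    return total


def search(block: Block, candidates: List[Tuple[Tuple[int, int], Block]]):
    """Return (best motion vector, best SAD, number of candidates skipped by SEA)."""
    block_sum = sum(map(sum, block))
    best_mv, best_sad = None, float("inf")
    skipped = 0
    for mv, candidate in candidates:
        # SEA: |sum(block) - sum(candidate)| <= SAD(block, candidate)
        if abs(block_sum - sum(map(sum, candidate))) >= best_sad:
            skipped += 1
            continue
        sad = sad_with_pde(block, candidate, best_sad)
        if sad is not None:
            best_mv, best_sad = mv, sad
    return best_mv, best_sad, skipped
```

Both tests become more effective when a good match is found early, which is why the quality of the initial search position matters for speed.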


A Two-Stage Elimination Type Selection Procedure for Stochastically Increasing Distributions : with an Application to Scale Parameters Problem

  • Lee, Seung-Ho
    • Journal of the Korean Statistical Society / Vol. 19, No. 1 / pp.24-44 / 1990
  • The purpose of this paper is to extend the idea of Tamhane and Bechhofer (1977, 1979) concerning the normal means problem to some general class of distributions. The key idea in Tamhane and Bechhofer is the derivation of the computable lower bounds on the probability of a correct selection. To derive such lower bounds, they used the specific covariance structure of a multivariate normal distribution. It is shown that such lower bounds can be obtained for a class of stochastically increasing distributions under certain conditions, which is sufficiently general so as to include the normal means problem as a special application. As an application of the general theory to the scale parameters problem, a two-stage elimination type procedure for selecting the population associated with the smallest variance from among several normal populations is proposed. The design constants are tabulated and the relative efficiencies are computed.
