• Title/Abstract/Keywords: masking and swamping effects


Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 16, No. 2
    • /
    • pp.309-315
    • /
    • 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Since single deletions may fail to detect outliers or influential observations because of swamping and masking effects, multiple deletions are needed. We developed conditional deletion diagnostics designed to overcome masking effects and derived closed forms for several statistics in logistic regression models, which provide useful diagnostics.
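The paper's conditional deletion diagnostics have closed forms that are not reproduced here, but the general idea of a multiple-deletion diagnostic can be sketched by refitting the logistic model without a candidate subset and measuring the shift in the coefficients. A minimal sketch using statsmodels, with a Cook-type quadratic form as an illustrative distance rather than the paper's statistic:

```python
import numpy as np
import statsmodels.api as sm

def subset_deletion_distance(X, y, subset):
    """Cook-type distance for deleting a subset of observations from a
    logistic regression fit (illustrative, not the paper's closed-form
    conditional diagnostic)."""
    X1 = sm.add_constant(X)
    full = sm.Logit(y, X1).fit(disp=0)
    keep = np.setdiff1d(np.arange(len(y)), subset)
    reduced = sm.Logit(y[keep], X1[keep]).fit(disp=0)
    d = full.params - reduced.params
    # scale the coefficient change by the full-fit covariance
    return float(d @ np.linalg.solve(full.cov_params(), d))

# toy example: compare deleting a single point vs. a pair
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = (X[:, 0] + rng.normal(size=60) > 0).astype(int)
print(subset_deletion_distance(X, y, [3]))
print(subset_deletion_distance(X, y, [3, 17]))
```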

Identifying Multiple Leverage Points and Outliers in Multivariate Linear Models

  • Yoo, Jong-Young
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 7, No. 3
    • /
    • pp.667-676
    • /
    • 2000
  • This paper focuses on the problem of detecting multiple leverage points and outliers in multivariate linear models. It is well known that the identification of these points is affected by masking and swamping effects. To identify them, Rousseeuw (1985) used the robust MVE (Minimum Volume Ellipsoid) estimator, which has a breakdown point of approximately 50%, and Rousseeuw and van Zomeren (1990) suggested robust distances based on the MVE; however, their computation becomes extremely difficult when the number of observations n is large. In this study, we propose a new algorithm to reduce the computational difficulty of the MVE. The proposed method is powerful in identifying multiple leverage points and outliers and is also effective in reducing the computational burden of the MVE.
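The paper's algorithm for approximating the MVE is not reproduced here. The sketch below instead uses the fast MCD estimator (scikit-learn's MinCovDet), a closely related high-breakdown estimator, to compute robust distances and flag leverage points with a conventional chi-square cutoff:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def robust_leverage_flags(X, alpha=0.975):
    """Flag multivariate leverage points via robust distances.  Uses the
    fast MCD estimator as a stand-in for the MVE-based distances
    discussed above."""
    mcd = MinCovDet(random_state=0).fit(X)
    rd2 = mcd.mahalanobis(X)                 # squared robust distances
    cutoff = chi2.ppf(alpha, df=X.shape[1])
    return np.sqrt(rd2), rd2 > cutoff

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[:5] += 6                                   # planted leverage points
rd, flags = robust_leverage_flags(X)
print(np.where(flags)[0])
```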


Identification of Regression Outliers Based on Clustering of LMS-residual Plots

  • Kim, Bu-Yong;Oh, Mi-Hyun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 11, No. 3
    • /
    • pp.485-494
    • /
    • 2004
  • An algorithm is proposed to identify multiple outliers in linear regression. It is based on clustering of the residuals from the least median of squares estimation. A cut-height criterion for the hierarchical cluster tree is suggested, which yields the optimal clustering of the regression outliers. Comparisons of the effectiveness of the procedures are performed on classic and artificial data sets, and it is shown that the proposed algorithm is superior to one based on least squares estimation. In particular, the algorithm deals very well with masking and swamping effects while the other does not.
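A rough sketch of the approach: approximate the LMS fit by random elemental subsets, then single-linkage cluster the residuals and treat the largest cluster as the clean bulk. The cut height below is a simple multiple of the MAD scale, an illustrative choice rather than the optimal cut-height criterion derived in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def lms_residuals(X, y, n_subsets=500, rng=None):
    """Approximate least-median-of-squares residuals by searching random
    elemental subsets (p points define an exact fit)."""
    rng = np.random.default_rng(rng)
    X1 = np.column_stack([np.ones(len(y)), X])
    p, best, best_med = X1.shape[1], None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(len(y), size=p, replace=False)
        try:
            beta = np.linalg.solve(X1[idx], y[idx])
        except np.linalg.LinAlgError:
            continue                          # skip singular subsets
        r = y - X1 @ beta
        med = np.median(r ** 2)
        if med < best_med:
            best_med, best = med, r
    return best

def outliers_by_residual_clustering(X, y, cut=3.0):
    """Cluster the LMS residuals (single linkage) and call the largest
    cluster 'clean'; the cut height is an illustrative choice."""
    r = lms_residuals(X, y)
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))
    Z = linkage(r.reshape(-1, 1), method="single")
    labels = fcluster(Z, t=cut * scale, criterion="distance")
    clean = np.bincount(labels).argmax()      # largest cluster = inliers
    return np.where(labels != clean)[0]

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=80)
y[:6] += 10                                   # planted outliers
print(outliers_by_residual_clustering(X, y))
```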

Outlier tests on potential outliers (잠재적 이상치군에 대한 검정)

  • Seo, Han Son (서한손)
    • The Korean Journal of Applied Statistics (응용통계연구)
    • /
    • Vol. 30, No. 1
    • /
    • pp.159-167
    • /
    • 2017
  • A set of potential outliers is usually judged to be outlying or not through a final testing step, but some outlier detection methods omit the testing step or carry out the test with significance values computed by simulation. To avoid masking and swamping effects, this paper proposes a procedure that tests subsets of the candidate outlier set instead of testing the individual observations belonging to it. An example illustrating the use of the proposed method and simulation results comparing its power with that of other methods are presented.
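A minimal sketch of testing subsets of a candidate outlier set rather than individual observations, using a mean-shift outlier model and a joint F-test as an illustrative stand-in for the test proposed in the paper:

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations

def subset_outlier_pvalue(X, y, subset):
    """Joint F-test of a mean-shift outlier model: one dummy column per
    candidate observation in `subset` (an illustrative stand-in for the
    subset test proposed above)."""
    n = len(y)
    D = np.zeros((n, len(subset)))
    D[list(subset), range(len(subset))] = 1.0
    Xd = sm.add_constant(np.column_stack([X, D]))
    fit = sm.OLS(y, Xd).fit()
    k = X.shape[1] + 1                        # non-dummy columns
    R = np.zeros((len(subset), Xd.shape[1]))
    R[:, k:] = np.eye(len(subset))            # restrict dummy coefficients
    return float(fit.f_test(R).pvalue)

# test every pair within a candidate set instead of single points
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.3, size=50)
y[[4, 9]] += 5                                # planted outlier pair
candidates = [4, 9, 20]
for pair in combinations(candidates, 2):
    print(pair, subset_outlier_pvalue(X, y, pair))
```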

Influential Points in GLMs via Backwards Stepping

  • Jeong, Kwang-Mo;Oh, Hae-Young
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 9, No. 1
    • /
    • pp.197-212
    • /
    • 2002
  • When assessing the goodness-of-fit of a model, a small subset of deviating observations can give rise to a significant lack of fit. It is therefore important to identify such observations and to assess their effects on various aspects of the analysis. Cook's distance measure is usually used to detect influential observations, but it is sometimes not fully effective in identifying a truly influential set of observations because masking or swamping effects may exist. In this paper we confine our attention to influential subsets in GLMs such as logistic regression models and loglinear models. We modify a backwards stepping algorithm, originally suggested for detecting outlying cells in contingency tables, to detect influential observations in GLMs. The algorithm consists of two steps, an identification step and a testing step. In the identification step we identify influential observations based on influence measures such as Cook's distances, and in the testing step we test whether the subset of identified observations is significant. Finally, we illustrate the proposed method on two data sets, one for a logistic regression model and one for a loglinear model.
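A rough two-step sketch of the identification/testing idea for a logistic GLM: flag the largest Cook's distances, refit without the flagged cases, and check their deviance contributions under the reduced fit. The statsmodels-based code below is an illustration, not the paper's backwards-stepping algorithm:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def backwards_step_influential(X, y, k=5):
    """Two-step sketch: (1) identification via the k largest Cook's
    distances from a logistic GLM fit, (2) a rough testing step that
    refits without the flagged cases and checks whether their deviance
    contributions under the reduced fit look too large."""
    X1 = sm.add_constant(X)
    full = sm.GLM(y, X1, family=sm.families.Binomial()).fit()
    cooks = full.get_influence().cooks_distance[0]
    flagged = np.sort(np.argsort(cooks)[-k:])          # identification step
    keep = np.setdiff1d(np.arange(len(y)), flagged)
    reduced = sm.GLM(y[keep], X1[keep],
                     family=sm.families.Binomial()).fit()
    mu = reduced.predict(X1[flagged])                  # held-out fitted probs
    dev = -2 * (y[flagged] * np.log(mu) + (1 - y[flagged]) * np.log1p(-mu))
    stat = float(dev.sum())                            # testing step
    return flagged, stat, float(chi2.sf(stat, df=k))

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=120) > 0).astype(int)
y[:4] = 1 - y[:4]                                      # planted misfits
print(backwards_step_influential(X, y))
```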

Unmasking Multiple Outliers in Multivariate Data

  • Yoo Jong-Young
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 13, No. 1
    • /
    • pp.29-38
    • /
    • 2006
  • We propose a procedure for detecting multiple outliers in multivariate data. Rousseeuw and van Zomeren (1990) suggested the robust distance $RD_i$ computed by a resampling algorithm, but $RD_i$ is based on the assumption that X is in general position (X is said to be in general position when every subsample of size p+1 has rank p). From a practical point of view, this is clearly unrealistic. In this paper we propose a computing method for approximating the MVE that is not subject to this problem. The procedure is easy to compute and works well even when a subsample is singular or nearly singular.
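The paper's computing method is not reproduced here. The sketch below shows one simple way a resampling approximation to the MVE can tolerate singular elemental subsets, by adding a small ridge instead of assuming general position; the ridge and subset size are illustrative choices:

```python
import numpy as np

def approx_mve(X, n_subsets=1000, eps=1e-8, rng=None):
    """Resampling approximation to the MVE: draw (p+1)-subsets and keep
    the candidate whose covariance, inflated to cover half the data, has
    the smallest determinant.  Singular or nearly singular subsets are
    regularized with a small ridge rather than assumed away."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    h = (n + p + 1) // 2
    best = (np.inf, None, None)
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p + 1, replace=False)
        loc = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        if np.linalg.matrix_rank(cov) < p:
            cov = cov + eps * np.eye(p)        # handle singular subsets
        d2 = np.einsum("ij,jk,ik->i", X - loc,
                       np.linalg.inv(cov), X - loc)
        m2 = np.sort(d2)[h - 1]                # inflate to cover h points
        vol = np.linalg.det(cov) * m2 ** p     # proportional to squared volume
        if vol < best[0]:
            best = (vol, loc, cov * m2)
    return best[1], best[2]

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
X[:8] += 7                                      # planted outliers
loc, cov = approx_mve(X)
d2 = np.einsum("ij,jk,ik->i", X - loc, np.linalg.inv(cov), X - loc)
print(np.argsort(d2)[-8:])                      # most outlying points
```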

The Detection and Testing of Multiple Outliers in Linear Regression

  • Park, Jin-Pyo;Zamar, Ruben H.
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 15, No. 4
    • /
    • pp.921-934
    • /
    • 2004
  • We consider the problem of identifying and testing outliers in linear regression. First, we consider scale-ratio tests of the null hypothesis of no outliers. A test based on the ratio of two residual scale estimates is proposed. We show the asymptotic distribution of the test statistic and investigate the properties of the test. Next we consider the problem of identifying the outliers. A forward procedure based on the suggested test is proposed and shown to perform fairly well. The forward procedure is unaffected by masking and swamping effects because the test statistic uses a robust scale estimate.
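A minimal sketch of a scale-ratio statistic: a classical residual scale divided by a robust (MAD) scale, both computed from the residuals of a robust fit. The Huber regression and the informal reading of "large" values are illustrative choices, not the estimators or critical values studied in the paper:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

def scale_ratio_statistic(X, y):
    """Ratio of a classical residual scale to a robust residual scale,
    computed from the residuals of a robust (Huber) fit.  Large values
    hint at outliers."""
    fit = HuberRegressor().fit(X, y)
    r = y - fit.predict(X)
    s_classical = np.std(r, ddof=X.shape[1] + 1)
    s_robust = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale
    return s_classical / s_robust

rng = np.random.default_rng(6)
X = rng.normal(size=(70, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=1.0, size=70)
print(scale_ratio_statistic(X, y))          # near 1 when no outliers
y[:5] += 12
print(scale_ratio_statistic(X, y))          # inflated by the outliers
```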


The Identification of Multiple Outliers

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 11, No. 2
    • /
    • pp.201-215
    • /
    • 2000
  • The classical method for regression analysis is the least squares method. However, if the data contain significant outliers, the least squares estimator can break down. To remedy this problem, robust methods are an important complement to least squares. Robust methods downweight or completely ignore the outliers, which is not always best because the outliers can contain very important information about the population. If they can be detected, the outliers can be inspected further and appropriate action can be taken based on the results. In this paper I propose a sequential outlier test to identify outliers. It is based on a nonrobust estimate and a robust estimate of the scatter of the robust regression residuals and is applied in a forward procedure, removing the most extreme observation at each step, until the test fails to detect outliers. Unlike other forward procedures, the present one is unaffected by swamping or masking effects because the statistic is based on the robust regression residuals. I show the asymptotic distribution of the test statistic and apply the test to several real and simulated data sets, for which it is shown to perform fairly well.
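A sketch of the forward procedure: test, delete the most extreme residual, and repeat until the test no longer flags outliers. The Huber fit, MAD scale, and fixed cutoff below stand in for the robust estimators and asymptotic critical values used in the paper:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

def forward_outlier_search(X, y, cutoff=1.4, max_remove=None):
    """Forward sketch: test, remove the most extreme residual, repeat
    until the scale-ratio statistic falls below `cutoff` (an arbitrary
    illustrative threshold)."""
    idx = np.arange(len(y))
    removed = []
    max_remove = max_remove or len(y) // 2
    while len(removed) < max_remove:
        fit = HuberRegressor().fit(X[idx], y[idx])
        r = y[idx] - fit.predict(X[idx])
        mad = 1.4826 * np.median(np.abs(r - np.median(r)))
        ratio = np.std(r, ddof=X.shape[1] + 1) / mad
        if ratio < cutoff:                      # test fails to detect outliers
            break
        worst = np.argmax(np.abs(r))            # remove most extreme point
        removed.append(idx[worst])
        idx = np.delete(idx, worst)
    return sorted(removed)

rng = np.random.default_rng(7)
X = rng.normal(size=(90, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.8, size=90)
y[[3, 11, 40]] -= 15                            # planted outliers
print(forward_outlier_search(X, y))
```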


V-mask Type Criterion for Identification of Outliers in Logistic Regression

  • Kim Bu-Yong
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 12, No. 3
    • /
    • pp.625-634
    • /
    • 2005
  • A procedure is proposed to identify multiple outliers in logistic regression. It detects the leverage points by hierarchical clustering of the robust distances based on the minimum covariance determinant estimator, and then employs a V-mask type criterion on the scatter plot of robust residuals against robust distances to classify the observations into vertical outliers, bad leverage points, good leverage points, and regular points. The effectiveness of the proposed procedure is evaluated on classic and artificial data sets, and it is shown that the procedure deals very well with masking and swamping effects.
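A sketch of the underlying outlier-map classification (robust distances against standardized robust residuals), with straight cutoff lines in place of the V-mask shaped region and a Huber linear fit in place of the paper's logistic setting:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.linear_model import HuberRegressor

def outlier_map_classes(X, y, c=2.5):
    """Classify observations on the robust-distance / robust-residual
    plane: 'regular', 'vertical outlier', 'good leverage', or 'bad
    leverage'.  Straight cutoffs are used here instead of the V-mask
    shaped region (illustration only)."""
    rd = np.sqrt(MinCovDet(random_state=0).fit(X).mahalanobis(X))
    x_cut = np.sqrt(chi2.ppf(0.975, df=X.shape[1]))
    r = y - HuberRegressor().fit(X, y).predict(X)
    r = r / (1.4826 * np.median(np.abs(r - np.median(r))))
    classes = np.full(len(y), "regular", dtype=object)
    classes[(np.abs(r) > c) & (rd <= x_cut)] = "vertical outlier"
    classes[(np.abs(r) <= c) & (rd > x_cut)] = "good leverage"
    classes[(np.abs(r) > c) & (rd > x_cut)] = "bad leverage"
    return classes

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.5, size=100)
X[:3] += 6                      # planted leverage points
y[:3] += 8                      # shifted responses make them bad leverage
print(outlier_map_classes(X, y)[:6])
```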

The Sequential Testing of Multiple Outliers in Linear Regression

  • Park, Jinpyo;Park, Heechang
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 8, No. 2
    • /
    • pp.337-346
    • /
    • 2001
  • In this paper we consider the problem of identifying and testing outliers in linear regression. First, we consider the problem of testing the null hypothesis of no outliers. A test based on the ratio of two scale estimates is proposed. We show the asymptotic distribution of the test statistic by Monte Carlo simulation and investigate its properties. Next we consider the problem of identifying the outliers. A forward sequential procedure based on the suggested test is proposed and shown to perform fairly well. The forward sequential procedure is unaffected by masking and swamping effects because the test statistic is based on a robust estimate.
