Title/Summary/Keyword: masking and swamping effects


Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.2
    • /
    • pp.309-315
    • /
    • 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Since single deletions may fail to exactly detect outliers or influential observations because of swamping and masking effects, multiple deletions are needed. We developed conditional deletion diagnostics designed to overcome the masking effect, and we derived closed forms for several statistics in logistic regression models, which yield useful diagnostics.
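
The brute-force alternative that such closed-form deletion statistics avoid can be sketched in a few lines: refit the logistic model with each case removed and measure how far the coefficient vector moves. A minimal NumPy sketch (function names are mine, not the paper's):

```python
import numpy as np

def irls_logistic(X, y, iters=25):
    """Fit a logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = np.clip(X @ beta, -30, 30)          # guard against overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(p * (1.0 - p), 1e-9, None)    # IRLS working weights
        z = eta + (y - p) / w                     # working response
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

def single_deletion_shifts(X, y):
    """Exact single-case deletion diagnostic: refit without each case and
    measure how far the coefficient vector moves."""
    full = irls_logistic(X, y)
    n = len(y)
    shifts = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        shifts[i] = np.linalg.norm(irls_logistic(X[keep], y[keep]) - full)
    return shifts
```

Repeating this over all pairs or triples of cases is what makes exact multiple-deletion diagnostics expensive, which is why closed-form expressions matter.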

Identifying Multiple Leverage Points and Outliers in Multivariate Linear Models

  • Yoo, Jong-Young
    • Communications for Statistical Applications and Methods
    • /
    • v.7 no.3
    • /
    • pp.667-676
    • /
    • 2000
  • This paper focuses on the problem of detecting multiple leverage points and outliers in multivariate linear models. It is well known that the identification of these points is affected by masking and swamping effects. To identify them, Rousseeuw (1985) used the robust MVE (Minimum Volume Ellipsoid) estimators, which have a breakdown point of approximately 50%. Rousseeuw and van Zomeren (1990) suggested the robust distance based on the MVE; however, its computation is extremely difficult when the number of observations n is large. In this study, we propose a new algorithm to reduce the computational difficulty of the MVE. The proposed method is powerful in identifying multiple leverage points and outliers and also effective in reducing the computational difficulty of the MVE.
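
The MVE-based robust distances discussed above can be approximated by the kind of elemental-subset resampling Rousseeuw proposed; the sketch below is an illustrative simplification, not the authors' algorithm:

```python
import numpy as np

def mve_robust_distances(X, n_trials=500, seed=0):
    """Approximate MVE robust distances by elemental-subset resampling:
    draw many (p+1)-point subsets and keep the one whose inflated
    ellipsoid covering about half the data has minimal volume."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2                  # points the ellipsoid must cover
    best_vol, best_d2 = np.inf, None
    for _ in range(n_trials):
        idx = rng.choice(n, size=p + 1, replace=False)
        mu = X[idx].mean(axis=0)
        S = np.cov(X[idx], rowvar=False)
        try:
            Sinv = np.linalg.inv(S)
        except np.linalg.LinAlgError:
            continue                      # singular subset: skip it
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, Sinv, diff)  # Mahalanobis^2
        m2 = np.sort(d2)[h - 1]           # inflation factor covering h points
        vol = np.sqrt(np.linalg.det(S)) * m2 ** (p / 2)  # ellipsoid volume
        if vol < best_vol:
            best_vol, best_d2 = vol, d2 / m2
    return np.sqrt(best_d2)
```

The cost grows with the number of trials needed for a good subset, which is the computational burden the paper's algorithm aims to reduce.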


Identification of Regression Outliers Based on Clustering of LMS-residual Plots

  • Kim, Bu-Yong;Oh, Mi-Hyun
    • Communications for Statistical Applications and Methods
    • /
    • v.11 no.3
    • /
    • pp.485-494
    • /
    • 2004
  • An algorithm is proposed to identify multiple outliers in linear regression. It is based on the clustering of residuals from the least median of squares estimation. A cut-height criterion for the hierarchical cluster tree is suggested, which yields the optimal clustering of the regression outliers. Comparisons of the effectiveness of the procedures are performed on the basis of the classic data and artificial data sets, and it is shown that the proposed algorithm is superior to the one that is based on the least squares estimation. In particular, the algorithm deals very well with the masking and swamping effects while the other does not.

Outlier tests on potential outliers (잠재적 이상치군에 대한 검정)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.1
    • /
    • pp.159-167
    • /
    • 2017
  • Observations identified as potential outliers are usually tested to confirm that they are real outliers; however, some outlier detection methods skip a formal test or perform a test using simulated p-values. We introduce test procedures that test subsets of potential outliers rather than individual observations, in order to avoid masking and swamping effects. Examples illustrating the methods and a Monte Carlo study comparing the power of the various methods are presented.
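
Testing a subset of potential outliers jointly, rather than case by case, can be done with a mean-shift model: add one dummy column per candidate case (equivalent to deleting those cases) and compare residual sums of squares. A sketch for linear regression (illustrative, not the paper's procedure):

```python
import numpy as np

def subset_outlier_F(X, y, subset):
    """Joint mean-shift test for a candidate outlier subset: augment the
    design with one indicator column per candidate case and compute an
    F-type statistic from the drop in residual sum of squares."""
    n, p = X.shape
    k = len(subset)

    def rss(A, b):
        beta, *_ = np.linalg.lstsq(A, b, rcond=None)
        return float(np.sum((b - A @ beta) ** 2))

    D = np.zeros((n, k))
    D[np.asarray(subset), np.arange(k)] = 1.0   # one dummy per candidate
    rss0 = rss(X, y)                            # no outliers assumed
    rss1 = rss(np.hstack([X, D]), y)            # candidates mean-shifted
    return ((rss0 - rss1) / k) / (rss1 / (n - p - k))
```

A large statistic for the whole subset flags it jointly, which is how testing groups rather than individual cases sidesteps masking within the group.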

Influential Points in GLMs via Backwards Stepping

  • Jeong, Kwang-Mo;Oh, Hae-Young
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.1
    • /
    • pp.197-212
    • /
    • 2002
  • When assessing the goodness-of-fit of a model, a small subset of deviating observations can give rise to a significant lack of fit. It is therefore important to identify such observations and to assess their effects on various aspects of the analysis. Cook's distance is usually used to detect influential observations, but it is sometimes not fully effective in identifying a truly influential set of observations because masking or swamping effects may exist. In this paper we confine our attention to influential subsets in GLMs such as logistic regression models and loglinear models. We modify a backwards stepping algorithm, originally suggested for detecting outlying cells in contingency tables, to detect influential observations in GLMs. The algorithm consists of two steps: an identification step and a testing step. In the identification step we identify influential observations based on influence measures such as Cook's distances; in the testing step we test whether the subset of identified observations is significant. Finally, we illustrate the proposed method on two datasets, one for a logistic regression model and one for a loglinear model.
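
For ordinary least squares the identification step can be sketched as below; the 4/n cutoff is only a common rule of thumb, and the formal testing step the abstract describes is omitted:

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distances for an ordinary least-squares fit."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix
    h = np.diag(H)                            # leverages
    r = y - H @ y                             # residuals
    s2 = r @ r / (n - p)
    return (r ** 2 / (p * s2)) * h / (1.0 - h) ** 2

def backward_step(X, y, max_remove=5):
    """Identification step: repeatedly remove the case with the largest
    Cook's distance while it exceeds the 4/n rule-of-thumb cutoff."""
    keep = np.arange(len(y))
    removed = []
    for _ in range(max_remove):
        d = cooks_distance(X[keep], y[keep])
        if d.max() <= 4.0 / len(keep):
            break
        i = int(d.argmax())
        removed.append(int(keep[i]))
        keep = np.delete(keep, i)
    return removed
```

Removing cases one at a time and recomputing the distances is what lets a masked influential case surface after the case masking it has been deleted.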

Unmasking Multiple Outliers in Multivariate Data

  • Yoo Jong-Young
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.1
    • /
    • pp.29-38
    • /
    • 2006
  • We proposed a procedure for detecting multiple outliers in multivariate data. Rousseeuw and van Zomeren (1990) suggested the robust distance $RD_i$ computed by a resampling algorithm, but $RD_i$ is based on the assumption that X is in general position (X is said to be in general position when every subsample of size p+1 has rank p). From a practical point of view, this is clearly unrealistic. In this paper, we proposed a computing method for approximating the MVE that is not subject to these problems. The procedure is easy to compute and works well even if a subsample is a singular or nearly singular matrix.

The Detection and Testing of Multiple Outliers in Linear Regression

  • Park, Jin-Pyo;Zamar, Ruben H.
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.4
    • /
    • pp.921-934
    • /
    • 2004
  • We consider the problem of identifying and testing outliers in linear regression. First, we consider scale-ratio tests of the null hypothesis of no outliers. A test based on the ratio of two residual scale estimates is proposed. We show the asymptotic distribution of the test statistic and investigate the properties of the test. Next we consider the problem of identifying the outliers. A forward procedure based on the suggested test is proposed and shown to perform fairly well. The forward procedure is unaffected by masking and swamping effects because the test statistic uses a robust scale estimate.


The Identification of Multiple Outliers

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.11 no.2
    • /
    • pp.201-215
    • /
    • 2000
  • The classical method for regression analysis is the least squares method. However, if the data contain significant outliers, the least squares estimator can be broken down by them. To remedy this problem, robust methods are an important complement to the least squares method. Robust methods downweight or completely ignore the outliers. This is not always best, because outliers can contain very important information about the population. If they can be detected, outliers can be inspected further and appropriate action taken based on the results. In this paper, I propose a sequential outlier test to identify outliers. It is based on a nonrobust estimate and a robust estimate of the scatter of robust regression residuals and is applied in a forward procedure, removing the most extreme observation at each step, until the test fails to detect outliers. Unlike other forward procedures, the present one is unaffected by swamping or masking effects because the statistic is based on robust regression residuals. I show the asymptotic distribution of the test statistic and apply the test to several real and simulated data sets, on which it is shown to perform fairly well.


V-mask Type Criterion for Identification of Outliers in Logistic Regression

  • Kim Bu-Yong
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.3
    • /
    • pp.625-634
    • /
    • 2005
  • A procedure is proposed to identify multiple outliers in the logistic regression. It detects the leverage points by means of hierarchical clustering of the robust distances based on the minimum covariance determinant estimator, and then it employs a V-mask type criterion on the scatter plot of robust residuals against robust distances to classify the observations into vertical outliers, bad leverage points, good leverage points, and regular points. Effectiveness of the proposed procedure is evaluated on the basis of the classic and artificial data sets, and it is shown that the procedure deals very well with the masking and swamping effects.
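
The final classification into regular points, vertical outliers, and good or bad leverage points can be sketched with simple rectangular cutoffs on standardized robust residuals and robust distances; fixed thresholds stand in here for the paper's V-mask-shaped boundary:

```python
def classify(resid, dist, c_r=2.5, c_d=2.5):
    """Classify each observation from its standardized robust residual and
    robust distance: large residual + large distance = bad leverage point,
    large residual only = vertical outlier, large distance only = good
    leverage point, otherwise a regular point."""
    out = []
    for r, d in zip(resid, dist):
        big_r, big_d = abs(r) > c_r, d > c_d
        out.append("bad leverage" if big_r and big_d
                   else "vertical outlier" if big_r
                   else "good leverage" if big_d
                   else "regular")
    return out
```

Plotting robust residuals against robust distances with these two cutoffs reproduces the familiar four-region diagnostic display, of which the V-mask criterion is a refinement.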

The Sequential Testing of Multiple Outliers in Linear Regression

  • Park, Jinpyo;Park, Heechang
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.2
    • /
    • pp.337-346
    • /
    • 2001
  • In this paper we consider the problem of identifying and testing outliers in linear regression. First, we consider the problem of testing the null hypothesis of no outliers. A test based on the ratio of two scale estimates is proposed. We obtain the distribution of the test statistic by Monte Carlo simulation and investigate its properties. Next we consider the problem of identifying the outliers. A forward sequential procedure based on the suggested test is proposed and shown to perform fairly well. The forward sequential procedure is unaffected by masking and swamping effects because the test statistic is based on a robust estimate.
