• Title/Summary/Keyword: swamping

Search Result 23, Processing Time 0.018 seconds

Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.2
    • /
    • pp.309-315
    • /
    • 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Since single deletions may not exactly detect outliers or influential observations due to swamping effects and masking effects, it needs multiple deletions. We developed conditional deletion diagnostics which are designed to overcome problems of masking effects. We derived the closed forms for several statistics in logistic regression models. They give useful diagnostics on the statistics.

A Confirmation of Identified Multiple Outliers and Leverage Points in Linear Model (다중 선형 모형에서 식별된 다중 이상점과 다중 지렛점의 재확인 방법에 대한 연구)

  • 유종영;안기수
    • The Korean Journal of Applied Statistics
    • /
    • v.15 no.2
    • /
    • pp.269-279
    • /
    • 2002
  • We considered the problem for confirmation of multiple outliers and leverage points. Identification of multiple outliers and leverage points is difficult because of the masking effect and swamping effect. Rousseeuw and van Zomeren(1990) identified multiple outliers and leverage points by using the Least Median of Squares and Minimum Value of Ellipsoids which are high-breakdown robust estimators. But their methods tend to declare too many observations as extremes. Atkinson(1987) suggested a method for confirming of outliers and Fung(1993) pointed out Atkinson method's limitation and proposed another method by using the add-back model. But we analyzed that Fung's method is affected by adjacent effect. In this thesis, we proposed one procedure for confirmation of outliers and leverage points and compared three example with Fung's method.

Identifying Multiple Leverage Points ad Outliers in Multivariate Linear Models

  • Yoo, Jong-Young
    • Communications for Statistical Applications and Methods
    • /
    • v.7 no.3
    • /
    • pp.667-676
    • /
    • 2000
  • This paper focuses on the problem of detecting multiple leverage points and outliers in multivariate linear models. It is well known that he identification of these points is affected by masking and swamping effects. To identify them, Rousseeuw(1985) used robust estimators of MVE(Minimum Volume Ellipsoids), which have the breakdown point of 50% approximately. And Rousseeuw and van Zomeren(1990) suggested the robust distance based on MVE, however, of which the computation is extremely difficult when the number of observations n is large. In this study, e propose a new algorithm to reduce the computational difficulty of MVE. The proposed method is powerful in identifying multiple leverage points and outlies and also effective in reducing the computational difficulty of MVE.

  • PDF

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society
    • /
    • v.36 no.4
    • /
    • pp.457-469
    • /
    • 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. But the detection of influential points or multiple outliers is more difficult, owing to masking and swamping problems. The multiple outlier detection methods for logistic regression have not been studied from the points of direct procedure yet. In this paper we consider the direct methods for logistic regression by extending the $Pe\tilde{n}a$ and Yohai (1995) influence matrix algorithm. We define the influence matrix in logistic regression by using Cook's distance in logistic regression, and test multiple outliers by using the mean shift model. To show accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data including multiple outliers with masking and swamping.

Identification of Regression Outliers Based on Clustering of LMS-residual Plots

  • Kim, Bu-Yong;Oh, Mi-Hyun
    • Communications for Statistical Applications and Methods
    • /
    • v.11 no.3
    • /
    • pp.485-494
    • /
    • 2004
  • An algorithm is proposed to identify multiple outliers in linear regression. It is based on the clustering of residuals from the least median of squares estimation. A cut-height criterion for the hierarchical cluster tree is suggested, which yields the optimal clustering of the regression outliers. Comparisons of the effectiveness of the procedures are performed on the basis of the classic data and artificial data sets, and it is shown that the proposed algorithm is superior to the one that is based on the least squares estimation. In particular, the algorithm deals very well with the masking and swamping effects while the other does not.

Outlier tests on potential outliers (잠재적 이상치군에 대한 검정)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.1
    • /
    • pp.159-167
    • /
    • 2017
  • Observations identified as potential outliers are usually tested for real outliers; however, some outlier detection methods skip a formal test or perform a test using simulated p-values. We introduce test procedures for outliers by testing subsets of potential outliers rather than by testing individual observations of potential outliers to avoid masking or swamping effects. Examples to illustrate methods and a Monte Carlo study to compare the power of the various methods are presented.

Some Diagnostic Results in Discriminant Analysis

  • Bae, Whasoo;Hwang, Soonyoung
    • Journal of the Korean Statistical Society
    • /
    • v.30 no.1
    • /
    • pp.139-151
    • /
    • 2001
  • Although lots of works are done in influence diagnostics, results in the multivariate analysis are quite rare. One of recent works done by Fung(1995) is about the single case influence diagnostics in the linear discriminant analysis. In this paper we extend Fung's results to the multiple cases diagnostics which are necessary in the linear discriminant analysis for two reasons among others; First, the masking effect cannot be detected by single case diagnostics and secondly two populations are concerned in the discriminant analysis, i.e., influential cases can occur in one or both populations.

  • PDF

V-mask Type Criterion for Identification of Outliers In Logistic Regression

  • Kim Bu-Yong
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.3
    • /
    • pp.625-634
    • /
    • 2005
  • A procedure is proposed to identify multiple outliers in the logistic regression. It detects the leverage points by means of hierarchical clustering of the robust distances based on the minimum covariance determinant estimator, and then it employs a V-mask type criterion on the scatter plot of robust residuals against robust distances to classify the observations into vertical outliers, bad leverage points, good leverage points, and regular points. Effectiveness of the proposed procedure is evaluated on the basis of the classic and artificial data sets, and it is shown that the procedure deals very well with the masking and swamping effects.

The Sequential Testing of Multiple Outliers in Linear Regression

  • Park, Jinpyo;Park, Heechang
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.2
    • /
    • pp.337-346
    • /
    • 2001
  • In this paper we consider the problem of identifying and testing the outliers in linear regression. first we consider the problem for testing the null hypothesis of no outliers. The test based on the ratio of two scale estimates is proposed. We show the asymptotic distribution of the test statistic by Monte Carlo simulation and investigate its properties. Next we consider the problem of identifying the outliers. A forward sequential procedure based on the suggested test is proposed and shown to perform fairly well. The forward sequential procedure is unaffected by masking and swamping effects because the test statistic is based on robust estimate.

  • PDF

The Scale Ratio Testing of Multiple Outliers in Linear Regression

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.3
    • /
    • pp.673-685
    • /
    • 2003
  • In this paper we consider the problem of identifying and testing outliers in linear regression. First we consider the problem for testing the null hypothesis of no outliers. A test based on the ratio of two residual scale estimates is proposed. We show the asymptotic distribution of the test statistics by Monte Carlo simulation and investigate its properties. Next we consider the problem of identifying the outliers. A forward sequential procedure using the suggested test is proposed and shown to perform fairly well. Unlike other forward procedures, the present one is unaffected by masking and swamping effects because the test statistic is based on robust scale estimate.

  • PDF