• Title/Summary/Keyword: multiple outliers

Search Result 79, Processing Time 0.021 seconds

Simultaneous Identification of Multiple Outliers and High Leverage Points in Linear Regression

  • Rahmatullah Imon, A.H.M.;Ali, M. Masoom
    • Journal of the Korean Data and Information Science Society
    • /
    • v.16 no.2
    • /
    • pp.429-444
    • /
    • 2005
  • The identification of unusual observations such as outliers and high leverage points has drawn a great deal of attention for many years. Most of these identifications techniques are based on case deletion that focuses more on the outliers than the high leverage points. But residuals together with leverage values may cause masking and swamping for which a good number of unusual observations remain undetected in the presence of multiple outliers and multiple high leverage points. In this paper we propose a new procedure to identify outliers and high leverage points simultaneously. We suggest an additive form of the residuals and the leverages that gives almost an equal focus on outliers and leverages. We analyzed several well-referred data set and discover few outliers and high leverage points that were undetected by the existing diagnostic techniques.

  • PDF

유전자 알고리듬을 이용한 다중이상치 탐색

  • Go Yeong-Hyeon;Lee Hye-Seon;Jeon Chi-Hyeok
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2000.11a
    • /
    • pp.173-179
    • /
    • 2000
  • Genetic algorithm(GA) is applied for detecting multiple outliers. GA is a heuristic optimization tool solving for near optimal solution. We compare the performance of GA and the other diagnostic measures commonly used for detecting outliers in regression model. The results show that GA seems to have better performance than the others for the detection of multiple outliers.

  • PDF

Computational Methods for Detection of Multiple Outliers in Nonlinear Regression

  • Myung-Wook Kahng
    • Communications for Statistical Applications and Methods
    • /
    • v.3 no.2
    • /
    • pp.1-11
    • /
    • 1996
  • The detection of multiple outliers in nonlinear regression models can be computationally not feasible. As a compromise approach, we consider the use of simulated annealing algorithm, an approximate approach to combinatorial optimization. We show that this method ensures convergence and works well in locating multiple outliers while reducing computational time.

  • PDF

A Confirmation of Identified Multiple Outliers and Leverage Points in Linear Model (다중 선형 모형에서 식별된 다중 이상점과 다중 지렛점의 재확인 방법에 대한 연구)

  • 유종영;안기수
    • The Korean Journal of Applied Statistics
    • /
    • v.15 no.2
    • /
    • pp.269-279
    • /
    • 2002
  • We considered the problem for confirmation of multiple outliers and leverage points. Identification of multiple outliers and leverage points is difficult because of the masking effect and swamping effect. Rousseeuw and van Zomeren(1990) identified multiple outliers and leverage points by using the Least Median of Squares and Minimum Value of Ellipsoids which are high-breakdown robust estimators. But their methods tend to declare too many observations as extremes. Atkinson(1987) suggested a method for confirming of outliers and Fung(1993) pointed out Atkinson method's limitation and proposed another method by using the add-back model. But we analyzed that Fung's method is affected by adjacent effect. In this thesis, we proposed one procedure for confirmation of outliers and leverage points and compared three example with Fung's method.

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society
    • /
    • v.36 no.4
    • /
    • pp.457-469
    • /
    • 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. But the detection of influential points or multiple outliers is more difficult, owing to masking and swamping problems. The multiple outlier detection methods for logistic regression have not been studied from the points of direct procedure yet. In this paper we consider the direct methods for logistic regression by extending the $Pe\tilde{n}a$ and Yohai (1995) influence matrix algorithm. We define the influence matrix in logistic regression by using Cook's distance in logistic regression, and test multiple outliers by using the mean shift model. To show accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data including multiple outliers with masking and swamping.

Detecting Multiple Outliers Using the Gaps of Order Statistics

  • Kim, Hyun Chul
    • Communications for Statistical Applications and Methods
    • /
    • v.2 no.2
    • /
    • pp.184-197
    • /
    • 1995
  • An objective and one-step detection procedure of multiple outliers is suggested by using the gaps of the order statistics. The detection procedure can be used as a routine outlier detection method of a statistical analysis computer program. The procedure is applied to some examples including the data selected by Kitagawa.

  • PDF

The Identification Of Multiple Outliers

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.11 no.2
    • /
    • pp.201-215
    • /
    • 2000
  • The classical method for regression analysis is the least squares method. However, if the data contain significant outliers, the least squares estimator can be broken down by outliers. To remedy this problem, the robust methods are important complement to the least squares method. Robust methods down weighs or completely ignore the outliers. This is not always best because the outliers can contain some very important information about the population. If they can be detected, the outliers can be further inspected and appropriate action can be taken based on the results. In this paper, I propose a sequential outlier test to identify outliers. It is based on the nonrobust estimate and the robust estimate of scatter of a robust regression residuals and is applied in forward procedure, removing the most extreme data at each step, until the test fails to detect outliers. Unlike other forward procedures, the present one is unaffected by swamping or masking effects because the statistics is based on the robust regression residuals. I show the asymptotic distribution of the test statistics and apply the test to several real data and simulated data for the test to be shown to perform fairly well.

  • PDF

Procedures for Detecting Multiple Outliers in Linear Regression Using R

  • Kwon, Soon-Sun;Lee, Gwi-Hyun;Park, Sung-Hyun
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2005.11a
    • /
    • pp.13-17
    • /
    • 2005
  • In recent years, many people use R as a statistics system. R is frequently updated by many R project teams. We are interested in the method of multiple outlier detection and know that R is not supplied the method of multiple outlier detection. In this talk, we review these procedures for detecting multiple outliers and provide more efficient procedures combined with direct methods and indirect methods using R.

  • PDF

Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.2
    • /
    • pp.309-315
    • /
    • 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Since single deletions may not exactly detect outliers or influential observations due to swamping effects and masking effects, it needs multiple deletions. We developed conditional deletion diagnostics which are designed to overcome problems of masking effects. We derived the closed forms for several statistics in logistic regression models. They give useful diagnostics on the statistics.

DETECTION OF OUTLIERS IN WEIGHTED LEAST SQUARES REGRESSION

  • Shon, Bang-Yong;Kim, Guk-Boh
    • Journal of applied mathematics & informatics
    • /
    • v.4 no.2
    • /
    • pp.501-512
    • /
    • 1997
  • In multiple linear regression model we have presupposed assumptions (independence normality variance homogeneity and so on) on error term. When case weights are given because of variance heterogeneity we can estimate efficiently regression parameter using weighted least squares estimator. Unfortunately this estimator is sen-sitive to outliers like ordinary least squares estimator. Thus in this paper we proposed some statistics for detection of outliers in weighted least squares regression.