DOI QR코드

DOI QR Code

A sequential outlier detecting method using a clustering algorithm

군집 알고리즘을 이용한 순차적 이상치 탐지법

  • Seo, Han Son (Department of Applied Statistics, Konkuk University) ;
  • Yoon, Min (Department of Statistics, Pukyong National University)
  • 서한손 (건국대학교 응용통계학과) ;
  • 윤민 (부경대학교 통계학과)
  • Received : 2016.03.04
  • Accepted : 2016.04.16
  • Published : 2016.06.30

Abstract

Outlier detection methods without performing a test often do not succeed in detecting multiple outliers because they are structurally vulnerable to a masking effect or a swamping effect. This paper considers testing procedures supplemented to a clustering-based method of identifying the group with a minority of the observations as outliers. One of general steps is performing a variety of t-test on individual outlier-candidates. This paper proposes a sequential procedure for searching for outliers by changing cutoff values on a cluster tree and performing a test on a set of outlier-candidates. The proposed method is illustrated and compared to existing methods by an example and Monte Carlo studies.

검정절차가 생략된 이상치 탐지법은 구조적으로 수렁효과나 가면효과에 취약하기 때문에 다수의 이상치를 제대로 탐지하지 못할 때가 있다. 본 연구에서는 군집화에 의하여 구분된 소수 관찰치군을 이상치로 판정하는 방법에 보완될 검정절차를 다룬다. 이에 관련된 일반적인 방법은 탐지된 이상치 후보군의 개별적인 관찰치에 대해 다양한 종류의 t-검정을 수행하는 것이다. 본 연구에서는 이상치 후보군에 대한 검정을 수행하고 군집나무의 절단기준을 변경시켜 새로운 이상치군을 탐색해 나가는 순차적인 방법을 제안한다. 예제와 모의실험을 통해 제시된 방법과 기존의 방법들을 비교한다.

Keywords

References

  1. Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the identification of multiple outliers in linear models, Journal of the American Statistical Association, 88, 1264-1272. https://doi.org/10.1080/01621459.1993.10476407
  2. Kianifard, F. and Swallow, W. H. (1989). Using recursive residuals, calculated on adaptive-ordered observations, to identify outliers in linear regression, Biometrics, 45, 571-585. https://doi.org/10.2307/2531498
  3. Kianifard, F. and Swallow, W. H. (1996). A review of the development and application of recursive residuals in linear models, Journal of the American Statistical Association, 91, 391-400. https://doi.org/10.1080/01621459.1996.10476700
  4. Kim, S. S. and Krzanowski, W. J. (2007). Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization, Computational Statistics, 22, 109-119. https://doi.org/10.1007/s00180-007-0026-3
  5. Mojena, R. (1977). Hierarchical grouping methods and stopping rules: an evaluation, The Computer Journal, 20, 359-363. https://doi.org/10.1093/comjnl/20.4.359
  6. Pena, D. and Yohai, V. J. (1995). The detection of influential subsets in linear regression by using an influence matrix, Journal of the Royal Statistical Society, Series B, 57, 145-156.
  7. Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection, John Wiley, New York.
  8. Sebert, D. M., Montgomery, D. C., and Rollier, D. (1998). A clustering algorithm for identifying multiple outliers in linear regression, Computational Statistics and Data Analysis, 27, 461-484. https://doi.org/10.1016/S0167-9473(98)00021-8
  9. Seo, H. S. and Yoon, M. (2014). A test on a specific set of outlier candidates in a linear model, The Korean Journal of Applied Statistics, 27, 307-315. https://doi.org/10.5351/KJAS.2014.27.2.307

Cited by

  1. An on-line detection method for outliers of dynamic unstable measurement data pp.1573-7543, 2017, https://doi.org/10.1007/s10586-017-1458-3