Impact of Outliers on the Statistical Measures of the Environmental Monitoring Data in Busan Coastal Sea

Korean title (translated): Effects of Outliers on the Statistical Measures of Coastal Environmental Data

  • Cho, Hong-Yeon (조홍연) (Division of the Coastal and Environmental Engineering Research, KIOST)
  • Lee, Ki-Seop (이기섭) (Division of Earth Environmental System, College of Natural Sciences, Pusan National University)
  • Ahn, Soon-Mo (안순모) (Division of Earth Environmental System, College of Natural Sciences, Pusan National University)
  • Received : 2016.01.18
  • Accepted : 2016.04.12
  • Published : 2016.06.30

Abstract

Statistical measures of coastal environmental data are used in a variety of statistical inferences, hypothesis tests, and data-driven models. If the measures are biased, the resulting estimates and models may also be biased, and the potential for bias is large when the data contain outliers, defined as extraordinarily large or small data values. This study suggests more robust statistical measures as alternatives to the commonly used measures and assesses their performance through a quantitative comparison with the typical measures of location, spread, and shape, using environmental monitoring data from the Busan coastal sea. Outliers were detected using Rosner's test. Based on this test, about 5-10% of the nutrient data were found to contain outliers. After removal (zero-weighting) of the outliers, the relative change ratios of the mean and standard deviation between the before- and after-removal conditions were 13% and 33%, respectively. The skewness and kurtosis decreased by 1.36 and 8.11, respectively. By contrast, the change ratios of the robust counterparts of the mean and standard deviation are 3.7-10.5%, and the variation magnitudes of the robust skewness and kurtosis are only about 2-4% of those of the non-robust measures. Given these relatively small changes between the before- and after-removal conditions, the robust measures can be regarded as outlier-resistant statistical measures.
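The before/after comparison described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample values are invented, the single outlier is flagged by hand rather than by Rosner's test, and the robust measures chosen here (median, scaled median absolute deviation, quartile skewness, and Moors' octile-based kurtosis) are common choices that may differ from the ones used in the paper. The helper names and the form of the relative change ratio are likewise hypothetical.

```python
import statistics as st

def quantile(xs, p):
    """Linearly interpolated sample quantile for 0 <= p <= 1."""
    s = sorted(xs)
    h = (len(s) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def classical(xs):
    """Moment-based location, spread, skewness, and kurtosis."""
    m, sd = st.mean(xs), st.pstdev(xs)
    z = [(x - m) / sd for x in xs]
    return {"location": m,
            "spread": sd,
            "skewness": st.mean([v ** 3 for v in z]),
            "kurtosis": st.mean([v ** 4 for v in z])}

def robust(xs):
    """Quantile-based counterparts (an assumed, illustrative selection)."""
    med = st.median(xs)
    mad = 1.4826 * st.median([abs(x - med) for x in xs])  # scaled MAD
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    e = [quantile(xs, k / 8) for k in range(9)]           # octiles
    return {"location": med,
            "spread": mad,
            "skewness": (q3 + q1 - 2 * med) / (q3 - q1),  # quartile skewness
            "kurtosis": ((e[7] - e[5]) + (e[3] - e[1])) / (e[6] - e[2])}  # Moors

def relative_change(before, after):
    """Relative change in percent (assumed definition)."""
    return abs(after - before) / abs(before) * 100.0

# Hypothetical concentrations; the last value plays the role of an outlier.
full = [0.21, 0.25, 0.19, 0.30, 0.27, 0.23, 0.26, 0.22, 0.24, 0.28, 0.20, 1.50]
clean = full[:-1]  # outlier removed by hand, not by Rosner's test

c_full, c_clean = classical(full), classical(clean)
r_full, r_clean = robust(full), robust(clean)

for key in ("location", "spread"):
    print(key,
          "classical change %:", round(relative_change(c_full[key], c_clean[key]), 1),
          "robust change %:", round(relative_change(r_full[key], r_clean[key]), 1))
for key in ("skewness", "kurtosis"):
    print(key,
          "classical shift:", round(c_full[key] - c_clean[key], 2),
          "robust shift:", round(r_full[key] - r_clean[key], 2))
```

On data like this, the classical measures shift much more than the quantile-based ones when the extreme value is dropped, which is the qualitative behavior the abstract reports. The numbers produced by the sketch are not comparable to the paper's results.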

References

  1. 국가해양환경정보통합시스템 (2016) 해양환경측정망 원본자료. http://www.meis.go.kr/ Accessed 18 Jan 2016 (Marine Environment Information System (2016) Raw data - marine environmental monitoring network. http://www.meis.go.kr/ Accessed 18 Jan 2016)
  2. 해양환경관리공단 (2015) 국가해양환경측정망 자료 - 부산연안. http://www.koem.or.kr/ Accessed 18 Jan 2016 (Korea Marine Environment Management Corporation (2016) National marine environmental monitoring network data - Busan coastal sea. http://www.koem.or.kr/ Accessed 18 Jan 2016)
  3. Barnett V, Lewis T (1994) Outliers in statistical data. John Wiley & Sons, 584 p
  4. Bonato M (2011) Robust estimation of skewness and kurtosis in distributions with infinite higher moments. Financ Res Lett 8:77-87 https://doi.org/10.1016/j.frl.2010.12.001
  5. Brys G, Hubert M, Struyf A (2004) A robust measure of skewness. J Comput Graph Stat 13(4):996-1017 https://doi.org/10.1198/106186004X12632
  6. Dixon WJ (1951) Ratios involving extreme values. Ann Math Stat 22(1):68-78 https://doi.org/10.1214/aoms/1177729693
  7. Erceg-Hurn DM, Mirosevich VM (2008) Modern robust statistical methods: an easy way to maximize the accuracy and power of your research. Am Psychol 63(7):591-601 https://doi.org/10.1037/0003-066X.63.7.591
  8. Grubbs FE (1950) Sample criteria for testing outlying observations. Ann Math Stat 21(1):27-58 https://doi.org/10.1214/aoms/1177729885
  9. Huber PJ, Ronchetti EM (2009) Robust statistics. John Wiley & Sons, New York, 380 p
  10. Kim T-H, White H (2004) On more robust estimation of skewness and kurtosis. Financ Res Lett 1:56-73
  11. Martinez WL, Martinez AR (2005) Exploratory data analysis with MATLAB. Chapman & Hall/CRC, Boca Raton, 405 p
  12. Millard SP (2013) EnvStats: an R package for environmental statistics. Springer, New York, 291 p
  13. Moors JJA (1988) A quantile alternative for kurtosis. J Roy Stat Soc D-Sta 37(1):25-32
  14. R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://cloud.r-project.org/index.html Accessed 18 Jan 2016
  15. Rosner B (1983) Percentage points for a generalized ESD many-outlier procedure. Technometrics 25(2):165-172 https://doi.org/10.1080/00401706.1983.10487848
  16. Rousseeuw PJ, Croux C (1993) Alternatives to the median absolute deviation. J Am Stat Assoc 88(424):1273-1283 https://doi.org/10.1080/01621459.1993.10476408
  17. Rousseeuw PJ, Leroy AM (2003) Robust regression and outlier detection. John Wiley & Sons, New Jersey, 329 p

Cited by

  1. Distribution and Trend Analysis of the Significant Wave Heights Using KMA and ECMWF Data Sets in the Coastal Seas, Korea. vol. 29, no. 3, 2017, https://doi.org/10.9765/KSCOE.2017.29.3.129