Normality Test of the Water Quality Monitoring Data in Harbour

Goodness-of-Fit Test for the Normal Distribution of Harbor Environmental Data (translation of the Korean title)

  • Cho, Hong-Yeon (조홍연) (Marine Big-data Center, Korea Institute of Ocean Science and Technology; KIOST School, University of Science and Technology)
  • Received : 2021.03.02
  • Accepted : 2021.03.24
  • Published : 2021.04.30

Abstract

A normality test (hereafter NT) is strongly recommended before statistical estimation because many estimation methods rest on the basic assumption that the data are normally distributed. In this study, NTs were carried out on the KOEM harbor water quality monitoring data, which comprise 3,000 data sets in total (50 stations; 30 water quality parameters, counting surface and bottom layers separately; and two seasons, summer and winter). Normality was compared across 18 test methods available in R packages. In addition, the Shapiro-Wilk test was selected as the reference method for a detailed analysis of the effects of data transformation and outliers. The number of normality assumption rejections (NAR) was estimated and compared before and after applying the Box-Cox (BC) transformation and Rosner's outlier test. Without outlier exclusion, the BC transformation reduced the NAR number from 24-28 to 3-4. With outlier exclusion, the NAR number fell from 6-9 before the BC transformation to below one after it. Thus, for coastal water quality monitoring data that do not come from a normal distribution, applying the Box-Cox transformation together with an outlier test is highly recommended for sound statistical estimation and inference.

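To assess how well environmental data satisfy the normality assumption required by many statistical estimation procedures, goodness-of-fit tests for the normal distribution were carried out with 18 methods on 3,000 sets of harbor environment monitoring data from KOEM (해양환경공단, Korea Marine Environment Management Corporation): 50 monitoring stations, 30 water quality parameters counted separately for the surface and bottom layers, and 2 seasons (winter and summer). The test methods were then compared and evaluated. In addition, the Shapiro-Wilk method was selected as the reference test for evaluating the effects of data transformation and outliers. Using this method, the degree of normality rejection was estimated and analyzed before and after the Box-Cox transformation, a representative normalizing transformation, and with Rosner's outlier detection method. For a single water quality parameter, the number of stations at which normality was rejected dropped sharply from 24-28 before the Box-Cox transformation to about 3-4 after it; when data diagnosed as outliers were excluded, the number of rejections dropped from about 6-9 before the transformation to about 1 after it. Therefore, when statistical estimation is performed with coastal environmental data that do not follow a normal distribution, both an outlier test and the Box-Cox transformation need to be applied.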
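
The "count rejections before and after the Box-Cox transformation" procedure described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the study's R workflow and rests on several assumptions: the data are synthetic lognormal samples standing in for 50 hypothetical stations; the Jarque-Bera test (one of the tests in the reference list) is swapped in for the Shapiro-Wilk reference test because its asymptotic p-value has a closed form; the Box-Cox parameter is chosen by a simple grid search over the profile log-likelihood; and Rosner's outlier test is omitted. All function names are illustrative.

```python
import math
import random
import statistics


def jarque_bera_pvalue(x):
    """Asymptotic Jarque-Bera p-value (chi-square with 2 degrees of freedom).

    Assumption: used here as a stand-in for the Shapiro-Wilk test.
    The asymptotic approximation is rough for small samples.
    """
    n = len(x)
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square(2) variable is exp(-x / 2).
    return math.exp(-jb / 2.0)


def boxcox(x, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def boxcox_loglik(x, lam):
    """Profile log-likelihood of lambda (up to an additive constant)."""
    n = len(x)
    y = boxcox(x, lam)
    mean = statistics.fmean(y)
    var = sum((v - mean) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in x)


def best_lambda(x):
    """Grid search for lambda on [-2, 2] in steps of 0.05 (an assumption)."""
    grid = [i / 20.0 for i in range(-40, 41)]
    return max(grid, key=lambda lam: boxcox_loglik(x, lam))


def count_rejections(series_list, alpha=0.05, transform=False):
    """Number of series for which normality is rejected at level alpha."""
    count = 0
    for x in series_list:
        if transform:
            x = boxcox(x, best_lambda(x))
        if jarque_bera_pvalue(x) < alpha:
            count += 1
    return count


# Synthetic, right-skewed data for 50 hypothetical stations (not KOEM data).
random.seed(0)
stations = [[random.lognormvariate(0.0, 0.8) for _ in range(60)]
            for _ in range(50)]

before = count_rejections(stations, transform=False)
after = count_rejections(stations, transform=True)
print(f"rejections before Box-Cox: {before} of {len(stations)}")
print(f"rejections after Box-Cox:  {after} of {len(stations)}")
```

Because the synthetic samples are strongly right-skewed, most of them should fail the test before transformation and far fewer after it, which mirrors the direction of the reported change in NAR. The specific counts depend on the random seed and on the test used, so they are not comparable to the paper's figures.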

Acknowledgement

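We thank the Ministry of Oceans and Fisheries and the Korea Marine Environment Management Corporation (KOEM) for providing the harbor environmental data used in this study.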

References

  1. Barnett, V. and Lewis, T. (1994). Outliers in statistical data, John Wiley & Sons.
  2. Cho, H.Y., Lee, K.S. and Ahn, S.M. (2016). Impact of outliers on the statistical measures of the environmental monitoring data in Busan coastal sea, Note. Ocean and Polar Research, 38(2), 149-159. https://doi.org/10.4217/OPR.2016.38.2.149
  3. D'Agostino, R.B. and Stephens, M.A. (1986). Goodness-of-Fit Techniques, Marcel Dekker.
  4. Filliben, J.J. (1975). The probability plot correlation coefficient test for normality. Technometrics, 17, 111-117. https://doi.org/10.1080/00401706.1975.10489279
  5. Frosini, B.V. (1987). On the distribution and power of a goodness-of-fit statistic with parametric and nonparametric applications. "Goodness-of-fit" (edited by Revesz P., Sarkadi K., Sen P.K.). 133-154.
  6. Gavrilov, I. and Pusev, R. (2014). normtest: Tests for Normality. R package version 1.1. https://CRAN.R-project.org/package=normtest.
  7. Geary, R.C. (1935). The ratio of the mean deviation to the standard deviation as a test of normality. Biometrika, 27, 310-332. https://doi.org/10.1093/biomet/27.3-4.310
  8. Gross, J. and Ligges, U. (2015). nortest: Tests for Normality. R package version 1.0-4. https://CRAN.R-project.org/package=nortest.
  9. Hegazy, Y.A.S. and Green, J.R. (1975). Some new goodness-of-fit tests using order statistics. Journal of the Royal Statistical Society. Series C (Applied Statistics), 24, 299-308.
  10. Jarque, C.M. and Bera, A.K. (1987). A test for normality of observations and regression residuals. International Statistical Review, 55, 163-172. https://doi.org/10.2307/1403192
  11. Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data, Second Edition, John Wiley & Sons.
  12. Looney, S.W. and Gulledge, T.R. (1985). Use of correlation coefficient with normal probability plots. The American Statistician, 39, 75-79. https://doi.org/10.2307/2683917
  13. Millard, S.P. (2013). EnvStats: An R Package for Environmental Statistics. Springer, New York. ISBN 978-1-4614-8455-4, https://www.springer.com.
  14. Ministry of Oceans and Fisheries (2012). Marine Environment Information (System) Portal (2021). https://www.meis.go.kr [accessed 2021.02.26.].
  15. Pohlert, T. (2020). ppcc: Probability Plot Correlation Coefficient Test. R package version 1.2. https://CRAN.R-project.org/package=ppcc.
  16. R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
  17. Razali, N.M. and Wah, Y.B. (2011). Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics, 2(1), 21-33.
  18. Royston, P. (1995). Remark AS R94: A remark on Algorithm AS 181: The W test for normality. Applied Statistics, 44, 547-551. https://doi.org/10.2307/2986146
  19. Royston, P. (1993). A pocket-calculator algorithm for the Shapiro-Francia test for non-normality: an application to medicine. Statistics in Medicine, 12, 181-184. https://doi.org/10.1002/sim.4780120209
  20. Shapiro, S.S., Wilk, M.B. and Chen, H.J. (1968). A comparative study of various tests for normality. Journal of the American Statistical Association, 63, 1343-1372. https://doi.org/10.1080/01621459.1968.10480932
  21. Spiegelhalter, D.J. (1977). A test for normality against symmetric alternatives. Biometrika, 64, 415-418. https://doi.org/10.1093/biomet/64.2.415
  22. Stephens, M.A. (1986). Tests based on EDF statistics. Goodness-of-Fit Techniques. (edited by D'Agostino, R.B. and Stephens, M.A.). Marcel Dekker, New York.
  23. Thode, Jr., H.C. (2002). Testing for Normality. Marcel Dekker, New York.
  24. Urzua, C.M. (1996). On the correct use of omnibus tests for normality. Economics Letters, 53, 247-251. https://doi.org/10.1016/S0165-1765(96)00923-8
  25. Weisberg, S. and Bingham, C. (1975). An approximate analysis of variance test for non-normality suitable for machine calculation. Technometrics, 17, 133-134. https://doi.org/10.1080/00401706.1975.10489283