• Title/Summary/Keyword: outlier test

Asymptotic Properties of Outlier Tests in Nonlinear Regression

  • Kahng, Myung-Wook
    • Journal of the Korean Data and Information Science Society / v.17 no.1 / pp.205-211 / 2006
  • For a linear regression model, the necessary and sufficient condition for the asymptotic consistency of the outlier test statistic is known. An analogous condition for the nonlinear regression model is considered in this paper.

Assessing the Accuracy of Outlier Tests in Nonlinear Regression

  • Kahng, Myung-Wook;Kim, Bu-Yang
    • Communications for Statistical Applications and Methods / v.16 no.1 / pp.163-168 / 2009
  • Given the specific mean shift outlier model, the standard approaches to obtaining test statistics for outliers are discussed. The accuracy of outlier tests is investigated using subset curvatures. These subset curvatures appear to be reliable indicators of the adequacy of the linearization-based test. We also consider obtaining graphical summaries of uncertainty in estimating parameters through confidence curves. The results are applied to the problem of assessing the accuracy of outlier tests.
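
For orientation, in the linear case the mean shift outlier test reduces to a t test on the externally studentized residual of each observation. The following is a minimal sketch for simple linear regression only, not the linearization-based nonlinear test or the subset curvature measures discussed in the abstract. The function name and the data are hypothetical.

```python
import math

def mean_shift_t_statistics(x, y):
    """Externally studentized residuals for the fit y = a + b*x.

    Under the null hypothesis of no mean shift in case i, the i-th
    statistic follows a t distribution with n - p - 1 degrees of freedom.
    """
    n = len(x)
    p = 2  # number of parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    rss = sum(e * e for e in residuals)
    t_stats = []
    for e, h in zip(residuals, leverages):
        # Residual variance estimated with case i deleted
        s2_deleted = (rss - e * e / (1.0 - h)) / (n - p - 1)
        t_stats.append(e / math.sqrt(s2_deleted * (1.0 - h)))
    return t_stats

# Hypothetical data: roughly y = x, with one shifted response at x = 6
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0, 7.0, 8.1]
t = mean_shift_t_statistics(x, y)
suspect = max(range(len(t)), key=lambda i: abs(t[i]))
```

Because the most extreme case is selected after looking at the data, the statistic for `suspect` would normally be compared with a Bonferroni-adjusted t critical value rather than the unadjusted one.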

Accuracy of Multiple Outlier Tests in Nonlinear Regression

  • Kahng, Myung-Wook
    • Communications for Statistical Applications and Methods / v.18 no.1 / pp.131-136 / 2011
  • The original Bates-Watts framework applies only to the complete parameter vector. Thus, guidelines developed in that framework can be misleading when the adequacy of the linear approximation is very different for different subsets. The subset curvature measures appear to be reliable indicators of the adequacy of linear approximation for an arbitrary subset of parameters in nonlinear models. Given the specific mean shift outlier model, the standard approaches to obtaining test statistics for outliers are discussed. The accuracy of outlier tests is investigated using subset curvatures.

Test for an Outlier in Multivariate Regression with Linear Constraints

  • Kim, Myung-Geun
    • Communications for Statistical Applications and Methods / v.9 no.2 / pp.473-478 / 2002
  • A test for a single outlier in multivariate regression with linear constraints on regression coefficients using a mean shift model is derived. It is shown that influential observations based on case deletions in testing linear hypotheses are determined by two types of outliers: mean shift outliers with and without linear constraints. An illustrative example is given.

The Identification Of Multiple Outliers

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society / v.11 no.2 / pp.201-215 / 2000
  • The classical method for regression analysis is least squares. However, if the data contain significant outliers, the least squares estimator can break down. Robust methods are therefore an important complement to least squares: they down-weight or completely ignore the outliers. This is not always best, because outliers can carry important information about the population. If they can be detected, the outliers can be inspected further and appropriate action taken based on the results. In this paper, I propose a sequential outlier test to identify outliers. It is based on the nonrobust and robust estimates of scatter of the robust regression residuals and is applied in a forward procedure, removing the most extreme observation at each step until the test fails to detect outliers. Unlike other forward procedures, the present one is unaffected by swamping or masking effects because the statistic is based on robust regression residuals. I derive the asymptotic distribution of the test statistic and apply the test to several real and simulated data sets, where it performs fairly well.
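
The forward idea described in this abstract (remove the most extreme observation, re-estimate, repeat until nothing is flagged) can be illustrated roughly as follows. This sketch is not the paper's test statistic: it substitutes the median and the scaled median absolute deviation (MAD) of a univariate sample for the robust regression residuals, and the cutoff of 3.5 is an assumed rule of thumb. The function name and data are hypothetical.

```python
import statistics

def sequential_outlier_removal(values, cutoff=3.5):
    """Repeatedly remove the most extreme value while its robust z-score exceeds cutoff."""
    remaining = list(values)
    removed = []
    while len(remaining) > 2:
        center = statistics.median(remaining)
        mad = statistics.median([abs(v - center) for v in remaining])
        scale = 1.4826 * mad  # makes the MAD consistent with the normal standard deviation
        if scale == 0:
            break  # more than half the values are identical; robust scale is degenerate
        extreme = max(remaining, key=lambda v: abs(v - center))
        if abs(extreme - center) / scale <= cutoff:
            break  # the most extreme remaining value is no longer flagged
        remaining.remove(extreme)
        removed.append(extreme)
    return removed, remaining

# Hypothetical sample with two gross outliers
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0, 9.7, 10.3, 18.0]
removed, remaining = sequential_outlier_removal(sample)
```

Because the center and scale are robust, the second outlier (18.0) does not hide behind the first (25.0), which is the masking problem the abstract refers to.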

Estimation of Design Flood by the Determination of Best Fitting Order of LH-Moments ( I ) (LH-모멘트의 적정 차수 결정에 의한 설계홍수량 추정 ( I ))

  • 맹승진;이순혁
    • Magazine of the Korean Society of Agricultural Engineers / v.44 no.6 / pp.49-60 / 2002
  • This study was conducted to estimate the design flood by determining the best fitting order of LH-moments of the annual maximum series at six and nine watersheds in Korea and Australia, respectively. The adequacy of the flood flow data was confirmed by tests of independence, homogeneity, and outliers. Gumbel (GUM), Generalized Extreme Value (GEV), Generalized Pareto (GPA), and Generalized Logistic (GLO) distributions were applied to find the best fitting frequency distribution for the flood flow data. Theoretical bases of L, L1, L2, L3 and L4-moments were derived to estimate the parameters of the four distributions. L, L1, L2, L3 and L4-moment ratio diagrams (LH-moments ratio diagrams) were developed in this study. The GEV distribution was confirmed as the best fit for the flood flow data of the applied watersheds by the LH-moments ratio diagram and the Kolmogorov-Smirnov test. The best fitting order of LH-moments will be derived by the confidence analysis of the estimated design flood in the second report of this study.
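
As a rough illustration of the lowest-order case only, the sketch below computes the first two ordinary sample L-moments from probability weighted moments and uses them to fit a Gumbel distribution, then evaluates a design flood quantile. It does not implement the higher-order L1 to L4 (LH) moments or the GEV, GPA, and GLO fits from the study. The function names and the annual maximum flows are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def sample_l_moments(values):
    """Return the first two sample L-moments (l1, l2) from unbiased PWMs."""
    x = sorted(values)
    n = len(x)
    b0 = sum(x) / n
    # b1 = (1/n) * sum over ranks i = 1..n of ((i - 1) / (n - 1)) * x_(i)
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0

def fit_gumbel(values):
    """Gumbel location and scale from L-moments: scale = l2 / ln 2, location = l1 - gamma * scale."""
    l1, l2 = sample_l_moments(values)
    scale = l2 / math.log(2.0)
    location = l1 - EULER_GAMMA * scale
    return location, scale

def gumbel_quantile(location, scale, return_period):
    """Flow with non-exceedance probability 1 - 1/T for return period T (years)."""
    p = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(p))

# Hypothetical annual maximum flows (m^3/s)
annual_max_flows = [412.0, 530.0, 288.0, 745.0, 610.0, 390.0, 455.0, 980.0, 520.0, 335.0]
location, scale = fit_gumbel(annual_max_flows)
design_flood_100yr = gumbel_quantile(location, scale, 100)
```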

Normality Test of the Water Quality Monitoring Data in Harbour (항만 환경자료의 정규분포 적합 검정)

  • Cho, Hong-Yeon
    • Journal of Korean Society of Coastal and Ocean Engineers / v.33 no.2 / pp.53-64 / 2021
  • A normality test (hereafter NT) is highly recommended before statistical estimation because the normality assumption on the data is basic and essential. NT was carried out using the KOEM harbor water quality monitoring data, which comprise 3,000 data sets in total (50 stations, 30 water quality parameters including surface and bottom layers, and two seasons, summer and winter). The comparative analysis of normality was carried out using a total of 18 methods supported by R program packages. In addition, the Shapiro-Wilk test was selected as the reference method in this study for a detailed analysis of the effects of data transformation and outliers. The numbers of normality assumption rejections (NAR) were estimated and compared across these cases, before and after application of the Box-Cox (BC) transformation and Rosner's outlier test. The NAR numbers were reduced from 24-28 to 3-4 in the "before and after" BC transformation cases under the no-outlier-exclusion condition. By contrast, the NAR numbers diminished rapidly from 6-9 to below one in the same cases under the outlier-exclusion condition. Thus, for coastal water quality monitoring data that do not come from a normal distribution, the Box-Cox transformation based on the outlier test is highly recommended for suitable statistical estimation and inference.
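
The Box-Cox step mentioned in this abstract can be sketched as follows. The study itself used R packages; this is an independent Python illustration that picks the transformation parameter by maximizing the standard Box-Cox profile log-likelihood over a coarse grid. The grid, the function names, and the data are assumptions, and neither Rosner's test nor the Shapiro-Wilk test is implemented here.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lam, up to an additive constant."""
    n = len(values)
    transformed = box_cox(values, lam)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in values)

def best_lambda(values, grid=None):
    """Grid search for the lam that maximizes the profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # assumed search range: -2.0 to 2.0
    return max(grid, key=lambda lam: box_cox_log_likelihood(values, lam))

# Hypothetical right-skewed data: exponentials of evenly spaced values
skewed = [math.exp(z) for z in [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]]
lam_hat = best_lambda(skewed)
normalized = box_cox(skewed, lam_hat)
```

In a workflow like the one the abstract describes, the transformed values would then be passed to a normality test, with and without the observations flagged by the outlier test.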

A Preliminary Study on the Establishment of Background Levels and Management Targets in the Coastal Ecosystem of Korean Peninsula Using Outlier Test (이상치 검증을 이용한 한반도 연안생태계의 배경 농도 및 관리 항목 도출에 대한 예비 연구)

  • CHIN, BYUNG SUN;HWANG, IN SEO;KIM, YOUNG NAM;KOH, BYOUNG SEOL;YOO, JEONG KYU;JUNG, HOE IN;YEO, JUNG WON;WOO, SEUNG;PARK, GYUNG SOO
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.24 no.1 / pp.170-186 / 2019
  • The marine ecosystem survey investigates and analyzes multiple parameters at various times and sites. It is therefore very difficult to analyze such complex, multi-item ecological data effectively, and more difficult still to identify the current status of the ecosystem and diagnose its problems through data analysis. This paper aims to provide an example of interpreting complex ecological data through analysis of the distribution characteristics and outliers of ecological survey data. The main goals of the study are to establish the background levels of coastal ecosystem parameters, taking the distribution characteristics of the data into account, and to establish ecosystem monitoring indicators and an adaptive management system for the coastal waters of the Korean Peninsula. The data used in this paper are from the coastal ecosystem survey of the National Marine Ecosystem Monitoring Program conducted by the Ministry of Oceans and Fisheries (MOF) and the Korea Marine Environment Management Corporation (KOEM), mainly from 2015 to 2017. This article is a preliminary study to establish the above processes; the final result will be derived in 2020, when the coastal ecosystem survey has been completed three times along the Korean coast.

A Test on a Specific Set of Outlier Candidates in a Linear Model (선형모형에서 특정 이상치 후보군에 대한 검정)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.27 no.2 / pp.307-315 / 2014
  • An exact distribution of the test statistic for testing multiple outlier candidates does not generally exist; therefore, tests of individual outliers (or tests using simulated critical values) are usually conducted instead of tests for groups of outliers. This article concerns procedures for testing outlying observations. We suggest a method that can be applied to arbitrary observations or to multiple outlier candidates detected by an outlier detection method. A Monte Carlo study is used to compare the performance of the proposed method with others.

A sequential outlier detecting method using a clustering algorithm (군집 알고리즘을 이용한 순차적 이상치 탐지법)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.699-706 / 2016
  • Outlier detection methods that do not perform a test often fail to detect multiple outliers because they are structurally vulnerable to a masking effect or a swamping effect. This paper considers testing procedures that supplement a clustering-based method, which identifies the group containing a minority of the observations as outliers. One of the general steps is performing a variety of t-tests on individual outlier candidates. This paper proposes a sequential procedure that searches for outliers by changing cutoff values on a cluster tree and performing a test on a set of outlier candidates. The proposed method is illustrated and compared with existing methods through an example and Monte Carlo studies.