• Title/Summary/Keyword: multiple outliers

Local Influence Assessment of the Misclassification Probability in Multiple Discriminant Analysis

  • Jung, Kang-Mo
    • Journal of the Korean Statistical Society / v.27 no.4 / pp.471-483 / 1998
  • The influence of observations on the misclassification probability in multiple discriminant analysis under the equal covariance assumption is investigated by the local influence method. Under an appropriate perturbation we can get information about influential observations and outliers by studying the curvatures and the associated direction vectors of the perturbation-formed surface of the misclassification probability. We show that the influence function method gives essentially the same information as the direction vector of the maximum slope. An illustrative example demonstrates the effectiveness of the local influence method.

An Outlier Detection Method in Penalized Spline Regression Models (벌점 스플라인 회귀모형에서의 이상치 탐지방법)

  • Seo, Han Son;Song, Ji Eun;Yoon, Min
    • The Korean Journal of Applied Statistics / v.26 no.4 / pp.687-696 / 2013
  • The detection and examination of outliers are important parts of data analysis because some outliers in the data may have a detrimental effect on statistical analysis. Outlier detection methods have been discussed by many authors. In this article, we propose applying Hadi and Simonoff's (1993) method to a penalized spline regression model to detect multiple outliers. Simulated data sets and real data sets are used to illustrate the proposed procedure and to compare it with a penalized spline regression and a robust penalized spline regression.

A sequential outlier detecting method using a clustering algorithm (군집 알고리즘을 이용한 순차적 이상치 탐지법)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.699-706 / 2016
  • Outlier detection methods that do not perform a test often fail to detect multiple outliers because they are structurally vulnerable to a masking effect or a swamping effect. This paper considers testing procedures that supplement a clustering-based method, which identifies the group containing a minority of the observations as outliers. One common step is to perform a variety of t-tests on individual outlier candidates. This paper proposes a sequential procedure that searches for outliers by changing cutoff values on a cluster tree and performing a test on a set of outlier candidates. The proposed method is illustrated and compared to existing methods by an example and Monte Carlo studies.

Robust Speaker Recognition Against Utterance Variations (발성변화에 강인한 화자 인식에 관한 연구)

  • Lee Ki-Yong
    • Journal of Internet Computing and Services / v.5 no.2 / pp.69-73 / 2004
  • A speaker model in a speaker recognition system is to be trained from a large data set gathered in multiple sessions. A large data set requires a large amount of memory and computation, and moreover it is practically hard to make users utter the data in several sessions. Recently, incremental adaptation methods have been proposed to address these problems. However, a data set gathered from multiple sessions is vulnerable to outliers arising from irregular utterance variations and the presence of noise, which result in an inaccurate speaker model. In this paper, we propose an incremental robust adaptation method to minimize the influence of outliers on a Gaussian Mixture Model based speaker model. The robust adaptation is obtained from an incremental version of M-estimation. The speaker model is initially trained from a small amount of data and is then adapted recursively with the data available in each session. Experimental results from a data set gathered over seven months show that the proposed method is robust against outliers.
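
As a rough illustration of incremental M-estimation, the sketch below updates a scalar mean one observation at a time with Huber weights. It is not the paper's method: the scalar setting (rather than a Gaussian mixture), the known unit scale, the tuning constant `c = 1.5`, and the names `huber_weight` and `IncrementalHuberMean` are all assumptions made for the example.

```python
def huber_weight(residual, c):
    """Huber weight: 1 inside [-c, c], shrinking as c/|r| outside."""
    size = abs(residual)
    return 1.0 if size <= c else c / size


class IncrementalHuberMean:
    """One-pass, Huber-weighted running mean (hypothetical helper).

    Each weight is computed against the current estimate and never
    revisited, so this only approximates a batch M-estimate.
    """

    def __init__(self, init_mean, init_weight=1.0, c=1.5):
        self.mean = init_mean            # estimate from the initial small data set
        self.total_weight = init_weight  # weight carried by that initial estimate
        self.c = c                       # assumed tuning constant (scale taken as 1)

    def update(self, x):
        residual = x - self.mean
        w = huber_weight(residual, self.c)
        self.total_weight += w
        # Weighted recursive mean: a large residual moves the estimate
        # by at most c / total_weight.
        self.mean += (w / self.total_weight) * residual
        return self.mean


stream = [0.1, -0.2, 0.05, 50.0, 0.0, -0.1]  # 50.0 plays the outlier
estimator = IncrementalHuberMean(init_mean=0.0)
for value in stream:
    estimator.update(value)

plain_mean = sum(stream) / len(stream)
print(f"plain mean = {plain_mean:.3f}, robust estimate = {estimator.mean:.3f}")
```

The plain mean is pulled far toward the outlying value, while the weighted update stays near the bulk of the data.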

Study of estimated model of drift through real ship (실선에 의한 표류 예측모델에 관한 연구)

  • Chang-Heon LEE;Kwang-Il KIM;Sang-Lok YOO;Min-Son KIM;Seung-Hun HAN
    • Journal of the Korean Society of Fisheries and Ocean Technology / v.60 no.1 / pp.57-70 / 2024
  • In order to present a predictive drift model, Jeju National University's training ship was tested for about 11 hours and 40 minutes, and 81 samples, taken from the full record at ten-minute intervals, were subjected to regression analysis after checking for outliers and influence points. In the outlier and influence point analysis, although the DFBETAS (difference in betas) value for wind direction exceeds 1 in part of the data, the CV (cumulative variable) value is 6%, close to 1. Therefore, it was judged that there would be no problem in conducting multiple regression analyses on the samples. The standardized regression coefficients showed how much current and wind affect the dependent variables: current speed and direction were the most important variables for drift speed and direction, with values of 47.1% and 58.1%, respectively. The model fit was statistically significant at the 0.05 level in the multiple regression analysis. The multiple correlation coefficients, indicating the degree of influence on the dependent variable, were 83.2% and 89.0%, respectively. The coefficients of determination were 69.3% and 79.3%, and the adjusted coefficients of determination were 67.6% and 78.3%, respectively. Because outliers and influence points in the sample data are identified before the multiple regression analysis, this study presents a more quantitative prediction model. Further studies combining these approaches are expected in the future.
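
The DFBETAS screening mentioned in this abstract can be sketched for a one-predictor regression as follows. The data are invented for illustration and are not the ship-drift measurements; the helper names `ols` and `dfbetas_slope` are hypothetical. The sketch uses the usual definition: the change in a coefficient when observation i is deleted, divided by the leave-one-out residual standard error times the square root of the matching diagonal element of (X'X)^-1.

```python
import math


def ols(xs, ys):
    """Simple least squares fit; returns (intercept, slope, resid_sd, sxx)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(rss / (n - 2)), sxx


def dfbetas_slope(xs, ys):
    """DFBETAS of the slope for every observation, by explicit refitting."""
    _, slope, _, sxx = ols(xs, ys)
    scores = []
    for i in range(len(xs)):
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        _, slope_i, sd_i, _ = ols(xs_i, ys_i)
        # For the slope, the (X'X)^-1 diagonal element is 1 / sxx (full data).
        scores.append((slope - slope_i) / (sd_i * math.sqrt(1.0 / sxx)))
    return scores


# Invented data: roughly y = 2x, with the last response pushed far upward.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 30.0]

scores = dfbetas_slope(xs, ys)
worst = max(range(len(scores)), key=lambda i: abs(scores[i]))
print("most influential observation:", worst)
```

An absolute value above 1 is the cutoff the abstract refers to; in this toy data only the last observation is expected to cross it by a wide margin.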

A Test on a Specific Set of Outlier Candidates in a Linear Model (선형모형에서 특정 이상치 후보군에 대한 검정)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.27 no.2 / pp.307-315 / 2014
  • An exact distribution of the test statistic for testing multiple outlier candidates does not generally exist; therefore, tests of individual outliers (or tests using simulated critical values) are usually conducted instead of tests for groups of outliers. This article concerns procedures for testing outlying observations. We suggest a method that can be applied to arbitrary observations or to multiple outlier candidates detected by an outlier detection method. A Monte Carlo study is used to compare the performance of the proposed method with others.

On the Robustness of $L_1$-estimator in Linear Regression Models

  • Bu-Yong Kim
    • Communications for Statistical Applications and Methods / v.2 no.2 / pp.277-287 / 1995
  • It is well known that the $L_1$-estimator is robust with respect to vertical outliers in regression data, even though it is susceptible to bad leverage points. This article is concerned with the robustness of the $L_1$-estimator. To investigate its robustness against vertical outliers, we may find intervals for the value of the response variable within which the $L_1$-estimates do not change. A procedure for constructing those intervals in multiple linear regression is illustrated in the sensitivity analysis context. The vertical breakdown point of the $L_1$-estimator is then defined on the basis of properties related to those intervals.
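
The insensitivity of the $L_1$ fit to a vertical outlier can be seen in a small sketch. It does not reproduce the paper's interval construction. It assumes simple (one-predictor) regression, invented data lying exactly on y = 1 + 2x, and a hypothetical brute-force helper `lad_line`, which relies on the fact that some optimal $L_1$ line passes through at least two data points.

```python
from itertools import combinations


def lad_line(xs, ys):
    """Brute-force L1 fit of y = a + b*x over all lines through two points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical pair does not define a regression line
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(abs(y - a - b * x) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
clean = [1.0 + 2.0 * x for x in xs]   # exact line y = 1 + 2x

mild = list(clean)
mild[5] += 50.0                        # vertical outlier at x = 5
wild = list(clean)
wild[5] += 500.0                       # same point pushed ten times further

fit_mild = lad_line(xs, mild)
fit_wild = lad_line(xs, wild)
print(fit_mild, fit_wild)
```

Both fits recover the intercept 1 and slope 2: moving the response at x = 5 further upward adds the same constant to the loss of every competing line that lies below it, so the minimizer does not change.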

Least absolute deviation estimator based consistent model selection in regression

  • Shende, K.S.;Kashid, D.N.
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.273-293 / 2019
  • We consider the problem of model selection in multiple linear regression with outliers and non-normal error distributions. In this article, a robust model selection criterion is proposed based on the least absolute deviation (LAD) robust estimation method. The proposed criterion is shown to be consistent. We suggest algorithms based on the proposed criterion that are suitable for a large number of predictors in the model. These algorithms select only relevant predictor variables with probability one for large sample sizes. An exhaustive simulation study shows that the criterion performs well. The proposed criterion is also applied to a real data set to examine its applicability. The simulation results show the proficiency of the algorithms in the presence of outliers, non-normal distributions, and multicollinearity.

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.149-161 / 2019
  • In this article, we suggest the following approach to simultaneous variable selection and outlier detection. First, we determine possible candidates for outliers using properties of an intercept estimator in a difference-based regression model, and the information on outliers is reflected in the multiple regression model by adding mean shift parameters. Second, we select the best model from the model that includes the outlier candidates as predictors, using stochastic search variable selection. Finally, we evaluate our method using simulations and real data analysis, which yield promising results. In addition, we need to develop our method further to make robust estimates. We will also extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.

Multiple Homographies Estimation using a Guided Sequential RANSAC (가이드된 순차 RANSAC에 의한 다중 호모그래피 추정)

  • Park, Yong-Hee;Kwon, Oh-Seok
    • The Journal of the Korea Contents Association / v.10 no.7 / pp.10-22 / 2010
  • This study proposes a new method of estimating multiple homographies between two images. RANSAC is a general and very successful robust parameter estimator, even with a large proportion of outliers. However, it is limited by the assumption that a single model accounts for all of the inliers in the data. Therefore, it has been suggested to apply RANSAC sequentially to estimate multiple 2D projective transformations. In this case, because outliers stay in the correspondence data set throughout the sequential estimation process, it tends to progress slowly for all models. It is also difficult to parallelize the sequential process, because the models are estimated in order of their number of inliers. We introduce a guided sequential RANSAC algorithm, using the local model instances that have been obtained from the RANSAC procedure, which is able to reduce the number of random samples and deal simultaneously with multiple models.
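
The baseline that this abstract starts from, plain sequential RANSAC, can be sketched on a simpler problem. The example below fits several 2D lines instead of homographies and does not implement the guided variant proposed in the paper. The point set, tolerance, iteration count, stopping rule, and the names `line_through`, `ransac_line`, and `sequential_ransac` are assumptions made for illustration.

```python
import math
import random


def line_through(p, q):
    """Normalized line (a, b, c) with a*x + b*y + c = 0 through points p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def ransac_line(points, tol, iters, rng):
    """One RANSAC pass: keep the sampled line with the most inliers."""
    best_line, best_inliers = None, []
    for _ in range(iters):
        a, b, c = line_through(*rng.sample(points, 2))
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= tol]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


def sequential_ransac(points, tol=0.1, iters=200, min_inliers=8, seed=0):
    """Fit a model, remove its inliers, and repeat on the remaining points."""
    rng = random.Random(seed)
    remaining = list(points)
    models = []
    while len(remaining) >= min_inliers:
        line, inliers = ransac_line(remaining, tol, iters, rng)
        if len(inliers) < min_inliers:
            break
        models.append((line, inliers))
        found = set(inliers)
        remaining = [p for p in remaining if p not in found]
    return models


# Invented data: 20 points on y = 2x + 1, 15 on y = -x + 50, and 5 stray points.
line_a = [(float(x), 2.0 * x + 1.0) for x in range(20)]
line_b = [(float(x), 50.0 - x) for x in range(15)]
strays = [(3.0, 30.0), (7.0, 2.0), (10.0, 45.0), (15.0, 5.0), (18.0, 20.0)]

models = sequential_ransac(line_a + line_b + strays)
print([len(inliers) for _, inliers in models])
```

Because every pass samples from all points not yet assigned, the stray points are drawn again in each round, which is the slowdown the abstract describes.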