• Title/Abstract/Keywords: Cook's distance

A Note on Cook's Distance in the Multivariate Linear Model

  • Bae, Whasoo;Hwang, Hyunmi;Kim, Choongrak
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 20, No. 1
    • /
    • pp.23-28
    • /
    • 2013
  • We propose a version of Cook's distance (called local distance) in the multivariate linear model. The proposed version is a matrix, while the existing version of Cook's distance (called global distance) is a scalar. The existing Cook's distance is the trace of the proposed Cook's distance. In addition, we argue that the proposed Cook's distance is a more natural extension of Cook's distance in the univariate linear model than the existing Cook's distance. An illustrative example based on a real data set is given.

Cutoff Values for Cook's Distance

  • Choongrak Kim
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 3, No. 2
    • /
    • pp.13-19
    • /
    • 1996
  • Cook's distance (Cook, 1977) is one of the most widely used influence measures to assess the influence of single observations or sets of observations in the linear regression model. After computing Cook's distance, Cook (1977) suggested guidelines based on a confidence ellipsoid for the regression parameter ${\beta}$. In this paper, we suggest cutoff values for Cook's distance via Monte Carlo simulation, and compare them with Cook's guidelines. An example based on a real data set is given.
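
As a rough illustration of the quantity this abstract discusses, the sketch below computes Cook's distance for a simple linear regression in two ways: the usual closed form $D_i = e_i^2 h_{ii} / \{p\, s^2 (1 - h_{ii})^2\}$ and a direct refit with case $i$ deleted. The function names and the data are made up for this example and are not taken from the paper; the paper's Monte Carlo cutoff values are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def cooks_distance(x, y):
    """Closed form: D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2), with p = 2."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverages h_ii
    return [e * e * h / (p * s2 * (1 - h) ** 2) for e, h in zip(resid, lev)]


def cooks_distance_by_deletion(x, y):
    """Definition: refit without case i and measure the shift in all fitted values."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - p)
    out = []
    for i in range(n):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum(((b0 + b1 * xj) - (c0 + c1 * xj)) ** 2 for xj in x)
        out.append(shift / (p * s2))
    return out


# Made-up data: nine points near y = x plus one high-leverage point off the line.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 7.9, 9.0, 5.0]
d = cooks_distance(x, y)
most_influential = max(range(len(d)), key=lambda i: d[i])
print(most_influential, [round(v, 3) for v in d])
```

The two functions should agree up to rounding error, which is a convenient check that the closed form really measures the effect of deleting one case. Deciding how large a value counts as influential is the question the paper addresses.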

A cautionary note on the use of Cook's distance

  • Kim, Myung Geun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 3
    • /
    • pp.317-324
    • /
    • 2017
  • An influence measure known as Cook's distance has been used for judging the influence of each observation on the least squares estimate of the parameter vector. The distance does not reflect the distributional property of the change in the least squares estimator of the regression coefficients due to case deletions: the distribution has a covariance matrix of rank one and thus it has a support set determined by a line in the multidimensional Euclidean space. As a result, the use of Cook's distance may fail to correctly provide information about influential observations, and we study some reasons for the failure. Three illustrative examples will be provided, in which the use of Cook's distance fails to give the right information about influential observations or it provides the right information about the most influential observation. We will seek some reasons for the wrong or right provision of information.
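
The rank-one property mentioned in this abstract can be seen from the standard case-deletion identity. The notation below is assumed for illustration and is not quoted from the paper: $X$ is the design matrix, $x_i^\top$ its $i$-th row, $e_i$ the $i$-th residual, $h_{ii}$ the $i$-th leverage, and $\sigma^2$ the error variance.

$$
\hat{\beta} - \hat{\beta}_{(i)} = \frac{(X^\top X)^{-1} x_i \, e_i}{1 - h_{ii}},
\qquad
\operatorname{Cov}\!\left(\hat{\beta} - \hat{\beta}_{(i)}\right)
= \frac{\sigma^2}{1 - h_{ii}} \, (X^\top X)^{-1} x_i x_i^\top (X^\top X)^{-1}.
$$

The second expression uses $\operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii})$. Because the only random quantity is the scalar $e_i$, the change vector always lies along the fixed direction $(X^\top X)^{-1} x_i$, so its covariance matrix is an outer product of rank one.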

A Comparison of Influence Diagnostics in Linear Mixed Models

  • Lee, Jang-Taek
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 10, No. 1
    • /
    • pp.125-134
    • /
    • 2003
  • Standard estimation methods for linear mixed models are sensitive to influential observations. However, tools and concepts for linear mixed model diagnostics remain rudimentary, and further research is much needed. In this paper, we consider two diagnostics to evaluate the effects of individual observations on the estimation of fixed effects in linear mixed models: Cook's distance and COVRATIO. Results of our limited simulation study suggest that Cook's distance is not a good statistical quantity in linear mixed models. Also, the calibration point for COVRATIO seems to be quite conservative.

The local influence of LIU type estimator in linear mixed model

  • Zhang, Lili;Baek, Jangsun
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 26, No. 2
    • /
    • pp.465-474
    • /
    • 2015
  • In this paper, we study the local influence analysis of the Liu-type estimator in linear mixed models. Using the method proposed by Shi (1997), the local influence of the Liu-type estimator is investigated under each of three disturbance models. Furthermore, we give the generalized Cook's distance to assess the influence, and illustrate the efficiency of the proposed method with an example.

A Study on Sensitivity Analysis in Ridge Regression

  • Kim, Soon-Kwi
    • 품질경영학회지 (Journal of the Korean Society for Quality Management)
    • /
    • Vol. 19, No. 1
    • /
    • pp.1-15
    • /
    • 1991
  • In this paper, we discuss and review various measures which have been presented for studying outliers, high-leverage points, and influential observations when ridge regression estimation is adopted. We derive the influence function for $\hat{\underline{\beta}}_R$, the ridge regression estimator, and discuss its various finite sample approximations when ridge regression is postulated. We also study several diagnostic measures such as Welsch-Kuh's distance, Cook's distance, etc.

Influential Points in GLMs via Backwards Stepping

  • Jeong, Kwang-Mo;Oh, Hae-Young
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 9, No. 1
    • /
    • pp.197-212
    • /
    • 2002
  • When assessing goodness-of-fit of a model, a small subset of deviating observations can give rise to a significant lack of fit. It is therefore important to identify such observations and to assess their effects on various aspects of analysis. Cook's distance is usually used to detect influential observations, but it is sometimes not fully effective in identifying a truly influential set of observations because there may exist masking or swamping effects. In this paper we confine our attention to influential subsets in GLMs such as logistic regression models and loglinear models. We modify a backwards stepping algorithm, which was originally suggested for detecting outlying cells in contingency tables, to detect influential observations in GLMs. The algorithm consists of two steps, the identification step and the testing step. In the identification step we identify influential observations based on influence measures such as Cook's distances. In the testing step we test whether the subset of identified observations is significant or not. Finally, we explain the proposed method through two types of dataset related to the logistic regression model and the loglinear model, respectively.

Outlier Detection and Treatment for the Conversion of Chemical Oxygen Demand to Total Organic Carbon

  • 조범준;조홍연;김성
    • 한국해안·해양공학회논문집 (Journal of Korean Society of Coastal and Ocean Engineers)
    • /
    • Vol. 26, No. 4
    • /
    • pp.207-216
    • /
    • 2014
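  • Total organic carbon (TOC) is an important factor used as a direct biological indicator in research on the marine carbon cycle. Because available TOC data are relatively scarce compared with chemical oxygen demand (COD) data, TOC can be estimated from COD data. When converting COD to TOC, the detection and appropriate treatment of outliers in the COD observations, which directly affect the TOC estimate, should be carried out rationally and objectively. In this study, we present an optimal regression model for salinity, COD, and TOC data observed in Korean coastal waters. The optimal regression model was selected by diagnosing outliers and influential data with several detection methods and then comparing the change in the number of data points, the coefficient of variation, and the RMS error before and after removal. The combination of Cook's diagnostic method and the SIQR boxplot method was found to be the most appropriate. The optimal regression function is TOC (mg/L) = $0.44{\cdot}COD(mg/L)+1.53$, with a coefficient of determination of about 0.47 and an RMS error of 0.85 mg/L. The RMS error and the coefficient of variation of the leverage values decreased substantially, to 31% and 80%, respectively, of their values before outlier removal. Because the method presented in this study diagnoses and removes the excessive influence of outliers and influential data in the COD and TOC observations, a more appropriate regression curve could be presented.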
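
As a small worked example, the regression function quoted in this abstract can be applied directly to COD readings. The function name and the sample COD values below are hypothetical. With a coefficient of determination near 0.47 and an RMS error of 0.85 mg/L, the results should be read as rough estimates only.

```python
def cod_to_toc(cod_mg_per_l):
    """Estimate TOC (mg/L) from COD (mg/L) using the fitted line quoted in the abstract."""
    return 0.44 * cod_mg_per_l + 1.53


# Hypothetical COD readings in mg/L, chosen only to show the arithmetic.
for cod in (1.0, 2.5, 5.0):
    print(cod, round(cod_to_toc(cod), 2))
```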

Local Influence of the Quasi-likelihood Estimators in Generalized Linear Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 14, No. 1
    • /
    • pp.229-239
    • /
    • 2007
  • We present a diagnostic method for the quasi-likelihood estimators in generalized linear models. Since these estimators can usually be obtained by iteratively reweighted least squares, which is well known to be very sensitive to unusual data, a diagnostic step is indispensable to data analysis. We extend the local influence approach based on the maximum likelihood function to one based on the quasi-likelihood function. Local influence diagnostics are derived under several perturbation schemes. An illustrative example is given, and we compare the results provided by local influence and deletion.