• Title/Summary/Keyword: Influential observations

Search results: 73

Effect of outliers on the variable selection by the regularized regression

  • Jeong, Junho;Kim, Choongrak
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.235-243 / 2018
  • Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, proposed by Tibshirani (Journal of the Royal Statistical Society, Series B, 58, 267-288, 1996), and on the variables selected by the LASSO in the high-dimensional setup. We also derived an analytic expression for the influence of k observations on LASSO estimates in simple linear regression. Numerical studies based on artificial and real data are given for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
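To make the case-deletion idea above concrete, here is a minimal brute-force sketch (not the authors' analytic expression): it refits the LASSO with each observation removed and records how far the estimates move and whether the set of selected variables changes. It assumes the objective (1/2n)||y - Xb||^2 + lam*||b||_1 on centered data, solved by plain cyclic coordinate descent; the function names, the penalty value, and the toy data are hypothetical.

```python
# Brute-force single-case deletion for the LASSO (hypothetical helpers, toy data).
# Assumed objective: (1/2n) * ||y - Xb||^2 + lam * ||b||_1 on centered data.


def soft_threshold(z, gamma):
    # S(z, gamma) = sign(z) * max(|z| - gamma, 0)
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def center(X, y):
    # Center the columns of X and y so that no intercept is needed.
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    return [[row[j] - mx[j] for j in range(p)] for row in X], [v - my for v in y]


def lasso(X, y, lam, n_iter=500):
    # Cyclic coordinate descent, keeping the residual r = y - Xb up to date.
    Xc, yc = center(X, y)
    n, p = len(Xc), len(Xc[0])
    b = [0.0] * p
    resid = list(yc)
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant column carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
    return b


def selected(b, tol=1e-10):
    # Indices of the variables the LASSO keeps.
    return {j for j, v in enumerate(b) if abs(v) > tol}


def deletion_influence(X, y, lam):
    # For each case i: (i, L2 change in the estimates, did the selected set change?)
    full = lasso(X, y, lam)
    out = []
    for i in range(len(y)):
        b_i = lasso(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        change = sum((u - v) ** 2 for u, v in zip(full, b_i)) ** 0.5
        out.append((i, change, selected(b_i) != selected(full)))
    return full, out


if __name__ == "__main__":
    # Toy "n < p" example: 5 cases, 6 variables (made-up numbers).
    X = [[0.2, 1.1, -0.3, 0.8, 0.0, 1.5],
         [1.0, 0.4, 0.9, -0.2, 0.7, 0.1],
         [-0.5, 0.3, 0.2, 1.2, -0.8, 0.6],
         [0.9, -1.0, 0.5, 0.3, 0.4, -0.7],
         [1.8, 0.2, -0.6, -0.9, 1.1, 0.3]]
    y = [0.5, 2.1, -0.4, 1.7, 3.9]
    full, out = deletion_influence(X, y, lam=0.1)
    print("full-data selection:", sorted(selected(full)))
    for i, change, flipped in out:
        print(f"case {i}: change={change:.3f}, selection changed={flipped}")
```

Brute force costs n extra fits, which is acceptable for illustration and follows the definition of single-case deletion directly; looping over subsets of cases instead of single cases would give a multiple-deletion version.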

ESTIMATING NEAR REAL TIME PRECIPITABLE WATER FROM SHORT BASELINE GPS OBSERVATIONS

  • Yang, Den-Ring;Liou, Yuei-An;Tseng, Pei-Li
    • Proceedings of the KSRS Conference / 2007.10a / pp.410-413 / 2007
  • Water vapor in the atmosphere is an influential factor in the hydrological cycle: it exchanges heat through phase change and is essential to precipitation. Because of its significance in altering weather, estimating the amount and distribution of water vapor is crucial to the precision of weather forecasting and to the understanding of regional/local climate. Precipitable water (PW) has been shown to be reliably measured using long-baseline (500-2000 km) GPS observations. However, absolute PW cannot be derived from GPS observations in Taiwan because of the geometric limitation of the relatively short-baseline network. In this study, a method of deriving near-real-time PW from short-baseline GPS observations is proposed. The method uses a reference station to derive a regression model for the wet delay and to interpolate wet-delay differences among stations. The precipitable water is then obtained using a conversion factor derived from radiosondes. The method was tested using the reference station on Mt. Ho-Hwan together with eleven stations around Taiwan. The results indicate that short-baseline GPS observations can be used to estimate precipitable water precisely in near-real-time.
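The final step of the method, converting wet delay to precipitable water with a radiosonde-derived factor, can be sketched as follows. This assumes the common linear relation PW = Pi * ZWD, where the dimensionless factor Pi (typically near 0.15, varying with the weighted mean temperature of the atmosphere) is fitted by least squares through the origin to co-located radiosonde PW and GPS zenith wet delay (ZWD) pairs. The reference-station regression and the interpolation step are not reproduced, and the function names and values are hypothetical.

```python
# PW from GPS zenith wet delay (ZWD), assuming PW = Pi * ZWD with Pi fitted
# through the origin from co-located radiosonde/GPS pairs.
# Helper names and numbers are hypothetical, not the study's data.


def fit_conversion_factor(zwd, pw):
    # Least squares through the origin: Pi = sum(ZWD * PW) / sum(ZWD ** 2)
    num = sum(z * w for z, w in zip(zwd, pw))
    den = sum(z * z for z in zwd)
    if den == 0:
        raise ValueError("all wet delays are zero; Pi is undetermined")
    return num / den


def zwd_to_pw(zwd, pi_factor):
    # PW and ZWD are in the same length unit (e.g. mm).
    return [pi_factor * z for z in zwd]


if __name__ == "__main__":
    radiosonde_pw = [21.0, 33.5, 44.8, 52.1]    # mm, made-up values
    gps_zwd = [138.0, 221.0, 290.0, 341.0]      # mm, made-up values
    pi_factor = fit_conversion_factor(gps_zwd, radiosonde_pw)
    print(f"Pi = {pi_factor:.4f}")
    print(zwd_to_pw([180.0, 260.0], pi_factor))
```

Fitting through the origin keeps the relation proportional, so zero wet delay maps to zero PW.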

Graphical Methods for the Sensitivity Analysis in Discriminant Analysis

  • Jang, Dae-Heung;Anderson-Cook, Christine M.;Kim, Youngil
    • Communications for Statistical Applications and Methods / v.22 no.5 / pp.475-485 / 2015
  • As in regression, many measures have been developed to detect influential data points in discriminant analysis. Many of them adapt the principles of linear regression diagnostics to the discriminant analysis setting. Here we focus on the change in the predicted posterior classification probability when a data point is omitted. The new method is more intuitive and easier to interpret than existing methods. We also propose a graphical display that shows the individual movement of the posterior probabilities of the other data points when a specific data point is omitted, so that the summaries capture the overall pattern of change.
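A minimal sketch of the omission idea for a univariate, two-class linear discriminant analysis: fit the classifier, omit each case in turn, and record how the posterior probability of class 1 moves for every remaining case. These per-case movements are the kind of quantities a display like the one described would plot. The Gaussian equal-variance LDA, the helper names, and the data are assumptions for illustration, not the authors' measure.

```python
import math

# Univariate two-class LDA (equal variances) with leave-one-out movement of
# posteriors. Hypothetical helpers; labels must be 0/1 and each class needs
# at least two cases after any single omission.


def fit_lda(x, labels):
    # Class means, class priors, and pooled within-class variance.
    groups = {0: [], 1: []}
    for xi, gi in zip(x, labels):
        groups[gi].append(xi)
    n = len(x)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    priors = {k: len(v) / n for k, v in groups.items()}
    ss = sum((xi - means[k]) ** 2 for k, v in groups.items() for xi in v)
    return means, priors, ss / (n - 2)


def posterior_class1(model, xi):
    # P(class 1 | x); the common x^2 term cancels, so the scores are linear in x.
    means, priors, var = model
    score = {k: math.log(priors[k]) - (xi - means[k]) ** 2 / (2 * var) for k in (0, 1)}
    return 1.0 / (1.0 + math.exp(score[0] - score[1]))


def deletion_movements(x, labels):
    # moves[j][i] = change in P(class 1 | x_i) when case j is omitted (i != j).
    full = fit_lda(x, labels)
    base = [posterior_class1(full, xi) for xi in x]
    moves = {}
    for j in range(len(x)):
        model_j = fit_lda(x[:j] + x[j + 1:], labels[:j] + labels[j + 1:])
        moves[j] = {i: posterior_class1(model_j, x[i]) - base[i]
                    for i in range(len(x)) if i != j}
    return base, moves


if __name__ == "__main__":
    x = [1.0, 2.2, 2.9, 4.8, 5.5, 6.1, 7.4, 3.9]   # made-up values
    g = [0, 0, 0, 0, 1, 1, 1, 1]
    base, moves = deletion_movements(x, g)
    for j, m in moves.items():
        print(f"omit case {j}: largest |movement| = {max(abs(v) for v in m.values()):.3f}")
```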

MULTIPLE DELETION MEASURES OF TEST STATISTICS IN MULTIVARIATE REGRESSION

  • Jung, Kang-Mo
    • Journal of applied mathematics & informatics / v.26 no.3_4 / pp.679-688 / 2008
  • In multivariate regression analysis there are many influence measures for the regression estimates. However, there appear to be few influence diagnostics for test statistics in hypothesis testing. The case-deletion approach is fundamental for investigating the influence of observations on estimates or statistics. Tang and Fung (1997) derived single case-deletion versions of Wilks' ratio, the Lawley-Hotelling trace, and Pillai's trace for testing a general linear hypothesis about the regression coefficients in multivariate regression. In this paper we extend those measures to multiple case-deletion to deal with joint influence among observations. A numerical example illustrates the effect of joint influence on the test statistics.
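A simple setting in which to see case deletion of a multivariate test statistic is a one-way MANOVA, i.e. a multivariate regression testing equal group mean vectors. The sketch below, assuming two response variables, computes Wilks' ratio det(E)/det(E + H), Pillai's trace tr(H(E + H)^-1), and the Lawley-Hotelling trace tr(HE^-1), then recomputes Wilks' ratio after deleting every subset of k cases by brute force. It is not the closed form derived in the paper; the helper names and data are hypothetical, and deleted subsets must leave E nonsingular.

```python
from itertools import combinations

# One-way MANOVA with two responses viewed as a multivariate regression;
# H0: equal group mean vectors. Hypothetical helpers, brute-force deletion only.


def mean2(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def sscp(rows, center):
    # 2x2 sums of squares and cross-products about `center`.
    a = b = d = 0.0
    for y1, y2 in rows:
        u, v = y1 - center[0], y2 - center[1]
        a += u * u
        b += u * v
        d += v * v
    return [[a, b], [b, d]]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    dt = det2(m)
    return [[m[1][1] / dt, -m[0][1] / dt], [-m[1][0] / dt, m[0][0] / dt]]


def matmul2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def manova_statistics(data):
    """data: list of (group, (y1, y2)); returns (Wilks, Pillai, Lawley-Hotelling)."""
    groups = {}
    for g, row in data:
        groups.setdefault(g, []).append(row)
    rows = [row for _, row in data]
    T = sscp(rows, mean2(rows))              # total SSCP, T = E + H
    E = [[0.0, 0.0], [0.0, 0.0]]             # within-group (error) SSCP
    for grp in groups.values():
        S = sscp(grp, mean2(grp))
        E = [[E[i][j] + S[i][j] for j in range(2)] for i in range(2)]
    H = [[T[i][j] - E[i][j] for j in range(2)] for i in range(2)]  # hypothesis SSCP
    wilks = det2(E) / det2(T)
    HT, HE = matmul2(H, inv2(T)), matmul2(H, inv2(E))
    return wilks, HT[0][0] + HT[1][1], HE[0][0] + HE[1][1]


def multiple_deletion(data, k):
    # (subset, Wilks' ratio) after deleting each subset of k cases.
    out = []
    for subset in combinations(range(len(data)), k):
        kept = [d for i, d in enumerate(data) if i not in subset]
        out.append((subset, manova_statistics(kept)[0]))
    return out


if __name__ == "__main__":
    data = [("A", (1.0, 2.0)), ("A", (2.0, 2.5)), ("A", (1.5, 3.5)),
            ("B", (3.0, 4.0)), ("B", (3.5, 3.0)), ("B", (4.5, 4.5))]  # made-up values
    print("full data:", manova_statistics(data))
    for subset, lam in sorted(multiple_deletion(data, 2), key=lambda t: t[1])[:3]:
        print(subset, round(lam, 4))
```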

Diagnostics for the Cox model

  • Xue, Yishu;Schifano, Elizabeth D.
    • Communications for Statistical Applications and Methods / v.24 no.6 / pp.583-604 / 2017
  • The most popular regression model for the analysis of time-to-event data is the Cox proportional hazards model. While the model specifies a parametric relationship between the hazard function and the predictor variables, there is no specification regarding the form of the baseline hazard function. A critical assumption of the Cox model, however, is the proportional hazards assumption: when the predictor variables do not vary over time, the hazard ratio comparing any two observations is constant with respect to time. Therefore, to perform credible estimation and inference, one must first assess whether the proportional hazards assumption is reasonable. As with other regression techniques, it is also essential to examine whether appropriate functional forms of the predictor variables have been used, and whether there are any outlying or influential observations. This article reviews diagnostic methods for assessing goodness-of-fit for the Cox proportional hazards model. We illustrate these methods with a case study using available R functions, and provide complete R code for a simulated example as a supplement.
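The proportional hazards assumption described above follows from the form of the model. Writing the hazard with an unspecified baseline h_0(t) and time-fixed covariates, the baseline cancels in the ratio, so the hazard ratio does not depend on t. If a coefficient instead varied with time, β(t), the ratio would depend on t; checks based on scaled Schoenfeld residuals (for example `cox.zph` in the R survival package) look for this kind of departure. The notation is the standard textbook form, not taken from the article.

```latex
h(t \mid \mathbf{x}) = h_0(t)\exp(\boldsymbol{\beta}^\top \mathbf{x}),
\qquad
\frac{h(t \mid \mathbf{x}_1)}{h(t \mid \mathbf{x}_2)}
  = \frac{h_0(t)\exp(\boldsymbol{\beta}^\top \mathbf{x}_1)}{h_0(t)\exp(\boldsymbol{\beta}^\top \mathbf{x}_2)}
  = \exp\{\boldsymbol{\beta}^\top(\mathbf{x}_1 - \mathbf{x}_2)\}
```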

CASE-DELETION DIAGNOSTICS FOR TESTING A LINEAR HYPOTHESIS ABOUT REGRESSION COEFFICIENTS

  • Kim, Myung-Geun
    • Journal of applied mathematics & informatics / v.10 no.1_2 / pp.111-118 / 2002
  • We study the influence of observations on testing a linear hypothesis using single and multiple case-deletions. The change in the F-test statistic due to case-deletions is shown to be completely determined by two externally Studentized residuals. These residuals are used to investigate outlyingness both with and without linear constraints. An illustrative example demonstrates the usefulness of case-deletions.
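A minimal sketch of single case-deletion for the F test of H0: slope = 0 in simple linear regression. It relies on the standard identity RSS_(i) = RSS - e_i^2/(1 - h_i), applied to both the full model and the constrained (intercept-only) model, so each deleted-case F statistic comes from the residuals and leverages of the two fits; a brute-force refit is included as a check. This is a related ordinary-residual form, not the paper's expression in terms of externally Studentized residuals, and the function names are hypothetical.

```python
# Deleted-case F statistics for H0: slope = 0 (full: y = a + b*x, constrained: y = a).
# Hypothetical helpers; assumes distinct x values so every leverage is below 1.


def fit_line(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((u - xbar) ** 2 for u in x)
    b = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / sxx
    return ybar - b * xbar, b, xbar, sxx


def f_statistic(rss0, rss1, n, p=2, q=1):
    # F = [(RSS0 - RSS1) / q] / [RSS1 / (n - p)]
    return ((rss0 - rss1) / q) / (rss1 / (n - p))


def deleted_f_via_residuals(x, y):
    n = len(x)
    a, b, xbar, sxx = fit_line(x, y)
    ybar = sum(y) / n
    e1 = [v - a - b * u for u, v in zip(x, y)]         # full-model residuals
    h1 = [1 / n + (u - xbar) ** 2 / sxx for u in x]    # full-model leverages
    e0 = [v - ybar for v in y]                         # constrained-model residuals
    h0 = 1 / n                                         # constrained-model leverage
    rss1, rss0 = sum(r * r for r in e1), sum(r * r for r in e0)
    out = []
    for i in range(n):
        rss1_i = rss1 - e1[i] ** 2 / (1 - h1[i])
        rss0_i = rss0 - e0[i] ** 2 / (1 - h0)
        out.append(f_statistic(rss0_i, rss1_i, n - 1))
    return f_statistic(rss0, rss1, n), out


def deleted_f_brute_force(x, y):
    # Refit both models without case i (used only as a check).
    out = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        a, b, _, _ = fit_line(xs, ys)
        m = sum(ys) / len(ys)
        rss1 = sum((v - a - b * u) ** 2 for u, v in zip(xs, ys))
        rss0 = sum((v - m) ** 2 for v in ys)
        out.append(f_statistic(rss0, rss1, len(xs)))
    return out
```

The residual route needs only one fit of each model, whereas the brute-force route needs n refits.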

Influence Assessment in Robust Regression

  • Sohn, Bang-Yong;Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / v.4 no.1 / pp.21-32 / 1997
  • Robust regression based on an M-estimator reduces or bounds the influence of outliers in the y-direction only. Therefore, when several influential observations exist, diagnostics for robust regression are required to detect them. In this paper, we propose influence diagnostics for robust regression based on the M-estimator and its one-step version. Noting that the M-estimator can be obtained by iteratively reweighted least squares with internal weights, we apply weighted least squares (WLS) regression diagnostics to robust regression.
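A minimal sketch of applying WLS diagnostics at the final IRLS weights, assuming simple linear regression, Huber weights with tuning constant c = 1.345, and a MAD-based scale. After IRLS converges, the weighted leverages (the diagonal of the WLS hat matrix) and a WLS-style Cook's distance are computed from the final weights. The one-step estimator is not shown, and the helper names and data are hypothetical.

```python
import statistics

# Huber M-estimation by IRLS for y = a + b*x, then WLS diagnostics at the
# final weights. Hypothetical helpers; c = 1.345 and MAD scale are assumptions.


def wls_line(x, y, w):
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    b = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y)) / sxx
    return yw - b * xw, b


def huber_weights(resid, scale, c=1.345):
    # w(u) = 1 if |u| <= c, else c / |u|, with u = r / scale
    return [1.0 if abs(r / scale) <= c else c / abs(r / scale) for r in resid]


def m_estimate(x, y, c=1.345, n_iter=50):
    w = [1.0] * len(x)
    for _ in range(n_iter):
        a, b = wls_line(x, y, w)
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745  # MAD
        if scale == 0:
            break
        w = huber_weights(resid, scale, c)
    a, b = wls_line(x, y, w)  # final fit at the final weights
    return a, b, w


def wls_leverages(x, w):
    # Diagonal of W^(1/2) X (X'WX)^(-1) X' W^(1/2) for simple regression.
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    return [wi * (1.0 / sw + (xi - xw) ** 2 / sxx) for wi, xi in zip(w, x)]


def wls_cooks_distance(x, y, a, b, w, p=2):
    # D_i = w_i e_i^2 h_i / (p s^2 (1 - h_i)^2), with s^2 = sum(w e^2) / (n - p)
    h = wls_leverages(x, w)
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(wi * ei ** 2 for wi, ei in zip(w, e)) / (len(x) - p)
    return [wi * ei ** 2 * hi / (p * s2 * (1 - hi) ** 2) for wi, ei, hi in zip(w, e, h)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0, 14.2, 15.8]  # made-up, outlier at x = 6
    a, b, w = m_estimate(x, y)
    print(f"a={a:.3f}, b={b:.3f}")
    print([round(d, 3) for d in wls_cooks_distance(x, y, a, b, w)])
```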

Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / v.16 no.2 / pp.309-315 / 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Because single deletions may fail to detect outliers or influential observations owing to swamping and masking effects, multiple deletions are needed. We developed conditional deletion diagnostics designed to overcome the problems caused by masking effects. We derived closed forms for several statistics in logistic regression models, which provide useful diagnostics for those statistics.
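A minimal sketch of why multiple deletion helps: it fits a one-covariate logistic regression by Newton-Raphson and ranks every subset of k deleted cases by the resulting change in the coefficient vector. A pair of cases that jointly shift the fit can rank near the top for k = 2 even when neither stands out for k = 1, which is the masking problem the abstract mentions. The brute-force refitting, the Euclidean change measure, and the names are illustrative assumptions, not the paper's closed forms or conditional deletion diagnostics.

```python
import math
from itertools import combinations

# Logistic regression logit P(y=1) = b0 + b1*x by Newton-Raphson, plus brute-force
# deletion of every k-subset. Hypothetical helpers; the data (and every deleted
# subset) must not be perfectly separable, or the MLE does not exist.


def fit_logistic(x, y, n_iter=50, tol=1e-10):
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p               # score for the intercept
            g1 += (yi - p) * xi        # score for the slope
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (X'WX)^(-1) X'(y - p), using the explicit 2x2 inverse
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def deletion_changes(x, y, k):
    # (subset, Euclidean change in (b0, b1)), largest change first.
    full = fit_logistic(x, y)
    out = []
    for subset in combinations(range(len(x)), k):
        xs = [v for i, v in enumerate(x) if i not in subset]
        ys = [v for i, v in enumerate(y) if i not in subset]
        b = fit_logistic(xs, ys)
        out.append((subset, math.hypot(b[0] - full[0], b[1] - full[1])))
    return sorted(out, key=lambda t: -t[1])


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    y = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]  # made-up values
    print("single:", deletion_changes(x, y, 1)[:3])
    print("pairs: ", deletion_changes(x, y, 2)[:3])
```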

A Study on Sensitivity Analysis in Ridge Regression

  • Kim, Soon-Kwi
    • Journal of Korean Society for Quality Management / v.19 no.1 / pp.1-15 / 1991
  • In this paper, we discuss and review various measures that have been proposed for studying outliers, high-leverage points, and influential observations when ridge regression estimation is adopted. We derive the influence function for $\hat{\boldsymbol{\beta}}_R$, the ridge regression estimator, and discuss its various finite-sample approximations. We also study several diagnostic measures, such as the Welsch-Kuh distance and Cook's distance.
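A minimal sketch of case-deletion sensitivity for the ridge estimator b_R = (X'X + kI)^-1 X'y on centered data. It uses a Cook-type distance D_i = (b_R - b_R(i))' X'X (b_R - b_R(i)) / (p s^2), with s^2 taken from the ridge residuals; this scaling, the plain Gaussian-elimination solver, and the helper names are illustrative choices rather than the definitions used in the paper.

```python
# Ridge estimator on centered data and a Cook-type case-deletion distance.
# Hypothetical helpers; the distance's scaling is an illustrative assumption.


def solve(A, rhs):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol


def center(X, y):
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    return [[row[j] - mx[j] for j in range(p)] for row in X], [v - my for v in y]


def crossprod(Xc, yc):
    p = len(Xc[0])
    XtX = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    return XtX, Xty


def ridge(X, y, k):
    # b_R = (X'X + kI)^(-1) X'y on centered data (intercept handled by centering).
    XtX, Xty = crossprod(*center(X, y))
    p = len(XtX)
    A = [[XtX[i][j] + (k if i == j else 0.0) for j in range(p)] for i in range(p)]
    return solve(A, Xty)


def ridge_cook_distances(X, y, k):
    n, p = len(X), len(X[0])
    b = ridge(X, y, k)
    Xc, yc = center(X, y)
    XtX, _ = crossprod(Xc, yc)
    resid = [v - sum(bj * xj for bj, xj in zip(b, r)) for r, v in zip(Xc, yc)]
    s2 = sum(e * e for e in resid) / (n - p - 1)  # assumed residual variance
    out = []
    for i in range(n):
        b_i = ridge(X[:i] + X[i + 1:], y[:i] + y[i + 1:], k)
        d = [u - v for u, v in zip(b, b_i)]
        quad = sum(d[r] * XtX[r][c] * d[c] for r in range(p) for c in range(p))
        out.append(quad / (p * s2))
    return b, out
```

Because X'X is positive semidefinite, each distance is nonnegative, and larger ridge constants k generally shrink both the estimates and their sensitivity to any single case.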

Test for an Outlier in Multivariate Regression with Linear Constraints

  • Kim, Myung-Geun
    • Communications for Statistical Applications and Methods / v.9 no.2 / pp.473-478 / 2002
  • Using a mean shift model, a test is derived for a single outlier in multivariate regression with linear constraints on the regression coefficients. It is shown that influential observations based on case-deletions in testing linear hypotheses are determined by two types of outliers: mean shift outliers with and without linear constraints. An illustrative example is given.
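A minimal sketch of the mean shift idea in the simplest setting: univariate simple linear regression without linear constraints, not the multivariate constrained test of the paper. In the mean shift model y = a + b*x + delta*d_i, with d_i the indicator of case i, the t statistic for delta equals the externally Studentized residual, referred to a t distribution with n - 3 degrees of freedom. The code computes it both from the full fit and from the case-deleted fit to show the equivalence; the helper names and data are hypothetical.

```python
import math

# Single-outlier mean shift test in simple linear regression (no constraints),
# computed two equivalent ways. Hypothetical helpers.


def fit_line(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((u - xbar) ** 2 for u in x)
    b = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / sxx
    return ybar - b * xbar, b, xbar, sxx


def external_t(x, y, i):
    # t_i = e_i / (s_(i) * sqrt(1 - h_i)), with s_(i)^2 = (RSS - e_i^2/(1-h_i)) / (n - 3)
    n = len(x)
    a, b, xbar, sxx = fit_line(x, y)
    e = [v - a - b * u for u, v in zip(x, y)]
    h = 1 / n + (x[i] - xbar) ** 2 / sxx
    rss = sum(r * r for r in e)
    s_del = math.sqrt((rss - e[i] ** 2 / (1 - h)) / (n - 3))
    return e[i] / (s_del * math.sqrt(1 - h))


def mean_shift_t(x, y, i):
    # delta_hat = y_i - yhat_(i)(x_i) from the fit without case i, with
    # Var(delta_hat) = s_(i)^2 * (1 + 1/m + (x_i - xbar_(i))^2 / Sxx_(i)), m = n - 1.
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    a, b, xbar, sxx = fit_line(xs, ys)
    m = len(xs)
    s2 = sum((v - a - b * u) ** 2 for u, v in zip(xs, ys)) / (m - 2)
    delta = y[i] - (a + b * x[i])
    return delta / math.sqrt(s2 * (1 + 1 / m + (x[i] - xbar) ** 2 / sxx))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    y = [1.0, 2.1, 2.9, 4.2, 9.0, 6.1, 6.9]  # made-up, suspicious case at x = 5
    for i in range(len(x)):
        print(i, round(external_t(x, y, i), 3), round(mean_shift_t(x, y, i), 3))
```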