• Title/Summary/Keyword: multiple statistical regression

Search Result 1,587, Processing Time 0.028 seconds

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society
    • /
    • v.36 no.4
    • /
    • pp.457-469
    • /
    • 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. But the detection of influential points or multiple outliers is more difficult, owing to masking and swamping problems. The multiple outlier detection methods for logistic regression have not been studied from the points of direct procedure yet. In this paper we consider the direct methods for logistic regression by extending the $Pe\tilde{n}a$ and Yohai (1995) influence matrix algorithm. We define the influence matrix in logistic regression by using Cook's distance in logistic regression, and test multiple outliers by using the mean shift model. To show accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data including multiple outliers with masking and swamping.

Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.2
    • /
    • pp.309-315
    • /
    • 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Since single deletions may not exactly detect outliers or influential observations due to swamping effects and masking effects, it needs multiple deletions. We developed conditional deletion diagnostics which are designed to overcome problems of masking effects. We derived the closed forms for several statistics in logistic regression models. They give useful diagnostics on the statistics.

Multiple Structural Change-Point Estimation in Linear Regression Models

  • Kim, Jae-Hee
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.3
    • /
    • pp.423-432
    • /
    • 2012
  • This paper is concerned with the detection of multiple change-points in linear regression models. The proposed procedure relies on the local estimation for global change-point estimation. We propose a multiple change-point estimator based on the local least squares estimators for the regression coefficients and the split measure when the number of change-points is unknown. Its statistical properties are shown and its performance is assessed by simulations and real data applications.

A Technique to Improve the Fit of Linear Regression Models for Successive Sets of Data

  • Park, Sung H.
    • Journal of the Korean Statistical Society
    • /
    • v.5 no.1
    • /
    • pp.19-28
    • /
    • 1976
  • In empirical study for fitting a multiple linear regression model for successive cross-sections data observed on the same set of independent variables over several time periods, one often faces the problem of poor $R^2$, the multiple coefficient of determination, which provides a standard measure of how good a specified regression line fits the sample data.

  • PDF

Computational Methods for Detection of Multiple Outliers in Nonlinear Regression

  • Myung-Wook Kahng
    • Communications for Statistical Applications and Methods
    • /
    • v.3 no.2
    • /
    • pp.1-11
    • /
    • 1996
  • The detection of multiple outliers in nonlinear regression models can be computationally not feasible. As a compromise approach, we consider the use of simulated annealing algorithm, an approximate approach to combinatorial optimization. We show that this method ensures convergence and works well in locating multiple outliers while reducing computational time.

  • PDF

Application of discrete Weibull regression model with multiple imputation

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.3
    • /
    • pp.325-336
    • /
    • 2019
  • In this article we extend the discrete Weibull regression model in the presence of missing data. Discrete Weibull regression models can be adapted to various type of dispersion data however, it is not widely used. Recently Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) adapted the discrete Weibull regression model using single imputation. We extend their studies by using multiple imputation also with several various settings and compare the results. The purpose of this study is to address the merit of using multiple imputation in the presence of missing data in discrete count data. We analyzed the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII), from 2016 to assess the factors influencing the variable, 1 month hospital stay, and we compared the results using discrete Weibull regression model with those of Poisson, negative Binomial and zero-inflated Poisson regression models, which are widely used in count data analyses. The results showed that the discrete Weibull regression model using multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of the discrete Weibull regression using multiple imputation given both under- and over-dispersed distribution, as well as varying missing rates and sample size. Sensitivity analysis showed the influence of mis-specification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data will increase explanatory power and is widely applicable to various types of dispersion data with a unified model.

A Study on Detection of Influential Observations on A Subset of Regression Parameters in Multiple Regression

  • Park, Sung Hyun;Oh, Jin Ho
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.2
    • /
    • pp.521-531
    • /
    • 2002
  • Various diagnostic techniques for identifying influential observations are mostly based on the deletion of a single observation. While such techniques can satisfactorily identify influential observations in many cases, they will not always be successful because of some mask effect. It is necessary, therefore, to develop techniques that examine the potentially influential effects of a subset of observations. The partial regression plots can be used to examine an influential observation for a single parameter in multiple linear regression. However, it is often desirable to detect influential observations for a subset of regression parameters when interest centers on a selected subset of independent variables. Thus, we propose a diagnostic measure which deals with detecting influential observations on a subset of regression parameters. In this paper, we propose a measure M, which can be effectively used for the detection of influential observations on a subset of regression parameters in multiple linear regression. An illustrated example is given to show how we can use the new measure M to identify influential observations on a subset of regression parameters.

Robust Estimation and Outlier Detection

  • Myung Geun Kim
    • Communications for Statistical Applications and Methods
    • /
    • v.1 no.1
    • /
    • pp.33-40
    • /
    • 1994
  • The conditional expectation of a random variable in a multivariate normal random vector is a multiple linear regression on its predecessors. Using this fact, the least median of squares estimation method developed in a multiple linear regression is adapted to a multivariate data to identify influential observations. The resulting method clearly detect outliers and it avoids the masking effect.

  • PDF

Multivariate statistical analysis of the comparative antioxidant activity of the total phenolics and tannins in the water and ethanol extracts of dried goji berry (Lycium chinense) fruits

  • Kim, Joo-Shin;Kimm, Haklin Alex
    • Korean Journal of Food Science and Technology
    • /
    • v.51 no.3
    • /
    • pp.227-236
    • /
    • 2019
  • Antioxidant activity in water and ethanol extracts of dried Lycium chinense fruit, as a result of the total phenolic and tannin content, was measured using a number of chemical and biochemical assays for radical scavenging and inhibition of lipid peroxidation, with the analysis being extended by applying a bootstrapping statistical method. Previous statistical analyses mostly provided linear correlation and regression analyses between antioxidant activity and increasing concentrations of phenolics and tannins in a concentration-dependent mode. The present study showed that multiple component or multivariate analysis by applying multiple regression analysis or regression planes proved more informative than linear regression analysis of the relationship between the concentration of individual components and antioxidant activity. In this paper, we represented the multivariate analysis of antioxidant activities of both phenolic and tannin contents combined in the water and ethanol extracts, which revealed the hidden observations that were not evident from linear statistical analysis.