Search | Korea Science

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

Lee, Gwi-Hyun; Park, Sung-Hyun
- Journal of the Korean Statistical Society / v.36 no.4 / pp.457-469 / 2007
Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. However, detecting multiple outliers or influential points is more difficult, owing to masking and swamping problems. Direct procedures for multiple outlier detection in logistic regression have not yet been studied. In this paper we consider direct methods for logistic regression by extending the Peña and Yohai (1995) influence matrix algorithm. We define the influence matrix in logistic regression using Cook's distance for logistic regression, and test for multiple outliers using the mean shift model. To show the accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data containing multiple outliers with masking and swamping.

Multiple Deletions in Logistic Regression Models

Jung, Kang-Mo
- Communications for Statistical Applications and Methods / v.16 no.2 / pp.309-315 / 2009
We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Because single deletions may fail to detect outliers or influential observations owing to swamping and masking effects, multiple deletions are needed. We developed conditional deletion diagnostics designed to overcome masking effects. We derived closed forms for several statistics in logistic regression models; these provide useful diagnostics.
https://doi.org/10.5351/CKSS.2009.16.2.309

Multiple Structural Change-Point Estimation in Linear Regression Models

Kim, Jae-Hee
- Communications for Statistical Applications and Methods / v.19 no.3 / pp.423-432 / 2012
This paper is concerned with the detection of multiple change-points in linear regression models. The proposed procedure relies on the local estimation for global change-point estimation. We propose a multiple change-point estimator based on the local least squares estimators for the regression coefficients and the split measure when the number of change-points is unknown. Its statistical properties are shown and its performance is assessed by simulations and real data applications.
https://doi.org/10.5351/CKSS.2012.19.3.423

A Technique to Improve the Fit of Linear Regression Models for Successive Sets of Data

Park, Sung H.
- Journal of the Korean Statistical Society / v.5 no.1 / pp.19-28 / 1976
In empirical studies that fit a multiple linear regression model to successive cross-sectional data observed on the same set of independent variables over several time periods, one often faces the problem of a poor $R^2$, the multiple coefficient of determination, which provides a standard measure of how well a specified regression line fits the sample data.

Computational Methods for Detection of Multiple Outliers in Nonlinear Regression

Myung-Wook Kahng
- Communications for Statistical Applications and Methods / v.3 no.2 / pp.1-11 / 1996
The detection of multiple outliers in nonlinear regression models can be computationally infeasible. As a compromise, we consider a simulated annealing algorithm, an approximate approach to combinatorial optimization. We show that this method ensures convergence and works well in locating multiple outliers while reducing computational time.

Application of discrete Weibull regression model with multiple imputation

Yoo, Hanna
- Communications for Statistical Applications and Methods / v.26 no.3 / pp.325-336 / 2019
In this article we extend the discrete Weibull regression model to the presence of missing data. Discrete Weibull regression models can be adapted to various types of dispersion data; however, they are not widely used. Recently, Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) adapted the discrete Weibull regression model using single imputation. We extend that study by using multiple imputation under several settings and compare the results. The purpose of this study is to address the merit of using multiple imputation in the presence of missing data in discrete count data. We analyzed the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII) from 2016 to assess the factors influencing the variable 1-month hospital stay, and we compared the results from the discrete Weibull regression model with those of the Poisson, negative binomial, and zero-inflated Poisson regression models, which are widely used in count data analyses. The results showed that the discrete Weibull regression model using multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of discrete Weibull regression using multiple imputation under both under- and over-dispersed distributions, as well as varying missing rates and sample sizes. Sensitivity analysis showed the influence of mis-specification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data will increase explanatory power and is widely applicable to various types of dispersion data with a unified model.
https://doi.org/10.29220/CSAM.2019.26.3.325

A Study on Detection of Influential Observations on A Subset of Regression Parameters in Multiple Regression

Park, Sung Hyun; Oh, Jin Ho
- Communications for Statistical Applications and Methods / v.9 no.2 / pp.521-531 / 2002
Most diagnostic techniques for identifying influential observations are based on the deletion of a single observation. While such techniques can satisfactorily identify influential observations in many cases, they will not always succeed because of masking effects. It is necessary, therefore, to develop techniques that examine the potentially influential effects of a subset of observations. Partial regression plots can be used to examine an influential observation for a single parameter in multiple linear regression. However, it is often desirable to detect influential observations for a subset of regression parameters when interest centers on a selected subset of independent variables. Thus, we propose a diagnostic measure M that can be used effectively to detect influential observations on a subset of regression parameters in multiple linear regression. An illustrative example is given to show how the new measure M can be used to identify influential observations on a subset of regression parameters.
https://doi.org/10.5351/CKSS.2002.9.2.521

Robust Estimation and Outlier Detection

Myung Geun Kim
- Communications for Statistical Applications and Methods / v.1 no.1 / pp.33-40 / 1994
The conditional expectation of a random variable in a multivariate normal random vector is a multiple linear regression on its predecessors. Using this fact, the least median of squares estimation method developed for multiple linear regression is adapted to multivariate data to identify influential observations. The resulting method clearly detects outliers and avoids the masking effect.

Multivariate statistical analysis of the comparative antioxidant activity of the total phenolics and tannins in the water and ethanol extracts of dried goji berry (Lycium chinense) fruits

Kim, Joo-Shin; Kimm, Haklin Alex
- Korean Journal of Food Science and Technology / v.51 no.3 / pp.227-236 / 2019
Antioxidant activity in water and ethanol extracts of dried Lycium chinense fruit, as a result of the total phenolic and tannin content, was measured using a number of chemical and biochemical assays for radical scavenging and inhibition of lipid peroxidation, and the analysis was extended by applying a bootstrapping statistical method. Previous statistical analyses mostly provided linear correlation and regression analyses between antioxidant activity and increasing concentrations of phenolics and tannins in a concentration-dependent mode. The present study showed that multiple-component or multivariate analysis, applying multiple regression analysis or regression planes, was more informative than linear regression analysis of the relationship between the concentration of individual components and antioxidant activity. In this paper, we present a multivariate analysis of the antioxidant activities of the phenolic and tannin contents combined in the water and ethanol extracts, which revealed hidden observations that were not evident from linear statistical analysis.
https://doi.org/10.9721/KJFST.2019.51.3.227

Influence Analysis of Constrained Regression Models

Kim, Myung-Geun
- Communications for Statistical Applications and Methods / v.14 no.2 / pp.281-286 / 2007
Cook's distance is generalized to the multiple linear regression with linear constraints on regression coefficients. It is used for identifying influential observations in constrained regression models. A numerical example is provided for illustration.
https://doi.org/10.5351/CKSS.2007.14.2.281

