• Title/Summary/Keyword: influential data point

On Sensitivity Analysis in Principal Component Regression

  • Kim, Soon-Kwi;Park, Sung H.
    • Journal of the Korean Statistical Society / v.20 no.2 / pp.177-190 / 1991
  • In this paper, we discuss and review various measures that have been presented for studying outliers, high-leverage points, and influential observations when principal component regression is adopted. We suggest several diagnostic measures for use with principal component regression. A numerical example is provided, in which some individual data points may be flagged as outliers, high-leverage points, or influential points.
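
As a rough illustration of the kind of diagnostics discussed, the sketch below fits a principal component regression on the first k components (plus an intercept) and computes leverage, internally studentized residuals, and a Cook's-distance analogue in the reduced component space. The function name `pcr_diagnostics`, the intercept, and the choice of k are our own assumptions; the measures proposed in the paper may be defined differently.

```python
import numpy as np

def pcr_diagnostics(X, y, k):
    """Leverage, studentized residuals and Cook's distance for a PCR fit
    that keeps the first k principal components plus an intercept."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                # center the predictors
    # Rows of Vt are the principal directions of the centered predictors
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                      # scores on the first k components
    D = np.column_stack([np.ones(n), Z])   # reduced design: intercept + k scores
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    # Hat matrix of the reduced design; its diagonal is the leverage
    H = D @ np.linalg.solve(D.T @ D, D.T)
    h = np.diag(H)
    p = k + 1                              # number of fitted parameters
    s2 = resid @ resid / (n - p)           # residual variance estimate
    r = resid / np.sqrt(s2 * (1 - h))      # internally studentized residuals
    cook = r**2 * h / (p * (1 - h))        # Cook's distance in component space
    return h, r, cook
```

Conventional rules of thumb (not taken from the paper) flag leverage above 2p/n or Cook's distances that stand out from the rest; because the hat matrix is built from the retained components only, a point can look unremarkable in the full predictor space and still be high-leverage in the component space, or vice versa.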

Graphical Methods for the Sensitivity Analysis in Discriminant Analysis

  • Jang, Dae-Heung;Anderson-Cook, Christine M.;Kim, Youngil
    • Communications for Statistical Applications and Methods / v.22 no.5 / pp.475-485 / 2015
  • As in regression, many measures have been developed to detect influential data points in discriminant analysis. Many of them carry the principles of linear-regression diagnostic measures over to the discriminant-analysis setting. Here we focus on the impact on the predicted posterior classification probabilities when a data point is omitted. The new method is more intuitive and easier to interpret than existing methods. We also propose a graphical display that shows how the posterior probability of each of the other data points moves when a specific data point is omitted, so that the display captures the overall pattern of change.
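
A minimal sketch of the leave-one-out idea, assuming linear discriminant analysis with a pooled covariance matrix and class priors estimated from sample proportions (both our assumptions, not necessarily the paper's classifier): for each omitted point i, the classifier is refitted and the change in every other point's posterior probability of its own class is recorded. The names `lda_posteriors` and `loo_posterior_shift` are hypothetical.

```python
import numpy as np

def lda_posteriors(Xtr, ytr, Xte):
    """Posterior class probabilities from a pooled-covariance LDA fit."""
    classes = np.unique(ytr)
    n, _ = Xtr.shape
    means = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    priors = np.array([(ytr == c).mean() for c in classes])
    # Pooled within-class covariance matrix
    S = sum((Xtr[ytr == c] - m).T @ (Xtr[ytr == c] - m)
            for c, m in zip(classes, means)) / (n - len(classes))
    Sinv = np.linalg.inv(S)
    # Linear discriminant scores: x'S^-1 mu_c - mu_c'S^-1 mu_c / 2 + log prior
    scores = (Xte @ Sinv @ means.T
              - 0.5 * np.sum(means @ Sinv * means, axis=1)
              + np.log(priors))
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(scores)
    return P / P.sum(axis=1, keepdims=True)

def loo_posterior_shift(X, y):
    """shift[i, j] = change in P(own class of j | x_j) when point i is omitted."""
    n = len(y)
    classes = list(np.unique(y))
    col = np.array([classes.index(c) for c in y])  # column of each point's class
    full = lda_posteriors(X, y, X)[np.arange(n), col]
    shift = np.zeros((n, n))
    for i in range(n):
        keep = np.arange(n) != i
        Pi = lda_posteriors(X[keep], y[keep], X)[np.arange(n), col]
        shift[i] = Pi - full
        shift[i, i] = 0.0                  # track only the *other* points
    return shift
```

Row i of `shift` holds the individual movements of the other points when point i is left out; a per-point summary such as `np.abs(shift[i]).max()` ranks influence, and plotting each row against the full-data posteriors gives a display in the spirit of the one described, though not necessarily its exact design.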

Influential Factors on Quality of Recovery of Patients Undergone Cardiac Surgery (심장수술 환자의 회복의 질 영향 요인)

  • Kim, Su Youn
    • The Korean Journal of Rehabilitation Nursing / v.17 no.2 / pp.64-71 / 2014
  • Purpose: The purpose of this study was to identify the quality of recovery after cardiac surgery and the factors influencing it. Methods: 198 patients who had undergone cardiac surgery completed a self-report questionnaire at discharge on quality of recovery, anxiety, depression, and social support. The data were analyzed using means, standard deviations, correlations, and stepwise multiple regression. Results: The mean quality-of-recovery score at discharge after cardiac surgery was 2.04 on a 3-point scale. The factors influencing quality of recovery after cardiac surgery were depression (p=.001) and anxiety (p=.027), which together explained 44.2% of the variance. Depression was the most influential factor. Conclusion: The factors influencing quality of recovery at discharge after cardiac surgery were depression and anxiety. Further studies are needed on reducing depression and anxiety in patients who have undergone cardiac surgery.

Firework plot as a graphical exploratory data analysis tool for evaluating the impact of outliers in skewness and kurtosis of univariate data (일변량 자료의 왜도와 첨도에서 특이점의 영향을 평가하기 위한 탐색적 자료분석 그림도구로서의 불꽃그림)

  • Moon, Sungho
    • The Korean Journal of Applied Statistics / v.29 no.2 / pp.355-368 / 2016
  • Outliers and influential data points distort many data analysis measures. Jang and Anderson-Cook (2014) proposed a graphical method for exploratory analysis, called a firework plot, that visualizes the trace of the impact of possibly outlying and/or influential data points on univariate/bivariate data analysis and regression. They developed 3-D plots as well as pairwise plots for the measures of interest. This paper extends their approach further to demonstrate its strengths: firework plots can be used as a graphical exploratory data analysis tool to evaluate the impact of outliers on the skewness and kurtosis of univariate data.
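
The abstract does not spell out how a trace is built. A minimal sketch, assuming (this is our reading of the firework-plot idea, not a statement of the authors' definition) that each observation's case weight is reduced gradually from 1 to 0 while all other weights stay at 1, and the weighted moment skewness and kurtosis are recorded at each step. The function names and the 11-point weight grid are our own choices.

```python
import numpy as np

def weighted_skew_kurt(x, w):
    """Weighted moment-based skewness and (non-excess) kurtosis."""
    w = w / w.sum()
    mu = np.sum(w * x)
    m2 = np.sum(w * (x - mu) ** 2)
    m3 = np.sum(w * (x - mu) ** 3)
    m4 = np.sum(w * (x - mu) ** 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def firework_traces(x, steps=11):
    """traces[i, t] = (skewness, kurtosis) when x[i] carries weight grid[t]."""
    x = np.asarray(x, dtype=float)
    grid = np.linspace(1.0, 0.0, steps)    # weight of the point being traced
    traces = np.empty((len(x), steps, 2))
    for i in range(len(x)):
        for t, wi in enumerate(grid):
            w = np.ones_like(x)
            w[i] = wi                      # downweight only observation i
            traces[i, t] = weighted_skew_kurt(x, w)
    return grid, traces
```

Every trace starts at the full-data values (weight 1) and ends at the leave-one-out values (weight 0), so plotting all traces together in the skewness-kurtosis plane gives one "spark" per observation, and an outlier shows up as an unusually long one.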

Bayesian inference for an ordered multiple linear regression with skew normal errors

  • Jeong, Jeongmun;Chung, Younshik
    • Communications for Statistical Applications and Methods / v.27 no.2 / pp.189-199 / 2020
  • This paper studies a Bayesian ordered multiple linear regression model with skew-normal errors. In applied regression, the inherent information available often calls for constraints on the coefficients to be estimated. In addition, the assumption of normally distributed errors is sometimes inappropriate for real data. To handle such situations more flexibly, we use the skew-normal distribution of Sahu et al. (The Canadian Journal of Statistics, 31, 129-150, 2003) for the error terms; it includes the normal distribution as a special case. For the Bayesian computation, the Markov chain Monte Carlo method is used to resolve complicated integration problems. We also show that the posterior density is proper under improper priors. The proposed Bayesian model is applied to NZAPB's apple data. To compare the skew-normal error model with the normal error model, we use the Bayes factor and the deviance information criterion of Spiegelhalter et al. (Journal of the Royal Statistical Society Series B (Statistical Methodology), 64, 583-639, 2002). We also consider the problem of detecting an influential point with respect to skewness using Bayes factors. Finally, concluding remarks are given.
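
To make the error model concrete, here is a minimal sketch of a skew-normal error in the stochastic representation e = delta*|z0| + sigma*z1, with z0 and z1 independent standard normals, which is how we understand the Sahu et al. (2003) construction in the univariate case. Its density is (2/omega) * phi(e/omega) * Phi(alpha*e/omega) with omega = sqrt(sigma^2 + delta^2) and alpha = delta/sigma, and delta = 0 recovers the normal error model. The parameter and function names are ours; the paper's priors, order constraints, and MCMC sampler are not shown.

```python
import math
import random

def skew_normal_error_pdf(e, sigma, delta):
    """Density of e = delta*|z0| + sigma*z1 with z0, z1 iid N(0, 1)."""
    omega = math.sqrt(sigma**2 + delta**2)   # overall scale
    alpha = delta / sigma                    # shape (skewness) parameter
    u = e / omega
    phi = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)   # N(0,1) pdf
    Phi = 0.5 * (1 + math.erf(alpha * u / math.sqrt(2)))    # N(0,1) cdf
    return 2.0 / omega * phi * Phi

def sample_skew_normal_errors(n, sigma, delta, rng=random):
    """Draw errors through the stochastic representation above."""
    return [delta * abs(rng.gauss(0, 1)) + sigma * rng.gauss(0, 1)
            for _ in range(n)]
```

Note that these errors have mean delta*sqrt(2/pi) rather than 0; the half-normal term |z0| is also what makes MCMC convenient here, since conditioning on it turns the model back into a normal linear regression with an extra regressor.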

A Study on the Relationship between Cyanobacteria and Environmental Factors in Yeongcheon Lake (영천호에서 남조류 발생과 환경요인의 관련성 연구)

  • Lee, Hyeon-Mi;Shin, Ra-Young;Lee, Jung-Ho;Park, Jong-geun
    • Journal of Korean Society on Water Environment / v.35 no.4 / pp.352-361 / 2019
  • The purpose of this study is to analyze the characteristics of Yeongcheon Lake and the correlations between environmental factors and cyanobacteria, in order to reduce the occurrence of harmful cyanobacteria. We investigated the water quality and phytoplankton of the lake from May to November 2017, and performed correlation and data mining analyses to examine the relationship between the two. Water temperature was lowest at the inflow point, where water enters Yeongcheon Lake from Imha Lake, and highest at the outflow point toward Angye Lake. The pH was also highest at the outflow point, whereas DO was highest at the midpoint between the inflow and the outflow. The main cyanobacteria during the study period were Oscillatoria limosa, Microcystis aeruginosa, and Aphanizomenon flos-aquae. The correlation analysis showed that water temperature, inflow, COD loading, and TOC loading at the inflow point of Yeongcheon Lake were related to harmful cyanobacteria. The data mining analysis indicated that TP loading and harmful cyanobacteria at the inflow point influenced the harmful cyanobacteria at the outflow point. When TP loading at the inflow site was below 39.0 kg/day, harmful cyanobacteria were expected to remain below 10,000 cells/mL.

Firework Plot as a Graphical Exploratory Data Analysis Tool to Evaluate the Impact of Outliers in a Mixture Experiment (혼합물 실험에서 특이값의 영향을 평가하기 위한 그래픽 탐색적 자료분석 도구로서의 불꽃그림)

  • Jang, Dae-Heung;Ahn, SoJin;Kim, Youngil
    • The Korean Journal of Applied Statistics / v.27 no.4 / pp.629-643 / 2014
  • When data are analyzed with regression techniques, it is common to check the validity of the assumed model with heavy use of diagnostic tools; however, outliers and influential data points often distort the regression output in undesirable ways. Jang and Anderson-Cook (2013) proposed a graphical method for exploratory analysis, called a firework plot, that visualizes the trace of the impact of possibly outlying and/or influential data points on individual regression coefficients and on the overall residual sum of squares (SSE). They developed 3-D plots as well as pairwise plots for the measures of interest. In this paper, the approach is extended further to demonstrate its strengths; in addition, adding a measure not mentioned in their paper allows a more meaningful interpretation. The approach is applied to a mixture experiment because we felt that a small experiment requires a detailed analysis of the sensitivity of statistical measures.
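
A minimal sketch of a coefficient/SSE trace for a linear model, assuming (our interpretation, not confirmed by the abstract) that the trace for run i is produced by lowering that run's weight from 1 to 0 in weighted least squares while the other runs keep weight 1. The function name and weight grid are our own; for a mixture experiment, X would hold the model terms of whatever mixture model is being fitted.

```python
import numpy as np

def wls_trace(X, y, i, steps=11):
    """Coefficients and weighted SSE as the weight of run i goes from 1 to 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.linspace(1.0, 0.0, steps)
    coefs, sse = [], []
    for wi in grid:
        w = np.ones(len(y))
        w[i] = wi                          # downweight only run i
        sw = np.sqrt(w)
        # Weighted least squares via rows rescaled by sqrt(weight)
        b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        coefs.append(b)
        sse.append(np.sum(w * (y - X @ b) ** 2))
    return grid, np.array(coefs), np.array(sse)
```

Running this for every run and overlaying the coefficient paths pairwise (or with SSE as a third axis for the 3-D version) shows which runs pull the fit most; at weight 0 each path ends at the leave-one-out estimates, so long paths correspond to influential runs.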

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society / v.36 no.4 / pp.457-469 / 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. Detecting multiple outliers or influential points is more difficult, however, owing to masking and swamping. Direct procedures for detecting multiple outliers in logistic regression have not yet been studied. In this paper we consider direct methods for logistic regression by extending the influence matrix algorithm of Peña and Yohai (1995). We define the influence matrix in logistic regression using Cook's distance for logistic regression, and we test for multiple outliers using the mean-shift model. To show the accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data containing multiple outliers subject to masking and swamping.
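
A minimal sketch of the ingredients, not the authors' algorithm: fit the logistic model by iteratively reweighted least squares, compute the one-step (Pregibon-type) Cook's distance from Pearson residuals and the weighted hat matrix, and form a matrix whose diagonal is that Cook's distance and whose off-diagonal entries pair observations in the manner of Peña and Yohai's linear-regression influence matrix, m_ij = r_i r_j h_ij / (p (1 - h_ii)(1 - h_jj)). This logistic analogue is our own assumption, and the mean-shift tests are not shown.

```python
import numpy as np

def logistic_irls(X, y, iters=50, tol=1e-10):
    """Maximum-likelihood logistic regression by IRLS (Newton's method)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        # Newton step: (X'WX)^-1 X'(y - p)
        step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def logistic_influence_matrix(X, y):
    """M[i, j] = r_i r_j h_ij / (p (1-h_ii)(1-h_jj)); diag(M) is Cook's distance."""
    beta = logistic_irls(X, y)
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    W = mu * (1 - mu)
    Xw = np.sqrt(W)[:, None] * X
    H = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)   # W^1/2 X (X'WX)^-1 X' W^1/2
    h = np.diag(H)
    r = (y - mu) / np.sqrt(W)                   # Pearson residuals
    a = r / (1 - h)
    M = np.outer(a, a) * H / X.shape[1]
    return M, np.diag(M)
```

In the linear-regression setting, Peña and Yohai (1995) look for groups of jointly influential points through the eigenvectors of the influence matrix (for example via `np.linalg.eigh(M)`), which is how masked groups that single-case Cook's distances miss can surface.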

The Influential Factors on Premenstrual Syndrome College Female Students (여대생의 월경전증후군에 영향을 미치는 요인)

  • Jung, Geum-Sook;Oh, Hyun-Mi;Choi, In-Ryoung
    • Journal of the Korea Academia-Industrial cooperation Society / v.15 no.5 / pp.3025-3036 / 2014
  • This study was conducted to identify the factors influencing premenstrual syndrome (PMS) in female college students, to be used as baseline data for developing and applying programs that prevent and control PMS. The subjects were 330 female college students. Data were collected from April 2 to April 6, 2012. PMS was significantly correlated with stress (r=.36, p<.001), and attitude toward menstruation also showed a significant positive correlation with PMS (r=.34, p<.001). Multiple regression analysis was used to identify the factors influencing PMS; menstrual attitude, stress related to grade point average, smoking, and dysmenorrhea were the most significant factors, with 27% explanatory power. Each was highly significant: menstrual attitude (β=.28, p<.001), stress related to grade point average (β=.27, p<.001), smoking (β=.20, p<.001), and dysmenorrhea (β=.15, p<.001). In conclusion, nursing interventions for PMS that address psychosocial factors need to be developed, and a narrative study is suggested to improve the quality of life of women with PMS.

A Study on the Creation of Slope Instability Map Using Geographic Information Systems. (GIS를 이용한 사면위험도 작성기법 연구)

  • 유명환
    • Economic and Environmental Geology / v.33 no.2 / pp.129-138 / 2000
  • Geohazards such as landslides resulting from civil construction (e.g., highway construction) must be analysed systematically, considering all possible influential factors. Using GIS, slope stability can be evaluated, and the results can serve as data for further detailed investigation. The aim of this study is therefore to provide data for decision making when selecting suitable locations for remediation. To analyse slope instability, the landslide mechanism must be understood through appropriate definition and classification. In building a GIS model, appropriate factors and a rating system for them must be selected; this requires understanding the characteristics and mechanism of landslides, and suitable coverages should be chosen for the model in light of the slope conditions. In this study, a field investigation was carried out in the 1st and 2nd Sections of the Chung-ang Highway. From the field data, a GIS model of slope instability was created using five coverages. As a result, 12 unstable sections were identified, where more detailed investigation is needed.
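
The factor-rating step can be pictured as a weighted overlay. A minimal sketch, assuming each coverage has already been rasterized onto a common grid and reclassified into numeric ratings (higher meaning less stable); the weights and threshold are hypothetical and are not taken from the study, which does not report how its five coverages were combined.

```python
def instability_index(layers, weights):
    """Cell-by-cell weighted sum of rated coverages (all grids the same shape)."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[sum(w * layer[r][c] for layer, w in zip(layers, weights))
             for c in range(cols)]
            for r in range(rows)]

def flag_unstable(index, threshold):
    """(row, col) of cells whose index reaches the hypothetical threshold."""
    return [(r, c) for r, row in enumerate(index)
            for c, v in enumerate(row) if v >= threshold]
```

With five rated layers, `instability_index([l1, l2, l3, l4, l5], [w1, w2, w3, w4, w5])` yields one score per cell, and `flag_unstable` lists the cells that would be candidates for detailed field checks; in practice the ratings and weights come from the landslide-mechanism analysis described above.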