• Title/Summary/Keyword: Data normality


A Test for Multivariate Normality Focused on Elliptical Symmetry Using Mahalanobis Distances

  • Park, Cheol-Yong
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • 2006.04a
    • /
    • pp.203-212
    • /
    • 2006
  • A chi-squared test of multivariate normality is suggested which is mainly focused on detecting deviations from elliptical symmetry. This test uses Mahalanobis distances of observations to have some power for deviations from multivariate normality. We derive the limiting distribution of the test statistic by a conditional limit theorem. A simulation study is conducted to study the accuracy of the limiting distribution in finite samples. Finally, we compare the power of our method with those of other popular tests of multivariate normality under two non-normal distributions.


A Test of Multivariate Normality Oriented for Testing Elliptical Symmetry

  • Park, Cheol-Yong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.17 no.1
    • /
    • pp.221-231
    • /
    • 2006
  • A chi-squared test of multivariate normality is suggested which is oriented for detecting deviations from elliptical symmetry. We derive the limiting distribution of the test statistic via a central limit theorem on empirical processes. A simulation study is conducted to study the accuracy of the limiting distribution in finite samples. Finally, we compare the power of our method with those of other popular tests of multivariate normality under a non-normal distribution.


Comprehensive comparison of normality tests: Empirical study using many different types of data

  • Lee, Chanmi;Park, Suhwi;Jeong, Jaesik
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.5
    • /
    • pp.1399-1412
    • /
    • 2016
  • We compare many normality tests that draw on different sources of information in the data: the Anderson-Darling, Kolmogorov-Smirnov, Cramér-von Mises, Shapiro-Wilk, Shapiro-Francia, Lilliefors, Jarque-Bera, D'Agostino's D, Doornik-Hansen, energy, and Martinez-Iglewicz tests. For comparison, the tests are applied to various types of data generated from skewed distributions, asymmetric distributions, and distributions with different lengths of support. We then summarize the results in terms of type I error control and power. The best test depends on the shape of the data distribution, which implies that no single test is the most powerful for all distributions.
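
The two comparison criteria named in this abstract, type I error control and power, can be estimated by simulation. The sketch below is illustrative only and is not the authors' study design: absolute sample skewness stands in for the tests listed, the critical value is a Monte Carlo quantile under normal data, and the sample size, replication counts, exponential alternative, and all function names are assumptions.

```python
import random

def abs_skewness(x):
    """Absolute sample skewness, a simple stand-in for a normality test statistic."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return abs(m3 / m2 ** 1.5)

def null_critical_value(stat, n, reps, rng, level=0.05):
    """Monte Carlo (1 - level) quantile of the statistic under normal data."""
    sims = sorted(stat([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps))
    return sims[int(round((1.0 - level) * reps)) - 1]

def rejection_rate(stat, sampler, crit, n, reps):
    """Fraction of simulated samples whose statistic exceeds the critical value."""
    hits = sum(stat([sampler() for _ in range(n)]) > crit for _ in range(reps))
    return hits / reps

rng = random.Random(0)  # fixed seed so the run is reproducible
crit = null_critical_value(abs_skewness, n=50, reps=2000, rng=rng)

# Type I error: rejection rate when the data really are normal.
size = rejection_rate(abs_skewness, lambda: rng.gauss(0.0, 1.0), crit, n=50, reps=1000)
# Power: rejection rate under a skewed alternative (exponential, chosen for illustration).
power = rejection_rate(abs_skewness, lambda: rng.expovariate(1.0), crit, n=50, reps=1000)

print(f"estimated type I error: {size:.3f}, estimated power: {power:.3f}")
```

Swapping in another statistic or another alternative distribution only requires changing the `stat` and `sampler` arguments, which is how a comparison across tests and data types could be organized.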

Numerical study on Jarque-Bera normality test for innovations of ARMA-GARCH models

  • Lee, Tae-Wook
    • Journal of the Korean Data and Information Science Society
    • /
    • v.20 no.2
    • /
    • pp.453-458
    • /
    • 2009
  • In this paper, we consider the Jarque-Bera (JB) normality test for the innovations of ARMA-GARCH models. In financial applications, the JB test based on residuals is routinely used to check the normality of ARMA-GARCH innovations without justification. However, the validity of the JB test should be established before it is used in practice (Lee et al., 2009). A simulation study shows that the validity of the JB test depends on the form of the test statistic. Specifically, when a constant term is included in the ARMA model, a certain type of residual-based JB test produces severe size distortions.
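
For reference, the textbook JB statistic combines sample skewness S and sample kurtosis K as JB = n/6 · (S² + (K − 3)²/4). The sketch below computes it for a plain sample and is not the residual-based variant studied in the paper. The p-value uses the asymptotic chi-squared distribution with 2 degrees of freedom, whose survival function is exp(−x/2); that approximation is the standard one for i.i.d. data, and the abstract cautions that it may fail for ARMA-GARCH residuals. The function name is an assumption.

```python
import math

def jarque_bera(x):
    """Return the Jarque-Bera statistic and its asymptotic chi-squared(2) p-value."""
    n = len(x)
    mean = sum(x) / n
    # Central moments with divisor n, as in the usual definition of the statistic.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of chi-squared with 2 degrees of freedom is exp(-x/2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(f"JB = {jb:.4f}, asymptotic p-value = {p:.4f}")
```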


Effects of Non-normality on the Performance of Univariate and Multivariate CUSUM Control Charts

  • Chang, Young-Soon
    • Journal of Korean Society for Quality Management
    • /
    • v.34 no.4
    • /
    • pp.102-109
    • /
    • 2006
  • This paper investigates the effects of non-normality on the performance of univariate and multivariate cumulative sum (CUSUM) control charts for monitoring the process mean. In-control and out-of-control average run lengths of the charts are examined for the univariate/multivariate lognormal and t distributions. The effects of the reference value and the correlation coefficient under the non-normal distributions are also studied. Simulation results show that CUSUM charts with small reference values are robust to non-normality, but those with moderate or large reference values are sensitive to non-normal data, especially process data from skewed distributions. For skewed distributions, the performance of the chart in detecting a shift in the process mean is not invariant to the direction of the shift.
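
A minimal sketch of the kind of simulation described here, covering the univariate case only: a two-sided tabular CUSUM with reference value `k` and decision interval `h`, plus a Monte Carlo estimate of the in-control average run length (ARL). The values `k = 0.5` and `h = 4`, the lognormal shape parameter, the replication count, and the function names are assumptions for illustration, not the paper's settings.

```python
import math
import random

def cusum_run_length(xs, mu0=0.0, k=0.5, h=4.0):
    """1-based index of the first two-sided CUSUM signal, or None if none occurs."""
    upper = lower = 0.0
    for t, x in enumerate(xs, start=1):
        upper = max(0.0, upper + (x - mu0) - k)  # accumulates upward deviations
        lower = max(0.0, lower - (x - mu0) - k)  # accumulates downward deviations
        if upper > h or lower > h:
            return t
    return None

def estimate_arl(sampler, reps, max_n=10000, **chart):
    """Average run length over `reps` simulated sequences (capped at max_n)."""
    total = 0
    for _ in range(reps):
        rl = cusum_run_length((sampler() for _ in range(max_n)), **chart)
        total += rl if rl is not None else max_n
    return total / reps

rng = random.Random(0)

# Standardized lognormal: subtract the mean and divide by the standard deviation
# so that the in-control process has mean 0 and variance 1, like the normal case.
sigma = 0.5
ln_mean = math.exp(sigma ** 2 / 2.0)
ln_sd = math.sqrt((math.exp(sigma ** 2) - 1.0) * math.exp(sigma ** 2))

arl_normal = estimate_arl(lambda: rng.gauss(0.0, 1.0), reps=200)
arl_lognormal = estimate_arl(
    lambda: (rng.lognormvariate(0.0, sigma) - ln_mean) / ln_sd, reps=200
)
print(f"in-control ARL, normal: {arl_normal:.1f}, lognormal: {arl_lognormal:.1f}")
```

Repeating the estimate over several values of `k` would show how the reference value affects sensitivity to the skewed distribution.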

Improvement of generalization of linear model through data augmentation based on Central Limit Theorem

  • Hwang, Doohwan
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.19-31
    • /
    • 2022
  • In machine learning, we usually split the data into training and test sets, train the model on the training data, and use the test data to assess the model's accuracy and generalization performance. For models with low generalization performance, prediction accuracy on new data drops significantly, and the model is said to be overfit. This study proposes generating training data based on the central limit theorem and combining it with the existing training data to increase normality, then training models on the combined data to improve generalization performance. To this end, data were generated using the sample mean and standard deviation of each feature, drawing on the central limit theorem, and new training data were constructed by combining the generated data with the existing training data. To determine the degree of increase in normality, the Kolmogorov-Smirnov normality test was conducted, and it confirmed that the new training data showed increased normality compared with the existing data. Generalization performance was measured as the difference in prediction accuracy between the training data and the test data. When the method was applied to K-Nearest Neighbors (KNN), Logistic Regression, and Linear Discriminant Analysis (LDA), generalization performance improved for KNN, a non-parametric technique, and for LDA, which assumes normality when the model is built.
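
One possible reading of the augmentation step, sketched for a single feature: draw synthetic values from a normal distribution with the feature's sample mean and standard deviation, append them, and compare the Kolmogorov-Smirnov distance to a fitted normal before and after. The abstract does not give the exact generation scheme, so the normal draws, the 1:1 augmentation ratio, the exponential toy feature, and the function names are all assumptions.

```python
import random
from statistics import NormalDist, mean, stdev

def ks_normal_statistic(x):
    """KS distance between the empirical CDF and a normal fitted by mean and sd.

    Because the parameters are estimated from the same data, standard KS
    p-values do not apply; only the distance itself is reported here.
    """
    fitted = NormalDist(mean(x), stdev(x))
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = fitted.cdf(v)
        # Compare the fitted CDF with the empirical CDF just after and just before v.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def augment_feature(x, n_new, rng):
    """Append n_new draws from a normal with the feature's sample mean and sd."""
    m, s = mean(x), stdev(x)
    return list(x) + [rng.gauss(m, s) for _ in range(n_new)]

rng = random.Random(0)
feature = [rng.expovariate(1.0) for _ in range(2000)]  # skewed toy feature
augmented = augment_feature(feature, n_new=2000, rng=rng)

d_before = ks_normal_statistic(feature)
d_after = ks_normal_statistic(augmented)
print(f"KS distance before: {d_before:.3f}, after augmentation: {d_after:.3f}")
```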

A note on Box-Cox transformation and application in microarray data

  • Rahman, Mezbahur;Lee, Nam-Yong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.22 no.5
    • /
    • pp.967-976
    • /
    • 2011
  • The Box-Cox transformation is a well-known family of power transformations that brings a set of data into agreement with the normality assumption of the residuals and hence the response variable of a postulated model in regression analysis. Normalization (studentization) of the regressors is a common practice in analyzing microarray data. Here, we apply the Box-Cox transformation to normalize regressors in microarray data. The predictability of the model can be improved using this data transformation compared with studentization.
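
For positive data the Box-Cox family is y(λ) = (y^λ − 1)/λ for λ ≠ 0 and log y for λ = 0. A common way to choose λ for a single variable is to maximize the normal profile log-likelihood over a grid, as sketched below. The paper's exact procedure for microarray regressors is not given in the abstract, so the grid, the toy data, and the function names are assumptions.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of positive values for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    """Normal profile log-likelihood of lambda, up to an additive constant."""
    n = len(y)
    z = box_cox(y, lam)
    zbar = sum(z) / n
    var = sum((v - zbar) ** 2 for v in z) / n
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(y, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(y, lam))

grid = [i / 10 for i in range(-20, 21)]       # lambda from -2 to 2 in steps of 0.1
y = [math.exp(i / 4) for i in range(-8, 9)]   # toy data whose logs are evenly spaced
lam_hat = best_lambda(y, grid)
print(f"selected lambda: {lam_hat}")
```

The toy data are constructed so that their logarithms are symmetric about zero, which makes the log transform (λ = 0) the natural choice; real expression data would generally select other values.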

An Analysis of Panel Count Data from Multiple random processes

  • Park, You-Sung;Kim, Hee-Young
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2002.11a
    • /
    • pp.265-272
    • /
    • 2002
  • An integer-valued autoregressive integrated (INARI) model is introduced to eliminate stochastic trend and seasonality from time series of count data. The INARI model extends the previous integer-valued ARMA model. We show that it is stationary and ergodic in order to establish asymptotic normality of the conditional least squares estimator. Optimal estimating equations are used to incorporate into estimation the categorical and serial correlations arising from panel count data and the variations arising from the three random processes by which observations are obtained. Under regularity conditions for martingale sequences, we show asymptotic normality of the estimators obtained from the estimating equations. Using cancer mortality data provided by the U.S. National Center for Health Statistics (NCHS), we apply our results to estimate the probabilities of cells classified by 4 causes of death and 6 age groups and to forecast the death count of each cell. We also investigate the impact of the three random processes on estimation.
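
As background for the conditional least squares (CLS) estimator mentioned here, the sketch below uses the simpler first-order integer-valued autoregressive model, INAR(1), rather than the paper's INARI model. Each count is a binomially thinned copy of the previous count plus a Poisson innovation, and the CLS estimates come from regressing each count on its predecessor. Poisson innovations, the parameter values, and the function names are assumptions.

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_inar1(alpha, lam, n, rng):
    """Simulate X_t = (alpha thinned X_{t-1}) + Poisson(lam) innovation."""
    x = poisson(lam / (1.0 - alpha), rng)  # stationary law is Poisson(lam / (1 - alpha))
    path = []
    for _ in range(n):
        survivors = sum(rng.random() < alpha for _ in range(x))  # binomial thinning
        x = survivors + poisson(lam, rng)
        path.append(x)
    return path

def cls_estimates(path):
    """CLS estimates of (alpha, lam): least squares of X_t on X_{t-1}."""
    prev, curr = path[:-1], path[1:]
    m = len(prev)
    mp, mc = sum(prev) / m, sum(curr) / m
    sxy = sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
    sxx = sum((a - mp) ** 2 for a in prev)
    alpha_hat = sxy / sxx
    lam_hat = mc - alpha_hat * mp
    return alpha_hat, lam_hat

rng = random.Random(0)
path = simulate_inar1(alpha=0.5, lam=2.0, n=5000, rng=rng)
alpha_hat, lam_hat = cls_estimates(path)
print(f"alpha_hat = {alpha_hat:.3f}, lam_hat = {lam_hat:.3f}")
```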


A Note on the Chi-Square Test for Multivariate Normality Based on the Sample Mahalanobis Distances

  • Park, Cheolyong
    • Journal of the Korean Statistical Society
    • /
    • v.28 no.4
    • /
    • pp.479-488
    • /
    • 1999
  • Moore and Stubblebine (1981) suggested a chi-square test for multivariate normality based on cell counts calculated from the sample Mahalanobis distances. They derived the limiting distribution of the test statistic only when equiprobable cells are employed. Using conditional limit theorems, we derive the limiting distribution of the statistic as well as the asymptotic normality of the cell counts. These distributions are valid even when equiprobable cells are not employed. We finally apply this method to a real data set.
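
The ingredients of such a statistic can be sketched for bivariate data: compute squared sample Mahalanobis distances, count how many fall into cells bounded by chi-squared quantiles, and form the Pearson statistic from the counts. The sketch uses equiprobable cells and dimension 2 so that the chi-squared(2) quantile has the closed form −2 ln(1 − u). It stops at the statistic because the limiting distribution is the subject of the paper and is not reproduced here. The covariance divisor `n`, the number of cells, the toy data, and the function names are assumptions.

```python
import math
import random
from bisect import bisect_right

def mahalanobis_sq_2d(points):
    """Squared sample Mahalanobis distances for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance with divisor n (an assumption; n - 1 is also common).
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    det = sxx * syy - sxy ** 2
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the explicit inverse of the 2x2 covariance matrix.
        out.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return out

def cell_counts(d2, k):
    """Counts of distances in k cells that are equiprobable under chi-squared(2)."""
    edges = [-2.0 * math.log(1.0 - j / k) for j in range(1, k)]
    counts = [0] * k
    for d in d2:
        counts[bisect_right(edges, d)] += 1
    return counts

def pearson_statistic(counts):
    """Pearson chi-squared statistic against equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(0)
points = []
for _ in range(500):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    points.append((z1, 0.6 * z1 + 0.8 * z2))  # correlated bivariate normal toy data

d2 = mahalanobis_sq_2d(points)
counts = cell_counts(d2, k=5)
stat = pearson_statistic(counts)
print(f"cell counts: {counts}, Pearson statistic: {stat:.3f}")
```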


The Rao-Robson Chi-Squared Test for Multivariate Structure

  • Park, Cheol-Yong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.4
    • /
    • pp.1013-1021
    • /
    • 2003
  • Huffer and Park (2002) proposed a chi-squared test for multivariate structure. Their test detects the deviation of data from mutual independence or multivariate normality. We compute the Rao-Robson chi-squared version of the test, which is easy to apply in practice since it has a limiting chi-squared distribution. We provide a self-contained argument that it has a limiting chi-squared distribution. We study the accuracy of the limiting distribution in finite samples. Finally, we compare the power of our test with those of other popular normality tests in an application to a real data set.
