• Title/Summary/Keyword: diagnostic statistic

A simple diagnostic statistic for determining the size of random forest (랜덤포레스트의 크기 결정을 위한 간편 진단통계량)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.855-863 / 2016
  • In this study, a simple diagnostic statistic for determining the size of a random forest is proposed. The method is based on MV (margin of victory), a scaled difference in the votes at the infinite forest between the first and second most popular categories of the current random forest. If MV is negative, there is a discrepancy between the current and infinite forests. More precisely, the method is based on the proportion of cases in which -MV is greater than a fixed small positive number (say, 0.03). We derive an appropriate diagnostic statistic for this method and then calculate its distribution. A simulation study is performed to compare the method with a recently proposed diagnostic statistic.
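
The proportion described in this abstract can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the paper: the margin is left unscaled, a second (much larger) reference forest stands in for the infinite forest, and the names `margin_of_victory` and `discrepancy_proportion` are hypothetical. The paper itself derives a diagnostic statistic and its distribution, which this sketch does not attempt.

```python
def margin_of_victory(current_votes, reference_votes):
    """Unscaled margin of victory for one case (assumption: no scaling).

    Both arguments map class label -> vote proportion. The top two classes
    are chosen from the current forest; the margin is then measured on the
    reference forest, used here as a stand-in for the infinite forest.
    """
    ranked = sorted(current_votes, key=current_votes.get, reverse=True)
    first, second = ranked[0], ranked[1]
    return reference_votes.get(first, 0.0) - reference_votes.get(second, 0.0)


def discrepancy_proportion(current, reference, threshold=0.03):
    """Proportion of cases in which -MV is greater than `threshold`."""
    margins = [margin_of_victory(c, r) for c, r in zip(current, reference)]
    flagged = sum(1 for mv in margins if -mv > threshold)
    return flagged / len(margins)


# Made-up vote proportions for four cases and two classes.
current = [
    {"a": 0.52, "b": 0.48},
    {"a": 0.70, "b": 0.30},
    {"a": 0.45, "b": 0.55},
    {"a": 0.51, "b": 0.49},
]
reference = [
    {"a": 0.46, "b": 0.54},  # winner flips, MV is about -0.08
    {"a": 0.68, "b": 0.32},  # MV is about 0.36
    {"a": 0.44, "b": 0.56},  # MV is about 0.12
    {"a": 0.49, "b": 0.51},  # winner flips, but -MV stays below 0.03
]
proportion = discrepancy_proportion(current, reference)
```

Only the first case is flagged in this toy input, so a small proportion would suggest that the current forest already agrees with the reference forest on most cases.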

Testing Homogeneity for Random Effects in Linear Mixed Model

  • Ahn, Chul H.
    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.403-414 / 2000
  • A diagnostic tool for testing homogeneity of random effects in unbalanced linear mixed models is proposed, based on the score statistic. The finite-sample behavior of the test statistic is examined using Monte Carlo experiments that assess the chi-square approximation of the test statistic under the null hypothesis.

A case-by-case version of CB statistic in biased estimation

  • Ahn, Byoung Jin
    • Journal of Korean Society for Quality Management / v.19 no.2 / pp.40-51 / 1991
  • The $C_B$ statistic, a generalization of Mallows's $C_L$ statistic, is developed to determine the shrinkage parameter. Since not all cases in a data set play an equal role in forming $C_B$, a subdivision of $C_B$ into individual components for each case is developed. This subdivision is useful both as an aid in understanding $C_B$ and as a diagnostic procedure.

A study of a new statistic for detection of outliers and/or influential observations in regression diagnostics (회귀진단에서 이상치와 영향관측치를 동시에 발견하는 새로운 통계량에 관한 연구)

  • 강은미
    • The Korean Journal of Applied Statistics / v.6 no.1 / pp.67-78 / 1993
  • A new diagnostic statistic for detecting outliers and influential observations in linear models is suggested and studied in this paper. The proposed statistic is a weighted sum of two measures: one for detecting outliers and the other for detecting influential observations. The merit of this statistic is that it makes it possible to distinguish outliers from influential observations. A Monte Carlo simulation is carried out to find the probability distribution of this statistic.

A Study on Detection of Outliers and Influential Observations in Linear Models

  • Kang, Eun M.;Park, Sung H.
    • Journal of Korean Society for Quality Management / v.16 no.2 / pp.18-33 / 1988
  • A new diagnostic statistic for detecting outliers and influential observations in linear models is suggested and studied in this paper. The proposed statistic is a weighted sum of two measures: one for detecting outliers and the other for detecting influential observations. The merit of this statistic is that it makes it possible to distinguish outliers from influential observations. The statistic can be used not only for regression models but also for factorial design models. A Monte Carlo simulation study is reported to suggest critical values for detecting outliers and influential observations in simple regression models when the number of observations is 11, 21, 31, 41, or 51.

Deletion diagnostics in fitting a given regression model to a new observation

  • Kim, Myung Geun
    • Communications for Statistical Applications and Methods / v.23 no.3 / pp.231-239 / 2016
  • A graphical diagnostic method based on multiple case deletions in a regression context is introduced, using the sampling distribution of the difference between the two least squares estimators computed with and without multiple cases. Principal components analysis plays a key role in deriving this diagnostic method. Multiple case deletions for the test statistic are also considered when a new observation is fitted to a given regression model. The result is useful for detecting influential observations in econometric data analysis, for example in checking whether the consumption pattern at a later time is the same as the one found before, as well as for investigating the influence of cases in the usual regression model. An illustrative example is given.
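
The quantity at the centre of this abstract, the difference between least squares estimators with and without a set of cases, can be sketched for simple linear regression. This is only an illustration under stated assumptions: the helper names `fit_line` and `deletion_difference` and the data are hypothetical, and the sketch computes the raw difference only, not the sampling distribution or the principal components step that the paper uses.

```python
def fit_line(xs, ys):
    """Least squares (intercept, slope) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def deletion_difference(xs, ys, deleted):
    """Full-data estimate minus the estimate with the `deleted` cases removed.

    `deleted` is a set of 0-based case indices, so several cases can be
    dropped at once (multiple case deletion).
    """
    keep = [i for i in range(len(xs)) if i not in deleted]
    full = fit_line(xs, ys)
    reduced = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
    return tuple(f - r for f, r in zip(full, reduced))


# Made-up data: the first five cases lie on y = 2x, the last one does not.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]
diff_intercept, diff_slope = deletion_difference(xs, ys, deleted={5})
```

A large difference after deleting a set of cases points to those cases as influential; judging how large is "large" is where the sampling distribution mentioned in the abstract comes in.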

Goodness-of-Fit Tests in Regression via Nonparametric Function Techniques

  • Kim, Jong-Tae;Moon, Gyoung-Ae
    • Journal of the Korean Data and Information Science Society / v.5 no.2 / pp.95-106 / 1994
  • The proposed test statistic is obtained by multiplying constant weights by the Neumann smooth type statistic discussed by Eubank and Hart (1993), in order to observe the effect of the weights. It performs very well in power studies. Another advantage of this test is that it simultaneously provides an important diagnostic tool that can be used in many cases to determine how the model should be adjusted.

Diagnostics for Heteroscedasticity in Mixed Linear Models

  • Ahn, Chul-Hwan
    • Journal of the Korean Statistical Society / v.19 no.2 / pp.171-175 / 1990
  • A diagnostic test for detecting nonconstant variance in mixed linear models based on the score statistic is derived through the technique of model expansion, and compared to the log likelihood ratio test.

A measure of discrepancy based on margin of victory useful for the determination of random forest size (랜덤포레스트의 크기 결정에 유용한 승리표차에 기반한 불일치 측도)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.515-524 / 2017
  • In this study, a measure of discrepancy based on MV (margin of victory) is suggested that might be useful in determining the size of a random forest for classification. Here MV is a scaled difference in the votes, at the infinite random forest, between the two most popular classes of the current random forest. More specifically, max(-MV,0) is proposed as a reasonable measure of discrepancy, since a negative MV value means that the two most popular classes are ordered differently in the current and infinite random forests. We propose an appropriate diagnostic statistic based on this measure that might be useful for determining random forest size, and then derive its asymptotic distribution. Finally, a simulation study is conducted to compare the finite-sample performance of the proposed statistic with that of other recently proposed diagnostic statistics.

Influence Measures for the Likelihood Ratio Test on Independence of Two Random Vectors

  • Jung, Kang-Mo
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집) / 2001.10a / pp.13-16 / 2001
  • We compare methods for detecting observations that have a large influence on the likelihood ratio test statistic for the hypothesis that two sets of variables are uncorrelated with one another. For this purpose we derive results for the deletion diagnostic, the influence function, the standardized influence matrix, and the local influence. An illustrative example is given.
