• Title/Summary/Keyword: Statistical Selection Method


Threshold Selection Method in Gray Images Based on Interval-Valued Fuzzy Sets (구간값 퍼지집합을 이용한 그레이 영상에서의 임계값 선택방법)

  • Son, Chang-S.;Chung, Hwan-M.;Seo, Suk-T.;Kwon, Soon-H.
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.4 / pp.443-450 / 2007
  • In this paper, we propose a novel threshold selection method based on statistical information on the gray levels of given images and on interval-valued fuzzy sets. In the proposed method, the interval-valued fuzzy set is used to represent more precisely the relationship between a pixel and the region it belongs to, that is, the object or the background. The statistical information on gray levels is used to determine the rules and partitions of the interval-valued fuzzy sets. To show the validity of the proposed method, we compared its performance with that of conventional methods, such as Otsu's method and Huang and Wang's method, on 5 test images with various types of histograms.
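
For orientation, the Otsu baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration of the conventional between-class-variance criterion, not the interval-valued fuzzy method the paper proposes; the function name `otsu_threshold` and the convention that pixels with gray level at or below the threshold form one class are assumptions made for this example.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes the between-class variance.

    histogram[k] is the number of pixels with gray level k.
    Assumed convention: levels <= t form one class, levels > t the other.
    """
    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    count_low = 0    # pixels at or below the candidate threshold
    sum_low = 0.0    # sum of their gray levels

    for level, count in enumerate(histogram):
        count_low += count
        sum_low += level * count
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, so the criterion is undefined
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        # Between-class variance up to the constant factor 1 / total**2.
        between = count_low * count_high * (mean_low - mean_high) ** 2
        if between > best_variance:
            best_threshold, best_variance = level, between
    return best_threshold
```

Ties are resolved in favor of the lowest gray level, which is one of several reasonable choices when the histogram has an empty valley between two modes.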

Two-stage imputation method to handle missing data for categorical response variable

  • Jong-Min Kim;Kee-Jae Lee;Seung-Joo Lee
    • Communications for Statistical Applications and Methods / v.30 no.6 / pp.577-587 / 2023
  • Conventional categorical data imputation techniques, such as mode imputation, often encounter issues related to overestimation. If the variable has too many categories, the multinomial logistic regression imputation method may be infeasible due to computational limitations. To address these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify significant variables for the target categorical variable. In the second stage, we use these selected variables as predictors: logistic regression imputes missing data in binary variables, polytomous regression imputes missing data in categorical variables, and predictive mean matching imputes missing data in quantitative variables. Through analysis of both asymmetric and non-normal simulated and real data, we demonstrate that the two-stage imputation method outperforms imputation methods lacking variable selection, as evidenced by accuracy measures. In the analysis of real survey data, we also demonstrate that the proposed two-stage imputation method surpasses the current imputation approach in terms of accuracy.

A Study on Bandwith Selection Based on ASE for Nonparametric Regression Estimator

  • Kim, Tae-Yoon
    • Journal of the Korean Statistical Society / v.30 no.1 / pp.21-30 / 2001
  • Suppose we observe a set of data $(X_1, Y_1), \ldots, (X_n, Y_n)$ and use the Nadaraya-Watson regression estimator to estimate $m(x) = E(Y \mid X = x)$. In this article, the bandwidth selection problem for the Nadaraya-Watson regression estimator is investigated. In particular, a cross-validation method based on the average square error (ASE) is considered. The theoretical results include a central limit theorem that quantifies the convergence rate of the bandwidth selector.
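
The setting of this abstract can be illustrated with a small sketch of the Nadaraya-Watson estimator and a leave-one-out cross-validation score, which is a common data-driven stand-in for the ASE. The Gaussian kernel, the candidate grid, and all function names here are assumptions for illustration; the abstract does not specify the paper's kernel or its exact criterion.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate of m(x0) with bandwidth h.

    If `skip` is an index, that observation is left out of the fit.
    Returns NaN when every kernel weight underflows to zero.
    """
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = gaussian_kernel((x0 - x) / h)
        num += w * y
        den += w
    return num / den if den > 0.0 else float("nan")


def cv_score(xs, ys, h):
    """Leave-one-out average squared prediction error for bandwidth h."""
    total = 0.0
    for i in range(len(xs)):
        pred = nw_estimate(xs[i], xs, ys, h, skip=i)
        if math.isnan(pred):
            return math.inf  # bandwidth too small to produce a prediction
        total += (ys[i] - pred) ** 2
    return total / len(xs)


def select_bandwidth(xs, ys, candidates):
    """Pick the candidate bandwidth with the smallest cross-validation score."""
    return min(candidates, key=lambda h: cv_score(xs, ys, h))
```

A call such as `select_bandwidth(xs, ys, [0.05, 0.5, 5.0])` returns one of the supplied candidates; a finer grid or a numerical optimizer could replace the grid search.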


Efficient Controlled Selection

  • Ryu, Jea-Bok;Lee, Seung-Joo
    • Communications for Statistical Applications and Methods / v.4 no.1 / pp.151-159 / 1997
  • In sample surveys, we expect to select preferred samples that reduce the survey cost and increase the precision of estimators. Goodman and Kish (1950) introduced controlled selection as a method of sample selection that increases the probability of drawing preferred samples while decreasing the probability of drawing nonpreferred samples. In this paper, we obtain controlled plans using the maximum entropy principle, and for the case where the order of nonpreferred samples is considered, we propose an algorithm to obtain a controlled plan.


Bias Reduction in Split Variable Selection in C4.5

  • Shin, Sung-Chul;Jeong, Yeon-Joo;Song, Moon Sup
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.627-635 / 2003
  • In this short communication we discuss the bias problem of C4.5 in split variable selection and suggest a method to reduce the variable selection bias among categorical predictor variables. A penalty proportional to the number of categories is applied to the splitting criterion gain of C4.5. The results of empirical comparisons show that the proposed modification of C4.5 reduces the size of classification trees.
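
The idea of penalizing the splitting criterion by the number of categories can be sketched as below. The abstract does not give the exact form of the penalty, so subtracting `penalty_weight * num_categories` from the plain information gain, the default weight of 0.05, and all function names are hypothetical choices for illustration only.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy from splitting on a categorical predictor."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def penalized_gain(values, labels, penalty_weight=0.05):
    """Information gain minus a penalty proportional to the number of categories.

    penalty_weight is a hypothetical tuning constant, not a value from the paper.
    """
    num_categories = len(set(values))
    return information_gain(values, labels) - penalty_weight * num_categories
```

With labels `["a", "a", "b", "b"]`, a two-category predictor and an identifier-like four-category predictor both reach the maximal raw gain, but the penalized score ranks the two-category predictor higher, which is the kind of bias correction the abstract describes.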

A LOWER BOUND ON THE PROBABILITY OF CORRECT SELECTION FOR TWO-STAGE SELECTION PROCEDURE

  • Kim, Soon-Ki
    • Journal of the Korean Statistical Society / v.21 no.1 / pp.27-34 / 1992
  • This paper provides a method of obtaining a lower bound on the probability of correct selection for a two-stage selection procedure. The resulting lower bound sharpens that by Tamhane and Bechhofer (1979) for the normal means problem with a common known variance. The design constants associated with the lower bound are computed and the results of the performance comparisons are given.


Computation and Smoothing Parameter Selection In Penalized Likelihood Regression

  • Kim Young-Ju
    • Communications for Statistical Applications and Methods / v.12 no.3 / pp.743-758 / 2005
  • This paper considers penalized likelihood regression with data from exponential families. The fast computation method applied to Gaussian data (Kim and Gu, 2004) is extended to non-Gaussian data through asymptotically efficient low-dimensional approximations, and a corresponding algorithm is proposed. Smoothing parameter selection is also explored for various exponential families, which extends the existing cross-validation method of Xiang and Wahba, evaluated only with Bernoulli data.

Interval Regression Models Using Variable Selection

  • Choi Seung-Hoe
    • Communications for Statistical Applications and Methods / v.13 no.1 / pp.125-134 / 2006
  • This study confirms that, in the interval regression models proposed by Tanaka et al. (1987), the regression model for one endpoint of the interval outputs is not identical to that for the other endpoint, and it constructs interval regression models using the best regression model given by variable selection. This paper also suggests a method that minimizes the sum of the lengths of the symmetric differences between observed and predicted interval outputs in order to estimate the interval regression coefficients in the proposed model. Some examples show that the interval regression model proposed in this study is more accurate than that introduced by Inuiguchi et al. (2001).
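
The fitting criterion described in this abstract, the summed length of the symmetric differences between observed and predicted intervals, can be computed as in the sketch below. Representing each interval as a `(lower, upper)` tuple and the function names are assumptions for illustration; the estimation procedure that minimizes this quantity is not shown.

```python
def symmetric_difference_length(observed, predicted):
    """Length of the symmetric difference of two closed intervals.

    Each interval is a (lower, upper) pair with lower <= upper.
    """
    (a, b), (c, d) = observed, predicted
    overlap = max(0.0, min(b, d) - max(a, c))
    # |A| + |B| - 2 * |A intersect B|
    return (b - a) + (d - c) - 2.0 * overlap


def total_symmetric_difference(observed_intervals, predicted_intervals):
    """Sum of symmetric-difference lengths over paired intervals."""
    return sum(
        symmetric_difference_length(o, p)
        for o, p in zip(observed_intervals, predicted_intervals)
    )
```

For example, the observed interval `(1, 4)` and the predicted interval `(2, 6)` overlap on `[2, 4]`, so the symmetric difference consists of `[1, 2)` and `(4, 6]`, with total length 3.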

Hierarchical Bayesian Inference of Binomial Data with Nonresponse

  • Han, Geunshik;Nandram, Balgobin
    • Journal of the Korean Statistical Society / v.31 no.1 / pp.45-61 / 2002
  • We consider the problem of estimating binomial proportions in the presence of nonignorable nonresponse using the Bayesian selection approach. Inference is sampling based and Markov chain Monte Carlo (MCMC) methods are used to perform the computations. We apply our method to study doctor visits data from the Korean National Family Income and Expenditure Survey (NFIES). The ignorable and nonignorable models are compared to Stasny's method (1991) by measuring the variability from the Metropolis-Hastings (MH) sampler. The results show that both models work very well.

Variable Selection Based on Direction Vectors

  • Kyungmee Choi
    • Communications for Statistical Applications and Methods / v.5 no.1 / pp.25-33 / 1998
  • We review a multivariate version of Kendall's tau based on direction vectors of observations. Using this statistic, we propose an analog of the forward variable selection method, which selects a set of independent variables for further study in building the eventual predictive model. This method assumes neither a distribution for the observations nor a linear model, and it is robust to outliers, with high asymptotic efficiency relative to the parametric Pearson correlation coefficient.
