• Title/Summary/Keyword: informative sampling


Mean estimation of small areas using penalized spline mixed-model under informative sampling

  • Chytrasari, Angela N.R.;Kartiko, Sri Haryatmi;Danardono, Danardono
    • Communications for Statistical Applications and Methods / v.27 no.3 / pp.349-363 / 2020
  • The penalized spline is a suitable nonparametric approach for estimating the mean model in small areas. However, published applications of the approach under informative sampling are uncommon. We propose a semiparametric mixed model using penalized splines under informative sampling to estimate small area means. The response variable is explained in terms of a mean model, an informative sample effect, an area random effect, and a unit error. We approximate the mean model by a penalized spline and use a penalized spline function of the inclusion probability to account for the informative sample effect. We determine the best unbiased estimators for the model coefficients and derive the restricted maximum likelihood estimators for the variance components. A simulation study shows a decrease in the average absolute bias produced by the proposed model. The root mean square error also decreased, except in some quadratic cases. Using a linear or a quadratic penalized spline to approximate the function of the inclusion probability yields no significant difference in the distribution of the root mean square error, except for a few smaller samples.
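
The four components named in the abstract above can be written out as a model equation. The following is only a sketch under assumed notation, not the authors' exact formulation: the symbols $m$, $g$, $\pi_{ij}$, $u_i$, $e_{ij}$, the normality of the random terms, and the truncated power basis with knots $\kappa_k$ are all assumptions introduced here for illustration.

```latex
% Assumed notation: unit j in small area i, covariate x_{ij},
% inclusion probability \pi_{ij}.
\begin{align}
  y_{ij} &= m(x_{ij}) + g(\pi_{ij}) + u_i + e_{ij},
  \qquad u_i \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2), \\
  % One common penalized-spline choice (an assumption here):
  % a degree-p truncated power basis with K knots.
  m(x) &= \beta_0 + \beta_1 x + \dots + \beta_p x^p
          + \sum_{k=1}^{K} \gamma_k \,(x - \kappa_k)_+^{p},
  \qquad (t)_+ = \max(t, 0).
\end{align}
```

Under this reading, $g$ would be given a similar spline expansion in $\pi_{ij}$ (linear for $p = 1$, quadratic for $p = 2$), and the penalty on the knot coefficients $\gamma_k$ is what lets them be treated as random effects in a mixed model.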

Re-SSS: Rebalancing Imbalanced Data Using Safe Sample Screening

  • Shi, Hongbo;Chen, Xin;Guo, Min
    • Journal of Information Processing Systems / v.17 no.1 / pp.89-106 / 2021
  • Different samples can have different effects on learning support vector machine (SVM) classifiers. To rebalance an imbalanced dataset, it is reasonable to reduce non-informative samples and add informative samples for learning classifiers. Safe sample screening can identify a subset of the non-informative samples while retaining the informative ones. This study developed a resampling algorithm for Rebalancing imbalanced data using Safe Sample Screening (Re-SSS), which consists of selecting Informative Samples (Re-SSS-IS) and rebalancing via a Weighted SMOTE (Re-SSS-WSMOTE). Re-SSS-IS selects informative samples from the majority class and determines a suitable regularization parameter for the SVM, while Re-SSS-WSMOTE generates informative minority samples. Both Re-SSS-IS and Re-SSS-WSMOTE are based on safe sample screening. The experimental results show that Re-SSS can effectively improve classification performance on imbalanced classification problems.

A Study on the Role of Pivots in Bayesian Statistics

  • Hwang, Hyungtae
    • Communications for Statistical Applications and Methods / v.9 no.1 / pp.221-227 / 2002
  • The concept of a pivot has been widely used in classical inference. In this paper, pivotal quantities are used to prove that, for location-scale parameter models under non-informative prior distributions, Bayesian inference arrives at the same results as classical inference. Some theorems are proposed in which the posterior distribution and the sampling distribution of a pivotal quantity coincide. The theorems are applied illustratively to some statistical models.
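
A standard textbook case illustrates the kind of coincidence the abstract describes. It is offered as a worked example under stated assumptions (normal data, known $\sigma$, flat prior on $\mu$) and is not taken from the paper's own theorems.

```latex
% Assumptions: X_1, ..., X_n i.i.d. N(\mu, \sigma^2) with \sigma known,
% and the non-informative (flat) prior \pi(\mu) \propto 1.
\begin{align}
  T &= \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma}
  && \text{(pivotal quantity)} \\
  T \mid \mu &\sim N(0, 1)
  && \text{(sampling distribution: } \bar{X} \text{ random, } \mu \text{ fixed)} \\
  \mu \mid x_1, \dots, x_n &\sim N\!\left(\bar{x}, \tfrac{\sigma^2}{n}\right)
  \;\Longrightarrow\; T \mid x_1, \dots, x_n \sim N(0, 1)
  && \text{(posterior: } \mu \text{ random, } \bar{x} \text{ fixed)}
\end{align}
```

Because $T$ has the same $N(0,1)$ law in both views, the $1-\alpha$ credible interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is numerically identical to the classical confidence interval.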

Estimation using informative sampling technique when response rate follows exponential function of variable of interest (응답률이 관심변수의 지수함수를 따를 경우 정보적 표본설계 기법을 이용한 모수추정)

  • Chung, Hee Young;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.30 no.6 / pp.993-1004 / 2017
  • To improve the accuracy of sample survey estimation, stratified sampling is generally used, with the sample in each stratum selected using the same sample weight. However, the weight should be adjusted to reflect the response rate if the response rate is affected by the value of the variable of interest. If the variable of interest has a linear relationship with continuous auxiliary variables, it may also be more effective to adjust the weights by subdividing the stratum rather than using the same weight. In this study, we propose a method to increase the accuracy of estimation using an informative sampling design technique when the response rate is an exponential function of the variable of interest and the variable of interest has a linear relationship with the auxiliary variable. Simulation results show the superiority of the proposed method.

A study on the determination of substrata using the information of exponential response rate by simulation studies (모의실험을 기반으로 지수형 응답률 보정을 위한 세부 층 결정에 관한 연구)

  • Min, Joo-Won;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.31 no.5 / pp.621-636 / 2018
  • Research on the application of the informative sampling technique has been conducted in order to reduce the influence of non-response. Chung and Shin (Korean Journal of Applied Statistics, 30, 993-1004, 2017) showed that estimation accuracy improves when exponential response rate information is used for parameter estimation, provided that the errors in the superpopulation model follow a normal distribution. However, their method divides the stratum into equally spaced substrata to obtain the sample weights of the informative sampling technique, and they show that the accuracy of the estimation improves as the number of substrata increases. In this study, for a given total sample size, the optimal substratum boundary points are calculated using equal spacing, quantiles, and the LH algorithm, and the results of these methods are compared through simulation. We also study criteria for determining the number of substrata and the substratum boundaries that can be used in practice with various types of auxiliary variable distributions.

Removing non-informative features that weaken class separability (클래스 구분력이 없는 특징 소거법)

  • Lee, Jae-Seong;Kim, Dae-Won
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.59-62 / 2007
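  • In this paper, we propose a method for imbalanced and under-sampled biological data that removes features with no class-discriminating power, so that FLDA and various other methodologies can be applied afterwards. The proposed algorithm provides a way to avoid the problem of existing methodologies that determine the shape of a class from its mean and variance, and it overcomes the problem that features selected with an emphasis on class separability tend to be highly correlated with one another. As a result, the feature set selected by the algorithm consists of features that have low mutual correlation and high class separability.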


A Study on Incremental Learning Model for Naive Bayes Text Classifier (Naive Bayes 문서 분류기를 위한 점진적 학습 모델 연구)

  • 김제욱;김한준;이상구
    • The Journal of Information Technology and Database / v.8 no.1 / pp.95-104 / 2001
  • In the text classification domain, labeling the training documents is an expensive process because it requires human expertise and is a tedious, time-consuming task. Therefore, it is important to reduce the manual labeling of training documents while improving the text classifier. Selective sampling, a form of active learning, reduces the number of training documents that need to be labeled by examining the unlabeled documents and selecting the most informative ones for manual labeling. We apply this methodology to Naive Bayes, a classifier renowned as a successful method in text classification. One of the most important issues in selective sampling is determining the criterion for selecting training documents from the large pool of unlabeled documents. In this paper, we propose two measures for this criterion: the Mean Absolute Deviation (MAD) and the entropy measure. The experimental results, using the Reuters-21578 corpus, show that the proposed learning method improves the Naive Bayes text classifier more than existing methods.
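
The selection step described above can be illustrated with a generic entropy-based uncertainty criterion. This is a minimal sketch, not the paper's exact measure: the function names `entropy` and `select_most_informative`, the toy `pool` of class posteriors, and the rule "pick the highest-entropy documents" are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete class-posterior distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_informative(posteriors, k):
    """Return indices of the k unlabeled documents whose predicted class
    posteriors have the highest entropy (i.e. the classifier is least sure)."""
    ranked = sorted(
        range(len(posteriors)),
        key=lambda i: entropy(posteriors[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical posteriors P(class | document) from a classifier
# for four unlabeled documents and two classes.
pool = [
    [0.98, 0.02],  # confident prediction: little to gain from labeling
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
]

chosen = select_most_informative(pool, 2)
print(chosen)
```

In an active-learning loop, the chosen documents would be sent for manual labeling, added to the training set, and the classifier retrained before the next round of selection.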


Impact of Resourcefulness and Communication Style on Nursing Performance in Hospital Nurses (간호사의 자원동원성, 의사소통유형이 간호업무성과에 미치는 영향)

  • Lee, Hea-Shoon;Oak, Ji-Won
    • Journal of Korean Academy of Fundamentals of Nursing / v.19 no.2 / pp.253-260 / 2012
  • Purpose: This study was done to identify the impact of resourcefulness and communication style on nursing performance in nurses working in hospitals. Method: Through a convenience sampling method, 312 nurses were recruited between July 4 and 17, 2011. Data were collected using a questionnaire, which included items on work-related characteristics, resourcefulness, communication style, and nursing performance. Data were analyzed using t-test, ANOVA, Scheffe test, Pearson correlation coefficient, and hierarchical regression analysis. Results: The major findings of this study were as follows: 1) There were significant relationships between nursing performance and resourcefulness ($p$<.001), informative communication style ($p$<.001), affiliativeness communication style ($p$<.001), and dominance communication style ($p$<.001). 2) Nursing performance was significantly associated with career in current department, resourcefulness, informative communication style, affiliativeness communication style, and dominance communication style in capability, which explained 45.6% of the variance in nursing performance. Conclusion: The results of this study demonstrate a relationship between resourcefulness, communication style, and nursing performance in hospital nurses, indicating the need to use the study results to plan programs to prompt nurses in their use of resourcefulness and communication style in nursing care.

A study to improve the accuracy of the naive propensity score adjusted estimator using double post-stratification method (나이브 성향점수보정 추정량의 정확성 향상을 위한 이중 사후층화 방법 연구)

  • Leesu Yeo;Key-Il Shin
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.547-559 / 2023
  • Proper handling of nonresponse in sample surveys improves the accuracy of parameter estimation. Various studies have been conducted to properly handle MAR (missing at random) or MCAR (missing completely at random) nonresponse. When nonresponse occurs, the PSA (propensity score adjusted) estimator is commonly used as a mean estimator. The PSA estimator is known to be unbiased when known sample weights and properly estimated response probabilities are used. However, for MNAR (missing not at random) nonresponse, which is affected by the value of the study variable, it is very difficult to obtain accurate response probabilities, so the PSA estimator may be biased. Chung and Shin (2017, 2022) proposed a post-stratification method to improve the accuracy of mean estimation when MNAR nonresponse occurs under a non-informative sample design. In this study, we propose a double post-stratification method to improve the accuracy of the naive PSA estimator for MNAR nonresponse under an informative sample design. In addition, we perform simulation studies to confirm the superiority of the proposed method.
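
The role of the sample weights and response probabilities in a propensity score adjusted mean can be made concrete with a small sketch. It assumes a ratio-type (Hajek-style) form in which each respondent's weight is divided by its response probability; the function name `psa_mean` and the toy numbers are hypothetical, and the paper's naive PSA estimator and double post-stratification step may differ in detail.

```python
def psa_mean(y, weights, response_probs):
    """Propensity-score-adjusted mean over respondents (assumed ratio form):
    sum(w_i / p_i * y_i) / sum(w_i / p_i)."""
    if not (len(y) == len(weights) == len(response_probs)):
        raise ValueError("y, weights and response_probs must have equal length")
    if any(p <= 0.0 or p > 1.0 for p in response_probs):
        raise ValueError("response probabilities must lie in (0, 1]")
    # Inflate each design weight by the inverse of the response probability.
    adjusted = [w / p for w, p in zip(weights, response_probs)]
    return sum(a * yi for a, yi in zip(adjusted, y)) / sum(adjusted)


# Hypothetical respondents: units with large y respond more readily,
# so the unadjusted respondent mean is pulled upward.
y = [10.0, 20.0, 30.0]
w = [2.0, 2.0, 2.0]      # known sample (design) weights
p = [0.5, 0.5, 1.0]      # estimated response probabilities

naive = sum(y) / len(y)  # ignores nonresponse
est = psa_mean(y, w, p)
print(naive, est)
```

The adjustment only removes the bias when the response probabilities are accurate, which is exactly what is hard to guarantee under MNAR nonresponse.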

Bayesian Inference on Variance Components Using Gibbs Sampling with Various Priors

  • Lee, C.;Wang, C.D.
    • Asian-Australasian Journal of Animal Sciences / v.14 no.8 / pp.1051-1056 / 2001
  • Data on teat number for Landrace (L), Yorkshire (Y), crossbred of Landrace and Yorkshire (LY), and crossbred of Landrace, Yorkshire and Chinese indigenous Min Pig (LYM) were analyzed using Gibbs sampling. In Bayesian inference, flat priors and some informative priors were used to examine their influence on posterior estimates. The posterior mean estimates of heritabilities with flat priors were $0.661{\pm}0.035$ for L, $0.540{\pm}0.072$ for Y, $0.789{\pm}0.074$ for LY, and $0.577{\pm}0.058$ for LYM, and they did not differ (p>0.05) from the corresponding REML estimates. When inverse Gamma densities with a shape parameter of 4 were used as priors for the variance components, the posterior estimates still corresponded (p>0.05) to the REML estimates and to the mean estimates from Gibbs sampling with flat priors. However, when inverse Gamma densities with a shape parameter of 10 were used, some posterior estimates differed (p<0.10) from the REML estimates and/or from other Gibbs mean estimates. The use of a moderate degree of belief was influential on the posterior estimates, especially for Y and LY, where data sizes were small. When the data size is small, REML estimates of variance components have unknown distributions. On the other hand, the Bayesian approach gives exact posterior densities of variance components. However, when the data size is small and prior knowledge is lacking, researchers should be careful with even moderate priors.