• Title/Summary/Keyword: multivariate stratification

Search Result 29

Multivariate Stratification under Consideration of Outliers (이상점을 고려한 다변량 층화)

  • Park, Jin-Woo;Yun, Seok-Hoon
    • The Korean Journal of Applied Statistics, v.21 no.3, pp.377-385, 2008
  • Most sample surveys conducted by statistics-producing agencies are multipurpose surveys that collect several distinct items through a single sample. In a multipurpose sample design, stratification tends to be very complex because the stratification variables, which are both multivariate and heterogeneous, must be considered jointly. In this paper we point out an outlier effect in multivariate stratification based on the K-means clustering method and propose considering outliers before the stratification step. We also show the empirical effect of stratification with outliers taken into account through a case study of the sample design for the Rural Living Indicators.
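
The outlier effect described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the one-dimensional data, the evenly spaced starting centers, and the threshold rule that sets the outlier aside are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on 1-D data with deterministic, evenly spaced starting centers (k >= 2)."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign every value to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def strata(values, k):
    """Group the values into k strata according to the k-means labels."""
    labels = kmeans_1d(values, k)
    return [[v for v, lab in zip(values, labels) if lab == j] for j in range(k)]


population = [1, 2, 3, 10, 11, 12, 100]  # hypothetical data with one outlier

# With the outlier left in, it takes a stratum for itself.
with_outlier = strata(population, 2)

# Hypothetical rule: set values above 50 aside before clustering.
regular_units = [v for v in population if v <= 50]
without_outlier = strata(regular_units, 2)

print(with_outlier)
print(without_outlier)
```

In this toy population the single extreme value occupies one of the two strata, so the remaining units are not separated at all; once it is set aside, the two natural groups are recovered.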

Simple Compromise Strategies in Multivariate Stratification

  • Park, Inho
    • Communications for Statistical Applications and Methods, v.20 no.2, pp.97-105, 2013
  • Stratification (among other applications) is a popular technique used in survey practice to improve the accuracy of estimators. Its full potential benefit can be gained by the effective use of auxiliary variables in stratification related to survey variables. This paper focuses on the problem of stratum formation when multiple stratification variables are available. We first review a variance reduction strategy in the case of univariate stratification. We then discuss its use for multivariate situations in convenient and efficient ways using three methods: compromised measures of size, principal components analysis and a K-means clustering algorithm. We also consider three types of compromising factors for the data when using these three methods. Finally, we compare their efficiency using data from the MU281 Swedish municipality population.

Multivariate Stratification Method for the Multipurpose Sample Survey : A Case Study of the Sample Design for Fisher Production Survey (다목적 표본조사를 위한 다변량 층화 : 어업비계통생산량조사를 위한 표본설계 사례)

  • Park, Jin-Woo;Kim, Young-Won;Lee, Seok-Hoon;Shin, Ji-Eun
    • Survey Research, v.9 no.1, pp.69-85, 2008
  • Stratification is a feature of the majority of field sample designs. This paper considers a multivariate stratification strategy for a multipurpose sample survey with several auxiliary variables. In a multipurpose survey, the stratification procedure is very complicated because we have to simultaneously consider the efficiencies of stratification for several variables of interest. We propose a stratification strategy based on factor analysis and cluster analysis using several stratification variables. To improve the efficiency of stratification, we first select the stratification variables by factor analysis, and then apply the K-means clustering algorithm to the formation of strata. An application of the stratification strategy in the sampling design for the Fisher Production Survey is discussed, and it turns out that the variances of the estimators are significantly smaller than those obtained by simple random sampling.
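
The comparison with simple random sampling mentioned in this abstract can be sketched using the textbook variance formulas for a sample mean. The population values and the sample size are hypothetical, and proportional allocation is an assumption; the abstract does not state which allocation the survey used.

```python
from statistics import variance


def var_srs_mean(y, n):
    """Variance of the sample mean under simple random sampling without replacement."""
    N = len(y)
    return (1 - n / N) * variance(y) / n


def var_stratified_mean(strata, n):
    """Variance of the stratified mean under proportional allocation (n_h = n * N_h / N)."""
    N = sum(len(s) for s in strata)
    f = n / N  # common sampling fraction in every stratum
    return (1 - f) / n * sum(len(s) / N * variance(s) for s in strata)


# Hypothetical population already split into two homogeneous strata.
strata = [[1, 2, 3], [10, 11, 12]]
population = [y for s in strata for y in s]

v_srs = var_srs_mean(population, n=4)
v_str = var_stratified_mean(strata, n=4)
print(v_srs, v_str)
```

Because the strata are internally homogeneous, the within-stratum variances are small and the stratified variance comes out far below the simple random sampling variance.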

A Study on the Use of Cluster Analysis for Multivariate and Multipurpose Stratification (군집분석을 이용한 다목적 조사의 층화에 관한 연구)

  • Park, Jin-Woo;Yun, Seok-Hoon;Kim, Jin-Heum;Jeong, Hyeong-Chul
    • The Korean Journal of Applied Statistics, v.20 no.2, pp.387-394, 2007
  • This paper considers several stratification strategies for multivariate and multipurpose surveys with several quantitative stratification variables. We propose three methods of stratification based on, respectively, the cumulative square root of frequency method, which is the most popular one in univariate stratification, cluster analysis, and factor analysis followed by cluster analysis. We then compare the efficiency of those methods using the Dong-Eup-Myun data on the numbers of farming machines held, extracted from the 2001 Agricultural Census. It turned out that the method based on cluster analysis with factor analysis would be a relatively satisfactory strategy.
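
The cumulative square root of frequency rule named in this abstract can be sketched for a single variable as follows. This shows only the univariate rule; how the paper extends it to several variables is not stated in the abstract. The equal-width binning, the number of bins, and the data are assumptions.

```python
from math import sqrt


def cum_sqrt_f_boundaries(values, n_strata, n_bins):
    """Stratum boundaries from the cumulative sqrt(frequency) rule on equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    freq = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        freq[idx] += 1

    # Running total of sqrt(frequency) over the bins.
    cum, total = [], 0.0
    for f in freq:
        total += sqrt(f)
        cum.append(total)

    # Cut where the running total is closest to equal shares of the grand total.
    boundaries = []
    for h in range(1, n_strata):
        target = total * h / n_strata
        best = min(range(n_bins), key=lambda i: abs(cum[i] - target))
        boundaries.append(lo + (best + 1) * width)
    return boundaries


# Hypothetical right-skewed data: bin frequencies are 16, 9, 4 and 1.
data = [0.0] + [0.5] * 15 + [1.5] * 9 + [2.5] * 4 + [4.0]
two_strata = cum_sqrt_f_boundaries(data, n_strata=2, n_bins=4)
print(two_strata)
```

For skewed data the rule places the boundary low, so the dense lower range forms a narrow stratum and the sparse upper tail forms a wide one.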

A Post-stratified Estimation in Multivariate Stratified Sampling Surveys

  • Park, Jinwoo
    • Communications for Statistical Applications and Methods, v.6 no.3, pp.755-760, 1999
  • In multivariate stratified sampling surveys it is common to use a few stratification variables that are highly correlated with the important variables at the design stage. But there might be some secondary study variables which are not so highly correlated with those stratification variables. In that case it is not efficient to use, for the secondary variables, the same type of estimator as the one based on the important variables. A post-stratified estimator is proposed to increase the efficiency of estimation in the presence of secondary variables. The proposed method is illustrated with a set of fishery household population survey data.
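
As a rough illustration of post-stratification, the sketch below computes the standard textbook post-stratified mean, which weights each post-stratum's sample mean by its known population share. It is not necessarily the estimator proposed in the paper; the labels, shares, and sample values are hypothetical.

```python
from statistics import mean


def post_stratified_mean(sample, pop_shares):
    """Weight each post-stratum's sample mean by its known population share.

    sample: list of (post_stratum_label, value) pairs
    pop_shares: dict mapping each label to its population proportion
    Assumes every post-stratum in pop_shares appears in the sample.
    """
    groups = {}
    for label, y in sample:
        groups.setdefault(label, []).append(y)
    return sum(pop_shares[g] * mean(ys) for g, ys in groups.items())


# Hypothetical sample in which post-stratum "b" is under-represented.
sample = [("a", 10), ("a", 12), ("a", 14), ("b", 30)]
shares = {"a": 0.5, "b": 0.5}

naive = mean(y for _, y in sample)
adjusted = post_stratified_mean(sample, shares)
print(naive, adjusted)
```

The unweighted mean is pulled toward the over-represented group, while the post-stratified mean restores the known population balance.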

Feedwater Flowrate Estimation Based on the Two-step De-noising Using the Wavelet Analysis and an Autoassociative Neural Network

  • Gyunyoung Heo;Park, Seong-Soo;Chang, Soon-Heung
    • Nuclear Engineering and Technology, v.31 no.2, pp.192-201, 1999
  • This paper proposes an improved signal processing strategy for accurate feedwater flowrate estimation in nuclear power plants. It is generally known that ∼2% thermal power errors occur due to fouling phenomena in feedwater flowmeters. In the proposed strategy, the noise in the feedwater flowrate signal is classified into rapidly varying noise and gradually varying noise according to its characteristics in the frequency domain. The estimation precision is enhanced by introducing a low-pass filter based on wavelet analysis against rapidly varying noise, and an autoassociative neural network that corrects only gradually varying noise. A modified multivariate stratification sampling method using the concept of time stratification and the MAXIMIN criterion is developed to overcome the shortcomings of general random sampling. In addition, a multi-stage robust training method is developed to increase the quality and reliability of the training signals. Validation tests were carried out using simulated data from a micro-simulator. In these tests, the proposed methodology removed rapidly varying and gradually varying noise in their respective de-noising steps, and the root mean square error of the initial noisy signals decreased from 5.54% to 0.674% after de-noising. These results indicate that the reactor thermal power can be estimated more precisely by adopting this strategy.

Stratification Method Using κ-Spatial Medians Clustering (κ-공간중위 군집방법을 활용한 층화방법)

  • Son, Soon-Chul;Jhun, Myoung-Shic
    • The Korean Journal of Applied Statistics, v.22 no.4, pp.677-686, 2009
  • Stratification of a population is widely used to improve the efficiency of estimation in a sample survey. However, it causes several problems when some variables contain outliers. To overcome these problems, Park and Yun (2008) proposed a rather subjective method, which finds outliers before $\kappa$-means clustering for stratification. In this study, we propose the $\kappa$-spatial medians clustering method, which is more robust than the $\kappa$-means clustering method and does not require finding outliers in advance. We investigate the characteristics of the proposed method through the case study used in Park and Yun (2008) and confirm the efficiency of the proposed method.
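
The robustness claim in this abstract rests on the spatial median, which can be sketched with Weiszfeld's iteration. This shows only the median of a single group of points, not the full clustering method of the paper; the data, the iteration limit, and the tolerance are assumptions.

```python
from math import dist


def spatial_median(points, iters=200, tol=1e-9):
    """Approximate the spatial (geometric) median with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    est = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        # Inverse-distance weights; the floor avoids division by zero at a data point.
        weights = [1.0 / max(dist(p, est), 1e-12) for p in points]
        total = sum(weights)
        new = [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        if dist(new, est) < tol:
            return new
        est = new
    return est


# Hypothetical data: four nearby points and one distant outlier.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]
centroid = [sum(p[d] for p in points) / len(points) for d in range(2)]
median = spatial_median(points)
print(centroid, median)
```

The centroid is dragged far toward the outlier, whereas the spatial median stays inside the cluster of ordinary points. That is why a clustering rule built on spatial medians is less sensitive to outliers than one built on means.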

Complex sample design effects and inference for Korea National Health and Nutrition Examination Survey data (국민건강영양조사 자료의 복합표본설계효과와 통계적 추론)

  • Chung, Chin-Eun
    • Journal of Nutrition and Health, v.45 no.6, pp.600-612, 2012
  • Nutritional researchers worldwide are using large-scale sample survey methods to study nutritional health epidemiology and services utilization in general, non-clinical populations. This article provides a review of important statistical methods and software that apply to descriptive and multivariate analysis of data collected in sample surveys, such as national health and nutrition examination surveys. A comparative analysis of data from the Korea National Health and Nutrition Examination Survey (KNHANES) is used to illustrate analytical procedures and design effects for survey estimates of population statistics, model parameters, and test statistics. The article focuses on the following points: approaches to analyzing sample survey data, the software tools available to perform these analyses, and the correct survey analysis methods needed to interpret survey data. The latest developments in software tools for the analysis of complex sample survey data are covered, and empirical examples are presented that illustrate the impact of survey sample design effects on the parameter estimates, test statistics, and significance probabilities (p values) for univariate and multivariate analyses.
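
The design effect discussed in this abstract is conventionally defined as the ratio of an estimator's variance under the complex design to its variance under simple random sampling with the same sample size. The sketch below uses that standard definition together with Kish's approximation for equal-sized clusters. All numeric inputs are hypothetical and are not taken from KNHANES.

```python
def design_effect(var_complex, var_srs):
    """Ratio of the variance under the complex design to the variance under SRS."""
    return var_complex / var_srs


def kish_design_effect(cluster_size, icc):
    """Kish's approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, deff):
    """Size of a simple random sample that would give the same precision."""
    return n / deff


# Hypothetical variances of an estimated proportion.
deff = design_effect(var_complex=0.0004, var_srs=0.0002)
n_eff = effective_sample_size(n=8000, deff=deff)
deff_kish = kish_design_effect(cluster_size=11, icc=0.1)
print(deff, n_eff, deff_kish)
```

A design effect of 2 means the clustered sample carries the information of a simple random sample half its size. This is why ignoring the design understates standard errors and overstates significance.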

Feedback on Baseline Use of Staging Images is Important to Improve Image Overuse with Newly Diagnosed Prostate Cancer Patients

  • Sawazaki, Harutake;Sengiku, Atsushi;Imamura, Masaaki;Takahashi, Takeshi;Kobayashi, Hisato;Ogura, Keiji
    • Asian Pacific Journal of Cancer Prevention, v.15 no.4, pp.1707-1710, 2014
  • Background: The objective of this study was to evaluate baseline use and positive rates of staging images (bone scan, CT) in newly diagnosed patients with prostate cancer (PCa) and to reduce staging image overuse. Materials and Methods: This retrospective study covered a consecutive series of patients with PCa who underwent staging imaging at our institution between 2006 and 2011. Various clinical and pathological variables (age, PSA, biopsy Gleason score, clinical T stage, positive biopsy core rate) were evaluated by multivariate logistic regression analysis for their ability to predict a positive staging image. All patients were stratified according to the NCCN risk stratification and positive rates were compared in each risk group. Results: 410 patients (100%) underwent a bone scan and 315 patients (76.8%) underwent a CT scan. Some 51 patients (12.4%) had a positive bone scan, with clinical T3 and T4 being significant independent predictors. Positive bone scan rates for low-, intermediate-, high-, and very high-risk groups were 0%, 0%, 8.25%, and 56.6%. Some 59 (18.7%) patients had a positive CT scan, with elevated PSA and clinical T3, T4 as significant independent predictors. Low-, intermediate-, high- and very high-risk group rates were 0%, 0%, 13.8% and 80.0%. Conclusions: The incidence of positive staging images in the low- and intermediate-risk groups was reasonably low. Following feedback on these results, staging in low- and intermediate-risk groups could be omitted.

A New Inflammatory Prognostic Index, Based on C-reactive Protein, the Neutrophil to Lymphocyte Ratio and Serum Albumin is Useful for Predicting Prognosis in Non-Small Cell Lung Cancer Cases

  • Dirican, Nigar;Dirican, Ahmet;Anar, Ceyda;Atalay, Sule;Ozturk, Onder;Bircan, Ahmet;Akkaya, Ahmet;Cakir, Munire
    • Asian Pacific Journal of Cancer Prevention, v.17 no.12, pp.5101-5106, 2016
  • Purpose: We aimed to establish an inflammatory prognostic index (IPI) in early and advanced non-small cell lung cancer (NSCLC) patients based on hematologic and biochemical parameters and to analyze its predictive value for NSCLC survival. Materials and Methods: A retrospective review of 685 patients with early and advanced NSCLC diagnosed between 2009 and 2014 was conducted with collection of clinical and laboratory data. The IPI was calculated as C-reactive protein $\times$ NLR (neutrophil/lymphocyte ratio)/serum albumin. Univariate and multivariate analyses were performed to assess the prognostic value of relevant factors. Results: The optimal cut-off value of IPI for overall survival (OS) stratification was determined to be 15. Totals of 334 (48.8%) and 351 (51.2%) patients were assigned to high and low IPI groups, respectively. Compared with low IPI, high IPI was associated with older age, greater tumor size, high lymph node involvement, distant metastases, advanced stage and poor performance status. Median OS was worse in the high IPI group (high vs low, 8.0 vs 34.0 months; HR, 3.5; p<0.001). Progression-free survival in patients with high vs low IPI was 6 months (95% CI: 5.3-6.6) and 14 months (95% CI: 12.1-15.8), respectively (HR, 2.4; p<0.001). On multivariate analysis, stage, performance status, lactate dehydrogenase and IPI were independent prognostic factors for OS. Subgroup analysis showed IPI was generally a significant prognostic factor in all clinical variables. Conclusion: The described IPI may be an inexpensive, easily accessible and independent prognostic index for NSCLC patients, useful for clinical practice.
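
The index defined in this abstract can be written out as a small calculation. The formula and the cut-off of 15 come from the abstract. The measurement units, the example laboratory values, and the choice to treat values strictly above the cut-off as "high" are assumptions, since the abstract does not specify them.

```python
def inflammatory_prognostic_index(crp, neutrophils, lymphocytes, albumin):
    """IPI = C-reactive protein x NLR / serum albumin, as stated in the abstract.

    Units are not given in the abstract, so inputs must follow the paper's units.
    """
    nlr = neutrophils / lymphocytes  # neutrophil to lymphocyte ratio
    return crp * nlr / albumin


CUTOFF = 15  # optimal cut-off for OS stratification reported in the abstract


def ipi_group(ipi, cutoff=CUTOFF):
    """Assumption: values strictly above the cut-off count as 'high'."""
    return "high" if ipi > cutoff else "low"


# Hypothetical laboratory values, for illustration only.
example_ipi = inflammatory_prognostic_index(
    crp=40.0, neutrophils=6.0, lymphocytes=2.0, albumin=4.0
)
print(example_ipi, ipi_group(example_ipi))
```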