• Title/Abstract/Keywords: multivariate stratification

Multivariate Stratification under Consideration of Outliers (이상점을 고려한 다변량 층화)

  • 박진우;윤석훈
    • 응용통계연구 / Vol. 21, No. 3 / pp. 377-385 / 2008
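  • Most sample surveys conducted by statistical agencies are multipurpose surveys that measure many different items through a single sample. In a multipurpose sample design the stratification variables are multivariate, and study variables with heterogeneous characteristics must be considered jointly, so stratification becomes very complex. This study points out the effect of outliers on multivariate stratification based on K-means clustering and proposes that outliers be taken into account in advance, at the stratification stage. The effect of stratification that accounts for outliers is demonstrated empirically using the sample design for the rural living indicators survey (농촌생활지표조사).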

Simple Compromise Strategies in Multivariate Stratification

  • Park, Inho
    • Communications for Statistical Applications and Methods / Vol. 20, No. 2 / pp. 97-105 / 2013
  • Stratification is a popular technique used in survey practice, among other applications, to improve the accuracy of estimators. Its full potential benefit can be gained by stratifying on auxiliary variables that are related to the survey variables. This paper focuses on the problem of stratum formation when multiple stratification variables are available. We first review a variance reduction strategy for univariate stratification. We then discuss convenient and efficient ways to extend it to multivariate situations using three methods: compromised measures of size, principal components analysis, and a K-means clustering algorithm. We also consider three types of compromising factors applied to the data when using these three methods. Finally, we compare their efficiency using data from the MU281 Swedish municipality population.

Multivariate Stratification Method for the Multipurpose Sample Survey : A Case Study of the Sample Design for Fisher Production Survey (다목적 표본조사를 위한 다변량 층화 : 어업비계통생산량조사를 위한 표본설계 사례)

  • 박진우;김영원;이석훈;신지은
    • 한국조사연구학회지:조사연구 / Vol. 9, No. 1 / pp. 69-85 / 2008
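  • Stratification is a representative way of using prior information at the sample design stage and is widely used in most nationwide sample designs. To maximize the efficiency of stratification, it is very important to choose stratification variables that suit the purpose of the survey. In a multipurpose survey that measures several study variables through a single sample, devising a stratification strategy with multivariate stratification variables is very complex. This study addresses stratification strategies for multipurpose surveys with a very large number of study variables. The statistical tools used for stratification are multivariate techniques such as factor analysis and cluster analysis: factor analysis is first used to select suitable stratification variables, and cluster analysis on those variables then forms the strata. Specifically, the study covers the stratification process in the sample design for the fishery production survey (어업비계통생산량조사) of the Ministry of Maritime Affairs and Fisheries (해양수산부).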

A Study on the Use of Cluster Analysis for Multivariate and Multipurpose Stratification (군집분석을 이용한 다목적 조사의 층화에 관한 연구)

  • 박진우;윤석훈;김진흠;정형철
    • 응용통계연구 / Vol. 20, No. 2 / pp. 387-394 / 2007
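  • This study addresses stratification in the sample design of multipurpose, multivariate surveys that measure several quantitative variables. Three stratification methods that use multivariate stratification variables are presented: applying the cumulative square root of frequency method, normally used with a single stratification variable, independently to each of several stratification variables; using cluster analysis; and using factor analysis together with cluster analysis. The efficiency of the three approaches is compared empirically using, as stratification variables, the number of farm machines of each type held in each dong, eup, and myeon as recorded in the 2001 Agricultural Census. The method that combines factor analysis and cluster analysis turned out to be relatively efficient.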
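
The first of the three methods above relies on the cumulative square root of frequency rule (often attributed to Dalenius and Hodges). The following is a minimal sketch of that rule for a single variable, not the authors' implementation. The function name, the equal-width binning, and the default number of bins are assumptions made for illustration.

```python
import math


def cum_sqrt_f_boundaries(values, n_strata, n_bins=50):
    """Return n_strata - 1 cut points from the cumulative sqrt(f) rule.

    Hypothetical helper: bins the data into equal-width classes, then cuts
    where the running sum of sqrt(class frequency) crosses equal shares.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; nothing to stratify")
    width = (hi - lo) / n_bins

    # Frequency of each equal-width bin (the maximum goes into the last bin).
    freq = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        freq[idx] += 1

    # Running sum of sqrt(frequency).
    cum = []
    total = 0.0
    for f in freq:
        total += math.sqrt(f)
        cum.append(total)

    # Place each boundary at the upper edge of the first bin whose running
    # sum reaches k / n_strata of the total.
    cuts = []
    for k in range(1, n_strata):
        target = total * k / n_strata
        for i, c in enumerate(cum):
            if c >= target:
                cuts.append(lo + (i + 1) * width)
                break
    return cuts
```

Applying the function to each stratification variable separately and cross-classifying the resulting intervals gives one possible reading of "independently applying" the rule to several variables; the abstract does not spell out this step.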

A Post-stratified Estimation in Multivariate Stratified Sampling Surveys

  • Park, Jinwoo
    • Communications for Statistical Applications and Methods / Vol. 6, No. 3 / pp. 755-760 / 1999
  • In multivariate stratified sampling surveys it is common to use, at the design stage, a few stratification variables that are highly correlated with the important study variables. There may, however, be secondary study variables that are not so highly correlated with those stratification variables. In that case it is not efficient to use the same type of estimator for the secondary variables as the one based on the important variables. A post-stratified estimator is proposed to increase efficiency in the presence of secondary variables. The proposed method is illustrated with a set of fishery household population survey data.
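
As a rough illustration of post-stratification, the sketch below computes the textbook post-stratified mean, a weighted sum of post-stratum sample means with known population shares as weights. It assumes a simple random sample and is not the estimator proposed in the paper, which works within a stratified design. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def post_stratified_mean(sample, pop_shares):
    """Textbook post-stratified mean under simple random sampling.

    sample: iterable of (post_stratum_label, y) pairs.
    pop_shares: dict mapping label -> known population share N_g / N.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, y in sample:
        sums[label] += y
        counts[label] += 1

    # Every post-stratum with a population share needs at least one
    # sampled unit, otherwise its mean cannot be estimated.
    empty = [g for g in pop_shares if counts[g] == 0]
    if empty:
        raise ValueError(f"no sampled units in post-strata: {empty}")

    # Weight each post-stratum sample mean by its population share.
    return sum(w * sums[g] / counts[g] for g, w in pop_shares.items())
```

For example, `post_stratified_mean([("a", 1), ("a", 3), ("b", 10)], {"a": 0.5, "b": 0.5})` averages the two post-stratum means 2 and 10 with equal weights.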

Feedwater Flowrate Estimation Based on the Two-step De-noising Using the Wavelet Analysis and an Autoassociative Neural Network

  • Gyunyoung Heo;Park, Seong-Soo;Chang, Soon-Heung
    • Nuclear Engineering and Technology / Vol. 31, No. 2 / pp. 192-201 / 1999
  • This paper proposes an improved signal processing strategy for accurate feedwater flowrate estimation in nuclear power plants. It is generally known that thermal power errors of about 2% occur due to fouling phenomena in feedwater flowmeters. In the proposed strategy, the noise in the feedwater flowrate signal is classified into rapidly varying noise and gradually varying noise according to its frequency-domain characteristics. Estimation precision is improved by applying a wavelet-based low pass filter to the rapidly varying noise and an autoassociative neural network that corrects only the gradually varying noise. A modified multivariate stratification sampling method, using the concept of time stratification and the MAXIMIN criterion, is developed to overcome the shortcomings of general random sampling. In addition, a multi-stage robust training method is developed to increase the quality and reliability of the training signals. Validation tests were carried out using simulated data from a micro-simulator. In these tests the proposed methodology removed rapidly varying and gradually varying noise in the respective de-noising steps, and the root mean square error of the initial noisy signals decreased from 5.54% to 0.674% after de-noising. These results indicate that the reactor thermal power can be estimated more precisely by adopting this strategy.

Stratification Method Using κ-Spatial Medians Clustering (κ-공간중위 군집방법을 활용한 층화방법)

  • 손순철;전명식
    • 응용통계연구 / Vol. 22, No. 4 / pp. 677-686 / 2009
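  • Stratifying the population, widely used in sample surveys, is one way to improve the efficiency of estimation, but it can cause various problems when a variable contains outliers. In particular, for multivariate data with outliers, the $\kappa$-means clustering method used for stratification is very sensitive to outliers and can reduce the efficiency of estimation. For stratifying multivariate data with outliers, this study proposes the $\kappa$-spatial medians clustering method, which is more robust than $\kappa$-means clustering and does not require a separate step to identify outliers. The stratification procedures are compared and examined through a case analysis of the same data as the earlier related study by 박진우 and 윤석훈 (2008), and their efficiency is compared through the variance of the estimators.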
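
To illustrate why a spatial median is less sensitive to outliers than a mean, the sketch below computes the spatial median of a set of points with the Weiszfeld iteration. This is only the centre-update step that a spatial-medians clustering would use in place of the cluster mean; it is not the authors' algorithm, and the function name, iteration limit, and tolerance are assumptions.

```python
import math


def spatial_median(points, n_iter=100, tol=1e-9):
    """Approximate the spatial median (minimiser of summed Euclidean
    distances) using the Weiszfeld iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]

    for _ in range(n_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            dist = math.dist(p, center)
            if dist < tol:
                # Simplification: skip points that coincide with the
                # current estimate to avoid dividing by zero.
                continue
            w = 1.0 / dist
            den += w
            for d in range(dim):
                num[d] += w * p[d]
        if den == 0.0:
            break
        new_center = [x / den for x in num]
        shift = math.dist(new_center, center)
        center = new_center
        if shift < tol:
            break
    return center
```

With the four corners of the unit square plus one distant point at (100, 100), the coordinate-wise mean is pulled out to (20.4, 20.4), while the spatial median stays inside the square.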

Complex sample design effects and inference for Korea National Health and Nutrition Examination Survey data (국민건강영양조사 자료의 복합표본설계효과와 통계적 추론)

  • 정진은
    • Journal of Nutrition and Health / Vol. 45, No. 6 / pp. 600-612 / 2012
  • Nutritional researchers worldwide use large-scale sample survey methods to study nutritional health epidemiology and services utilization in general, non-clinical populations. This article reviews important statistical methods and software for descriptive and multivariate analysis of data collected in sample surveys such as national health and nutrition examination surveys. A comparative analysis of data from the Korea National Health and Nutrition Examination Survey (KNHANES) illustrates analytical procedures and design effects for survey estimates of population statistics, model parameters, and test statistics. The article focuses on three points: how to approach the analysis of complex sample survey data, which software tools are available to perform these analyses, and which survey analysis methods are needed to interpret survey data correctly. The latest developments in software tools for complex sample survey data are covered, and empirical examples illustrate the impact of sample design effects on parameter estimates, test statistics, and significance probabilities (p values) for univariate and multivariate analyses.

Feedback on Baseline Use of Staging Images is Important to Improve Image Overuse with Newly Diagnosed Prostate Cancer Patients

  • Sawazaki, Harutake;Sengiku, Atsushi;Imamura, Masaaki;Takahashi, Takeshi;Kobayashi, Hisato;Ogura, Keiji
    • Asian Pacific Journal of Cancer Prevention / Vol. 15, No. 4 / pp. 1707-1710 / 2014
  • Background: The objective of this study was to evaluate baseline use and positive rates of staging images (bone scan, CT) in newly diagnosed patients with prostate cancer (PCa) and to reduce overuse of staging images. Materials and Methods: This retrospective study covered a consecutive series of patients with PCa who underwent staging imaging at our institution between 2006 and 2011. Various clinical and pathological variables (age, PSA, biopsy Gleason score, clinical T stage, positive biopsy core rate) were evaluated by multivariate logistic regression analysis for their ability to predict a positive staging image. All patients were stratified according to the NCCN risk stratification and positive rates were compared across risk groups. Results: 410 patients (100%) underwent a bone scan and 315 patients (76.8%) underwent a CT scan. Some 51 patients (12.4%) had a positive bone scan, with clinical T3 and T4 as significant independent predictors. Positive bone scan rates for the low-, intermediate-, high-, and very high-risk groups were 0%, 0%, 8.25%, and 56.6%. Some 59 patients (18.7%) had a positive CT scan, with elevated PSA and clinical T3 and T4 as significant independent predictors. Positive CT rates for the low-, intermediate-, high-, and very high-risk groups were 0%, 0%, 13.8%, and 80.0%. Conclusions: The incidence of positive staging images in the low- and intermediate-risk groups was reasonably low. Following feedback on these results, staging in the low- and intermediate-risk groups could be omitted.

A New Inflammatory Prognostic Index, Based on C-reactive Protein, the Neutrophil to Lymphocyte Ratio and Serum Albumin is Useful for Predicting Prognosis in Non-Small Cell Lung Cancer Cases

  • Dirican, Nigar;Dirican, Ahmet;Anar, Ceyda;Atalay, Sule;Ozturk, Onder;Bircan, Ahmet;Akkaya, Ahmet;Cakir, Munire
    • Asian Pacific Journal of Cancer Prevention / Vol. 17, No. 12 / pp. 5101-5106 / 2016
  • Purpose: We aimed to establish an inflammatory prognostic index (IPI) in early and advanced non-small cell lung cancer (NSCLC) patients based on hematologic and biochemical parameters and to analyze its predictive value for NSCLC survival. Materials and Methods: A retrospective review of 685 patients with early and advanced NSCLC diagnosed between 2009 and 2014 was conducted, with collection of clinical and laboratory data. The IPI was calculated as C-reactive protein $\times$ NLR (neutrophil/lymphocyte ratio) / serum albumin. Univariate and multivariate analyses were performed to assess the prognostic value of relevant factors. Results: The optimal cut-off value of IPI for overall survival (OS) stratification was determined to be 15. Totals of 334 (48.8%) and 351 (51.2%) patients were assigned to the high and low IPI groups, respectively. Compared with low IPI, high IPI was associated with older age, greater tumor size, high lymph node involvement, distant metastases, advanced stage, and poor performance status. Median OS was worse in the high IPI group (high vs low, 8.0 vs 34.0 months; HR, 3.5; p<0.001). Median progression-free survival for patients with high vs low IPI was 6 months (95% CI: 5.3-6.6) and 14 months (95% CI: 12.1-15.8), respectively (HR, 2.4; p<0.001). On multivariate analysis, stage, performance status, lactate dehydrogenase, and IPI were independent prognostic factors for OS. Subgroup analysis showed that IPI was generally a significant prognostic factor across all clinical variables. Conclusion: The described IPI may be an inexpensive, easily accessible, and independent prognostic index for NSCLC patients, useful for clinical practice.
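
The IPI formula quoted in the abstract can be written out as a short calculation. The sketch below is illustrative only: the abstract does not state measurement units or whether a value exactly at the cut-off counts as high, so the function names, the unit-free inputs, and the `>=` comparison are assumptions.

```python
def inflammatory_prognostic_index(crp, neutrophils, lymphocytes, albumin):
    """IPI = C-reactive protein x (neutrophils / lymphocytes) / serum albumin.

    Units are not given in the abstract; inputs must use whatever units
    the cut-off was derived with.
    """
    if lymphocytes <= 0 or albumin <= 0:
        raise ValueError("lymphocytes and albumin must be positive")
    nlr = neutrophils / lymphocytes  # neutrophil to lymphocyte ratio
    return crp * nlr / albumin


def ipi_group(ipi, cutoff=15.0):
    """Assign the high or low IPI group using the reported cut-off of 15.

    Treating a value equal to the cut-off as "high" is an assumption.
    """
    return "high" if ipi >= cutoff else "low"
```

For instance, hypothetical inputs of CRP 10, neutrophils 6, lymphocytes 2, and albumin 4 give NLR 3 and IPI 7.5, which falls in the low group under this convention.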