• Title/Abstract/Keywords: stratified cluster sampling

122 search results (processing time: 0.022 s)

Unbiased Balanced Half-Sample Variance Estimation in Stratified Two-stage Sampling

  • Kim, Kyu-Seong
    • Journal of the Korean Statistical Society
    • /
    • Vol. 27, No. 4
    • /
    • pp.459-469
    • /
    • 1998
  • The balanced half-sample (BHS) method is a simple variance estimation method for complex sampling designs. Because it is simple and flexible, it has been widely used in large-scale sample surveys. However, the usual BHS method overestimates the true variance in without-replacement sampling and in two-stage cluster sampling. Focusing on this point, we propose an unbiased BHS variance estimator for stratified two-stage cluster sampling and describe how to implement it. Finally, a partially balanced half-sample design is explained as a tool for reducing the number of replications required by the proposed estimator.
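
    A minimal sketch of the usual BHS variance estimator that the abstract takes as its starting point, assuming two sampled PSUs per stratum and a linear estimator. It is not the unbiased estimator proposed in the paper. The function names, the Sylvester construction of the Hadamard matrix, and the example numbers are illustrative assumptions.

    ```python
    def sylvester_hadamard(order):
        """Build a Hadamard matrix of +1/-1 entries; `order` must be a power of 2."""
        h = [[1]]
        while len(h) < order:
            # Sylvester doubling: [[H, H], [H, -H]]
            h = [row + row for row in h] + [row + [-x for x in row] for row in h]
        return h


    def bhs_variance(weights, pairs):
        """Usual BHS variance for a stratified mean with two PSUs per stratum.

        weights: stratum weights W_h (assumed known).
        pairs:   one (y_h1, y_h2) tuple of PSU-level values per stratum.
        Returns (full-sample estimate, BHS variance estimate).
        """
        n_strata = len(pairs)
        order = 1
        while order <= n_strata:  # need more columns than strata (column 0 is skipped)
            order *= 2
        had = sylvester_hadamard(order)

        full = sum(w * (a + b) / 2 for w, (a, b) in zip(weights, pairs))
        total = 0.0
        for row in had:
            signs = row[1:n_strata + 1]  # skip the all-ones column
            # +1 selects the first PSU of the stratum, -1 the second.
            rep = sum(w * (a if s == 1 else b)
                      for w, s, (a, b) in zip(weights, signs, pairs))
            total += (rep - full) ** 2
        return full, total / order
    ```

    Because distinct columns of a Hadamard matrix are orthogonal, the replicate-based variance for a linear estimator equals the textbook value, the sum over strata of W_h^2 (y_h1 - y_h2)^2 / 4, while using only a small number of half-samples.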

층화 2-단 표본 추출시 최적 집락의 크기 결정 (A Optimal Cluster Size in Stratified Two-Stage Cluster Sampling)

  • 신민웅;신기일
    • 응용통계연구
    • /
    • Vol. 13, No. 2
    • /
    • pp.207-224
    • /
    • 2000
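  • When a population is clustered for stratified two-stage sampling, the cluster sizes are generally fixed in advance. However, when the clusters are units such as apartment complexes, cluster sizes differ greatly, and clusters then need to be merged or split. This paper considers large sample surveys in which homogeneous elements are clustered for administrative or survey convenience and the cluster size has to be determined, and it addresses the problem of determining the optimal cluster size. It also derives the optimal number of primary sampling units and the optimal number of secondary sampling units under a given cost.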
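
    As a rough illustration of choosing the numbers of primary and secondary sampling units under a fixed cost, the sketch below uses the standard textbook result rather than the paper's own derivation. It assumes equal-sized clusters, a linear cost C = c1*n + c2*n*m, and a variance proportional to [1 + (m - 1)*rho] / (n*m); the function and parameter names are hypothetical.

    ```python
    import math


    def optimal_two_stage(total_cost, cost_per_psu, cost_per_ssu, icc):
        """Textbook optimum for two-stage sampling under a linear cost model.

        Minimising [1 + (m - 1) * icc] / (n * m) subject to
        total_cost = n * (cost_per_psu + cost_per_ssu * m) gives
        m_opt = sqrt((cost_per_psu / cost_per_ssu) * (1 - icc) / icc).
        Returns (m_opt, rounded m, affordable number of PSUs n).
        """
        if not 0 < icc < 1:
            raise ValueError("icc must lie strictly between 0 and 1")
        m_opt = math.sqrt((cost_per_psu / cost_per_ssu) * (1 - icc) / icc)
        m = max(1, round(m_opt))
        n = math.floor(total_cost / (cost_per_psu + cost_per_ssu * m))
        return m_opt, m, n
    ```

    For example, with a budget of 10,000, a cost of 100 per PSU, a cost of 4 per SSU, and an intra-cluster correlation of 0.2, the sketch suggests about 10 secondary units in each of 71 primary units.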

A composite estimator for stratified two stage cluster sampling

  • Lee, Sang Eun;Lee, Pu Reum;Shin, Key-Il
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 23, No. 1
    • /
    • pp.47-55
    • /
    • 2016
  • Stratified cluster sampling has been widely used for effective parameter estimation because it reduces time and cost. The probability proportional to size (PPS) sampling method is used when the numbers of cluster elements differ significantly. However, simple random sampling (SRS) is commonly used for simplicity when the numbers of cluster elements are almost the same. It is also known that the ratio estimator performs well when the total number of population elements is known. However, the two-stage cluster estimator should be used if the total number of elements in the population is unknown or inaccurate. In this study we suggest a composite estimator that combines the ratio estimator and the two-stage cluster estimator to obtain a better estimate under certain population conditions. Simulation studies are conducted to compare the suggested estimator with the two other estimators.
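
    The idea of combining the two estimators can be sketched for a single stratum as follows, assuming SRS of clusters and SRS within clusters. The abstract does not state how the paper weights the two components, so the mixing weight `w` and all function names here are hypothetical.

    ```python
    def two_stage_total(n_pop_clusters, sizes, means):
        """Unbiased two-stage estimator of the total: (N / n) * sum(M_i * ybar_i)."""
        n = len(sizes)
        return n_pop_clusters / n * sum(m * y for m, y in zip(sizes, means))


    def ratio_total(pop_elements, sizes, means):
        """Ratio estimator of the total: M0 * sum(M_i * ybar_i) / sum(M_i).

        Requires the total number of population elements M0 to be known.
        """
        return pop_elements * sum(m * y for m, y in zip(sizes, means)) / sum(sizes)


    def composite_total(w, n_pop_clusters, pop_elements, sizes, means):
        """Convex combination of the two estimators; w is an assumed weight."""
        if not 0 <= w <= 1:
            raise ValueError("w must lie in [0, 1]")
        return (w * ratio_total(pop_elements, sizes, means)
                + (1 - w) * two_stage_total(n_pop_clusters, sizes, means))
    ```

    Setting `w = 1` recovers the ratio estimator and `w = 0` the two-stage cluster estimator, so intermediate values trade off trust in the known population size against the unbiased expansion estimate.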

지속가능한 산림경영에 적합한 표본조사 방법의 개발 (Development of a Forest Inventory System for the Sustainable Forest Management)

  • 신만용;한원성
    • 한국산림과학회지
    • /
    • Vol. 95, No. 3
    • /
    • pp.370-377
    • /
    • 2006
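  • To propose a sampling method suited to sustainable forest management, this study collected data from the forests of Yangpyeong-gun, Gyeonggi-do, using systematic sampling, systematic cluster sampling, and stratified cluster sampling, and then carried out statistical tests. Systematic cluster sampling was found to be the most efficient method. To determine the cluster shape and the distance between sample points within a cluster when applying systematic cluster sampling, statistical tests were carried out on five cluster shapes and four distances between sample points. As a result, a triangular cluster shape and a distance of 50 m between sample points within a cluster were judged to be the most suitable.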

Variance estimation for distribution rate in stratified cluster sampling with missing values

  • Heo, Sunyeong
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 28, No. 2
    • /
    • pp.443-449
    • /
    • 2017
  • Population proportions, such as the distribution rate of LED TVs or the prevalence of a disease, are often estimated from survey sample data. A population proportion is generally treated as a special form of population mean. In complex sampling such as stratified multistage sampling with unequal selection probabilities, the denominator of the mean may be a random variable, in which case the mean is estimated as a ratio estimator. In this research, we examined the estimation of a distribution rate under stratified multistage sampling and obtained numerical results using stratified random sample data with about 25% missing observations. In the data used for this research, the survey weights were determined deterministically. The weights are therefore not random variables, and the population distribution rate and its variance can be estimated in the same way as a population mean. When the weights are not random variables, estimating the variance of the proportion estimator by the ratio method may inflate the variance. Therefore, when estimating the variance of a population proportion, we need to examine the structure of the data and the survey design before choosing an estimation method.

층화 집락 반복계통 무관질문모형에 관한 연구 (A Study on the Stratified Cluster Replicated Systematic Unrelated Question Model)

  • 이기성
    • 응용통계연구
    • /
    • Vol. 26, No. 2
    • /
    • pp.209-222
    • /
    • 2013
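  • This paper applies stratified cluster sampling, which can be used when the population is divided into strata and each stratum consists of clusters, as is common in large-scale sample surveys, to the replicated systematic unrelated question model for cases where the information sought is sensitive. First, for a population made up of clusters, we propose a cluster replicated systematic unrelated question model, in which systematic samples are drawn repeatedly from the selected clusters and the unrelated question model is used to obtain the sensitive information. Next, we extend the proposed model to an unrelated question model based on stratified cluster replicated systematic sampling so that it can also be used in stratified populations, and we propose a stratified probability-proportional replicated systematic unrelated question model in which the clusters in each stratum are selected with probability proportional to size, with or without replacement. We also address sample allocation across strata in the proposed model in terms of proportional and optimal allocation. Finally, we compare the efficiency of the proposed stratified cluster model with that of the cluster replicated systematic unrelated question model.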
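
    For orientation, the basic unrelated question estimator underlying such models can be sketched as below. This is the simple randomized-response form with a known proportion for the unrelated question, combined across strata with known weights; it assumes simple random sampling with replacement within each stratum and does not reproduce the cluster replicated systematic design of the paper. All names are hypothetical.

    ```python
    def unrelated_question_estimate(yes_count, n, p_sensitive, pi_unrelated):
        """Estimate the sensitive proportion from randomized responses.

        Each respondent answers the sensitive question with probability
        p_sensitive and the unrelated question otherwise, so that
        P(yes) = p * pi + (1 - p) * pi_unrelated.
        Returns (estimate of pi, estimated variance).
        """
        lam = yes_count / n  # observed proportion of "yes" answers
        pi_hat = (lam - (1 - p_sensitive) * pi_unrelated) / p_sensitive
        var_hat = lam * (1 - lam) / ((n - 1) * p_sensitive ** 2)
        return pi_hat, var_hat


    def stratified_estimate(strata, p_sensitive, pi_unrelated):
        """Combine stratum estimates; strata is a list of (weight, yes_count, n)."""
        est, var = 0.0, 0.0
        for weight, yes_count, n in strata:
            pi_h, var_h = unrelated_question_estimate(
                yes_count, n, p_sensitive, pi_unrelated)
            est += weight * pi_h
            var += weight ** 2 * var_h
        return est, var
    ```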

이단계표본추출을 이용한 소결핵병 유병률 추정 (Two-stage Sampling for Estimation of Prevalence of Bovine Tuberculosis)

  • 박선일
    • 한국임상수의학회지
    • /
    • Vol. 28, No. 4
    • /
    • pp.422-426
    • /
    • 2011
  • For a national survey that targets a wide geographic region or an entire country, a multi-stage sampling approach is widely used to overcome the problems of simple random sampling, to consider both herd- and animal-level factors associated with disease occurrence, and to adjust for the clustering effect of disease in the population when calculating the sample size. The aim of this study was to establish the sample size for estimating the prevalence of bovine tuberculosis (TB) in Korea using a stratified two-stage sampling design. The sample size was determined by taking into account the possible clustering of TB-infected animals within individual herds to increase the reliability of the survey results. In this study, the country was stratified into nine provinces (administrative units), and the herd, the primary sampling unit, was treated as a cluster. For all analyses, a design effect of 2, a between-cluster prevalence of 50% (to yield the maximum sample size), and a mean herd size of 65 were assumed because of the lack of available information. Using a two-stage sampling scheme, the number of cattle sampled per herd was 65, regardless of the confidence level, prevalence, and mean herd size examined. The number of clusters to be sampled at a 95% level of confidence was estimated to be 296, 74, 33, 19, 12, and 9 for a desired precision of 0.01, 0.02, 0.03, 0.04, 0.05, and 0.06, respectively. Therefore, the total sample size with a 95% confidence level was 172,872, 43,218, 19,224, 10,818, 6,930, and 4,806 for desired precision ranging from 0.01 to 0.06. The sample size increased with the desired precision and the design effect. When the number of cattle sampled per herd is fixed, ranging from 5 to 40 at 5-head intervals, the total sample size with a 95% confidence level was estimated to be 6,480, 10,080, 13,770, 17,280, 20,925, 24,570, 28,350, and 31,680, respectively. The percent increase in total sample size resulting from the use of an intra-cluster correlation coefficient of 0.3 was 22.2, 32.1, 36.3, 39.6, 41.9, 42.9, 42.2, and 44.3%, respectively, in comparison with the use of a coefficient of 0.2.
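
    The cluster counts quoted above are consistent with the usual sample-size formula for a proportion inflated by a design effect. The abstract does not state the formula, so the sketch below is an assumption: it uses z = 1.96, n = z^2 * p * (1 - p) * deff / d^2, a herd size of 65, and the common approximation deff = 1 + (m - 1) * ICC. Function names are hypothetical.

    ```python
    import math


    def clusters_needed(precision, prevalence=0.5, deff=2.0, herd_size=65, z=1.96):
        """Number of herds needed for a given absolute precision (assumed formula)."""
        n_animals = z ** 2 * prevalence * (1 - prevalence) * deff / precision ** 2
        return math.ceil(n_animals / herd_size)


    def deff_from_icc(per_herd, icc):
        """Approximate design effect for `per_herd` animals sampled per herd."""
        return 1 + (per_herd - 1) * icc


    def percent_increase(per_herd, icc_low=0.2, icc_high=0.3):
        """Percent growth in sample size when the ICC rises from icc_low to icc_high."""
        ratio = deff_from_icc(per_herd, icc_high) / deff_from_icc(per_herd, icc_low)
        return 100 * (ratio - 1)
    ```

    Under these assumptions, `clusters_needed` gives 296, 74, 33, 19, 12, and 9 herds for precisions of 0.01 to 0.06, and `percent_increase` gives roughly 22.2% for 5 animals per herd and 44.3% for 40 animals per herd. The total sample sizes reported in the abstract depend on further details that are not given there and are not reproduced by this sketch.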

우리나라 당뇨병의 역학적 규모와 당뇨병 관리현황 파악을 위한 표본설계의 평가 (An Evaluation of Sampling Design for Estimating an Epidemiologic Volume of Diabetes and for Assessing Present Status of Its Control in Korea)

  • 이지성;김재용;백세현;박이병;이준영
    • Journal of Preventive Medicine and Public Health
    • /
    • Vol. 42, No. 2
    • /
    • pp.135-142
    • /
    • 2009
  • Objectives: An appropriate sampling strategy for estimating the epidemiologic volume of diabetes was evaluated through simulation. Methods: We analyzed about 250 million medical insurance claims submitted to the Health Insurance Review & Assessment Service in 2003 with diabetes as a principal or subsequent diagnosis at least once per year. The database was reconstructed into a 'patient-hospital profile' of 3,676,164 cases and then into a 'patient profile' of 2,412,082 observations. The patient profile data were then used to test the validity of a proposed sampling frame and sampling methods for developing diabetes-related epidemiologic indices. Results: The simulation study showed that a stratified two-stage cluster sampling design with a total sample size of 4,000 will provide an estimate of 57.04% (95% prediction range, 49.83 - 64.24%) for the treatment prescription rate of diabetes. The proposed sampling design first stratifies the nation by area into "metropolitan/city/county" and by type of hospital into "tertiary/secondary/primary/clinic" with a proportion of 5:10:10:75. Hospitals are then randomly selected within the strata as primary sampling units, followed by a random selection of patients within the hospitals as secondary sampling units. The difference between the estimate and the parameter value was projected to be less than 0.3%. Conclusions: The proposed sampling scheme will be applied to a subsequent nationwide field survey, not only to estimate the epidemiologic volume of diabetes but also to assess the present status of nationwide diabetes control.

남성복(男性服)의 치수규격을 위한 하체부(下體部)의 체형분류(II) (Classification of Bodytype of Lower Part on Adult Male for the Apparel Sizing System)

  • 김구자
    • 한국의류학회지
    • /
    • Vol. 17, No. 4
    • /
    • pp.602-607
    • /
    • 1993
  • Comfort and fit have become major concerns in the basic function of ready-made clothes. This research was performed to classify and characterize Korean adult males anthropometrically. The sample comprised 1,290 subjects aged 19 to 54 years, selected by stratified sampling. In total, 75 variables were used to classify the body types. Data were analyzed by multivariate methods, especially factor and cluster analysis. Items with high factor loadings in the factor analysis were used to select the variables for the cluster analysis of similar body types. For the lower body, 14 variables were used to classify lower-body types by Ward's minimum variance method. The groups forming a cluster were subdivided into 5 sets by cross-tabulation of the hierarchical cluster analysis results. Types 3 and 4 of the lower body together accounted for a majority, 53.1%, of the subjects. Korean adult males had relatively well-balanced lower bodies.

Empirical Analysis on Rao-Scott First Order Adjustment for Two Population Homogeneity test Based on Stratified Three-Stage Cluster Sampling with PPS

  • Heo, Sunyeong
    • 통합자연과학논문집
    • /
    • Vol. 7, No. 3
    • /
    • pp.208-213
    • /
    • 2014
  • Nationwide and/or large-scale sample surveys generally use complex sample designs. The traditional Pearson chi-square test is not appropriate for categorical data from complex samples. Rao and Scott suggested an adjustment to the Pearson chi-square test that uses the average of the eigenvalues of the design matrix of cell probabilities. This study compares the efficiency of the Rao-Scott first-order adjusted test with that of the Wald test for homogeneity between two populations, using data from the 2009 Gyeongnam regional education offices' customer satisfaction survey (2009 GREOCSS). The 2009 GREOCSS data were collected by stratified three-stage cluster sampling with probability proportional to size. The empirical results show that, for the 2009 GREOCSS data, the Rao-Scott adjusted test statistic, which uses only the variances of the cell probabilities, is very close to the Wald test statistic, which uses the covariance matrix of the cell probabilities. However, caution is needed when using the Rao-Scott first-order adjusted test statistic in place of the Wald test, because its efficiency decreases as the relative variance of the eigenvalues of the design matrix of cell probabilities increases, especially when the number of degrees of freedom is small.