• Title/Summary/Keyword: stratified random sampling

EFFICIENT ESTIMATION OF POPULATION MEAN IN STRATIFIED SAMPLING USING REGRESSION TYPE ESTIMATOR

  • Grover Lovleen Kumar
    • Journal of the Korean Statistical Society
    • /
    • v.35 no.4
    • /
    • pp.441-452
    • /
    • 2006
  • An efficient regression-type estimator for a stratified population mean is proposed under the two-phase sampling scheme. In constructing the estimator, it is assumed that the first auxiliary variable x is directly and highly correlated with the study variable y, and that the second auxiliary variable z is directly and highly correlated with x. The variable z is not directly correlated with y; the two are correlated only through their direct, high correlation with x. The proposed regression-type estimator is found to be always more efficient than the existing estimators defined under the same situation.

A composite estimator for stratified two stage cluster sampling

  • Lee, Sang Eun;Lee, Pu Reum;Shin, Key-Il
    • Communications for Statistical Applications and Methods
    • /
    • v.23 no.1
    • /
    • pp.47-55
    • /
    • 2016
  • Stratified cluster sampling has been widely used for effective parameter estimation because it reduces time and cost. The probability proportional to size (PPS) sampling method is used when the numbers of elements per cluster differ significantly, whereas simple random sampling (SRS) is commonly used for simplicity when the numbers of cluster elements are almost the same. The ratio estimator is known to perform well when the total number of population elements is known. However, the two-stage cluster estimator should be used when the total number of elements in the population is unknown or inaccurate. In this study we suggest a composite estimator that combines the ratio estimator and the two-stage cluster estimator to obtain a better estimate under certain population conditions. Simulation studies are conducted to compare the performance of the suggested estimator with that of the two other estimators.

A Study on Efficiency of the Cut-off Systematic Sampling (절사계통추출법의 효율성에 관한 연구)

  • 이계오;최정배;석영우
    • The Korean Journal of Applied Statistics
    • /
    • v.14 no.1
    • /
    • pp.111-120
    • /
    • 2001
  • Either systematic sampling or stratified sampling is usually applied to the business conditions survey when companies do not differ much in size. Cut-off systematic sampling, however, is an efficient method when a few companies are so large that their combined total almost equals the total for all companies. This paper discusses three estimators of the total, together with their variance estimators, under three sampling schemes and compares them through their variances. Using real data from the logging business conditions survey, cut-off systematic sampling is shown to be the most efficient.

A Study on Sample Allocation for Stratified Sampling (층화표본에서의 표본 배분에 대한 연구)

  • Lee, Ingue;Park, Mingue
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.6
    • /
    • pp.1047-1061
    • /
    • 2015
  • Stratified random sampling is a powerful sampling strategy that reduces the variance of estimators by using auxiliary information to stratify the population. Sample allocation is one of the important decisions in selecting a stratified random sample. Two methods are common when the data collection cost can be assumed equal across observation units: proportional allocation and Neyman allocation. Theoretically, Neyman allocation, which considers both the size and the standard deviation of each stratum, is known to be more effective than proportional allocation, which uses only stratum size. However, if the information on the standard deviations is inaccurate, the performance of Neyman allocation is in doubt. It has also been pointed out that Neyman allocation is not suitable for multi-purpose sample surveys that require the estimation of several characteristics. In addition to sampling error, non-response error is another factor that affects the statistical precision of the estimator and should be considered when evaluating a sampling strategy. We propose new sample allocation methods that use the available information about stratum response rates at the design stage to improve stratified random sampling. The proposed methods are efficient when response rates differ considerably among strata. In particular, the method using population sizes and response rates improves on Neyman allocation in multi-purpose sample surveys.
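
The two classical allocations named in this abstract can be illustrated with a minimal sketch. The stratum sizes, standard deviations, and total sample size below are made-up values, the function names are hypothetical, and the variance formula ignores the finite population correction. The response-rate-based allocations proposed in the paper are not reproduced here because the abstract does not give their formulas.

```python
def proportional_allocation(n, sizes):
    """Allocate n in proportion to stratum size N_h."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]


def neyman_allocation(n, sizes, sds):
    """Allocate n in proportion to N_h * S_h (equal unit costs assumed)."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def stratified_mean_variance(alloc, sizes, sds):
    """Variance of the stratified sample mean, without finite population correction."""
    N = sum(sizes)
    return sum((N_h / N) ** 2 * S_h ** 2 / n_h
               for N_h, S_h, n_h in zip(sizes, sds, alloc))


# Hypothetical population: three strata with very different spreads.
sizes = [5000, 3000, 2000]
sds = [2.0, 5.0, 12.0]
n = 500

prop = proportional_allocation(n, sizes)
ney = neyman_allocation(n, sizes, sds)
var_prop = stratified_mean_variance(prop, sizes, sds)
var_ney = stratified_mean_variance(ney, sizes, sds)

print("proportional:", [round(x, 1) for x in prop], "variance:", round(var_prop, 5))
print("Neyman:      ", [round(x, 1) for x in ney], "variance:", round(var_ney, 5))
```

Allocations are left as real numbers for clarity; in practice they would be rounded to integers, and the Neyman result depends on how accurate the assumed standard deviations are, which is the weakness the abstract points out.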

A Study for Time Standard Estimation with Activity Sampling Method (가동샘플링기법에 의한 표준시간추정에 관한 연구)

  • 이근희
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.6 no.9
    • /
    • pp.1-5
    • /
    • 1983
  • This study takes up the application of survey sampling theory to activity sampling and the application of activity sampling to time standard estimation. Cluster, stratified, and multistage sampling are studied in conjunction with random and systematic sampling. Estimation procedures that maximize the information obtained per unit cost of the study are considered, along with the specification of the procedure used to assess the accuracy of the resulting estimates. The use of multiple regression and linear programming to estimate standard element performance time from typical job lot production data is also considered.

Comparison of Sampling Techniques for Passive Internet Measurement: An Inspection using An Empirical Study (수동적 인터넷 측정을 위한 샘플링 기법 비교: 사례 연구를 통한 검증)

  • Kim, Jung-Hyun;Won, You-Jip;Ahn, Soo-Han
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.45 no.6
    • /
    • pp.34-51
    • /
    • 2008
  • Today, the Internet is a part of our life, so characterizing Internet traffic is an important research theme. However, Internet traffic cannot be handled easily because it usually occupies a huge amount of storage, which is a serious obstacle to analyzing it. Many researchers therefore use sampling techniques to reduce the volume of traffic data. In this paper, we compare several well-known sampling techniques and propose an efficient sampling scheme. We chose Systematic Sampling, Simple Random Sampling and Stratified Sampling, with sampling intensities of 1/10, 1/100 and 1/1000. Our observations focused on Traffic Volume, Entropy Analysis and Packet Size Analysis. Both simple random sampling and count-based systematic sampling are suitable for the general case. Time-based systematic sampling, on the other hand, exhibits relatively poor results. Stratified sampling on Transport Layer Protocols, e.g., TCP and UDP, shows superior results. Our analysis suggests that efficient sampling techniques satisfactorily preserve the variation of the traffic stream over time. Entropy analysis is robust to the various sampling techniques and is well suited to detecting anomalous traffic. We found that a reduction in traffic volume caused by a bottleneck could lead to wrong results in the entropy analysis. We also found that the Packet Size Distribution tolerates all of the packet sampling techniques and intensities examined.
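
The three sampling techniques compared in this abstract can be sketched on a synthetic packet list as follows. This is only an illustration under stated assumptions: the packet tuples, the 80/20 TCP/UDP mix, and the function names are hypothetical, and only the 1/10 intensity with count-based systematic sampling is shown. It is not the authors' measurement code.

```python
import random
from collections import defaultdict


def systematic_sample(items, k, start=0):
    """Count-based systematic sampling: keep every k-th item from `start`."""
    return items[start::k]


def simple_random_sample(items, k, rng):
    """Simple random sampling without replacement at intensity 1/k."""
    return rng.sample(items, len(items) // k)


def stratified_sample(items, k, key, rng):
    """Stratify by `key`, then draw a 1/k simple random sample in each stratum."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for group in strata.values():
        m = max(1, len(group) // k)  # keep at least one item per stratum
        sample.extend(rng.sample(group, m))
    return sample


rng = random.Random(0)
# Hypothetical trace: (transport protocol, packet size in bytes).
packets = [("TCP" if rng.random() < 0.8 else "UDP", rng.randint(40, 1500))
           for _ in range(10_000)]

sys_s = systematic_sample(packets, 10)
srs_s = simple_random_sample(packets, 10, rng)
str_s = stratified_sample(packets, 10, key=lambda p: p[0], rng=rng)

for name, s in [("systematic", sys_s), ("simple random", srs_s), ("stratified", str_s)]:
    mean_size = sum(size for _, size in s) / len(s)
    print(f"{name:14s} n={len(s):5d} mean packet size={mean_size:.1f}")
```

Stratifying on the protocol guarantees that each protocol is represented in proportion to its share of the trace, which is one plausible reading of why the abstract reports better results for that scheme.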

An Evaluation of Sampling Design for Estimating an Epidemiologic Volume of Diabetes and for Assessing Present Status of Its Control in Korea (우리나라 당뇨병의 역학적 규모와 당뇨병 관리현황 파악을 위한 표본설계의 평가)

  • Lee, Ji-Sung;Kim, Jai-Yong;Baik, Sei-Hyun;Park, Ie-Byung;Lee, June-Young
    • Journal of Preventive Medicine and Public Health
    • /
    • v.42 no.2
    • /
    • pp.135-142
    • /
    • 2009
  • Objectives: An appropriate sampling strategy for estimating an epidemiologic volume of diabetes was evaluated through a simulation. Methods: We analyzed about 250 million medical insurance claims submitted to the Health Insurance Review & Assessment Service in 2003 with diabetes as a principal or subsequent diagnosis at least once per year. The database was reconstructed into a 'patient-hospital profile' that had 3,676,164 cases, and then into a 'patient profile' that consisted of 2,412,082 observations. The patient profile data were then used to test the validity of a proposed sampling frame and of sampling methods for developing diabetes-related epidemiologic indices. Results: The simulation study showed that the use of a stratified two-stage cluster sampling design with a total sample size of 4,000 will provide an estimate of 57.04% (95% prediction range, 49.83-64.24%) for the treatment prescription rate of diabetes. The proposed sampling design consists of first stratifying the area of the nation into "metropolitan/city/county" and the types of hospital into "tertiary/secondary/primary/clinic" with a proportion of 5:10:10:75. Hospitals were then randomly selected within the strata as the primary sampling unit, followed by a random selection of patients within the hospitals as the secondary sampling unit. The difference between the estimate and the parameter value was projected to be less than 0.3%. Conclusions: The proposed sampling scheme will be applied to a subsequent nationwide field survey, not only for estimating the epidemiologic volume of diabetes but also for assessing the present status of nationwide diabetes control.

An Estimation Procedure Using Updated Stratification Sample in Panel Survey (패널표본조사에서 층간변동을 고려한 추정방법)

  • 김영원;오명신
    • The Korean Journal of Applied Statistics
    • /
    • v.11 no.2
    • /
    • pp.461-475
    • /
    • 1998
  • In a panel survey in which the sample is selected by stratified random sampling, if sampling units shift from one stratum to another over time, the movement should be incorporated in the estimation procedure. To deal with the problem caused by the movement of units across strata in the updated stratification sample, the bias of the conventional estimator that neglects the movement is investigated, and bias-adjusted estimators are proposed. The variance estimator of the suggested estimators is also derived. A simulation study illustrates that the proposed estimators outperform the conventional estimator in terms of bias and mean squared error. In particular, when Neyman allocation is applied in stratified sampling, the proposed estimator is shown to be much more effective in this respect.

Two-stage Sampling for Estimation of Prevalence of Bovine Tuberculosis (이단계표본추출을 이용한 소결핵병 유병률 추정)

  • Pak, Son-Il
    • Journal of Veterinary Clinics
    • /
    • v.28 no.4
    • /
    • pp.422-426
    • /
    • 2011
  • For a national survey in which a wide geographic region or an entire country is targeted, a multi-stage sampling approach is widely used to overcome the problems of simple random sampling, to consider both herd- and animal-level factors associated with disease occurrence, and to adjust for the clustering effect of disease in the population when calculating the sample size. The aim of this study was to establish the sample size for estimating the prevalence of bovine tuberculosis (TB) in Korea using a stratified two-stage sampling design. The sample size was determined by taking into account the possible clustering of TB-infected animals within individual herds, to increase the reliability of the survey results. In this study, the country was stratified into nine provinces (administrative units), and the herd, the primary sampling unit, was considered as a cluster. For all analyses, a design effect of 2, a between-cluster prevalence of 50% to yield the maximum sample size, and a mean herd size of 65 were assumed because of the lack of available information. Using a two-stage sampling scheme, the number of cattle sampled per herd was 65, regardless of the confidence level, prevalence, and mean herd size examined. The number of clusters to be sampled at a 95% level of confidence was estimated to be 296, 74, 33, 19, 12, and 9 for desired precision of 0.01, 0.02, 0.03, 0.04, 0.05, and 0.06, respectively. Therefore, the total sample size with a 95% confidence level was 172,872, 43,218, 19,224, 10,818, 6,930, and 4,806 for desired precision ranging from 0.01 to 0.06. The sample size increased with the desired precision and the design effect. In a situation where the number of cattle sampled per herd is fixed, ranging from 5 to 40 with a 5-head interval, the total sample size with a 95% confidence level was estimated to be 6,480, 10,080, 13,770, 17,280, 20,925, 24,570, 28,350, and 31,680, respectively. The percent increase in total sample size resulting from the use of an intra-cluster correlation coefficient of 0.3 was 22.2, 32.1, 36.3, 39.6, 41.9, 42.9, 42.2, and 44.3%, respectively, in comparison with the use of a coefficient of 0.2.
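
The cluster counts and total sample sizes quoted in this abstract can be reproduced with a short calculation. The abstract does not state the formula, so the following is a reconstruction under assumptions: the standard proportion formula n = z²p(1-p)/d² with z = 1.96, rounded up, multiplied by the design effect of 2 to give a per-province size, then divided by the herd size of 65 (rounded up) for clusters and multiplied by nine provinces for the total. The constant and function names are hypothetical. Exact fractions are used to avoid floating-point rounding at the ceiling step.

```python
from fractions import Fraction
from math import ceil

Z_95 = Fraction(196, 100)   # normal quantile for a 95% confidence level
P = Fraction(1, 2)          # assumed prevalence of 50% (maximizes the sample size)
DEFF = 2                    # design effect stated in the abstract
HERD_SIZE = 65              # mean herd size stated in the abstract
N_STRATA = 9                # provinces


def per_stratum_size(d):
    """Sample size for one province at absolute precision d (assumed formula)."""
    base = ceil(Z_95 ** 2 * P * (1 - P) / d ** 2)
    return base * DEFF


precisions = [Fraction(k, 100) for k in range(1, 7)]   # 0.01 ... 0.06
per_stratum = [per_stratum_size(d) for d in precisions]
clusters = [ceil(Fraction(n, HERD_SIZE)) for n in per_stratum]
totals = [n * N_STRATA for n in per_stratum]

for d, c, t in zip(precisions, clusters, totals):
    print(f"precision {float(d):.2f}: {c:3d} clusters, total sample size {t:,}")
```

The order of rounding was chosen because it matches the published figures; a different rounding convention would shift some values slightly.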

A Study on the Sampling of Ocean Meteorological Data to Analyze Signature of Naval Ships (함정 신호해석 연구에 필요한 해양기상환경 자료의 표본추출에 관한 연구)

  • Cho, Yong-Jin
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.23 no.2
    • /
    • pp.19-28
    • /
    • 2018
  • In this paper, we studied the sampling of ocean meteorological data for analyzing the signature of naval ships. The newest ocean meteorological data, quality-controlled by the Korea Meteorological Administration (KMA), were collected. Outliers were removed from the data by setting a usable range. The data size was then reduced for probabilistic analysis through random sampling, taking into account the geopolitical significance and effective area of each buoy. The sample sizes were set at 100, 200, and 400 by considering the population size and a 95% confidence level. The final sample was obtained using a two-dimensional stratified sampling method based on the highly correlated water temperature and air temperature. The sum of squared errors and the confidence interval were calculated to compare the sampling results. As a result, this study proposes a reasonable sample size for infrared signature analysis of naval ships.
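
One way to read "two-dimensional stratified sampling" on two correlated variables is sketched below. The abstract does not describe how the strata were formed, so everything here is an assumption: each variable is cut at its terciles to form a 3 x 3 grid of cells, and the sample is allocated to cells in proportion to their size. The synthetic temperature data and the function names are hypothetical.

```python
import random
from collections import defaultdict


def tercile_edges(values):
    """Return the two cut points that split the values into three equal-count bins."""
    s = sorted(values)
    return s[len(s) // 3], s[2 * len(s) // 3]


def bin_index(value, edges):
    """Index (0, 1, or 2) of the bin that `value` falls into."""
    return sum(value >= e for e in edges)


def stratified_2d_sample(records, n, rng):
    """Proportional stratified sample over a 3 x 3 grid of (water, air) terciles."""
    water_edges = tercile_edges([r[0] for r in records])
    air_edges = tercile_edges([r[1] for r in records])
    cells = defaultdict(list)
    for r in records:
        cells[(bin_index(r[0], water_edges), bin_index(r[1], air_edges))].append(r)
    sample = []
    for cell in cells.values():
        m = round(n * len(cell) / len(records))   # proportional share of the cell
        sample.extend(rng.sample(cell, min(m, len(cell))))
    return sample


rng = random.Random(1)
# Hypothetical buoy records: (water temperature, air temperature), strongly correlated.
records = []
for _ in range(5000):
    water = rng.gauss(15.0, 5.0)
    records.append((water, water + rng.gauss(0.0, 2.0)))

sample = stratified_2d_sample(records, 100, rng)
print("sample size:", len(sample))
```

Because each cell's share is rounded independently, the realized sample size can differ from the target by a few units; a production version would redistribute the remainder.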