• Title/Summary/Keyword: Sampling studies

Efficient Use of Auxiliary Variables in Estimating Finite Population Variance in Two-Phase Sampling

  • Singh, Housila P.;Singh, Sarjinder;Kim, Jong-Min
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.2
    • /
    • pp.165-181
    • /
    • 2010
  • This paper presents some chain ratio-type estimators for estimating finite population variance using two auxiliary variables in a two-phase sampling setup. Expressions for the biases and mean squared errors of the suggested classes of estimators are given. Asymptotic optimum estimators (AOEs) in each class are identified, along with their approximate mean squared error formulae. The theoretical and empirical properties of the suggested classes of estimators are investigated. For the simulation study, we used a real dataset on pulmonary disease, available on the CD accompanying the book by Rosner (2005).

On efficient estimation of population mean under non-response

  • Bhushan, Shashi;Pandey, Abhay Pratap
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.1
    • /
    • pp.11-25
    • /
    • 2019
  • The present paper utilizes auxiliary information to neutralize the effect of non-response when estimating the population mean. Improved ratio-type estimators for the population mean are proposed and their properties are studied. These estimators are suggested for both single-phase and two-phase sampling in the presence of non-response. Empirical studies are conducted to validate the theoretical results and demonstrate the performance of the proposed estimators. The proposed estimators are shown to perform better than those used by Cochran (Sampling Techniques (3rd ed), John Wiley & Sons, 1977), Khare and Srivastava (In Proceedings-National Academy Science, India, Section A, 65, 195-203, 1995), Rao (Randomization Approach in Incomplete Data in Sample Surveys, Academic Press, 1983; Survey Methodology 12, 217-230, 1986), and Singh and Kumar (Australian & New Zealand Journal of Statistics, 50, 395-408, 2008; Statistical Papers, 51, 559-582, 2010) under the derived optimality condition. Suitable recommendations are put forward for survey practitioners.
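
For orientation, the textbook ratio estimator that ratio-type estimators of the mean build on can be sketched as follows. This is not one of the estimators proposed in the paper and does not handle non-response; the function name and the data are hypothetical, and the population mean of the auxiliary variable is assumed to be known.

```python
from statistics import fmean


def ratio_estimate_mean(y_sample, x_sample, x_pop_mean):
    """Classical ratio estimator of the population mean: y_bar * (X_bar / x_bar).

    y_sample   : study-variable values observed in the sample
    x_sample   : auxiliary-variable values for the same sampled units
    x_pop_mean : known population mean of the auxiliary variable (assumed)
    """
    if len(y_sample) != len(x_sample):
        raise ValueError("y_sample and x_sample must have the same length")
    y_bar = fmean(y_sample)
    x_bar = fmean(x_sample)
    if x_bar == 0:
        raise ValueError("sample mean of the auxiliary variable is zero")
    return y_bar * (x_pop_mean / x_bar)


# Hypothetical sample in which y is exactly twice x.
y = [12.0, 15.0, 9.0, 18.0]
x = [6.0, 7.5, 4.5, 9.0]
estimate = ratio_estimate_mean(y, x, x_pop_mean=7.0)
print(estimate)
```

Because the hypothetical y values are exactly proportional to x, the estimate equals twice the assumed population mean of x.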

THE STUDIES FOR CAUSES OF THE TOOTH MORTALITY (치아발거 문제에 관하여)

  • Yang, Dong-Kyu;Kim, Soo-Nam
    • The Journal of the Korean dental association
    • /
    • v.9 no.7
    • /
    • pp.448-450
    • /
    • 1971
  • The authors studied the causes of tooth mortality. The sample comprised 5,711 patients who visited the Department of Oral Surgery, Infirmary of the Dental College, S.N.U., between 1965 and 1969. The results were as follows. 1) Tooth extraction was most frequently due to inflammation and dental caries. 2) Males had a higher frequency than females of extraction due to fracture caused by accidents.

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.173-190
    • /
    • 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.
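
The random sampling baseline that the abstract compares against can be sketched as plain random under-sampling of the majority class. This is only the baseline, not the k-RNN/OCSVM hybrid proposed in the paper; the function name, the seed, and the toy data are hypothetical.

```python
import random


def random_undersample(samples, labels, majority_label, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    majority = [i for i, lab in enumerate(labels) if lab == majority_label]
    minority = [i for i, lab in enumerate(labels) if lab != majority_label]
    # Keep as many majority rows as there are minority rows (or all, if fewer).
    keep = rng.sample(majority, k=min(len(minority), len(majority)))
    idx = sorted(minority + keep)
    return [samples[i] for i in idx], [labels[i] for i in idx]


# Hypothetical imbalanced data: 8 non-bankrupt (0) and 2 bankrupt (1) firms.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y, majority_label=0)
print(y_bal.count(0), y_bal.count(1))
```

An informed under-sampler such as the one described in the abstract would replace the `rng.sample` step with a rule that removes outliers and keeps informative majority samples.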

Study on Sampling Frame and Methods for Analyzing Political Attitudes : A Comparison of RDD and Direct Sampling (표집틀 설정과 표본추출방법에 따른 정치성향 분석의 문제점: 임의번호걸기(Random Digit Dialing)과 전화번호부 추출방법 비교)

  • Woo, Jung-Yeop;Kim, Ji-Yoon;Moon, Jong-Bae
    • Survey Research
    • /
    • v.12 no.1
    • /
    • pp.153-174
    • /
    • 2011
  • This research discusses the causes of inaccuracy in public opinion polls currently conducted in Korea. In particular, identifying the problems in the sampling frame and sampling methods of political and social public opinion polls is an important question. Currently, most polling organizations operating in Korea use phone number directories provided by Korea Telecom (KT) as their sampling frame for most political polls. A critical problem with using a phone number directory as a sampling frame is that unlisted phone numbers can never be included in the sample. If a systematic difference in socio-demographic or politico-economic characteristics exists between the group with listed numbers and the group with unlisted numbers, a phone number directory cannot produce a sample that represents the whole adult population of Korea. According to the results of a poll commissioned by the Asan Institute for Policy Studies in January 2011, there are statistically significant differences in socio-demographic and politico-economic characteristics between the two groups, and those differences led to differences in the presidential job approval rating and party support. Our findings include that the group with listed numbers is more pro-Grand National Party and shows stronger support for the president than the group with unlisted numbers.

Study on the Effect of Training Data Sampling Strategy on the Accuracy of the Landslide Susceptibility Analysis Using Random Forest Method (Random Forest 기법을 이용한 산사태 취약성 평가 시 훈련 데이터 선택이 결과 정확도에 미치는 영향)

  • Kang, Kyoung-Hee;Park, Hyuck-Jin
    • Economic and Environmental Geology
    • /
    • v.52 no.2
    • /
    • pp.199-212
    • /
    • 2019
  • In machine learning, the sampling strategy for the training data affects the performance of the prediction model, including its generalization ability as well as its prediction accuracy. In landslide susceptibility analysis in particular, data sampling is an essential step in setting up the training data because non-landslide points far outnumber landslide points. However, previous studies did not consider different sampling methods for the training data; they selected the training data randomly. Therefore, in this study the authors proposed several different sampling methods and assessed the effect of the training data sampling strategy on landslide susceptibility analysis. To this end, six scenarios were set up based on the sampling strategies for landslide points and non-landslide points. The Random Forest technique was then trained on each of the six scenarios, and the attribute importance of each input variable was evaluated. Subsequently, landslide susceptibility maps were produced using the input variables and their attribute importances. The AUC values of the landslide susceptibility maps obtained from the six sampling strategies showed high prediction rates, ranging from 70% to 80%. This indicates that the Random Forest technique shows appropriate predictive performance and that the attribute importances obtained from Random Forest can be used as weights for landslide conditioning factors in susceptibility analysis. In addition, the results obtained using specific sampling strategies for the training data show higher prediction accuracy than those obtained using the conventional random sampling method.

A study on the determination of substrata using the information of exponential response rate by simulation studies (모의실험을 기반으로 지수형 응답률 보정을 위한 세부 층 결정에 관한 연구)

  • Min, Joo-Won;Shin, Key-Il
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.5
    • /
    • pp.621-636
    • /
    • 2018
  • Research on the application of informative sampling techniques has been conducted in order to reduce the influence of non-response. Chung and Shin (Korean Journal of Applied Statistics, 30, 993-1004, 2017) showed that estimation accuracy improved when exponential response rate information was used for parameter estimation, provided the errors in the superpopulation model follow a normal distribution. However, this method divides each stratum into equally spaced substrata to obtain the sample weights of the informative sampling technique, and it shows that estimation accuracy improves as the number of substrata increases. In this study, for a given total sample size, the optimal substratum boundary points are calculated using equal spacing, quantiles, and the LH algorithm, and the results of these methods are compared through simulation. We also studied criteria for determining the number of substrata and the substratum boundaries that can be used in practice with various types of auxiliary variable distributions.
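
Two of the three boundary rules named above, equal spacing and quantiles, can be sketched as follows. The LH algorithm is not sketched here. The function names and the auxiliary values are hypothetical, and the paper's own boundary computation may differ in detail.

```python
from statistics import quantiles


def equal_width_boundaries(values, n_substrata):
    """Interior cut points that split [min, max] into equally wide substrata."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_substrata
    return [lo + step * i for i in range(1, n_substrata)]


def quantile_boundaries(values, n_substrata):
    """Interior cut points that put roughly equal numbers of units in each substratum."""
    return quantiles(values, n=n_substrata, method="inclusive")


# Hypothetical skewed auxiliary variable with one large unit.
aux = [1, 2, 3, 4, 5, 6, 7, 8, 100]
equal_cuts = equal_width_boundaries(aux, 4)
quantile_cuts = quantile_boundaries(aux, 4)
print(equal_cuts)
print(quantile_cuts)
```

With a skewed auxiliary variable like this one, the equal-width rule leaves most substrata nearly empty, while the quantile rule spreads the units more evenly.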

Studies on the Heavy Metal Contamination in the Sediment of the Han River (한강으로 유입된 저질중의 중금속오염도 조사)

  • 신정식;박상현
    • Journal of environmental and Sanitary engineering
    • /
    • v.6 no.1
    • /
    • pp.83-93
    • /
    • 1991
  • To survey water pollution, several heavy metals were analyzed in the sediment of the Han River from March 20 to April 22, 1989. The results were as follows: 1. The respective concentration ranges of Cadmium, Lead, Copper, Zinc and Manganese found in the sediments of the Han River were 0.32~2.41 $\mu g/g$, 15.80~129.64 $\mu g/g$, 13.82~372.36 $\mu g/g$, 58.40~925.40 $\mu g/g$, and 271.50~668.30 $\mu g/g$. 2. Among the sampling points, sediment at the Jung Rang Chon inflow site had the highest Lead, Copper and Zinc contents, An Yang Chon had the highest Cadmium content, and Wang Sook Chon had the highest Manganese content. 3. Across all sampling points, heavy metal contamination was highest for Zinc, followed by Manganese, Copper, Lead and Cadmium. 4. Higher amounts of heavy metals were found in the finer sediment particles. 5. The amounts of Cadmium and Lead in the Han River water were below the environmental standard.

Calibration for Spatial Stratified Sampling Design (공간층화표본설계에 대한 보정)

  • Byun, Jong-Seok;Son, Chang-Kyoon;Kim, Jong-Min
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.1
    • /
    • pp.9-16
    • /
    • 2010
  • The sampling design for spatial population studies requires a model assumption of a dependence relationship, where the parameters of interest can be the population mean, proportion and area. It is very useful to study a spatial population of interest that is stratified by geographical condition or shape, together with the degree of distortion of the estimated area. In light of this, if auxiliary information on the target variable is available, such as the wasted area contaminated by some material or the degree of distribution of animals or plants, then the spatial estimator might be improved through a calibration procedure. In this research, we propose a calibration procedure for spatial stratified sampling in which we consider one- and two-dimensional auxiliary information.
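
As a rough illustration of what calibration on one-dimensional auxiliary information means, the generic linear (chi-square distance) calibration can be sketched as follows. This is the standard survey calibration adjustment, not the spatial stratified procedure proposed in the paper; the function name, the design weights, and the known auxiliary total are hypothetical.

```python
def linear_calibration_weights(design_weights, x, x_total):
    """Adjust design weights d_i to w_i = d_i * (1 + lam * x_i) so that
    sum(w_i * x_i) reproduces the known auxiliary total x_total."""
    weighted_x = sum(d * xi for d, xi in zip(design_weights, x))
    weighted_x2 = sum(d * xi * xi for d, xi in zip(design_weights, x))
    if weighted_x2 == 0:
        raise ValueError("auxiliary variable is zero for every sampled unit")
    lam = (x_total - weighted_x) / weighted_x2
    return [d * (1 + lam * xi) for d, xi in zip(design_weights, x)]


# Hypothetical sample of three units, each with design weight 10.
d = [10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0]
w = linear_calibration_weights(d, x, x_total=66.0)
calibrated_total = sum(wi * xi for wi, xi in zip(w, x))
print(w, calibrated_total)
```

The calibrated weights stay close to the design weights while matching the assumed auxiliary total exactly; the same weights would then be applied to the study variable.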

Self-adaptive sampling for sequential surrogate modeling of time-consuming finite element analysis

  • Jin, Seung-Seop;Jung, Hyung-Jo
    • Smart Structures and Systems
    • /
    • v.17 no.4
    • /
    • pp.611-629
    • /
    • 2016
  • This study presents a new approach to surrogate modeling for time-consuming finite element analysis. Surrogate models are widely used to reduce the computational cost of iterative computational analysis. Although a variety of methods have been investigated, surrogate modeling still poses practical difficulties: (1) how to derive an optimal design of experiments (i.e., the number of training samples and their locations); and (2) how to diagnose the surrogate model. To overcome these difficulties, we propose sequential surrogate modeling based on a Gaussian process model (GPM) with self-adaptive sampling. The proposed approach not only enables further sampling to make the GPM more accurate, but also evaluates model adequacy within a sequential framework. The applicability of the proposed approach is first demonstrated using mathematical test functions. It is then applied as a substitute for iterative finite element analysis in a Monte Carlo simulation for response uncertainty analysis under correlated input uncertainties. In all numerical studies, the GPM was built automatically with minimal user intervention. The proposed approach can be customized for various response surfaces and can help less experienced users save effort.