• Title/Summary/Keyword: Number of Sample Size

Estimation of Gini-Simpson index for SNP data

  • Kang, Joonsung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.6
    • /
    • pp.1557-1564
    • /
    • 2017
  • We consider high-dimensional, low-sample-size (HDLSS) genomic sequences whose response categories are unordered. When constructing appropriate test statistics in this setting, the classical multivariate analysis of variance (MANOVA) approach may not be useful because the number of parameters is very large and the sample size is very small. For these reasons, we present a pseudo-marginal model based on the Gini-Simpson index estimated via a Bayesian approach. In view of the small sample size, we consider the permutation distribution generated by all n! (equally likely) permutations of the pooled sample observations across G groups (of sizes $n_1, \ldots, n_G$). We simulate data and apply the false discovery rate (FDR) and positive false discovery rate (pFDR) procedures, with the proposed test statistics, to the data. We also analyze real SARS data and compute the FDR and pFDR. Using exact conditional permutation theory, the FDR and pFDR procedures with the associated test statistics for each gene control the FDR and pFDR, respectively, at any level ${\alpha}$ for the set of p-values.
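
The two building blocks named in this abstract, the Gini-Simpson index and a permutation distribution over pooled observations, can be sketched as follows. This is only an illustration under stated assumptions: it uses the plug-in estimate 1 - Σ p_i² rather than the Bayesian estimate the abstract describes, it compares two groups only, and it samples random permutations instead of enumerating all n!. The function names are hypothetical.

```python
import random
from collections import Counter


def gini_simpson(labels):
    """Plug-in Gini-Simpson index: 1 minus the sum of squared category proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for |GS(a) - GS(b)|.

    Assumption: random shuffles of the pooled sample stand in for the
    exact enumeration of all n! permutations.
    """
    rng = random.Random(seed)
    observed = abs(gini_simpson(group_a) - gini_simpson(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(gini_simpson(pooled[:n_a]) - gini_simpson(pooled[n_a:]))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

For example, `gini_simpson(list("ACGT"))` is 0.75 for four equally frequent categories, and a site where every sequence carries the same category gives 0.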

Cumulative Sequential Control Charts with Sample Size Bound (표본크기에 제약이 있는 누적 축차관리도)

  • Chang, Young-Soon;Bai, Do-Sun
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.25 no.4
    • /
    • pp.448-458
    • /
    • 1999
  • This paper proposes sequential control charts with an upper bound on sample size. Existing sequential control charts place no restriction on the number of observations at a sampling point, so they may not be directly applicable when sampling and testing an item is time-consuming or expensive. When the number of observations at a sampling point reaches the upper bound without an out-of-control signal, the proposed cumulative sequential control chart defers the decision to the next sampling point, whose starting value is the current value of the statistic. Two Markov chains, an inner chain and an outer chain, are used to derive formulas for evaluating the performance of the proposed chart. The chart is compared with $\bar{X}$ and cumulative sum control charts with fixed and variable sample sizes. The fast initial response (FIR) feature is studied, and guidelines for designing the proposed charts are given.

Sensitivity analysis of serological tests for detection of disease in cattle (소 질병 검출을 위한 혈청학적 검사의 민감도 평가)

  • Lee, Sang-Jin;Moon, Oun-Kyong;Pak, Son-Il
    • Korean Journal of Veterinary Research
    • /
    • v.50 no.1
    • /
    • pp.43-48
    • /
    • 2010
  • Animal disease surveillance, defined as the continuous investigation of a given population to detect the occurrence of disease or infection for control purposes, plays a key role in assessing the health status of an animal population and, more recently, in risk assessment for international trade in animals and animal products. In particular, for a system that aims to determine whether a disease is present in a population, sensitivity should be kept high enough not to miss an infected animal. Therefore, when planning a surveillance system, the factors that affect surveillance sensitivity should be taken into account. Of these, sample size is important, and different approaches are used to calculate it, usually depending on the objective of the surveillance system. The purpose of this study was to evaluate the sensitivity of the current national serological surveillance programs for four selected bovine diseases under a specified sampling plan, to examine factors affecting the probability of detection, and to provide the sample sizes required to achieve the surveillance goal of detecting at least one infection in a given population. Our results showed that, for example, detecting a low prevalence (0.2% for bovine tuberculosis) requires sampling all animals on a typical Korean cattle farm (n = 17); risk-based targeted surveillance of high-risk groups can therefore be an alternative strategy that increases sensitivity without increasing overall sampling effort. The minimum sample size required to detect at least one positive animal increased sharply as disease prevalence decreased. More importantly, prevalence estimates were expected to be more reliable with a larger sampling fraction, even when no infected animal was identified. The effect of sample size is also discussed in terms of the maximum prevalence when no infected animals were identified and the probability of failing to detect an infection. We suggest that, for many serological surveillance systems, the diagnostic performance of the testing method, sample size, prevalence, population size, and statistical confidence need to be considered to interpret the results correctly.
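
The relationship between sample size, prevalence, and detection probability discussed in this abstract can be illustrated with the common large-population (binomial) approximation. This is a rough sketch, not the study's method: it assumes independent sampling from a population large enough that the finite population size can be ignored, and perfect test specificity. It therefore will not reproduce herd-level figures such as the n = 17 farm, where a finite-population calculation would be needed. All function names are hypothetical.

```python
import math


def detection_sample_size(prevalence, confidence=0.95, sensitivity=1.0):
    """Smallest n with P(at least one test-positive) >= confidence.

    Assumes a large population, so each sampled animal tests positive
    independently with probability prevalence * sensitivity.
    """
    p_pos = prevalence * sensitivity
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_pos))


def detection_probability(n, prevalence, sensitivity=1.0):
    """Probability that a sample of n animals yields at least one positive."""
    return 1.0 - (1.0 - prevalence * sensitivity) ** n


def max_prevalence_given_zero_positives(n, confidence=0.95, sensitivity=1.0):
    """Upper bound on prevalence consistent with n negative results."""
    return (1.0 - (1.0 - confidence) ** (1.0 / n)) / sensitivity
```

Under these assumptions the required n grows roughly in proportion to 1/prevalence, which is consistent with the abstract's observation that the minimum sample size rises sharply at low prevalence.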

A Study of Sample Size for Two-Stage Cluster Sampling (이단계 집락추출에서의 표본크기에 대한 연구)

  • Song, Jong-Ho;Jea, Hea-Sung;Park, Min-Gue
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.2
    • /
    • pp.393-400
    • /
    • 2011
  • In large-scale surveys, a cluster sampling design, in which sets of observation units called clusters are selected, is often used to satisfy practical restrictions on time and cost. In particular, a two-stage cluster sampling design is preferred when a strong intra-class correlation exists among observation units. The sample sizes of primary sampling units (PSUs) and secondary sampling units (SSUs) for a two-stage cluster sample are determined by the survey cost and the precision of the estimator. In this study, we derive the optimal sample PSU and SSU sizes when the number of population SSUs differs across PSUs, extending the result obtained under the assumption that all PSUs have the same number of SSUs. The results are then applied to the $4^{th}$ Korea Hospital Discharge results and compared with the conventional method. We also propose the optimal sample SSU (discharged patients) size for the $7^{th}$ Korea Hospital Discharge Survey.
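
As background for the cost-versus-precision trade-off described here, the following sketch shows the textbook equal-cluster-size result that the paper says it extends. It is not the paper's derivation for unequal PSU sizes. It assumes a linear cost model (cost c1 per sampled PSU plus c2 per sampled SSU) and an intra-class correlation rho strictly between 0 and 1; the function and parameter names are hypothetical.

```python
import math


def optimal_ssu_per_psu(c1, c2, rho):
    """Textbook optimum m = sqrt((c1 / c2) * (1 - rho) / rho).

    c1: cost of adding one PSU, c2: cost of one SSU,
    rho: intra-class correlation (assumed 0 < rho < 1).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.sqrt((c1 / c2) * (1.0 - rho) / rho)


def psu_count_for_budget(budget, c1, c2, m):
    """Number of PSUs affordable when each PSU costs c1 + c2 * m."""
    return math.floor(budget / (c1 + c2 * m))


def design_effect(m, rho):
    """Variance inflation relative to simple random sampling: 1 + (m - 1) * rho."""
    return 1.0 + (m - 1.0) * rho
```

With c1 = 100, c2 = 4 and rho = 0.2, the optimum is about 10 SSUs per PSU; a stronger intra-class correlation pushes the optimum toward fewer SSUs per PSU and more PSUs.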

Adjustment of Control Limits for Geometric Charts

  • Kim, Byung Jun;Lee, Jaeheon
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.5
    • /
    • pp.519-530
    • /
    • 2015
  • The geometric chart has proven more effective than Shewhart p or np charts for monitoring the proportion nonconforming in high-quality processes. Implementing a geometric chart commonly requires the assumption that the in-control proportion nonconforming is known or accurately estimated. However, accurate parameter estimation is very difficult and may require a larger sample size than is available in practice in high-quality processes, where the proportion of nonconforming items is very small. Thus, the error in the parameter estimate increases and may degrade the performance of the control chart if the sample size is inadequate. We suggest adjusting the control limits to improve performance when the sample size is insufficient to estimate the parameter. We propose a linear function for the adjustment constant, which is a function of the sample size, the number of nonconforming items in the sample, and the false alarm rate. We also compare the performance of geometric charts with and without adjustment using the expected value of the average run length (ARL) and the standard deviation of the ARL (SDARL).
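
For orientation, the unadjusted probability limits of a geometric chart can be sketched as below. This shows only the standard limits for a known in-control proportion p0, not the adjustment proposed in the paper. It assumes the plotted statistic X is the number of items inspected up to and including a nonconforming one, so that P(X > x) = (1 - p)^x, and it treats the limits as continuous values. Function names are hypothetical.

```python
import math


def geometric_chart_limits(p0, alpha=0.0027):
    """Two-sided probability limits (LCL, center line, UCL) for a geometric chart.

    Each tail is assigned probability alpha / 2 under P(X > x) = (1 - p0) ** x.
    """
    log_q = math.log(1.0 - p0)
    lcl = math.log(1.0 - alpha / 2.0) / log_q
    center = math.log(0.5) / log_q  # median of the run length
    ucl = math.log(alpha / 2.0) / log_q
    return lcl, center, ucl


def false_alarm_rate(p_true, lcl, ucl):
    """Approximate probability of a point falling below lcl or above ucl."""
    q = 1.0 - p_true
    return (1.0 - q ** lcl) + q ** ucl
```

Computing the limits from an estimate of p0 and then evaluating `false_alarm_rate` at a different true proportion gives a quick sense of how estimation error shifts the realized false alarm rate away from the nominal alpha.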

A Study on Minimum Number of Ship-handling Simulation Required for Evaluating Vessel's Proximity Measure

  • Jeong, Tae-Gweon;Pan, Bao-Feng
    • Journal of Navigation and Port Research
    • /
    • v.38 no.6
    • /
    • pp.689-694
    • /
    • 2014
  • The Korean government has introduced and enforced maritime traffic safety assessment since 2010 to secure traffic safety. The assessment is required by law when designing a new port or modifying an existing one. According to the Korea Maritime Safety Act, the assessment of the propriety of the marine traffic system covers the safety of channel transit and berthing/unberthing maneuvers, the safety of mooring, and the safety of marine traffic flow. The safety of channel transit and berthing/unberthing maneuvers can be evaluated only by ship-handling simulation, which is carried out by sea pilots working at the port concerned. The vessel's proximity measure is an important factor in evaluating traffic safety. It consists of the vessel's closest distance to the channel boundary and the probability of grounding/collision. Moreover, the probability of grounding is becoming important. By the central limit theorem, the sample mean is commonly treated as approximately normal when the sample size exceeds 30. However, more than 30 simulation runs lengthen the assessment period and make it difficult to employ sea pilots. This paper therefore seeks the minimum sample size for evaluating a vessel's proximity. First, sample sets of sizes 3, 5, 7, 9, etc. are selected randomly on the basis of a normal distribution. Then the KS test for goodness of fit and the t-test for the confidence interval are applied to each sample set. As a result, this paper suggests a minimum sample size of 5, that is, five or more simulation runs.

Smoothed Local PCA by BYY data smoothing learning

  • Liu, Zhiyong;Xu, Lei
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2001.10a
    • /
    • pp.109.3-109
    • /
    • 2001
  • The so-called curse of dimensionality arises when a Gaussian mixture is used on high-dimensional, small-sample-size data, since the number of free elements that need to be specified in each covariance matrix of the Gaussian mixture grows quadratically with the dimension d. In this paper, by constraining the covariance matrix to its decomposed orthonormal form, we obtain a local PCA model that reduces the number of free elements to be specified. Moreover, to cope with the small-sample-size problem, we implement this local PCA model with BYY data smoothing learning, a regularization of maximum likelihood learning obtained from BYY harmony learning.

Probabilistic analysis of efficiencies for sorting algorithms with a finite number of records based on an asymptotic algorithm analysis (점근적 분석 모형에 기초한 유한개 레코드 정렬 알고리즘 효율성의 확률적 분석)

  • 김숙영
    • Journal of the Korea Computer Industry Society
    • /
    • v.5 no.2
    • /
    • pp.325-330
    • /
    • 2004
  • The Big O notation used in sorting algorithm analysis is asymptotic: it gives a rough mathematical function as the sample size grows without bound, without specifying any probabilistic model. Hence, in applications with a finite amount of data, it is necessary to test the efficiency of sorting algorithms empirically. I estimated probabilistic models of the number of exchanges as the input size to be sorted varies. The estimated models relating sorting efficiency to sample size (N, the sample size, and S, the number of element exchanges) are $S = 0.9305\,N^{1.339}$ for the Quick sort algorithm, with O(n log n) time complexity, and $S = 0.2232\,N^{2.0130}$ for the Insertion sort algorithm, with O($n^2$) time complexity. Furthermore, there is strong evidence that the estimated models explain more than 99% of the above relationship (p<0.001). These findings suggest that sorting algorithm efficiency needs to be analyzed empirically in applications with a finite amount of data or for a newly developed sorting algorithm.
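
A power-law model of the form S = a·N^b, as reported in this abstract, can be estimated by ordinary least squares on the log-log scale. The sketch below is an illustration only: the abstract does not say how its models were fitted, and the input sizes, number of trials, and uniform random inputs are all assumptions. It counts adjacent exchanges in insertion sort and fits the exponent.

```python
import math
import random


def insertion_sort_exchanges(values):
    """Sort a copy by adjacent swaps and return the number of exchanges."""
    a = list(values)
    swaps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            swaps += 1
            j -= 1
    return swaps


def fit_power_law(ns, ss):
    """Least-squares fit of log S = log a + b * log N; returns (a, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(s) for s in ss]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b


# Hypothetical experiment: mean exchange count over 20 random inputs per size.
rng = random.Random(0)
sizes = [50, 100, 200, 400]
mean_swaps = []
for n in sizes:
    trials = [insertion_sort_exchanges([rng.random() for _ in range(n)])
              for _ in range(20)]
    mean_swaps.append(sum(trials) / len(trials))
a_hat, b_hat = fit_power_law(sizes, mean_swaps)
```

Because the expected number of exchanges for a random input is N(N - 1)/4, the fitted exponent `b_hat` should come out close to 2 in this setup.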

A note on the sample size determination of sequential and multistage procedures

  • Choi, Kiheon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.23 no.6
    • /
    • pp.1279-1287
    • /
    • 2012
  • We emphasize how to determine the number of replications with sequential and multistage procedures. The t-test is used to achieve a predetermined level of accuracy efficiently under a loss function for normal, chi-squared, and exponential distributions. The procedures considered are the sequential procedure, the two-stage procedure, the modified two-stage procedure, the three-stage procedure, and the accelerated sequential procedure. Monte Carlo simulation is carried out to obtain the stopping sample size that minimizes the risk.
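
To make the idea of a stopping sample size concrete, here is a minimal sketch of a purely sequential rule for a fixed-width confidence interval: keep sampling until n is at least z²·s²/d², where s² is the current sample variance and d the target half-width. This is one classical form of the sequential procedure, not necessarily the paper's formulation: it uses a normal quantile instead of the t-test, involves no loss function, and the pilot size and cap are assumptions. The function name is hypothetical.

```python
from statistics import NormalDist, variance


def sequential_sample_size(draw, half_width, confidence=0.95,
                           pilot_size=10, max_n=100_000):
    """Sample one observation at a time until n >= z^2 * s^2 / half_width^2.

    draw: zero-argument callable returning one observation.
    Returns (n, sample mean, sample variance) at stopping.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    data = [draw() for _ in range(pilot_size)]
    s2 = variance(data)
    # Stop as soon as the current n satisfies the rule, or at the cap max_n.
    while len(data) < z * z * s2 / half_width ** 2 and len(data) < max_n:
        data.append(draw())
        s2 = variance(data)
    return len(data), sum(data) / len(data), s2
```

A Monte Carlo study in the spirit of the abstract would call this repeatedly, for example with `draw = lambda: rng.gauss(10, 2)` for a seeded `random.Random` instance `rng`, and summarize the distribution of the stopping sample size.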

A Sampling Design for the livestock (Korean Native Beef Cattle, Milk Cow, Pig, Chicken) Statistics (가축통계 표본조사설계)

  • 윤기중;박상언
    • The Korean Journal of Applied Statistics
    • /
    • v.11 no.2
    • /
    • pp.233-246
    • /
    • 1998
  • We developed a sample design for livestock statistics for the next five years, based on the population as of 1995. The design uses stratified one-stage sampling, in which the sample size depends on a prespecified coefficient of variation. To stratify the population, we used the complete linkage method and chose the number of strata that yields the minimum sample size. We also list some difficulties we encountered, to help improve future sample designs.
