• Title/Summary/Keyword: Ordinary statistical data analysis


Inappropriate Survey Design Analysis of the Korean National Health and Nutrition Examination Survey May Produce Biased Results

  • Kim, Yangho;Park, Sunmin;Kim, Nam-Soo;Lee, Byung-Kook
    • Journal of Preventive Medicine and Public Health
    • /
    • v.46 no.2
    • /
    • pp.96-104
    • /
    • 2013
  • Objectives: The inherent nature of the Korean National Health and Nutrition Examination Survey (KNHANES) design requires special analysis by incorporating sample weights, stratification, and clustering not used in ordinary statistical procedures. Methods: This study investigated the proportion of research papers that have used an appropriate statistical methodology out of the research papers analyzing the KNHANES cited in the PubMed online system from 2007 to 2012. We also compared differences in mean and regression estimates between the ordinary statistical data analyses without sampling weight and design-based data analyses using the KNHANES 2008 to 2010. Results: Of the 247 research articles cited in PubMed, only 19.8% of all articles used survey design analysis, compared with 80.2% of articles that used ordinary statistical analysis, treating KNHANES data as if it were collected using a simple random sampling method. Means and standard errors differed between the ordinary statistical data analyses and design-based analyses, and the standard errors in the design-based analyses tended to be larger than those in the ordinary statistical data analyses. Conclusions: Ignoring complex survey design can result in biased estimates and overstated significance levels. Sample weights, stratification, and clustering of the design must be incorporated into analyses to ensure the development of appropriate estimates and standard errors of these estimates.
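The contrast between an ordinary analysis and a design-based analysis can be sketched in a few lines. The following is a minimal illustration, not the authors' procedure: the toy sample, the function names, and the choice of a with-replacement Taylor linearization variance estimator are all assumptions made for the example, and no KNHANES data are used.

```python
from math import sqrt

# Hypothetical toy sample, NOT KNHANES data: (stratum, psu, weight, value).
SAMPLE = [
    ("A", 1, 10.0, 120.0), ("A", 1, 10.0, 124.0),
    ("A", 2, 10.0, 130.0), ("A", 2, 10.0, 134.0),
    ("B", 3, 40.0, 140.0), ("B", 3, 40.0, 144.0),
    ("B", 4, 40.0, 150.0), ("B", 4, 40.0, 154.0),
]

def srs_mean_se(rows):
    """Ordinary analysis: unweighted mean and SE, as if simple random sampling."""
    y = [r[3] for r in rows]
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return mean, sqrt(var / n)

def design_mean_se(rows):
    """Design-based analysis: weighted mean with a linearization SE.

    Assumes at least two PSUs per stratum and with-replacement PSU sampling.
    """
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w
    # Sum the linearized scores w * (y - mean) / W within each PSU.
    psu_scores = {}
    for stratum, psu, w, y in rows:
        by_psu = psu_scores.setdefault(stratum, {})
        by_psu[psu] = by_psu.get(psu, 0.0) + w * (y - mean) / total_w
    # Between-PSU variance of the scores, accumulated over strata.
    var = 0.0
    for by_psu in psu_scores.values():
        n_h = len(by_psu)
        z_bar = sum(by_psu.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in by_psu.values())
    return mean, sqrt(var)

srs = srs_mean_se(SAMPLE)
design = design_mean_se(SAMPLE)
print("ordinary:     mean=%.2f se=%.3f" % srs)
print("design-based: mean=%.2f se=%.3f" % design)
```

In this toy sample the heavily weighted stratum pulls the weighted mean away from the unweighted one. The direction and size of the difference in standard errors depend on the data and the design, so the example should not be read as reproducing the pattern reported in the abstract.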

Spatial Data Analysis using the Kriging Method

  • Jang, Jihui;Hong, Taekyong;NamKung, Pyong
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.2
    • /
    • pp.423-432
    • /
    • 2003
  • Kriging is a spatial estimation technique that uses data observed at different positions, which are spatially correlated, to estimate the variable of interest at a new observation point. In this paper we present variogram estimation and kriging methods as part of kriging theory and apply them to actually measured data. We also forecast the amount of ozone at points where it was not measured by the kriging method, and compare the Ordinary Kriging method with the Inverse Distance Kriging method.
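As a rough illustration of the comparison described above, the sketch below implements ordinary kriging in its variogram form next to a plain inverse distance weighted estimate. Everything specific here is an assumption for the example: the exponential variogram and its parameters, the function names, and the sample coordinates and values, none of which come from the paper.

```python
from math import exp, hypot

def variogram(h, nugget=0.0, sill=1.0, rng=2.0):
    """Exponential variogram model; the parameter values are illustrative."""
    return 0.0 if h == 0 else nugget + sill * (1.0 - exp(-h / rng))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def dist(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])

def ordinary_kriging(points, values, target):
    """Return (estimate, weights); the weights are constrained to sum to 1."""
    n = len(points)
    # Kriging system: variogram matrix bordered by the unbiasedness constraint.
    a = [[variogram(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights

def inverse_distance(points, values, target, power=2.0):
    """Inverse distance weighting; target must not coincide with a data point."""
    ws = [1.0 / dist(p, target) ** power for p in points]
    return sum(w * v for w, v in zip(ws, values)) / sum(ws)

# Hypothetical measurement sites and values (not data from the paper).
sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
ozone = [30.0, 34.0, 38.0, 50.0]
print(ordinary_kriging(sites, ozone, (1.0, 1.0))[0])
print(inverse_distance(sites, ozone, (1.0, 1.0)))
```

The kriging weights depend on the fitted variogram, whereas the inverse distance weights depend only on distance to the target, which is why the two estimates can differ at the same location.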

Analysis of periodontal data using mixed effects models

  • Cho, Young Il;Kim, Hae-Young
    • Journal of Periodontal and Implant Science
    • /
    • v.45 no.1
    • /
    • pp.2-7
    • /
    • 2015
  • A fundamental problem in analyzing complex multilevel-structured periodontal data is the violation of independence among the observations, which is an assumption in traditional statistical models (e.g., analysis of variance and ordinary least squares regression). In many cases, aggregation (i.e., mean or sum scores) has been employed to overcome this problem. However, the aggregation approach still exhibits certain limitations, such as a loss of power and detailed information, no cross-level relationship analysis, and the potential for creating an ecological fallacy. In order to handle multilevel-structured data appropriately, mixed effects models have been introduced and employed in dental research using periodontal data. The use of mixed effects models might account for the potential bias due to the violation of the independence assumption as well as provide accurate estimates.

Fuzzy k-Means Local Centers of the Social Networks

  • Woo, Won-Seok;Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.2
    • /
    • pp.213-217
    • /
    • 2012
  • Fuzzy k-means clustering is an attractive alternative to the ordinary k-means clustering in analyzing multivariate data. Fuzzy versions yield more natural output by allowing overlapped k groups. In this study, we modify a fuzzy k-means clustering algorithm to be used for undirected social networks, apply the algorithm to both real and simulated cases, and report the results.
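For readers unfamiliar with the fuzzy variant, the sketch below shows the standard fuzzy k-means (fuzzy c-means) updates on one-dimensional data. It illustrates only the ordinary multivariate-style algorithm, not the network modification proposed in the paper; the function names, the fuzzifier `m=2.0`, and the sample data are assumptions for the example.

```python
def memberships(xs, centers, m=2.0):
    """Membership u[k][i] of point k in cluster i; each row sums to 1."""
    u = []
    for x in xs:
        d = [abs(x - c) for c in centers]
        if any(di == 0.0 for di in d):
            # A point sitting exactly on a center belongs fully to it.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            row = [h / sum(hits) for h in hits]
        else:
            inv = [di ** (-2.0 / (m - 1.0)) for di in d]
            row = [v / sum(inv) for v in inv]
        u.append(row)
    return u

def update_centers(xs, u, m=2.0):
    """Each center is the mean of the points weighted by membership ** m."""
    centers = []
    for i in range(len(u[0])):
        wts = [row[i] ** m for row in u]
        centers.append(sum(w * x for w, x in zip(wts, xs)) / sum(wts))
    return centers

def fuzzy_kmeans(xs, centers, m=2.0, iters=100):
    """Alternate membership and center updates for a fixed number of steps."""
    for _ in range(iters):
        centers = update_centers(xs, memberships(xs, centers, m), m)
    return centers, memberships(xs, centers, m)

# Hypothetical data: two tight groups plus one point halfway between them.
data = [0.8, 1.0, 1.2, 5.0, 8.8, 9.0, 9.2]
centers, u = fuzzy_kmeans(data, [0.0, 10.0])
print(centers)
print([round(v, 3) for v in u[3]])  # memberships of the in-between point
```

The in-between point receives partial membership in both groups rather than being forced into one, which is the overlap that the abstract refers to.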

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.2
    • /
    • pp.315-321
    • /
    • 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all data for a gene even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data sets having different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that it is worthwhile to check the clustering result in this alternative way, without any imputed data, for imperfect microarray data.

Forecasting Symbolic Candle Chart-Valued Time Series

  • Park, Heewon;Sakaori, Fumitake
    • Communications for Statistical Applications and Methods
    • /
    • v.21 no.6
    • /
    • pp.471-486
    • /
    • 2014
  • This study introduces a new type of symbolic data, a candle chart-valued time series. We aggregate four stock indices (i.e., open, close, highest and lowest) into one data point to summarize a huge amount of data. In other words, we consider a candle chart, which is constructed from the open, close, highest and lowest stock indices, as a type of symbolic data over a long period. The proposed candle chart-valued time series effectively summarizes and visualizes a huge data set of stock indices, making changes in the indices easy to understand. We also propose novel approaches to candle chart-valued time series modeling based on a combination of two midpoints and two half ranges: between the highest and the lowest indices, and between the open and the close indices. Furthermore, we propose three types of sums of squares for estimating the candle chart-valued time series model. The proposed methods take into account information not only from ordinary data but also from the interval of an object, and thus can perform effectively in time series modeling (e.g., forecasting future stock indices). To evaluate the proposed methods, we describe a real data analysis of the stock market indices of five major Asian countries. The results show that the proposed approaches outperform classical data analysis in forecasting future stock indices.
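The midpoint and half-range representation mentioned above can be sketched as a simple change of coordinates. This is only an illustration under stated assumptions: the function and field names are hypothetical, and treating the open-close half range as signed (so the direction of the move is kept) is a choice made here, not something the abstract specifies.

```python
def candle_to_mid_half(open_, close, high, low):
    """Re-express one candle as two midpoints and two half ranges."""
    return {
        "mid_hl": (high + low) / 2.0,      # midpoint of highest and lowest
        "half_hl": (high - low) / 2.0,     # half range of highest and lowest
        "mid_oc": (open_ + close) / 2.0,   # midpoint of open and close
        "half_oc": (close - open_) / 2.0,  # signed half range (assumption)
    }

def mid_half_to_candle(rep):
    """Invert the transformation back to (open, close, high, low)."""
    open_ = rep["mid_oc"] - rep["half_oc"]
    close = rep["mid_oc"] + rep["half_oc"]
    high = rep["mid_hl"] + rep["half_hl"]
    low = rep["mid_hl"] - rep["half_hl"]
    return open_, close, high, low

# Hypothetical candle: open 100, close 104, highest 106, lowest 98.
rep = candle_to_mid_half(100.0, 104.0, 106.0, 98.0)
print(rep)
print(mid_half_to_candle(rep))
```

Because the mapping is invertible, forecasts made on the four derived series can be converted back into a forecast candle.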

Improved Statistical Testing of Two-class Microarrays with a Robust Statistical Approach

  • Oh, Hee-Seok;Jang, Dong-Ik;Oh, Seung-Yoon;Kim, Hee-Bal
    • Interdisciplinary Bio Central
    • /
    • v.2 no.2
    • /
    • pp.4.1-4.6
    • /
    • 2010
  • The most common type of microarray experiment has a simple design using microarray data obtained from two different groups or conditions. A typical method to identify differentially expressed genes (DEGs) between two conditions is the conventional Student's t-test. The t-test is based on the simple estimation of the population variance for a gene using the sample variance of its expression levels. Although the empirical Bayes approach improves on the t-statistic by not giving a high rank to genes only because they have a small sample variance, its basic assumption is the same as that of the ordinary t-test, namely the equality of variances across experimental groups. The t-test and the empirical Bayes approach suffer from low statistical power because of the assumption of normal and unimodal distributions for the microarray data analysis. We propose a method to address these problems that is robust to outliers or skewed data, while maintaining the advantages of the classical t-test or modified t-statistics. The resulting data transformation to fit the normality assumption increases the statistical power for identifying DEGs using these statistics.
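The equal-variance assumption mentioned above can be made concrete by comparing the pooled (Student) statistic with Welch's statistic, which drops that assumption. This sketch does not implement the robust method proposed in the paper; the function names and the expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Two-sample t statistic with a pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1.0 / na + 1.0 / nb))

def welch_t(a, b):
    """Two-sample t statistic without the equal-variance assumption."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical expression levels for one gene under two conditions.
control = [1.0, 2.0, 3.0]
treated = [2.0, 4.0, 6.0, 8.0, 10.0]
print(student_t(control, treated))
print(welch_t(control, treated))
```

When the group sizes are equal the two statistics coincide, so the assumption matters most for unbalanced designs with unequal spread.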

Improvement of MLLR Algorithm for Rapid Speaker Adaptation and Reduction of Computation

  • Kim, Ji-Un;Chung, Jae-Ho
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1C
    • /
    • pp.65-71
    • /
    • 2004
  • We improved the MLLR speaker adaptation algorithm by reducing the order of the HMM parameters using PCA (principal component analysis) or ICA (independent component analysis). To find a smaller set of variables with less redundancy that still gives as good a representation as possible, we apply PCA and ICA, which minimize the correlations between data elements and remove the axes with less covariance or higher-order statistical independence. The ordinary MLLR algorithm needs more than 30 seconds of adaptation data for SD (speaker-dependent) models to achieve a higher word recognition rate than SI (speaker-independent) models, whereas the proposed algorithm needs just over 10 seconds of adaptation data. Ten components for ICA and PCA give performance similar to that of 36 components in the ordinary MLLR framework. Thus, compared with the ordinary MLLR algorithm, the total amount of computation required for speaker adaptation is reduced to about 1/167 in the proposed MLLR algorithm.

Sensitivity Analysis in Latent Root Regression

  • Shin, Jae-Kyoung;Tomoyuki Tarumi;Yutaka Tanaka
    • Communications for Statistical Applications and Methods
    • /
    • v.1 no.1
    • /
    • pp.102-111
    • /
    • 1994
  • We propose a method of sensitivity analysis in latent root regression analysis (LRRA). For this purpose we derive the quantities $\hat{\beta}_{LRR}^{(1)}$, which correspond to the theoretical influence function $I(x, y; \hat{\beta}_{LRR})$ for the regression coefficient $\hat{\beta}_{LRR}$ based on LRRA. We give a numerical example for illustration and also investigate numerically the relationship between the estimated values of $\hat{\beta}_{LRR}^{(1)}$ and the values of another measure, the sample influence curve (SIC), based on recomputation for the data with a single observation deleted. We also discuss the comparison among the results of LRRA, ordinary least squares regression analysis (OLSRA) and ridge regression analysis (RRA).
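The deletion-based measure mentioned above can be illustrated for the simplest case. The sketch below computes a sample influence curve for the slope of an ordinary least squares line by refitting with each observation removed, using the common definition SIC_i = (n - 1)(estimate - estimate without i). It is not an implementation of latent root regression, and the function names and data are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the least squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def sample_influence_curve(xs, ys):
    """SIC_i = (n - 1) * (full-sample slope - slope with observation i deleted)."""
    n = len(xs)
    full = ols_slope(xs, ys)
    sic = []
    for i in range(n):
        loo = ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        sic.append((n - 1) * (full - loo))
    return sic

# Hypothetical data: four points on the line y = 2x and one outlier at x = 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 20.0]
sic = sample_influence_curve(x, y)
print(sic)
```

The observation with the largest absolute SIC is the one whose removal changes the slope most, which is the kind of quantity an influence-function approximation is meant to predict without refitting.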


Analysis of Quasi-Likelihood Models using SAS/IML

  • Ha, Il-Do
    • Journal of the Korean Data and Information Science Society
    • /
    • v.8 no.2
    • /
    • pp.247-260
    • /
    • 1997
  • Quasi-likelihood models, which greatly widen the scope of generalized linear models, are widely used in data analysis where a likelihood is not available. Since a quasi-likelihood may not correspond to an ordinary likelihood for any known distribution in the natural exponential family, standard statistical packages such as GLIM, GENSTAT, and S-PLUS may not be directly applicable for fitting quasi-likelihood models. SAS/IML is very useful for fitting such models. In this paper, we present a simple SAS/IML (version 6.11) program that helps to fit and analyze the quasi-likelihood models applied to the leaf-blotch data introduced by Wedderburn (1974). We also point out a problem with the deviance, which is generally useful for model checking, and describe a solution through a data analysis based on checking these quasi-likelihood models.
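For orientation, the standard definition of a quasi-likelihood needs only a mean-variance relationship rather than a full distribution. The notation below ($\phi$ for the dispersion, $V$ for the variance function) is the conventional one and is assumed here; the abstract does not state which variance function the paper fits, so the two forms shown for proportion data are given only as the usual candidates associated with the leaf-blotch example.

```latex
% Mean-variance specification
E(y) = \mu, \qquad \operatorname{Var}(y) = \phi \, V(\mu)

% Quasi-likelihood for a single observation
Q(\mu; y) = \int_{y}^{\mu} \frac{y - t}{\phi \, V(t)} \, dt

% Candidate variance functions for proportions (assumption, not from the abstract)
V(\mu) = \mu (1 - \mu)
\qquad \text{or} \qquad
V(\mu) = \mu^{2} (1 - \mu)^{2}
```

The first variance function has the same form as the binomial case, whereas the second does not correspond to a distribution in the natural exponential family, which is the situation the abstract describes as awkward for standard packages.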
