• Title/Summary/Keyword: statistical data analysis

Inappropriate Survey Design Analysis of the Korean National Health and Nutrition Examination Survey May Produce Biased Results

  • Kim, Yangho;Park, Sunmin;Kim, Nam-Soo;Lee, Byung-Kook
    • Journal of Preventive Medicine and Public Health / v.46 no.2 / pp.96-104 / 2013
  • Objectives: The inherent nature of the Korean National Health and Nutrition Examination Survey (KNHANES) design requires special analysis by incorporating sample weights, stratification, and clustering not used in ordinary statistical procedures. Methods: This study investigated the proportion of research papers that have used an appropriate statistical methodology out of the research papers analyzing the KNHANES cited in the PubMed online system from 2007 to 2012. We also compared differences in mean and regression estimates between the ordinary statistical data analyses without sampling weight and design-based data analyses using the KNHANES 2008 to 2010. Results: Of the 247 research articles cited in PubMed, only 19.8% of all articles used survey design analysis, compared with 80.2% of articles that used ordinary statistical analysis, treating KNHANES data as if it were collected using a simple random sampling method. Means and standard errors differed between the ordinary statistical data analyses and design-based analyses, and the standard errors in the design-based analyses tended to be larger than those in the ordinary statistical data analyses. Conclusions: Ignoring complex survey design can result in biased estimates and overstated significance levels. Sample weights, stratification, and clustering of the design must be incorporated into analyses to ensure the development of appropriate estimates and standard errors of these estimates.
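
The contrast drawn in the abstract above, between an ordinary analysis and a design-based one, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names, the toy data, and the choice of a with-replacement Taylor linearization variance estimator (no finite population correction) are all assumptions made for the example.

```python
import math
from collections import defaultdict


def naive_mean_se(y):
    """Ordinary (simple random sampling) mean and standard error."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return mean, math.sqrt(var / n)


def design_mean_se(y, w, strata, psu):
    """Weighted mean with a linearization standard error that accounts for
    stratification and clustering (assumed with-replacement PSU sampling)."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Sum the linearized scores w_i * (y_i - mean) / sum(w) within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, strata, psu):
        psu_totals[h][c] += wi * (yi - mean) / total_w

    # Between-PSU variance within each stratum, summed over strata.
    var = 0.0
    for h, clusters in psu_totals.items():
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
        zbar = sum(clusters.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in clusters.values())
    return mean, math.sqrt(var)


# Hypothetical toy data: one stratum, two PSUs, unequal sampling weights.
y = [1.0, 3.0, 5.0, 7.0]
w = [1.0, 1.0, 3.0, 3.0]
strata = ["A", "A", "A", "A"]
psu = [1, 1, 2, 2]

print("ordinary:     mean=%.3f se=%.3f" % naive_mean_se(y))
print("design-based: mean=%.3f se=%.3f" % design_mean_se(y, w, strata, psu))
```

In this toy case both the point estimate and the standard error change once weights and clusters are taken into account. Real analyses would normally rely on dedicated survey software rather than a hand-written estimator.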

On Nonparametric Estimation of Data Edges

  • Park, Byeong U.
    • Journal of the Korean Statistical Society / v.30 no.2 / pp.265-280 / 2001
  • Estimating the edge of a distribution has many important applications. It is related to classification, cluster analysis, neural networks, and statistical image recovery. The problem also arises in measuring production efficiency in economic systems. The three most promising nonparametric estimators in the existing literature are introduced. Their statistical properties are presented, some of which are new. Topics for future study are also discussed.

Statistical Errors of Articles Published in the Journal of Oriental Rehabilitation Medicine(I) (한방재활의학과학회지의 통계적 오류에 관한 고찰(I))

  • Park, Tae-Yong;Heo, Tae-Young;Shin, Byung-Cheul
    • Journal of Korean Medicine Rehabilitation / v.20 no.4 / pp.105-130 / 2010
  • Objectives: The purpose of this study was to assess errors in the statistical methods used in the Journal of Oriental Rehabilitation Medicine (JORM) and to identify the types of errors in statistical analysis. Methods: We reviewed quantitative articles published in the JORM from January 2005 through October 2009. Articles that did not use statistical analysis, such as literature studies, case studies, and review articles, were excluded. A total of 296 articles were reviewed. We evaluated the adequacy and validity of the statistical techniques with a checklist adapted from Lee's checklist, and three statistical evaluators assessed the articles together to minimize bias. Results: Of the 222 articles, 213 used inferential and descriptive statistics. Of those, 80% were found to contain statistical errors. On average, each article used 1.7 statistical methods. The most frequently employed methods were, in order, Student's t-test, one-way ANOVA, Pearson correlation analysis, the Mann-Whitney U test, the paired t-test, and the chi-square test. The methods most frequently used erroneously followed a similar order. The most common statistical errors were as follows: 1. absence of a normality test; 2. confusion between paired and unpaired tests; 3. wrong choice of repeated measures analysis without consideration of time variables; 4. inflation of Type I error through inappropriate multiple testing; 5. inappropriate use of discrete or categorical data instead of continuous data in correlation analysis; 6. poor consideration of the basic assumptions of the chi-square test; 7. confusion between comparison of frequencies and comparison of means; 8. mentioning a statistical technique without using it. Conclusions: We found various mistakes and misuses in the application of statistical methodologies in the articles published in the JORM. Careful consideration of statistical use and review by statistical specialists are warranted to improve the quality of the JORM.
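
The Type I error inflation caused by uncorrected multiple testing, one of the error types listed in the abstract above, can be illustrated with a short calculation. This is a sketch under the simplifying assumption that the tests are independent. The function names are hypothetical, and the Bonferroni correction is shown only as one common remedy, not as the method recommended by the article.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests,
    each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / m


# Three pairwise t-tests after comparing three groups, each at alpha = 0.05.
alpha, m = 0.05, 3
print("uncorrected family-wise rate:", familywise_error_rate(alpha, m))
print("Bonferroni per-test level:   ", bonferroni_alpha(alpha, m))
print("corrected family-wise rate:  ",
      familywise_error_rate(bonferroni_alpha(alpha, m), m))
```

Running several unadjusted pairwise tests in place of a single omnibus test raises the chance of a spurious finding above the nominal level, which is why a correction or an ANOVA with a post hoc procedure is usually preferred.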

Comparing Data Access Methods in Statistical Packages (통계 패키지에서의 데이터 접근 방식 비교)

  • Kang, Gun-Seog
    • Communications for Statistical Applications and Methods / v.16 no.3 / pp.437-447 / 2009
  • In addition to analyzing data with appropriate statistical methods, statistical analysts in industry now face the difficulty of composing datasets suited to their analysis objectives by extracting or generating data from diverse data storage devices. In this paper we survey and compare state-of-the-art data access technologies adopted by several commonly used statistical packages. A better understanding of these technologies will help reduce the costs incurred when analyzing large datasets, especially in data mining work, and so leave more time for applying statistical analysis methods.

A Historical Study on Statistical Packages in Cluster Analysis

  • 이승우
    • Journal for History of Mathematics / v.11 no.1 / pp.52-57 / 1998
  • Since cluster analysis encompasses many diverse techniques for discovering structure within complex bodies of data, it has been employed as an effective tool in scientific inquiry. Recent work on cluster analysis software, as implemented in SAS, SPSS, S-PLUS and BMDP, is briefly summarized and investigated in this paper. The inferred statistical package for windows executing a nay for data analysis in modern statistical techniques has several merits superior to other packages. In particular, analyses in S-PLUS can be designed and tried out much faster than in other statistical packages. S-PLUS provides graphics that offer interactive, informative, and flexible ways of looking at data. Also, when statistical computation times are long and programs are complex, they can be shortened through interfaces to UNIX systems (or C and Fortran).

R programming: Language and Environment for Statistical Computing and Data Visualization (R 프로그래밍: 통계 계산과 데이터 시각화를 위한 환경)

  • Lee, D.H.;Ren, Ye
    • Electronics and Telecommunications Trends / v.28 no.1 / pp.42-51 / 2013
  • R is an open source programming language and software environment for statistical computing and data visualization. It is widely used by statisticians and data scientists to develop statistical software and perform data analysis. R provides a variety of statistical and graphical techniques, including basic descriptive statistics, linear and nonlinear modeling, conventional and advanced statistical tests, time series analysis, clustering, simulation, and others. In this paper, we first introduce the R language and investigate its features as a data analytics tool. We then explore the applicability of the R language in the field of data analytics.

Comparative Study on Statistical Packages for using Multivariate Q-technique

  • Choi, Yong-Seok;Moon, Hee-jung
    • Communications for Statistical Applications and Methods / v.10 no.2 / pp.433-443 / 2003
  • In this study, we compare multivariate Q-techniques in the up-to-date versions of SAS, SPSS, Minitab and S-plus, which are well known to those who study statistics. Data can be analyzed through direct input (commands) in SAS and through menus in SPSS, Minitab and S-plus. The analysis methods compared were chosen by their high frequency of use. We broadly compare each Q-technique in terms of input data, input options, statistical charts and statistical output.

A study on statistical data analysis by microcomputers (마이크로 컴퓨터에 의한 통계자료분석(統計資料分析)에 관한 연구(硏究))

  • Park, Seong-Hyeon
    • Journal of Korean Society for Quality Management / v.13 no.1 / pp.12-19 / 1985
  • First, the necessity of statistical packages and the strengths and weaknesses of microcomputers for statistical data analysis are examined in this paper. Second, some statistical packages available for microcomputers in the international market are introduced, and the contents of two statistical packages developed by the author are presented.

Applications of response dimension reduction in large p-small n problems

  • Minjee Kim;Jae Keun Yoo
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.191-202 / 2024
  • The goal of this paper is to show how multivariate regression analysis with high-dimensional responses is facilitated by response dimension reduction. Multivariate regression, characterized by multi-dimensional response variables, is increasingly prevalent across diverse fields such as repeated measures, longitudinal studies, and functional data analysis. One of the key challenges in analyzing such data is managing the response dimensions, which can complicate the analysis due to an exponential increase in the number of parameters. Although response dimension reduction methods have been developed, there is no practically useful illustration for various types of data, such as so-called large p-small n data. This paper aims to fill this gap by showcasing how response dimension reduction can enhance the analysis of high-dimensional response data, thereby providing significant assistance to statistical practitioners and contributing to advancements in multiple scientific domains.

Exploratory Methods for Joint Distribution Valued Data and Their Application

  • Igarashi, Kazuto;Minami, Hiroyuki;Mizuta, Masahiro
    • Communications for Statistical Applications and Methods / v.22 no.3 / pp.265-276 / 2015
  • In this paper, we propose hierarchical cluster analysis and multidimensional scaling for joint distribution valued data. Information technology is increasing the need for statistical methods for large and complex data. Symbolic Data Analysis (SDA) is an attractive framework for such data. In SDA, target objects are typically represented by aggregated data. Most SDA methods deal with objects represented as intervals and histograms. However, those methods cannot take into account information among variables, including correlation. In contrast, objects represented as joint distributions can contain information among variables. Therefore, we focus on methods for joint distribution valued data. We extend the two well-known exploratory methods using dissimilarities among joint distribution valued data based on the Hall-type relative projection index. We present a simulation study and a real-data example of the proposed methods.