• Title/Summary/Keyword: Performance-based Statistics

Multivariable Bayesian curve-fitting under functional measurement error model

  • Hwang, Jinseub; Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society, v.27 no.6, pp.1645-1651, 2016
  • Many data sets, particularly in the medical field, contain variables measured with error, such as blood pressure and body mass index. At the same time, smoothing methods are increasingly used to address complex scientific problems. In this paper, we study Bayesian curve-fitting under a functional measurement error model. In particular, we extend our previous model by incorporating covariates that are free of measurement error. We use penalized splines to capture non-linear patterns and employ a hierarchical Bayesian framework based on Markov chain Monte Carlo (MCMC) methodology for fitting the model and estimating its parameters. As an application, we use data from the fifth wave (2012) of the Korea National Health and Nutrition Examination Survey, a national population-based survey. Potential scale reduction factors are used to examine the convergence of the MCMC sampling, and a model selection criterion is used to check performance.
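
The penalized-spline ingredient can be sketched outside the paper's hierarchical MCMC setting. Below is a minimal penalized least squares version, assuming a truncated-line basis with an illustrative knot count and penalty; the paper's Bayesian fit, measurement error adjustment, and error-free covariates are not reproduced here.

```python
import numpy as np

def pspline_fit(x, y, num_knots=20, lam=1.0):
    """Penalized-spline fit with a truncated-line basis and a ridge
    penalty on the knot coefficients (a frequentist stand-in for the
    paper's hierarchical Bayesian fit)."""
    knots = np.quantile(x, np.linspace(0, 1, num_knots + 2)[1:-1])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)  # (x - k)_+ terms
    X = np.column_stack([np.ones_like(x), x, Z])
    D = np.diag([0.0, 0.0] + [1.0] * num_knots)  # penalize knot terms only
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return X @ beta

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)
fitted = pspline_fit(x, y)  # smooth estimate of the non-linear pattern
```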

Signal Reconstruction by Synchrosqueezed Wavelet Transform

  • Park, Minsu; Oh, Hee-Seok; Kim, Donghoh
    • Communications for Statistical Applications and Methods, v.22 no.2, pp.159-172, 2015
  • This paper considers the problem of reconstructing an underlying signal from noisy data. We present a reconstruction method based on the synchrosqueezed wavelet transform, recently developed for multiscale representation. The synchrosqueezed wavelet transform, which builds on the continuous wavelet transform, is efficient for estimating the instantaneous frequency of each component of a signal and for reconstructing the components. However, an objective method for selecting the optimal number of intrinsic mode type functions is required. The proposed method couples the synchrosqueezed wavelet transform with a cross-validation scheme. Simulation studies and musical instrument sounds are used to compare the empirical performance of the proposed method with that of existing methods.
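
The decomposition itself needs a synchrosqueezing implementation, so the sketch below abstracts it behind a hypothetical `reconstruct_with_k_modes` placeholder (one could build it on a package such as ssqueezepy). Only the cross-validation scheme for choosing the number of intrinsic mode type functions is shown, and the even/odd split is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

def reconstruct_with_k_modes(signal, k):
    """Hypothetical placeholder: decompose `signal` with the
    synchrosqueezed wavelet transform and return the sum of its
    k leading components."""
    raise NotImplementedError  # supply a synchrosqueezing backend here

def select_num_modes(signal, k_max=8):
    """Choose the number of intrinsic mode type functions by
    even/odd cross-validation on the reconstruction error."""
    odd = signal[1::2]
    errors = []
    for k in range(1, k_max + 1):
        recon = reconstruct_with_k_modes(signal[0::2], k)
        pred = 0.5 * (recon[:-1] + recon[1:])  # interpolate to odd points
        errors.append(np.mean((odd[:len(pred)] - pred) ** 2))
    return int(np.argmin(errors)) + 1
```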

Software Reliability for Order Statistic of Burr XII Distribution

  • Lee, Jae-Un; Yoon, Sang-Chul
    • Journal of the Korean Data and Information Science Society, v.19 no.4, pp.1361-1369, 2008
  • The analysis of software reliability models provides analysts, software engineers, and systems developers with the means to predict, estimate, and measure the rate of failure occurrences in software. In this paper, a reliability growth model is proposed in which the operating time between successive failures is a continuous random variable. The model is based on order statistics of the two-parameter Burr type XII distribution. We propose a measure based on the U-plot, and the performance of the suggested model is tested on a real data set.
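
The U-plot idea can be made concrete in a few lines. This is a minimal sketch, assuming the two Burr XII parameters have already been estimated elsewhere; it transforms the observed inter-failure times by the fitted CDF and reports the maximum deviation from the uniform distribution.

```python
import numpy as np

def burr12_cdf(x, c, k):
    """CDF of the two-parameter Burr type XII distribution:
    F(x) = 1 - (1 + x^c)^(-k) for x > 0."""
    return 1.0 - (1.0 + np.asarray(x, dtype=float) ** c) ** (-k)

def u_plot_distance(times, c, k):
    """U-plot check: if the fitted model is adequate, the transformed
    inter-failure times should look like an ordered uniform sample."""
    u = np.sort(burr12_cdf(times, c, k))
    grid = np.arange(1, len(u) + 1) / (len(u) + 1)
    return np.max(np.abs(u - grid))  # Kolmogorov-style distance
```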

Monitoring social networks based on transformation into categorical data

  • Lee, Joo Weon; Lee, Jaeheon
    • Communications for Statistical Applications and Methods, v.29 no.4, pp.487-498, 2022
  • Social network analysis (SNA) techniques have recently been developed to monitor and detect abnormal behaviors in social networks. Control charts, a standard tool for process monitoring, are also useful for network monitoring. In this paper, the degree and closeness centrality measures, which capture global and local perspectives respectively, are applied to an exponentially weighted moving average (EWMA) chart and a multinomial cumulative sum (CUSUM) chart for monitoring undirected weighted networks. In general, an EWMA chart monitors only one variable per chart, whereas a multinomial CUSUM chart can monitor, in a single chart, a categorical variable into which several variables have been combined through classification rules. To monitor degree centrality and closeness centrality simultaneously, we categorize them based on the average of each measure and then apply the resulting categories to the multinomial CUSUM chart. In this way, the global and local attributes of the network can be monitored simultaneously with a single chart. We also evaluate the performance of the proposed procedure through a simulation study.
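
A compressed sketch of the monitoring step appears below. The four-way classification rule and the in-control/out-of-control category probabilities are illustrative assumptions, not the paper's exact design; `networkx` supplies the centrality computations.

```python
import numpy as np
import networkx as nx

def network_category(G, deg0, clo0):
    """Classify a network snapshot into one of four categories by
    comparing its average degree and closeness centrality with
    their in-control averages (deg0, clo0)."""
    deg = np.mean(list(nx.degree_centrality(G).values()))
    clo = np.mean(list(nx.closeness_centrality(G).values()))
    return 2 * int(deg > deg0) + int(clo > clo0)  # categories 0..3

def multinomial_cusum(categories, p0, p1, h):
    """Multinomial CUSUM: accumulate log-likelihood ratios of each
    observed category under out-of-control (p1) vs in-control (p0)
    probabilities; signal when the statistic exceeds the limit h."""
    s, signals = 0.0, []
    for t, c in enumerate(categories):
        s = max(0.0, s + np.log(p1[c] / p0[c]))
        if s > h:
            signals.append(t)
            s = 0.0  # restart monitoring after a signal
    return signals
```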

Comparison of tree-based ensemble models for regression

  • Park, Sangho; Kim, Chanmin
    • Communications for Statistical Applications and Methods, v.29 no.5, pp.561-589, 2022
  • When multiple classification and regression trees are combined, tree-based ensemble models such as random forest (RF) and Bayesian additive regression trees (BART) are produced. In this study, we compare the model structures and performance of various ensemble models in regression settings. RF learns from bootstrapped samples and selects a splitting variable from the predictors drawn at each node. The BART model is specified as a sum of trees and is fitted using the Bayesian backfitting algorithm. Through extensive simulation studies, the strengths and drawbacks of the two methods are investigated in the presence of missing data, high-dimensional data, and highly correlated data. With missing data, BART performs well in general, whereas RF provides adequate coverage. BART also outperforms RF on high-dimensional and highly correlated data. However, in all of the scenarios considered, RF has a shorter computation time. The performance of the two methods is also compared using two real data sets that represent the aforementioned situations, and the same conclusions are reached.
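
The RF side of the comparison is easy to reproduce with scikit-learn; BART has no scikit-learn implementation, so the sketch only notes where a BART package would slot in. The correlated-design simulation is an illustrative stand-in for the paper's scenarios.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Highly correlated predictors, one of the settings studied above
rng = np.random.default_rng(1)
n, p, rho = 500, 20, 0.9
cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
y = X[:, 0] - 2 * X[:, 3] + rng.normal(0.0, 1.0, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt",
                           random_state=0).fit(X_tr, y_tr)
print("RF test R^2:", rf.score(X_te, y_te))
# A BART implementation (e.g., an external package) would be fitted
# on the same split to complete the comparison.
```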

Selecting Ordering Policy and Items Classification Based on Canonical Correlation and Cluster Analysis

  • Nagasawa, Keisuke; Irohara, Takashi; Matoba, Yosuke; Liu, Shuling
    • Industrial Engineering and Management Systems, v.11 no.2, pp.134-141, 2012
  • It is difficult to find an appropriate ordering policy for many types of items. One reason for this difficulty is that each item has a different demand trend. We classify items by shipment trend and then decide the ordering policy for each item category. In this study, we show that categorizing items by their statistical characteristics leads to an ordering policy suitable for each category. We analyze the ordering policies and shipment trends and propose a new method for selecting the ordering policy, based on finding the strongest relation between the classification of the items and the ordering policy. In our numerical experiment, using actual shipment data for about 5,000 items over the past year, we calculated a number of statistics representing the trend of each item. Next, we applied canonical correlation analysis between the evaluations of the ordering policies and these statistics. Furthermore, we applied cluster analysis to the statistics concerning the performance of the ordering policies. Finally, we separate the items into several categories and show that the appropriate ordering policies differ across categories.
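
The two multivariate steps can be sketched with scikit-learn. Everything here is hypothetical input: random stand-ins for the per-item trend statistics and the policy evaluations replace the actual shipment data.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
trend_stats = rng.normal(size=(5000, 10))   # per-item shipment statistics
policy_scores = rng.normal(size=(5000, 4))  # evaluation of each ordering policy

# Canonical correlation between trend statistics and policy evaluations
cca = CCA(n_components=2).fit(trend_stats, policy_scores)
U, _ = cca.transform(trend_stats, policy_scores)

# Cluster items in the canonical-variate space into policy categories
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(U)
```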

Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data

  • Ko, Hyoseok; Kim, Kipoong; Sun, Hokeun
    • Genomics & Informatics, v.14 no.4, pp.187-195, 2016
  • In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required to identify disease- or trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistical testing procedures based on individual tests suffer from multiple testing issues such as control of the family-wise error rate and dependence among tests. Moreover, detecting only a few genes associated with a phenotype outcome among tens of thousands of genes is of main interest in genetic association studies. For this reason, regularization procedures, in which a phenotype outcome is regressed on all genomic markers and the regression coefficients are then estimated based on a penalized likelihood, have been considered a good alternative for the analysis of high-dimensional genomic data. However, the selection performance of regularization procedures has rarely been compared with that of statistical group testing procedures. In this article, we performed extensive simulation studies in which commonly used group testing procedures, such as principal component analysis, Hotelling's $T^2$ test, and the permutation test, are compared with the group lasso (least absolute shrinkage and selection operator) in terms of true positive selection. We also applied all methods considered in the simulation studies to identify genes associated with ovarian cancer from over 20,000 genetic sites generated by the Illumina Infinium HumanMethylation27K BeadChip. We found a large discrepancy between the genes selected by the multiple group testing procedures and those selected by the group lasso.
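
Among the group tests named above, Hotelling's $T^2$ is simple to write down. A minimal two-sample version for one gene group follows, using only numpy and scipy; it assumes more samples than sites in the group, and the group lasso side of the comparison would need a dedicated package, so it is not sketched here.

```python
import numpy as np
from scipy import stats

def hotelling_t2(X1, X2):
    """Two-sample Hotelling's T^2 test for one gene group: X1 and X2
    hold the group's marker values for cases and controls (rows are
    samples, columns are genetic sites)."""
    n1, n2, p = len(X1), len(X2), X1.shape[1]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S = ((n1 - 1) * np.cov(X1, rowvar=False) +
         (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
    f = t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
    return t2, stats.f.sf(f, p, n1 + n2 - p - 1)  # statistic, p-value
```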

A Penalized Spline Based Method for Detecting the DNA Copy Number Alteration in an Array-CGH Experiment

  • Kim, Byung-Soo; Kim, Sang-Cheol
    • The Korean Journal of Applied Statistics, v.22 no.1, pp.115-127, 2009
  • The purpose of statistical analysis of array-CGH experiment data is to divide the whole genome into regions of equal copy number, to quantify the copy number in each region, and to evaluate the significance of its deviation from two. Several statistical procedures have been proposed, including circular binary segmentation and a Gaussian-based local regression for detecting breakpoints (GLAD) that estimates a piecewise constant function. In this note we propose a penalized spline regression with a simultaneous confidence band (SCB) as an approach to evaluating the statistical significance of regions of genetic gain or loss: a region whose simultaneous confidence band stays entirely above 0 or below 0 can be considered a region of genetic gain or loss, respectively. We compare the performance of the SCB procedure with the GLAD and hidden Markov model approaches through a simulation study in which the data were generated from AR(1) and AR(2) models, to reflect the spatial dependence of array-CGH data, in addition to an independence model. We found that the SCB method is more sensitive in detecting low-level copy number alterations.
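
A compact version of the band construction is sketched below: a penalized-spline fit whose simultaneous critical value comes from simulating the maximum standardized deviation along the curve. The basis, penalty, and simulation size are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def spline_scb(x, y, num_knots=30, lam=10.0, level=0.95, nsim=2000):
    """Penalized-spline fit with a simulation-based simultaneous
    confidence band; a region whose band excludes 0 would be flagged
    as a copy number gain or loss."""
    knots = np.quantile(x, np.linspace(0, 1, num_knots + 2)[1:-1])
    X = np.column_stack([np.ones_like(x), x,
                         np.maximum(x[:, None] - knots[None, :], 0.0)])
    D = np.diag([0.0, 0.0] + [1.0] * num_knots)
    A = np.linalg.solve(X.T @ X + lam * D, X.T)       # fit = X @ A @ y
    fit = X @ (A @ y)
    sigma2 = np.sum((y - fit) ** 2) / (len(y) - np.trace(X @ A))
    cov = sigma2 * (X @ A) @ (X @ A).T                # Cov(fitted curve)
    se = np.sqrt(np.diag(cov))
    # Simultaneous critical value: quantile of the maximum absolute
    # standardized Gaussian deviation over the whole curve
    R = cov / np.outer(se, se)
    L = np.linalg.cholesky(R + 1e-8 * np.eye(len(se)))
    sims = np.abs(L @ np.random.default_rng(0).normal(size=(len(se), nsim)))
    crit = np.quantile(sims.max(axis=0), level)
    return fit, fit - crit * se, fit + crit * se
```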

Predicting Survival of DLBCL Patients in Pathway-Based Microarray Analysis

  • Lee, Kwang-Hyun; Lee, Sun-Ho
    • The Korean Journal of Applied Statistics, v.23 no.4, pp.705-713, 2010
  • Predicting survival from microarray data is difficult due to the high dimensionality of the data and the presence of censored observations. Moreover, the limitations of individual-gene analysis have shifted the focus to gene sets of functionally related genes. To develop a survival prediction model based on pathway information, we discuss methods for selecting a supergene for each pathway using principal component analysis and for testing its significance. In addition, the performance of gene filtering is compared.
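
A minimal sketch of the supergene construction follows, assuming an expression matrix for one pathway and the `lifelines` package for the Cox regression; the paper's exact significance test may differ.

```python
import pandas as pd
from sklearn.decomposition import PCA
from lifelines import CoxPHFitter

def test_pathway(expr, time, event):
    """Summarize one pathway's genes by their first principal component
    (the supergene) and test it in a Cox proportional hazards model.
    `expr` is samples x genes; `time`/`event` are survival outcomes."""
    supergene = PCA(n_components=1).fit_transform(expr).ravel()
    df = pd.DataFrame({"supergene": supergene, "time": time, "event": event})
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    return cph.summary.loc["supergene", "p"]  # p-value for the pathway
```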

A study on solar energy forecasting based on time series models

  • Lee, Keunho; Son, Heung-gu; Kim, Sahm
    • The Korean Journal of Applied Statistics, v.31 no.1, pp.139-153, 2018
  • This paper investigates solar power forecasting based on several time series models. First, we consider weather variables that influence the forecasts and compare the forecasting accuracy of time series models such as ARIMAX, Holt-Winters, and artificial neural network (ANN) models. The results show that fitting a separate model for each of the 24 hours of the day yields better performance than fitting a single model to all 24 hours.
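
Two of the compared model classes are available in statsmodels. The sketch below fits an ARIMAX (ARIMA with exogenous weather regressors) and a Holt-Winters model on a hypothetical hourly series; the series, covariates, and model orders are illustrative assumptions, not the paper's data or tuned specifications.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical hourly generation series with weather covariates
idx = pd.date_range("2017-01-01", periods=24 * 60, freq="h")
rng = np.random.default_rng(3)
weather = pd.DataFrame({"irradiance": rng.uniform(0, 1, len(idx)),
                        "temperature": rng.normal(15, 5, len(idx))}, index=idx)
gen = pd.Series(3 * weather["irradiance"] + rng.normal(0, 0.2, len(idx)),
                index=idx, name="generation")

# ARIMAX: ARIMA errors plus the exogenous weather regressors
arimax = SARIMAX(gen[:-24], exog=weather[:-24], order=(1, 0, 1)).fit(disp=False)
fc_arimax = arimax.forecast(steps=24, exog=weather[-24:])

# Holt-Winters with an additive daily (24-hour) seasonal cycle
hw = ExponentialSmoothing(gen[:-24], seasonal="add", seasonal_periods=24).fit()
fc_hw = hw.forecast(24)
```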