• Title/Summary/Keyword: Markov Chain Monte Carlo Method

Bayesian logit models with auxiliary mixture sampling for analyzing diabetes diagnosis data (보조 혼합 샘플링을 이용한 베이지안 로지스틱 회귀모형 : 당뇨병 자료에 적용 및 분류에서의 성능 비교)

  • Rhee, Eun Hee;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.131-146 / 2022
  • Logit models are commonly used to predict and classify categorical response variables. Most Bayesian approaches to logit models are implemented with the Metropolis-Hastings algorithm. However, the algorithm suffers from slow convergence, and it is difficult to choose an adequate proposal distribution. Therefore, we use the auxiliary mixture sampler proposed by Frühwirth-Schnatter and Frühwirth (2007) to estimate logit models. This method introduces two sequences of auxiliary latent variables so that the logit model becomes conditionally linear and normal. As a result, the logit model can be easily implemented by Gibbs sampling. We applied the proposed method to diabetes data from the Community Health Survey (2020) of the Korea Disease Control and Prevention Agency and compared its performance with that of the Metropolis-Hastings algorithm. In addition, we showed that the logit model using auxiliary mixture sampling achieves classification performance comparable to that of machine learning models.
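
For orientation, the Metropolis-Hastings baseline mentioned in this abstract can be sketched as follows. This is a minimal random-walk sampler under stated assumptions, not the authors' implementation: the independent normal prior, the step size, and the helper names `log_posterior`, `metropolis_hastings`, and `simulate_logit_data` are all hypothetical, and the data are simulated rather than taken from the survey.

```python
import math
import random


def log_posterior(beta, X, y, prior_sd=10.0):
    """Log posterior of a logit model with independent N(0, prior_sd^2) priors (assumed)."""
    lp = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        # Bernoulli log-likelihood y*eta - log(1 + exp(eta)), computed stably
        lp += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp


def metropolis_hastings(X, y, n_iter=3000, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings; returns (draws, acceptance rate)."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y)
        # The proposal is symmetric, so the acceptance ratio is the posterior ratio.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws.append(list(beta))
    return draws, accepted / n_iter


def simulate_logit_data(n, beta_true, seed=1):
    """Synthetic data with an intercept and one standard normal covariate."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [1.0, rng.gauss(0.0, 1.0)]
        eta = sum(b * x for b, x in zip(beta_true, xi))
        prob = 1.0 / (1.0 + math.exp(-eta))
        X.append(xi)
        y.append(1 if rng.random() < prob else 0)
    return X, y
```

The step size has to be tuned by hand here, which illustrates the proposal-selection difficulty the abstract refers to; a Gibbs sampler built on auxiliary variables avoids that tuning.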

Uncertainty Assessment of Single Event Rainfall-Runoff Model Using Bayesian Model (Bayesian 모형을 이용한 단일사상 강우-유출 모형의 불확실성 분석)

  • Kwon, Hyun-Han;Kim, Jang-Gyeong;Lee, Jong-Seok;Na, Bong-Kil
    • Journal of Korea Water Resources Association / v.45 no.5 / pp.505-516 / 2012
  • This study applies HEC-1, a hydrologic simulation model developed by the Hydrologic Engineering Center, to the Daecheong dam watershed to model hourly inflows to the dam. Although HEC-1 provides an automatic optimization technique for some of its parameters, the built-in optimization is not sufficient for estimating reliable parameters. In particular, it often fails when a large number of parameters must be estimated. The main objective of this study is therefore to develop a HEC-1 model based on Bayesian Markov chain Monte Carlo simulation (BHEC-1). The Clark IUH method for transforming precipitation excess to runoff and the Soil Conservation Service runoff curve method for abstractions were used in the Bayesian Monte Carlo simulation. Simulating runoff at the Daecheong station with HEC-1 under the Bayesian optimization scheme yields posterior probability distributions of the hydrograph and thus quantifies the uncertainty in the rainfall-runoff process. The proposed model performed well in estimating model parameters and deriving their full uncertainties, so it can be applied to various hydrologic problems such as frequency curve derivation, dam risk analysis, and climate change studies.

Identifying Copy Number Variants under Selection in Geographically Structured Populations Based on F-statistics

  • Song, Hae-Hiang;Hu, Hae-Jin;Seok, In-Hae;Chung, Yeun-Jun
    • Genomics & Informatics / v.10 no.2 / pp.81-87 / 2012
  • Large-scale copy number variants (CNVs) in humans provide the raw material for delineating population differences, as natural selection may have affected at least some of the CNVs discovered thus far. Although studies of inter-ethnic differences in CNVs across relatively large numbers of specific ethnic groups have recently begun, particular instances of natural selection have not yet been identified and understood. The traditional $F_{ST}$ measure, obtained from differences in allele frequencies between populations, has been used to identify CNV loci subject to geographically varying selection. Here, we review advances in, and the application of, multinomial-Dirichlet likelihood methods of inference for identifying genome regions that have been subject to natural selection using $F_{ST}$ estimates. The content presented is not new; however, this review clarifies how these methods can be applied to CNV data, an application that remains largely unexplored. A hierarchical Bayesian method, implemented via Markov chain Monte Carlo, estimates locus-specific $F_{ST}$ and can identify outlying CNV loci with large values of $F_{ST}$. By applying this Bayesian method to publicly available CNV data, we identified CNV loci that show signals of natural selection, which may elucidate the genetic basis of human disease and diversity.
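
As a point of reference for the traditional $F_{ST}$ measure named above, the sketch below computes the classical heterozygosity-based form $F_{ST} = (H_T - H_S)/H_T$ for a single biallelic locus. It assumes equal population weights and known allele frequencies, and the function name `fst_biallelic` is hypothetical. It is not the multinomial-Dirichlet or hierarchical Bayesian estimator the review discusses.

```python
def fst_biallelic(freqs):
    """Classical F_ST for one biallelic locus from per-population allele frequencies.

    Assumes equal population weights; returns 0.0 when the locus is monomorphic overall.
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity of the pooled population
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0
    # Mean expected heterozygosity within populations
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / k
    return (h_total - h_within) / h_total
```

Identical frequencies across populations give 0, and complete fixation of different alleles gives 1; loci with unusually large values are the outliers such methods flag as candidates for selection.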

A Sparse Data Preprocessing Using Support Vector Regression (Support Vector Regression을 이용한 희소 데이터의 전처리)

  • Jun, Sung-Hae;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.789-792 / 2004
  • Missing values arise in many forms in fields such as web mining, bioinformatics, and statistical data analysis, and they make training data sparse. Missing values are most often replaced by predicted values such as the mean or mode. More advanced imputation methods, such as the conditional mean, tree methods, and Markov chain Monte Carlo algorithms, can also be used. However, the predictive accuracy of general imputation models decreases as the ratio of missing values in the training data increases. Moreover, the number of available imputations becomes limited as the missing ratio increases. To address this problem, we propose a preprocessing method for missing values based on statistical learning theory, namely Vapnik's support vector regression. The proposed method can be applied to sparse training data. We verified the performance of our model using data sets from the UCI Machine Learning Repository.

Bayesian Approaches to Zero Inflated Poisson Model (영 과잉 포아송 모형에 대한 베이지안 방법 연구)

  • Lee, Ji-Ho;Choi, Tae-Ryon;Wo, Yoon-Sung
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.677-693 / 2011
  • In this paper, we consider Bayesian approaches to the zero-inflated Poisson model, one of the popular models for analyzing zero-inflated count data. To generate posterior samples, we consider a Markov chain Monte Carlo method using a Gibbs sampler and an exact sampling method using the Inverse Bayes Formula (IBF). The posterior sampling algorithms of the two methods are compared, and convergence checking for the Gibbs sampler is discussed, in particular using posterior samples from IBF sampling. Based on these sampling methods, a real data analysis is performed for the Trajan data (Marin et al., 1993), and our results are compared with existing analyses of the Trajan data. We also discuss model selection between the Poisson model and the zero-inflated Poisson model for the Trajan data using various criteria. In addition, we complement the previous work by Rodrigues (2003) with further data analysis using a hierarchical Bayesian model.
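
A data-augmentation Gibbs sampler for the zero-inflated Poisson model can be sketched as below. This is an illustrative sketch only: the Beta(1, 1) prior on the zero-inflation probability, the Gamma(a, b) prior on the Poisson rate (shape a, rate b), and the function name `zip_gibbs` are assumptions, and the paper's own priors and its IBF sampler are not reproduced here.

```python
import math
import random


def zip_gibbs(y, n_iter=2000, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for y_i ~ p * delta_0 + (1 - p) * Poisson(lam).

    Returns a list of (p, lam) draws. Priors are assumed: p ~ Beta(1, 1),
    lam ~ Gamma(shape=a, rate=b).
    """
    rng = random.Random(seed)
    n = len(y)
    total = sum(y)
    n_zero = sum(1 for v in y if v == 0)
    p, lam = 0.5, max(total / n, 0.1)
    draws = []
    for _ in range(n_iter):
        # 1. Latent indicators: each observed zero is structural with probability w.
        w = p / (p + (1.0 - p) * math.exp(-lam))
        n_struct = sum(1 for _ in range(n_zero) if rng.random() < w)
        # 2. Zero-inflation probability given the indicators (Beta full conditional).
        p = rng.betavariate(1.0 + n_struct, 1.0 + n - n_struct)
        # 3. Poisson rate given the indicators; structural zeros carry no counts.
        #    random.gammavariate takes a scale, so the rate is inverted.
        lam = rng.gammavariate(a + total, 1.0 / (b + n - n_struct))
        draws.append((p, lam))
    return draws
```

Each full conditional has a standard form, which is why no proposal tuning is needed. Convergence would still need to be checked, for example by comparing chains from different starting values.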

Bayesian inference on multivariate asymmetric jump-diffusion models (다변량 비대칭 라플라스 점프확산 모형의 베이지안 추론)

  • Lee, Youngeun;Park, Taeyoung
    • The Korean Journal of Applied Statistics / v.29 no.1 / pp.99-112 / 2016
  • Asymmetric jump-diffusion models are effectively used to model the dynamic behavior of asset prices with abrupt asymmetric upward and downward changes. However, estimation of their extension to the multivariate asymmetric jump-diffusion model has been hampered by the analytically intractable likelihood function. This article addresses the problem using a data augmentation method and proposes a new Bayesian method for a multivariate asymmetric Laplace jump-diffusion model. Unlike previous models, the proposed model is rich enough to incorporate all possible correlated jumps as well as individual and common jumps. The proposed model and methodology are illustrated with a simulation study and applied to daily returns of the KOSPI, S&P500, and Nikkei225 indices from January 2005 to September 2015.

Structural modal identification and MCMC-based model updating by a Bayesian approach

  • Zhang, F.L.;Yang, Y.P.;Ye, X.W.;Yang, J.H.;Han, B.K.
    • Smart Structures and Systems / v.24 no.5 / pp.631-639 / 2019
  • Finite element analysis is one of the important methods for studying structural performance. Because of simplification, discretization, and errors in structural parameters, numerical model errors always exist. In addition, structural characteristics may change because of material aging, structural damage, and similar effects, so the initial finite element model cannot accurately simulate the operational response of the structure. With Bayesian methods, the initial model can be updated to obtain a more accurate numerical model. This paper presents work on the field test, modal identification, and model updating of a Chinese reinforced concrete pagoda. The acceleration response of the structure under its operational environment was collected in an ambient vibration test. The first six translational modes of the structure were identified by the enhanced frequency domain decomposition method. The initial finite element model of the pagoda was established, and the elastic moduli of the columns, beams, and slabs were selected as the model parameters to be updated. Assuming that the error between the measured mode and the calculated one follows a Gaussian distribution, the posterior probability density function (PDF) of the parameters to be updated is obtained and the uncertainty is quantitatively evaluated using Bayesian statistical theory and the Metropolis-Hastings algorithm, from which the optimal values of the model parameters are obtained. The results show that the difference between the calculated frequencies of the finite element model and the measured ones is reduced, and the modal correlation of the mode shapes is improved. The updated numerical model can serve as a benchmark model for structural health monitoring (SHM) and be used to evaluate the safety of the structure.
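
The Gaussian-error assumption described above leads to a posterior of the following general shape. This is a sketch restricted to natural frequencies, with notation that is assumed rather than taken from the paper: $\theta$ denotes the elastic moduli being updated, $\hat{f}_i$ the $i$-th measured frequency, $f_i(\theta)$ the frequency computed by the finite element model, $\sigma_i$ the error standard deviation, and $m$ the number of modes.

```latex
p(\theta \mid \hat{f}_1, \dots, \hat{f}_m)
  \;\propto\; p(\theta) \prod_{i=1}^{m}
  \exp\!\left( -\frac{\bigl(\hat{f}_i - f_i(\theta)\bigr)^2}{2\sigma_i^2} \right),
\qquad
a(\theta, \theta') = \min\!\left\{ 1,\;
  \frac{p(\theta' \mid \hat{f}_1, \dots, \hat{f}_m)}
       {p(\theta \mid \hat{f}_1, \dots, \hat{f}_m)} \right\}
```

Here $a(\theta, \theta')$ is the Metropolis-Hastings acceptance probability for a symmetric proposal. Because only the ratio of posteriors appears, the unknown normalizing constant cancels, so each iteration needs just one finite element evaluation at the proposed $\theta'$.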

Joint analysis of binary and continuous data using skewed logit model in developmental toxicity studies (발달 독성학에서 비대칭 로짓 모형을 사용한 이진수 자료와 연속형 자료에 대한 결합분석)

  • Kim, Yeong-hwa;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.33 no.2 / pp.123-136 / 2020
  • Correlated multiple outcomes measured on the same subject are common in various research fields. In developmental toxicity studies, the presence of malformed pups and fetal weight are measured for pregnant dams exposed to different levels of a toxic substance. Joint analysis of two such outcomes can yield more efficient inferences than separate models for each outcome. Most methods for joint modeling assume a normal distribution for the random effects. However, in developmental toxicity studies, the response distributions may change irregularly in location and shape as the level of the toxic substance changes, which a normal random effects model may not capture. Motivated by applications in developmental toxicity studies, we propose a Bayesian joint model for binary and continuous outcomes. In our model, we incorporate a skewed logit model for the binary outcome so that the response distributions can flexibly take both symmetric and asymmetric shapes across toxic levels. We apply the proposed method to data from a developmental toxicity study of diethylhexyl phthalate.

The Risk Assessment and Prediction for the Mixed Deterioration in Cable Bridges Using a Stochastic Bayesian Modeling (확률론적 베이지언 모델링에 의한 케이블 교량의 복합열화 리스크 평가 및 예측시스템)

  • Cho, Tae Jun;Lee, Jeong Bae;Kim, Seong Soo
    • Journal of the Korea Institute for Structural Maintenance and Inspection / v.16 no.5 / pp.29-39 / 2012
  • The main objective is to predict the future degradation of, and maintenance budget for, a suspension bridge system. Bayesian inference is applied to find the posterior probability density function of the source parameters (damage indices and serviceability), given ten years of maintenance data. The posterior distribution of the parameters is sampled using a Markov chain Monte Carlo method. The simulated risk predictions for decreased serviceability conditions are posterior distributions based on the prior distribution and the likelihood of data updated from annual maintenance tasks. Compared with the conventional linear prediction model, the proposed quadratic model provides much better convergence and closeness to the measured data in terms of serviceability, risk factors, and maintenance budget for bridge components. The proposed quadratic stochastic regression model thus allows the future performance and financial management of complex infrastructure to be forecast.

Application of Bootstrap and Bayesian Methods for Estimating Confidence Intervals on Biological Reference Points in Fisheries Management (부트스트랩과 베이지안 방법으로 추정한 수산자원관리에서의 생물학적 기준점의 신뢰구간)

  • Jung, Suk-Geun;Choi, Il-Su;Chang, Dae-Soo
    • Korean Journal of Fisheries and Aquatic Sciences / v.41 no.2 / pp.107-112 / 2008
  • To evaluate uncertainty and risk in biological reference points, we applied a bootstrapping method and a Bayesian procedure to estimate the related confidence intervals. Here we provide an example of the maximum sustainable yield (MSY) of turban shell, Batillus cornutus, estimated by the Schaefer and Fox models. Fitting the time series of catch and effort from 1968 to 2006 showed that the Fox model performs better than the Schaefer model. The estimated MSY and its bootstrap percentile confidence interval (CI) at $\alpha = 0.05$ were 1,680 (1,420-1,950) tons for the Fox model and 2,170 (1,860-2,500) tons for the Schaefer model. The CIs estimated by the Bayesian approach gave similar ranges: 1,710 (1,450-2,000) tons for the Fox model and 2,230 (1,760-2,930) tons for the Schaefer model. Because uncertainty in effort and catch data is believed to be greater for earlier years, we evaluated the influence of sequentially excluding old data points by varying the first year of the time series from 1968 to 1992 and running 'backward' bootstrap resampling. The results showed that the mean and upper 2.5% confidence limit (CL) of MSY varied greatly depending on the first year chosen, whereas the lower 2.5% CL was robust against the arbitrary selection of data, especially for the Schaefer model. We demonstrated that the bootstrap and Bayesian approaches could be useful in precautionary fisheries management, and we advise that the lower 2.5% CL derived by the Fox model is robust and a better biological reference point for the turban shells of Jeju Island.
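
The bootstrap percentile interval mentioned above can be illustrated with a generic sketch. It resamples observations with replacement and reads the interval off the sorted replicates. The function name `bootstrap_percentile_ci` and its defaults are hypothetical, and the paper's actual resampling scheme for the Schaefer and Fox fits is not specified in this abstract, so this shows only the percentile step.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `stat`, resampling `data` with replacement."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the replicates.
    lo_idx = max(int(round((alpha / 2.0) * n_boot)), 0)
    hi_idx = min(int(round((1.0 - alpha / 2.0) * n_boot)) - 1, n_boot - 1)
    return reps[lo_idx], reps[hi_idx]
```

To apply this to a reference point such as MSY, `stat` would be replaced by a function that refits the production model to each resampled series and returns the estimated MSY.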