• Title/Summary/Keyword: Markov chain Monte Carlo (MCMC)

The inference and estimation for latent discrete outcomes with a small sample

  • Choi, Hyung;Chung, Hwan
    • Communications for Statistical Applications and Methods / v.23 no.2 / pp.131-146 / 2016
  • In behavioral research, significant attention has been paid to stage-sequential processes in longitudinal data. Latent class profile analysis (LCPA) is a useful method for studying sequential patterns of behavioral development through a two-step identification process: identifying a small number of latent classes at each measurement occasion, and then two or more homogeneous subgroups in which individuals exhibit a similar sequence of latent class membership over time. Maximum likelihood (ML) estimates for LCPA are easily obtained by the expectation-maximization (EM) algorithm, and Bayesian inference can be implemented via Markov chain Monte Carlo (MCMC). However, unusual properties of the LCPA likelihood can cause difficulties in ML and Bayesian inference, as well as in estimation, when samples are small. This article describes and addresses the erratic problems that arise with conventional ML and Bayesian estimates for LCPA with small samples. We argue that these problems can be alleviated with a small amount of prior input. This study evaluates the performance of likelihood and MCMC-based estimates with the proposed prior in drawing inference over repeated sampling. Our simulation shows that estimates from the proposed methods perform better than those from the conventional ML and Bayesian methods.

At-site Low Flow Frequency Analysis Using Bayesian MCMC: I. Comparative study for construction of Prior distribution (Bayesian MCMC를 이용한 저수량 점 빈도분석: I. 사전분포의 적용성 비교)

  • Kim, Sang-Ug;Lee, Kil-Seong;Park, Kyung-Shin
    • Proceedings of the Korea Water Resources Association Conference / 2008.05a / pp.1121-1124 / 2008
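  • Low flow analysis is one of the important fields in water resources engineering, and the results of low flow frequency analysis in particular are important for designing reservoir storage capacity, planning water supply and demand, siting pollution sources, and managing water quantity and quality for irrigation and ecosystem conservation. This study therefore carried out an at-site frequency analysis of low flows and, to explore the uncertainty in the frequency analysis, applied a Bayesian method and compared its results with those of a conventional uncertainty analysis method. Part I of this paper presents the theoretical background of the steps needed to use the Bayesian MCMC (Bayesian Markov Chain Monte Carlo) method, which can be computed regardless of the complexity of the prior distribution and likelihood function, together with the Metropolis-Hastings algorithm, and it constructs, compares, and evaluates prior distributions, the most important element of the Bayesian method. Two priors were considered, a non-data-based prior and a data-based prior; the Metropolis-Hastings algorithm was run with each, and the results were compared to select a reasonable prior for low flow frequency analysis. In addition, the proposal distribution required by the algorithm was applied, and the resulting efficiency of the algorithm was verified by computing the acceptance rate. The analysis showed that the data-based prior gave better results than the non-data-based prior in terms of accuracy and the representation of uncertainty, and that the acceptance rate fell within the satisfactory range suggested by previous researchers. The prior finally selected was used as the prior for the Bayesian MCMC method in Part II of this study, and the results were compared with those of maximum likelihood estimation using a quadratic approximation, one of the existing methods for estimating uncertainty.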
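
The Metropolis-Hastings step and the acceptance-rate check mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-parameter Weibull likelihood, the flat bounded prior, the Gaussian random-walk proposal, the step size, and the synthetic data are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
import random


def weibull_log_likelihood(shape, scale, data):
    """Log-likelihood of a 2-parameter Weibull with the given shape and scale."""
    if shape <= 0 or scale <= 0:
        return -math.inf
    total = 0.0
    for x in data:
        z = x / scale
        total += math.log(shape / scale) + (shape - 1) * math.log(z) - z ** shape
    return total


def log_prior(shape, scale):
    """Hypothetical flat prior on a bounded box (an assumption, not from the paper)."""
    return 0.0 if (0 < shape < 20 and 0 < scale < 100) else -math.inf


def metropolis_hastings(data, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    shape, scale = 1.0, 1.0  # arbitrary starting point
    log_post = weibull_log_likelihood(shape, scale, data) + log_prior(shape, scale)
    samples, accepted = [], 0
    for _ in range(n_iter):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to the posterior ratio.
        cand_shape = shape + rng.gauss(0.0, step)
        cand_scale = scale + rng.gauss(0.0, step)
        cand_log_post = log_prior(cand_shape, cand_scale)
        if cand_log_post > -math.inf:
            cand_log_post += weibull_log_likelihood(cand_shape, cand_scale, data)
        # 1 - random() lies in (0, 1], which keeps log() defined.
        if math.log(1.0 - rng.random()) < cand_log_post - log_post:
            shape, scale, log_post = cand_shape, cand_scale, cand_log_post
            accepted += 1
        samples.append((shape, scale))
    return samples, accepted / n_iter


# Synthetic data: 100 draws from Weibull(scale=2.0, shape=1.5); values are illustrative.
data_rng = random.Random(1)
data = [data_rng.weibullvariate(2.0, 1.5) for _ in range(100)]
samples, rate = metropolis_hastings(data)
kept = samples[1000:]  # discard burn-in
print("acceptance rate:", round(rate, 3))
print("posterior mean shape:", sum(s for s, _ in kept) / len(kept))
print("posterior mean scale:", sum(c for _, c in kept) / len(kept))
```

Swapping `log_prior` for a data-based prior and comparing the two posterior summaries would mirror the kind of comparison the abstract describes; tuning `step` changes the acceptance rate.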

Variational Bayesian multinomial probit model with Gaussian process classification on mice protein expression level data (가우시안 과정 분류에 대한 변분 베이지안 다항 프로빗 모형: 쥐 단백질 발현 데이터에의 적용)

  • Donghyun Son;Beom Seuk Hwang
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.115-127 / 2023
  • The multinomial probit model is a popular model for multiclass classification and choice modeling. Markov chain Monte Carlo (MCMC) methods are widely used to estimate the multinomial probit model, but their computational cost is high. Variational Bayesian approximation, by contrast, is well known to be more computationally efficient than MCMC because it uses subsets of samples. In this study, we describe the multinomial probit model with Gaussian process classification and show how to apply variational Bayesian approximation to the model. This study also compares the results of the variational Bayesian multinomial probit model with those of naive Bayes, K-nearest neighbors, and support vector machines on the UCI mice protein expression level data.

At-site Low Flow Frequency Analysis Using Bayesian MCMC: II. Application and Comparative Studies (Bayesian MCMC를 이용한 저수량 점 빈도분석: II. 적용과 비교분석)

  • Kim, Sang-Ug;Lee, Kil-Seong
    • Journal of Korea Water Resources Association / v.41 no.1 / pp.49-63 / 2008
  • The Bayesian MCMC (Bayesian Markov Chain Monte Carlo) method and the MLE (Maximum Likelihood Estimation) method using a quadratic approximation are applied to perform at-site low flow frequency analysis at 4 stage stations (Nakdong, Waegwan, Goryeonggyo, and Jindong). Using the results of the two estimation methods, frequency curves including uncertainty are plotted. Eight case studies using synthetic flow data with a sample size of 100, generated from a 2-parameter Weibull distribution, are performed to compare the results of the MLE and the Bayesian MCMC. The Bayesian MCMC and the MLE are also applied to 36 years of gauged data to validate the efficiency of the developed scheme. These examples illustrate the advantages of the Bayesian MCMC and the limitations of the MLE based on a quadratic approximation. From the point of view of uncertainty analysis, the Bayesian MCMC is more effective than the MLE using a quadratic approximation when the sample size is small. In particular, the Bayesian MCMC is a more attractive method than the MLE based on a quadratic approximation because the sample of low flows at the site of interest is usually too small for low flow frequency analysis.

A hidden Markov model for long term drought forecasting in South Korea

  • Chen, Si;Shin, Ji-Yae;Kim, Tae-Woong
    • Proceedings of the Korea Water Resources Association Conference / 2015.05a / pp.225-225 / 2015
  • Drought events usually evolve slowly in time, and their impacts generally span a long period. This indicates that the sequence of drought is not completely random. The Hidden Markov Model (HMM) is a probabilistic model used to represent dependences between unobserved hidden states, which in turn generate the observations. Drought characteristics depend on the underlying generating mechanism, which can be well modelled by the HMM. This study employed an HMM with Gaussian emissions to fit the Standardized Precipitation Index (SPI) series and make multi-step predictions of future drought characteristics. To estimate the parameters of the HMM, we employed a Bayesian model computed via Markov Chain Monte Carlo (MCMC). Since the true number of hidden states is unknown, we fit the model with a varying number of hidden states and used reversible jump to allow transdimensional moves between models with different numbers of states. We applied the HMM to SPI data from several stations in South Korea. The monthly SPI data from January 1973 to December 2012 were divided into two parts: the first 30 years (January 1973 to December 2002) were used for model calibration and the last 10 years (January 2003 to December 2012) for model validation. All the SPI data were preprocessed by wavelet denoising and used as the visible output of the HMM. Forecasting performance at different lead times (T = 1, 3, 6, 12 months) was compared with that of conventional forecasting techniques (e.g., ANN and ARMA). Based on statistical performance evaluation, the HMM gave significantly better results than the conventional models, with a much larger forecasting skill score (about 0.3-0.6) and lower Root Mean Square Error (RMSE) values (about 0.5-0.9).
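
As a rough illustration of how an HMM with Gaussian emissions yields multi-step forecasts, the sketch below runs a scaled forward filter and then propagates the state distribution ahead by the lead time. It covers only the filtering and prediction steps with fixed parameters; the MCMC and reversible-jump estimation described in the abstract are not shown. The two states, transition matrix, emission parameters, and the short series are all hypothetical values, not taken from the study.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def forward_filter(obs, init, trans, means, sds):
    """Scaled forward algorithm.

    Returns the log-likelihood of `obs` and the filtered state
    probabilities at the final time step.
    """
    n_states = len(init)
    alpha = [init[s] * gaussian_pdf(obs[0], means[s], sds[s]) for s in range(n_states)]
    norm = sum(alpha)
    log_lik = math.log(norm)
    alpha = [a / norm for a in alpha]
    for x in obs[1:]:
        # Propagate one step through the transition matrix, then weight by the emission.
        alpha = [
            gaussian_pdf(x, means[j], sds[j])
            * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        norm = sum(alpha)
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]
    return log_lik, alpha


def predict_mean(filtered, trans, means, lead):
    """Expected observation `lead` steps after the last filtered time step."""
    probs = list(filtered)
    n_states = len(probs)
    for _ in range(lead):
        probs = [sum(probs[i] * trans[i][j] for i in range(n_states)) for j in range(n_states)]
    return sum(p * m for p, m in zip(probs, means))


# Hypothetical two-state model: state 0 = "dry", state 1 = "wet".
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
means = [-1.0, 0.5]
sds = [0.5, 0.5]
spi = [-1.2, -0.8, -1.1, -0.9]  # illustrative SPI-like values

log_lik, filtered = forward_filter(spi, init, trans, means, sds)
for lead in (1, 3, 6, 12):
    print(lead, round(predict_mean(filtered, trans, means, lead), 3))
```

With a persistent dry state, short-lead forecasts stay close to the dry-state mean, and longer leads drift toward the mean implied by the stationary distribution of the chain.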

Estimation of the Mixture of Normals of Saving Rate Using Gibbs Algorithm (Gibbs알고리즘을 이용한 저축률의 정규분포혼합 추정)

  • Yoon, Jong-In
    • Journal of Digital Convergence / v.13 no.10 / pp.219-224 / 2015
  • This research estimates a mixture of normals for the household saving rate in Korea. Our sample is the MDSS micro-data for 2014, and the Gibbs algorithm is used to estimate the mixture of normals. The evidence yields several results. First, the Gibbs algorithm works very well in estimating the mixture of normals. Second, the saving rate data have at least two components, one with mean zero and the other with mean 29.4%. This suggests that households may be separated into a high-saving group and a low-saving group. Third, the mixture-of-normals analysis cannot by itself answer that question, and we find that income level and age do not explain our results.
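
A Gibbs sampler for a two-component normal mixture alternates between drawing component labels, the mixing weight, and the component means. The sketch below is a simplified illustration under stated assumptions: a known common standard deviation, a uniform Beta(1, 1) prior on the weight, and a vague normal prior on each mean. The data are synthetic, generated around the two means reported in the abstract, and do not reproduce the paper's model or sample.

```python
import math
import random


def gibbs_two_normals(data, sd=5.0, n_iter=600, burn_in=100, seed=1):
    """Gibbs sampler for a 2-component normal mixture with known common sd.

    Returns posterior means of (mu_0, mu_1, weight of component 0).
    """
    rng = random.Random(seed)
    mu = [min(data), max(data)]  # spread-out start to limit label switching
    weight = 0.5
    prior_mean, prior_sd = 0.0, 100.0  # vague normal prior on each mean (assumption)
    draws = []
    for it in range(n_iter):
        # Step 1: draw each observation's component label given mu and weight.
        sums, counts = [0.0, 0.0], [0, 0]
        for x in data:
            p0 = weight * math.exp(-0.5 * ((x - mu[0]) / sd) ** 2)
            p1 = (1.0 - weight) * math.exp(-0.5 * ((x - mu[1]) / sd) ** 2)
            total = p0 + p1
            prob0 = p0 / total if total > 0 else 0.5
            k = 0 if rng.random() < prob0 else 1
            sums[k] += x
            counts[k] += 1
        # Step 2: draw the mixing weight from its Beta full conditional.
        weight = rng.betavariate(1 + counts[0], 1 + counts[1])
        # Step 3: draw each mean from its normal full conditional.
        for k in (0, 1):
            precision = 1.0 / prior_sd ** 2 + counts[k] / sd ** 2
            mean = (prior_mean / prior_sd ** 2 + sums[k] / sd ** 2) / precision
            mu[k] = rng.gauss(mean, 1.0 / math.sqrt(precision))
        if it >= burn_in:
            draws.append((mu[0], mu[1], weight))
    n = len(draws)
    return tuple(sum(d[i] for d in draws) / n for i in range(3))


# Synthetic saving rates (percent): 40% near 0 and 60% near 29.4; illustrative only.
data_rng = random.Random(42)
rates = [
    data_rng.gauss(0.0, 5.0) if data_rng.random() < 0.4 else data_rng.gauss(29.4, 5.0)
    for _ in range(300)
]
mu_low, mu_high, w_low = gibbs_two_normals(rates)
print("component means:", round(mu_low, 2), round(mu_high, 2))
print("weight of low-saving component:", round(w_low, 3))
```

Fixing the standard deviation keeps every full conditional in closed form; sampling the variances as well would add one more conditional draw per component.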

Bayesian estimation of kinematic parameters of disk galaxies in large HI galaxy surveys

  • Oh, Se-Heon;Staveley-Smith, Lister
    • The Bulletin of The Korean Astronomical Society / v.41 no.2 / pp.62.2-62.2 / 2016
  • We present a newly developed algorithm based on a Bayesian method for 2D tilted-ring analysis of disk galaxies, which operates on velocity fields. Compared to conventional algorithms based on a chi-squared minimisation procedure, this new Bayesian algorithm suffers less from local minima of the model parameters, even when their posterior distributions are highly multimodal. Moreover, the Bayesian analysis implemented via Markov Chain Monte Carlo (MCMC) sampling only requires broad ranges of posterior distributions of the parameters, which makes the fitting procedure fully automated. This feature is essential for performing kinematic analysis of an unprecedented number of resolved galaxies from the upcoming Square Kilometre Array (SKA) pathfinders' galaxy surveys. A standalone code, the '2D Bayesian Automated Tilted-ring fitter' (2DBAT), which implements Bayesian fits of 2D tilted-ring models, has been developed for deriving rotation curves of galaxies that are at least marginally resolved (> 3 beams across the semi-major axis) and moderately inclined (20 < i < 70 degrees). The main layout of 2DBAT and its performance tests are discussed using sample galaxies from Australia Telescope Compact Array (ATCA) observations as well as artificial data cubes built from representative rotation curves of intermediate-mass and massive spiral galaxies.

Performance assessment of bridges using short-period structural health monitoring system: Sungsu bridge case study

  • Kaloop, Mosbeh R.;Elsharawy, Mohamed;Abdelwahed, Basem;Hu, Jong Wan;Kim, Dongwook
    • Smart Structures and Systems / v.26 no.5 / pp.667-680 / 2020
  • This study reports a systematic procedure for evaluating the static and dynamic structural performance of steel bridges based on short-period structural health monitoring measurements. The Sungsu bridge in Korea is considered as a case study, presenting the most recent tests carried out to examine the bridge condition. Short-period measurements from a Structural Health Monitoring (SHM) system were used during the bridge testing phase. A novel symmetry index is introduced using statistical analyses of deflection and strain measurements. Frequency Domain Decomposition (FDD) is applied to the strain measurements to estimate the bridge mode shapes and damping ratios. Furthermore, Markov Chain Monte Carlo (MCMC) is used to examine the reliability of bridge performance while ambient design trucks are static or moving at different speeds. Strain, displacement, and acceleration were measured at selected locations on the bridge. The results show that the symmetry index can be an efficient and useful measure for assessing steel bridge performance. The results also indicate that the Sungsu bridge is safe under operational conditions.

Seasonal rainfall short-term forecasting model considering climate indices (외부기상인자를 고려한 낙동강유역 계절강수량 단기예측모형)

  • Lee, Jeong-Ju;Kwon, Hyun-Han;Hwang, Kyu-Nam;Chun, Si-Young
    • Proceedings of the Korea Water Resources Association Conference / 2011.05a / pp.401-401 / 2011
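  • This study aims to forecast seasonal rainfall by incorporating exogenous climate indices into a nonstationary frequency analysis model based on Bayesian MCMC (Markov Chain Monte Carlo), and it targets summer rainfall in particular, which is useful in relation to flood risk. To account for variability driven by exogenous climate factors within a nonstationary frequency analysis model, the target hydrologic variable must be restricted; climate factors closely related to extreme rainfall, such as the monsoon (Changma) front and typhoons, are difficult to use as predictors because of their spatial variability and complex characteristics. This study therefore took summer rainfall as the seasonal hydrologic variable and introduced SST (sea surface temperature) and OLR (outgoing longwave radiation) as the exogenous climate factors affecting it. A forecasting model for July-September summer rainfall was built using, as predictors, the preceding winter SST and June OLR in regions with high spatial correlation with summer rainfall in the Nakdong River basin. The model was validated against summer 2010 rainfall, for which the outcome is known, and was then applied to forecast summer 2011 rainfall using the winter 2010 SST observed at the time and a June 2011 OLR assumed on the basis of past observations. As a result, forecasts including uncertainty intervals could be obtained from the posterior distributions of the model parameters.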

Bayesian Variable Selection in the Proportional Hazard Model with Application to Microarray Data

  • Lee, Kyeong-Eun;Mallick, Bani K.
    • Proceedings of the Korean Statistical Society Conference / 2005.05a / pp.17-23 / 2005
  • In this paper we consider the well-known semiparametric proportional hazards models for survival analysis. These models are usually used with few covariates and many observations (subjects). For a typical setting of gene expression data from DNA microarrays, however, we need to consider the case where the number of covariates p exceeds the number of samples n. For a given vector of response values, which are times to event (death or censored times), and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the significant genes. This approach enables us to estimate the survival curve when $n \ll p$. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number. The approach creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov Chain Monte Carlo (MCMC) method. We demonstrate the methodology on diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and breast carcinoma data.
