• Title/Summary/Keyword: Bayesian information criterion(BIC)


Generalized Linear Model with Time Series Data (비정규 시계열 자료의 회귀모형 연구)

  • 최윤하;이성임;이상열
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.365-376 / 2003
  • In this paper, we reviewed a variety of non-Gaussian time series models and studied model selection criteria such as AIC and BIC for choosing an appropriate model. We also considered the likelihood ratio test and applied it to an analysis of the Polio data set.
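
For reference, the two criteria named in this abstract are commonly defined as below, where $\hat{L}$ is the maximized likelihood, $k$ the number of estimated parameters, and $n$ the sample size. These are the standard textbook forms; the paper itself may use a variant.

$$
\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n
$$

Under both criteria a smaller value indicates a preferred model, and BIC penalizes each extra parameter more heavily than AIC once $n > e^{2} \approx 7.4$.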

Determining on Model-based Clusters of Time Series Data (시계열데이터의 모델기반 클러스터 결정)

  • Jeon, Jin-Ho;Lee, Gye-Sung
    • The Journal of the Korea Contents Association / v.7 no.6 / pp.22-30 / 2007
  • Most real-world systems, such as the world economy, stock markets, and medical applications, contain a series of dynamic and complex phenomena. A common way to understand these systems is to build a model and analyze its behavior. In this paper, we investigated methods for clustering time series data. As a first step, a BIC (Bayesian Information Criterion) approximation is used to determine the number of clusters. A search technique that improves clustering efficiency is also suggested, based on an analysis of the relationship between data size and BIC values. Two clustering methods, model-based and similarity-based, are analyzed and compared. A number of experiments on real data (stock prices) were performed to check validity. The experiments confirm that the BIC approximation suggests the best number of clusters, provided that the amount of data is relatively large, and that model-based clustering produces more reliable clusters than similarity-based clustering.

The Study on the Verification of Speaker Change using GMM-UBM based KL distance (GMM-UBM 기반 KL 거리를 활용한 화자변화 검증에 대한 연구)

  • Cho, Joon-Beom;Lee, Ji-eun;Lee, Kyong-Rok
    • Journal of Convergence Society for SMB / v.6 no.4 / pp.71-77 / 2016
  • In this paper, we proposed verifying speaker changes with a KL distance based on GMM-UBM to improve the performance of conventional BIC-based Speaker Change Detection (SCD). We verified the results of conventional BIC-based SCD using KL-distance-based SCD, which is more robust to differences in information volume than BIC-based SCD, and we applied GMM-UBM to compensate for asymmetric information volume. Conventional BIC-based SCD consists of two steps. Step 1 detects Speaker Change Candidate Points (SCCPs), where an SCCP is a positive local maximum of the dissimilarity d. Step 2 determines the Speaker Change Points (SCPs): if ${\Delta}BIC$ at an SCCP is positive, the SCCP is accepted as an SCP. We then verified each SCP using the GMM-UBM based KL distance D: if the value of D at an SCP is higher than a threshold, that point is accepted as a final SCP. Under the experimental condition of an MDR (Missed Detection Rate) of 0, the FAR (False Alarm Rate) improved to 60.7% at a threshold value of 0.028.
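
The ΔBIC decision in Step 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one-dimensional features modeled by a single Gaussian per segment (practical systems typically use multivariate features with full covariance matrices), and the names `variance`, `delta_bic`, and the penalty weight `lam` are hypothetical.

```python
import math

def variance(xs):
    """Maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-12)

def delta_bic(frames, split, lam=1.0):
    """Delta-BIC for splitting `frames` into frames[:split] and frames[split:].

    A positive value favours two separate Gaussians (a speaker change);
    a negative value favours one Gaussian for the whole window.
    """
    left, right = frames[:split], frames[split:]
    n, n1, n2 = len(frames), len(left), len(right)
    # Likelihood gain from modelling the two halves separately.
    gain = 0.5 * (n * math.log(variance(frames))
                  - n1 * math.log(variance(left))
                  - n2 * math.log(variance(right)))
    # In one dimension the extra model has 2 parameters (mean and variance).
    penalty = 0.5 * lam * 2 * math.log(n)
    return gain - penalty

# Assumed toy signals: one homogeneous window, one with a level shift.
same_speaker = [-1.0, 1.0] * 200
two_speakers = [-1.0, 1.0] * 100 + [9.0, 11.0] * 100

print(delta_bic(same_speaker, 200) > 0)   # False: candidate rejected
print(delta_bic(two_speakers, 100 * 2) > 0)  # True: candidate accepted
```

The weight `lam` trades missed detections against false alarms; the abstract does not state what value the authors used, so 1.0 here is only a placeholder.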

Can Housing Prices Be an Alternative to a Census-based Deprivation Index? An Evaluation Based on Multilevel Modeling (주택가격이 센서스에 기반한 박탈지수의 대안이 될 수 있는가?: 다수준 모델에 기반한 평가)

  • Sohn, Chul;Nakaya, Tomoki
    • Journal of Cadastre & Land InformatiX / v.48 no.2 / pp.197-211 / 2018
  • We conducted this research to examine how well regional housing prices serve as an alternative to conventional census-based regional deprivation indices in health and medical geography studies. To examine the relative performance of mean regional housing prices compared to conventional census-based regional deprivation indices, we compared several multilevel logistic regression models, where the first level was individuals and the second was health districts in the Seoul Metropolitan Area (SMA) in Korea, in order to adjust for the regional clustering tendency of unknown factors. In these models, we predicted two dichotomous variables, representing individuals' after-lunch tooth brushing behavior and use of dental floss, from individual characteristics and regional indices. Then, we compared the relative predictive performance of the models using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The estimation results showed that mean regional housing prices and census-based deprivation indices were statistically correlated with the two types of dental health behavior. The results also revealed that the model with mean regional housing prices had smaller AIC and BIC values than the models with conventional census-based deprivation indices. These results imply that housing prices summarized over areal units can be used as an alternative to conventional census-based deprivation indices when the census variables employed cannot properly reflect the characteristics of the areal units.

Optimal Bayesian MCMC based fire brigade non-suppression probability model considering uncertainty of parameters

  • Kim, Sunghyun;Lee, Sungsu
    • Nuclear Engineering and Technology / v.54 no.8 / pp.2941-2959 / 2022
  • The fire brigade non-suppression probability model is a major factor in evaluating fire-induced risk through fire probabilistic risk assessment (PRA), and uncertainty is a critical consideration in support of risk-informed performance-based (RIPB) fire protection decision-making. This study developed an optimal integrated probabilistic fire brigade non-suppression model that considers parameter uncertainty, based on the Bayesian Markov Chain Monte Carlo (MCMC) approach, for electrical fires, which are among the most risk-significant contributors. The results show that the log-normal probability model with a location parameter (µ) of 2.063 and a scale parameter (σ) of 1.879 fits the actual fire experience data best. It gives optimal model adequacy with a Bayesian information criterion (BIC) of -1601.766, a residual sum of squares (RSS) of 2.51E-04, and a mean squared error (MSE) of 2.08E-06. This optimal log-normal model shows better model adequacy than the exponential probability model suggested in the current fire PRA methodology, with a decrease of 17.3% in BIC, 85.3% in RSS, and 85.3% in MSE. The outcomes of this study are expected to contribute to improving and securing fire PRA realism in support of decision-making for RIPB fire protection programs.

A Bayesian Approach to Gumbel Mixture Distribution for the Estimation of Parameter and its use to the Rainfall Frequency Analysis (Bayesian 기법을 이용한 혼합 Gumbel 분포 매개변수 추정 및 강우빈도해석 기법 개발)

  • Choi, Hong-Geun;Uranchimeg, Sumiya;Kim, Yong-Tak;Kwon, Hyun-Han
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.2 / pp.249-259 / 2018
  • In Korea, more than half of the annual rainfall occurs in the summer season owing to the climate and geographical location. Frequency analysis is commonly used to design hydraulic structures under such concentrated rainfall conditions. Among the various distributions, the univariate Gumbel distribution has been routinely used for rainfall frequency analysis in Korea. However, distributional changes in extreme rainfall have been observed globally, including in Korea. More specifically, rainfall frequency analysis based on the univariate Gumbel distribution often fails to describe multimodal behavior, which is mainly influenced by distinct climate conditions during the wet season. In this context, we proposed a rainfall frequency analysis based on a Gumbel mixture distribution within a Bayesian framework, and compared the results with those of the univariate Gumbel distribution. The proposed model performed better in describing the underlying distributions, leading to lower Bayesian information criterion (BIC) values. Compared with the single Gumbel distribution, the mixed Gumbel distribution was more robust in describing the upper tail of the distribution, which plays a crucial role in reliably estimating design rainfall and the uncertainty caused by the peak of the upper tail. Therefore, it can be concluded that the mixed Gumbel distribution is better suited to extreme frequency analysis of rainfall data with two or more peaks in its distribution.

A Comparison Study on Statistical Modeling Methods (통계모델링 방법의 비교 연구)

  • Noh, Yoojeong
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.5 / pp.645-652 / 2016
  • The statistical modeling of input random variables is necessary in reliability analysis, reliability-based design optimization, and the statistical validation and calibration of analysis models of mechanical systems. Statistical modeling methods include the Akaike Information Criterion (AIC), the corrected AIC (AICc), the Bayesian Information Criterion (BIC), Maximum Likelihood Estimation (MLE), and the Bayesian method. These methods select the best-fitting distribution among candidate models by calculating their likelihood function values from a given data set. Some methods also take the number of data points or parameters into account when identifying the distribution type. However, engineers in the field have difficulty choosing a statistical modeling method for their experimental data because they lack knowledge of these methods. In this study, commonly used statistical modeling methods were compared using statistical simulation tests, and their advantages and disadvantages were analyzed. In the simulation tests, various types of distribution were assumed as populations, and samples of different sizes were generated randomly from them. Real engineering data were used to verify each statistical modeling method.
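
The likelihood-based selection described in this abstract can be sketched as below. This is an illustrative example under stated assumptions, not the procedure from the paper: the three candidate distributions, the closed-form maximum-likelihood fits, and all function names (`fit_normal`, `fit_exponential`, `fit_lognormal`, `criteria`, `rank_candidates`) are chosen here for illustration, and the data must be strictly positive for the exponential and log-normal fits.

```python
import math

def fit_normal(xs):
    """Return (maximized log-likelihood, parameter count) for a normal fit."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # maximum-likelihood variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2

def fit_exponential(xs):
    """Exponential fit; the MLE of the rate is 1 / sample mean."""
    n = len(xs)
    rate = n / sum(xs)
    return n * math.log(rate) - n, 1

def fit_lognormal(xs):
    """Log-normal fit: a normal fit on log(x) plus the Jacobian term."""
    logs = [math.log(x) for x in xs]
    loglik, k = fit_normal(logs)
    return loglik - sum(logs), k

def criteria(loglik, k, n):
    """AIC, AICc, and BIC; AICc needs n > k + 1."""
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2 * loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

def rank_candidates(xs):
    """Score every candidate distribution on the same sample."""
    fits = {"normal": fit_normal,
            "exponential": fit_exponential,
            "lognormal": fit_lognormal}
    return {name: criteria(*fit(xs), len(xs)) for name, fit in fits.items()}

# Assumed toy sample, tightly clustered around 100.
sample = [98.0, 99.0, 100.0, 101.0, 102.0]
table = rank_candidates(sample)
best = min(table, key=lambda name: table[name]["BIC"])
for name, scores in table.items():
    print(name, {key: round(value, 2) for key, value in scores.items()})
print("lowest BIC:", best)
```

All three criteria start from the same log-likelihood and differ only in the penalty term, which is why the sample size matters for AICc and BIC but not for AIC.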

A Speaker Change Detection Experiment that Uses a Statistical Method (통계적 기법을 이용한 화자변화 검출 실험)

  • Lee, Kyong-Rok;Kim, Jin-Young
    • Speech Sciences / v.8 no.4 / pp.59-72 / 2001
  • In this paper, we experimented with speaker change detection using a statistical method for an NOD (News On Demand) service. Because a speaker change in news data signals a change in content, analyzing speaker changes helps identify the content of each speech segment. Speaker change detection acts as a preprocessor that divides the input speech by speaker, which is an important preprocessing phase for speaker tracking. We detected speaker changes using GLR (generalized likelihood ratio) distance-based segmentation and BIC (Bayesian information criterion)-based segmentation, chosen among the matrix methods. In the experiment, the speech was first divided into speaker units using the GLR distance-based method, and the speaker change points were then verified using BIC-based segmentation. In the experimental results, at an MDR (Missed Detection Rate) of around 15%, the FAR (False Alarm Rate) was 63.29 in the high-noise environment and 54.28 in the low-noise environment.


An Optimization Method of Neural Networks using Adaptive Regulraization, Pruning, and BIC (적응적 정규화, 프루닝 및 BIC를 이용한 신경망 최적화 방법)

  • 이현진;박혜영
    • Journal of Korea Multimedia Society / v.6 no.1 / pp.136-147 / 2003
  • To achieve optimal performance on a given problem, we need an integrated process of parameter optimization via learning and structure optimization via model selection. In this paper, we propose an efficient optimization method that improves generalization performance by considering the properties of each sub-method and combining them on the basis of their common theoretical properties. First, the weight parameters are optimized by natural gradient learning with adaptive regularization, which uses a diverse error function. Second, the network structure is optimized by eliminating unnecessary parameters with natural pruning. By iterating these processes, candidate models are constructed and evaluated with the Bayesian Information Criterion so that an optimal one is finally selected. Through computational experiments on benchmark problems, we confirm the weight parameter and structure optimization performance of the proposed method.


Extensions of X-means with Efficient Learning the Number of Clusters (X-means 확장을 통한 효율적인 집단 개수의 결정)

  • Heo, Gyeong-Yong;Woo, Young-Woon
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.4 / pp.772-780 / 2008
  • K-means is one of the simplest unsupervised learning algorithms for solving the clustering problem. However, K-means has a basic shortcoming: the number of clusters k has to be known in advance. In this paper, we propose extensions of X-means, which can estimate the number of clusters using the Bayesian information criterion (BIC). We introduce two versions of the algorithm, modified X-means (MX-means) and generalized X-means (GX-means), which employ one full covariance matrix per cluster and can therefore estimate the number of clusters efficiently without the severe over-fitting that X-means suffers from because of its spherical cluster assumption. The algorithms start with one cluster and iteratively try to split a cluster to maximize the BIC score. The former uses the K-means algorithm to find a set of optimal clusters for the current k, which makes it simple and fast; however, it produces wrongly estimated centers when the clusters overlap. The latter uses the EM algorithm to estimate the parameters and generates more stable clusters even when the clusters overlap. Experiments with synthetic data show that the proposed methods provide a robust estimate of the number of clusters and the cluster parameters compared to other existing top-down algorithms.
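
The idea of choosing k by maximizing a BIC score can be sketched as follows. This is a deliberately simplified stand-in, not MX-means or GX-means: it sweeps k globally instead of splitting clusters top-down, works on one-dimensional data, and keeps the shared-variance spherical assumption of the original X-means. The function names and the quantile-based initialization are assumptions made for this example.

```python
import math

def kmeans_1d(xs, k, iters=100):
    """Lloyd's algorithm on 1-D data with deterministic quantile seeding."""
    ordered = sorted(xs)
    n = len(ordered)
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: abs(x - centers[j]))
                      for x in xs]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = sum(members) / len(members)
    return centers, labels

def bic_score(xs, centers, labels):
    """BIC score (larger is better) under one shared cluster variance."""
    n, k = len(xs), len(centers)
    sse = sum((x - centers[lab]) ** 2 for x, lab in zip(xs, labels))
    var = max(sse / max(n - k, 1), 1e-12)
    counts = [labels.count(j) for j in range(k)]
    loglik = sum(c * math.log(c / n) for c in counts if c > 0)
    loglik += -0.5 * n * math.log(2 * math.pi * var) - sse / (2 * var)
    n_params = (k - 1) + k + 1  # mixing weights, centers, shared variance
    return loglik - 0.5 * n_params * math.log(n)

def choose_k(xs, k_max):
    """Return the k in 1..k_max with the highest BIC score, plus all scores."""
    scores = {}
    for k in range(1, k_max + 1):
        centers, labels = kmeans_1d(xs, k)
        scores[k] = bic_score(xs, centers, labels)
    return max(scores, key=scores.get), scores

# Assumed toy data: three tight groups around 0, 10, and 20.
data = [c + o for c in (0.0, 10.0, 20.0)
        for o in (-0.5, -0.25, 0.0, 0.25, 0.5)]
best_k, all_scores = choose_k(data, k_max=5)
print(best_k)
```

Replacing the shared variance with a full covariance matrix per cluster, as the abstract describes, changes both the likelihood term and the parameter count; that extension is not shown here.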