• Title/Summary/Keyword: Inference models

Accelerated Learning of Latent Topic Models by Incremental EM Algorithm (점진적 EM 알고리즘에 의한 잠재토픽모델의 학습 속도 향상)

  • Chang, Jeong-Ho;Lee, Jong-Woo;Eom, Jae-Hong
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.12
    • /
    • pp.1045-1055
    • /
    • 2007
  • Latent topic models are statistical models that automatically capture salient patterns or correlations among the features underlying a data collection in a probabilistic way. They are gaining popularity as an effective tool for automatic semantic feature extraction from text corpora, multimedia data analysis including image data, and bioinformatics. An important issue in applying latent topic models to massive data sets is efficient learning of the model. This paper proposes an accelerated learning technique for the PLSA model, one of the popular latent topic models, that uses an incremental EM algorithm instead of the conventional EM algorithm. The incremental EM algorithm performs a series of partial E-steps on corresponding subsets of the data collection, unlike the conventional EM algorithm, where one batch E-step is done for the whole data set. By replacing a single batch E-step and M-step with a series of partial E-steps and M-steps, the inference result for the previous data subset is directly reflected in the next inference process, which can speed up learning on the entire data set. The algorithm is also advantageous in that it is guaranteed to converge to a local maximum solution and can be implemented with only a slight modification of the existing algorithm based on conventional EM. We present the basic application of the incremental EM algorithm to the learning of PLSA and empirically evaluate the acceleration performance with several possible data partitioning methods for practical application. The experimental results on a real-world news data set show that the proposed approach achieves a meaningful improvement in the convergence rate when learning a latent topic model.
Additionally, we present an interesting result which supports a possible synergistic effect of the combination of incremental EM algorithm with parallel computing.
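
The partial E-step idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the round-robin partition of documents into blocks, the random initialization, and the small clipping constant are all assumptions made for the example. It keeps one table of expected topic-word counts per block, refreshes only the visited block, and then re-estimates the parameters from the running total.

```python
import math
import random


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def partial_e_step(counts, theta, phi, docs):
    """E-step restricted to `docs`: expected counts n(d,w) * P(z|d,w)."""
    n_topics, n_words = len(phi), len(phi[0])
    word_stats = [[0.0] * n_words for _ in range(n_topics)]
    doc_stats = {d: [0.0] * n_topics for d in docs}
    for d in docs:
        for w, n in enumerate(counts[d]):
            if n == 0:
                continue
            joint = [theta[d][z] * phi[z][w] for z in range(n_topics)]
            norm = sum(joint)
            for z in range(n_topics):
                r = n * joint[z] / norm
                word_stats[z][w] += r
                doc_stats[d][z] += r
    return word_stats, doc_stats


def incremental_em_plsa(counts, n_topics, n_blocks=2, n_sweeps=20, seed=0):
    """Incremental EM for PLSA; assumes every document has at least one word."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # theta[d][z] = P(z|d), phi[z][w] = P(w|z); random start (assumption)
    theta = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    phi = [normalize([rng.random() + 0.1 for _ in range(n_words)])
           for _ in range(n_topics)]
    # Hypothetical partition: documents assigned to blocks round-robin
    blocks = [list(range(b, n_docs, n_blocks)) for b in range(n_blocks)]
    # One full E-step so that every block has stored statistics
    block_stats = [partial_e_step(counts, theta, phi, blk)[0] for blk in blocks]
    total = [[sum(s[z][w] for s in block_stats) for w in range(n_words)]
             for z in range(n_topics)]
    for _ in range(n_sweeps):
        for b, blk in enumerate(blocks):
            # Partial E-step: recompute statistics for this block only
            word_stats, doc_stats = partial_e_step(counts, theta, phi, blk)
            for z in range(n_topics):
                for w in range(n_words):
                    total[z][w] += word_stats[z][w] - block_stats[b][z][w]
            block_stats[b] = word_stats
            # M-step from the running totals; the clip guards against round-off
            phi = [normalize([max(x, 1e-12) for x in row]) for row in total]
            for d in blk:
                theta[d] = normalize(doc_stats[d])
    return theta, phi


def log_likelihood(counts, theta, phi):
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                ll += n * math.log(sum(theta[d][z] * phi[z][w]
                                       for z in range(len(phi))))
    return ll
```

Calling `incremental_em_plsa(counts, n_topics=2)` on a small list-of-lists count matrix returns the document-topic and topic-word distributions; `log_likelihood` can be used to compare partitioning choices such as different values of `n_blocks`.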

Fuzzy Relation-Based Fuzzy Neural-Networks Using a Hybrid Identification Algorithm

  • Park, Ho-Seung;Oh, Sung-Kwun
    • International Journal of Control, Automation, and Systems
    • /
    • v.1 no.3
    • /
    • pp.289-300
    • /
    • 2003
  • In this paper, we introduce an identification method for Fuzzy Relation-based Fuzzy Neural Networks (FRFNN) based on a hybrid identification algorithm. The proposed FRFNN modeling implements system structure and parameter identification in the efficient form of "If..., then..." statements and exploits the theory of system optimization and fuzzy rules. The FRFNN modeling and identification environment realizes parameter identification through the synergistic use of genetic optimization and the complex search method. The hybrid identification algorithm combines genetic optimization with the improved complex method in order to guarantee both global optimization and local convergence. An aggregate objective function with a weighting factor is introduced to achieve a sound balance between approximation and generalization of the model. The proposed model is evaluated on two nonlinear data sets. The experimental results reveal that the proposed networks exhibit high accuracy and generalization capabilities in comparison to other models.

Bayesian pooling for contingency tables from small areas

  • Jo, Aejung;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.6
    • /
    • pp.1621-1629
    • /
    • 2016
  • This paper studies Bayesian pooling for the analysis of categorical data from small areas. Many surveys consist of categorical data collected on a contingency table in each area. Statistical inference for small areas requires considerable care because the subpopulation sample sizes are usually very small. Typically, a hierarchical Bayesian model is used to pool subpopulation data. However, the customary hierarchical Bayesian models may specify more exchangeability than warranted. We therefore investigate the effects of pooling in hierarchical Bayesian modeling for contingency tables from small areas. Specifically, this paper focuses on methods of direct or indirect pooling, through Dirichlet priors, of categorical data collected on a contingency table in each area. We compare the pooling effects of the hierarchical Bayesian models by fitting simulated data. The analysis is carried out using Markov chain Monte Carlo methods.
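
To make the pooling effect of a Dirichlet prior concrete, the sketch below computes the conjugate posterior mean of one area's cell probabilities when the prior is Dirichlet with mean `prior_mean` and total weight `prior_strength`. Both hyperparameters are treated as known here, which is a simplifying assumption: the paper's hierarchical models place priors on them and fit by MCMC. The function name and the numbers are hypothetical.

```python
def pooled_cell_estimates(area_counts, prior_mean, prior_strength):
    """Posterior mean of multinomial cell probabilities for one small area.

    Prior: Dirichlet(prior_strength * prior_mean). With observed counts n,
    the posterior is Dirichlet(n + prior_strength * prior_mean), so each
    estimate is a weighted average of the area's raw proportion and the
    shared prior mean.
    """
    total = sum(area_counts)
    return [(n + prior_strength * m) / (total + prior_strength)
            for n, m in zip(area_counts, prior_mean)]


# Hypothetical area with only four respondents spread over three cells
shared_mean = [0.5, 0.3, 0.2]  # assumed pooled proportions across areas
estimates = pooled_cell_estimates([3, 1, 0], shared_mean, prior_strength=4.0)
```

A larger `prior_strength` pulls each area's estimates closer to the shared mean (more pooling), while `prior_strength=0` recovers the raw within-area proportions.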

A study on the step stress life testing (계단적 충격 생명검사에 관한 연구)

  • 이석훈
    • The Korean Journal of Applied Statistics
    • /
    • v.2 no.2
    • /
    • pp.61-78
    • /
    • 1989
  • We consider step stress life testing, which was developed to complete, within a reasonable amount of time, life tests of units whose normal lifetime is long. The models suggested for statistical analysis of the data obtained from step stress life testing are reviewed, and a model that contains these models in some respect is suggested. Statistical inference based on the suggested model is carried out using maximum likelihood and weighted least squares estimates. Finally, we review the design of the simple step stress life test and extend the result to the censoring case.

Effects of the Misspecification of Cointegrating Ranks in Seasonal Models

  • Seong, Byeong-Chan;Cho, Sin-Sup;Ahn, Sung-K.;Hwang, S.Y.
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.5
    • /
    • pp.783-789
    • /
    • 2008
  • We investigate the effects of the misspecification of cointegrating (CI) ranks at other frequencies on the inference of seasonal models at the frequency of interest; our study includes tests for CI ranks and estimation of CI vectors. Earlier studies focused mostly on a single frequency corresponding to one seasonal root at a time, ignoring possible cointegration at the remaining frequencies. We investigate the effects of the misspecification, especially in finite samples, by adopting the Gaussian reduced rank (GRR) estimation of Ahn and Reinsel (1994), which considers cointegration at all frequencies of seasonal unit roots simultaneously. We observe that the identification of the seasonal CI rank at the frequency of interest is sensitive to the misspecification of the CI ranks at other frequencies, mainly when the CI ranks at the remaining frequencies are underspecified.

Estimation of Qualities and Inference of Operating Conditions for Optimization of Wafer Fabrication Using Artificial Intelligent Methods

  • Bae, Hyeon;Kim, Sung-Shin;Woo, Kwang-Bang
    • 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings)
    • /
    • 2005.06a
    • /
    • pp.1101-1106
    • /
    • 2005
  • The purpose of this study was to develop a process management system to manage ingot fabrication and ingot quality. The ingot is the first material manufactured in wafer production. Operating data (trace parameters) were collected on-line, whereas quality data (measurement parameters) were measured by sampling inspection. The quality parameters were used to evaluate the quality, so preprocessing was necessary to extract useful information from the quality data. First, statistical methods were employed for data generation, and then modeling was carried out on the generated data to improve the performance of the models. The function of the models is to predict the quality corresponding to the control parameters. A dynamic polynomial neural network (DPNN) was used to model the ingot fabrication data.

Inference for heterogeneity of treatment effect in multi-center clinical trial

  • Ha, Il-Do
    • Journal of the Korean Data and Information Science Society
    • /
    • v.22 no.3
    • /
    • pp.605-612
    • /
    • 2011
  • In a multi-center randomized clinical trial, the treatment effect may vary across centers. It is thus important to investigate the heterogeneity in treatment effect between centers. For this purpose, uncorrelated random-effect models, which assume independence between random-effect terms, have often been used, but this may be a strong assumption. In this paper we propose a correlated frailty modelling approach for investigating such heterogeneity using the hierarchical-likelihood method when the outcome is time-to-event. In particular, we show how to construct a proper prediction interval for frailty, which graphically explores the potential heterogeneity for a treatment-by-center interaction term. The proposed method is illustrated via numerical studies based on data from the design of a multi-center clinical trial.

Spatio-temporal models for generating a map of high resolution NO2 level

  • Yoon, Sanghoo;Kim, Mingyu
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.3
    • /
    • pp.803-814
    • /
    • 2016
  • Recent times have seen an exponential increase in the amount of spatial data, which is in many cases associated with temporal data. Recent advances in computer technology and in the computation of hierarchical Bayesian models have made it possible to analyze complex spatio-temporal data. Our work aims at modeling daily average nitrogen dioxide (NO2) levels obtained from 25 air monitoring sites in Seoul between 2003 and 2010. We considered an independent Gaussian process model and an auto-regressive model and carried out estimation within a hierarchical Bayesian framework with Markov chain Monte Carlo techniques. A Gaussian predictive process approximation showed better prediction performance than a hierarchical auto-regressive model for the illustrative NO2 concentration levels at any unmonitored location.

Off-line recognition of handwritten Korean and alphanumeric characters using hidden Markov models (Hidden Markov Model을 이용한 필기체 한글 및 영.숫자 오프라인 인식)

  • 김우성;박래홍
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.9
    • /
    • pp.85-100
    • /
    • 1994
  • This paper proposes a recognition system for constrained handwritten Hangul and alphanumeric characters using discrete hidden Markov models (HMM). The HMM process encodes the distortion and similarity among patterns of a class through a doubly stochastic approach. By characterizing the statistical properties of characters using selected features, a recognition system can be implemented that absorbs possible variations in form. Hangul shapes are classified into six types by fuzzy inference, and their recognition is performed on quantized features by optimally ordering the features according to their effectiveness in each class. Constrained alphanumeric recognition is also performed using the same features used in Hangul recognition. The forward-backward, Viterbi, and Baum-Welch reestimation algorithms are used for training and recognition of handwritten Hangul and alphanumeric characters. Simulation results show that the proposed method recognizes handwritten Korean characters and alphanumerics effectively.
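
Of the algorithms named in the abstract above, Viterbi decoding for a discrete HMM is the easiest to show compactly. The sketch below is a generic textbook version in log space, not the paper's recognizer: the two-state model, its probabilities, and the symbol sequence are toy assumptions standing in for trained character models and quantized features.

```python
import math


def safe_log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state path for a discrete HMM, with its log probability.

    obs: list of symbol indices; start_p[s], trans_p[r][s], emit_p[s][o]
    are the initial, transition, and emission probabilities.
    """
    n_states = len(start_p)
    score = [safe_log(start_p[s]) + safe_log(emit_p[s][obs[0]])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + safe_log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + safe_log(trans_p[best][s])
                         + safe_log(emit_p[s][o]))
        back.append(pointers)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):  # follow back-pointers to the start
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Toy two-state, three-symbol model (hypothetical numbers)
start_p = [0.6, 0.4]
trans_p = [[0.7, 0.3], [0.4, 0.6]]
emit_p = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
best_path, best_log_prob = viterbi([0, 1, 2], start_p, trans_p, emit_p)
```

In a classifier built on per-class HMMs, one common design is to score the same observation sequence under each class model and choose the class with the highest score.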

Moments calculation for truncated multivariate normal in nonlinear generalized mixed models

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods
    • /
    • v.27 no.3
    • /
    • pp.377-383
    • /
    • 2020
  • Likelihood-based inference in a nonlinear generalized mixed model often requires computing moments of truncated multivariate normal random variables. Many methods have been proposed for this computation using a recurrence relation or the moment generating function; however, these methods rely on high-dimensional numerical integration. Numerical integration is known to lose accuracy for high-dimensional integrals. Besides accuracy, these methods demand too much computing time to be used in practical analyses. In this note, a moment calculation method is proposed under the assumption of a certain covariance structure that commonly occurs in generalized mixed models. The method needs only low-dimensional numerical integration.