• Title/Summary/Keyword: normal mixture model

Search Result 107, Processing Time 0.021 seconds

Reject Inference of Incomplete Data Using a Normal Mixture Model

  • Song, Ju-Won
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.2
    • /
    • pp.425-433
    • /
    • 2011
  • Reject inference in credit scoring is a statistical approach to adjust for nonrandom sample bias due to rejected applicants. Function estimation approaches are based on the assumption that rejected applicants are not necessary to be included in the estimation, when the missing data mechanism is missing at random. On the other hand, the density estimation approach by using mixture models indicates that reject inference should include rejected applicants in the model. When mixture models are chosen for reject inference, it is often assumed that data follow a normal distribution. If data include missing values, an application of the normal mixture model to fully observed cases may cause another sample bias due to missing values. We extend reject inference by a multivariate normal mixture model to handle incomplete characteristic variables. A simulation study shows that inclusion of incomplete characteristic variables outperforms the function estimation approaches.

Normal Mixture Model with General Linear Regressive Restriction: Applied to Microarray Gene Clustering

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods
    • /
    • v.14 no.1
    • /
    • pp.205-213
    • /
    • 2007
  • In this paper, the normal mixture model subjected to general linear restriction for component-means based on linear regression is proposed, and its fitting method by EM algorithm and Lagrange multiplier is provided. This model is applied to gene clustering of microarray expression data, which demonstrates it has very good performances for real data set. This model also allows to obtain the clusters that an analyst wants to find out in the fashion that the hypothesis for component-means is represented by the design matrices and the linear restriction matrices.

Detection of Differentially Expressed Genes by Clustering Genes Using Class-Wise Averaged Data in Microarray Data

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods
    • /
    • v.14 no.3
    • /
    • pp.687-698
    • /
    • 2007
  • A normal mixture model with which dependence between classes is incorporated is proposed in order to detect differentially expressed genes. Gene clustering approaches suffer from the high dimensional column of microarray expression data matrix which leads to the over-fit problem. Various methods are proposed to solve the problem. In this paper, use of simple averaging data within each class is proposed to overcome the various problems due to high dimensionality when the normal mixture model is fitted. Some experiments through simulated data set and real data set show its availability in actuality.

A Predictive Two-Group Multinormal Classification Rule Accounting for Model Uncertainty

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society
    • /
    • v.26 no.4
    • /
    • pp.477-491
    • /
    • 1997
  • A new predictive classification rule for assigning future cases into one of two multivariate normal population (with unknown normal mixture model) is considered. The development involves calculation of posterior probability of each possible normal-mixture model via a default Bayesian test criterion, called intrinsic Bayes factor, and suggests predictive distribution for future cases to be classified that accounts for model uncertainty by weighting the effect of each model by its posterior probabiliy. In this paper, our interest is focused on constructing the classification rule that takes care of uncertainty about the types of covariance matrices (homogeneity/heterogeneity) involved in the model. For the constructed rule, a Monte Carlo simulation study demonstrates routine application and notes benefits over traditional predictive calssification rule by Geisser (1982).

  • PDF

Variable Selection in Clustering by Recursive Fit of Normal Distribution-based Salient Mixture Model (정규분포기반 두각 혼합모형의 순환적 적합을 이용한 군집분석에서의 변수선택)

  • Kim, Seung-Gu
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.5
    • /
    • pp.821-834
    • /
    • 2013
  • Law et al. (2004) proposed a normal distribution based salient mixture model for variable selection in clustering. However, this model has substantial problems such as the unidentifiability of components an the inaccurate selection of informative variables in the case of a small cluster size. We propose an alternative method to overcome problems and demonstrate a good performance through experiments on simulated data and real data.

Use of Factor Analyzer Normal Mixture Model with Mean Pattern Modeling on Clustering Genes

  • Kim Seung-Gu
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.1
    • /
    • pp.113-123
    • /
    • 2006
  • Normal mixture model(NMM) frequently used to cluster genes on microarray gene expression data. In this paper some of component means of NMM are modelled by a linear regression model so that its design matrix presents the pattern between sample classes in microarray matrix. This modelling for the component means by given design matrices certainly has an advantage that we can lead the clusters that are previously designed. However, it suffers from 'overfitting' problem because in practice genes often are highly dimensional. This problem also arises when the NMM restricted by the linear model for component-means is fitted. To cope with this problem, in this paper, the use of the factor analyzer NMM restricted by linear model is proposed to cluster genes. Also several design matrices which are useful for clustering genes are provided.

Variable Selection in Linear Random Effects Models for Normal Data

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society
    • /
    • v.27 no.4
    • /
    • pp.407-420
    • /
    • 1998
  • This paper is concerned with selecting covariates to be included in building linear random effects models designed to analyze clustered response normal data. It is based on a Bayesian approach, intended to propose and develop a procedure that uses probabilistic considerations for selecting premising subsets of covariates. The approach reformulates the linear random effects model in a hierarchical normal and point mass mixture model by introducing a set of latent variables that will be used to identify subset choices. The hierarchical model is flexible to easily accommodate sign constraints in the number of regression coefficients. Utilizing Gibbs sampler, the appropriate posterior probability of each subset of covariates is obtained. Thus, In this procedure, the most promising subset of covariates can be identified as that with highest posterior probability. The procedure is illustrated through a simulation study.

  • PDF

Estimating Suitable Probability Distribution Function for Multimodal Traffic Distribution Function

  • Yoo, Sang-Lok;Jeong, Jae-Yong;Yim, Jeong-Bin
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.21 no.3
    • /
    • pp.253-258
    • /
    • 2015
  • The purpose of this study is to find suitable probability distribution function of complex distribution data like multimodal. Normal distribution is broadly used to assume probability distribution function. However, complex distribution data like multimodal are very hard to be estimated by using normal distribution function only, and there might be errors when other distribution functions including normal distribution function are used. In this study, we experimented to find fit probability distribution function in multimodal area, by using AIS(Automatic Identification System) observation data gathered in Mokpo port for a year of 2013. By using chi-squared statistic, gaussian mixture model(GMM) is the fittest model rather than other distribution functions, such as extreme value, generalized extreme value, logistic, and normal distribution. GMM was found to the fit model regard to multimodal data of maritime traffic flow distribution. Probability density function for collision probability and traffic flow distribution will be calculated much precisely in the future.

Dirichlet Process Mixtures of Linear Mixed Regressions

  • Kyung, Minjung
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.6
    • /
    • pp.625-637
    • /
    • 2015
  • We develop a Bayesian clustering procedure based on a Dirichlet process prior with cluster specific random effects. Gibbs sampling of a normal mixture of linear mixed regressions with a Dirichlet process was implemented to calculate posterior probabilities when the number of clusters was unknown. Our approach (unlike its counterparts) provides simultaneous partitioning and parameter estimation with the computation of the classification probabilities. A Monte Carlo study of curve estimation results showed that the model was useful for function estimation. We find that the proposed Dirichlet process mixture model with cluster specific random effects detects clusters sensitively by combining vague edges into different clusters. Examples are given to show how these models perform on real data.

Application of Finite Mixture to Characterise Degraded Gmelina arborea Roxb Plantation in Omo Forest Reserve, Nigeria

  • Ogana, Friday Nwabueze
    • Journal of Forest and Environmental Science
    • /
    • v.34 no.6
    • /
    • pp.451-456
    • /
    • 2018
  • The use of single component distribution to describe the irregular stand structure of degraded forest often lead to bias. Such biasness can be overcome by the application of finite mixture distribution. Therefore, in this study, finite mixture distribution was used to characterise the irregular stand structure of the Gmelina arborea plantation in Omo forest reserve. Thirty plots, ten each from the three stands established in 1984, 1990 and 2005 were used. The data were pooled per stand and fitted. Four finite mixture distributions including normal mixture, lognormal mixture, gamma mixture and Weibull mixture were considered. The method of maximum likelihood was used to fit the finite mixture distributions to the data. Model assessment was based on negative loglikelihood value ($-{\Lambda}{\Lambda}$), Akaike information criterion (AIC), Bayesian information criterion (BIC) and root mean square error (RMSE). The results showed that the mixture distributions provide accurate and precise characterisation of the irregular diameter distribution of the degraded Gmelina arborea stands. The $-{\Lambda}{\Lambda}$, AIC, BIC and RMSE values ranged from -715.233 to -348.375, 703.926 to 1433.588, 718.598 to 1451.334 and 3.003 to 7.492, respectively. Their performances were relatively the same. This approach can be used to describe other irregular forest stand structures, especially the multi-species forest.