• Title/Summary/Keyword: Binary regression model

Sampling Based Approach to Bayesian Analysis of Binary Regression Model with Incomplete Data

  • Chung, Young-Shik
    • Journal of the Korean Statistical Society
    • /
    • v.26 no.4
    • /
    • pp.493-505
    • /
    • 1997
  • The analysis of binary data appears in many areas such as statistics, biometrics and econometrics. In many cases, data are collected in which some observations are incomplete. Assume that the missing covariates are missing at random and the responses are completely observed. A method for Bayesian analysis of the binary regression model with incomplete data is presented. In particular, the desired marginal posterior moments of the regression parameters are obtained using the Metropolis algorithm (Metropolis et al. 1953) within the Gibbs sampler (Gelfand and Smith, 1990). Also, we compare the logit model with the probit model using the Bayes factor, which is approximated by the importance sampling method. One example is presented.


Binary Forecast of Heavy Snow Using Statistical Models

  • Sohn, Keon-Tae
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.2
    • /
    • pp.369-378
    • /
    • 2006
  • This study focuses on the binary forecast of the occurrence of heavy snow in the Honam area based on the MOS (model output statistics) method. For our study, the daily amount of snow cover at 17 stations during the cold season (November to March) from 2001 to 2005 and the corresponding 45 RDAPS outputs are used. A logistic regression model and neural networks are applied to predict the probability of occurrence of heavy snow. Based on the distribution of estimated probabilities, optimal thresholds are determined via the true skill score. According to the results of the comparison, the logistic regression model is recommended.
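The threshold step described in this abstract can be illustrated with a minimal sketch. It assumes the true skill score is the usual hit rate minus false alarm rate; the abstract does not give the paper's exact definition. The function names, the data, and the candidate grid below are all made up for illustration and are not taken from the paper.

```python
def true_skill_score(y_true, y_prob, threshold):
    """Hit rate minus false alarm rate when an event is forecast for prob >= threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for observed, prob in zip(y_true, y_prob):
        forecast = prob >= threshold
        if observed == 1 and forecast:
            hits += 1
        elif observed == 1:
            misses += 1
        elif forecast:
            false_alarms += 1
        else:
            correct_negatives += 1
    events = hits + misses
    non_events = false_alarms + correct_negatives
    hit_rate = hits / events if events else 0.0
    false_alarm_rate = false_alarms / non_events if non_events else 0.0
    return hit_rate - false_alarm_rate


def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the largest true skill score."""
    return max(candidates, key=lambda t: true_skill_score(y_true, y_prob, t))


# Made-up observations (1 = heavy snow) and forecast probabilities.
observed = [0, 0, 0, 0, 1, 1, 1, 1]
probabilities = [0.05, 0.12, 0.31, 0.55, 0.35, 0.62, 0.71, 0.93]
candidates = [i / 10 for i in range(1, 10)]

threshold = best_threshold(observed, probabilities, candidates)
print(threshold, true_skill_score(observed, probabilities, threshold))
```

In practice the probabilities would come from a fitted model and the threshold would be chosen on data held out from model fitting.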

A Bayesian Method for Narrowing the Scope of Variable Selection in Binary Response t-Link Regression

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society
    • /
    • v.29 no.4
    • /
    • pp.407-422
    • /
    • 2000
  • This article is concerned with selecting the predictor variables to be included in building a class of binary response t-link regression models, where both probit and logistic regression models can be approximately taken as members of the class. It is based on a modification of the stochastic search variable selection (SSVS) method, intended to propose and develop a Bayesian procedure that uses probabilistic considerations for selecting promising subsets of predictor variables. The procedure reformulates the binary response t-link regression setup as a hierarchical truncated normal mixture model by introducing a set of hyperparameters that are used to identify subset choices. In this setup, the most promising subset of predictors can be identified as the one with the highest posterior probability in the marginal posterior distribution of the hyperparameters. To highlight the merit of the procedure, an illustrative numerical example is given.


Binary regression model using skewed generalized t distributions (기운 일반화 t 분포를 이용한 이진 데이터 회귀 분석)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.5
    • /
    • pp.775-791
    • /
    • 2017
  • We frequently encounter binary data in real life. Logistic, probit, cauchit, and complementary log-log models are often used for binary data analysis. To analyze binary data, Liu (2004) proposed the robit model, in which the inverse of the cdf of the Student's t distribution is used as a link function. Kim et al. (2008) also proposed a generalized t-link model to make the binary regression model more flexible. More flexible skewed distributions allow more flexible link functions in generalized linear models. In this sense, we propose a binary data regression model using the skewed generalized t distributions introduced in Theodossiou (1998). We implement the proposed models in R using the glm function included in base R and the R sgt package. We also analyze the Pima Indian data using the proposed model in R.
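To make the role of the link function concrete, here is a small sketch of the inverse links for the four standard models named in this abstract, each mapping a linear predictor to a probability. It uses only the Python standard library; the function names are illustrative, and the robit and skewed generalized t links are omitted because they need a t-distribution cdf that the standard library does not provide.

```python
import math


def inv_logit(eta):
    """Logistic cdf, written to avoid overflow for large negative eta."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def inv_probit(eta):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cauchit(eta):
    """Standard Cauchy cdf."""
    return 0.5 + math.atan(eta) / math.pi


def inv_cloglog(eta):
    """Inverse complementary log-log link; asymmetric around eta = 0."""
    return 1.0 - math.exp(-math.exp(eta))


# Compare the four inverse links on a few values of the linear predictor.
for eta in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cauchit(eta), inv_cloglog(eta))
```

The cauchit link has the heaviest tails of the four, so extreme values of the linear predictor pull the fitted probability toward 0 or 1 more slowly than under the logit or probit links. The complementary log-log link is the only asymmetric one here.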

A Bayesian Variable Selection Method for Binary Response Probit Regression

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society
    • /
    • v.28 no.2
    • /
    • pp.167-182
    • /
    • 1999
  • This article is concerned with the selection of subsets of predictor variables to be included in building the binary response probit regression model. It is based on a Bayesian approach, intended to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets. This procedure reformulates the probit regression setup as a hierarchical normal mixture model by introducing a set of hyperparameters that are used to identify subset choices. The appropriate posterior probability of each subset of predictor variables is obtained through the Gibbs sampler, which samples indirectly from the multinomial posterior distribution on the set of possible subset choices. Thus, in this procedure, the most promising subset of predictors can be identified as the one with the highest posterior probability. To highlight the merit of this procedure, a couple of illustrative numerical examples are given.


Prediction of extreme PM2.5 concentrations via extreme quantile regression

  • Lee, SangHyuk;Park, Seoncheol;Lim, Yaeji
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.3
    • /
    • pp.319-331
    • /
    • 2022
  • In this paper, we develop a new statistical model to forecast the PM2.5 level in Seoul, South Korea. The proposed model is based on the extreme quantile regression model with lasso penalty. Various meteorological variables and air pollution variables are considered as predictors in the regression model, and the lasso quantile regression performs variable selection and solves the multicollinearity problem. The final prediction model is obtained by combining various extreme lasso quantile regression estimators and we construct a binary classifier based on the model. Prediction performance is evaluated through the statistical measures of the performance of a binary classification test. We observe that the proposed method works better compared to the other classification methods, and predicts 'very bad' cases of the PM2.5 level well.

An educational tool for binary logistic regression model using Excel VBA (엑셀 VBA를 이용한 이분형 로지스틱 회귀모형 교육도구 개발)

  • Park, Cheolyong;Choi, Hyun Seok
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.2
    • /
    • pp.403-410
    • /
    • 2014
  • Binary logistic regression analysis is a statistical technique that explains a binary response variable by quantitative or qualitative explanatory variables. In the binary logistic regression model, the probability that the response variable equals one of the binary values, say 1, is explained as a transformation of a linear combination of explanatory variables. This is one of the big barriers that non-statisticians have to overcome in order to understand the model. In this study, an educational tool is developed using Excel VBA that explains the need for binary logistic regression analysis. More precisely, this tool explains the problems related to modeling the probability of the response variable being equal to 1 as a linear combination of explanatory variables, and then shows how these problems can be solved through transformations of the linear combination.
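The problem this abstract describes can be seen in a few lines. The sketch below is in Python rather than the tool's Excel VBA, and the coefficients and input values are hypothetical, chosen only so that the untransformed linear combination leaves the interval [0, 1].

```python
import math

# Hypothetical coefficients, for illustration only.
INTERCEPT = -0.2
SLOPE = 0.3


def linear_predictor(x):
    """Untransformed linear combination of the explanatory variable."""
    return INTERCEPT + SLOPE * x


def logistic(eta):
    """Logistic transformation, which maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


xs = [-2.0, 0.0, 2.0, 5.0]
linear_probs = [linear_predictor(x) for x in xs]
logistic_probs = [logistic(linear_predictor(x)) for x in xs]

for x, lin, log_p in zip(xs, linear_probs, logistic_probs):
    print(x, lin, log_p)
```

With these values the linear combination produces "probabilities" below 0 and above 1, while the logistic transformation of the same quantity always stays strictly between 0 and 1.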

Empirical Analysis on the Relationship between R&D Inputs and Performance Using Successive Binary Logistic Regression Models (연속적 이항 로지스틱 회귀모형을 이용한 R&D 투입 및 성과 관계에 대한 실증분석)

  • Park, Sungmin
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.40 no.3
    • /
    • pp.342-357
    • /
    • 2014
  • The present study analyzes the relationship between research and development (R&D) inputs and performance of a national technology innovation R&D program using successive binary logistic regression models based on a typical R&D logic model. In particular, this study focuses on answering the following three main questions: (1) "To what extent do the R&D inputs have an effect on the performance creation?"; (2) "Is an obvious relationship verified between the immediate predecessor and its successor performance?"; and (3) "Is there a difference in the performance creation between R&D government subsidy recipient types and between R&D collaboration types?" Methodologically, binary logistic regression models are established successively, considering the "Success-Failure" binary data characteristic of the performance creation. An empirical analysis is presented for a sample of n = 2,178 completed R&D projects. This study's major findings are as follows. First, the R&D inputs have a statistically significant relationship only with the short-term, technical output, "Patent Registration." Second, strong dependencies are identified between the immediate predecessor and its successor performance. Third, the success probability of the performance creation is statistically significantly different between the R&D types aforementioned. Specifically, compared with "Large Company", "Small and Medium-Sized Enterprise (SMS)" shows a greater success probability of "Sales" and "New Employment." Meanwhile, "R&D Collaboration" achieves a larger success probability of "Patent Registration" and "Sales."

Bayesian inference of longitudinal Markov binary regression models with t-link function (t-링크를 갖는 마코프 이항 회귀 모형을 이용한 인도네시아 어린이 종단 자료에 대한 베이지안 분석)

  • Sim, Bohyun;Chung, Younshik
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.1
    • /
    • pp.47-59
    • /
    • 2020
  • In this paper, we present the longitudinal Markov binary regression model with a t-link function when its transition order is known or unknown. Logit or probit models are commonly assumed in binary regression models. Here, the t-link function can be used instead of the probit model for more flexibility, since the t distribution approaches the normal distribution as the degrees of freedom go to infinity. A Markov regression model is considered because the data for each individual are longitudinal. We propose a Bayesian method to determine the transition order of the Markov regression model. In particular, we use the deviance information criterion (DIC) (Spiegelhalter et al., 2002) of possible models in order to determine the transition order of the Markov binary regression model if the transition order is known; however, we compute and compare their posterior probabilities if it is unknown. In order to overcome the complicated Bayesian computation, our proposed model is reconstructed using the ideas of Albert and Chib (1993), Kuo and Mallick (1998), and Erkanli et al. (2001). Our proposed method is applied to simulated data and to the real data examined by Sommer et al. (1984). Markov chain Monte Carlo methods are used to determine the optimal model, assuming that the transition order of the Markov regression model is known or unknown. Gelman and Rubin's method (1992) is also employed to check the convergence of the Metropolis-Hastings algorithm.

Analyzing Survival Data as Binary Outcomes with Logistic Regression

  • Lim, Jo-Han;Lee, Kyeong-Eun;Hahn, Kyu-S.;Park, Kun-Woo
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.1
    • /
    • pp.117-126
    • /
    • 2010
  • Clinical researchers often analyze survival data as binary outcomes using the logistic regression method. This paper examines the information loss resulting from analyzing survival time as binary outcomes. We first demonstrate that, under the proportional hazard assumption, this binary discretization does result in a significant information loss. Second, when fitting a logistic model to survival time data, researchers inadvertently use the maximal statistic. We implement a numerical study to examine the properties of the reference distribution for this statistic. Finally, we show that the logistic regression method can still be a useful tool for analyzing survival data, in particular when the proportional hazard assumption is questionable.