• Title/Summary/Keyword: bayesian regression


A comparison study of Bayesian high-dimensional linear regression models (베이지안 고차원 선형 회귀분석에서의 비교연구)

  • Shin, Ju-Won;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.491-505 / 2021
  • We consider linear regression models in high-dimensional settings (p ≫ n) and compare various classes of priors. The spike and slab prior is one of the most widely used priors for Bayesian regression models, but its model space is vast, which leads to poor performance in finite samples. As an alternative, various continuous shrinkage priors, including the horseshoe prior and its variants, have been proposed. Although each of these priors has been investigated separately, exhaustive comparative studies of their performance have rarely been conducted. In this study, we compare the spike and slab prior, the horseshoe prior, and its variants in various simulation settings. The performance of each method is evaluated in terms of regression coefficient estimation and variable selection. Finally, some remarks and suggestions are given based on comprehensive simulation studies.
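
As a rough illustration of the two prior classes named in this abstract, the sketch below draws coefficients from a spike and slab prior and from a horseshoe prior. The function names, the inclusion probability `w`, the slab scale `slab_sd`, and the global scale `tau` are illustrative assumptions, not settings taken from the study.

```python
import math
import random


def draw_spike_and_slab(p, w=0.1, slab_sd=3.0, rng=random):
    """Draw p coefficients: exactly 0 with probability 1 - w, else N(0, slab_sd^2)."""
    return [rng.gauss(0.0, slab_sd) if rng.random() < w else 0.0 for _ in range(p)]


def draw_horseshoe(p, tau=0.1, rng=random):
    """Draw p coefficients: beta_j ~ N(0, tau^2 * lambda_j^2), lambda_j ~ half-Cauchy(0, 1)."""
    draws = []
    for _ in range(p):
        # Inverse-CDF draw from a standard half-Cauchy distribution.
        lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
        draws.append(rng.gauss(0.0, tau * lam))
    return draws


rng = random.Random(0)
ss = draw_spike_and_slab(1000, rng=rng)
hs = draw_horseshoe(1000, rng=rng)
n_zero_ss = sum(b == 0.0 for b in ss)  # spike component produces exact zeros
n_zero_hs = sum(b == 0.0 for b in hs)  # continuous shrinkage: values near zero, not exactly zero
```

The spike and slab draws are exactly zero for most coefficients, which is what makes the model space discrete and large; the horseshoe draws are continuous, with most mass near zero and occasional large values from the heavy-tailed local scales.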

Development of benthic macroinvertebrate species distribution models using the Bayesian optimization (베이지안 최적화를 통한 저서성 대형무척추동물 종분포모델 개발)

  • Go, ByeongGeon;Shin, Jihoon;Cha, Yoonkyung
    • Journal of Korean Society of Water and Wastewater / v.35 no.4 / pp.259-275 / 2021
  • This study explored the usefulness and implications of Bayesian hyperparameter optimization in developing species distribution models (SDMs). A variety of machine learning (ML) algorithms, namely support vector machine (SVM), random forest (RF), boosted regression tree (BRT), XGBoost (XGB), and multilayer perceptron (MLP), were used to predict the occurrence of four benthic macroinvertebrate species. The Bayesian optimization method successfully tuned model hyperparameters, with all ML models achieving an area under the curve (AUC) > 0.7. Also, hyperparameter search ranges that generally clustered around the optimal values suggest the efficiency of Bayesian optimization in finding optimal sets of hyperparameters. Tree-based ensemble algorithms (BRT, RF, and XGB) tended to perform better than SVM and MLP. Important hyperparameters and optimal values differed by species and ML model, indicating the necessity of hyperparameter tuning for improving individual model performance. The optimization results demonstrate that, for all macroinvertebrate species, SVM and RF required fewer trials to obtain optimal hyperparameter sets, leading to reduced computational cost compared to the other ML algorithms. The results of this study suggest that Bayesian optimization is an efficient method for hyperparameter optimization of machine learning algorithms.

Bayesian Analysis for Neural Network Models

  • Chung, Younshik;Jung, Jinhyouk;Kim, Chansoo
    • Communications for Statistical Applications and Methods / v.9 no.1 / pp.155-166 / 2002
  • Neural networks have been studied as a popular and very flexible tool for classification. They are also used in many applications of pattern classification and pattern recognition. This paper focuses on a Bayesian approach to feed-forward neural networks with a single hidden layer of units with logistic activation. In this model, we are interested in deciding the number of nodes of a neural network model with p input units, one hidden layer with m hidden nodes, and one output unit in a Bayesian setup for fixed m. Here, we introduce a latent variable into the prior of the regression coefficients, and we introduce the 'sequential step', which is based on the idea of data augmentation by Tanner and Wong (1987). MCMC methods (the Gibbs sampler and the Metropolis algorithm) can be used to overcome the complicated Bayesian computation. Finally, the proposed method is applied to simulated data.

A dynamic Bayesian approach for probability of default and stress test

  • Kim, Taeyoung;Park, Yousung
    • Communications for Statistical Applications and Methods / v.27 no.5 / pp.579-588 / 2020
  • Obligor defaults are cross-sectionally correlated because obligors share common economic conditions; in addition, obligors are longitudinally correlated, so that an economic shock like the IMF crisis in 1998 lasts for a period of time. A longitudinal correlation should be used to construct statistical scenarios for stress tests, replacing the kind of artificial scenario that banks have used. We propose a Bayesian model to accommodate such correlation structures. Using 402 obligors of a domestic bank in Korea, our model with a dynamic correlation is compared to a Bayesian model with a stationary longitudinal correlation and to the classical logistic regression model. Our model generates statistical financial statements under a stress situation on an individual obligor basis, so that the generated financial statements produce a distribution of credit grades similar to that observed when the IMF crisis occurred and comply with the Basel IV (Basel Committee on Banking Supervision, 2017) requirement that credit grades under a stress situation are not sensitive to the business cycle.

A Study on the Distributed Lag Model by Bayesian Decision Making Method (분포시차모형의 Bayesian 의사결정법에 관한 연구)

  • 이필령
    • Journal of Korean Society of Industrial and Systems Engineering / v.8 no.11 / pp.27-34 / 1985
  • Recently, distributed lag models for time series data have been used in several quantitative analyses. However, time series with serially correlated error terms and lagged values of the dependent variable violate the assumptions of the OLS method. This paper suggests that Bayesian inference should be applied to estimate the parameters of the distributed lag model with serial correlation. To apply the distributed lag model by Bayesian analysis, data on monthly consumption expenditure per household by commodity item from 1972 to 1981 are used to estimate the lagged coefficient of processed food and the regression coefficient of food and beverage.

The probabilistic estimation of inundation region using a multiple logistic regression analysis (다중 Logistic 회귀분석을 통한 침수지역의 확률적 도출)

  • Jung, Minkyu;Kim, Jin-Guk;Uranchimeg, Sumiya;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.53 no.2 / pp.121-129 / 2020
  • The increase in impervious surfaces and development along rivers due to urbanization not only increases the number of associated flood risk factors but also exacerbates flood damage, leading to difficulties in flood management. Flood control measures should be prioritized based on various geographical information in urban areas. In this study, a probabilistic flood hazard assessment was applied to flood-prone areas near an urban river. Flood hazard maps were alternatively considered and used to describe the expected inundation areas for a given set of predictors such as elevation, slope, runoff curve number, and distance to river. This study proposes a Bayesian logistic regression-based flood risk model that aims to provide a probabilistic risk metric such as population-at-risk (PAR). Finally, the logistic regression model is used to produce probabilistic flood hazard maps for the entire area.
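
To make the idea of a Bayesian logistic regression for inundation probability concrete, here is a minimal sketch with one predictor. Everything specific in it is an assumption for illustration: the synthetic data, the single standardized elevation predictor, the normal priors, and the random-walk Metropolis sampler are not taken from the paper, whose actual predictors and inference method are described only in general terms above.

```python
import math
import random


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Log posterior of (intercept, slope) under independent N(0, prior_sd^2) priors."""
    lp = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        # Bernoulli log-likelihood y*eta - log(1 + exp(eta)), written in a stable form.
        lp += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp


def metropolis(xs, ys, n_iter=4000, step=0.3, seed=1):
    """Random-walk Metropolis sampler; returns the draws after discarding the first half."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    lp = log_posterior(beta, xs, ys)
    draws = []
    for _ in range(n_iter):
        prop = [b + rng.gauss(0.0, step) for b in beta]
        lp_prop = log_posterior(prop, xs, ys)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        draws.append(beta)
    return draws[n_iter // 2:]


# Synthetic data (assumption): lower standardized elevation means higher flood odds.
data_rng = random.Random(0)
elev = [data_rng.uniform(-2.0, 2.0) for _ in range(200)]
flooded = [
    1 if data_rng.random() < 1.0 / (1.0 + math.exp(-(-0.5 - 2.0 * x))) else 0
    for x in elev
]

draws = metropolis(elev, flooded)
slope_mean = sum(d[1] for d in draws) / len(draws)


def flood_probability(x):
    """Posterior-mean inundation probability at standardized elevation x."""
    return sum(1.0 / (1.0 + math.exp(-(d[0] + d[1] * x))) for d in draws) / len(draws)
```

Evaluating `flood_probability` over a grid of locations would give one layer of a probabilistic hazard map; averaging over posterior draws, rather than plugging in a point estimate, is what carries parameter uncertainty into the map.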

Bayesian smoothing under structural measurement error model with multiple covariates

  • Hwang, Jinseub;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.709-720 / 2017
  • In healthcare and medical research, many important variables, such as body mass index and laboratory data, are subject to measurement error. It is also not easy to collect large samples because of the high cost and long time required to recruit target patients who satisfy the inclusion and exclusion criteria. Besides, the demand for solving complex scientific problems has greatly increased, so a semiparametric regression approach could be of substantial value in solving them. To address the issues of measurement error, small domains, and scientific complexity, we conduct multivariable Bayesian smoothing under a structural measurement error covariate in this article. Specifically, we enhance our previous model by incorporating other useful auxiliary covariates free of measurement error. For the regression spline, we use radial basis functions with fixed knots for the measurement error covariate. We develop a fully Bayesian approach to fit the model and estimate parameters using Markov chain Monte Carlo. Simulation results show that the method performs well. We illustrate the results with an application to national survey data.

Production of Agrometeorological Information in Onion Fields using Geostatistical Models (지구 통계 모형을 이용한 양파 재배지 농업기상정보 생성 방법)

  • Im, Jieun;Yoon, Sanghoo
    • Journal of Environmental Science International / v.27 no.7 / pp.509-518 / 2018
  • Weather is the most influential factor for crop cultivation. Weather information for cultivated areas is necessary for growth and production forecasting of agricultural crops. However, there are limitations in the meteorological observations in cultivated areas because weather equipment is not installed. This study tested methods of predicting the daily mean temperature in onion fields using geostatistical models. Three models were considered: inverse distance weight method, generalized additive model, and Bayesian spatial linear model. Data were collected from the AWS (automatic weather system), ASOS (automated synoptic observing system), and an agricultural weather station between 2013 and 2016. To evaluate the prediction performance, data from AWS and ASOS were used as the modeling data, and data from the agricultural weather station were used as the validation data. It was found that the Bayesian spatial linear regression performed better than other models. Consequently, high-resolution maps of the daily mean temperature of Jeonnam were generated using all observed weather information.
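
Of the three methods listed in this abstract, inverse distance weighting is the simplest to show in a few lines. The sketch below is a generic version of that technique, not the study's implementation; the function name, the `power` parameter, and the station coordinates and temperatures are hypothetical.

```python
import math


def idw_predict(stations, target, power=2.0):
    """Inverse distance weighting: weighted mean of station values, weights 1 / d^power.

    stations: list of ((x, y), value) pairs; target: (x, y) location to predict.
    """
    num = 0.0
    den = 0.0
    for (sx, sy), value in stations:
        d = math.hypot(sx - target[0], sy - target[1])
        if d == 0.0:
            return value  # target coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical station coordinates (km) and daily mean temperatures (deg C).
stations = [((0.0, 0.0), 10.0), ((10.0, 0.0), 14.0), ((0.0, 10.0), 12.0)]
midpoint_temp = idw_predict(stations, (5.0, 0.0))
```

Unlike the Bayesian spatial linear model, this interpolator uses no covariates and gives no uncertainty estimate, which is one reason a model-based approach can outperform it.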

Bayesian forecasting approach for structure response prediction and load effect separation of a revolving auditorium

  • Ma, Zhi;Yun, Chung-Bang;Shen, Yan-Bin;Yu, Feng;Wan, Hua-Ping;Luo, Yao-Zhi
    • Smart Structures and Systems / v.24 no.4 / pp.507-524 / 2019
  • A Bayesian dynamic linear model (BDLM) is presented for a data-driven analysis for response prediction and load effect separation of a revolving auditorium structure, where the main loads are self-weight and dead loads, temperature load, and audience load. Analyses are carried out based on the long-term monitoring data for static strains on several key members of the structure. Three improvements are introduced to the ordinary regression BDLM, which are a classificatory regression term to address the temporary audience load effect, improved inference for the variance of observation noise to be updated continuously, and component discount factors for effective load effect separation. The effects of those improvements are evaluated regarding the root mean square errors, standard deviations, and 95% confidence intervals of the predictions. Bayes factors are used for evaluating the probability distributions of the predictions, which are essential to structural condition assessments, such as outlier identification and reliability analysis. The performance of the present BDLM has been successfully verified based on the simulated data and the real data obtained from the structural health monitoring system installed on the revolving structure.
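
The role of a discount factor in a Bayesian dynamic linear model can be sketched with the simplest case, a local-level model. This is a textbook-style forward filter under stated assumptions, not the paper's BDLM: it has a single component, no regression terms, and a known observation variance `obs_var`, whereas the abstract describes updating that variance continuously. All names and default values are illustrative.

```python
def filter_local_level(ys, m0=0.0, c0=1e6, obs_var=1.0, discount=0.95):
    """Forward filter for a local-level DLM with a discount factor.

    Returns one (forecast mean, forecast variance, posterior mean) tuple per step.
    """
    m, c = m0, c0
    out = []
    for y in ys:
        r = c / discount       # prior variance, inflated by discounting
        f, q = m, r + obs_var  # one-step-ahead forecast mean and variance
        a = r / q              # adaptive gain
        m = m + a * (y - f)    # posterior mean after observing y
        c = a * obs_var        # posterior variance, equal to r - a^2 * q
        out.append((f, q, m))
    return out


strains = [5.0] * 50  # hypothetical constant strain readings
steps = filter_local_level(strains)
f, q, m = steps[-1]
# Approximate 95% one-step-ahead forecast interval under the known-variance assumption.
interval = (f - 1.96 * q ** 0.5, f + 1.96 * q ** 0.5)
```

A smaller discount factor inflates the prior variance more at each step, so the filter adapts faster to changes at the cost of noisier estimates; using separate discount factors per component is what lets a multi-component model treat slow and fast load effects differently.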

Analyzing effect and importance of input predictors for urban streamflow prediction based on a Bayesian tree-based model

  • Nguyen, Duc Hai;Bae, Deg-Hyo
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.134-134 / 2022
  • Streamflow forecasting plays a crucial role in water resource control, especially in highly urbanized areas that are very vulnerable to flooding during heavy rainfall events. In addition to providing accurate predictions, evaluating the effects and importance of the input predictors can assist water managers. Recently, machine learning techniques have been applied to modeling complex and nonlinear hydrological processes. However, these techniques have not properly considered the importance and uncertainty of the predictor variables. To address these concerns, we applied GA-BART, which integrates a genetic algorithm (GA) with the Bayesian additive regression tree (BART) model, to hourly streamflow forecasting and the analysis of input predictors. The Jungrang urban basin was selected as a case study, and a database was established based on 39 heavy rainfall events between 2003 and 2020 from the rain gauges and monitoring stations. For the goal of this study, we used a combination of inputs that included the areal rainfall of the subbasins at current time step and previous time steps and water level and streamflow of the stations at time step for multistep-ahead streamflow predictions. An analysis of multiple datasets including different input predictors was performed to define the optimal set for streamflow forecasting. In addition, the GA-BART model could reasonably determine the relative importance of the input variables. The assessment might help water resource managers improve the accuracy of forecasts and early flood warnings in the basin.
