• Title/Summary/Keyword: 베이지안 회귀분석 (Bayesian regression analysis)

Estimating Probability of Mode Choice at Regional Level by Considering Spatial Association of Departure Place (출발지 공간 연관성을 고려한 지역별 수단선택확률 추정 연구)

  • Eom, Jin-Ki;Park, Man-Sik;Heo, Tae-Young
    • Journal of the Korean Society for Railway / v.12 no.5 / pp.656-662 / 2009
  • In general, travelers' mode choice behavior is analyzed by developing utility functions that reflect individuals' mode preferences according to their demographic and travel characteristics. In this paper, we propose a methodology that incorporates the spatial effects of individuals' departure locations into the mode choice model. The statistical models considered are a spatial logistic regression model and a conditional autoregressive model with a spatial association parameter. We employed a Bayesian approach to obtain more reliable parameter estimates. The proposed methodology allows mode shares to be estimated by departure place even when the survey does not cover all areas.

Robust multiple imputation method for missings with boundary and outliers (한계와 이상치가 있는 결측치의 로버스트 다중대체 방법)

  • Park, Yousung;Oh, Do Young;Kwon, Tae Yeon
    • The Korean Journal of Applied Statistics / v.32 no.6 / pp.889-898 / 2019
  • Imputing item-missing values in surveys becomes complicated when outliers and logical boundary conditions with respect to other survey items cannot be ignored. If a variable with missing values has outliers and boundaries, values imputed by existing regression-based imputation methods are likely to be biased and may violate the boundary conditions. In this paper, we address these difficulties by combining various robust regression models with multiple imputation methods. Through a simulation study covering various scenarios of outliers and boundaries, we identify and discuss the optimal combination of robust regression and multiple imputation method.
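The general idea of pairing a robust regression with multiple imputation under a boundary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses a single predictor, a Huber M-estimator fitted by iteratively reweighted least squares (IRLS) as the robust model, adds a resampled residual to each prediction, and clips imputed values to a hypothetical logical boundary [lower, upper]. The function names are hypothetical. For brevity the fit is not redrawn between imputations, so this is an improper scheme; a fuller version would also reflect parameter uncertainty, for example by bootstrapping the observed cases.

```python
import random
import statistics


def huber_fit(xs, ys, k=1.345, iters=50):
    """Fit y = a + b*x with a Huber M-estimator using IRLS."""
    weights = [1.0] * len(xs)
    for _ in range(iters):
        # Weighted least squares with the current weights.
        sw = sum(weights)
        mx = sum(w * x for w, x in zip(weights, xs)) / sw
        my = sum(w * y for w, y in zip(weights, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
        b = sxy / sxx
        a = my - b * mx
        resid = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Robust scale (MAD rescaled for normal errors); fall back to 1 if it is zero.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745 or 1.0
        # Huber weights: 1 inside the threshold k, k/|u| beyond it.
        weights = [1.0 if abs(r) / scale <= k else k * scale / abs(r) for r in resid]
    return a, b, resid


def robust_multiple_impute(xs, ys, lower, upper, m=5, seed=0):
    """Return m completed copies of ys, where None marks a missing value."""
    rng = random.Random(seed)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, resid = huber_fit([x for x, _ in obs], [y for _, y in obs])
    completed = []
    for _ in range(m):
        filled = []
        for x, y in zip(xs, ys):
            if y is None:
                # Robust prediction plus a resampled residual, clipped to the boundary.
                draw = a + b * x + rng.choice(resid)
                y = min(max(draw, lower), upper)
            filled.append(y)
        completed.append(filled)
    return completed


# Hypothetical data: roughly y = 2x, with 40.0 as an outlier and two missing values.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, None, 10.1, 40.0, None, 16.2]
imputations = robust_multiple_impute(xs, ys, lower=0.0, upper=20.0, m=3)
```

Each of the m completed data sets would then be analyzed separately and the results pooled (for example with Rubin's rules).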

Features Reduction using Logistic Regression for Spam Filtering (로지스틱 회귀 분석을 이용한 스펨 필터링의 특징 축소)

  • Jung, Yong-Gyu;Lee, Bum-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.2 / pp.13-18 / 2010
  • Today, the large volume of spam occupying mail servers and network storage causes problems such as overload, and users must spend time and resources deleting it. Automatic spam filtering is therefore essential, and many spam filters have been developed to address this problem automatically. Unlike traditional methods such as Naive Bayesian filtering, this study used PCA to reduce the high-dimensional spam data set to a few principal axes, lowering the computational burden, and then applied logistic regression analysis to classify and filter spam. The results were positive in terms of both speed and performance.
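The PCA-then-logistic-regression pipeline described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the four word-count features are made up, the principal axes are found by power iteration with deflation, the classifier is fitted by batch gradient ascent, and all function names are hypothetical.

```python
import math
import random


def principal_axes(rows, k, iters=200, seed=0):
    """Column means and top-k principal axes, found by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Remove the found axis from the covariance before searching for the next one.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        axes.append(v)
    return means, axes


def project(row, means, axes):
    """Coordinates of one message in the reduced principal-axis space."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(a * c for a, c in zip(axis, centered)) for axis in axes]


def fit_logistic(features, labels, lr=0.1, epochs=500):
    """Logistic regression fitted by batch gradient ascent on the log-likelihood."""
    d, n = len(features[0]), len(features)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for z, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
            grad_w = [g + (y - p) * zj for g, zj in zip(grad_w, z)]
            grad_b += y - p
        w = [wi + lr * g / n for wi, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def is_spam(row, means, axes, w, b):
    """Classify a raw feature row: project it, then threshold the logit at 0."""
    z = project(row, means, axes)
    return sum(wi * zi for wi, zi in zip(w, z)) + b >= 0.0


# Hypothetical word counts for two spam-indicative and two ham-indicative terms.
spam = [[5, 4, 0, 1], [6, 5, 1, 0], [4, 5, 0, 0]]
ham = [[0, 1, 5, 4], [1, 0, 4, 5], [0, 0, 6, 5]]
rows, labels = spam + ham, [1, 1, 1, 0, 0, 0]
means, axes = principal_axes(rows, k=1)
w, b = fit_logistic([project(r, means, axes) for r in rows], labels)
```

The classifier only ever sees the k projected coordinates, which is where the computational saving comes from; new messages must be centered with the training means and projected onto the same axes before scoring.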

A Comparative Study on the Accuracy of Important Statistical Prediction Techniques for Marketing Data (마케팅 데이터를 대상으로 중요 통계 예측 기법의 정확성에 대한 비교 연구)

  • Cho, Min-Ho
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.4 / pp.775-780 / 2019
  • Techniques for predicting the future can be categorized into statistics-based and deep-learning-based techniques. Among them, statistics-based techniques are widely used because they are simple and highly accurate. However, working-level staff have difficulty applying many analytical techniques correctly. In this study, we compared the prediction accuracy of multinomial logistic regression, decision trees, random forests, support vector machines, and Bayesian inference on marketing-related data. The same marketing data were used for every technique, and the analysis was conducted in R. The prediction results of these techniques, reflecting the data characteristics of the marketing field, should serve as a useful reference for practitioners.

A Study on Parameter Tuning for Redis via Parameter Classification and Phased Bayesian Optimization (Redis 파라미터 분류 및 단계적 베이지안 최적화를 통한 파라미터 튜닝 연구)

  • Jo, Seong-Woon;Park, Sang-Hyun
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.476-479 / 2021
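  • DBMS parameter tuning is the process of adjusting the values of the many parameters a database provides in order to achieve optimal performance. Depending on the database, the number of parameters ranges from tens to hundreds, and because each parameter serves a different function, finding the optimal combination is not easy. Previous studies used the BO (Bayesian optimization) technique to derive suitable parameter values, but the dimensionality of the problem grows in proportion to the number of parameters. In this paper, we propose PBO, a method that first classifies the parameters statistically to reduce the search space and then performs BO in phases. Benchmark results obtained with randomly assigned parameter values are clustered; then, for each cluster, the association between the results and the parameters is analyzed, and highly correlated parameters are matched and grouped. To validate the proposed methodology, we demonstrated its superiority through comparative experiments against eight regression models.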

Pattern Recognition using Robust Feedforward Neural Networks (로버스트 다층전방향 신경망을 이용한 패턴인식)

  • Hwang, Chang-Ha;Kim, Sang-Min
    • Journal of the Korean Data and Information Science Society / v.9 no.2 / pp.345-355 / 1998
  • The back propagation (BP) algorithm allows multilayer feedforward neural networks to learn input-output mappings from training samples. It iteratively adjusts the network parameters (weights) to minimize the sum of squared approximation errors using gradient descent. However, the mapping acquired through the BP algorithm may be corrupted when erroneous training data are used. In this paper, two types of robust backpropagation algorithms are discussed, both from a theoretical point of view and in case studies of nonlinear regression function estimation and handwritten Korean character recognition. For future research, we suggest a Bayesian learning approach to neural networks and compare it with the two robust backpropagation algorithms.
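Why erroneous training data corrupt ordinary BP, and how a robust variant limits the damage, can be shown with a single linear neuron. The sketch below is an assumption-laden illustration: the abstract does not name the two robust algorithms, so the Huber loss is used here as one common example of a loss whose error signal is bounded. A multilayer network would backpropagate the same bounded signal through its hidden layers. The data and function names are hypothetical.

```python
def psi_squared(r):
    """Error signal of the squared loss 0.5 * r**2 (unbounded in r)."""
    return r


def psi_huber(r, delta=1.0):
    """Error signal of the Huber loss: linear near zero, capped at +/- delta."""
    return max(-delta, min(delta, r))


def fit_neuron(xs, ys, psi, lr=0.05, epochs=5000):
    """Gradient descent for a single linear neuron y = w*x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residual-based error signal for every training sample.
        signals = [psi(y - (w * x + b)) for x, y in zip(xs, ys)]
        w += lr * sum(s * x for s, x in zip(signals, xs)) / n
        b += lr * sum(signals) / n
    return w, b


# Hypothetical data from y = 2x + 1, with the last target recorded erroneously.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 40]
w_sq, b_sq = fit_neuron(xs, ys, psi_squared)
w_hub, b_hub = fit_neuron(xs, ys, psi_huber)
```

With these data the squared-error fit is pulled to a slope of about 6.14, while the Huber fit stays at about 2.3, because the single bad sample can contribute at most `delta` to each update.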

Inferential Problems in Bayesian Logistic Regression Models (베이지안 로지스틱 회귀모형에서의 추론에 대한 연구)

  • Hwang, Jin-Soo;Kang, Sung-Chan
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1149-1160 / 2011
  • Model selection and hypothesis testing in Bayesian inference are still debated among scholars. Bayes factors, traditionally used as a criterion in Bayesian hypothesis testing and model selection, are easy to understand but sometimes hard to compute. There are also other criteria, such as the DIC (Deviance Information Criterion) of Spiegelhalter et al. (2002) for model selection and Bayesian p-values for testing. In this paper, we briefly introduce Bayesian hypothesis testing and model selection procedures. In addition, we applied Bayesian inference to the Swiss banknote data by fitting a logistic regression model and computing several test statistics to see whether they provide consistent results.
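As a reminder of how DIC is assembled from MCMC output, the sketch below computes it for a logistic regression using the Spiegelhalter et al. (2002) definitions D(θ) = -2 log p(y | θ), p_D = D̄ - D(θ̄) and DIC = D̄ + p_D. The posterior draws are assumed to come from some sampler; the toy data and draws are made up and are not the Swiss banknote data, and the function names are hypothetical.

```python
import math


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of a logistic regression (rows of X include the intercept)."""
    total = 0.0
    for row, label in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # log(1 + e^eta) computed stably as max(eta, 0) + log1p(e^-|eta|).
        total += label * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def deviance(beta, X, y):
    """D(beta) = -2 * log-likelihood."""
    return -2.0 * log_likelihood(beta, X, y)


def dic(draws, X, y):
    """Return (DIC, p_D) from a list of posterior coefficient draws."""
    d_bar = sum(deviance(beta, X, y) for beta in draws) / len(draws)
    beta_bar = [sum(col) / len(draws) for col in zip(*draws)]
    p_d = d_bar - deviance(beta_bar, X, y)
    return d_bar + p_d, p_d


# Made-up data (intercept column plus one covariate) and made-up posterior draws.
X = [[1.0, -1.0], [1.0, 0.5], [1.0, 1.5], [1.0, -2.0]]
y = [0, 1, 1, 0]
draws = [[0.1, 1.2], [-0.2, 0.9], [0.0, 1.5]]
dic_value, p_d = dic(draws, X, y)
```

Because the logistic deviance is convex in the coefficients, p_D computed this way is never negative; lower DIC indicates the preferred model when comparing candidates fitted to the same data.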

Analysis of Total Crime Count Data Based on Spatial Association Structure (공간적 연관구조를 고려한 총범죄 자료 분석)

  • Choi, Jung-Soon;Park, Man-Sik;Won, Yu-Bok;Kim, Hag-Yeol;Heo, Tae-Young
    • The Korean Journal of Applied Statistics / v.23 no.2 / pp.335-344 / 2010
  • The reliability of estimates usually suffers when a linear regression model without spatial dependence is applied to spatial data. In this study, we used a conditional autoregressive model to construct spatial association structures and estimated its parameters via Bayesian approaches. We then compared the performance of models with and without spatial effects. We analyzed the yearly total crime counts for each of the 25 districts of Seoul, South Korea, in 2007.
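A common form of the conditional autoregressive (CAR) prior can make the spatial association structure concrete. The abstract does not give the exact specification, so this parameterization is an assumption: given its neighbours, district i's spatial effect φ_i is normal with mean ρ Σ_j w_ij φ_j / w_i+ and variance 1/(τ w_i+), where w_ij = 1 for adjacent districts, w_i+ is the number of neighbours, and ρ is the spatial association parameter. The joint precision matrix is τ(D - ρW), which is proper for |ρ| < 1. The map and effects below are hypothetical.

```python
def car_precision(adjacency, rho, tau):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR model."""
    n = len(adjacency)
    return [
        [tau * ((sum(adjacency[i]) if i == j else 0.0) - rho * adjacency[i][j]) for j in range(n)]
        for i in range(n)
    ]


def car_conditional(adjacency, phi, i, rho, tau):
    """Mean and variance of phi[i] given every other district's effect."""
    degree = sum(adjacency[i])
    neighbour_sum = sum(w * p for j, (w, p) in enumerate(zip(adjacency[i], phi)) if j != i)
    return rho * neighbour_sum / degree, 1.0 / (tau * degree)


# Hypothetical map: four districts in a row, each adjacent to its immediate neighbours.
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
phi = [0.5, -0.2, 0.3, 0.1]
mean, var = car_conditional(W, phi, 1, rho=0.9, tau=2.0)
```

Setting ρ = 0 removes the spatial smoothing (each effect is centered at zero), which corresponds to the models without spatial effects that the study compares against.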

Validation of diacylglycerol O-acyltransferase1 gene effect on milk yield using Bayesian regression (베이지안 회귀를 이용한 국내 홀스타인 젖소의 유량형질 관련 DGAT1유전자 효과 검증)

  • Cho, Kwang-Hyun;Cho, Chung-Il;Park, Kyong-Do;Lee, Joon-Ho
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1249-1258 / 2015
  • The DGAT1 (diacylglycerol O-acyltransferase1) gene is well known as a major gene for milk production in dairy cattle. This study investigated how the effect of the DGAT1 gene on milk yield appears in a genome-wide association (GWA) analysis using a high-density whole-genome SNP chip. The data set consisted of 353 Korean Holstein sires with 50k SNP genotypes and deregressed estimated breeding values for milk yield. After quality control, 41,051 SNPs were selected, and their chromosomal locations were mapped using UMD 3.1. Bayesian regression with the BayesB method (pi = 0.99) was used to estimate SNP effects and genomic breeding values. The percentage of genetic variance explained by each 1 Mb non-overlapping window was calculated to detect QTL regions. The 1st- and 3rd-ranked of the 2,516 windows were located around the DGAT1 gene region and explained 0.51% and 0.48% of the genetic variance, respectively. Although SNPs in the DGAT1 gene region are not included on the commercial 50k SNP chip, the effect of the DGAT1 gene appears to be captured in the GWA through SNPs in linkage disequilibrium with it.
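The window statistic can be sketched as follows, under assumptions not stated in the abstract: SNPs are assigned to non-overlapping 1 Mb windows per chromosome, each window's genomic value is the sum of genotype times estimated effect over its SNPs, and its share is the variance of those values across animals divided by the variance of the total genomic values. The sketch uses a single set of effect estimates (under BayesB with pi = 0.99, each SNP effect is zero with prior probability 0.99, so most estimates are at or near zero), whereas the quantity can also be averaged over posterior samples. The function name and toy data are hypothetical.

```python
from collections import defaultdict
import statistics


def window_variance_share(genotypes, effects, snp_map, window_bp=1_000_000):
    """Percent of genomic-value variance explained by each non-overlapping window.

    genotypes: rows = animals, columns = SNPs (0/1/2 allele counts)
    effects:   estimated SNP effects, one per column
    snp_map:   (chromosome, position_bp) per column
    """
    windows = defaultdict(list)
    for col, (chrom, pos) in enumerate(snp_map):
        windows[(chrom, pos // window_bp)].append(col)
    total = [sum(g * b for g, b in zip(row, effects)) for row in genotypes]
    total_var = statistics.pvariance(total)
    shares = {}
    for key, cols in windows.items():
        # Genomic value contributed by this window's SNPs, for every animal.
        values = [sum(row[c] * effects[c] for c in cols) for row in genotypes]
        shares[key] = 100.0 * statistics.pvariance(values) / total_var
    return shares


# Hypothetical toy data: 3 animals, 3 SNPs on chromosome 14 spanning two windows.
genotypes = [[0, 1, 2], [1, 2, 1], [2, 0, 0]]
effects = [0.5, 0.0, 0.1]
snp_map = [(14, 1_200_000), (14, 1_700_000), (14, 2_300_000)]
shares = window_variance_share(genotypes, effects, snp_map)
```

Window shares need not sum to 100% because window values can covary across animals, for example through linkage disequilibrium between neighbouring windows.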

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.2 / pp.163-170 / 2019
  • Over the past decade, the development of the Web has led to an explosive increase in data. Feature selection is an important step in extracting valuable information from large amounts of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with Word2vec word embeddings and BOW (Bag-of-Words) features. The FS methods adopted are CFS (correlation-based FS) and IG (information gain). To select an optimal FS method, we tested a number of classifiers, ranging from LR (logistic regression), NN (neural network), and NBN (naive Bayesian network) to RF (random forest), RS (random subspace), and ST (stacking). Empirical results on the electronics and kitchen datasets showed that the LR and ST classifiers combined with IG applied to BOW features yielded the best opinion mining performance. On the laptop and restaurant datasets, the RF classifier using IG applied to Word2vec features performed best.
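For the IG filter named above, a minimal sketch for discrete features (for example, binary bag-of-words presence) might look like the following. It computes IG = H(class) - H(class | feature) in bits and keeps the k highest-scoring features. The feature names, labels, and cutoff k are hypothetical, and continuous Word2vec dimensions would first need discretization, which is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def select_top_k(columns, labels, k):
    """Rank features (name -> values) by information gain and keep the top k."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)
    return ranked[:k]


# Hypothetical binary bag-of-words presence features for four short reviews.
labels = ["pos", "pos", "neg", "neg"]
columns = {
    "the": [1, 1, 1, 1],
    "great": [1, 1, 0, 0],
    "ok": [1, 0, 1, 0],
    "bad": [0, 0, 1, 1],
}
selected = select_top_k(columns, labels, k=2)
```

Words that appear in every review, or equally often in both classes, score zero and are dropped before any classifier is trained.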