• Title/Summary/Keyword: 이항반응변수 (binary response variable)


A Study for Improving the Performance of Data Mining Using Ensemble Techniques (앙상블기법을 이용한 다양한 데이터마이닝 성능향상 연구)

  • Jung, Yon-Hae;Eo, Soo-Heang;Moon, Ho-Seok;Cho, Hyung-Jun
    • Communications for Statistical Applications and Methods / v.17 no.4 / pp.561-574 / 2010
  • We studied the performance of 8 data mining algorithms, including decision trees, logistic regression, LDA, QDA, neural networks, and SVM, and their combinations with 2 ensemble techniques, bagging and boosting. In this study, we used 13 data sets with binary responses. Sensitivity, specificity, and misclassification error were used as criteria for comparison.
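
The three comparison criteria named in this abstract can be illustrated with a minimal sketch. The function name `binary_metrics`, the 0/1 label coding, and the example labels are assumptions made for illustration; they are not taken from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and misclassification error for 0/1 labels.

    Assumes 1 marks the positive class and that both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "misclassification_error": (fp + fn) / len(pairs),
    }


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these hypothetical labels there are 3 true positives, 1 false negative, 3 true negatives, and 1 false positive, so sensitivity and specificity are both 0.75 and the misclassification error is 0.25.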

Bayesian Inference for the Zero Inflated Negative Binomial Regression Model (제로팽창 음이항 회귀모형에 대한 베이지안 추론)

  • Shim, Jung-Suk;Lee, Dong-Hee;Jun, Byoung-Cheol
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.951-961 / 2011
  • In this paper, we propose a Bayesian inference using the Markov chain Monte Carlo (MCMC) method for the zero inflated negative binomial (ZINB) regression model. The proposed model allows a regression model for the zero inflation probability as well as a regression model for the mean of the dependent variable. This extends the work of Jang et al. (2010) to the fully defined ZINB regression model. In addition, we apply the proposed method to a real data example and compare its efficiency with that of the zero inflated Poisson (ZIP) model using the DIC. Since the DIC of the ZINB model is smaller than that of the ZIP model, the ZINB model shows superior performance over the ZIP model in zero inflated count data with overdispersion.
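
For orientation, one common way to write a ZINB regression model with covariates in both parts is sketched below. The symbols $\pi_i$ (zero inflation probability), $\mu_i$ (mean of the negative binomial count component), $\alpha$ (dispersion), and the covariate vectors $z_i$, $x_i$ are assumed notation; the paper's own parameterization and link functions may differ.

```latex
\begin{aligned}
P(Y_i = 0) &= \pi_i + (1-\pi_i)\,(1+\alpha\mu_i)^{-1/\alpha},\\
P(Y_i = y) &= (1-\pi_i)\,
  \frac{\Gamma(y+1/\alpha)}{\Gamma(1/\alpha)\,y!}
  \left(\frac{1}{1+\alpha\mu_i}\right)^{1/\alpha}
  \left(\frac{\alpha\mu_i}{1+\alpha\mu_i}\right)^{y},
  \qquad y = 1, 2, \dots,\\
\operatorname{logit}(\pi_i) &= z_i^{\top}\gamma,
  \qquad \log(\mu_i) = x_i^{\top}\beta .
\end{aligned}
```

In this parameterization the count component has variance $\mu_i + \alpha\mu_i^2$, and letting $\alpha \to 0$ recovers the ZIP model, which is why the two models can be compared on overdispersed data.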

Fit of the number of insurance solicitor's turnovers using zero-inflated negative binomial regression (영과잉 음이항회귀 모형을 이용한 보험설계사들의 이직횟수 적합)

  • Chun, Heuiju
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1087-1097 / 2017
  • This study aims to find the best model to fit the number of turnovers of insurance solicitors at life insurance companies using count data regression models: Poisson regression, negative binomial regression, zero-inflated Poisson regression, and zero-inflated negative binomial regression. Of the four models, the zero-inflated negative binomial model was selected based on the AIC and SBC criteria, owing to over-dispersion and a high proportion of zero counts. The significant factors affecting an insurance solicitor's turnover were found to be the work period in the current company, the total work period as a financial planner, the affiliated corporation, and channel management satisfaction. We also found that as job satisfaction or channel management satisfaction gets lower, the number of turnovers increases. In addition, the total work period as a financial planner has a positive relationship with the number of turnovers, whereas the work period in the current company has a negative relationship with it.
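
The two selection criteria mentioned in this abstract have standard definitions, reproduced here as a reminder. The symbols $\hat{L}$ (maximized likelihood), $k$ (number of estimated parameters), and $n$ (sample size) are assumed notation, not the paper's.

```latex
\mathrm{AIC} = -2\log\hat{L} + 2k,
\qquad
\mathrm{SBC} = -2\log\hat{L} + k\log n .
```

Under both criteria the model with the smaller value is preferred. SBC penalizes additional parameters more heavily than AIC once $\log n > 2$, that is, for $n \ge 8$.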

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.681-690 / 2017
  • Various imbalanced binary classification problems exist, such as fraud detection in banking operations, spam mail detection, and prediction of defective products. Several sampling methods, such as over-sampling, under-sampling, and SMOTE, have been developed to overcome the poor prediction performance of binary classifiers when the proportion of one group is dominant. In this study, we investigate the prediction performance of logistic regression, Lasso, random forest, boosting, and support vector machine in combination with the sampling methods for binary imbalanced data. Four real data sets are analyzed to see if there is a substantial improvement in prediction performance. We also emphasize some precautions when the sampling methods are implemented.
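
The simplest of the sampling methods listed above, random over-sampling, can be sketched as follows. This is a standard-library illustration under assumptions: the function name `random_oversample` and the toy data are hypothetical, and it is not the implementation evaluated in the paper. SMOTE, which synthesizes new minority points rather than duplicating existing ones, is not shown.

```python
import random


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until every class has as
    many rows as the largest class. Returns new lists (X_out, y_out)."""
    rng = random.Random(seed)
    rows_by_class = {}
    for row, label in zip(X, y):
        rows_by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in rows_by_class.values())

    X_out, y_out = [], []
    for label, rows in rows_by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced data: four rows of class 0, one row of class 1.
X = [[0], [1], [2], [3], [4]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y, seed=1)
print(len(X_res), y_res.count(0), y_res.count(1))
```

One widely cited precaution, which may or may not be among those the paper discusses, is to resample only the training split so that the test data keep the original class proportions.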

Categorical data analysis of sensory evaluation data with Hanwoo bull beef (한우 수소 고기 관능평가 데이터에 대한 범주형 자료 분석)

  • Lee, Hye-Jung;Cho, Soo-Hyun;Kim, Jae-Hee
    • Journal of the Korean Data and Information Science Society / v.20 no.5 / pp.819-827 / 2009
  • This study was conducted to investigate the relationship between sociodemographic factors and Korean consumers' palatability evaluation grades using Hanwoo sensory evaluation data. The dichotomous logistic regression model and the multinomial logistic regression model are fitted with independent variables such as the consumer's living location, age, gender, occupation, monthly income, and beef cut, and with the palatability grade as the dependent variable. A stepwise variable selection procedure is used to find the final model, and odds ratios are calculated to find the associations between categories.


Tests for Equality of Dispersions in the Generalized Bivariate Negative Binomial Regression Model with Heterogeneous Dispersions (서로 다른 산포를 갖는 이변량 음이항 회귀모형에서 산포의 동일성에 대한 검정)

  • Han, Sang-Moon;Jung, Byoung-Cheol
    • Communications for Statistical Applications and Methods / v.18 no.2 / pp.219-227 / 2011
  • In this paper, we propose a generalized bivariate negative binomial distribution allowing heterogeneous dispersions on two dependent variables, based on a trivariate reduction technique. For this model, we propose score and LR tests for the equality of dispersions and compare the efficiencies of the proposed tests using a Monte Carlo study. The Monte Carlo study shows that the proposed score and LR tests are efficient tests for the equality of dispersions in terms of significance level and power. Because the score test is easier to compute than the LR test and performs slightly better in the Monte Carlo study, we suggest using the score test for testing the equality of dispersions on two dependent variables. In addition, an empirical example is provided to illustrate the results.

The Study of Response' Type according to a Position of Variable on Linear Equation - Centering around the First and Third Grade of Middle School - (일차방정식에서 변수의 위치에 따른 반응 유형에 관한 연구 -중학교 1학년과 3학년을 중심으로-)

  • Seo, Jong-Jin
    • Journal of the Korean School Mathematics Society / v.12 no.3 / pp.267-289 / 2009
  • Students have more difficulty solving linear equation problems with a variable on the right side of the equals sign than problems with a variable on the left side. For students to overcome such difficulties, they should be given opportunities to experience many types of basic linear equation problems. It is also necessary to examine students' problem-solving process by constructing various types of evaluation items and testing them in the teaching and learning of linear equations, or to grasp students' learning status through individual interviews; based on these, errors should be corrected through feedback.


A Study on Mantel-Haenszel Test of Conditional Independence ($2\times2$ 분할표를 이용한 조건부 독립성 검정)

  • 김지현;임현선
    • The Korean Journal of Applied Statistics / v.11 no.2 / pp.257-268 / 1998
  • Many epidemiological studies investigate whether an association exists between a binary risk factor X and a binary response variable Y. They analyse whether an observed association between X and Y persists when the level of another factor Z that might influence the association is controlled. This involves testing conditional independence of X and Y controlling for Z. The Mantel-Haenszel test is the most widely used test of conditional independence for sparse tables. But if the association between X and Y varies across the levels of Z, the Mantel-Haenszel test suffers from low power. In this study, we propose an alternative test procedure that overcomes the low power problem in that case. We find the null distribution of the alternative test statistic and compare its performance with the Mantel-Haenszel test by simulation.
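
The low power problem described here can be seen directly in the textbook form of the Mantel-Haenszel statistic, sketched below for stratified $2\times2$ tables. The function names and the example counts are hypothetical, and the alternative test proposed in the paper is not reproduced.

```python
import math


def mantel_haenszel_statistic(tables, continuity=True):
    """Mantel-Haenszel chi-square statistic (1 degree of freedom).

    tables: one 2x2 table ((a, b), (c, d)) per level of Z, with rows
    indexed by X and columns by Y.
    """
    diff_sum = 0.0  # sum over strata of a_k - E(a_k)
    var_sum = 0.0   # sum over strata of Var(a_k)
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 observations adds nothing
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        raise ValueError("no stratum carries information about the association")
    numerator = abs(diff_sum) - (0.5 if continuity else 0.0)
    return max(numerator, 0.0) ** 2 / var_sum


def chi2_1df_pvalue(stat):
    """Upper tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical strata in which X and Y are positively associated in both.
same_direction = [((10, 5), (4, 11)), ((8, 7), (3, 12))]
stat_same = mantel_haenszel_statistic(same_direction)
p_same = chi2_1df_pvalue(stat_same)

# Hypothetical strata with equally strong but opposite associations.
opposite_direction = [((10, 5), (5, 10)), ((5, 10), (10, 5))]
stat_opposite = mantel_haenszel_statistic(opposite_direction)

print(stat_same, p_same, stat_opposite)
```

Because the statistic sums the signed deviations across strata before squaring, opposite associations cancel: the second example yields a statistic of 0 even though X and Y are clearly associated within each stratum.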


Hyper-Geometric Distribution Software Reliability Growth Model : Generalization, Estimation and Prediction (초기하분포 소프트웨어 신뢰성 성장 모델 : 일반화, 추정과 예측)

  • Park, Jung-Yang;Yu, Chang-Yeol;Park, Jae-Hong
    • The Transactions of the Korea Information Processing Society / v.6 no.9 / pp.2343-2349 / 1999
  • The hyper-geometric distribution software reliability growth model (HGDM) was recently developed and successfully applied to real data sets. The HGDM considers the sensitivity factor as a parameter to be estimated. In order to reflect the random behavior of the test-and-debug process, this paper generalizes the HGDM by assuming that the sensitivity factor is a binomial random variable. Such a generalization enables us to easily understand the statistical characteristics of the HGDM. It is shown that the least squares method produces the identical results for both the HGDM and the generalized HGDM. Methods for computing the maximum likelihood estimates and predicting the future outcomes are also presented.


Comparing the performance of likelihood ratio test and F-test for gamma generalized linear models (감마 일반화 선형 모형에서의 가능도비 검정과 F-검정 비교연구)

  • Jo, Seongil;Han, Jeongseop;Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.475-484 / 2018
  • Gamma generalized linear models are useful for non-negative and skewed responses. However, these models have received less attention than Poisson and binomial generalized linear models. In particular, hypothesis testing for the significance of regression coefficients has not been thoroughly studied. In this paper we assess the performance of various test statistics for gamma generalized linear models based on numerical studies. Our results show that the likelihood ratio test and F-type test are generally recommended and that the partial deviance test should be avoided in practice.