• Title/Abstract/Keywords: Binary Logistic Model

163 search results

Exploring interaction using 3-D residual plots in logistic regression model

  • 강명욱
    • Journal of the Korean Data and Information Science Society / Vol. 25, No. 1 / pp. 177-185 / 2014
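  • In a logistic regression model, the predictors alone are sometimes insufficient, and transformed forms of the predictors, such as quadratic or interaction terms, are needed. When there are two predictors and their conditional distributions are bivariate normal, the logistic regression model should in general include quadratic and interaction terms. Depending on the variances and correlation coefficients of the conditional distributions, however, these terms may turn out to be unnecessary. The variances and correlations can be judged roughly from scatter plots, but whether an interaction term is needed is much harder to judge. In this paper, we propose a method for exploring interaction using 3-D residual plots and apply it to real data.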
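The abstract does not spell out the procedure, so the following is only a minimal sketch under the assumption that the residual plot is built from Pearson residuals of a main-effects fit. It fits $\operatorname{logit} p = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ by Newton-Raphson on simulated data that contain a true interaction, and collects the $(x_1, x_2, r)$ triples one would pass to a 3-D scatter plot. The simulation settings, the helper names and the residual-versus-$x_1x_2$ correlation check are illustrative assumptions, not the author's method.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic fit by Newton-Raphson; each row of X starts with 1."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        mu = [sigmoid(sum(bj * v for bj, v in zip(beta, row))) for row in X]
        score = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu)) for j in range(k)]
        info = [[sum(mi * (1 - mi) * row[j] * row[c] for row, mi in zip(X, mu))
                 for c in range(k)] for j in range(k)]
        beta = [bj + s for bj, s in zip(beta, solve(info, score))]
    return beta


def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


# Simulated data with a true x1*x2 interaction (illustrative settings, not from the paper).
random.seed(1)
x1 = [random.gauss(0, 1) for _ in range(300)]
x2 = [random.gauss(0, 1) for _ in range(300)]
y = [1 if random.random() < sigmoid(-0.5 + a + b + 1.5 * a * b) else 0 for a, b in zip(x1, x2)]

# Fit the main-effects-only model and compute Pearson residuals.
X = [[1.0, a, b] for a, b in zip(x1, x2)]
beta = fit_logistic(X, y)
p_hat = [sigmoid(sum(bj * v for bj, v in zip(beta, row))) for row in X]
pearson = [(yi - pi) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, p_hat)]
points_3d = list(zip(x1, x2, pearson))  # feed these to any 3-D scatter plot

# A clear association between residuals and x1*x2 hints that an interaction term is missing.
print(round(corr(pearson, [a * b for a, b in zip(x1, x2)]), 3))
```

In a plot, a twisted (saddle-shaped) residual surface over the $(x_1, x_2)$ plane is the visual counterpart of that correlation.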

Log-density Ratio with Two Predictors in a Logistic Regression Model

  • 강명욱;윤재은
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 26, No. 1 / pp. 141-149 / 2013
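  • In a logistic regression model, if the conditional distributions of the two predictors can both be taken as bivariate normal, the log-density ratio, expressed as a function of the predictors, shows which terms should be included in the model. When the two bivariate normal distributions have the same variance-covariance matrix, linear terms alone suffice, with no quadratic or cross-product terms. If both correlation coefficients are 0, the cross-product term is unnecessary regardless of the predictors' variances. We also examine other conditions under which the log-density ratio shows that quadratic and cross-product terms are not needed in the logistic regression model.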
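For reference, the standard Bayes-rule identity behind this approach (general theory, not quoted from the paper) is $\operatorname{logit} P(y=1 \mid x) = \log(\pi_1/\pi_0) + \log\{f_1(x)/f_0(x)\}$. With $x \mid y=k \sim N_2(\mu_k, \Sigma_k)$, the log-density ratio equals $-\tfrac{1}{2} x^\top(\Sigma_1^{-1}-\Sigma_0^{-1})x + x^\top(\Sigma_1^{-1}\mu_1-\Sigma_0^{-1}\mu_0) + c$. The sketch below, whose function names are hypothetical, reads off the coefficients of $x_1^2$, $x_2^2$, $x_1x_2$, $x_1$ and $x_2$ so the two conditions in the abstract can be checked numerically.

```python
def inv2(s):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def cov2(sd1, sd2, rho):
    """Covariance matrix from two standard deviations and a correlation."""
    return ((sd1 ** 2, rho * sd1 * sd2), (rho * sd1 * sd2, sd2 ** 2))


def log_density_ratio_terms(mu0, s0, mu1, s1):
    """Coefficients of log f1(x) - log f0(x) for bivariate normals (constant omitted)."""
    p0, p1 = inv2(s0), inv2(s1)
    a = [[p1[i][j] - p0[i][j] for j in range(2)] for i in range(2)]
    # Linear part: Sigma1^{-1} mu1 - Sigma0^{-1} mu0
    lin = [sum(p1[i][j] * mu1[j] for j in range(2)) - sum(p0[i][j] * mu0[j] for j in range(2))
           for i in range(2)]
    return {
        "x1^2": -0.5 * a[0][0],
        "x2^2": -0.5 * a[1][1],
        "x1*x2": -a[0][1],  # x'Ax contributes 2*a12*x1*x2, multiplied by -1/2
        "x1": lin[0],
        "x2": lin[1],
    }


# Equal covariance matrices: only linear terms remain.
same = log_density_ratio_terms((0, 0), cov2(1, 2, 0.5), (1, 1), cov2(1, 2, 0.5))
# Zero correlations but unequal variances: quadratic terms appear, the cross term does not.
uncorr = log_density_ratio_terms((0, 0), cov2(1, 1, 0.0), (1, 1), cov2(2, 3, 0.0))
print(same)
print(uncorr)
```

The cross term depends only on the off-diagonal entries of the two precision matrices, which vanish when both correlations are 0; that is why the variances do not matter for it.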

Geographically weighted kernel logistic regression for small area proportion estimation

  • Shim, Jooyong;Hwang, Changha
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 2 / pp. 531-538 / 2016
  • In this paper, we deal with small area estimation when the response variable takes binary values. Mixed-effects models, which treat spatial effects as random effects, have been studied extensively for small area estimation. However, when the spatial information of each area is given explicitly as coordinates, geographically weighted logistic regression is commonly used to incorporate that information by assuming that the regression parameters vary spatially across areas. In this paper, we relax the linearity assumption and, using the basic principle of kernel machines, propose a geographically weighted kernel logistic regression for estimating small area proportions. Numerical studies compare the performance of the proposed method with that of other methods in estimating small area proportions.
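The abstract positions the proposed method against geographically weighted logistic regression (GWLR). The sketch below covers only that GWLR baseline, not the authors' kernel extension: at a target area it fits a local logistic regression by weighted Newton-Raphson with Gaussian distance-decay weights $w_i = \exp\{-\tfrac{1}{2}(d_i/h)^2\}$, then averages the fitted probabilities over the area's units to estimate its proportion. The weight function, the bandwidth `h`, the function names and the toy data are all assumptions for illustration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def gw_weights(coords, target, h):
    """Gaussian distance-decay weights relative to a target location (assumed kernel form)."""
    return [math.exp(-0.5 * (math.dist(c, target) / h) ** 2) for c in coords]


def weighted_logistic_1d(x, y, w, iters=30):
    """Weighted ML fit of logit p = b0 + b1*x by Newton-Raphson (2x2 system solved directly)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        mu = [sigmoid(b0 + b1 * xi) for xi in x]
        g0 = sum(wi * (yi - mi) for wi, yi, mi in zip(w, y, mu))
        g1 = sum(wi * (yi - mi) * xi for wi, yi, mi, xi in zip(w, y, mu, x))
        h00 = sum(wi * mi * (1 - mi) for wi, mi in zip(w, mu))
        h01 = sum(wi * mi * (1 - mi) * xi for wi, mi, xi in zip(w, mu, x))
        h11 = sum(wi * mi * (1 - mi) * xi * xi for wi, mi, xi in zip(w, mu, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data: (area centroid, covariate x, binary response y) per unit.
units = [
    ((0, 0), 0.2, 0), ((0, 0), 1.5, 1), ((0, 0), -0.3, 1), ((0, 0), 0.8, 0),
    ((1, 0), 1.0, 1), ((1, 0), -1.0, 0), ((1, 0), 0.5, 0), ((1, 0), 2.0, 1),
    ((0, 2), -0.5, 0), ((0, 2), 0.3, 1), ((0, 2), 1.2, 0), ((0, 2), -1.5, 1),
]
coords = [u[0] for u in units]
x = [u[1] for u in units]
y = [u[2] for u in units]

target = (0, 0)  # estimate the proportion for the area centred here
w = gw_weights(coords, target, h=1.0)
b0, b1 = weighted_logistic_1d(x, y, w)
in_area = [xi for c, xi in zip(coords, x) if c == target]
estimate = sum(sigmoid(b0 + b1 * xi) for xi in in_area) / len(in_area)
print(round(estimate, 3))
```

Repeating the fit for each target area gives area-specific coefficients, which is the "parameters vary spatially" assumption the abstract describes; the kernel version replaces the linear predictor with a kernel expansion.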

Semiparametric kernel logistic regression with longitudinal data

  • Shim, Joo-Yong;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / Vol. 23, No. 2 / pp. 385-392 / 2012
  • Logistic regression is a well-known binary classification method in the field of statistical learning. Mixed-effects regression models are widely used for the analysis of correlated data such as those found in longitudinal studies. We consider kernel extensions of logistic regression with semiparametric fixed effects and parametric random effects. Estimation is performed by the penalized likelihood method based on the kernel trick, with a focus on efficient computation and effective hyperparameter selection. Optimal hyperparameters are selected by cross-validation. Numerical results are then presented to illustrate the performance of the proposed procedure.

A Study on Diagnostics Method for Categorical Data

  • 이선규;조범석
    • 산업경영시스템학회지 (Journal of Society of Korea Industrial and Systems Engineering) / Vol. 18, No. 33 / pp. 93-102 / 1995
  • In this study, we are concerned with diagnostic methods for cross-classified categorical data, using a logistic regression model (a binary response model) for cell proportions. Under this model, goodness of fit can be examined with Pearson's $\chi^2$ test statistic and the likelihood ratio statistic. These statistics, however, assume that the sample is drawn under a with-replacement sampling model, so they are often inappropriate for analysing contingency tables built from sample survey data obtained under complex sampling schemes. We therefore examine diagnostic procedures for detecting outlying cell proportions and influential observations in the design space of a logistic regression model that takes survey design effects into account.
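As a reminder of the two classical statistics (textbook definitions, not the paper's design-adjusted versions): for covariate cells $i$ with $y_i$ successes out of $n_i$ trials and fitted probability $\hat p_i$, $X^2 = \sum_i (y_i - n_i\hat p_i)^2 / \{n_i \hat p_i (1-\hat p_i)\}$ and $G^2 = 2\sum_i [\, y_i \log\{y_i/(n_i\hat p_i)\} + (n_i-y_i)\log\{(n_i-y_i)/(n_i(1-\hat p_i))\}]$. The sketch computes both under the simple binomial (with-replacement) sampling assumption that the abstract says can fail for complex survey data; it does not implement any design-effect correction, and the function names are hypothetical.

```python
import math


def xlogy_ratio(o, e):
    """o * log(o / e), using the convention 0 * log 0 = 0."""
    return 0.0 if o == 0 else o * math.log(o / e)


def pearson_and_deviance(y, n, p):
    """Pearson X^2 and likelihood-ratio G^2 for grouped binary data.

    y[i]: successes in cell i, n[i]: trials in cell i, p[i]: fitted probability.
    Assumes simple binomial sampling within each cell (no survey design effects).
    """
    x2 = 0.0
    g2 = 0.0
    for yi, ni, pi in zip(y, n, p):
        expected = ni * pi
        x2 += (yi - expected) ** 2 / (ni * pi * (1 - pi))
        g2 += 2 * (xlogy_ratio(yi, expected) + xlogy_ratio(ni - yi, ni * (1 - pi)))
    return x2, g2


x2, g2 = pearson_and_deviance([2, 7], [10, 10], [0.5, 0.7])
print(x2, g2)
```

Under complex sampling, the variance in the $X^2$ denominator is no longer $n_i \hat p_i(1-\hat p_i)$, which is the reason the abstract calls for design-aware diagnostics.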


Goodness-of-fit tests for a proportional odds model

  • Lee, Hyun Yung
    • Journal of the Korean Data and Information Science Society / Vol. 24, No. 6 / pp. 1465-1475 / 2013
  • Chi-square-type test statistics are the most commonly used tests of goodness of fit for multinomial logistic regression models, whose data can be grouped (binomial) or ungrouped (binary) by covariate pattern. For ungrouped data, however, a chi-square-type statistic is not a satisfactory gauge: the ungrouped Pearson chi-square statistic does not closely follow the chi-square distribution, and it is also not a satisfactory measure in itself. Currently, goodness of fit in the ordinal setting is often assessed using the Pearson chi-square statistic and deviance tests. These tests involve creating a contingency table in which the rows consist of all possible cross-classifications of the model covariates and the columns consist of the levels of the ordinal response. I examined goodness-of-fit tests, including a new test, for the proportional odds logistic regression model, the most commonly used regression model for an ordinal response variable. Using a simulation study, I investigated the distribution and power properties of the new test and compared them with those of three other goodness-of-fit tests. The new test had lower power than the existing tests; however, it was able to detect a greater number of the different types of lack of fit considered in this study. I illustrated the ability of the tests to detect lack of fit using a study of aftercare decisions for psychiatrically hospitalized adolescents.
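To make the table construction concrete, here is a minimal sketch assuming the common parameterization $\operatorname{logit} P(Y \le j \mid x) = \theta_j - x^\top\beta$ (sign conventions differ across software). It turns cumulative probabilities into category probabilities for each covariate pattern and computes the Pearson statistic over the patterns-by-levels table described above. The cutpoints, coefficient, counts and function names are made-up illustrations, and the sketch does not reproduce the paper's new test.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def po_category_probs(thetas, eta):
    """Category probabilities under logit P(Y <= j) = theta_j - eta (thetas increasing)."""
    cum = [sigmoid(t - eta) for t in thetas] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def pearson_table(observed, probs):
    """Pearson X^2 for a covariate-pattern (rows) x response-level (columns) table of counts."""
    x2 = 0.0
    for row, pr in zip(observed, probs):
        total = sum(row)
        for o, p in zip(row, pr):
            e = total * p  # expected count for this cell
            x2 += (o - e) ** 2 / e
    return x2


thetas = [-1.0, 0.5]       # two cutpoints -> three ordered levels (hypothetical)
beta = 0.8                 # single covariate effect (hypothetical)
patterns = [0.0, 1.0, 2.0]  # distinct covariate values, one table row each
probs = [po_category_probs(thetas, beta * x) for x in patterns]
observed = [[12, 20, 8], [7, 18, 15], [3, 12, 25]]
print(round(pearson_table(observed, probs), 3))
```

With many covariates, the number of rows grows as the product of the covariate levels, so cells become sparse; that sparsity is what undermines the chi-square approximation discussed in the abstract.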

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games

  • 오윤학;김한;윤재섭;이종석
    • 대한산업공학회지 (Journal of the Korean Institute of Industrial Engineers) / Vol. 40, No. 1 / pp. 8-17 / 2014
  • In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. Historical data containing information about players and teams were obtained from official materials provided on the KBO website. From the collected raw data, we prepared two additional types of dataset, in ratio and binary format respectively. The ratio dataset was generated by dividing the away team's records by those of the corresponding home team, while the binary dataset was obtained by comparing the two teams' record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets × 7 techniques) prediction scenarios, the most accurate model was obtained with the random forest technique on the binary dataset, with a prediction accuracy of 84.14%. Using the ratio and binary datasets also produced better prediction models than using the raw data. From the variable-selection capability of the decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, the pitcher's winning percentage, and walks ("four balls") are important factors in winning a game. This research differs from existing studies in that it uses three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
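The two derived datasets can be sketched as follows. The feature names, the sample values and the tie rule (a tie coded as 0) are hypothetical, since the abstract does not say how ties or zero-valued home records were handled; here a zero denominator yields `None`.

```python
def ratio_record(away, home):
    """Away-team value divided by home-team value, per feature (None if the home value is 0)."""
    return {k: (away[k] / home[k] if home[k] != 0 else None) for k in away}


def binary_record(away, home):
    """1 if the away team's value exceeds the home team's, else 0 (ties -> 0, an assumption)."""
    return {k: int(away[k] > home[k]) for k in away}


# Hypothetical per-game team records.
away = {"salary": 80.0, "earned_runs": 3, "strikeouts": 9, "walks": 2}
home = {"salary": 100.0, "earned_runs": 4, "strikeouts": 6, "walks": 2}
print(ratio_record(away, home))
print(binary_record(away, home))
```

Both transformations put the two teams on a relative scale, which is one plausible reason they outperformed the raw records in the study.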

An Analysis of Environmental Policy Effect on Green Space Change using Logistic Regression Model: The Case of Ulsan Metropolitan City

  • 이성주;류지은;전성우
    • 한국환경복원기술학회지 (Journal of the Korea Society of Environmental Restoration Technology) / Vol. 23, No. 4 / pp. 13-30 / 2020
  • This study aims to analyze the qualitative and quantitative effects of environmental policies on green space management using a logistic regression model (LRM). Landsat satellite images from 1985, 1992, 2000, 2008, and 2015 were classified using a hybrid classification method. Based on these classified maps, a logistic regression model reflecting the past deforestation tendency was built. A binary green space change map was used as the dependent variable, with four explanatory variables: distance from green space, distance from settlements, elevation, and slope. The green space maps of 2008 and 2015 were then predicted using the constructed model, and the conservation effect of Ulsan's environmental policies was quantified by numerically comparing the green area in the predicted and real data. Time-series analysis showed that the restoration and destruction of green space are more closely related to human activities than to natural land transition. The effect of the green space management policy was spatially explicit and brought a significant increase in green space. Quantitatively, Ulsan's environmental policy conserved and restored 111.75 km² and 175.45 km² of green space over the eight- and fifteen-year periods, respectively. Among the four variables, slope was the most decisive factor in the destruction of green space in the city. This study presents the logistic regression model as a way of evaluating the effect of environmental policies practiced in the city. Its significance is that it allows a comprehensive understanding of policy effects by accounting for every direct and indirect effect on green space, including those from other domains such as air and water. We conclude by discussing the practicability of implementing environmental policy for green space management, with a focus on non-statutory plans.
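The quantification step compares the green area in the model-predicted map with that in the observed map. A minimal sketch, assuming binary rasters stored as nested lists (1 = green space) and a 30 m Landsat pixel of 0.0009 km²; the tiny grids and function names are made up, and reading "observed minus predicted" as the policy effect is an illustrative assumption rather than the paper's exact procedure.

```python
PIXEL_KM2 = 0.03 * 0.03  # one 30 m x 30 m pixel in km^2 (assumed resolution)


def green_area_km2(grid):
    """Total green area of a binary raster (1 = green, 0 = other)."""
    return sum(sum(row) for row in grid) * PIXEL_KM2


def policy_effect_km2(observed, predicted):
    """Observed green area minus the area predicted from past trends (assumed definition)."""
    return green_area_km2(observed) - green_area_km2(predicted)


observed = [[1, 1, 0], [1, 1, 1]]   # hypothetical observed map
predicted = [[1, 0, 0], [1, 1, 0]]  # hypothetical model-predicted map
print(policy_effect_km2(observed, predicted))
```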

Variable Selection with Log-Density in Logistic Regression Model

  • 강명욱;신은영
    • Communications for Statistical Applications and Methods / Vol. 19, No. 1 / pp. 1-11 / 2012
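  • In a logistic regression model, the log-density ratio of the conditional probability distributions of the predictors given the response provides useful information for variable selection, that is, for deciding which predictors enter the model and in what form. When the conditional distribution of a predictor is not symmetric, it is appropriate to assume a gamma distribution. Results from various simulations show that when the two distributions $x \mid y = 0$ and $x \mid y = 1$ overlap, both the $x$ term and the $\log(x)$ term are needed, whereas when the two distributions are separated, only one of the $x$ term or the $\log(x)$ term is needed.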
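Why $x$ and $\log(x)$ are the natural candidate terms follows from the gamma density (standard algebra, not quoted from the paper): with $x \mid y=k \sim \mathrm{Gamma}(\alpha_k, \text{rate } \beta_k)$, $\log\{f_1(x)/f_0(x)\} = (\alpha_1-\alpha_0)\log x - (\beta_1-\beta_0)x + c$. The sketch below, with hypothetical function names, returns these coefficients and checks them against the densities directly; it shows which terms can appear, not the paper's simulation finding about overlapping versus separated distributions.

```python
import math


def gamma_log_density_ratio(a0, b0, a1, b1):
    """Coefficients of log f1(x) - log f0(x) for Gamma(shape a, rate b) densities.

    log f(x) = a*log(b) - lgamma(a) + (a - 1)*log(x) - b*x
    """
    const = (a1 * math.log(b1) - math.lgamma(a1)) - (a0 * math.log(b0) - math.lgamma(a0))
    return {"log(x)": a1 - a0, "x": -(b1 - b0), "const": const}


def log_gamma_pdf(x, a, b):
    """Log density of Gamma(shape a, rate b) at x > 0."""
    return a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(x) - b * x


terms = gamma_log_density_ratio(2.0, 1.0, 5.0, 1.5)
x = 3.0
direct = log_gamma_pdf(x, 5.0, 1.5) - log_gamma_pdf(x, 2.0, 1.0)
via_terms = terms["const"] + terms["log(x)"] * math.log(x) + terms["x"] * x
print(direct, via_terms)
```

Equal shapes remove the $\log(x)$ term and equal rates remove the $x$ term; otherwise both enter the logit.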

Factors Affecting on Suicidal Ideation in Public Assistance Recipients

  • 이주현;김민지;이병희;노진원
    • 한국콘텐츠학회논문지 (The Journal of the Korea Contents Association) / Vol. 15, No. 8 / pp. 366-374 / 2015
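  • This study examined which factors affect suicidal ideation among recipients of public assistance. We used data from the 7th wave (2012) of the Korea Welfare Panel Study, conducted by the Korea Institute for Health and Social Affairs and the Social Welfare Research Institute of Seoul National University, and performed binary logistic regression analysis to assess the magnitude of each factor's effect on suicidal ideation. Respondents whose highest education was middle school, who were married, who had higher self-esteem, or who were more satisfied with public assistance were less likely to report suicidal ideation. Respondents with depression and those in middle age were more likely to report it. The study showed that, for the poor, satisfaction with public assistance, not only physical and psychological factors, can affect suicidal ideation. This suggests that recipients' satisfaction with assistance may be an important factor affecting suicidal ideation and is worth measuring.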
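Findings such as "less likely" and "more likely" are usually read off odds ratios from the fitted binary logistic model. A minimal sketch of that conversion, $\mathrm{OR} = e^{\hat\beta}$ with a Wald 95% interval $e^{\hat\beta \pm 1.96\,\mathrm{SE}}$; the function name, coefficient and standard error below are placeholders, not estimates from this study.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logit coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Placeholder values: a negative coefficient means lower odds of the outcome.
or_, lo, hi = odds_ratio_ci(-0.4, 0.15)
print(round(or_, 3), round(lo, 3), round(hi, 3))
```

An interval lying entirely below 1, as here, corresponds to a factor associated with lower odds of suicidal ideation; one entirely above 1 corresponds to higher odds.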