• Title/Abstract/Keywords: binary logistic regression

Search results: 413 items; processing time: 0.025 seconds

A Comparative Study of Predictive Factors for Hypertension using Logistic Regression Analysis and Decision Tree Analysis

  • SoHyun Kim;SungHyoun Cho
    • Physical Therapy Rehabilitation Science
    • /
    • Vol. 12, No. 2
    • /
    • pp.80-91
    • /
    • 2023
  • Objective: The purpose of this study is to identify factors that affect the incidence of hypertension using logistic regression and decision tree analysis, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 9,859 subjects from the Korean health panel annual 2019 data provided by the Korea Institute for Health and Social Affairs and National Health Insurance Service. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In logistic regression analysis, those who were 60 years of age or older (odds ratio, OR=68.801, p<0.001), those who were divorced, widowed, or separated (OR=1.377, p<0.001), those who graduated from middle school or younger (OR=1, reference), those who did not walk at all (OR=1, reference), those who were obese (OR=5.109, p<0.001), and those who had poor subjective health status (OR=2.163, p<0.001) were more likely to develop hypertension. In the decision tree, those over 60 years of age, overweight or obese, and those who graduated from middle school or younger had the highest probability of developing hypertension, at 83.3%. Logistic regression analysis showed a specificity of 85.3% and a sensitivity of 47.9%, while decision tree analysis showed a specificity of 81.9% and a sensitivity of 52.9%. Classification accuracy was 73.6% for logistic regression and 72.6% for decision tree analysis. Conclusions: Both logistic regression and decision tree analysis were adequate to explain the predictive model. Both analysis methods are thought to be useful for constructing a predictive model for hypertension.
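
The sensitivity, specificity, and accuracy figures quoted in this abstract are all derived from a 2x2 confusion matrix. A minimal Python sketch of that arithmetic follows. The counts are made-up illustrative values, not the study's data, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: cases with the condition that were predicted positive/negative.
    tn/fp: cases without the condition that were predicted negative/positive.
    """
    sensitivity = tp / (tp + fn)            # share of true cases detected
    specificity = tn / (tn + fp)            # share of non-cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts chosen only to illustrate the calculation.
sens, spec, acc = classification_metrics(tp=48, fn=52, tn=170, fp=30)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, which is why two models with different sensitivity and specificity trade-offs can still report nearly the same accuracy.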

로지스틱회귀에서 잔차산점도를 이용한 모형평가 (Model assessment with residual plot in logistic regression)

  • 강명욱
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 26, No. 1
    • /
    • pp.141-150
    • /
    • 2015
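  • When assessing or diagnosing a model in logistic regression, hypothesis tests are mainly used, but tests alone can miss a great deal, so graphical methods are needed as a complement. Residual plots are widely used as a tool for graphically assessing model adequacy, but their applicability is limited to linear regression. Marginal model plots offer one way to assess model adequacy, but they too have problems. This paper proposes the chi-residual plot as an alternative to the marginal model plot and examines its usefulness.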

불균형 이분 데이터 분류분석을 위한 데이터마이닝 절차 (A Data Mining Procedure for Unbalanced Binary Classification)

  • 정한나;이정화;전치혁
    • 대한산업공학회지
    • /
    • Vol. 36, No. 1
    • /
    • pp.13-21
    • /
    • 2010
  • Predicting customers' contract cancellations is essential for insurance companies, but it is a difficult problem because the customer database is large and the target (cancelled) customers make up only a small proportion of it. This paper proposes a new data mining approach to binary classification for large-scale unbalanced data. Over-sampling, clustering, regularized logistic regression, and boosting are incorporated in the proposed approach. The proposed approach was applied to a real data set in the area of insurance and the results were compared with some other classification techniques.
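
The abstract names over-sampling as one ingredient but does not say which scheme is used. As an illustration only, the sketch below shows plain random over-sampling of the minority class in Python. This is an assumed stand-in, not the paper's procedure, and `random_oversample` is a hypothetical helper.

```python
import random


def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Nothing to do if there is no minority class or it is not actually smaller.
    if not minority or len(minority) >= len(majority):
        return list(rows), list(labels)
    # Draw minority indices with replacement to fill the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy data: 8 retained customers (0) and 2 cancelled customers (1).
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2
new_rows, new_labels = random_oversample(rows, labels)
print(sum(new_labels), len(new_labels) - sum(new_labels))
```

Over-sampling is normally applied to the training split only, so that evaluation data keep the original class proportions.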

How Do South Korean People View the US and Chinese National Influence?: Is Soft Power Zero-Sum?

  • Zhao, Xiaoyu
    • Asian Journal for Public Opinion Research
    • /
    • Vol. 5, No. 1
    • /
    • pp.15-40
    • /
    • 2017
  • This paper addresses the zero-sum nature of soft power against the backdrop of the rise of China and the relative "decline" of America. It attempts to find out whether the "decline" of America's soft power is caused by the rise of China's soft power, and whether China's rise guarantees the growth of its soft power. In light of the particularity of South Korea, that is, its economy relies on China and its security relies on the US, this paper chooses South Korea as the entry point for the study. Based on the Pew data from a South Korean opinion poll, this paper conducts bivariate correlation and binary logistic regression analyses to explore the existence of zero-sum "competition" between China's and America's soft power.

보행자 다리상해 영향요인 분석 (Analysis of Factors Affecting Pedestrian Leg Injury Severity)

  • 박재홍;오철
    • 한국자동차공학회논문집
    • /
    • Vol. 19, No. 3
    • /
    • pp.9-15
    • /
    • 2011
  • This study analyzed contributing factors affecting leg injury severity in pedestrian-vehicle crashes. A Binary Logistic Regression (BLR) method was used to identify the factors. Independent variables include characteristics of the pedestrian, vehicle, road, and environmental conditions. Leg injury severity, the dependent variable in this study, is classified into two classes: 'severe' and 'minor' injuries. Pedestrian age, collision speed, and the height of the vehicle were identified as significant factors for leg injury. The probabilistic outcome of predicting leg injury severity can be used effectively not only in deriving pedestrian-related safety policies but also in developing advanced vehicular technologies for pedestrian protection.

3차원 잔차산점도를 이용한 로지스틱회귀모형에서 교호작용의 탐색 (Exploring interaction using 3-D residual plots in logistic regression model)

  • 강명욱
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 25, No. 1
    • /
    • pp.177-185
    • /
    • 2014
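  • In a logistic regression model, the explanatory variables alone are sometimes insufficient, and transformed terms such as quadratic or interaction terms are needed. When there are two explanatory variables and their conditional distributions are bivariate normal, quadratic and interaction terms should in principle be included in the logistic regression model. Depending on the variances and correlation coefficients of the conditional distributions, however, the quadratic and interaction terms may not be needed. Information about the variances and correlation coefficients can be roughly judged from a scatterplot, but judging whether an interaction term is needed is not easy. This paper presents a method for exploring interaction using 3-D residual plots and applies it to real data.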

로지스틱 회귀모형에서 이변량 정규분포에 근거한 로그-밀도비 (Log-density Ratio with Two Predictors in a Logistic Regression Model)

  • 강명욱;윤재은
    • 응용통계연구
    • /
    • Vol. 26, No. 1
    • /
    • pp.141-149
    • /
    • 2013
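  • In a logistic regression model, if the conditional distributions of the two explanatory variables can both be regarded as bivariate normal, the log-density ratio, expressed as a function of the explanatory variables, indicates which terms should be included in the model. When the two bivariate normal distributions have the same variance-covariance matrix, first-order terms alone suffice, with no quadratic or cross-product terms. If both correlation coefficients are 0, the cross-product term is not needed regardless of the variances of the explanatory variables. The paper also examines other conditions under which the log-density ratio shows that quadratic and cross-product terms are not needed.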
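
The two special cases stated in this abstract can be checked with a short derivation. The notation below is an assumption chosen for illustration and is not taken from the paper: $x=(x_1,x_2)^\top$, $x \mid y=j \sim N_2(\mu_j,\Sigma_j)$ with density $f_j$, and prior class probabilities $\pi_j$, for $j=0,1$.

```latex
\begin{align*}
\operatorname{logit} P(y=1 \mid x)
  &= \log\frac{\pi_1}{\pi_0} + \log\frac{f_1(x)}{f_0(x)}, \\
\log\frac{f_1(x)}{f_0(x)}
  &= c + \left(\Sigma_1^{-1}\mu_1 - \Sigma_0^{-1}\mu_0\right)^{\top} x
     - \tfrac{1}{2}\, x^{\top}\!\left(\Sigma_1^{-1} - \Sigma_0^{-1}\right) x, \\
c &= -\tfrac{1}{2}\log\frac{|\Sigma_1|}{|\Sigma_0|}
     - \tfrac{1}{2}\,\mu_1^{\top}\Sigma_1^{-1}\mu_1
     + \tfrac{1}{2}\,\mu_0^{\top}\Sigma_0^{-1}\mu_0 .
\end{align*}
% Coefficient of the cross-product term x_1 x_2, where \sigma_{jk} is the
% standard deviation of x_k in class j and \rho_j the within-class correlation:
\[
\frac{\rho_1}{(1-\rho_1^{2})\,\sigma_{11}\sigma_{12}}
 - \frac{\rho_0}{(1-\rho_0^{2})\,\sigma_{01}\sigma_{02}} .
\]
```

If $\Sigma_1=\Sigma_0$, the quadratic form vanishes and only first-order terms remain. If $\rho_1=\rho_0=0$, the cross-product coefficient is zero whatever the variances, although the squared terms can remain when the variances differ.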

On a Bayes Criterion for the Goodness-of-Link Test for Binary Response Regression Models : Probit Link versus Logit Link

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society
    • /
    • Vol. 26, No. 2
    • /
    • pp.261-276
    • /
    • 1997
  • In the context of binary response regression, the problem of constructing a Bayesian goodness-of-link test for testing the logit link versus the probit link is considered. Based upon the well-known facts that the cdf of a logistic variate $\approx$ the cdf of $t_{8}/.634$ and that, as $\nu \to \infty$, the cdf of $t_{\nu}$ approaches that of $N(0,1)$, a Bayes factor is derived as a test criterion. A synthesis of the Gibbs sampling and a marginal likelihood estimation scheme is also proposed to compute the Bayes factor. Performance of the test is investigated via a Monte Carlo study. The new test is also illustrated with an empirical data example.
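
The approximation cited in this abstract can be checked numerically: if a logistic variate behaves like $t_{8}/.634$, then the standard logistic cdf at $x$ should be close to the $t_{8}$ cdf at $.634x$. The sketch below uses only the Python standard library. The Simpson's-rule integrator, the grid, and the function names are illustrative choices, not anything taken from the paper.

```python
import math


def logistic_cdf(x):
    """Standard logistic cdf."""
    return 1.0 / (1.0 + math.exp(-x))


def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, n=2000):
    """Student's t cdf via composite Simpson's rule on [0, |x|] plus symmetry (n must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / n
    s = t_pdf(0.0, nu) + t_pdf(a, nu)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    area = s * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


# Largest gap between the logistic cdf and the rescaled t_8 cdf on a grid over [-8, 8].
grid = [i / 10 for i in range(-80, 81)]
max_gap = max(abs(logistic_cdf(x) - t_cdf(0.634 * x, 8)) for x in grid)
print(f"max |logistic cdf - t_8 cdf| on grid: {max_gap:.4f}")
```

The scale factor is also consistent with a variance comparison: the logistic variance is $\pi^2/3 \approx 3.29$, and the variance of $t_{8}/.634$ is $(8/6)/.634^2 \approx 3.32$.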

로지스틱 회귀분석을 통한 암반사면의 안정성 평가법 제안 (A Proposal of the Evaluation Method for Rock Slope Stability Using Logistic Regression Analysis)

  • 이용희;김종열
    • 터널과지하공간
    • /
    • Vol. 14, No. 2
    • /
    • pp.133-141
    • /
    • 2004
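  • Many researchers have proposed evaluation methods for assessing rock slope stability through field investigation. However, existing methods select evaluation items and assign weights according to the proposer's subjective judgment, so stability assessments differ from method to method. To secure objectivity in the weights for each evaluation item, this study performed logistic regression analysis and proposed a stability evaluation method.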

합류하는 두 항공기간 도착순서 결정에 대한 로지스틱회귀 예측 모형 (Prediction Model with a Logistic Regression of Sequencing Two Arrival Flows)

  • 정소연;이금진
    • 한국항공운항학회지
    • /
    • Vol. 23, No. 4
    • /
    • pp.42-48
    • /
    • 2015
  • This paper aims to construct a prediction model of the arrival sequencing strategy that reflects the actual sequencing patterns of air traffic controllers. As a first step, we analyzed the pair-wise sequencing of two aircraft entering the TMA from different entry points. Based on historical trajectory data, several traffic factors such as time, speed, and traffic density were examined for the model. Using the statistically significant factors, we constructed a prediction model of arrival sequencing through binary logistic regression analysis. With the estimated coefficients, the performance of the model was evaluated through cross-validation.