• Title/Abstract/Keywords: logistic regression models

652 search results (processing time: 0.028 s)

An Application of Support Vector Machines to Personal Credit Scoring: Focusing on Financial Institutions in China

  • 딩쉬엔저;이영찬
    • 산업융합연구 / Vol. 16, No. 4 / pp. 33-46 / 2018
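  • Personal credit scoring is an effective tool that helps banks make profitable decisions when approving loans. Recently, many classification algorithms and models have been applied to personal credit scoring. Personal credit scoring techniques are broadly divided into statistical and non-statistical methods. Statistical methods include linear regression, discriminant analysis, logistic regression, and decision trees. Non-statistical methods include linear programming, neural networks, genetic algorithms, and Support Vector Machines. However, it is difficult to reach a consistent conclusion about which method is best for developing a credit scoring model. This paper uses personal credit data from a Chinese financial institution to compare the performance of the most representative credit scoring techniques: logistic regression, neural networks, and Support Vector Machines. Specifically, the three models were each built to classify customers, and the results were compared. The results show that Support Vector Machines outperformed both logistic regression and neural networks.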

Designing Neural Network Using Genetic Algorithm

  • 박정선
    • 한국정보처리학회논문지 / Vol. 4, No. 9 / pp. 2309-2314 / 1997
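  • In this study, a neural network is used to predict the bankruptcy of insurance companies, and a genetic algorithm is used to optimize it. The genetic algorithm suggests the optimal network structure and parameters. The neural network designed by the genetic algorithm is compared with discriminant analysis, logistic regression, ID3, and CART on bankruptcy prediction, and it shows the best performance.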

A Comparative Experiment of Software Defect Prediction Models using Object Oriented Metrics

  • 김윤규;김태연;채흥석
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 15, No. 8 / pp. 596-600 / 2009
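  • Many defect prediction models based on object-oriented metrics have been proposed to support efficient software management through verification and validation. The proposed models were mainly developed using logistic regression, and their reported defect prediction accuracy was 60-70%. To check the effectiveness of the existing defect prediction models, this paper ran an experiment on Eclipse 3.3 using methods similar to those used to develop the models. In the experiment, the accuracy of the models was about 40%, much lower than the claimed predictive power. In addition, simple logistic regression showed higher predictive power than multiple logistic regression.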

Developing a Pedestrian Satisfaction Prediction Model Based on Machine Learning Algorithms

  • 이제승;이현희
    • 국토계획 / Vol. 54, No. 3 / pp. 106-118 / 2019
  • In order to develop a pedestrian navigation service that provides optimal pedestrian routes based on pedestrian satisfaction levels, a prediction model is required that can estimate a pedestrian's satisfaction level under given conditions. Thus, the aim of the present study is to develop a pedestrian satisfaction prediction model based on three machine learning algorithms: Logistic Regression, Random Forest, and Artificial Neural Network. The 2009, 2012, 2013, 2014, and 2015 Pedestrian Satisfaction Survey Data in Seoul, Korea are used to train and test the models. The Random Forest model shows the best prediction performance among the three (Accuracy: 0.798, Recall: 0.906, Precision: 0.842, F1 Score: 0.873, AUC: 0.795). The Artificial Neural Network ranks second (Accuracy: 0.773, Recall: 0.917, Precision: 0.811, F1 Score: 0.868, AUC: 0.738), and the Logistic Regression model ranks third (Accuracy: 0.764, Recall: 1.000, Precision: 0.764, F1 Score: 0.868, AUC: 0.575). The precision score of the Random Forest model implies that approximately 84.2% of pedestrians may be satisfied if they walk in the areas suggested by the Random Forest model.
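
Assuming the standard definition of the F1 score as the harmonic mean of precision and recall, the Random Forest figure in the abstract above can be recomputed from its reported precision and recall. This is a minimal sketch; the function name `f1_score` is our own and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the Random Forest model.
rf_f1 = f1_score(precision=0.842, recall=0.906)
print(round(rf_f1, 3))  # 0.873
```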

A Study on a car Insurance purchase Prediction Using Two-Class Logistic Regression and Two-Class Boosted Decision Tree

  • AN, Su Hyun;YEO, Seong Hee;KANG, Minsoo
    • 한국인공지능학회지 / Vol. 9, No. 1 / pp. 9-14 / 2021
  • This paper develops a model that predicts whether customers will purchase car insurance, based on primary health insurance customer data. Automobiles are now widely used for land transportation and daily life, and their scope of use and equipment is expanding. This rapid increase in automobiles has made automobile insurance an essential business target for insurance companies. If car insurance sales can be predicted and made using information on existing health insurance customers, this can generate continuous profits for the insurance company. This paper therefore aims to analyze existing customer characteristics and implement a predictive model to target advertisements at customers interested in auto insurance. The goal of the study is to maximize the profits of insurance companies by devising communication strategies that optimize business models and profits. The study was conducted in Microsoft Azure, and an automobile insurance purchase prediction model was implemented using the Health Insurance Cross-sell Prediction data. Two-Class Logistic Regression and Two-Class Boosted Decision Tree were both applied so that the two models' predictions could be compared. According to the results, at a threshold of 0.3 the AUC is 0.837 and the accuracy is 0.833, indicating high accuracy. The results therefore suggest that customers with health insurance may respond positively to auto insurance purchases.
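
The abstract above reports results at a threshold of 0.3. As a minimal sketch of what a decision threshold does to a binary classifier's predicted probabilities, the example below labels a case positive when its probability is at least the threshold and then measures accuracy. The probabilities and labels are made-up illustrative values, not data from the study, and `classify` and `accuracy` are hypothetical helper names.

```python
def classify(probs, threshold=0.3):
    """Label a case positive (1) when its predicted probability >= threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative values only (assumed, not from the paper).
probs = [0.05, 0.20, 0.35, 0.60, 0.10, 0.80]
labels = [0, 0, 1, 1, 0, 1]

preds_03 = classify(probs, threshold=0.3)  # lower threshold: more positives
preds_07 = classify(probs, threshold=0.7)  # higher threshold: fewer positives
print(accuracy(labels, preds_03), accuracy(labels, preds_07))
```

Lowering the threshold trades fewer missed positives for more false alarms, which is why accuracy is usually reported together with the threshold that produced it.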

Predicting Suicidal Ideation in College Students with Mental Health Screening Questionnaires

  • Shim, Geumsook;Jeong, Bumseok
    • Psychiatry investigation / Vol. 15, No. 11 / pp. 1037-1045 / 2018
  • Objective: The present study aimed to identify risk factors for future SI and to predict individual-level risk for future or persistent SI among college students. Methods: Mental health check-up data collected over 3 years were retrospectively analyzed. Students were categorized as suicidal ideators and non-ideators at baseline. Logistic regression analyses were performed separately for each group, and the predicted probability for each student was calculated. Results: Students likely to exhibit future SI had higher levels of mental health problems, including depression and anxiety, and significant risk factors for future SI included depression, current SI, social phobia, alcohol problems, being female, low self-esteem, and number of close relationships and concerns. Logistic regression models that included current suicide ideators revealed acceptable area under the curve (AUC) values (0.7-0.8) in both the receiver operating characteristic (ROC) and precision recall (PR) curves for predicting future SI. Predictive models with current suicide non-ideators revealed an acceptable level of AUCs only for ROC curves. Conclusion: Several factors such as low self-esteem and a focus on short-term rather than long-term outcomes may enhance the prediction of future SI. Because a certain range of SI clearly necessitates clinical attention, further studies differentiating significant from other types of SI are necessary.
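
The ROC AUC values cited above can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes AUC that way by direct pairwise comparison. The labels and scores are invented for illustration, and `roc_auc` is a hypothetical helper, not the procedure used in the study.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only (assumed, not from the paper).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values of 0.7 to 0.8 sit between the two.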

Prediction Models of Mild Cognitive Impairment Using the Korea Longitudinal Study of Ageing

  • 박효진;하주영
    • 대한간호학회지 / Vol. 50, No. 2 / pp. 191-199 / 2020
  • Purpose: The purpose of this study was to compare sociodemographic characteristics of a normal cognitive group and a mild cognitive impairment group, and to establish prediction models of Mild Cognitive Impairment (MCI). Methods: This study was a secondary data analysis using data from "the 4th Korea Longitudinal Study of Ageing" of the Korea Employment Information Service. A total of 6,405 individuals, including 1,329 individuals with MCI and 5,076 individuals with normal cognitive abilities, were part of the study. Based on the panel survey items, the research used 28 variables. The methods of analysis included a χ2-test, logistic regression analysis, decision tree analysis, predicted error rate, and an ROC curve calculated using SPSS 23.0 and SAS 13.2. Results: In the MCI group, the mean age was 71.4 and 65.8% of the participants were women. There were statistically significant differences in gender, age, and education between the two groups. Predictors of MCI determined by logistic regression analysis were gender, age, education, instrumental activity of daily living (IADL), perceived health status, participation group, cultural activities, and life satisfaction. Decision tree analysis identified education, age, life satisfaction, and IADL as predictors of MCI. Conclusion: The accuracy of the logistic regression model for MCI is slightly higher than that of the decision tree model. The prediction model for MCI established in this study may be used to identify middle-aged and elderly people at risk of MCI. Therefore, this study may contribute to the prevention and reduction of dementia.

Graphical regression and model assessment in logistic model

  • 강명욱;김부용;홍주희
    • Journal of the Korean Data and Information Science Society / Vol. 21, No. 1 / pp. 21-32 / 2010
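  • Graphical regression is an analysis method that finds a sufficient summary plot containing all the regression information without making assumptions about the model; its purpose is to represent all the regression information in a low-dimensional plot. Because model assessment using residual plots is limited to linear regression models, in generalized linear models the adequacy of the model is assessed instead using marginal model plots. This paper examines graphical regression and model assessment using marginal model plots in the logistic model, a generalized linear model with a binary response variable.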

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • 응용통계연구 / Vol. 24, No. 4 / pp. 587-595 / 2011
  • Credit scoring is an objective and automatic system to assess the credit risk of each customer. The logistic regression model is one of the popular methods of credit scoring to predict the default probability; however, it may not detect possible nonlinear features of predictors despite the advantages of interpretability and low computation cost. In this paper, we propose to use a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble technologies such as bagging, boosting and random forests. We compare these methods via a simulation study and illustrate them through a German credit dataset.
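
In a plain logistic regression model, the log-odds of default are a linear function of the predictors, which is why the model can miss nonlinear effects such as those the abstract above refers to. The sketch below shows that scoring step only. The coefficients, intercept, and feature values are hypothetical, and `default_probability` is an illustrative helper rather than anything defined in the paper.

```python
import math


def default_probability(features, coefficients, intercept):
    """Logistic model: P(default) = 1 / (1 + exp(-(b0 + sum(b_j * x_j))))."""
    # The log-odds are linear in the features; no nonlinear terms are modeled.
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for two illustrative predictors (assumed values).
coefficients = [0.8, -0.5]
intercept = -1.0

p = default_probability([2.0, 1.0], coefficients, intercept)
print(round(p, 3))
```

A generalized partially linear additive model, as proposed in the paper, would replace some of the linear terms `b_j * x_j` with smooth functions of `x_j`.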

Data Mining for Knowledge Management in a Health Insurance Domain

  • Chae, Young-Moon;Ho, Seung-Hee;Cho, Kyoung-Won;Lee, Dong-Ha;Ji, Sun-Ha
    • 지능정보연구 / Vol. 6, No. 1 / pp. 73-82 / 2000
  • This study examined the characteristics of knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management using the Korea Medical Insurance Corporation database. Specifically, this study validated the predictive power of data mining algorithms by comparing the performance of logistic regression and two decision tree algorithms, CHAID (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5), since logistic regression has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes based on the specific characteristics of each individual case. This comparison was performed using the test set of 4,588 beneficiaries and the training set of 13,689 beneficiaries that were used to develop the models. Contrary to the previous study, the CHAID algorithm performed better than logistic regression in predicting hypertension, but C5.0 had the lowest predictive power. In addition, the CHAID algorithm and association rules also provided the segment characteristics for the risk factors that may be used in developing hypertension management programs. This showed that the data mining approach can be a useful analytic tool for predicting and classifying health outcomes data.
