• Title/Summary/Keyword: 설명모형 (explanatory model)

Domain Knowledge Incorporated Local Rule-based Explanation for ML-based Bankruptcy Prediction Model (머신러닝 기반 부도예측모형에서 로컬영역의 도메인 지식 통합 규칙 기반 설명 방법)

  • Soo Hyun Cho;Kyung-shik Shin
    • Information Systems Review
    • /
    • v.24 no.1
    • /
    • pp.105-123
    • /
    • 2022
  • Thanks to the remarkable success of artificial intelligence (AI) techniques, new possibilities for applying them to real-world problems have emerged. One prominent application is bankruptcy prediction, since bankruptcy prediction models often serve as a basic knowledge base for credit scoring models in the financial industry. As a result, there has been extensive research on improving the prediction accuracy of such models. However, despite their impressive performance, machine learning (ML)-based models are difficult to deploy because of their intrinsic opacity, especially in fields that require or value an explanation of the results a model produces. Finance is one such domain, where explanations matter to stakeholders such as domain experts and customers. In this paper, we propose a novel approach that incorporates financial domain knowledge into local rule generation to explain a bankruptcy prediction model at the instance level. The results show that the proposed method successfully selects and classifies the extracted rules according to their feasibility and the information they convey to users.
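
The abstract does not spell out how extracted rules are checked against domain knowledge. One plausible reading, sketched below under explicit assumptions, is that each local rule is a conjunction of threshold conditions on financial ratios and that domain knowledge is encoded as the known risk direction of each ratio. The feature names, the `RISK_DIRECTION` table and the verdict labels are hypothetical; rule extraction itself (for example from a local surrogate model) is assumed to have already happened.

```python
from dataclasses import dataclass

# Hypothetical domain knowledge: +1 = a higher value raises bankruptcy risk,
# -1 = a higher value lowers it. Feature names are illustrative only.
RISK_DIRECTION = {"debt_ratio": +1, "current_ratio": -1, "roa": -1}

@dataclass(frozen=True)
class Condition:
    feature: str
    op: str          # "<=" or ">"
    threshold: float

def condition_holds(cond, instance):
    value = instance[cond.feature]
    return value <= cond.threshold if cond.op == "<=" else value > cond.threshold

def agrees_with_domain(cond, predicted_bankrupt):
    """True/False if the condition's risk direction matches/contradicts the prediction, None if unknown."""
    direction = RISK_DIRECTION.get(cond.feature)
    if direction is None:
        return None
    # "feature > t" selects high values; that pushes risk up only if direction is +1.
    pushes_risk_up = (cond.op == ">") == (direction > 0)
    return pushes_risk_up == predicted_bankrupt

def classify_rule(rule, instance, predicted_bankrupt):
    if not all(condition_holds(c, instance) for c in rule):
        return "not applicable"
    verdicts = [agrees_with_domain(c, predicted_bankrupt) for c in rule]
    if any(v is False for v in verdicts):
        return "conflicts with domain knowledge"
    if all(v is True for v in verdicts):
        return "consistent"
    return "partially supported"

instance = {"debt_ratio": 0.85, "current_ratio": 0.6, "roa": 0.12}
rules = [
    (Condition("debt_ratio", ">", 0.7), Condition("current_ratio", "<=", 1.0)),
    (Condition("debt_ratio", ">", 0.7), Condition("roa", ">", 0.1)),
    (Condition("debt_ratio", "<=", 0.5),),
]
for rule in rules:
    print(classify_rule(rule, instance, predicted_bankrupt=True))
# consistent
# conflicts with domain knowledge
# not applicable
```

The second rule is discarded as infeasible because a high return on assets is not a credible reason for a bankruptcy prediction, which is the kind of filtering the abstract describes.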

A Study on the Modal Split Model Using Zonal Data (존 데이터 기반 수단분담모형에 관한 연구)

  • Ryu, Si-Kyun;Rho, Jeong-Hyun;Kim, Ji-Eun
    • Journal of Korean Society of Transportation
    • /
    • v.30 no.1
    • /
    • pp.113-123
    • /
    • 2012
  • This study introduces a new type of modal split model that uses zonal data, instead of cost data, as independent variables. Models based on cost data have been shown to suffer from multicollinearity between the travel-time and cost variables, and their independent variables are themselves hard to predict. The zonal data employed in this study include (1) socioeconomic data, (2) land use data and (3) transportation system data. The test results show that the proposed modal split model using zonal data outperforms the cost-based model.
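
The abstract does not name the model's functional form. Modal split models are commonly specified as multinomial logit models, so the sketch below assumes that form, with each mode's utility built linearly from zonal attributes rather than travel time and cost. The attribute names and coefficients are made up for illustration.

```python
import math

def zonal_utility(coefs, zone):
    """Linear utility: alternative-specific constant plus weighted zonal attributes."""
    return coefs["asc"] + sum(w * zone[name] for name, w in coefs.items() if name != "asc")

def mode_shares(utilities):
    """Multinomial logit shares P(m) = exp(V_m) / sum_k exp(V_k)."""
    v_max = max(utilities.values())  # shift by the max for numerical stability
    exp_v = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(exp_v.values())
    return {mode: e / total for mode, e in exp_v.items()}

# Hypothetical zone attributes (socioeconomic, land use, transport system) and coefficients.
zone = {"car_ownership": 0.6, "mixed_land_use": 0.4, "station_density": 1.2}
coefs = {
    "car": {"asc": 0.0, "car_ownership": 2.0},
    "transit": {"asc": -0.5, "station_density": 0.8, "mixed_land_use": 0.5},
}
shares = mode_shares({mode: zonal_utility(c, zone) for mode, c in coefs.items()})
print(shares)
```

Because every input is a zonal attribute, forecasting future shares only requires projections of the zone data, which is the advantage over cost variables claimed above.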

Korean Stock Price Index and Macroeconomic Forces (우리나라 증권시장과 거시경제변수 : ANN와 VECM의 설명력 비교)

  • Jung, Sung-Chang;Lee, Timothy H.
    • The Korean Journal of Financial Management
    • /
    • v.19 no.2
    • /
    • pp.211-231
    • /
    • 2002
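  • The purpose of this study is to compare the explanatory power of a VECM (Vector Error Correction Model) and artificial neural networks (ANN) with respect to the long-run relationship between the Korean stock market and macroeconomic variables. Whereas the VECM is a linear dynamic model grounded in APT (Arbitrage Pricing Theory), the ANN is a nonparametric nonlinear model, so directly comparing the results of the two methodologies is a meaningful exercise. Prior studies that rely mainly on ANNs suggest that, because of the market's idiosyncratic patterns, stock market movements may be better explained and predicted by ANNs than by linear econometric models. Accordingly, in the VECM analysis we test the data for stationarity, identify the cointegrating vectors, and then empirically analyze the long-run equilibrium relationship. For the ANN models, we use methods such as GRNN (General Regression Neural Net) and back-propagation, employing the delta rule and the sigmoid function. The back-propagation model showed greater explanatory power than all other models. These results suggest that ANNs may offer greater explanatory power than dynamic linear models.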
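
GRNN, one of the neural-network models mentioned above, is in its textbook form a kernel-weighted average of the training targets and needs no iterative training; the delta rule and sigmoid activations are usually associated with back-propagation networks. The sketch below shows only that textbook GRNN prediction step. The smoothing width `sigma` and the input vectors (which would hold the macroeconomic variables) are assumptions, not the study's settings.

```python
import math

def grnn_predict(x_train, y_train, x_new, sigma=1.0):
    """GRNN prediction: Gaussian-kernel weighted average of training targets.

    Each training pattern acts as a hidden unit whose activation decays with the
    squared Euclidean distance to x_new; sigma controls the smoothing.
    """
    weights = []
    for xi in x_train:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x_new))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * y for w, y in zip(weights, y_train)) / total

x_train = [(0.0,), (2.0,)]
y_train = [0.0, 10.0]
print(grnn_predict(x_train, y_train, (1.0,)))  # approximately 5.0 (equal kernel weights)
```

A small `sigma` makes the network interpolate the nearest training pattern, while a large one pulls every prediction toward the overall mean.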

Subset Selection in the Poisson Models - A Normal Predictors case - (포아송 모형에서의 설명변수 선택문제 - 정규분포 설명변수하에서 -)

  • 박종선
    • The Korean Journal of Applied Statistics
    • /
    • v.11 no.2
    • /
    • pp.247-255
    • /
    • 1998
  • This paper considers a new subset selection problem in the Poisson model with normally distributed predictors. It turns out that the subset model has a larger variance than the Poisson model with random predictors, and this result is used to derive a new subset selection method similar to Mallows' $C_p$.
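
The abstract does not state the Poisson-model criterion itself, so it is not reproduced here. For reference, the classical linear-regression Mallows' $C_p$ that the new method resembles is $C_p = \mathrm{SSE}_p/\hat\sigma^2 - n + 2p$, where $p$ counts the subset's parameters and $\hat\sigma^2$ comes from the full model. A minimal sketch with made-up candidate subsets:

```python
def mallows_cp(sse_p, sigma2_full, n, p):
    """Classical Mallows' C_p for a linear-regression subset with p parameters.

    sse_p: residual sum of squares of the subset model
    sigma2_full: error-variance estimate from the full model
    """
    return sse_p / sigma2_full - n + 2 * p

def best_subset(candidates, sigma2_full, n):
    """Pick the subset with the smallest C_p; candidates maps name -> (SSE, p)."""
    return min(candidates,
               key=lambda name: mallows_cp(candidates[name][0], sigma2_full, n, candidates[name][1]))

# Illustrative numbers only: n = 30 observations, full-model variance estimate 2.0.
candidates = {"x1": (80.0, 2), "x1,x2": (50.0, 3), "x1,x2,x3": (48.0, 4)}
print(best_subset(candidates, sigma2_full=2.0, n=30))  # x1,x2
```

A subset with little bias has $C_p \approx p$, so the criterion trades residual error against the number of parameters.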

Domain Knowledge Incorporated Counterfactual Example-Based Explanation for Bankruptcy Prediction Model (부도예측모형에서 도메인 지식을 통합한 반사실적 예시 기반 설명력 증진 방법)

  • Cho, Soo Hyun;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.307-332
    • /
    • 2022
  • One of the most intensively studied research areas in business applications is the bankruptcy prediction model, a representative classification problem related to loan lending, investment decision making, and the profitability of financial institutions. Many studies have demonstrated outstanding performance for bankruptcy prediction models built with artificial intelligence techniques. However, since most machine learning algorithms are "black boxes," explaining AI models to users has become a prominent research topic. Although there are many different approaches to explanation, this study focuses on explaining a bankruptcy prediction model with counterfactual examples. A counterfactual explanation provides an alternative case that shows users how to obtain a desired output from the model. This study introduces a counterfactual generation technique based on a genetic algorithm (GA) that leverages both domain knowledge (i.e., causal feasibility) and feature importance from the black-box model, along with other critical counterfactual criteria including proximity, distribution, and sparsity. The proposed method was evaluated quantitatively and qualitatively to measure its quality and validity.
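
The abstract lists the ingredients but not the exact fitness function or GA operators, so the following is a minimal sketch under stated assumptions. A toy linear scorer stands in for the black-box model, causal feasibility is modeled as a hypothetical set of immutable features, feature importance biases which feature is mutated, and clipping to observed bounds is a crude stand-in for the distribution criterion. All weights, bounds and GA settings are illustrative.

```python
import random

# Hypothetical stand-ins: the abstract does not specify the model, features, or GA settings.
IMMUTABLE = {2}  # e.g. feature 2 = firm age, treated as causally infeasible to change

def predict(x):
    """Toy black-box classifier (1 = bankrupt); weights are made up for illustration."""
    score = 3.0 * x[0] - 2.0 * x[1] + 0.5 * x[2] - 1.0
    return 1 if score > 0 else 0

def fitness(candidate, original, target):
    """Lower is better: invalidity penalty + L1 proximity + sparsity (number of changed features)."""
    invalid = 0.0 if predict(candidate) == target else 10.0
    proximity = sum(abs(c - o) for c, o in zip(candidate, original))
    sparsity = sum(1 for c, o in zip(candidate, original) if c != o)
    return invalid + proximity + 0.1 * sparsity

def mutate(x, importance, bounds, rng, scale=0.3):
    """Perturb one mutable feature, chosen with probability proportional to its importance."""
    child = list(x)
    mutable = [i for i in range(len(x)) if i not in IMMUTABLE]
    i = rng.choices(mutable, weights=[importance[j] for j in mutable])[0]
    lo, hi = bounds[i]
    child[i] = min(hi, max(lo, child[i] + rng.gauss(0.0, scale)))  # clip to observed range
    return child

def crossover(a, b, rng):
    """Uniform crossover: each gene is taken from either parent with equal probability."""
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]

def genetic_counterfactual(original, target, importance, bounds,
                           pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [mutate(original, importance, bounds, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, original, target))
        elite = pop[: pop_size // 4]  # keep the best quarter unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), importance, bounds, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return min(pop, key=lambda c: fitness(c, original, target))

original = [0.6, 0.2, 1.0]  # classified as bankrupt by the toy model
bounds = [(0.0, 1.0), (0.0, 1.0), (0.0, 5.0)]
importance = [0.5, 0.3, 0.2]
cf = genetic_counterfactual(original, target=0, importance=importance, bounds=bounds)
print(cf, predict(cf))
```

Because mutation never touches immutable features and crossover only recombines genes from parents that share the original value, every candidate respects the feasibility constraint by construction rather than through a penalty.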

Effects of Multicollinearity in Logit Model (로짓모형에 있어서 다중공선성의 영향에 관한 연구)

  • Ryu, Si-Kyun
    • Journal of Korean Society of Transportation
    • /
    • v.26 no.1
    • /
    • pp.113-126
    • /
    • 2008
  • This research explores the effects of multicollinearity on the reliability and goodness of fit of the logit model. To investigate these effects on the multinomial logit model, numerical experiments are performed: explanatory variables (attributes of the utility functions) with correlations ranging from ρ = 0.0 to ρ = 0.9 are generated, and the rho-square (ρ²) and t-statistics, which index the goodness of fit and the reliability of the logit model respectively, are traced. The well-designed numerical experiments yield the following findings: 1) when a new explanatory variable is added, some ρ² values increase while others decrease; 2) higher correlation between generic variables worsens the goodness of fit of the logit model; 3) multicollinearity tends to produce overestimated parameters; 4) the reliability of the estimated parameters tends to decrease when the correlations between attributes are high. These results suggest that, when developing a logit model, one should check for multicollinearity and apply appropriate treatments to reduce it.
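
The experimental setup above can be mimicked with a short sketch: two standard normal attributes with a chosen population correlation ρ are generated via $x_2 = \rho x_1 + \sqrt{1-\rho^2}\,z$, and McFadden's rho-square, $\rho^2 = 1 - LL(\hat\beta)/LL(0)$, the usual goodness-of-fit index for logit models, is computed from log-likelihoods. Despite the shared symbol, the correlation ρ and the fit index ρ² are unrelated quantities. Sample sizes, seeds and log-likelihood values are illustrative assumptions, and fitting the logit model itself is omitted.

```python
import math
import random

def correlated_pair(n, rho, seed=0):
    """Draw n pairs of standard normals with population correlation rho.

    x2 = rho * x1 + sqrt(1 - rho^2) * z, with x1 and z independent N(0, 1).
    """
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        a, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1.append(a)
        x2.append(rho * a + math.sqrt(1.0 - rho ** 2) * z)
    return x1, x2

def pearson(u, v):
    """Sample Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def rho_squared(ll_model, ll_null):
    """McFadden's rho-square from the fitted and null (all-zero) log-likelihoods."""
    return 1.0 - ll_model / ll_null

for rho in (0.0, 0.3, 0.6, 0.9):
    x1, x2 = correlated_pair(5000, rho)
    print(rho, round(pearson(x1, x2), 3))
```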

Variable Selection in PLS Regression with Penalty Function (벌점함수를 이용한 부분최소제곱 회귀모형에서의 변수선택)

  • Park, Chong-Sun;Moon, Guy-Jong
    • Communications for Statistical Applications and Methods
    • /
    • v.15 no.4
    • /
    • pp.633-642
    • /
    • 2008
  • A variable selection algorithm for partial least squares (PLS) regression using a penalty function is proposed. We use the fact that the usual PLS regression problem can be expressed as a maximization problem with appropriate constraints, and we add a penalty function to this maximization problem. A simulated annealing algorithm can then be used to search for optimal solutions of the penalized problem. The HARD penalty function is suggested as the best in several respects. Illustrations with real and simulated examples are provided.
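
The abstract does not give the exact constrained formulation. A common way to state the first PLS direction is to maximize $(w^\top X^\top y)^2$ subject to $\lVert w\rVert = 1$; the sketch below assumes that form, subtracts the HARD thresholding penalty $p_\lambda(|w_j|) = \lambda^2 - (|w_j|-\lambda)^2\,I(|w_j|<\lambda)$, and searches with a basic simulated annealing loop. The vector `c` (standing in for $X^\top y$ from centred data), λ, the proposal scale and the linear cooling schedule are all illustrative assumptions.

```python
import math
import random

def hard_penalty(theta, lam):
    """HARD thresholding penalty: lam^2 - (|theta| - lam)^2 if |theta| < lam, else lam^2."""
    t = abs(theta)
    return lam ** 2 - (t - lam) ** 2 if t < lam else lam ** 2

def objective(w, c, lam):
    """Assumed penalized PLS criterion: (w . c)^2 minus the summed HARD penalty."""
    return sum(a * b for a, b in zip(w, c)) ** 2 - sum(hard_penalty(wj, lam) for wj in w)

def normalize(w):
    s = math.sqrt(sum(v * v for v in w))
    return [v / s for v in w]

def anneal(c, lam, steps=5000, temp0=1.0, seed=0):
    """Maximize objective() over unit vectors by simulated annealing."""
    rng = random.Random(seed)
    w = normalize([1.0] * len(c))
    f = objective(w, c, lam)
    best_w, best_f = w, f
    for k in range(steps):
        temp = temp0 * (1.0 - k / steps) + 1e-9  # linear cooling schedule
        cand = normalize([v + rng.gauss(0.0, 0.1) for v in w])
        fc = objective(cand, c, lam)
        # Always accept uphill moves; accept downhill ones with Boltzmann probability.
        if fc >= f or rng.random() < math.exp((fc - f) / temp):
            w, f = cand, fc
            if f > best_f:
                best_w, best_f = w, f
    return best_w, best_f

w, f = anneal([5.0, 4.0, 0.1], lam=0.3, steps=2000)
print([round(v, 3) for v in w], round(f, 3))
```

Weights near zero incur almost no penalty while any weight above λ pays the full λ², so the penalty favours solutions where weak predictors drop toward zero.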

A credit classification method based on generalized additive models using factor scores of mixtures of common factor analyzers (공통요인분석자혼합모형의 요인점수를 이용한 일반화가법모형 기반 신용평가)

  • Lim, Su-Yeol;Baek, Jang-Sun
    • Journal of the Korean Data and Information Science Society
    • /
    • v.23 no.2
    • /
    • pp.235-245
    • /
    • 2012
  • Logistic discrimination is a useful statistical technique for quantitative analysis in the financial services industry. In particular, it is easy to implement and achieves good classification rates. The generalized additive model (GAM) is useful for credit scoring because it shares these advantages of logistic discrimination while also accounting for nonlinear effects of the explanatory variables. However, it may require too many additive terms when the number of explanatory variables is very large, and the variables may be mutually dependent. Mixtures of factor analyzers can be used to reduce the dimension of high-dimensional features. This study proposes using the low-dimensional factor scores from mixtures of factor analyzers as new features in the generalized additive model. The method is demonstrated on real credit scoring data, and a comparison of the correct classification rates of competing techniques shows that the generalized additive model using factor scores performs best.
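
The factor scores would come from a fitted mixture of common factor analyzers, which is not reproduced here. Given such scores, the sketch below fits a simplified additive logistic model in which each score enters through a linear term plus hinge functions max(0, z − k) at fixed knots, estimated by plain gradient ascent on the log-likelihood. This unpenalized regression-spline basis is a stand-in for a full GAM fit (for example penalized splines with backfitting); the knots, learning rate and toy data are assumptions.

```python
import math

KNOTS = (-1.0, 0.0, 1.0)  # hypothetical knot locations shared by every factor score

def basis(z_row):
    """Expand each factor score into a linear term plus hinge terms max(0, z - k)."""
    feats = []
    for z in z_row:
        feats.append(z)
        feats.extend(max(0.0, z - k) for k in KNOTS)
    return feats

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict_proba(z_row, intercept, coefs):
    return sigmoid(intercept + sum(b * f for b, f in zip(coefs, basis(z_row))))

def fit_additive_logit(Z, y, lr=0.1, epochs=500):
    """Gradient ascent on the mean logistic log-likelihood (no smoothness penalty)."""
    m = len(basis(Z[0]))
    intercept, coefs = 0.0, [0.0] * m
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * m
        for z_row, yi in zip(Z, y):
            r = yi - predict_proba(z_row, intercept, coefs)  # residual on the probability scale
            g0 += r
            for j, f in enumerate(basis(z_row)):
                g[j] += r * f
        intercept += lr * g0 / len(Z)
        coefs = [b + lr * gj / len(Z) for b, gj in zip(coefs, g)]
    return intercept, coefs

# Toy one-factor data: low scores = good credit (0), high scores = default (1).
Z = [(-2.0,), (-1.0,), (1.0,), (2.0,)]
y = [0, 0, 1, 1]
b0, b = fit_additive_logit(Z, y)
print([round(predict_proba(z, b0, b), 3) for z in Z])
```

Each additive component is a piecewise-linear function of one factor score, so the number of terms grows with the number of factors rather than the number of original variables, which is the motivation given above.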

Statistical Modeling for Forecasting Maximum Electricity Demand in Korea (한국 최대 전력량 예측을 위한 통계모형)

  • Yoon, Sang-Hoo;Lee, Young-Saeng;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.1
    • /
    • pp.127-135
    • /
    • 2009
  • Forecasting the maximum electricity demand is necessary to stabilize the flow of electricity. The time series data were collected from the Korea Energy Research between January 2000 and December 2006. The data show a strong linear trend and seasonal variation. Winters' seasonal model and an ARMA model were used to analyze them, with the root mean squared prediction error and the mean absolute percentage prediction error serving as criteria for selecting the best model. In addition, a nonstationary generalized extreme value distribution with explanatory variables was fitted to forecast the maximum electricity demand.
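
The selection criteria and the extreme-value model named above can be written down directly. The sketch computes RMSE and MAPE for a forecast and evaluates the generalized extreme value (GEV) distribution function $F(x) = \exp\{-[1 + \xi(x-\mu)/\sigma]^{-1/\xi}\}$ with a location parameter that trends linearly in time, $\mu(t) = \mu_0 + \mu_1 t$. The abstract does not say which GEV parameters depend on the explanatory variables, so the linear location trend is an assumption, and all numbers are illustrative.

```python
import math

def rmse(actual, forecast):
    """Root mean squared prediction error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage prediction error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function; xi = 0 gives the Gumbel limit."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower end (xi > 0) or above the upper end (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def nonstationary_gev_cdf(x, t, mu0, mu1, sigma, xi):
    """GEV whose location parameter trends linearly in time: mu(t) = mu0 + mu1 * t."""
    return gev_cdf(x, mu0 + mu1 * t, sigma, xi)

print(rmse([1.0, 2.0], [1.0, 4.0]), mape([100.0, 200.0], [110.0, 180.0]))
# Probability that the monthly maximum in period t = 2 stays below 7.0 (illustrative parameters).
print(nonstationary_gev_cdf(7.0, t=2, mu0=5.0, mu1=1.0, sigma=2.0, xi=0.1))
```

Note that some libraries parameterize the GEV shape with the opposite sign, so ξ values should be checked before comparing with packaged implementations.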

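Setting the Prior Distribution in Explanatory Variable Selection Using Gibbs Sampling: Focusing on the Linear Regression Model (깁스표본기법을 이용한 설명변수 선택문제에서 사전분포의 설정-선형회귀모형을 중심으로-)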

  • 박종선;남궁평;한숙영
    • Communications for Statistical Applications and Methods
    • /
    • v.4 no.2
    • /
    • pp.333-343
    • /
    • 1997
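  • In linear regression analysis, variable selection plays a crucial role in finding the optimal model. George and McCulloch (1993) addressed variable selection in the linear regression model using a hierarchical Bayes model and Gibbs sampling. Building on their model, this paper considers how to determine, by an objective criterion, the prior probability that each explanatory variable is included in the model.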
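
In the George and McCulloch (1993) framework, each coefficient has a spike-and-slab prior, $\beta_j \mid \gamma_j \sim (1-\gamma_j)N(0,\tau_j^2) + \gamma_j N(0, c_j^2\tau_j^2)$ with $P(\gamma_j = 1) = p_j$, and the prior inclusion probability $p_j$ studied here enters the Gibbs step that updates $\gamma_j$ given $\beta_j$. The sketch shows only that step; the draws of β and σ² are omitted, and the values of τ, c and $p_j$ are illustrative rather than the paper's objective choice.

```python
import math
import random

def normal_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def inclusion_probability(beta_j, prior_p, tau, c):
    """P(gamma_j = 1 | beta_j) under the spike-and-slab prior.

    Spike (excluded): beta_j ~ N(0, tau^2); slab (included): beta_j ~ N(0, (c * tau)^2).
    prior_p is the prior probability that variable j enters the model.
    """
    a = prior_p * normal_pdf(beta_j, c * tau)
    b = (1.0 - prior_p) * normal_pdf(beta_j, tau)
    return a / (a + b)

def sample_gamma(betas, prior_ps, tau, c, rng):
    """One Gibbs update of all inclusion indicators given the current coefficients."""
    return [1 if rng.random() < inclusion_probability(b, p, tau, c) else 0
            for b, p in zip(betas, prior_ps)]

rng = random.Random(0)
betas = [0.02, 0.8, -1.5]        # hypothetical current draws of the coefficients
prior_ps = [0.5, 0.5, 0.5]       # the quantity whose objective choice the paper studies
print([round(inclusion_probability(b, p, 0.1, 10.0), 3) for b, p in zip(betas, prior_ps)])
print(sample_gamma(betas, prior_ps, 0.1, 10.0, rng))
```

Raising $p_j$ shifts the conditional inclusion probability upward for every value of $\beta_j$, which is why the choice of this prior matters for which variables the sampler ends up selecting.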