• Title/Abstract/Keywords: Model interpretability

Search results: 48 items (processing time 0.028 seconds)

A Research on Yield Prediction of Mixed Pastures in Korea via Model Construction in Stages

  • 오승민;김문주;팽경룬;이배훈;김지융;김병완;조무환;성경일
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • Vol. 37, No. 1
    • /
    • pp.80-91
    • /
    • 2017
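  • This study aimed to select a model with high explanatory power by starting from a yield prediction model for mixed pastures based on climatic factors and then adding fertilization, seeding, and years-after-establishment factors in stages. The model was built in the following order: data collection (forage and weather data), processing, analysis, and model construction. Six yield prediction models were constructed from the climate, fertilization, seeding, and years-after-establishment factors, and the optimal model was chosen by reviewing both explanatory power and consistency with forage production theory. As a result, Model VI, which considers climate, fertilization, seeding, and years after establishment (grouped), was selected (explanatory power = 53.8%). Within Model VI, the climate factor had the largest explanatory power (24.5%), followed by fertilization (17.8%), seeding (10.7%), and years after establishment (0.8%). However, the positive (+) correlation found between dry matter yield and the number of summer depression days needs to be examined from the perspective of region and accumulated variables. In addition, because fertilization and seeding rates are concentrated at specific values, further research on their optimal levels using a quadratic term is required.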
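
The staged construction described in this abstract can be illustrated by fitting nested least-squares models and recording how much R² each newly added factor group contributes. The sketch below is only an illustration under stated assumptions: the data are synthetic (not the study's), the names `staged_r2`, `r_squared`, and `solve` are hypothetical, and attributing explanatory power by sequential R² increments is an assumed reading of the abstract, not a confirmed description of the authors' procedure.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def staged_r2(stages, y):
    """Add factor groups in order; return (name, cumulative R^2, increment)."""
    used, previous, report = [], 0.0, []
    for name, cols in stages:
        used.extend(cols)
        current = r_squared(used, y)
        report.append((name, current, current - previous))
        previous = current
    return report

# Synthetic illustration only: these are NOT the study's data or coefficients.
rng = random.Random(0)
n = 200
climate = [rng.gauss(0, 1) for _ in range(n)]
fertilizer = [rng.gauss(0, 1) for _ in range(n)]
seeding = [rng.gauss(0, 1) for _ in range(n)]
dry_matter = [2.0 * c + 1.5 * f + 1.0 * s + rng.gauss(0, 2)
              for c, f, s in zip(climate, fertilizer, seeding)]

report = staged_r2([("climate", [climate]),
                    ("fertilization", [fertilizer]),
                    ("seeding", [seeding])], dry_matter)
for name, total, gain in report:
    print(f"{name:14s} cumulative R2 = {total:.3f}  increment = {gain:+.3f}")
```

With this kind of decomposition the increments always sum to the final R², which matches the arithmetic in the abstract (24.5 + 17.8 + 10.7 + 0.8 = 53.8). The increments depend on the order in which factor groups are added whenever the factors are correlated.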

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 24, No. 4
    • /
    • pp.587-595
    • /
    • 2011
  • Credit scoring is an objective and automatic system for assessing the credit risk of each customer. Logistic regression is one of the popular credit scoring methods for predicting the default probability; however, despite its interpretability and low computation cost, it may fail to detect nonlinear effects of the predictors. In this paper, we propose using a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble techniques such as bagging, boosting, and random forests. We compare these methods in a simulation study and illustrate them on a German credit dataset.
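
For orientation, a generalized partially linear additive model for a binary default indicator is usually written as below. This is the standard textbook form and not necessarily the exact specification used in the paper; the symbols $Y$ (default indicator), $X$ (predictors entering linearly), $Z_j$ (predictors entering through smooth functions), $\beta$, and $f_j$ are notation assumed here.

```latex
\log\frac{P(Y = 1 \mid X, Z)}{1 - P(Y = 1 \mid X, Z)}
  = \beta_0 + X^{\top}\beta + \sum_{j=1}^{q} f_j(Z_j)
```

Logistic regression is the special case in which every $f_j$ is linear, which is why the linear part keeps its interpretability while the smooth terms can capture nonlinear effects.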

Regression Models for Haplotype-Based Association Studies

  • Oh, So-Hee;NamKung, Jung-Hyun;Park, Tae-Sung
    • Genomics & Informatics
    • /
    • Vol. 5, No. 1
    • /
    • pp.19-23
    • /
    • 2007
  • In this paper, we provide an overview of statistical models for haplotype-based association studies and summarize their features based on the design matrix. We classify the design matrix into two types: direct and indirect. For these two kinds of matrices, we present and compare their characteristics using a simple hypothetical example and a real data set. The motivation behind this study was to give practitioners an improved understanding, to facilitate the informed selection of an appropriate haplotype-based model, and to improve the interpretability of the models.

Analysis of Korean GDP by unobserved components model

  • 성병찬;이승경
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 22, No. 5
    • /
    • pp.829-837
    • /
    • 2011
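  • This paper analyzes Korean gross domestic product (GDP) time series data using an unobserved components model. Because the model can accommodate both stochastic and deterministic components, we model the series in a wider variety of forms and compare its forecasting performance with exponential smoothing and the Box-Jenkins ARIMA model. In two-year-ahead forecasts of GDP, the unobserved components model performs better.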
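
To make the idea of an unobserved component concrete, the sketch below runs a Kalman filter for the local level model, the simplest member of this model family (observation = unobserved level + noise, with the level following a random walk). The abstract does not state which components the paper uses, so the model choice, the function name `local_level_filter`, the variance values, and the input numbers are all illustrative assumptions.

```python
def local_level_filter(y, var_eps, var_eta, a0=0.0, p0=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    var_eps: observation noise variance; var_eta: level disturbance variance.
    a0, p0: prior mean and (large, nearly diffuse) prior variance of the level.
    Returns the filtered level estimates and the forecast of the next value.
    """
    a, p = a0, p0
    filtered = []
    for obs in y:
        f = p + var_eps            # variance of the one-step prediction error
        k = p / f                  # Kalman gain
        a = a + k * (obs - a)      # update the level with the new observation
        p = p * (1.0 - k)          # variance of the updated level
        filtered.append(a)
        p = p + var_eta            # the level is a random walk: variance grows
    return filtered, a             # a random-walk level forecasts as its last value

# Hypothetical numbers, not actual GDP figures.
series = [100.0, 101.5, 103.1, 102.8, 104.9, 106.0]
levels, forecast = local_level_filter(series, var_eps=1.0, var_eta=0.5)
print([round(v, 2) for v in levels], round(forecast, 2))
```

With `var_eta=0` the level is constant and the filter reduces to a running mean; larger `var_eta` makes the estimate track recent observations more closely, which is the same trade-off that the smoothing constant controls in exponential smoothing.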

Enhancing prediction accuracy of concrete compressive strength using stacking ensemble machine learning

  • Yunpeng Zhao;Dimitrios Goulias;Setare Saremi
    • Computers and Concrete
    • /
    • Vol. 32, No. 3
    • /
    • pp.233-246
    • /
    • 2023
  • Accurate prediction of concrete compressive strength can minimize the need for extensive, time-consuming, and costly mixture optimization testing and analysis. This study attempts to enhance the prediction accuracy of compressive strength using stacking ensemble machine learning (ML) with feature engineering techniques. Seven alternative ML models of increasing complexity were implemented and compared: linear regression, SVM, decision tree, multilayer perceptron, random forest, XGBoost, and AdaBoost. To further improve the prediction accuracy, an ML pipeline was proposed in which feature engineering was applied and a two-layer stacked model was developed. The k-fold cross-validation approach was employed to optimize model parameters and train the stacked model. The stacked model showed superior performance in predicting concrete compressive strength, with a coefficient of determination (R²) of 0.985. Feature (i.e., variable) importance was determined to demonstrate how useful the synthetic features are in prediction and to provide better interpretability of the data and the model. The methodology in this study promotes a more thorough assessment of alternative ML algorithms rather than a focus on any single ML model type for concrete compressive strength prediction.
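
A two-layer stacked model trained with k-fold cross-validation can be sketched as follows. This is a minimal illustration, not the paper's pipeline: it uses two simple stand-in base learners (a least-squares line and a k-nearest-neighbour regressor) instead of the seven models listed above, a single blending weight as the second layer, and synthetic one-feature data. All function names are hypothetical.

```python
import math
import random

def fit_linear(xs, ys):
    """Base learner 1: least-squares line. Returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=3):
    """Base learner 2: k-nearest-neighbour regressor on one feature."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def out_of_fold(fit, xs, ys, folds=5):
    """Predict each point with a model trained on the other folds only."""
    preds = [0.0] * len(xs)
    for f in range(folds):
        train = [i for i in range(len(xs)) if i % folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(f, len(xs), folds):
            preds[i] = model(xs[i])
    return preds

def sse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

def r2(ys, preds):
    """Coefficient of determination."""
    mean_y = sum(ys) / len(ys)
    return 1.0 - sse(ys, preds) / sum((y - mean_y) ** 2 for y in ys)

def fit_stack(xs, ys, folds=5):
    """Layer 1: out-of-fold base predictions. Layer 2: least-squares blend."""
    a = out_of_fold(fit_linear, xs, ys, folds)
    b = out_of_fold(fit_knn, xs, ys, folds)
    # Weight w that minimizes the squared error of w*a + (1-w)*b.
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    w = sum((ai - bi) * (yi - bi) for ai, bi, yi in zip(a, b, ys)) / den
    lin, knn = fit_linear(xs, ys), fit_knn(xs, ys)  # refit on all data
    return (lambda x: w * lin(x) + (1 - w) * knn(x)), w

# Synthetic data for illustration only (not concrete-strength measurements).
rng = random.Random(1)
xs = [rng.uniform(0, 4) for _ in range(120)]
ys = [2 * x + math.sin(3 * x) + rng.gauss(0, 0.3) for x in xs]
model, w = fit_stack(xs, ys)
print(f"blend weight on the linear model: {w:.2f}")
print(f"training R2 of the stacked model: {r2(ys, [model(x) for x in xs]):.3f}")
```

Fitting the second layer on out-of-fold predictions keeps it from rewarding a base learner that merely memorizes its training data, which is the usual reason for combining stacking with k-fold cross-validation.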

Structural Design of Radial Basis Function-based Polynomial Neural Networks by Using Multiobjective Particle Swarm Optimization

  • 김욱동;오성권
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • Vol. 61, No. 1
    • /
    • pp.135-142
    • /
    • 2012
  • In this paper, we propose a new architecture, the radial basis function-based polynomial neural network classifier, which consists of heterogeneous neural networks: radial basis function neural networks and polynomial neural networks. The underlying architecture of the proposed model is that of polynomial neural networks (PNNs), but the polynomial neurons in the PNNs are built from fuzzy c-means-based radial basis function neural networks (FCM-based RBFNNs) instead of the conventional polynomial function. We use PNNs to find the optimal local models and RBFNNs to cope with high-dimensional problems. In the hidden layer of the RBFNNs, the FCM algorithm produces clusters based on the similarity of the given dataset. The proposed model depends on several parameters: the number of input variables in the PNNs, the number of clusters and the fuzzification coefficient in FCM, and the polynomial type in the RBFNNs. A multiobjective particle swarm optimization using crowding distance (MoPSO-CD) carries out both structural and parametric optimization of the proposed networks. MoPSO is introduced to account not only for model performance but also for complexity and interpretability. The usefulness of the proposed model as a classifier is evaluated on benchmark datasets such as iris and liver.
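
The "crowding distance" in MoPSO-CD measures how isolated a candidate solution is among the other non-dominated solutions, so that the optimizer can keep a well-spread set of trade-offs (for example between error and model complexity). The sketch below follows the widely used definition introduced with NSGA-II; whether the paper uses exactly this variant is an assumption, and the function name and the sample objective values are hypothetical.

```python
def crowding_distance(points):
    """Crowding distance of each point in a list of objective vectors.

    For every objective, the points are sorted; the two extreme points get
    an infinite distance, and each interior point accumulates the normalized
    gap between its two neighbours.
    """
    n = len(points)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all points share this objective value
        for r in range(1, n - 1):
            gap = points[order[r + 1]][m] - points[order[r - 1]][m]
            dist[order[r]] += gap / (hi - lo)
    return dist

# Hypothetical front: (classification error, number of clusters).
front = [(0.10, 9), (0.12, 5), (0.13, 4), (0.20, 2)]
print(crowding_distance(front))
```

Solutions with a larger crowding distance sit in sparser regions of the front and are typically preferred when the archive of non-dominated solutions has to be pruned or a leader has to be chosen.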

Aeroengine performance degradation prediction method considering operating conditions

  • Bangcheng Zhang;Shuo Gao;Zhong Zheng;Guanyu Hu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 17, No. 9
    • /
    • pp.2314-2333
    • /
    • 2023
  • Predicting the performance degradation of complex electromechanical systems is important. Among existing performance degradation prediction models, the belief rule base (BRB) is a model that handles both quantitative data and qualitative information under uncertainty. However, when analyzing dynamic systems whose observable indicators change frequently with time and working conditions, the traditional BRB cannot adapt to those frequent changes, as in the prediction of aeroengine performance degradation under varying working conditions. To address this problem, this paper proposes a new hidden belief rule base (HBRB) prediction method, in which the performance of the aeroengine is regarded as hidden behavior and operating conditions are used as observable indicators of the HBRB model to describe that hidden behavior, so that performance degradation can be predicted at different times and under different operating conditions. A case study on turbofan aeroengine simulation experiments demonstrates the advantages of the HBRB model, and the results confirm the effectiveness and practicality of the method. The method is also compared with other advanced forecasting methods, and the results show that it produces better predictions in terms of accuracy and interpretability.

K-Means-Based Polynomial-Radial Basis Function Neural Network Using Space Search Algorithm: Design and Comparative Studies

  • 김욱동;오성권
    • Journal of Institute of Control, Robotics and Systems
    • /
    • Vol. 17, No. 8
    • /
    • pp.731-738
    • /
    • 2011
  • In this paper, we introduce an advanced architecture of K-Means clustering-based polynomial Radial Basis Function Neural Networks (p-RBFNNs) designed with the aid of SSOA (Space Search Optimization Algorithm) and develop a comprehensive design methodology supporting their construction. To design the optimized p-RBFNNs, the center value of each receptive field is first determined by running the K-Means clustering algorithm, and then the center value and the width of the corresponding receptive field are optimized through SSOA. The connections (weights) of the proposed p-RBFNNs are of functional character and are realized by considering three types of polynomials. In addition, WLSE (Weighted Least Square Estimation) is used to estimate the coefficients of the polynomials (serving as functional connections of the network) of each node from the output node. As a result, the local learning capability and the interpretability of the proposed model are improved. The proposed model is illustrated using a nonlinear function and NOx, a Machine Learning dataset. A comparative analysis reveals that the proposed model exhibits higher accuracy and superior predictive capability in comparison with some previous models available in the literature.
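
The construction steps named in this abstract (K-Means centers, Gaussian receptive fields, locally weighted least squares for each node) can be sketched on one-dimensional data as follows. This is a simplified illustration under several assumptions: the SSOA tuning of centers and widths is not implemented and the widths are fixed by hand, each local polynomial is taken to be a constant (the paper considers three polynomial types that the abstract does not specify), and all function names and data are hypothetical.

```python
import math

def kmeans_1d(xs, k, iters=50):
    """Lloyd's K-Means on scalars, initialized at evenly spaced quantiles."""
    data = sorted(xs)
    centers = [data[int((j + 0.5) * len(data) / k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center if a cluster happens to be empty.
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers

def rbf(x, center, width):
    """Gaussian receptive field."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def fit_local_constants(xs, ys, centers, widths):
    """Weighted least squares per node with a constant local model.

    Minimizing sum_i phi_i * (y_i - w)^2 gives w = sum(phi*y) / sum(phi).
    """
    weights = []
    for c, s in zip(centers, widths):
        phi = [rbf(x, c, s) for x in xs]
        weights.append(sum(p * y for p, y in zip(phi, ys)) / sum(phi))
    return weights

def predict(x, centers, widths, weights):
    """Network output: activation-weighted average of the local models."""
    phi = [rbf(x, c, s) for c, s in zip(centers, widths)]
    return sum(p * w for p, w in zip(phi, weights)) / sum(phi)

# Toy data with two obvious clusters (illustration only).
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ys = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
centers = kmeans_1d(xs, k=2)
widths = [1.0, 1.0]  # hand-picked; the paper tunes centers and widths with SSOA
weights = fit_local_constants(xs, ys, centers, widths)
print(centers, weights, predict(2.6, centers, widths, weights))
```

Because each node's coefficients are estimated with weights given by that node's own receptive field, every local model describes the data near its center, which is one way to read the abstract's claim about local learning and interpretability.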

Multidimensional Scaling of User Preferences for the Transportation Modes in Seoul

  • 허우선
    • Journal of Korean Society of Transportation
    • /
    • Vol. 4, No. 1
    • /
    • pp.12-27
    • /
    • 1986
  • This study examined user preferences toward transportation modes in Seoul. Two multidimensional scaling models, the ideal point model and the vector model, were applied to data on the mode preferences of 114 adults in the metropolitan area. While both models produced fairly similar results, the vector model performed slightly better in terms of the interpretability of the results. The transport attributes elicited are comfort, flexibility, travel cost, travel time, privacy, and safety, among which comfort is the most salient. The comfort variable is by nature a multi-faceted attribute. The variations in attribute preferences are most significant between the gender groups and between the worker and nonworker groups. In particular, male workers, female workers, and female nonworkers form three distinct market segments. A unidimensional scaling of the preference data reveals that the subway, auto-driver, and subscription bus modes are preferred most, whereas motorcycle and bicycle are preferred least. The other modes (express bus, taxi, auto-passenger, bus, and walk) rank in between. An examination of how preference orders vary among modal groups hints that users align their stated attitudes with their choices in order to reduce cognitive dissonance.


Multidimensional Scaling of Asymmetric Distance Matrices

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 25, No. 4
    • /
    • pp.613-620
    • /
    • 2012
  • In most cases of multidimensional scaling (MDS), the distances or dissimilarities among units are assumed to be symmetric, so dealing with asymmetric distances is not an easy task. The asymmetric MDS methods developed so far face difficulties in the interpretation of results. This study proposes a much simpler asymmetric MDS that utilizes the notion of "altitude". The analogy arises from mountaineering: it is easier to move from a higher point to a lower one, and more difficult to move from a lower point to a higher one. The idea is formulated as a quantification problem in which the disparity of distances is maximally related to the altitude difference. The proposed method is demonstrated in three examples, in which the altitudes are visualized by rainbow colors to make the results easier for users to interpret.
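
One simple way to turn the "altitude" idea into numbers is sketched below. It is not the authors' quantification procedure, which the abstract does not spell out; it is a least-squares stand-in that assumes the asymmetry of a pair, (d_ij - d_ji) / 2, should be approximated by the altitude difference a_j - a_i, so that the uphill direction is the longer one. The function name and the example matrix are hypothetical.

```python
def altitudes(d):
    """Least-squares altitudes from an asymmetric distance matrix d.

    skew[i][j] = (d[i][j] - d[j][i]) / 2 is positive when going from i to j
    is harder than coming back. Fitting skew[i][j] ~ a[j] - a[i] under the
    constraint sum(a) = 0 has the closed-form solution a[j] = mean_i skew[i][j].
    """
    n = len(d)
    skew = [[(d[i][j] - d[j][i]) / 2 for j in range(n)] for i in range(n)]
    return [sum(skew[i][j] for i in range(n)) / n for j in range(n)]

# Hypothetical 3-unit example: unit 2 is "highest", unit 0 is "lowest".
d = [
    [0.0, 6.0, 7.0],   # from unit 0: everything is uphill, so distances are long
    [4.0, 0.0, 6.0],
    [3.0, 4.0, 0.0],   # from unit 2: everything is downhill, so distances are short
]
print(altitudes(d))
```

The symmetric part, (d_ij + d_ji) / 2, can then be handed to any ordinary MDS routine, with the altitudes shown as a color scale on the resulting map.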