Comparison Study for Data Fusion and Clustering Classification Performances

Comparison of Classification Performance of Data Fusion and Cluster Analysis Using Taguchi Design

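  • 신형원 (Department of Computer Science and Industrial Systems Engineering, Yonsei University)
  • 손소영 (Department of Computer Science and Industrial Systems Engineering, Yonsei University)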
  • Published : 2000.04.01

Abstract

In this paper, we compare the classification performance of data fusion and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, and Clustering) with that of logistic regression under various characteristics of the input data. Four factors are used to simulate the logistic model: (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) the input-output function. Since the relationship between input and output is typically unknown, we treat it as a noise factor in a Taguchi design to improve the practicality of our results. The experimental results indicate the following. Clustering-based logistic regression provides the highest classification accuracy when input variables are weakly correlated and the variance of the data is high. When input variables are highly correlated, variable selection bagging performs better than logistic regression. When input variables are strongly correlated and the variance between observations is high, bagging appears marginally better than logistic regression, but the difference is not statistically significant.
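To make the experimental setup concrete, here is a minimal, standard-library-only sketch of one cell of such a study. It is not the authors' code, and everything specific in it is an assumption: the two-input data-generating functions, the factor levels, the use of 10 bootstrap models, fitting logistic regression by batch gradient ascent, reading "variable selection bagging" as "each bootstrap model also uses a random subset of inputs", and summarizing accuracy across the noise factor with Taguchi's larger-the-better signal-to-noise ratio. All function names (`simulate`, `bag`, `run_cell`, etc.) are hypothetical. Parameter Combining and clustering-based logistic regression are left out.

```python
import math
import random

def simulate(n, rho, sigma, nonlinear, rng):
    """Draw n labelled points from a hypothetical two-input model.

    rho       -- correlation between the two inputs (control factor)
    sigma     -- std. dev. of noise added to the latent score (control factor)
    n         -- number of observations (control factor)
    nonlinear -- which input-output function generates labels (noise factor)
    """
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = z1
        x2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # corr(x1, x2) = rho
        # Hypothetical input-output functions: the noise factor's two levels.
        score = (x1 ** 2 + 0.5 * x2 - 1) if nonlinear else (x1 + 0.5 * x2)
        score += rng.gauss(0, sigma)
        data.append(((x1, x2), 1 if score > 0 else 0))
    return data

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1 / (1 + math.exp(-t))
    e = math.exp(t)
    return e / (1 + e)

def fit_logistic(data, features, epochs=100, lr=0.5):
    """Logistic regression fitted by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(features) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            xs = [1.0] + [x[j] for j in features]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
            for k, xk in enumerate(xs):
                grad[k] += (y - p) * xk
        w = [wi + lr * g / len(data) for wi, g in zip(w, grad)]
    return features, w

def predict_proba(model, x):
    features, w = model
    xs = [1.0] + [x[j] for j in features]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))

def bag(data, n_models, rng, select_vars=False, n_inputs=2):
    """Fit n_models logistic regressions, each on a bootstrap resample.

    With select_vars=True each model also sees a random non-empty subset of
    the inputs (an assumed reading of variable selection bagging).
    """
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        if select_vars:
            k = rng.randint(1, n_inputs)
            features = sorted(rng.sample(range(n_inputs), k))
        else:
            features = list(range(n_inputs))
        models.append(fit_logistic(sample, features))
    return models

def accuracy(models, data):
    # Average the members' predicted probabilities, then threshold at 0.5.
    correct = 0
    for x, y in data:
        p = sum(predict_proba(m, x) for m in models) / len(models)
        correct += (p >= 0.5) == (y == 1)
    return correct / len(data)

def larger_is_better_sn(values):
    # Taguchi larger-the-better S/N ratio in dB; every value must be > 0.
    return -10 * math.log10(sum(1 / v ** 2 for v in values) / len(values))

def run_cell(rho, sigma, n, seed=0, n_models=10):
    """Evaluate one inner-array row (rho, sigma, n) across both noise levels."""
    rng = random.Random(seed)
    accs = {"logistic": [], "bagging": [], "var_bagging": []}
    for nonlinear in (False, True):  # outer array: the unknown input-output function
        train = simulate(n, rho, sigma, nonlinear, rng)
        test = simulate(500, rho, sigma, nonlinear, rng)
        accs["logistic"].append(accuracy([fit_logistic(train, [0, 1])], test))
        accs["bagging"].append(accuracy(bag(train, n_models, rng), test))
        accs["var_bagging"].append(
            accuracy(bag(train, n_models, rng, select_vars=True), test))
    return {name: larger_is_better_sn(a) for name, a in accs.items()}
```

Calling, for example, `run_cell(rho=0.9, sigma=1.0, n=60)` returns one S/N value per method for that combination of control factors; repeating it over the rows of an inner orthogonal array would give the per-method responses to analyse. Treating the input-output function as an outer-array noise factor means a method scores well only if its accuracy holds up under both assumed functions, which mirrors the motivation stated in the abstract.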

Keywords