Proceedings of the Korean Operations and Management Science Society Conference (한국경영과학회:학술대회논문집)
- 2000.04a
- Pages 601-604
- 2000
Comparison Study for Data Fusion and Clustering Classification Performances
Korean title: Comparison of Data Fusion and Cluster Analysis Classification Performance Using Taguchi Design
Abstract
In this paper, we compare the classification performance of data fusion and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, and Clustering) with that of logistic regression under various characteristics of the input data. Four factors are used to simulate the logistic model: (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) the input-output function. Because the relationship between input and output is typically not known, we treat it as a noise factor in a Taguchi design, which improves the practicality of the study results. The experimental results indicate the following. Clustering-based logistic regression provides the highest classification accuracy when input variables are weakly correlated and the variance of the data is high. When input variables are highly correlated, variable bagging performs better than logistic regression. When input variables are strongly correlated and the variance between observations is high, bagging appears marginally better than logistic regression, but the difference is not statistically significant.
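The kind of experiment summarized above can be illustrated with a minimal sketch. It simulates data from a logistic model controlled by the four factors and compares a single logistic regression with data bagging. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the coefficient values, the shared-factor construction of correlated inputs, the interaction term standing in for a nonlinear input-output function, and the gradient-ascent fit. Variable selection bagging, parameter combining, clustering, and the Taguchi layout itself are not shown.

```python
import math
import random

def sigmoid(t):
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def simulate(n, rho, sigma, nonlinear, rng, p=3):
    """Draw n observations from a hypothetical logistic model.

    rho:       pairwise correlation among the p inputs, in [0, 1] (factor 1)
    sigma:     std. dev. of noise added to the linear predictor (factor 2)
    n:         training data size (factor 3)
    nonlinear: if True, add an interaction term (factor 4, illustrative)
    """
    beta = [1.0] * p  # assumed coefficients, not from the paper
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        # Shared-factor construction: corr(x_i, x_j) = rho for i != j.
        x = [math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
             for _ in range(p)]
        eta = sum(b * v for b, v in zip(beta, x))
        if nonlinear:
            eta += x[0] * x[1]
        eta += rng.gauss(0, sigma)
        X.append(x)
        y.append(1 if rng.random() < sigmoid(eta) else 0)
    return X, y

def predict_proba(w, x):
    # w[0] is the intercept, w[1:] are the slopes.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, steps=200, lr=0.5):
    """Fit logistic regression by batch gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, t in zip(X, y):
            err = t - predict_proba(w, x)
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * x[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def bagged_fit(X, y, rng, n_models=10):
    """Data bagging: fit one model per bootstrap resample of the rows."""
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagged_predict(models, x):
    # Average the predicted probabilities, then threshold at 0.5.
    avg = sum(predict_proba(w, x) for w in models) / len(models)
    return 1 if avg >= 0.5 else 0

def accuracy(pred, y):
    return sum(int(a == b) for a, b in zip(pred, y)) / len(y)

if __name__ == "__main__":
    rng = random.Random(0)
    # One illustrative cell: high correlation, high variance, small sample.
    X_tr, y_tr = simulate(200, rho=0.8, sigma=2.0, nonlinear=True, rng=rng)
    X_te, y_te = simulate(500, rho=0.8, sigma=2.0, nonlinear=True, rng=rng)
    single = fit_logistic(X_tr, y_tr)
    models = bagged_fit(X_tr, y_tr, rng)
    single_pred = [1 if predict_proba(single, x) >= 0.5 else 0 for x in X_te]
    bagged_pred = [bagged_predict(models, x) for x in X_te]
    print("logistic regression:", accuracy(single_pred, y_te))
    print("data bagging:       ", accuracy(bagged_pred, y_te))
```

In a full study, a call such as `simulate(...)` would be repeated over every combination of factor levels, with the input-output function varied as the noise factor, so that each method is judged by how well it holds up across functional forms rather than under a single one.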
Keywords