Defect Severity-based Ensemble Model using FCM

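  • 이나영 (Department of Computer Engineering, Gangneung-Wonju National University)
  • 권기태 (Department of Computer Engineering, Gangneung-Wonju National University)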
  • Received : 2016.07.18
  • Accepted : 2016.09.29
  • Published : 2016.12.15

Abstract

Software defect prediction is an important factor in the efficient management and success of a project. The extent to which a defect affects the project depends on its severity. However, existing studies consider only whether a defect is present and ignore its severity. To improve software management efficiency and quality, this paper proposes a defect severity-based ensemble model using FCM. The proposed model uses FCM to reclassify the defect severity of the NASA PC4 data set. It then uses Random Forest (RF) to select the input columns that affect severity, extracting the key defect factors in the data. Model performance is evaluated with 10-fold cross-validation while varying the parameters. The experimental results are as follows. First, defect severities were reclassified from 58, 40, 80 to 30, 20, 128. Second, the important input column affecting severity, in terms of both accuracy and node impurity, was BRANCH_COUNT. Third, performance was better when the number of trees was small and the number of variables considered was large.
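
The reclassification step can be illustrated with a minimal sketch, assuming FCM here denotes fuzzy c-means clustering. The abstract gives no parameter settings, so the fuzzifier `m = 2.0`, the choice of three clusters, the deterministic center initialization, and the toy two-feature data are all assumptions made for illustration. The function names `membership_row`, `fuzzy_c_means`, and `harden` are hypothetical, and none of this is the authors' implementation or the PC4 data.

```python
from collections import Counter
from math import dist


def membership_row(point, centers, m):
    """Membership of one point in each cluster (the row sums to 1)."""
    # Clamp distances so a point lying exactly on a center cannot divide by zero.
    d = [max(dist(point, c), 1e-12) for c in centers]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]


def fuzzy_c_means(points, n_clusters=3, m=2.0, max_iter=100, tol=1e-9):
    """Return (centers, memberships) for a list of equal-length numeric tuples."""
    n, dim = len(points), len(points[0])
    # Assumed initialization: evenly spaced points taken from the sorted data.
    ordered = sorted(points)
    centers = [ordered[(2 * j + 1) * n // (2 * n_clusters)]
               for j in range(n_clusters)]
    for _ in range(max_iter):
        u = [membership_row(p, centers, m) for p in points]
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for j in range(n_clusters):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[k] for wi, p in zip(w, points)) / total
                for k in range(dim)
            ))
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the final centers.
    u = [membership_row(p, centers, m) for p in points]
    return centers, u


def harden(memberships):
    """Assign each point to the cluster in which its membership is largest."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Hypothetical metric vectors for nine defective modules (not PC4 data).
points = [
    (1.0, 2.0), (1.2, 1.8), (0.8, 2.2),
    (5.0, 6.0), (5.2, 5.8), (4.8, 6.1),
    (9.0, 10.0), (9.3, 9.8), (8.9, 10.2),
]
centers, memberships = fuzzy_c_means(points)
labels = harden(memberships)
counts = Counter(labels)  # number of modules per reclassified severity group
print(counts)
```

In a setup like this, the hardened labels would stand in for the reclassified severity levels, and the per-cluster counts correspond to the kind of before/after distribution the abstract reports. The soft memberships are kept alongside the labels because they show how ambiguous each assignment is.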
