Analyzing Influence of Outlier Elimination on Accuracy of Software Effort Estimation

  • Published: 2008.10.15

Abstract

Accurate software effort estimation has long been a challenge for both the industrial and the academic software engineering communities. Many studies have tried to improve estimation accuracy by focusing on the estimation methods themselves. Data quality is one of the important factors for accurate effort estimation, yet most of this work has not taken it into account. In this paper, we empirically investigate how outlier elimination influences the accuracy of software effort estimation. We apply two outlier elimination methods (least trimmed squares regression and K-means clustering) in combination with three commonly used effort estimation methods (least squares regression, neural network, and Bayesian network) to two industry data sets: the ISBSG Release 9 and the Bank data set, which consists of project data collected from a bank in Korea. The results of each combination are compared with one another and with the results obtained without outlier elimination.
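
The abstract does not spell out the procedure, so the following is only a minimal Python sketch, under our own assumptions, of LTS-based outlier elimination followed by a least squares fit on the retained projects. It approximates least trimmed squares with random two-point starts refined by concentration steps (the idea behind FAST-LTS), flags a project when its absolute LTS residual exceeds 3 times a MAD-based robust scale, and scores the estimates with MMRE and PRED(25). The cutoff, the scale estimate, the choice of accuracy measures, the function names (`ols_fit`, `lts_fit`, `lts_outliers`, `mmre`, `pred`) and the toy data are all hypothetical and not taken from the paper.

```python
import math
import random
from statistics import median


def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all x identical: fall back to a flat line
        return my, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def lts_fit(xs, ys, starts=50, c_steps=20, seed=0):
    """Approximate LTS: minimise the sum of the h smallest squared residuals,
    using random two-point starts refined by concentration (C-) steps."""
    n = len(xs)
    h = (n + 3) // 2  # h = floor((n + p + 1) / 2) with p = 2 parameters
    rng = random.Random(seed)
    best, best_obj = None, math.inf
    for _ in range(starts):
        i, j = rng.sample(range(n), 2)
        a, b = ols_fit([xs[i], xs[j]], [ys[i], ys[j]])
        for _ in range(c_steps):
            # C-step: refit OLS on the h points with the smallest squared residuals
            order = sorted(range(n), key=lambda k: (ys[k] - a - b * xs[k]) ** 2)
            keep = order[:h]
            a, b = ols_fit([xs[k] for k in keep], [ys[k] for k in keep])
        obj = sum(sorted((ys[k] - a - b * xs[k]) ** 2 for k in range(n))[:h])
        if obj < best_obj:
            best, best_obj = (a, b), obj
    return best


def lts_outliers(xs, ys, cutoff=3.0):
    """Hypothetical rule: flag indices whose LTS residual exceeds cutoff * MAD scale."""
    a, b = lts_fit(xs, ys)
    res = [y - a - b * x for x, y in zip(xs, ys)]
    scale = 1.4826 * median([abs(r) for r in res])
    return [k for k, r in enumerate(res) if abs(r) > cutoff * scale]


def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pred(actual, predicted, level=0.25):
    """PRED(level): share of estimates within level * actual of the actual value."""
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical toy data: project size vs. effort, two injected outliers (indices 3 and 8).
sizes = [100, 150, 200, 250, 300, 350, 400, 450, 500, 550]
efforts = [510, 740, 1010, 3900, 1490, 1760, 1990, 2260, 6000, 2740]

flagged = lts_outliers(sizes, efforts)
kept = [k for k in range(len(sizes)) if k not in flagged]
a, b = ols_fit([sizes[k] for k in kept], [efforts[k] for k in kept])

# In-sample accuracy, for illustration only (a real study would use hold-out data).
actual = [efforts[k] for k in kept]
estimates = [a + b * sizes[k] for k in kept]
print("flagged:", flagged, "MMRE:", round(mmre(actual, estimates), 3),
      "PRED(25):", pred(actual, estimates))
```

Trimming the h best-fitting points keeps the two injected projects from pulling the line, so the least squares refit on the retained projects recovers a slope close to the underlying 5 effort units per size unit.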

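One possible K-means-based elimination can be sketched in the same spirit. The abstract does not state the criterion used, so this sketch assumes, purely for illustration, that projects are clustered on a single feature (effort per size unit) and that members of any cluster holding less than 25% of the projects are treated as outliers. It runs plain Lloyd iterations with a deterministic farthest-point initialisation; `kmeans`, `kmeans_outliers`, `min_share` and the data are hypothetical.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new.append(centroids[c])  # keep the old centroid for an empty cluster
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def kmeans_outliers(points, k=2, min_share=0.25):
    """Hypothetical rule: flag members of clusters smaller than min_share of all points."""
    labels, _ = kmeans(points, k)
    counts = {c: labels.count(c) for c in set(labels)}
    return [i for i, lab in enumerate(labels) if counts[lab] < min_share * len(points)]


# Hypothetical toy data: project size vs. effort, two injected outliers.
sizes = [100, 150, 200, 250, 300, 350, 400, 450, 500, 550]
efforts = [510, 740, 1010, 3900, 1490, 1760, 1990, 2260, 6000, 2740]
points = [(e / s,) for s, e in zip(sizes, efforts)]  # 1-D feature: effort per size unit
print(kmeans_outliers(points))  # [3, 8]
```

With these toy numbers the two injected projects form a two-member cluster around 13.8 effort units per size unit, below the 25% threshold, so they are flagged; the choice of feature, k and threshold would all need tuning on real project data.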