A Comparative Study of Estimation by Analogy using Data Mining Techniques

  • Nagpal, Geeta (Dept. of Computer Science and Engineering, National Institute of Technology);
  • Uddin, Moin (Delhi Technological University);
  • Kaur, Arvinder (University School of IT, Guru Gobind Singh Indraprastha University)
  • Received : 2012.04.20
  • Accepted : 2012.10.08
  • Published : 2012.12.31

Abstract

Software estimation provides software project developers, project managers, and management with a comprehensive set of guidelines for producing realistic estimates from incomplete, uncertain, and noisy data. A range of estimation models is being explored in industry and in academic research, but choosing the best model remains difficult. Estimation by Analogy (EbA) is a form of case-based reasoning that can be optimized with techniques such as fuzzy logic, grey system theory, or machine learning. This research compares the estimation accuracy of several conventional data mining models with that of a hybrid model. The data mining models considered include linear regression models, such as ordinary least squares and ridge regression, and nonlinear models, such as neural networks, support vector machines, and multivariate adaptive regression splines. A precise and comprehensible predictive model that integrates Grey Relational Analysis (GRA) with regression is introduced and compared against these models. Empirical results show that regression combined with GRA gives outstanding results, indicating that the methodology has great potential as a candidate approach for software effort estimation.
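
The abstract uses Grey Relational Analysis (GRA) as the similarity measure behind estimation by analogy and then combines it with regression. The sketch below illustrates one plausible reading under stated assumptions: min-max normalisation of features, Deng's standard grey relational coefficient with distinguishing coefficient zeta = 0.5, and a grade-weighted mean effort of the k most similar past projects. The `gra_regression_estimate` function is a hypothetical hybrid (an ordinary least squares fit of effort on size over those k analogues), not necessarily the authors' method; all function names and the toy data are invented for illustration.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each feature column to [0, 1]; a constant column maps to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def grey_relational_grades(target: Sequence[float],
                           history: Sequence[Sequence[float]],
                           zeta: float = 0.5) -> List[float]:
    """Grey relational grade (Deng's formulation) of each past project w.r.t. the target."""
    norm = min_max_normalize([list(target)] + [list(p) for p in history])
    x0, others = norm[0], norm[1:]
    deltas = [[abs(a - b) for a, b in zip(x0, xi)] for xi in others]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    if d_max == 0.0:  # every past project matches the target exactly
        return [1.0] * len(history)
    grades = []
    for d in deltas:
        # Grey relational coefficient per feature, averaged into one grade in (0, 1].
        coeffs = [(d_min + zeta * d_max) / (dk + zeta * d_max) for dk in d]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def top_k_analogues(target, history, k: int) -> List[Tuple[int, float]]:
    """Indices and grades of the k past projects most similar to the target."""
    grades = grey_relational_grades(target, history)
    ranked = sorted(range(len(history)), key=lambda i: grades[i], reverse=True)
    return [(i, grades[i]) for i in ranked[:k]]


def gra_analogy_estimate(target, history, efforts, k: int = 3) -> float:
    """Plain EbA: grade-weighted mean effort of the k closest analogues."""
    best = top_k_analogues(target, history, k)
    total = sum(g for _, g in best)
    return sum(g * efforts[i] for i, g in best) / total


def gra_regression_estimate(target, history, efforts, k: int = 3,
                            size_index: int = 0) -> float:
    """Hypothetical hybrid: OLS fit of effort on size over the k closest analogues."""
    best = top_k_analogues(target, history, k)
    xs = [history[i][size_index] for i, _ in best]
    ys = [efforts[i] for i, _ in best]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # analogues all share one size: fall back to their mean effort
        return my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my + slope * (target[size_index] - mx)


if __name__ == "__main__":
    # Made-up toy data: [size in KLOC, team size] -> effort in person-months.
    history = [[10, 3], [12, 4], [25, 6], [30, 5], [8, 2], [40, 8]]
    efforts = [24.0, 30.0, 61.0, 70.0, 18.0, 98.0]
    target = [20, 5]
    print("GRA analogy estimate:", round(gra_analogy_estimate(target, history, efforts), 2))
    print("GRA + OLS estimate:  ", round(gra_regression_estimate(target, history, efforts), 2))
```

The weighted mean can only interpolate between the efforts of the chosen analogues, whereas the regression step can extrapolate along the size trend among them; which behaves better on real data is exactly the kind of question the comparison in this study addresses.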
