Fault Prediction Using Statistical and Machine Learning Methods for Improving Software Quality

  • Malhotra, Ruchika
  • Jain, Ankita
  • Received : 2011.05.16
  • Accepted : 2012.02.13
  • Published : 2012.06.30


Software organizations need to understand quality attributes in order to deliver highly reliable software. Empirical assessment of the metrics used to predict these attributes is essential for gaining insight into software quality in the early phases of development and for taking corrective action. In this paper, we build models to estimate fault proneness using the object-oriented CK and QMOOD metrics. We apply one statistical method and six machine learning methods to build the models. The proposed models are validated using a dataset collected from open source software. The results are analyzed using the Area Under the Curve (AUC) obtained from Receiver Operating Characteristic (ROC) analysis. The results show that the models built using the random forest and bagging methods outperformed all the other models. Hence, based on these results, it is reasonable to claim that object-oriented metrics are significantly relevant to quality models and that machine learning methods perform comparably to statistical methods.
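To make the evaluation measure concrete, the following minimal sketch computes AUC for a fault-proneness model as the probability that a randomly chosen faulty class receives a higher predicted score than a randomly chosen non-faulty class. This is a generic illustration rather than the authors' implementation: the function name `roc_auc` and the `labels` and `scores` values are hypothetical and are not taken from the paper's dataset.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve.

    labels: 1 for a faulty class, 0 for a non-faulty class.
    scores: predicted fault-proneness, higher means more likely faulty.
    Uses the pairwise-ranking definition of AUC; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one faulty and one non-faulty class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # faulty class ranked above non-faulty class
            elif p == n:
                wins += 0.5   # tie contributes half
    return wins / (len(pos) * len(neg))


# Hypothetical data: six classes with made-up labels and model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means that every faulty class is ranked above every non-faulty class, while 0.5 corresponds to random ranking, which is why the measure can be used to compare models produced by different methods on the same data.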


Empirical Validation; Object Oriented; Receiver Operating Characteristics; Statistical Methods; Machine Learning; Fault Prediction


