Analyzing Machine Learning Techniques for Fault Prediction Using Web Applications

Malhotra, Ruchika;Sharma, Anjali;

doi:10.3745/JIPS.04.0077

Journal of Information Processing Systems

Volume 14 Issue 3
/
Pages.751-770
/
2018
/
1976-913X(pISSN)
/
2092-805X(eISSN)

Korea Information Processing Society (한국정보처리학회)

DOI QR Code

Analyzing Machine Learning Techniques for Fault Prediction Using Web Applications

Malhotra, Ruchika (Dept. of Computer Science and Engineering, Delhi Technological University) ;
Sharma, Anjali (Dept. of Computer Science and Engineering, Delhi Technological University)

Received : 2016.04.01
Accepted : 2017.04.09
Published : 2018.06.30

https://doi.org/10.3745/JIPS.04.0077 Citation PDF KSCI

Download PDF

⟨ Previous Next ⟩

Abstract

Web applications are indispensable in the software industry and continuously evolve either meeting a newer criteria and/or including new functionalities. However, despite assuring quality via testing, what hinders a straightforward development is the presence of defects. Several factors contribute to defects and are often minimized at high expense in terms of man-hours. Thus, detection of fault proneness in early phases of software development is important. Therefore, a fault prediction model for identifying fault-prone classes in a web application is highly desired. In this work, we compare 14 machine learning techniques to analyse the relationship between object oriented metrics and fault prediction in web applications. The study is carried out using various releases of Apache Click and Apache Rave datasets. En-route to the predictive analysis, the input basis set for each release is first optimized using filter based correlation feature selection (CFS) method. It is found that the LCOM3, WMC, NPM and DAM metrics are the most significant predictors. The statistical analysis of these metrics also finds good conformity with the CFS evaluation and affirms the role of these metrics in the defect prediction of web applications. The overall predictive ability of different fault prediction models is first ranked using Friedman technique and then statistically compared using Nemenyi post-hoc analysis. The results not only upholds the predictive capability of machine learning models for faulty classes using web applications, but also finds that ensemble algorithms are most appropriate for defect prediction in Apache datasets. Further, we also derive a consensus between the metrics selected by the CFS technique and the statistical analysis of the datasets.

Keywords

References

P. He, B. Li, X. Liu, J. Chen, and Y. Ma, "An empirical study on software defect prediction with a simplified metric set," Information and Software Technology, vol. 59, pp. 170-190, 2015. https://doi.org/10.1016/j.infsof.2014.11.006
F. Rahman, D. Posnett, and P. Devanbu, "Recalling the imprecision of cross-project defect prediction," in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, Cary, NC, 2012.
S. H. Kan, Metrics and Models in Software Quality Engineering, 2nd ed. Boston, MA: Addison-Wesley, 2003.
S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking classification models for software defect prediction: a proposed framework and novel findings," IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 485-496, 2008 https://doi.org/10.1109/TSE.2008.35
T. Gyimothy, R. Ferenc, and I. Siket, "Empirical validation of object-oriented metrics on open source software for fault prediction," IEEE Transactions on Software Engineering, vol. 31, no. 10, pp. 897-910, 2005. https://doi.org/10.1109/TSE.2005.112
T. Menzies, J. Greenwald, and A. Frank, "Data mining static code attributes to learn defect predictors," IEEE Transactions on Software Engineering, vol. 33, no. 1, pp. 2-13, 2007. https://doi.org/10.1109/TSE.2007.256941
J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, pp. 1-30, 2006.
M. D'Ambros, M. Lanza, and R. Robbes, "Evaluating defect prediction approaches: a benchmark and an extensive comparison," Empirical Software Engineering, vol. 17, no. 4-5, pp. 531-577, 2012. https://doi.org/10.1007/s10664-011-9173-9
J. Bansiya, and C. G. Davis, "A hierarchical model for object-oriented design quality assessment," IEEE Transactions on Software Engineering, vol. 28, no. 1, pp. 4-17, 2002. https://doi.org/10.1109/32.979986
S. R. Chidamber and C. F. Kemerer "A metrics suite for object oriented design," IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476-493, 1994. https://doi.org/10.1109/32.295895
T. Fawcett, "ROC graphs: notes and practical considerations for researchers," Machine Learning, vol. 31, no. 1, pp. 1-38, 2004.
M. A. Hall, "Correlation-based feature selection for machine learning," Ph.D dissertation, Department of Computer Science, The Waikato University, Hamilton, New Zealand, 1998.
M. A. Hall and L. A. Smith, "Practical feature subset selection for machine learning," in Proceedings of the 21st Australasian Computer Science Conference, Perth, Australia, 1998, pp. 181-191.
M. A. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10-18, 2009. https://doi.org/10.1145/1656274.1656278
Q. Song, J. Ni, and G. Wang, "A fast clustering-based feature subset selection algorithm for high-dimensional data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, pp. 1-14, 2013. https://doi.org/10.1109/TKDE.2011.181
V. Bolon-Canedo, N. Sanchez-Marono, A. Alonso-Betanzos, J. M. Benitez, and F. Herrera, "A review of microarray datasets and applied feature selection methods," Information Sciences, vol. 282, pp. 111-135, 2014. https://doi.org/10.1016/j.ins.2014.05.042
D. Liu, D. W. Sun, and X. A. Zeng, "Recent advances in wavelength selection techniques for hyperspectral image processing in the food industry," Food and Bioprocess Technology, vol. 7, no. 2, pp. 307-323, 2014. https://doi.org/10.1007/s11947-013-1193-6
A. B. de Carvalho, A. Pozo, S. Vergilio, and A. Lenz, "Predicting fault proneness of classes through a multiobjective particle swarm optimization algorithm," in Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence, Dayton, OH, 2008, pp. 387-394.
C. Catal, B. Diri, and B. Ozumut, "An artificial immune system approach for fault prediction in object-oriented Software," in Proceedings of the 2nd International Conference on Dependability of Computer System, Szklarska, Poland, 2007, pp. 238-245.
B. Henderson-Sellers, Object-Oriented Metrics: Measures of Complexity. Upper Saddle River, NJ: Prentice-Hall, 1996.
K. Dejaeger, T. Verbraken, and B. Baesens, "Toward comprehensible software fault prediction models using Bayesian network classifiers," IEEE Transactions on Software Engineering, vol. 39, no. 2, pp. 237-257, 2013. https://doi.org/10.1109/TSE.2012.20
E. Arisholm, L. C. Briand, and E. B. Johannessen, "A systematic and comprehensive investigation of methods to build and evaluate fault prediction models," Journal of System and Software, vol. 83, no. 1, pp. 2-17, 2010. https://doi.org/10.1016/j.jss.2009.06.055
T. J. McCabe, "A complexity measure," IEEE Transactions on Software Engineering, vol. 2, no. 4, pp. 308-320, 1976.
A. H. Watson, D. R. Wallace, and T. J. McCabe, Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric. Gaithersburg, MD: National Institute of Standards and Technology, 1996.
M. H. Halstead, Elements of Software Science. New York, NY: North-Holland, 1977.
C. Catal and B. Diri, "A systematic review of software fault prediction studies," Expert Systems Applications, vol. 36, no. 4, pp. 7346-7354, 2009. https://doi.org/10.1016/j.eswa.2008.10.027
J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of 4th IEEE International Conference on Neural Networks, Perth, Australia, 1995, pp. 1942-1948.
Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of IEEE International Conference on Evolutionary Computation, Anchorage, AK, 1998, pp. 69-73.
F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80-83, 1945. https://doi.org/10.2307/3001968
Y. Singh, A. Kaur, and R. Malhotra, "Application of support vector machine to predict fault prone classes," ACM SIGSOFT Software Engineering Notes, vol. 34, no. 1, pp. 1-6, 2009.
H. M. Olague, L. H. Etzkorn, S. Gholston, and S. Quattlebaum, "Empirical validation of three software metrics suites to predict fault-proneness of object-oriented classes developed using highly iterative or agile software development processes," IEEE Transactions on software Engineering, vol. 33, no. 6, pp. 402-419, 2007. https://doi.org/10.1109/TSE.2007.1015
G. J. Pai and J. B. Dugan, "Empirical analysis of software fault content and fault proneness using Bayesian methods," IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 675-686, 2007. https://doi.org/10.1109/TSE.2007.70722
S. Kanmani, V. R. Uthariaraj, V. Sankaranarayanan, and P. Thambidurai, "Object-oriented software fault prediction using neural networks," Information and Software Technology, vol. 49, no. 5, pp. 483-492, 2007. https://doi.org/10.1016/j.infsof.2006.07.005
D. Azar and J. Vybihal, "An ant colony optimization algorithm to improve software quality prediction models: case of class stability," Information and Software Technology, vol. 53, no. 4, pp. 388-393, 2011. https://doi.org/10.1016/j.infsof.2010.11.013
S. Di Martino, F. Ferrucci, C. Gravino, and F. Sarro, "A genetic algorithm to configure support vector machines for predicting fault-prone components," in Product-Focused Software Process Improvement. Heidelberg: Springer, 2011, pp. 247-261.
A. Okutan and O. T. Yildiz, "Software defect prediction using Bayesian networks," Empirical Software Engineering, vol. 19, no. 1, pp. 154-181, 2014. https://doi.org/10.1007/s10664-012-9218-8
Y. Zhou, B. Xu, and H. Leung, "On the ability of complexity metrics to predict fault-prone classes in object oriented systems," Journal of Systems and Software, vol. 83, no. 4, pp. 660-674, 2010. https://doi.org/10.1016/j.jss.2009.11.704
Y. Zhou and H. Leung, "Empirical analysis of object oriented design metrics for predicting high severity faults," IEEE Transactions on Software Engineering, vol. 32, no. 10, pp. 771-784, 2006. https://doi.org/10.1109/TSE.2006.102
D. Bowes, T. Hall, M. Harman, Y. Jia, F. Sarro, and F. Wu, "Mutation-aware fault prediction," in Proceedings of the 25th International Symposium on Software Testing and Analysis, Saarbrucken, Germany, 2016, pp. 330-341.
R. Malhotra and R. Raje, "An empirical comparison of machine learning techniques for software defect prediction," in Proceedings of the 8th International Conference on Bioinspired Information and Communications Technologies, Boston, MA, 2014, pp. 320-327.
R. Malhotra, N. Pritam, K. Nagpal, and P. Upmanyu, "Defect collection and reporting system for Git based open source software," in Proceedings of the International Conference on Data Mining and Intelligent Computing, New Delhi, India, 2014, pp. 1-7.
P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Englewood Cliffs, NJ: Prentice-Hall, 1982.
A. Bhattacharyya, "On a measure of divergence between two statistical populations defined by their probability distributions," Bulletin of the Calcutta Mathematical Society, vol. 35, pp. 99-109, 1943.
H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations," The Annals of Mathematical Statistics, vol. 23, no. 4, pp. 493-507, 1952. https://doi.org/10.1214/aoms/1177729330
E. Patrick and F. Fisher, "Nonparametric feature selection," IEEE Transactions in Information Theory, vol. 15, no. 4, pp. 577-584, 1969. https://doi.org/10.1109/TIT.1969.1054354
Z. Xu, J. Liu, Z. Yang, G. An, and X. Jia, "The impact of feature selection on defect prediction performance: an empirical comparison," in Proceedings of the 27th International Symposium on Software Reliability Engineering, Ottawa, Canada, 2016, pp. 309-320.
M. Stone, "Cross-validatory choice and assessment of statistical predictions," Journal of the Royal Statistical Society Series B (Methodological), vol. 36, no. 2, pp. 111-114, 1974.
W. Fu, T. Menzies, and X. Shen, "Tuning for software analytics: is it really necessary?," in Information and Software Technology, vol. 76, pp. 135-146, 2016. https://doi.org/10.1016/j.infsof.2016.04.017
F. Sarro, S. Di Martino, F. Ferrucci, and C. Gravino, "A further analysis on the use of genetic algorithm to configure support vector machines for inter-release fault prediction," in Proceedings of the 27th Annual ACM Symposium on Applied Computing, Trento, Italy, 2012, pp. 1215-1220.
A. Arcuri and G. Fraser, "On parameter tuning in search based software engineering," in Search Based Software Engineering. Heidelberg: Springer, 2011, pp. 33-47.
M. Friedman, "A comparison of alternative tests of significance for the problem of m rankings," The Annals of Mathematical Statistics, vol. 11, no. 1, pp. 86-92, 1940. https://doi.org/10.1214/aoms/1177731944
P. B. Nemenyi, "Distribution-free multiple comparisons," Ph.D. dissertation, Princeton University, Princeton, NJ, 1963.
S. C. Misra and V. C. Bhavsar, "Relationships between selected software measures and latent bug-density: guidelines for improving quality," in Computational Science and Its Applications. Heidelberg: Springer, 2003, pp. 724-732.
D. Glasberg, K. El Emam, W. Melo, and N. Madhavji, Validating Object-Oriented Design Metrics on a Commercial Java Application. Ottawa, Canada: National Research Council of Canada, 2000.
S. M. A. Shah, M. Morisio, and M. Torchiano, "An overview of software defect density: a scoping study," in Proceedings of the 19th Asia-Pacific Software Engineering Conference, Hong Kong, China, 2012, pp. 406-415.
B. Ghotra, S. McIntosh, and A. E. Hassan, "Revisiting the impact of classification techniques on the performance of defect prediction models," in Proceedings of the 37th International Conference on Software Engineering, Florence, Italy, 2015, pp. 789-800.

Journal of Information Processing Systems

Analyzing Machine Learning Techniques for Fault Prediction Using Web Applications

Abstract

Keywords

References

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)