A Comparative Study of Phishing Websites Classification Based on Classifier Ensemble

  • Tama, Bayu Adhi (Dept. of IT Convergence and Application Engineering, Pukyong National University) ;
  • Rhee, Kyung-Hyune (Dept. of IT Convergence and Application Engineering, Pukyong National University)
  • Received: 2018.02.01
  • Reviewed: 2018.04.13
  • Published: 2018.05.31

Abstract

Phishing websites have become a crucial concern in cyber security applications. Phishing works by fraudulently deceiving users in order to obtain sensitive information such as bank account details, credit card numbers, usernames, and passwords. The threat has caused huge losses to online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare classifier ensemble approaches, namely random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, namely decision tree, classification and regression tree, and credal decision tree, for phishing website detection. The area under the ROC curve (AUC) is employed as the performance metric, while statistical tests are used to assess whether the differences among classifiers are significant. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
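The evaluation protocol summarized above (an AUC score per classifier, followed by a significance test across classifiers) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each classifier outputs a real-valued phishing score per website and that label 1 means "phishing". It computes AUC in its Mann-Whitney (pairwise ranking) form. As one common choice for comparing several classifiers, it applies the Friedman rank statistic over hypothetical evaluation folds. The function names and the example AUC values are hypothetical.

```python
from typing import Dict, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive (phishing) site
    outscores a random negative (legitimate) one; ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank classifiers within one fold: rank 1 = highest AUC,
    tied classifiers share the mean of the ranks they occupy."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over all classifiers tied with the one at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(auc_table: Dict[str, List[float]]) -> float:
    """Friedman chi-square statistic for k classifiers over N folds:
    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4),
    where R_j is the mean rank of classifier j across folds."""
    names = list(auc_table)
    k = len(names)
    n = len(auc_table[names[0]])
    rank_sums = [0.0] * k
    for fold in range(n):
        row = [auc_table[name][fold] for name in names]
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
```

For instance, `auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` gives 0.75, because three of the four phishing/legitimate pairs are ordered correctly. The Friedman statistic is then referred to a chi-square distribution with k - 1 degrees of freedom (or to its Iman-Davenport F correction). If the null hypothesis of equal performance is rejected, a post-hoc test identifies which classifiers differ.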
