A Comparative Study of Phishing Websites Classification Based on Classifier Ensemble

Tama, Bayu Adhi;Rhee, Kyung-Hyune;

doi:10.9717/kmms.2018.21.5.617

한국멀티미디어학회논문지 (Journal of Korea Multimedia Society)

Vol. 21, No. 5 / Pages 617-625 / 2018 / 1229-7771 (pISSN) / 2384-0102 (eISSN)

한국멀티미디어학회 (Korea Multimedia Society)

Tama, Bayu Adhi (Dept. of IT Convergence and Application Engineering, Pukyong National University) ;
Rhee, Kyung-Hyune (Dept. of IT Convergence and Application Engineering, Pukyong National University)

Received: 2018.02.01
Reviewed: 2018.04.13
Published: 2018.05.31

https://doi.org/10.9717/kmms.2018.21.5.617

Abstract

Phishing websites have become a crucial concern in cyber security applications. Phishing is performed by fraudulently deceiving users with the aim of obtaining their sensitive information, such as bank account information, credit card numbers, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e., random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e., decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. Area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used as a baseline indicator of significance among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
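To make the performance metric concrete, the following is a minimal sketch of how AUC can be computed from classifier scores, using the rank-sum (Mann-Whitney) identity. It is an illustration only and not the authors' code: the function name `auc`, the label encoding (1 = phishing, 0 = legitimate), and the toy scores are all assumptions introduced here.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 values (assumed encoding: 1 = phishing).
    scores: classifier scores, where higher means "more likely phishing".
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks; tied scores share the average of their ranks.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # U statistic of the positive class, normalised to [0, 1].
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Made-up labels and scores for two hypothetical models (not paper data).
    labels = [1, 0, 1, 1, 0, 0]
    toy_scores = {
        "single_tree": [0.6, 0.5, 0.4, 0.9, 0.3, 0.7],
        "ensemble": [0.8, 0.2, 0.7, 0.9, 0.4, 0.3],
    }
    for name, s in toy_scores.items():
        print(name, round(auc(labels, s), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes. Being threshold-independent, the metric is a convenient basis for comparing ensembles with single classifiers.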
