Design of detection method for malicious URL based on Deep Neural Network

  • Kwon, Hyun (Department of Electrical Engineering, Korea Military Academy)
  • Park, Sangjun (Department of Electrical Engineering, Korea Military Academy)
  • Kim, Yongchul (Department of Electrical Engineering, Korea Military Academy)
  • Received : 2021.04.15
  • Accepted : 2021.05.20
  • Published : 2021.05.28

Abstract

Various devices are now connected to the Internet through the Internet of Things (IoT) and similar technologies, and attacks that exploit the Internet occur as a result. Some of these attacks use malicious URLs to lure users to phishing sites or to distribute viruses, so detecting malicious URLs is an important security problem. Neural networks, one of the recent deep learning technologies, perform well in image recognition, speech recognition, and pattern recognition, and they can also be applied to analyzing and detecting the patterns that characterize malicious URLs. This paper analyzes how the performance of neural-network-based malicious URL detection changes with the activation function, the learning rate, and the network structure. The experimental dataset was built by crawling the Alexa top 1 million list and Whois records, and TensorFlow was used as the machine learning library. In the experiments, a network with 4 layers, 100 nodes per layer, and a learning rate of 0.005 achieved an accuracy of 97.8% and an F1 score of 92.94%.
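The abstract reports two metrics that can differ noticeably on the same test set. As a minimal sketch of how accuracy and the F1 score are computed from a confusion matrix, the Python snippet below treats "malicious" as the positive class. The function name and the counts are hypothetical and are not taken from the paper, which does not state its class ratio in the abstract; they only illustrate that the F1 score tends to sit below accuracy when benign URLs outnumber malicious ones.

```python
from typing import Tuple


def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (accuracy, F1) with the malicious class as the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of URLs flagged as malicious that really are malicious.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly malicious URLs that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts (NOT from the paper): 1,000 URLs, 100 of them malicious.
acc, f1 = accuracy_and_f1(tp=80, fp=10, fn=20, tn=890)
print(f"accuracy = {acc:.4f}, F1 = {f1:.4f}")  # accuracy = 0.9700, F1 = 0.8421
```

In this illustration the many correctly classified benign URLs raise accuracy, while the F1 score depends only on how the malicious class is handled.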

Acknowledgement

This work was supported by the 2021 research fund (21-center-2) of the Korea Military Academy (Cyber Warfare Research Center).
