A Classification Model for Attack Mail Detection based on the Authorship Analysis

  • Received : 2017.10.25
  • Accepted : 2017.11.01
  • Published : 2017.12.31

Abstract

Attacks that deliver malicious code by email, attaching it to a message and inducing the recipient to execute it, have recently been increasing. Document-type attachments are especially dangerous because users open them readily. Authorship analysis is a research area within NLP (Natural Language Processing) and text mining that studies methods for identifying the author of sentences, texts, and documents written in a specific language. Because an attack mail is written by a particular attacker, analyzing the mail body and the attached document file to identify that author can reveal features that distinguish it more clearly from normal mail and can improve detection accuracy. In this paper, we propose a feature vector for classifying and detecting attack mail, built from the features used in existing machine learning based spam detection models and the features used in document authorship analysis, together with the IADA2 (Intelligent Attack mail Detection based on Authorship Analysis) detection model suited to it. We improve on spam detection models that rely only on term-based features by applying n-grams to extract features that reflect the sequence characteristics of words. The experimental results show that performance improves depending on the feature combination, the feature selection technique, and the choice of model, and confirm the proposed model's performance and its potential for further improvement.
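
As a rough illustration of the n-gram idea mentioned in the abstract, the sketch below counts word unigrams and bigrams against a fixed vocabulary. It is not the IADA2 feature set, which the abstract does not specify: the function names `word_ngrams` and `feature_vector`, the tokenization rule, and the toy vocabulary are all assumptions made for this example.

```python
from collections import Counter
import re


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of the lowercased text."""
    # Assumed tokenization: runs of ASCII letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_vector(text, vocabulary, max_n=2):
    """Count how often each vocabulary n-gram occurs in the text.

    `vocabulary` is a hypothetical, fixed list of n-gram tuples chosen
    beforehand, for example by a feature selection step.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return [counts[gram] for gram in vocabulary]


# Toy data: unigrams capture terms, bigrams capture word order.
vocabulary = [("invoice",), ("open",), ("open", "the"), ("the", "attached")]
mail = "Please open the attached invoice and open the document."
vector = feature_vector(mail, vocabulary)
print(vector)  # [1, 2, 2, 1]
```

A vector of this kind could be concatenated with stylometric authorship features and passed to any standard classifier. Which features and models work best is the empirical question the paper addresses.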
