Security tendency analysis technique through the application of machine learning algorithms in big data environments

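  • Choi Do-Hyeon (Department of Computer Science, Soongsil University) ;
  • Park Jung-Oh (Department of Information and Communication Engineering, Dongyang Mirae University)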
  • Received : 2015.07.20
  • Accepted : 2015.09.20
  • Published : 2015.09.28

Abstract

With the recent growth of big data industries, global security companies have widened the scope of the data they analyze from structured to unstructured data in order to monitor and prevent intelligent security threats, and they increasingly apply user tendency analysis for preventive security. The reason is that only a limited range of information can be inferred from analyzing existing structured data (data that can already be quantified). This study applies machine learning algorithms (Naïve Bayes, Decision Tree, K-nearest neighbor, Apriori) in a big data environment to analyze security tendency (classification of items by purpose, positive/negative judgment, and association analysis of key keywords). The performance analysis confirmed that security items and specific indicators for judging security tendency can be extracted from both structured and unstructured data.
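The abstract names Naïve Bayes as one of the algorithms used for the positive/negative judgment, but it does not describe the implementation. The following is a minimal sketch of that kind of classifier, assuming whitespace tokenization and Laplace smoothing. The function names (`train_naive_bayes`, `classify`) and the toy training sentences are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log-posterior under multinomial Naive Bayes."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for label, doc_count in model["labels"].items():
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_count / total_docs)
        total_words = sum(model["words"][label].values())
        for token in text.lower().split():
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score.
            count = model["words"][label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for illustration only.
TRAINING = [
    ("password leaked on public forum", "negative"),
    ("malware infected the office server", "negative"),
    ("phishing mail stole account password", "negative"),
    ("patch installed and server updated", "positive"),
    ("backup completed and account secured", "positive"),
    ("firewall updated and scan passed", "positive"),
]

model = train_naive_bayes(TRAINING)
print(classify(model, "malware stole password"))
print(classify(model, "firewall updated and patch installed"))
```

A real pipeline over unstructured text would need proper tokenization (particularly for Korean), a much larger labeled corpus, and evaluation on held-out data. The sketch only shows how word counts per class turn into a positive/negative decision.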
