An Intelligent Intrusion Detection Model Based on Support Vector Machines and the Classification Threshold Optimization for Considering the Asymmetric Error Cost

비대칭 오류비용을 고려한 분류기준값 최적화와 SVM에 기반한 지능형 침입탐지모형

  • Lee, Hyeon-Uk (Graduate School of Business IT, Kookmin University) ;
  • Ahn, Hyun-Chul (School of Management Information Systems, Kookmin University)
  • 이현욱 (국민대학교 비즈니스IT전문대학원) ;
  • 안현철 (국민대학교 경영대학 경영정보학부)
  • Received : 2011.11.16
  • Accepted : 2011.12.12
  • Published : 2011.12.31

Abstract

With the recent explosive growth in Internet use, malicious attacks on and hacking of networked systems occur frequently. Such intrusions can cause fatal damage to the government agencies, public offices, and companies that operate these systems. For this reason, interest in and demand for intrusion detection systems (IDS), security systems that detect, identify, and respond appropriately to unauthorized or abnormal activities, are growing. The intrusion detection models applied in conventional IDS are generally designed by modeling experts' implicit knowledge of network intrusions or of hackers' abnormal behaviors. Such models perform well under normal conditions, but they perform poorly when they encounter new or unknown attack patterns. Several recent studies have therefore adopted artificial intelligence techniques that can respond proactively to unknown threats. In particular, artificial neural networks (ANNs) have been widely applied in prior studies because of their superior prediction accuracy. However, ANNs have intrinsic limitations, such as the risk of overfitting, the need for a large sample size, and the difficulty of interpreting the prediction process (i.e., the black-box problem). As a result, the most recent studies on IDS have started to adopt the support vector machine (SVM), a classification technique that is more stable and powerful than ANNs. SVM is known for its relatively high predictive power and generalization capability. Against this background, this study proposes a novel intelligent intrusion detection model that uses SVM as the classification model in order to improve the predictive ability of IDS. Our model is also designed to account for asymmetric error costs by optimizing the classification threshold. In general, there are two common types of error in intrusion detection.
The first is the false-positive error (FPE), in which normal activity is misjudged as an intrusion and may trigger unnecessary corrective action. The second is the false-negative error (FNE), in which a malicious program is misjudged as normal. An FNE is more fatal than an FPE. Thus, when considering the total misclassification cost in IDS, it is more reasonable to assign a heavier weight to FNE than to FPE. We therefore designed the proposed intrusion detection model to optimize the classification threshold so as to minimize the total misclassification cost. Conventional SVM cannot be applied in this case because it produces a discrete output (i.e., a class label). To resolve this problem, we used the revised SVM technique proposed by Platt (2000), which can generate probability estimates. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 1,000 of them by random sampling. In addition, the SVM model was compared with logistic regression (LOGIT), decision trees (DT), and ANN to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell 4.0. For SVM, LIBSVM v2.90, a free software package for training SVM classifiers, was used. The empirical results showed that the proposed SVM-based model outperformed all the comparative models in the accuracy of detecting network intrusions. They also showed that our model reduced the total misclassification cost compared to the ANN-based intrusion detection model.
As a result, the intrusion detection model proposed in this paper is expected not only to enhance the performance of IDS but also to lead to better management of FNE.
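
The threshold optimization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy probabilities and labels, the 5:1 FNE-to-FPE cost ratio, and the grid of candidate thresholds are all assumptions made for illustration. The sigmoid follows the general form Platt proposed for mapping an SVM decision value to a probability; its parameters `a` and `b` would normally be fitted on held-out data, which is not shown here.

```python
import math


def platt_probability(decision_value, a, b):
    """Map an SVM decision value f to P(intrusion | f) = 1 / (1 + exp(a*f + b)).

    a and b are assumed to have been fitted elsewhere; a is typically negative,
    so larger decision values give higher probabilities.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def total_cost(probs, labels, threshold, cost_fn, cost_fp):
    """Total misclassification cost when 'intrusion' is predicted for p >= threshold."""
    cost = 0.0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 0:    # false negative: intrusion judged normal
            cost += cost_fn
        elif y == 0 and predicted == 1:  # false positive: normal judged intrusion
            cost += cost_fp
    return cost


def best_threshold(probs, labels, cost_fn, cost_fp, steps=100):
    """Grid-search thresholds 0.00, 0.01, ..., 1.00 and return the cheapest one.

    Ties are resolved in favor of the lowest threshold.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda t: total_cost(probs, labels, t, cost_fn, cost_fp))


# Toy data (hypothetical): estimated intrusion probabilities and true labels (1 = intrusion).
probs = [0.10, 0.30, 0.40, 0.55, 0.70, 0.90]
labels = [0, 1, 0, 0, 1, 1]

symmetric = best_threshold(probs, labels, cost_fn=1, cost_fp=1)
asymmetric = best_threshold(probs, labels, cost_fn=5, cost_fp=1)  # assumed 5:1 cost ratio
print("symmetric threshold:", symmetric)
print("asymmetric threshold:", asymmetric)
```

On this toy data the cost-weighted search selects a lower threshold than the symmetric one, accepting extra false positives in order to avoid the more expensive false negatives.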

Acknowledgement

Supported by: Kookmin University

References

  1. 김선웅, 안현철, "Support Vector Machines와 유전자 알고리즘을 이용한 지능형 트레이딩 시스템 개발", 지능정보연구, 16권 1호(2010), 71-92.
  2. 김성준, "의사결정나무에서 다중 목표변수를 고려한", 한국퍼지 및 지능시스템학회 추계학술대회 학술발표논문집, (2003), 243-46.
  3. 김수영, "다변량 판별분석과 로지스틱 회귀분석, 인공신경망 분석을 이용한 호텔 도산 예측", 한국관광학회지, 30권 2호(2006), 53-75.
  4. 김한성, 권영희, 차성덕, "SVM 기반의 신분위장 탐지기법", 정보보호학회논문지, 13권 5호(2003), 91-104.
  5. 박성갑, 통합보안관리를 위한 네트워크 기반의 국방 침입방지 시스템에 관한 연구, 석사학위논문, 연세대, 2005.
  6. 박정민, Support Vector Machine을 이용한 기업 부도예측, 석사학위논문, 한국과학기술원, 2003.
  7. 손태식, 서정우, 서정택, 문종섭, 최홍민, "Support Vector Machine 기반 TCP/IP 헤더의 은닉채널 탐지에 관한 연구", 정보보호학회논문지, 14권 1호(2004), 35-45.
  8. 심홍기, 김승권, "인공신경망을 이용한 대대전투간 작전지속능력 예측", 지능정보연구, 14권 3호(2008), 25-39.
  9. 안현철, 데이터마이닝을 활용한 인터넷 쇼핑몰의 상품 추천 시스템 개발, 석사학위논문, 한국과학기술원, 2002.
  10. 안현철, 김경재, 한인구, "효과적인 고객관계관리를 위한 사례기반추론 동시 최적화 모형", 지능정보연구, 11권 2호(2005a), 175-195.
  11. 안현철, 김경재, 한인구, "Support Vector Machine을 이용한 고객구매예측모형", 지능정보연구, 11권 3호(2005b), 69-81.
  12. 안현철, 이형용, "투자 의사결정 지원을 위한 유전자 알고리즘 기반의 다중 인공지능 기법 결합 모형", e-비즈니스연구, 10권 1호(2009), 267-288.
  13. 엄남경, 우성희, 이상호, "SVM과 의사결정트리를 이용한 혼합형 침입탐지 모델", 정보처리학회 논문지, 14권 1호(2007), 1-6.
  14. 엄남경, 우성희, 이상호, "SVM과 데이터마이닝을 이용한 혼합형 침입탐지 모델", 한국퍼지 및 지능시스템학회 춘계학술대회 학술발표논문집, 16권 1호(2006), 283-286.
  15. 이수용, 이일병, "Fuzzy 이론과 SVM을 이용한 KOSPI 200 지수 패턴분류기", 한국증권학회 제4차 정기학술발표회논문집, (2002), 787-809.
  16. 이승태, 김성신, "의사결정나무를 이용한 생물의 행동 패턴 구분과 인식", 한국퍼지 및 지능시스템학회 추계학술대회 학술발표논문집, 15권 2호(2005), 225-228.
  17. 이영찬, "인공신경망과 Support Vector Machine의 기업부도예측 성과 비교", 한국지능정보시스템학회 춘계학술대회논문집, (2004), 211-218.
  18. 이형용, "한국 주가지수 등락 예측을 위한 유전자 알고리즘 기반 인공지능 예측기법 결합모형", Entrue Journal of Information Technology, 7권 2호(2008), 33-43.
  19. 홍태호, 김진완, "데이터마이닝의 비대칭 오류비용을 이용한 지능형 침입탐지 시스템 개발", 정보시스템연구, 15권 4호(2006), 211-224.
  20. 홍태호, 김진완, "침입탐지 시스템이 비대칭 오류비용을 이용한 데이터마이닝의 적용전략", 한국지능정보시스템학회 추계학술대회논문집, (2005), 251-257.
  21. 홍태호, 김진완, 김유일, "데이터마이닝 기법을 활용한 침입탐지 시스템에 관한 연구", 대한산업공학회/한국경영과학회 춘계학술대회, SA7-10-SA7-13, 2004.
  22. 홍태호, 신택수, "Using Estimated Probability from Support Vector Machines for Credit Rating in IT Industry", 한국지능정보시스템학회-웹코리아포럼 공동추계정기학술대회, (2005), 509-515.
  23. Berry, M. J. A. and G. Linoff, Data Mining Techniques : For Marketing, Sales and Customer Support, Wiley Computer Publishing, 1997.
  24. Breiman, L., J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Chapman and Hall, New York, NY, 1984.
  25. Chen, R.-C., K.-F. Cheng, Y.-H. Chen, and C.-F. Hsieh, "Using Rough Set and Support Vector Machine for Network Intrusion Detection System", First Asian Conference on Intelligent Information and Database Systems, (2009), 465-470.
  26. Chen, W.-H., S.-H. Hsu, and H.-P. Shen, "Application of SVM and ANN for intrusion detection", Computers and Operations Research, Vol.32(2005), 2617-2634. https://doi.org/10.1016/j.cor.2004.03.019
  27. Debar, H., M. Becker, and D. Siboni, "A Neural Network Component for an Intrusion Detection System", Proceedings of 1992 IEEE Computer Society Symposium Research in Security and Privacy, (1992), 240-250.
  28. Fletcher, D. and E. Goss, "Forecasting with Neural networks and Application using Bankruptcy Data", Information and Management, Vol.24(1993), 159-167. https://doi.org/10.1016/0378-7206(93)90064-Z
  29. Hearst, M. A., S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, "Support vector machines", IEEE Intelligent Systems, Vol.13, No.4(1998), 18-28. https://doi.org/10.1109/5254.708428
  30. Joachims, T., "Text categorization with support vector machines", Proceedings of the European Conference on Machine Learning (ECML), (1998), 137-142.
  31. Joo, D., T. Hong, and I. Han, "The neural network models for IDS based on the asymmetric costs of false negative errors and false positive errors", Expert Systems with Applications, Vol.25(2003), 69-75. https://doi.org/10.1016/S0957-4174(03)00007-1
  32. Kass, G. V., "An Exploratory Technique for Investigating Large Quantities of Categorical Data", Applied Statistics, Vol.29, No.2(1980), 119-127. https://doi.org/10.2307/2986296
  33. Kim, K.-J., "Financial time series forecasting using support vector machines", Neurocomputing, Vol.55, No.1/2(2003), 307-319. https://doi.org/10.1016/S0925-2312(03)00372-2
  34. Kim, K.-J. and W. B. Lee, "Stock market prediction using artificial neural networks with optimal feature transformation", Neural Computing and Applications, Vol.13, No.3(2004), 255-260. https://doi.org/10.1007/s00521-004-0428-x
  35. Lee, S.-Y. and O.-S. Kim, "The network model for Detection Systems based on data mining and the false errors", International Journal of Fuzzy Logic and Intelligent Systems, Vol.6, No.2(2006), 173-177. https://doi.org/10.5391/IJFIS.2006.6.2.173
  36. Osuna, E., R. Freund, and F. Girosi, "Training support vector machines : an application to face detection", Proceedings of Computer Vision and Pattern Recognition, (1997), 130-136.
  37. Platt, J., "Probabilistic outputs for support vector machines and comparison to regularized likelihood methods", In A. J. Smola, P. L. Bartlett, B. Scholkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, Cambridge, MA: MIT Press, 2000.
  38. Quinlan, J. R., C4.5 : Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
  39. Sollich, P., "Bayesian Methods for Support Vector Machines : Evidence and Predictive Class Probabilities", Machine Learning, Vol.46, No.1/3(2002), 21-52. https://doi.org/10.1023/A:1012489924661
  40. Tay, F. E. J. and L. J. Cao, "Modified support vector machines in financial time series forecasting", Neurocomputing, Vol.48(2002), 847-861. https://doi.org/10.1016/S0925-2312(01)00676-2
  41. Vapnik, V., Statistical Learning Theory, Wiley, 1998.