Traffic Classification Using Machine Learning Algorithms in Practical Network Monitoring Environments

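  • 정광본 (Dept. of Computer Engineering, POSTECH)
  • 최미정 (Dept. of Computer Engineering, POSTECH)
  • 김명섭 (Dept. of Computer and Information Science, Korea University)
  • 원영준 (Dept. of Computer Engineering, POSTECH)
  • 홍원기 (Dept. of Computer Engineering, POSTECH)
  • Published: 2008.08.31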

Abstract

Traffic classification is shifting from payload-based and port-based methods to machine learning (ML) based methods, which can better cope with the dynamically changing characteristics of applications. However, current research on traffic classification with ML algorithms is carried out for offline environments. Specifically, most existing works report classification results obtained with cross validation as the test method, and they present those results on a per-flow basis. Such results are not useful for practical network traffic monitoring environments. This paper compares classification results obtained with cross validation against those obtained with split validation as the test method. It also compares flow-based classification results with byte-based classification results. We classify network traffic using various feature sets and ML algorithms such as J48, REPTree, RBFNetwork, Multilayer perceptron, BayesNet, and NaiveBayes. Finally, we identify the best feature set and the best ML algorithm for traffic classification under split validation.
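
The difference between the two test methods can be illustrated with a short sketch. It is only an illustration under stated assumptions. A nearest-centroid classifier stands in for the learner and is not one of the algorithms evaluated in the paper. Each flow is represented as a tuple of numeric features. Split validation is assumed to train on the first `train_ratio` fraction of flows in capture order and test on the remainder. All function names and the 0.66 default ratio are hypothetical. Cross validation, by contrast, shuffles all flows, so every test fold comes from the same pool of traffic as its training data.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Dict[str, Vector]


def train_centroids(X: Sequence[Vector], y: Sequence[str]) -> Model:
    """Stand-in classifier (hypothetical): the mean feature vector of each label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] += 1
    return {lab: tuple(s / counts[lab] for s in acc) for lab, acc in sums.items()}


def predict(model: Model, x: Vector) -> str:
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], x)))


def accuracy(model: Model, X: Sequence[Vector], y: Sequence[str]) -> float:
    """Flow-based accuracy: fraction of test flows labelled correctly."""
    hits = sum(predict(model, x) == label for x, label in zip(X, y))
    return hits / len(y)


def cross_validation(X: Sequence[Vector], y: Sequence[str], k: int = 10, seed: int = 0) -> float:
    """k-fold CV: shuffle all flows, use each fold once as the test set, then average."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = train_centroids([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k


def split_validation(X: Sequence[Vector], y: Sequence[str], train_ratio: float = 0.66) -> float:
    """Split validation: train on the first part of the trace, test on the rest.

    No shuffling is done. Assuming X is in capture order, the test flows are
    unseen flows that occur later in time than the training flows.
    """
    cut = int(len(y) * train_ratio)
    model = train_centroids(X[:cut], y[:cut])
    return accuracy(model, X[cut:], y[cut:])
```

Consider a synthetic trace whose web flows change their feature value halfway through. On that trace, `split_validation(..., train_ratio=0.5)` scores 0.5, because the model never sees the later web pattern during training.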

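Flow-based and byte-based results can diverge sharply when a few large flows carry most of the traffic volume. The following minimal sketch assumes a hypothetical `Flow` record holding the ground-truth application, the predicted application, and the flow's byte count. Flow accuracy counts correctly classified flows. Byte accuracy instead weights each flow by the number of bytes it carries.

```python
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """Hypothetical per-flow record produced by a classifier run."""
    true_app: str       # ground-truth application label
    predicted_app: str  # label assigned by the ML classifier
    total_bytes: int    # bytes carried by the flow


def flow_accuracy(flows: Sequence[Flow]) -> float:
    """Fraction of flows whose predicted label matches the true label."""
    if not flows:
        raise ValueError("no flows to evaluate")
    correct = sum(f.true_app == f.predicted_app for f in flows)
    return correct / len(flows)


def byte_accuracy(flows: Sequence[Flow]) -> float:
    """Fraction of all bytes that belong to correctly classified flows."""
    total = sum(f.total_bytes for f in flows)
    if total == 0:
        raise ValueError("no bytes to evaluate")
    correct = sum(f.total_bytes for f in flows if f.true_app == f.predicted_app)
    return correct / total


if __name__ == "__main__":
    flows = [
        Flow("web", "web", 2_000),
        Flow("web", "web", 2_800),
        Flow("dns", "dns", 200),
        Flow("p2p", "web", 95_000),  # one large misclassified transfer
    ]
    print(flow_accuracy(flows))  # 0.75
    print(byte_accuracy(flows))  # 0.05
```

In this example, three small flows are classified correctly and one 95,000-byte P2P flow is misclassified. Flow accuracy is therefore 0.75, but byte accuracy is only 0.05.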