Measure Correlation Analysis of Network Flow Based On Symmetric Uncertainty

  • Dong, Shi (Computer Science and Engineering, Southeast University)
  • Ding, Wei (Computer Science and Engineering, Southeast University)
  • Chen, Liang (Key Laboratory of Computer Network and Information Integration, Southeast University)
  • Received : 2011.10.10
  • Accepted : 2012.05.16
  • Published : 2012.06.30

Abstract

To improve the accuracy and universality of flow metric correlation analysis, this paper first analyzes the characteristics of Internet flow metrics as random variables and points out the shortcomings of the Pearson Correlation Coefficient, which current research uses to measure the correlation between two flow metrics. A method based on Symmetrical Uncertainty (SU) is then proposed to measure the correlation between two flow metrics and is extended to measure the correlation among multiple variables. Simulation and polynomial fitting are used to determine the threshold values that separate different degrees of correlation under the SU method. Statistical analysis of common flow metrics over several traces shows that Symmetrical Uncertainty not only reproduces the correct aspects of the Pearson Correlation Coefficient but also makes up for its shortcomings, and thus measures flow metric correlation quantitatively and accurately. The analysis also reveals the actual relationships among fourteen common flow metrics.
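To make the contrast between the two measures concrete, the following is a minimal sketch of Symmetrical Uncertainty in its standard form, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), next to a plain Pearson coefficient. It is not the paper's implementation: the function names, the toy data, and the assumption that the metrics are already discrete are all illustrative, and the paper's multi-variable extension and fitted thresholds are not reproduced here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # assumption: two constant variables count as uncorrelated
    hxy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy       # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: y depends on x deterministically but not linearly.
xs = list(range(-5, 6))
ys = [x * x for x in xs]

print("Pearson:", pearson(xs, ys))
print("SU:     ", symmetrical_uncertainty(xs, ys))
```

On this toy data the Pearson coefficient is zero because the dependence is symmetric and nonlinear, while SU is about 0.85, which illustrates the kind of shortcoming the abstract attributes to Pearson. Real flow metrics such as byte counts or durations are continuous or heavy-tailed, so they would need to be discretized (binned) before a sketch like this applies, and the result depends on that binning.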

References

  1. Kun-chan Lan, and John Heidemann, "A measurement study of correlations of Internet flow characteristics," Computer Networks, vol.50, no.1, pp.46-62, Jan.2006. https://doi.org/10.1016/j.comnet.2005.02.008
  2. Zhou Mingzhong. Study of Large-scale network IP flows behavior characteristics and measurement algorithms. Nanjing, Jiangsu: Southeast University, 2006.
  3. Gregor Maier, Anja Feldmann, Vern Paxson, and Mark Allman. "On dominant characteristics of residential broadband internet traffic," in Proc. of the 9th ACM SIGCOMM conference on Internet measurement conference, ACM, pp.90-102, 2009.
  4. Felix Hernandez Campos, J. S. Marron, Sidney I. Resnick, and Kevin Jeffay. "Extremal dependence: Internet Traffic Applications," Stochastic Models, vol.21, no.1, pp.1-35, 2005. https://doi.org/10.1081/STM-200046446
  5. Cheolwoo Park, Felix Hernandez-Campos, J. S. Marron, Kevin Jeffay, and F. Donelson Smith. "Analysis of dependence among size, rate, and duration in internet flows," Annals of Applied Statistics. 2010.
  6. C. Dewes, A. Wichmann, and A. Feldmann. "An analysis of internet chat systems," in Proc. of ACM SIGCOMM, pp.51-64, 2003.
  7. S. Saroiu, P. K. Gummadi, and S. D. Gribble. "A measurement study of peer-to-peer file sharing systems," in Proc. of Multimedia Computing and Networking 2002, pp.156-170, 2002.
  8. Kurt Tutschku. "A measurement-based traffic profile of the edonkey file-sharing service," in Proc. of the 5th annual Passive and Active Measurement Workshop, pp.12-21, 2004.
  9. Louis Plissonneau, Jean-Laurent Costeux, and Patrick Brown. "Analysis of Peer-to-Peer Traffic on ADSL," in Proc. of the 6th annual Passive and Active Measurements Workshop (PAM'05), Boston, USA, pp.69-82, 2005.
  10. Weinstein, Eric W. Correlation Coefficient. http://mathworld.wolfram.com/CorrelationCoefficient.html. 2005-5-25/2006-4-15.
  11. StatSoft, Inc. Basic Statistics. http://www.statsoft.com/textbook/basic-statistics/. 2009.
  12. M A Hall. "Correlation-based feature selection for discrete and numeric class machine learning," in Proc. of the 17th International Conference on Machine Learning, pp.359-366, 2000.
  13. Mark A. Hall. Correlation-based Feature Selection for Machine Learning. New Zealand: The University of Waikato. 1999.
  14. P. Mitra, C. A. Murthy, and S. K. Pal. "Unsupervised Feature Selection Using Feature Similarity," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, no.3, pp.301-312, 2002. https://doi.org/10.1109/34.990133
  15. Jing Yuan, Zhu Li, and Ruixi Yuan. "Information entropy based clustering method for unsupervised internet traffic classification," in Proc. of IEEE International Conference on Communications, pp.1588-1592, 2008.
  16. D. A. Bell and H. Wang. "A formalism for relevance and its application in feature subset selection," Machine Learning, vol.41, no.2, pp.175-195, 2000. https://doi.org/10.1023/A:1007612503587
  17. Lei Yu and Huan Liu. "Efficient feature selection via analysis of relevance and redundancy," Journal of Machine Learning Research, vol.5, no.2, pp.1205-1224, 2004.
  18. Qu, G., Hariri, S., and Yousif, M. "A new dependency and correlation analysis for features," IEEE Transactions on Knowledge and Data Engineering, vol.17, no.9, pp.1199-1207, Sep.2005. https://doi.org/10.1109/TKDE.2005.136
  19. T Ganchev, P Zervas, N Fakotakis, and G Kokkinakis. "Benchmarking feature selection techniques on the speaker verification task," in Proc. of CSNDSP'06, pp.314-318, 2006.
  20. Qu Wei, Zhu Shibing et al. The Information Theory and Applications, Beijing: Tsinghua University Press. 2005.
  21. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C, London: Cambridge University Press, 1988.
  22. CAIDA. NLANR PMA. http://pma.nlanr.net. 2002-09-11/2005-04.
  23. Ye Cinan and Cao Weili. Mathematical Statistics with Applications, Mechanical Industry Press. 2004.
  24. Shuang Hong Yang and Bao-Gang Hu. "Discriminative feature selection by nonparametric bayes error minimization," IEEE Transactions on Knowledge and Data Engineering, Apr.2011.
  25. Huawen Liu, Jigui Sun, Lei Liu, and Huijie Zhang, "Feature selection with dynamic mutual information," Pattern Recognition, vol.42, no.7, pp.1330-1339, Jul.2009. https://doi.org/10.1016/j.patcog.2008.10.028