Correlation Analysis of Event Logs for System Fault Detection

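  • 박주원 (Supercomputing Division, Korea Institute of Science and Technology Information) ;
  • 김은혜 (Hyper-connected Communication Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • 염재근 (Supercomputing Division, Korea Institute of Science and Technology Information) ;
  • 김성호 (Supercomputing Division, Korea Institute of Science and Technology Information)
  • Received : 2016.04.18
  • Reviewed : 2016.06.17
  • Published : 2016.06.30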

Abstract

To identify the cause of an error and keep a system healthy, administrators usually analyze event log data, which contains information useful for inferring that cause. However, because today's systems are large and complex, it is almost impossible for administrators to analyze event log files manually. In particular, OpenStack, which is widely used as a cloud management system, runs various service modules linked across multiple servers, so when an error occurs it is hard to access each node and analyze the event log messages of each service module. To address this, we propose a novel message-based log analysis method that enables administrators to find the cause of an error quickly. Specifically, the proposed method 1) consolidates event log data generated at the system level and the application service level, 2) clusters the consolidated data based on messages, and 3) analyzes interrelations among message groups in order to promptly identify the cause of a system error. This study is significant in three respects. First, the root cause of an error can be identified by collecting event logs at both the system level and the application service level and analyzing the interrelations among them. Second, administrators do not need to classify messages for training, because unsupervised learning is applied to the event log messages. Third, using Dynamic Time Warping, an algorithm for measuring the similarity of patterns that vary over time, increases the accuracy of analysis on patterns generated by a distributed system whose clocks are not exactly synchronized.
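
The role of Dynamic Time Warping in the third point can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`dtw_distance`, `pointwise_distance`) and the per-interval message counts are hypothetical, chosen only to show how DTW tolerates a small time offset between two message groups that a point-by-point comparison would penalize.

```python
def dtw_distance(a, b):
    """Textbook dynamic-programming DTW using absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def pointwise_distance(a, b):
    """Sum of absolute differences at identical time indices (no warping)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical message counts per time interval for three message groups.
group_a = [0, 0, 5, 9, 5, 0, 0, 0]  # burst of messages on one node
group_b = [0, 0, 0, 5, 9, 5, 0, 0]  # same burst, logged one interval later
group_c = [3, 3, 3, 3, 3, 3, 3, 3]  # steady background messages

print("pointwise a vs b:", pointwise_distance(group_a, group_b))
print("DTW       a vs b:", dtw_distance(group_a, group_b))
print("DTW       a vs c:", dtw_distance(group_a, group_c))
```

In this toy data, the shifted burst in `group_b` aligns with `group_a` at a lower DTW cost than the unrelated `group_c`, whereas the pointwise comparison treats the one-interval offset as a large difference. This is the kind of clock skew between nodes that the abstract refers to.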
