On the Significance of Domain-Specific Pretrained Language Models for Log Anomaly Detection

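Korean title: 로그 이상 탐지를 위한 도메인별 사전 훈련 언어 모델 중요성 연구 (A Study on the Significance of Domain-Specific Pretrained Language Models for Log Anomaly Detection)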

  • Lelisa Adeba Jilcha (ISAA Lab., Dept. of AI Convergence Network, Ajou University);
  • Deuk-Hun Kim (Inst. for Computing and Informatics Research, Ajou University);
  • Jin Kwak (Dept. of AI Convergence Network, Ajou University)
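  • 레리사 아데바 질차 (Dept. of AI Convergence Network, Information Security Application and Assurance Lab., Ajou University);
  • 김득훈 (Software Convergence Research Institute, Ajou University);
  • 곽진 (Dept. of Cyber Security, Ajou University)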
  • Published: 2024.05.23

Abstract

Pretrained language models (PLMs) are widely used to improve the performance of log anomaly detection systems. Their effectiveness stems from their ability to extract meaningful semantic information from log messages, which strengthens detection. However, discrepancies in the distribution of log messages across datasets hinder the development of robust, generalizable detection systems. This study investigates the structural and distributional variation across several log message datasets and underscores the crucial role of domain-specific PLMs in overcoming this challenge and building robust, generalizable solutions.
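
One simple way to make "distributional variation across log datasets" concrete, assumed here for illustration and not taken from the paper, is to mask variable fields (IP addresses, hex values, numbers) with placeholders, build a token-frequency distribution for each dataset, and compare two datasets by vocabulary overlap (Jaccard index) and Jensen-Shannon divergence. The masking rules, the helper names `tokenize`, `token_distribution`, `vocab_overlap` and `js_divergence`, and the sample log lines are all hypothetical; this is a sketch, not the authors' method or data.

```python
import math
import re
from collections import Counter

# Hypothetical masking rules: replace variable fields with placeholders so that
# mostly the structural (template) tokens of each log message remain.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),  # IPv4, optional port
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),                   # hex literals
    (re.compile(r"\b\d+\b"), "<NUM>"),                              # standalone integers
]


def tokenize(line: str) -> list[str]:
    """Mask variable fields, then keep placeholders and alphabetic words.

    Digits and punctuation that survive masking are dropped.
    """
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return re.findall(r"<[A-Z]+>|[A-Za-z_]+", line)


def token_distribution(lines: list[str]) -> dict[str, float]:
    """Relative frequency of each token over all lines of one dataset."""
    counts = Counter(tok for line in lines for tok in tokenize(line))
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def vocab_overlap(p: dict[str, float], q: dict[str, float]) -> float:
    """Jaccard index of the two vocabularies (1 = identical token sets)."""
    return len(p.keys() & q.keys()) / len(p.keys() | q.keys())


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    vocab = p.keys() | q.keys()
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_m(a: dict[str, float]) -> float:
        # Every token in `a` has a[t] > 0, so m[t] > 0 and the log is defined.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Made-up lines in the style of two different systems (not real dataset records).
system_a = [
    "Receiving block blk_42 src: /10.0.0.1:5000 dest: /10.0.0.2:50010",
    "PacketResponder 1 for block blk_42 terminating",
]
system_b = [
    "instruction cache parity error corrected",
    "generating core 2275",
]

p_a, p_b = token_distribution(system_a), token_distribution(system_b)
print(f"vocabulary overlap: {vocab_overlap(p_a, p_b):.3f}")  # vocabulary overlap: 0.059
print(f"JS divergence (bits): {js_divergence(p_a, p_b):.3f}")
```

Low overlap and a divergence close to 1 bit suggest that a model pretrained on one system's logs would meet mostly unfamiliar tokens in the other. This offers one intuition for why a PLM adapted to the target log domain may generalize better than a general-purpose one.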

Acknowledgement

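This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (Ministry of Science and ICT, MSIT) in 2024 (No. 2021-0-01806, Development of Security-by-Design and Security Management Technology for Smart Factories).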
