An Evaluation Study on Artificial Intelligence Data Validation Methods and Open-source Frameworks

A Study on the Analysis of Artificial Intelligence Data Quality Validation Technologies and Open-source Frameworks

  • Yun, Changhee (AI Future Strategy Center, National Information-Society Agency)
  • Shin, Hokyung (School of Computer Science and Engineering, Kyungpook National University)
  • Choo, Seung-Yeon (School of Architecture, Kyungpook National University)
  • Kim, Jaeil (School of Computer Science and Engineering, Kyungpook National University)
  • Received : 2021.09.23
  • Accepted : 2021.10.19
  • Published : 2021.10.30


In this paper, we investigate automated techniques for validating artificial intelligence training data and review open-source frameworks, such as Google's TensorFlow Data Validation (TFDV), that support automated data validation in the AI model development process. We also present an experimental study on public datasets that demonstrates the effectiveness of an open-source data validation framework. In particular, we present experimental results for the schema-testing functions of data validation and discuss the limitations of current open-source frameworks when applied to semantic data. Finally, we introduce recent studies on semantic data validation using machine learning techniques.
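Frameworks such as TFDV infer a schema from training data and then check new data against it, reporting mismatches as anomalies. The following minimal Python sketch illustrates the idea of schema testing without depending on TFDV. The dict-based schema format and the `infer_schema` and `validate` helpers are hypothetical simplifications for illustration, not TFDV's API, and the sketch assumes each feature has a single consistent type in the training data.

```python
from typing import Any

# Hypothetical schema format (not TFDV's): feature name -> {"type": str, "domain": set | None}
Schema = dict[str, dict[str, Any]]


def infer_schema(rows: list[dict[str, Any]]) -> Schema:
    """Infer a simple schema: one type per feature (taken from its first
    occurrence) and the set of observed values for string features."""
    schema: Schema = {}
    for row in rows:
        for name, value in row.items():
            entry = schema.setdefault(name, {"type": type(value).__name__, "domain": set()})
            if isinstance(value, str):
                entry["domain"].add(value)
    # Non-string (e.g. numeric) features get no categorical domain.
    for entry in schema.values():
        if entry["type"] != "str":
            entry["domain"] = None
    return schema


def validate(rows: list[dict[str, Any]], schema: Schema) -> list[str]:
    """Return human-readable anomalies: missing or unexpected features,
    type mismatches, and string values outside the inferred domain."""
    anomalies: list[str] = []
    for i, row in enumerate(rows):
        for name in schema:
            if name not in row:
                anomalies.append(f"row {i}: missing feature '{name}'")
        for name, value in row.items():
            if name not in schema:
                anomalies.append(f"row {i}: unexpected feature '{name}'")
                continue
            expected = schema[name]
            actual_type = type(value).__name__
            if actual_type != expected["type"]:
                anomalies.append(f"row {i}: '{name}' has type {actual_type}, expected {expected['type']}")
            elif expected["domain"] is not None and value not in expected["domain"]:
                anomalies.append(f"row {i}: '{name}' has unexpected value {value!r}")
    return anomalies


# Illustrative toy data (not from the paper's experiments).
train = [
    {"age": 34, "city": "Daegu"},
    {"age": 51, "city": "Seoul"},
]
serving = [
    {"age": "forty", "city": "Seoul"},  # wrong type
    {"age": 29, "city": "Busan"},       # value outside the training domain
    {"city": "Daegu"},                  # missing feature
]

if __name__ == "__main__":
    schema = infer_schema(train)
    for message in validate(serving, schema):
        print(message)
```

Running the script prints one anomaly per serving row: a type mismatch for `age` in row 0, the unseen value `'Busan'` in row 1, and the missing `age` feature in row 2. A row such as `{"age": 340, "city": "Seoul"}` passes both checks, which illustrates why schema-level tests alone cannot catch semantically invalid values.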