Research on the development of automated tools to de-identify personal information of data for AI learning - Based on video data -

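  • 이현주 (Dongguk University, Department of Technology Entrepreneurship)
  • 이승엽 (Dongguk University, Department of Technology Entrepreneurship)
  • 전병훈 (Dongguk University, Department of Technology Entrepreneurship)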
  • Received : 2023.05.08
  • Accepted : 2023.06.21
  • Published : 2023.06.30

Abstract

Recently, the de-identification of personal information, long sought by the data-driven industry, was codified when the three data laws[1] were amended in August 2020. The amendment laid the groundwork for wider industrial use of data, often called the crude oil[2] of the Fourth Industrial Revolution. However, some remain concerned that personally non-identifiable information could still infringe the basic rights of data subjects[3]. In response, this study developed the Batch De-Identification Tool, an automated tool for de-identifying personal information. First, we developed an image labeling tool for building training data, used to label human faces (eyes, nose, mouth), car license plates at various resolutions, and similar objects. Second, we trained an object recognition model and ran it as an object recognition module that performs the de-identification. The resulting tool shows that elements that may violate privacy can be removed in advance through an online service. These results suggest that data-driven industries can maximize the value of data while balancing the protection and use of personal information.
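The abstract does not state how the Batch De-Identification Tool masks a detected region, so the following is only a minimal sketch of the final step, assuming pixelation (mosaic) as the masking method. The names `pixelate_region` and `deidentify_frame`, the `(x, y, width, height)` box format, and the grayscale list-of-rows frame are all hypothetical choices made for illustration; the bounding boxes stand in for the output of an object recognition module such as the one described above.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale frame as rows of 0-255 values (assumed format)
Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels (assumed format)


def pixelate_region(frame: Image, box: Box, block: int = 4) -> None:
    """Replace each block x block tile inside `box` with its mean value, in place."""
    if not frame or not frame[0]:
        return
    height, width = len(frame), len(frame[0])
    x0, y0, w, h = box
    # Clip the box to the frame so partially visible objects are still masked.
    x1, y1 = min(x0 + w, width), min(y0 + h, height)
    x0, y0 = max(x0, 0), max(y0, 0)
    for ty in range(y0, y1, block):
        for tx in range(x0, x1, block):
            rows = range(ty, min(ty + block, y1))
            cols = range(tx, min(tx + block, x1))
            tile = [frame[r][c] for r in rows for c in cols]
            mean = sum(tile) // len(tile)
            for r in rows:
                for c in cols:
                    frame[r][c] = mean


def deidentify_frame(frame: Image, detections: List[Box], block: int = 4) -> Image:
    """Return a copy of `frame` with every detected region pixelated."""
    masked = [row[:] for row in frame]
    for box in detections:
        pixelate_region(masked, box, block)
    return masked
```

For video, `deidentify_frame` would be applied to each frame with that frame's detections (faces, license plates). Working on a copy keeps the original frame available for comparison; a real pipeline would more likely operate on color arrays with an image library, which this sketch deliberately avoids.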

References

  1. S. D. Moon, "A study on the reformation of personal information protection law in the era of the 4th industrial revolution: focused on three data laws", Master's thesis, Graduate School of Public Policy, Hanyang University, 2021.
  2. J. H. Lee, "Crude oil of the 4th industrial revolution, 'data specialist company' to lead the revitalization of the data economy, on-site visit and conference held", Ministry of Science and ICT, press release, Apr. 2018.
  3. S. J. Jong, "Legal review on protection and use of the personally non-identifiable information", 2010.
  4. D. H. Kim, S. S. Kim, "A New Scheme for Risk Assessment Based on Data Context for De-Identification of Personal Information", Journal of The Korea Institute of Information Security & Cryptology, Vol. 30, No. 4, Aug. 2020.
  5. J. H. Jang, Y. J. Gim, "Policy direction for improving the effectiveness of AI learning data business", NIA, IT & Future Strategy Report, 2020.
  6. W. J. Moon and 7 others, "Effect of Machine Learning Education Focused on Data Labeling on Computational Thinking of Elementary School Students", Journal of The Korean Association of Information Education, Vol. 25, No. 2, pp. 327-335, Apr. 2021. https://doi.org/10.14352/jkaie.2021.25.2.327
  7. H. C. Yang and 4 others, "A Guide to Using Personal Information De-identification Technology for Big Data Utilization ver 1.0", NIA, 2015.
  8. S. T. Oh and 6 others, "Video data anonymization technology and evaluation method", NIA, 2019.
  9. H. W. Jung, "Support to strengthen data privacy protection for artificial intelligence (AI) learning", Personal Information Protection Committee, press release, Jun. 2021.
  10. M. Wang, W. Deng, "Deep Face Recognition: A Survey", Neurocomputing, Vol. 429, pp. 215-244, Mar. 2021. https://doi.org/10.1016/j.neucom.2020.10.081
  11. J. Deng, J. Guo, N. Xue, S. Zafeiriou, "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2019.
  12. MAXTED Co., Ltd, "Web-based artificial intelligence data labeling service: Max Data Platform", 2021.
  13. A. Pena and 4 others, "Facial Expressions as a Vulnerability in Face Recognition", 2020.