• Title/Summary/Keyword: Data de-identification

De-identification of Medical Information and Issues (의료정보 비식별화와 해결과제)

  • Woo, SungHee
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.552-555 / 2017
  • De-identification emerged as a way to balance the use of big data against the protection of personal information. In the medical field in particular, which handles a wide range of quasi-identifiers and sensitive information, de-identification must be performed before medical consultation data such as EMR, voice, KakaoTalk, and SNS records can be used. However, there is neither a separate law for medical information protection nor legislation for de-identification. This study therefore presents the current status of personal information de-identification, the status and cases of medical information de-identification, and finally issues and solutions for medical information protection and de-identification.

Secure De-identification and Data Sovereignty Management of Decentralized SSI using Restructured ZKP (재구성된 영지식 증명을 활용한 탈중앙형 자기 주권 신원의 안전한 비식별화 및 데이터 주권 관리)

  • Cho, Kang-Woo;Jeon, Mi-Hyeon;Shin, Sang Uk
    • Journal of Digital Convergence / v.19 no.8 / pp.205-217 / 2021
  • Decentralized SSI (Self-Sovereign Identity) has emerged as an alternative digital identity solution, but no efficient de-identification technique has been proposed for it because of the unique algorithmic characteristics of its data transactions. To preserve the decentralized operation of SSI, this study proposes a de-identification technique that does not remove identifiers; instead, it restructures the verification results of a ZKP (Zero-Knowledge Proof) into a form that the verifier can provide to external parties. In addition, by proposing the concept of differential sovereignty management for each entity participating in verification, the model makes it possible to provide restructured de-identified data without the consent of the data subject. As a result, the proposed model satisfies the domestic personal information protection law in a decentralized SSI and also provides secure and efficient de-identification processing and sovereignty management.

A study on the method of measuring the usefulness of De-Identified Information using Personal Information

  • Kim, Dong-Hyun
    • Journal of the Korea Society of Computer and Information / v.27 no.6 / pp.11-21 / 2022
  • Although interest in de-identification measures for the safe use of personal information is growing at home and abroad, de-identified information continues to be re-identified through insufficient de-identification measures and inference. To address these problems and discover new de-identification technologies, competitions on the safety and usefulness of de-identified information are being held in Korea and Japan. This paper analyzes the safety and usefulness indicators used in these competitions, and proposes and verifies new indicators that can measure usefulness more efficiently. Although verification with a large population was not possible because of a significant shortage of experts in mathematics and statistics in the de-identification field, very positive results were obtained for the necessity and validity of the new indicators. To safely utilize the vast amount of public data in Korea as de-identified information, research on these usefulness metrics should continue, and this paper is expected to be a starting point for more active research.

A Study on Impacts of De-identification on Machine Learning's Biased Knowledge (머신러닝 편향성 관점에서 비식별화의 영향분석에 대한 연구)

  • Soohyeon Ha;Jinsong Kim;Yeeun Son;Gaeun Won;Yujin Choi;Soyeon Park;Hyung-Jong Kim;Eunsung Kang
    • Journal of the Korea Society for Simulation / v.33 no.2 / pp.27-35 / 2024
  • We aimed to shed light on how biases inherent in the datasets used to train artificial intelligence (AI) models affect the models' predictions and thereby perpetuate societal disparities. To examine the influence of data bias on AI models, we constructed an original dataset containing biases related to gender wage gaps and then created a de-identified version of it. Using the decision tree algorithm, we compared the outputs of AI models trained on the original and de-identified datasets to analyze how data de-identification affects bias in the models' results. Our goal was to highlight the significant role of data de-identification not only in safeguarding individual privacy but also in addressing biases within the data.

Research on Artificial Intelligence Based De-identification Technique of Personal Information Area at Video Data (영상데이터의 개인정보 영역에 대한 인공지능 기반 비식별화 기법 연구)

  • In-Jun Song;Cha-Jong Kim
    • IEMEK Journal of Embedded Systems and Applications / v.19 no.1 / pp.19-25 / 2024
  • This paper proposes an artificial intelligence-based method for optimizing the detection of personal information areas on an embedded system, in order to de-identify personal information in video data. For detection optimization, a gyro sensor first records the shooting angle when the image is acquired, and the image is converted into a horizontal image using that angle, which increases the detection rate for personal information areas. On this basis, learning models were created for different training image resolutions and different learning methods of the learning engine, and the optimal learning model was selected and evaluated experimentally. For de-identification, a shuffling-based masking method was used, and the masking information was encrypted with a double-key scheme to prevent restoration by others. So that the original image can be reused, it can be restored with a security key. This secured a high level of security for personal information areas and improved usability through original image restoration. The results are expected to contribute to industrial use of data without personal information leakage and to reducing the cost of personal information protection in industries that use video, through de-identification of the personal information areas included in video data.
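
The idea of shuffling-based masking that can be undone with a key can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flat list standing in for a detected image region, and the integer key are all assumptions made for illustration, and Python's `random.Random` is not a cryptographic generator. The double-key encryption of the masking information described in the abstract is not modeled here.

```python
import random

def _permutation(length, key):
    """Derive a repeatable permutation of range(length) from a key (hypothetical helper)."""
    order = list(range(length))
    random.Random(key).shuffle(order)  # same key -> same permutation
    return order

def shuffle_region(pixels, key):
    """Mask a region by reordering its pixel values with a key-derived permutation."""
    order = _permutation(len(pixels), key)
    return [pixels[i] for i in order]

def restore_region(shuffled, key):
    """Invert shuffle_region when the same key is supplied."""
    order = _permutation(len(shuffled), key)
    restored = [None] * len(shuffled)
    for position, source_index in enumerate(order):
        restored[source_index] = shuffled[position]
    return restored

# A flat list stands in for the pixels of one detected personal information area.
region = list(range(16))
secret_key = 20240119  # hypothetical key value
masked = shuffle_region(region, secret_key)
print(masked)
print(restore_region(masked, secret_key) == region)
```

The design point the sketch shows is that the masked region keeps every original value, only in a different order, so holding the key is sufficient to recover the original image region exactly.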

A Study on Metering Data De-identification Method for Smart Grid Privacy Protection (스마트그리드 개인정보보호를 위한 미터링 데이터 비식별화 방안 연구)

  • Lee, Donghyeok;Park, Namje
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.6 / pp.1593-1603 / 2016
  • In the smart grid environment, there are various security threats. In particular, exposure of smart meter data can lead to serious privacy violations. In this paper, we propose a de-identification method for metering data. The proposed method de-identifies the time data and the numeric data separately, so pattern information cannot be analyzed from the metering data. It also has the advantage that queries, such as range searches in the database for statistical analysis, remain available.
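
As a rough illustration of treating time data and numeric data separately while keeping range queries usable, the sketch below applies plain generalization (bucketing) to each field. The abstract does not describe the paper's actual scheme, so the bucketing approach, the function names, and the bucket sizes are assumptions chosen only to make the idea concrete.

```python
from datetime import datetime

def generalize_time(timestamp, bucket_hours=6):
    """Coarsen a timestamp to the start of its bucket (assumed 6-hour buckets)."""
    hour = (timestamp.hour // bucket_hours) * bucket_hours
    return timestamp.replace(hour=hour, minute=0, second=0, microsecond=0)

def generalize_reading(kwh, bucket_width=0.5):
    """Replace an exact reading with the [lower, upper) range that contains it."""
    lower = (kwh // bucket_width) * bucket_width
    return (lower, lower + bucket_width)

def count_in_range(buckets, low, high):
    """Range query over generalized readings: count buckets lying inside [low, high]."""
    return sum(1 for lower, upper in buckets if lower >= low and upper <= high)

# Hypothetical readings in kWh.
readings = [0.2, 1.37, 1.4, 2.9]
buckets = [generalize_reading(value) for value in readings]

print(generalize_time(datetime(2016, 12, 1, 14, 37, 5)))
print(buckets)
print(count_in_range(buckets, 1.0, 2.0))
```

Exact values and fine-grained times are no longer stored, which limits pattern analysis, while aggregate range searches over the buckets still work.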

The De-identification Technique Using Data Grouping in Relational Database (관계형 데이터베이스에서 데이터 그룹화를 이용한 익명화 처리 기법)

  • Park, Jun-Bum;Jin, Seung-Hun;Choi, Daeseon
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.3 / pp.493-500 / 2015
  • Personal information exposed on the Internet is increasing with the opening and sharing of public data, the growth of SNS (Social Network Service), and the growing amount of information shared between users. Exposed personal information can be used against targeted users through linkage attacks or background knowledge attacks. De-identification models appeared a few years ago to prevent these attacks: 'k-anonymity' was introduced first, 'ℓ-diversity' and 't-closeness' followed, and diverse algorithms continue to be proposed to improve performance. However, industry and the public sector need a complete system for the de-identification process rather than better performance from a de-identification algorithm. This paper explains a de-identification technique for the 'k-anonymity', 'ℓ-diversity', and 't-closeness' algorithms using a QI (Quasi-Identifier) grouping method in a relational database.
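
Grouping by quasi-identifiers is the common basis for checking k-anonymity and ℓ-diversity, and a minimal sketch makes the definitions concrete. This is not the paper's system: the records, column names, and function names are hypothetical, and in a relational database the same grouping would typically be expressed as a GROUP BY over the QI columns.

```python
from collections import defaultdict

def group_by_quasi_identifiers(rows, qi_columns):
    """Build equivalence classes: rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in qi_columns)
        groups[key].append(row)
    return dict(groups)

def is_k_anonymous(rows, qi_columns, k):
    """True if every equivalence class contains at least k rows."""
    groups = group_by_quasi_identifiers(rows, qi_columns)
    return all(len(members) >= k for members in groups.values())

def is_l_diverse(rows, qi_columns, sensitive_column, l):
    """True if every equivalence class has at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(rows, qi_columns)
    return all(
        len({row[sensitive_column] for row in members}) >= l
        for members in groups.values()
    )

# Hypothetical, already generalized records.
records = [
    {"age_band": "30-39", "zip_prefix": "130", "diagnosis": "flu"},
    {"age_band": "30-39", "zip_prefix": "130", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip_prefix": "148", "diagnosis": "flu"},
    {"age_band": "40-49", "zip_prefix": "148", "diagnosis": "flu"},
]
QI_COLUMNS = ["age_band", "zip_prefix"]

print(is_k_anonymous(records, QI_COLUMNS, 2))           # True: both groups have 2 rows
print(is_l_diverse(records, QI_COLUMNS, "diagnosis", 2))  # False: one group only contains "flu"
```

The example data is 2-anonymous but not 2-diverse, which shows why ℓ-diversity was introduced as a follow-up to k-anonymity: a group can be large enough yet still reveal the sensitive value of everyone in it.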

A study on Data Context-Based Risk Measurement Method for Pseudonymized Information Processing

  • Kim, Dong-Hyun
    • Journal of the Korea Society of Computer and Information / v.27 no.6 / pp.53-63 / 2022
  • Recently, as digital transformation accelerates due to the COVID-19 pandemic, large quantities of data are being used to improve individual quality of life, and stronger de-identification procedures are required to utilize personal information, the most valuable of that data. In Korea, procedures for de-identification measures have been set out through amendments to laws and guidelines, but no methodology exists to measure the level of de-identification in the field because processing standards are ambiguous and risk measurement methods are subjective. This paper compares and analyzes the current status of policies and guidelines on de-identification measures at home and abroad to identify points for improvement, proposes a data context-based risk measurement method centered on pseudonymized information processing, and verifies its validity. Verification through a Delphi survey and a focus group interview (FGI) confirmed that both the need for the proposed methodology and the validity of its indicators were high.

Multi-type object detection-based de-identification technique for personal information protection (개인정보보호를 위한 다중 유형 객체 탐지 기반 비식별화 기법)

  • Ye-Seul Kil;Hyo-Jin Lee;Jung-Hwa Ryu;Il-Gu Lee
    • Convergence Security Journal / v.22 no.5 / pp.11-20 / 2022
  • As Internet and web technology develops around mobile devices, image data has come to contain various types of sensitive information such as people, text, and places. In addition, as the use of SNS increases, the damage caused by exposure and abuse of personal information online is growing. However, research on de-identification technology based on multi-type object detection for personal information protection is insufficient. This paper therefore proposes an artificial intelligence model that detects and de-identifies multiple types of objects by running existing single-type object detection models in parallel. Using cutmix, images in which person and text objects appear together were created and used as training data, and detection and de-identification were performed on person and text objects, which have different characteristics. The proposed model achieves a precision of 0.724 and mAP@.5 of 0.745 when two objects are present at the same time. After de-identification, mAP@.5 was 0.224 for all objects, a decrease of 0.4 or more.
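
The core of a cutmix-style composition, pasting a rectangular region from one image into another so that two object types appear in one training image, can be sketched as follows. This is a simplified illustration only: the nested lists standing in for images, the function name, and the box convention are assumptions, and the paper's actual data pipeline and annotation handling are not described in the abstract.

```python
def cutmix_paste(base_image, patch_image, box):
    """Return a copy of base_image with the pixels inside box taken from patch_image.

    box is (top, left, bottom, right) with bottom and right exclusive (assumed convention).
    Both images are lists of rows with the same dimensions.
    """
    top, left, bottom, right = box
    mixed = [row[:] for row in base_image]  # copy so the input is not modified
    for y in range(top, bottom):
        mixed[y][left:right] = patch_image[y][left:right]
    return mixed

# "P" marks pixels from an image containing a person, "T" from one containing text.
person_image = [["P"] * 4 for _ in range(4)]
text_image = [["T"] * 4 for _ in range(4)]

mixed_image = cutmix_paste(person_image, text_image, (0, 2, 2, 4))
for row in mixed_image:
    print("".join(row))
```

In this toy output the top-right quadrant comes from the text image and the rest from the person image, so a single sample now contains both object types.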

A study on the policy of de-identifying unstructured data for the medical data industry (의료 데이터 산업을 위한 비정형 데이터 비식별화 정책에 관한 연구)

  • Sun-Jin Lee;Tae-Rim Park;So-Hui Kim;Young-Eun Oh;Il-Gu Lee
    • Convergence Security Journal / v.22 no.4 / pp.85-97 / 2022
  • With the development of big data technology, we are rapidly entering a hyperconnected intelligent society in which data accelerates innovative growth in all industries. The convergence industry, which holds and utilizes various high-quality data, is becoming a new growth engine, and big data is being fused into various traditional industries. In the medical field in particular, structured data such as electronic medical records and unstructured medical data such as CT and MRI images are used together to increase the accuracy of disease prediction and diagnosis. The importance and volume of unstructured data in the medical industry are increasing day by day, but conventional data security technologies and policies are oriented toward structured data and give insufficient consideration to the security and utilization of unstructured data. For medical treatment using big data to become widespread, data diversity and security must be built in and organically linked at the stages of data construction, distribution, and utilization. This paper analyzes the current status of domestic and foreign data security systems and technologies. It then proposes adding unstructured data-centered de-identification technology to the guidelines for unstructured data and technology application cases in the industry so that unstructured data can be actively used in the medical field, and establishing standards for judging personal information in unstructured data. Furthermore, an object feature-based identification ID that can be used for unstructured data without infringing on personal information is proposed.