• Title/Summary/Keyword: statistical disclosure control


Release of Microdata and Statistical Disclosure Control Techniques (마이크로데이터 제공과 통계적 노출조절기법)

  • Kim, Kyu-Seong
    • Communications for Statistical Applications and Methods / v.16 no.1 / pp.1-11 / 2009
  • When microdata are released to users, record-by-record data are disclosed, and some risk of disclosing respondents' information is inevitable. Statistical disclosure control techniques are statistical tools to reduce the risk of disclosure as well as to increase data utility when data are released. In this paper, we reviewed the concepts of disclosure and disclosure risk as well as statistical disclosure control techniques, and then investigated strategies for selecting a statistical disclosure control technique in relation to data utility. The risk-utility frontier map method was illustrated as an example. Finally, we listed some checkpoints for each step of a microdata release.

Application of a Statistical Disclosure Control Techniques Based on Multiplicative Noise (승법잡음모형을 이용한 통계적 노출조절기법의 적용)

  • Kim, Young-Won;Kim, Tae-Yeon;Ki, Kye-Nam
    • The Korean Journal of Applied Statistics / v.24 no.1 / pp.127-136 / 2011
  • The multiplicative noise model is one of the popular methods for masking continuous variables. In this paper, we propose a transformation of the variable to which random noise has been multiplied. An advantage of the masking method using the proposed transformation is that users of the masked data can obtain unbiased estimates of the mean and variance of the original (unmasked) data. We also consider the data utility and the correlation structure of the variables when we apply the proposed multiplicative noise scheme. To investigate the properties of the masking method based on multiplicative noise, a simulation study was conducted using the 2008 Householder Income and Expenditure Survey data.
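
As a rough illustration of the idea in the abstract above, the following sketch masks a continuous variable with multiplicative noise and then recovers estimates of the original mean and variance from the masked data alone. It is not the transformation proposed in the paper, which the abstract does not specify. It uses a textbook moment correction and assumes the noise is independent of the data, has mean 1, and has a standard deviation known to the data user. The function names, the lognormal "income" data, and the noise level are all hypothetical.

```python
import random
import statistics


def mask_multiplicative(values, noise_sd, rng):
    """Multiply each value by independent Gaussian noise with mean 1."""
    return [x * rng.gauss(1.0, noise_sd) for x in values]


def estimate_original_moments(masked, noise_sd):
    """Estimate the original mean and variance from masked values.

    With y = x * e, E[e] = 1 and Var(e) = noise_sd**2 (e independent of x):
      E[y]   = E[x]
      E[y^2] = E[x^2] * (1 + noise_sd**2)
    so Var(x) can be estimated as E[y^2] / (1 + noise_sd**2) - E[y]^2.
    """
    n = len(masked)
    mean_y = sum(masked) / n
    second_moment_y = sum(y * y for y in masked) / n
    second_moment_x = second_moment_y / (1.0 + noise_sd ** 2)
    return mean_y, second_moment_x - mean_y ** 2


# Made-up skewed data standing in for an income variable.
rng = random.Random(0)
original = [rng.lognormvariate(10.0, 0.5) for _ in range(20000)]

masked = mask_multiplicative(original, noise_sd=0.1, rng=rng)
est_mean, est_var = estimate_original_moments(masked, noise_sd=0.1)

print("original mean/variance:", statistics.fmean(original), statistics.pvariance(original))
print("estimated from masked: ", est_mean, est_var)
```

The correction only works if the noise variance is published along with the masked file, and publishing it is itself a risk-utility decision.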

Statistical disclosure control for public microdata: present and future (마이크로데이터 공표를 위한 통계적 노출제어 방법론 고찰)

  • Park, Min-Jeong;Kim, Hang J.
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1041-1059 / 2016
  • The increasing demand from researchers and policy makers for microdata has also increased related privacy and security concerns. During the past two decades, a large volume of literature on statistical disclosure control (SDC) has been published in international journals. This review paper introduces relatively recent SDC approaches to the communities of Korean statisticians and statistical agencies. In addition to the traditional masking techniques (such as microaggregation and noise addition), we introduce an online analytic system, differential privacy, and synthetic data. For each approach, an application example (with pros and cons, as well as methodology) is highlighted, so that the paper can assist statistical agencies that seek a practical SDC approach.
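
Microaggregation, one of the traditional masking techniques named in the abstract above, can be sketched for a single numeric variable as follows. This is a minimal fixed-size, univariate variant, not taken from the paper: records are sorted, cut into groups of at least `k`, and each value is replaced by its group mean. The function name and the sample values are hypothetical.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k neighbours.

    Values are grouped in sorted order; a tail shorter than k is folded
    into the last full group so that no group has fewer than k members
    (unless there are fewer than k values in total).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # not enough records left for another full group
            end = n
        group = order[start:end]
        group_mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = group_mean
        start = end
    return masked


# Made-up values; results are returned in the original record order.
incomes = [31, 45, 29, 120, 52, 48, 33, 95]
masked_incomes = microaggregate(incomes, k=3)
print(masked_incomes)
```

Because every group is replaced by its own mean, the overall mean of the variable is preserved, while the variance shrinks as `k` grows.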

Differential Privacy in Practice

  • Nguyen, Hiep H.;Kim, Jong;Kim, Yoonho
    • Journal of Computing Science and Engineering / v.7 no.3 / pp.177-186 / 2013
  • We briefly review the problem of statistical disclosure control under the differential privacy model, which entails a formal and ad omnia privacy guarantee separating the utility of the database from the risk due to individual participation. It has borne fruitful results over the past ten years, both in theoretical connections to other fields and in practical applications to real-life datasets. The promises of differential privacy help to relieve concerns about privacy loss, which hinder the release of community-valuable data. This paper covers the main ideas behind differential privacy, its interactive versus non-interactive settings, perturbation mechanisms, and typical applications found in recent research.
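
One widely used perturbation mechanism in the differential privacy literature is the Laplace mechanism. The sketch below applies it to a counting query, whose sensitivity is 1 because adding or removing one record changes the count by at most 1. It is an illustration under stated assumptions, not code from the paper: the function names, the age values, and the choice of epsilon are hypothetical, and a production implementation would need extra care with floating-point sampling.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Answer a counting query with epsilon-differential privacy.

    A count has sensitivity 1, so the Laplace noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up records: respondent ages.
rng = random.Random(42)
ages = [23, 35, 41, 52, 67, 29, 48, 71, 33, 60]

noisy = private_count(ages, lambda a: a >= 50, epsilon=0.5, rng=rng)
print("noisy count of respondents aged 50+:", noisy)
```

A smaller epsilon gives a stronger privacy guarantee at the cost of noisier answers. In the interactive setting, each answered query consumes part of a total privacy budget.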

Enhanced Regular Expression as a DGL for Generation of Synthetic Big Data

  • Kai, Cheng;Keisuke, Abe
    • Journal of Information Processing Systems / v.19 no.1 / pp.1-16 / 2023
  • Synthetic data generation is generally used in performance evaluation and function tests in data-intensive applications, as well as in various areas of data analytics, such as privacy-preserving data publishing (PPDP) and statistical disclosure limitation/control. A significant amount of research has been conducted on tools and languages for data generation. However, existing tools and languages have been developed for specific purposes and are unsuitable for other domains. In this article, we propose a regular expression-based data generation language (DGL) for flexible big data generation. To achieve a general-purpose and powerful DGL, we enhanced the standard regular expressions to support the data domain, type/format inference, sequence and random generation, probability distributions, and resource reference. To efficiently implement the proposed language, we propose caching techniques for both the intermediate and database queries. We evaluated the proposed improvement experimentally.

A Study on the Privacy Concern of e-commerce Users: Focused on Information Boundary Theory (전자상거래 이용자의 프라이버시 염려에 관한 연구 : 정보경계이론을 중심으로)

  • Kim, Jong-Ki;Oh, Da-Woon
    • The Journal of Information Systems / v.26 no.2 / pp.43-62 / 2017
  • Purpose: This study provides empirical support for a model that explains the formation of privacy concerns from the perspective of Information Boundary Theory. It investigates an integrated model suggesting that privacy concerns are formed by the individual's disposition to value privacy, privacy awareness, awareness of privacy policy, and government legislation. Information Boundary Theory suggests that the boundaries of an individual's information space depend on the individual's personal characteristics and on environmental factors of e-commerce. When receiving a request for personal information from e-commerce websites, an individual assesses the risk through a risk-control assessment, and the resulting perception of intrusion gives rise to privacy concerns. Design/methodology/approach: This study empirically tested the hypotheses with data collected in a survey that included items measuring the constructs in the model. The survey was aimed at university students, and a causal modeling statistical technique (PLS) was used for data analysis. Findings: The results of the survey indicated significant relationships among environmental factors of e-commerce websites, individuals' personal privacy characteristics, and privacy concerns. Both the individual's awareness of institutional privacy assurance in e-commerce and the individual's privacy characteristics affect the risk-control assessment of information disclosure, which becomes an essential component of privacy concerns.