• Title/Abstract/Keywords: microdata

59 search results (processing time: 0.016 s)

Study on a Measurement of Disclosure Risk of Microdata by Similarity

  • Cho, Hyeon-Kwan;Kwon, Dae-Hong;Lee, Suk-Hoon
    • The Korean Journal of Applied Statistics / Vol. 25, No. 5 / pp.743-755 / 2012
  • Researchers working with various kinds of statistical data want microdata for detailed analysis. Institutes need to mask sensitive data before providing microdata. Many researchers have used the proportion of unique identities to measure disclosure risk. We propose a new measure of disclosure risk that also considers the case in which all identities are the same or similar. As an application, we compare the proposed measure with the existing one using 10,667 records from the 2010 Korea Household Income and Expenditure Survey.

A Method of Masking for 2005 Korean Census Microdata

  • 정동명;정미옥
    • The Korean Journal of Applied Statistics / Vol. 21, No. 2 / pp.313-325 / 2008
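  • Demand from statistics users for microdata keeps growing, and statistical agencies are working to provide it. However, microdata contain a great deal of respondents' personal information, so releasing the data unchanged carries a high risk of disclosing that information; disclosure must therefore be limited by appropriate methods at release time. This paper introduces the concept of respondent information disclosure that arises when microdata are released, along with methods for limiting it, and describes how a data file was built by applying various disclosure limitation methods for the release of the 2% microdata of the Population and Housing Census conducted by Statistics Korea in 2005. Specifically, treating the 10% sample survey results as the population and working with a systematically drawn sample, we selected as key variables 12 items that outsiders are highly likely to be able to identify, then determined the uniqueness of each combination of variables and computed the disclosure risk. The results show that, in addition to the reduction of information achieved by the 2% sample, applying a series of methods including grouping and coding is substantially effective in limiting the disclosure of personal information in the census microdata.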
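
The uniqueness check on key-variable combinations described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `unique_proportion`, the toy records, and the three variable names are all hypothetical, and the risk measure shown is simply the share of records whose key-variable combination occurs exactly once.

```python
from collections import Counter


def unique_proportion(records, key_vars):
    """Share of records whose key-variable combination appears exactly once."""
    combos = [tuple(rec[k] for k in key_vars) for rec in records]
    counts = Counter(combos)
    n_unique = sum(1 for c in combos if counts[c] == 1)
    return n_unique / len(combos) if combos else 0.0


# Hypothetical toy records; the variable names are illustrative only.
records = [
    {"age_group": "30-39", "sex": "F", "region": "Seoul"},
    {"age_group": "30-39", "sex": "F", "region": "Seoul"},
    {"age_group": "40-49", "sex": "M", "region": "Busan"},
    {"age_group": "60+", "sex": "F", "region": "Jeju"},
]

# More key variables generally means more unique combinations.
risk_all = unique_proportion(records, ["age_group", "sex", "region"])
risk_sex = unique_proportion(records, ["sex"])
```

Comparing `risk_all` with `risk_sex` shows why the choice of key variables matters: the same file looks riskier as more identifying items are combined.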

Multiple imputation and synthetic data

  • 김정연;박민정
    • The Korean Journal of Applied Statistics / Vol. 32, No. 1 / pp.83-97 / 2019
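  • As society has developed, the provision of microdata organized at the individual level has increased in response to users' diverse analytical needs. Demand is also growing for research access to complete-enumeration data, such as censuses and administrative records, in microdata form. Microdata analysis for policy-making, academic and other purposes is highly desirable in terms of value creation. However, releasing microdata with sufficient utility inevitably carries the risk that personal information may be disclosed. Accordingly, various methods have been considered that secure data utility while guaranteeing the protection of personal information. One such method that has been studied is generating and using synthetic data. This paper introduces methodologies and cautions related to generating synthetic data in order to promote understanding of it. To this end, it first explains concepts essential to creating synthetic data, such as multiple imputation, Bayesian predictive models and the Bayesian bootstrap, and then examines fully synthetic and partially synthetic data. In particular, to give a deeper understanding of synthetic data creation, it examines a concrete case in which longitudinal data are synthesized using sequential regression multivariate imputation.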

Investigations into Coarsening Continuous Variables

  • Jeong, Dong-Myeong;Kim, Jay-J.
    • The Korean Journal of Applied Statistics / Vol. 23, No. 2 / pp.325-333 / 2010
  • Protection against disclosure of survey respondents' identifiable and/or sensitive information is a prerequisite for statistical agencies that release microdata files from their sample surveys. Coarsening is one of the popular methods for protecting the confidentiality of the data. Grouped data can be released in the form of microdata or tabular data. Compared with releasing the data in tabular form only, making microdata available to the public with interval codes and their representative values greatly enhances the utility of the data. It allows researchers to compute covariances between variables, build statistical models, or run a variety of statistical tests on the data. It may be conjectured that the variance of the interval data is lower than that of the ungrouped data, in the sense that the coarsened data do not have the within-interval variance. This conjecture is investigated using the uniform and triangular distributions. Traditionally, the midpoint is used to represent all the values in an interval. This approach implicitly assumes that the data are uniformly distributed within each interval. However, this assumption may not hold, especially in the last interval of economic data. In this paper, we use three distributional assumptions (uniform, Pareto and lognormal) in the last interval and either the midpoint or the median for the other intervals, apply them to wage and food costs in Statistics Korea's 2006 Household Income and Expenditure Survey (HIES) data, and compare these approaches in terms of the first two moments.
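
The conjecture that coarsening removes the within-interval variance can be checked on a toy example. The sketch below is an illustration under assumptions, not the paper's derivation: it uses a hypothetical evenly spaced grid of 100 values on [0, 100) as a stand-in for uniform data, intervals of width 10, and a helper named `coarsen_to_midpoint` introduced only for this example.

```python
from statistics import pvariance


def coarsen_to_midpoint(x, width):
    """Replace x by the midpoint of the fixed-width interval that contains it."""
    lower = (x // width) * width
    return lower + width / 2


# Evenly spaced grid 0.5, 1.5, ..., 99.5 (a stand-in for uniform data).
values = [i + 0.5 for i in range(100)]
coarsened = [coarsen_to_midpoint(v, 10) for v in values]

var_original = pvariance(values)      # total variance
var_coarsened = pvariance(coarsened)  # between-interval variance only

# Variance inside one interval (identical for every interval on this grid).
var_within = pvariance(values[:10])
```

On this grid the midpoint equals each interval's mean, so the law of total variance gives `var_original == var_coarsened + var_within`. With skewed data in an interval the midpoint no longer matches the interval mean, which is the situation the abstract raises for the last interval.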

Limiting Attribute Disclosure in Randomization Based Microdata Release

  • Guo, Ling;Ying, Xiaowei;Wu, Xintao
    • Journal of Computing Science and Engineering / Vol. 5, No. 3 / pp.169-182 / 2011
  • Privacy preserving microdata publication has received wide attention. In this paper, we investigate the randomization approach and focus on attribute disclosure under linking attacks. We give efficient solutions to determine optimal distortion parameters, such that we can maximize utility preservation while still satisfying privacy requirements. We compare our randomization approach with l-diversity and anatomy in terms of utility preservation (under the same privacy requirements) from three aspects (reconstructed distributions, accuracy of answering queries, and preservation of correlations). Our empirical results show that randomization incurs significantly smaller utility loss.

Comparisons of Young Renter Households' Housing Situation by Locations Reflected in the 2012 Korea Housing Survey

  • 이현정
    • Journal of the Korean Housing Association / Vol. 26, No. 1 / pp.81-90 / 2015
  • The purpose of this study was to investigate the housing characteristics of young renter households by location using licensed microdata from the 2012 Korea Housing Survey. There were 1,020,216 renter households (weighted count) headed by persons between 20 and 34 years of age, and their housing characteristics were compared statistically across residential locations (Capital Region, metropolitan cities, other areas). Major findings are as follows: (1) Capital Region young renters had the worst housing quality, with the greatest proportion of households living in units that failed to meet national minimum housing standards and/or in basement or semi-basement units; (2) Capital Region young renters had the greatest proportion of households with housing cost burdens; and (3) 37.3% of young renter households in metropolitan areas and 33.5% in the Capital Region were found to receive family support in order to afford their current rental costs.

Release of Microdata and Statistical Disclosure Control Techniques

  • 김규성
    • Communications for Statistical Applications and Methods / Vol. 16, No. 1 / pp.1-11 / 2009
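  • Providing microdata to users exposes record-level data, so some risk of disclosing respondents' information is unavoidable. Statistical disclosure control techniques are statistical methods for reducing disclosure risk while increasing data utility when statistical data are released. This paper reviews disclosure, disclosure risk and statistical disclosure control techniques, examines strategies for selecting a disclosure control technique in relation to data utility, and looks at an example of the 'risk-utility boundary map' method. Finally, it sets out the points to check at each stage when providing microdata to users.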

A Study on Performing Join Queries over K-anonymous Tables

  • Kim, Dae-Ho;Kim, Jong Wook
    • Journal of the Korea Society of Computer and Information / Vol. 22, No. 7 / pp.55-62 / 2017
  • Recently, there has been an increasing need to share microdata containing information about individual entities. As microdata usually contain sensitive information about individuals, releasing them directly for public use may violate existing privacy requirements. To avoid the privacy problems that arise from releasing microdata for public use, extensive studies have been conducted in the area of privacy-preserving data publishing (PPDP). k-anonymity, the most popular approach, guarantees that, for each record, the released data include at least k-1 other records that have the same values for a set of quasi-identifier attributes. Given an original table, the corresponding k-anonymous table is obtained by generalizing each record into an indistinguishable group, called the equivalence class, by replacing the specific values of the quasi-identifier attributes with more general values. However, query processing over the anonymized data is very challenging because of the generalized attribute values. The problem is especially hard for an equi-join query (the most common type of query in data analysis tasks) over k-anonymous tables, since with generalized attribute values it is difficult to determine whether two records are joinable. To address this challenge, we develop a novel scheme that can effectively perform an equi-join between k-anonymous tables. The experimental results show that the proposed method achieves significant gains in accuracy over a naive scheme.
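
The k-anonymity guarantee stated in this abstract (each record shares its quasi-identifier values with at least k-1 others) can be sketched as a simple check. This is a minimal illustration under assumptions, not the paper's join scheme: the helpers `is_k_anonymous` and `generalize_age`, the toy rows, and the attribute names are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


def generalize_age(row, width=10):
    """Replace an exact age with a coarser range such as '20-29'."""
    lo = (row["age"] // width) * width
    return {**row, "age": f"{lo}-{lo + width - 1}"}


# Hypothetical toy table; "disease" plays the role of the sensitive attribute.
raw = [
    {"age": 23, "zip": "130**", "disease": "flu"},
    {"age": 27, "zip": "130**", "disease": "asthma"},
    {"age": 31, "zip": "148**", "disease": "flu"},
    {"age": 38, "zip": "148**", "disease": "diabetes"},
]
generalized = [generalize_age(r) for r in raw]

raw_ok = is_k_anonymous(raw, ["age", "zip"], 2)          # exact ages are unique
gen_ok = is_k_anonymous(generalized, ["age", "zip"], 2)  # ranges form groups of 2
```

After generalization the age column holds ranges rather than exact values, which is why an equi-join on such a column can no longer be decided by simple equality.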

Influences on Housing Satisfaction of Multifamily Housing Renter Households in the U.S. Metropolitan Statistical Areas

  • 이현정
    • Journal of the Korean Housing Association / Vol. 23, No. 2 / pp.125-133 / 2012
  • The purpose of this study was to explore the characteristics and housing satisfaction of multifamily renter households in metropolitan areas using 2009 American Housing Survey public-use microdata. A total of 8,139 multifamily renter households residing in metropolitan statistical areas were selected for data analysis. The findings are as follows: (1) in comparison with other types of households in the metropolitan areas, multifamily renter households tended to show a smaller household size, younger householders, and a greater proportion of households with householders who have never married or have been widowed, divorced or separated; (2) housing-cost-related variables such as monthly rent or rent per square foot were found not to have a significant influence on the housing satisfaction of multifamily renter households in metropolitan areas; (3) factors influencing the housing satisfaction of multifamily renter households with a householder aged 34 years or younger were neighborhood satisfaction, householder's race, structure age and per-person unit size; and (4) neighborhood satisfaction was found to have the strongest influence on the housing satisfaction of multifamily renter households in metropolitan areas.

Housing Cost Burden of Single- or Two-person Households in Their 20s and 30s in the United States

  • 이현정
    • Journal of the Korean Housing Association / Vol. 23, No. 2 / pp.69-77 / 2012
  • The purpose of this study was to explore the housing cost burden of young single- or two-person households in the United States who have recently moved for job-related reasons. A total of 580 households were selected from 2009 American Housing Survey public-use microdata for data analysis. The findings are as follows: (1) the targeted single-person households were characterized as younger households with higher educational attainment, lower household income, and a greater proportion of renters, multifamily housing residents and households with a housing cost burden than other households; (2) two-person households showed a higher income level and a lower housing cost burden; (3) characteristics that showed a significant influence on housing cost burden were household size, householder's age, gender, race and educational attainment, household income level and tenure type; and (4) a linear combination of household size, household income, low-income status, residency in a metropolitan area, and home structural type was found to be the most efficient predictor of a single- or two-person household's housing cost burden regardless of household size.