• Title/Summary/Keyword: Model-based imputation

Search Result 32, Processing Time 0.02 seconds

Imputation Methods for the Population and Housing Census 2000 in Korea

  • Kim, Young-Won;Ryu, Jeabok;Park, Jinwoo;Lee, Jaewon
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.2
    • /
    • pp.575-583
    • /
    • 2003
  • We proposed imputation strategies for the Population and Housing Census 2000 in Korea. The total area of floor space and marital status which have relatively high non-response rates in the Census are considered to develope the effective missing value imputation procedures. The Classification and Regression Tree(CART) is employed to construct the imputation cells for hot-deck imputation, as well as to predict missing value by model-based approach. We compare three imputation methods which include CART model-based imputation, hot-deck imputation based on CART and logical hot-deck imputation proposed by The Korea National Statistical Office. The results suggest that the proposed hot-deck imputation based on CART is very efficient and strongly recommendable.

Missing Value Imputation Method Using CART : For Marital Status in the Population and Housing Census (CART를 활용한 결측값 대체방법 : 인구주택총조사 혼인상태 항목을 중심으로)

  • 김영원;이주원
    • Survey Research
    • /
    • v.4 no.2
    • /
    • pp.1-21
    • /
    • 2003
  • We proposed imputation strategies for marital status in the Population and Housing Census 2000 in Korea to illustrate the effective missing value imputation methods for social survey. The marital status which have relatively high non-response rates in the Census are considered to develope the effective missing value imputation procedures. The Classification and Regression Tree(CART)is employed to construct the imputation cells for hot-deck imputation, as well as to predict the missing value by model-based approach. We compare to imputation methods which include the CART model-based imputation and the sequential hot-deck imputation based on CART. Also we check whether different modeling for each region provides the more improved results. The results suggest that the proposed hot-deck imputation based on CART is very efficient and strongly recommendable. And the results show that different modeling for each region is not necessary.

  • PDF

Comparison of imputation methods for item nonresponses in a panel study (패널자료에서의 항목무응답 대체 방법 비교)

  • Lee, Hyejung;Song, Juwon
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.3
    • /
    • pp.377-390
    • /
    • 2017
  • When conducting a survey, item nonresponse occurs if the respondent does not respond to some items. Since analysis based only on completely observed data may cause biased results, imputation is often conducted to analyze data in its complete form. The panel study is a survey method that examines changes of responses over time. In panel studies, there has been a preference for using information from response values of previous waves when the imputation of item nonresponses is performed; however, limited research has been conducted to support this preference. Therefore, this study compares the performance of imputation methods according to whether or not information from previous waves is utilized in the panel study. Among imputation methods that utilize information from previous responses, we consider ratio imputation, imputation based on the linear mixed model, and imputation based on the Bayesian linear mixed model approach. We compare the results from these methods against the results of methods that do not use information from previous responses, such as mean imputation and hot deck imputation. Simulation results show that imputation based on the Bayesian linear mixed model performs best and yields small biases and high coverage rates of the 95% confidence interval even at higher nonresponse rates.

A Study on Automatic Missing Value Imputation Replacement Method for Data Processing in Digital Data (디지털 데이터에서 데이터 전처리를 위한 자동화된 결측 구간 대치 방법에 관한 연구)

  • Kim, Jong-Chan;Sim, Chun-Bo;Jung, Se-Hoon
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.2
    • /
    • pp.245-254
    • /
    • 2021
  • We proposed the research on an analysis and prediction model that allows the identification of outliers or abnormality in the data followed by effective and rapid imputation of missing values was conducted. This model is expected to analyze efficiently the problems in the data based on the calibrated raw data. As a result, a system that can adequately utilize the data was constructed by using the introduced KNN + MLE algorithm. With this algorithm, the problems in some of the existing KNN-based missing data imputation algorithms such as ignoring the missing values in some data sections or discarding normal observations were effectively addressed. A comparative evaluation was performed between the existing imputation approaches such as K-means, KNN, MEI, and MI as well as the data missing mechanisms including MCAR, MAR, and NI to check the effectiveness/efficiency of the proposed algorithm, and its superiority in all aspects was confirmed.

Missing Value Imputation Technique for Water Quality Dataset

  • Jin-Young Jun;Youn-A Min
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.4
    • /
    • pp.39-46
    • /
    • 2024
  • Many researchers make efforts to evaluate water quality using various models. Such models require a dataset without missing values, but in real world, most datasets include missing values for various reasons. Simple deletion of samples having missing value(s) could distort distribution of the underlying data and pose a significant risk of biasing the model's inference when the missing mechanism is not MCAR. In this study, to explore the most appropriate technique for handing missing values in water quality data, several imputation techniques were experimented based on existing KNN and MICE imputation with/without the generative neural network model, Autoencoder(AE) and Denoising Autoencoder(DAE). The results shows that KNN and MICE combined imputation without generative networks provides the closest estimated values to the true values. When evaluating binary classification models based on support vector machine and ensemble algorithms after applying the combined imputation technique to the observed water quality dataset with missing values, it shows better performance in terms of Accuracy, F1 score, RoC-AuC score and MCC compared to those evaluated after deleting samples having missing values.

A New Method for Imputation of Missing Genotype using Linkage Disequilibrium and Haplotype Information (결측치가 존재하는 유전형 자료에서의 연관불균형과 일배체형을 사용한 결측치 대치 방법)

  • Park Yun-Ju;Kim Young-Jin;Park Jung-Sun;Kim Kuchan;Koh Insong;Jung Ho-Youl
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.2
    • /
    • pp.99-107
    • /
    • 2005
  • In this paper, wc propose a now missing imputation method for minimizing loss of information linkage disequilibrium-based and haplotype-based imputation method, which estimate missing values of the data based on the specificity of Single Nucleotide Polymorphism(SNP) genotype data. Method for imputing data is needed to minimize the loss of information caused by experimental missing data. In general, missing imputation of biological data has used major allele imputation method. but this approach is not optima]. 1'his method has high error rates of missing values estimation since the characteristics of the genotype data are not considered not take into consideration the specific structure of the data. In this paper, we show the results of the comparative evaluation of our model methods and major imputation method for the estimation of missing values.

A comparison of imputation methods using nonlinear models (비선형 모델을 이용한 결측 대체 방법 비교)

  • Kim, Hyein;Song, Juwon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.4
    • /
    • pp.543-559
    • /
    • 2019
  • Data often include missing values due to various reasons. If the missing data mechanism is not MCAR, analysis based on fully observed cases may an estimation cause bias and decrease the precision of the estimate since partially observed cases are excluded. Especially when data include many variables, missing values cause more serious problems. Many imputation techniques are suggested to overcome this difficulty. However, imputation methods using parametric models may not fit well with real data which do not satisfy model assumptions. In this study, we review imputation methods using nonlinear models such as kernel, resampling, and spline methods which are robust on model assumptions. In addition, we suggest utilizing imputation classes to improve imputation accuracy or adding random errors to correctly estimate the variance of the estimates in nonlinear imputation models. Performances of imputation methods using nonlinear models are compared under various simulated data settings. Simulation results indicate that the performances of imputation methods are different as data settings change. However, imputation based on the kernel regression or the penalized spline performs better in most situations. Utilizing imputation classes or adding random errors improves the performance of imputation methods using nonlinear models.

Imputation of Multiple Missing Values by Normal Mixture Model under Markov Random Field: Application to Imputation of Pixel Values of Color Image (마코프 랜덤 필드 하에서 정규혼합모형에 의한 다중 결측값 대체기법: 색조영상 결측 화소값 대체에 응용)

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.6
    • /
    • pp.925-936
    • /
    • 2009
  • There very many approaches to impute missing values in the iid. case. However, it is hardly found the imputation techniques in the Markov random field(MRF) case. In this paper, we show that the imputation under MRF is just to impute by fitting the normal mixture model(NMM) under several practical assumptions. Our multivariate normal mixture model based approaches under MRF is applied to impute the missing pixel values of 3-variate (R, G, B) color image, providing a technique to smooth the imputed values.

Imputation for Binary or Ordered Categorical Traits Based on the Bayesian Threshold Model (베이지안 분계점 모형에 의한 순서 범주형 변수의 대체)

  • Lee Seung-Chun
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.3
    • /
    • pp.597-606
    • /
    • 2005
  • The nonresponse in sample survey causes a problem when it comes time to analyze dataset in public-use files where the user has only complete-data methods available and has limited information about the reasons for nonresponse. Recently imputation for nonresponse is becoming a standard approach for handling nonresponse and various imputation methods have been devised . However, most imputation methods concern with continuous traits while many interesting features are measured by binary or ordered categorical scales in sample survey. In this note. an imputation method for ignorable nonresponse in binary or ordered categorical traits is considered.

A study on the imputation solution for missing speed data on UTIS by using adaptive k-NN algorithm (적응형 k-NN 기법을 이용한 UTIS 속도정보 결측값 보정처리에 관한 연구)

  • Kim, Eun-Jeong;Bae, Gwang-Soo;Ahn, Gye-Hyeong;Ki, Yong-Kul;Ahn, Yong-Ju
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.13 no.3
    • /
    • pp.66-77
    • /
    • 2014
  • UTIS(Urban Traffic Information System) directly collects link travel time in urban area by using probe vehicles. Therefore it can estimate more accurate link travel speed compared to other traffic detection systems. However, UTIS includes some missing data caused by the lack of probe vehicles and RSEs on road network, system failures, and other factors. In this study, we suggest a new model, based on k-NN algorithm, for imputing missing data to provide more accurate travel time information. New imputation model is an adaptive k-NN which can flexibly adjust the number of nearest neighbors(NN) depending on the distribution of candidate objects. The evaluation result indicates that the new model successfully imputed missing speed data and significantly reduced the imputation error as compared with other models(ARIMA and etc). We have a plan to use the new imputation model improving traffic information service by applying UTIS Central Traffic Information Center.