• Title/Summary/Keyword: Data imputation

Large tests of independence in incomplete two-way contingency tables using fractional imputation

  • Kang, Shin-Soo;Larsen, Michael D.
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.4
    • /
    • pp.971-984
    • /
    • 2015
  • Imputation procedures fill in missing values, thereby enabling complete data analyses. Fully efficient fractional imputation (FEFI) and multiple imputation (MI) create multiple versions of the missing observations, thereby reflecting uncertainty about their true values. Methods have been described for hypothesis testing with multiple imputation. Fractional imputation assigns weights to the observed data to compensate for missing values. The focus of this article is the development of tests of independence using FEFI for partially classified two-way contingency tables. Wald and deviance tests of independence under FEFI are proposed. Simulations are used to compare type I error rates and power. The partially observed marginal information is useful for estimating the joint distribution of cell probabilities, but it is not useful for testing association. FEFI compares favorably to other methods in simulations.

A Missing Data Imputation by Combining K Nearest Neighbor with Maximum Likelihood Estimation for Numerical Software Project Data (K-NN과 최대 우도 추정법을 결합한 소프트웨어 프로젝트 수치 데이터용 결측값 대치법)

  • Lee, Dong-Ho;Yoon, Kyung-A;Bae, Doo-Hwan
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.273-282
    • /
    • 2009
  • Missing data is a common problem when building analysis or prediction models from software project data. For small software project data sets, imputation is known to handle missing data more effectively than deletion. K nearest neighbor imputation is a suitable imputation method for software project data, but it cannot use the non-missing information in incomplete project instances. In this paper, we propose an approach to missing data imputation for numerical software project data that combines K nearest neighbor and maximum likelihood estimation; we also extend the average absolute error measure by normalization for accurate evaluation. Our approach overcomes this limitation of K nearest neighbor imputation and outperforms it on our real data sets.
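
The limitation described in this abstract can be illustrated with a minimal sketch of plain K nearest neighbor imputation. It is not the authors' combined method: the function name `knn_impute`, the use of `None` for missing values, and the Euclidean distance are all assumptions made for illustration. Only complete rows act as donors, so the observed values of other incomplete rows are never used.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows."""
    # Only rows without any missing value can serve as donors.
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to act as a donor")

    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(donor):
            # Euclidean distance over the columns this row actually observed.
            return math.sqrt(sum((row[j] - donor[j]) ** 2 for j in observed))

        nearest = sorted(complete, key=dist)[:k]
        imputed.append([
            v if v is not None else sum(d[j] for d in nearest) / len(nearest)
            for j, v in enumerate(row)
        ])
    return imputed


data = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [10.0, 10.0, 10.0],
    [1.5, None, 3.5],   # incomplete project instance
]
print(knn_impute(data, k=2)[3])
```

In this toy data set the two nearest donors are the first two rows, so the missing entry is replaced by the mean of their second column.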

Performance Comparison of Classification Methods with the Combinations of the Imputation and Gene Selection Methods

  • Kim, Dong-Uk;Nam, Jin-Hyun;Hong, Kyung-Ha
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.6
    • /
    • pp.1103-1113
    • /
    • 2011
  • Gene expression data are obtained through many experimental stages, and errors produced during the process may cause missing values. Because the data are of the so-called 'small n, large p' type, genes have to be selected before statistical analyses such as classification. For this reason, imputation and gene selection are important in microarray data analysis. In the literature, imputation, gene selection, and classification analysis have each been studied separately. In practice, however, they are applied sequentially. We therefore compare the performance of classification methods after imputation and gene selection methods are applied to microarray data. Numerical simulations are carried out to evaluate the classification methods under various combinations of imputation and gene selection methods.

Imputation Procedures in Weibull Regression Analysis in the presence of missing values

  • Kim Soon-kwi;Jeong Bong-Bin
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2001.11a
    • /
    • pp.143-148
    • /
    • 2001
  • A dataset with missing observations is often completed using imputed values. In this paper, the performance and accuracy of complete-case methods and four imputation procedures are evaluated when missing values exist only in the response variables of the Weibull regression model. Our simulation results show that, compared with the other imputation procedures, the hot-deck and Weibull regression imputation procedures in particular compensate well for missing data. An illustrative real-data example is also given.

Technical Trends of Time-Series Data Imputation (시계열 데이터 결측치 처리 기술 동향)

  • Kim, E.D.;Ko, S.K.;Son, S.C.;Lee, B.T.
    • Electronics and Telecommunications Trends
    • /
    • v.36 no.4
    • /
    • pp.145-153
    • /
    • 2021
  • Data imputation is a crucial issue in data analysis because data quality is highly correlated with the performance of AI models. In particular, it is difficult to collect quality time-series data in uncertain situations (for example, electricity blackouts or delays caused by network conditions). Thus, it is necessary to research effective methods of time-series data imputation. Studies on time-series data imputation can be divided into five categories, including statistical, matrix-based, regression-based, and deep learning (RNN and GAN) based methodologies. This study reviews and organizes these methodologies. Recently developed deep learning-based imputation methods show excellent performance. However, they are associated with computational problems that make them difficult to use in real-time systems. Thus, the direction of future work is to develop imputation methods with low computational cost but high performance for application in the field.

Comparison of Five Single Imputation Methods in General Missing Pattern

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.4
    • /
    • pp.945-955
    • /
    • 2004
  • 'Complete-case analysis' is easy to carry out and may be fine with a small amount of missing data. However, this method is not recommended in general because the estimates are usually biased and inefficient. There are numerous alternatives to complete-case analysis. One alternative is single imputation. Some of the most common single imputation methods are reviewed, and their performance is compared in simulation studies.
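
The contrast between complete-case analysis and single imputation can be sketched as follows. The abstract does not name the five methods it compares, so mean imputation is used here only as a familiar example of single imputation; the function names and the `None` convention for missing values are assumptions made for illustration.

```python
def complete_cases(rows):
    """Complete-case analysis: keep only rows with no missing (None) entry."""
    return [list(r) for r in rows if None not in r]


def mean_impute(rows):
    """Single imputation: replace each None with its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


rows = [[1.0, 10.0], [2.0, None], [3.0, 30.0]]
print(complete_cases(rows))  # the incomplete middle row is dropped
print(mean_impute(rows))     # the gap is filled with the column mean
```

Complete-case analysis discards the observed value 2.0 in the incomplete row, whereas single imputation keeps it at the cost of treating a filled-in value as if it had been observed.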

A New Method for Imputation of Missing Genotype using Linkage Disequilibrium and Haplotype Information (결측치가 존재하는 유전형 자료에서의 연관불균형과 일배체형을 사용한 결측치 대치 방법)

  • Park Yun-Ju;Kim Young-Jin;Park Jung-Sun;Kim Kuchan;Koh Insong;Jung Ho-Youl
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.2
    • /
    • pp.99-107
    • /
    • 2005
  • In this paper, we propose a new missing-value imputation method that minimizes the loss of information: a linkage disequilibrium-based and haplotype-based imputation method, which estimates missing values based on the specific characteristics of Single Nucleotide Polymorphism (SNP) genotype data. An imputation method is needed to minimize the loss of information caused by experimentally missing data. In general, missing values in biological data have been imputed with the major allele imputation method, but this approach is not optimal. It has high error rates in estimating missing values because it does not take the specific structure of genotype data into consideration. In this paper, we show the results of a comparative evaluation of our methods and the major allele imputation method for the estimation of missing values.
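
The baseline that this abstract criticizes can be sketched in a few lines. The abstract does not define major allele imputation precisely, so the sketch assumes one common reading: a missing genotype at a SNP is replaced by the homozygote of the most frequent allele observed at that SNP. The function name and the two-letter genotype strings are hypothetical.

```python
from collections import Counter


def major_allele_impute(genotypes):
    """Replace missing genotypes (None) with the major-allele homozygote.

    `genotypes` holds the calls for one SNP across samples, e.g. "AG".
    """
    # Count every allele in every observed genotype at this SNP.
    counts = Counter(a for g in genotypes if g is not None for a in g)
    if not counts:
        raise ValueError("no observed genotypes at this SNP")
    major = counts.most_common(1)[0][0]
    return [g if g is not None else major * 2 for g in genotypes]


snp = ["AA", "AG", None, "GG", "AA"]
print(major_allele_impute(snp))
```

Every missing call at a SNP receives the same value regardless of the neighboring SNPs, which is the structural information that linkage disequilibrium and haplotype-based approaches aim to use.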

Application of Multiple Imputation Method in Analyzing Data with Missing Continuous Covariates

  • Ghasemizadeh Tamar, S.;Ganjali, M.
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.4
    • /
    • pp.659-664
    • /
    • 2008
  • Missing continuous covariates are pervasive in the use of generalized linear models for medical data. Multiple imputation is the most common and easiest method of dealing with missing covariate data. However, its use always comes with serious caveats, and care should be taken to make the imputed values more proper. In this paper, proper imputation from the posterior predictive distribution is developed for use with arbitrary priors. We use the empirical distribution of the posterior to approximate the posterior predictive distribution and to sample from it. This method is preferable to another of our imputation methods, which uses a full model to impute missing values with available software. The proposed methods are applied to glucocorticoid data.

Improvement of Collaborative Filtering Algorithm Using Imputation Methods

  • Jeong, Hyeong-Chul;Kwak, Min-Jung;Noh, Hyun-Ju
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.3
    • /
    • pp.441-450
    • /
    • 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences, and a missing data problem frequently arises. We introduce two imputation approaches (multiple imputation via the Markov chain Monte Carlo method and multiple imputation via the bootstrap method) to improve the prediction performance of collaborative filtering and evaluate their performance using the EachMovie data.

Fully Efficient Fractional Imputation for Incomplete Contingency Tables

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.4
    • /
    • pp.993-1002
    • /
    • 2004
  • Imputation procedures such as fully efficient fractional imputation (FEFI) or multiple imputation (MI) can be used to construct complete contingency tables from samples with partially classified responses. Variances of FEFI estimators of population proportions are derived. Simulation results, when data are missing completely at random, reveal that FEFI provides more efficient estimates of population proportions than either MI based on data augmentation or complete-case (CC) analysis, but neither FEFI nor MI improves on CC analysis with respect to the accuracy of estimation of some parameters for association between two variables, such as $\theta_{i+}\theta_{+j}-\theta_{ij}$ and the log odds ratio.
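
The idea of completing a partially classified two-way table with fractional weights can be sketched as below. This is a generic EM-style allocation written for illustration, not the estimator derived in the paper: the function name, the argument layout (fully classified counts, row-only counts, column-only counts), and the fixed iteration count are all assumptions. A unit observed only in row i is split across the columns with weights proportional to the current cell probability estimates, and likewise for units observed only in column j.

```python
def fractional_table(full, row_only, col_only, iters=200):
    """Allocate partially classified counts fractionally across cells.

    full[i][j]  : counts classified on both variables
    row_only[i] : counts with only the row variable observed
    col_only[j] : counts with only the column variable observed
    Returns (completed expected counts, estimated cell probabilities).
    """
    n_rows, n_cols = len(full), len(full[0])
    total = sum(map(sum, full)) + sum(row_only) + sum(col_only)
    # Start from uniform cell probabilities.
    p = [[1.0 / (n_rows * n_cols)] * n_cols for _ in range(n_rows)]
    m = [[float(x) for x in r] for r in full]
    for _ in range(iters):
        row_p = [sum(p[i]) for i in range(n_rows)]
        col_p = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]
        # Fractional weights: p_ij / p_i+ for row-only units,
        # p_ij / p_+j for column-only units.
        m = [[full[i][j]
              + (row_only[i] * p[i][j] / row_p[i] if row_p[i] > 0 else 0.0)
              + (col_only[j] * p[i][j] / col_p[j] if col_p[j] > 0 else 0.0)
              for j in range(n_cols)] for i in range(n_rows)]
        # Re-estimate cell probabilities from the completed table.
        p = [[m[i][j] / total for j in range(n_cols)] for i in range(n_rows)]
    return m, p


counts, probs = fractional_table(
    full=[[10, 20], [30, 40]], row_only=[5, 5], col_only=[4, 6]
)
print(counts)
print(probs)
```

The completed table keeps the overall sample size, and when there are no partially classified units it reduces to the fully classified counts.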
