• Title/Summary/Keyword: missing data estimation method

Search Result 88, Processing Time 0.021 seconds

A Missing Data Imputation by Combining K Nearest Neighbor with Maximum Likelihood Estimation for Numerical Software Project Data (K-NN과 최대 우도 추정법을 결합한 소프트웨어 프로젝트 수치 데이터용 결측값 대치법)

  • Lee, Dong-Ho;Yoon, Kyung-A;Bae, Doo-Hwan
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.273-282
    • /
    • 2009
  • Missing data is one of the common problems in building analysis or prediction models using software project data. Missing imputation methods are known to be more effective missing data handling method than deleting methods in small software project data. While K nearest neighbor imputation is a proper missing imputation method in the software project data, it cannot use non-missing information of incomplete project instances. In this paper, we propose an approach to missing data imputation for numerical software project data by combining K nearest neighbor and maximum likelihood estimation; we also extend the average absolute error measure by normalization for accurate evaluation. Our approach overcomes the limitation of K nearest neighbor imputation and outperforms on our real data sets.

Estimation in the exponential distribution under progressive Type I interval censoring with semi-missing data

  • Shin, Hyejung;Lee, Kwangho
    • Journal of the Korean Data and Information Science Society
    • /
    • v.23 no.6
    • /
    • pp.1271-1277
    • /
    • 2012
  • In this paper, we propose an estimation method of the parameter in an exponential distribution based on a progressive Type I interval censored sample with semi-missing observation. The maximum likelihood estimator (MLE) of the parameter in the exponential distribution cannot be obtained explicitly because the intervals are not equal in length under the progressive Type I interval censored sample with semi-missing data. To obtain the MLE of the parameter for the sampling scheme, we propose a method by which progressive Type I interval censored sample with semi-missing data is converted to the progressive Type II interval censored sample. Consequently, the estimation procedures in the progressive Type II interval censored sample can be applied and we obtain the MLE of the parameter and survival function. It will be shown that the obtained estimators have good performance in terms of the mean square error (MSE) and mean integrated square error (MISE).

EXTENSION OF FACTORING LIKELIHOOD APPROACH TO NON-MONOTONE MISSING DATA

  • Kim, Jae-Kwang
    • Journal of the Korean Statistical Society
    • /
    • v.33 no.4
    • /
    • pp.401-410
    • /
    • 2004
  • We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, termed by Rubin (1974), is extended to a more general case of non-monotone missing data. The proposed method is algebraically equivalent to the Newton-Raphson method for the observed likelihood, but avoids the burden of computing the first and the second partial derivatives of the observed likelihood. Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method.

Variance estimation for distribution rate in stratified cluster sampling with missing values

  • Heo, Sunyeong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.2
    • /
    • pp.443-449
    • /
    • 2017
  • Estimation of population proportion like the distribution rate of LED TV and the prevalence of a disease are often estimated based on survey sample data. Population proportion is generally considered as a special form of population mean. In complex sampling like stratified multistage sampling with unequal probability sampling, the denominator of mean may be random variable and it is estimated like ratio estimator. In this research, we examined the estimation of distribution rate based on stratified multistage sampling, and determined some numerical outcomes using stratified random sample data with about 25% of missing observations. In the data used for this research, the survey weight was determined by deterministic way. So, the weights are not random variable, and the population distribution rate and its variance estimator can be estimated like population mean estimation. When the weights are not random variable, if one estimates the variance of proportion estimator using ratio method, then the variances may be inflated. Therefore, in estimating variance for population proportion, we need to examine the structure of data and survey design before making any decision for estimation methods.

Compressive sensing-based two-dimensional scattering-center extraction for incomplete RCS data

  • Bae, Ji-Hoon;Kim, Kyung-Tae
    • ETRI Journal
    • /
    • v.42 no.6
    • /
    • pp.815-826
    • /
    • 2020
  • We propose a two-dimensional (2D) scattering-center-extraction (SCE) method using sparse recovery based on the compressive-sensing theory, even with data missing from the received radar cross-section (RCS) dataset. First, using the proposed method, we generate a 2D grid via adaptive discretization that has a considerably smaller size than a fully sampled fine grid. Subsequently, the coarse estimation of 2D scattering centers is performed using both the method of iteratively reweighted least square and a general peak-finding algorithm. Finally, the fine estimation of 2D scattering centers is performed using the orthogonal matching pursuit (OMP) procedure from an adaptively sampled Fourier dictionary. The measured RCS data, as well as simulation data using the point-scatterer model, are used to evaluate the 2D SCE accuracy of the proposed method. The results indicate that the proposed method can achieve higher SCE accuracy for an incomplete RCS dataset with missing data than that achieved by the conventional OMP, basis pursuit, smoothed L0, and existing discrete spectral estimation techniques.

Partitioning likelihood method in the analysis of non-monotone missing data

  • Kim Jae-Kwang
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2004.11a
    • /
    • pp.1-8
    • /
    • 2004
  • We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, termed by Robin (1974), is extended to a more general case of non-monotone missing data. The proposed method is algebraically equivalent to the Newton-Raphson method for the observed likelihood, but avoids the burden of computing the first and the second partial derivatives of the observed likelihood Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method. A numerical example is also presented to illustrate the method.

  • PDF

A New Method for Imputation of Missing Genotype using Linkage Disequilibrium and Haplotype Information (결측치가 존재하는 유전형 자료에서의 연관불균형과 일배체형을 사용한 결측치 대치 방법)

  • Park Yun-Ju;Kim Young-Jin;Park Jung-Sun;Kim Kuchan;Koh Insong;Jung Ho-Youl
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.2
    • /
    • pp.99-107
    • /
    • 2005
  • In this paper, wc propose a now missing imputation method for minimizing loss of information linkage disequilibrium-based and haplotype-based imputation method, which estimate missing values of the data based on the specificity of Single Nucleotide Polymorphism(SNP) genotype data. Method for imputing data is needed to minimize the loss of information caused by experimental missing data. In general, missing imputation of biological data has used major allele imputation method. but this approach is not optima]. 1'his method has high error rates of missing values estimation since the characteristics of the genotype data are not considered not take into consideration the specific structure of the data. In this paper, we show the results of the comparative evaluation of our model methods and major imputation method for the estimation of missing values.

Development of a Model Combining Covariance Matrices Derived from Spatial and Temporal Data to Estimate Missing Rainfall Data (공간 데이터와 시계열 데이터로부터 유도된 공분산행렬을 결합한 강수량 결측값 추정 모형)

  • Sung, Chan Yong
    • Journal of Environmental Science International
    • /
    • v.22 no.3
    • /
    • pp.303-308
    • /
    • 2013
  • This paper proposed a new method for estimating missing values in time series rainfall data. The proposed method integrated the two most widely used estimation methods, general linear model(GLM) and ordinary kriging(OK), by taking a weighted average of covariance matrices derived from each of the two methods. The proposed method was cross-validated using daily rainfall data at thirteen rain gauges in the Hyeong-san River basin. The goodness-of-fit of the proposed method was higher than those of GLM and OK, which can be attributed to the weighting algorithm that was designed to minimize errors caused by violations of assumptions of the two existing methods. This result suggests that the proposed method is more accurate in missing values in time series rainfall data, especially in a region where the assumptions of existing methods are not met, i.e., rainfall varies by season and topography is heterogeneous.

On statistical Computing via EM Algorithm in Logistic Linear Models Involving Non-ignorable Missing data

  • Jun, Yu-Na;Qian, Guoqi;Park, Jeong-Soo
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2005.11a
    • /
    • pp.181-186
    • /
    • 2005
  • Many data sets obtained from surveys or medical trials often include missing observations. When these data sets are analyzed, it is general to use only complete cases. However, it is possible to have big biases or involve inefficiency. In this paper, we consider a method for estimating parameters in logistic linear models involving non-ignorable missing data mechanism. A binomial response and normal exploratory model for the missing data are used. We fit the model using the EM algorithm. The E-step is derived by Metropolis-hastings algorithm to generate a sample for missing data and Monte-carlo technique, and the M-step is by Newton-Raphson to maximize likelihood function. Asymptotic variances of the MLE's are derived and the standard error and estimates of parameters are compared.

  • PDF

Analysis of Missing Data Using an Empirical Bayesian Method (경험적 베이지안 방법을 이용한 결측자료 연구)

  • Yoon, Yong Hwa;Choi, Boseung
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.6
    • /
    • pp.1003-1016
    • /
    • 2014
  • Proper missing data imputation is an important procedure to obtain superior results for data analysis based on survey data. This paper deals with both a model based imputation method and model estimation method. We utilized a Bayesian method to solve a boundary solution problem in which we applied a maximum likelihood estimation method. We also deal with a missing mechanism model selection problem using forecasting results and a comparison between model accuracies. We utilized MWPE(modified within precinct error) (Bautista et al., 2007) to measure prediction correctness. We applied proposed ML and Bayesian methods to the Korean presidential election exit poll data of 2012. Based on the analysis, the results under the missing at random mechanism showed superior prediction results than under the missing not at random mechanism.