• Title/Summary/Keyword: Data imputation

Search Result 199, Processing Time 0.019 seconds

Regression Analysis of Doubly censored data using Gibbs Sampler for the Incubation period

  • Yoo Hanna;Lee Jae Won
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2004.11a
    • /
    • pp.237-241
    • /
    • 2004
  • In standard time-to-event or survival analysis, the occurrence times of the event of interest are observed exactly or are right-censored. However in certain situations such as the AIDS data, the incubation period which is the time between HIV infection time and the diagnosis of AIDS is usually doubly censored. That is the HIV infection time Is interval censored and also the time of the diagnosis of AIDS is right censored. In this paper, we Impute the Interval censored infection time using the conditional mean imputation and estimate the coefficient factor of the regression analysis for the incubation period using Gibbs sampler. We applied parametric and semi-parametric methods for the analysis of the Incubation period and compared the results.

  • PDF

A modified estimating equation for a binary time varying covariate with an interval censored changing time

  • Kim, Yang-Jin
    • Communications for Statistical Applications and Methods
    • /
    • v.23 no.4
    • /
    • pp.335-341
    • /
    • 2016
  • Interval censored failure time data often occurs in an observational study where a subject is followed periodically. Instead of observing an exact failure time, two inspection times that include it are made available. Several methods have been suggested to analyze interval censored failure time data (Sun, 2006). In this article, we are concerned with a binary time-varying covariate whose changing time is interval censored. A modified estimating equation is proposed by extending the approach suggested in the presence of a missing covariate. Based on simulation results, the proposed method shows a better performance than other simple imputation methods. ACTG 181 dataset were analyzed as a real example.

Cluster Analysis of Incomplete Microarray Data with Fuzzy Clustering

  • Kim, Dae-Won
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.3
    • /
    • pp.397-402
    • /
    • 2007
  • In this paper, we present a method for clustering incomplete Microarray data using alternating optimization in which a prior imputation method is not required. To reduce the influence of imputation in preprocessing, we take an alternative optimization approach to find better estimates during iterative clustering process. This method improves the estimates of missing values by exploiting the cluster Information such as cluster centroids and all available non-missing values in each iteration. The clustering results of the proposed method are more significantly relevant to the biological gene annotations than those of other methods, indicating its effectiveness and potential for clustering incomplete gene expression data.

A Study on Shape Variability in Canonical Correlation Biplot with Missing Values (결측값이 있는 정준상관 행렬도의 형상변동 연구)

  • Hong, Hyun-Uk;Choi, Yong-Seok;Shin, Sang-Min;Ka, Chang-Wan
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.5
    • /
    • pp.955-966
    • /
    • 2010
  • Canonical correlation biplot is a useful biplot for giving a graphical description of the data matrix which consists of the association between two sets of variables, for detecting patterns and displaying results found by more formal methods of analysis. Nevertheless, when some values are missing in data, most biplots are not directly applicable. To solve this problem, we estimate the missing data using the median, mean, EM algorithm and MCMC imputation methods according to missing rates. Even though we estimate the missing values of biplot of incomplete data, we have different shapes of biplots according to the imputation methods and missing rates. Therefore we use a RMS(root mean square) which was proposed by Shin et al. (2007) and PS(procrustes statistic) for measuring and comparing the shape variability between the original biplots and the estimated biplots.

Modelling Missing Traffic Volume Data using Circular Probability Distribution (순환확률분포를 이용한 교통량 결측자료 보정 모형)

  • Kim, Hyeon-Seok;Im, Gang-Won;Lee, Yeong-In;Nam, Du-Hui
    • Journal of Korean Society of Transportation
    • /
    • v.25 no.4
    • /
    • pp.109-121
    • /
    • 2007
  • In this study, an imputation model using circular probability distribution was developed in order to overcome problems of missing data from a traffic survey. The existing ad-hoc or heuristic, model-based and algorithm-based imputation techniques were reviewed through previous studies, and then their limitations for imputing missing traffic volume data were revealed. The statistical computing language 'R' was employed for model construction, and a mixture of von Mises probability distribution, which is classified as symmetric, and unimodal circular probability were finally fitted on the basis of traffic volume data at survey stations in urban and rural areas, respectively. The circular probability distribution model largely proved to outperform a dummy variable regression model in regards to various evaluation conditions. It turned out that circular probability distribution models depict circularity of hourly volumes well and are very cost-effective and robust to changes in missing mechanisms.

Structural health monitoring data reconstruction of a concrete cable-stayed bridge based on wavelet multi-resolution analysis and support vector machine

  • Ye, X.W.;Su, Y.H.;Xi, P.S.;Liu, H.
    • Computers and Concrete
    • /
    • v.20 no.5
    • /
    • pp.555-562
    • /
    • 2017
  • The accuracy and integrity of stress data acquired by bridge heath monitoring system is of significant importance for bridge safety assessment. However, the missing and abnormal data are inevitably existed in a realistic monitoring system. This paper presents a data reconstruction approach for bridge heath monitoring based on the wavelet multi-resolution analysis and support vector machine (SVM). The proposed method has been applied for data imputation based on the recorded data by the structural health monitoring (SHM) system instrumented on a prestressed concrete cable-stayed bridge. The effectiveness and accuracy of the proposed wavelet-based SVM prediction method is examined by comparing with the traditional autoregression moving average (ARMA) method and SVM prediction method without wavelet multi-resolution analysis in accordance with the prediction errors. The data reconstruction analysis based on 5-day and 1-day continuous stress history data with obvious preternatural signals is performed to examine the effect of sample size on the accuracy of data reconstruction. The results indicate that the proposed data reconstruction approach based on wavelet multi-resolution analysis and SVM is an effective tool for missing data imputation or preternatural signal replacement, which can serve as a solid foundation for the purpose of accurately evaluating the safety of bridge structures.

Comparison of missing data methods in clustered survival data using Bayesian adaptive B-Spline estimation

  • Yoo, Hanna;Lee, Jae Won
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.2
    • /
    • pp.159-172
    • /
    • 2018
  • In many epidemiological studies, missing values in the outcome arise due to censoring. Such censoring is what makes survival analysis special and differentiated from other analytical methods. There are many methods that deal with censored data in survival analysis. However, few studies have dealt with missing covariates in survival data. Furthermore, studies dealing with missing covariates are rare when data are clustered. In this paper, we conducted a simulation study to compare results of several missing data methods when data had clustered multi-structured type with missing covariates. In this study, we modeled unknown baseline hazard and frailty with Bayesian B-Spline to obtain more smooth and accurate estimates. We also used prior information to achieve more accurate results. We assumed the missing mechanism as MAR. We compared the performance of five different missing data techniques and compared these results through simulation studies. We also presented results from a Multi-Center study of Korean IBD patients with Crohn's disease(Lee et al., Journal of the Korean Society of Coloproctology, 28, 188-194, 2012).

A Research for Imputation Method of Photovoltaic Power Missing Data to Apply Time Series Models (태양광 발전량 데이터의 시계열 모델 적용을 위한 결측치 보간 방법 연구)

  • Jeong, Ha-Young;Hong, Seok-Hoon;Jeon, Jae-Sung;Lim, Su-Chang;Kim, Jong-Chan;Park, Chul-Young
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.9
    • /
    • pp.1251-1260
    • /
    • 2021
  • This paper discusses missing data processing using simple moving average (SMA) and kalman filter. Also SMA and kalman predictive value are made a comparative study. Time series analysis is a generally method to deals with time series data in photovoltaic field. Photovoltaic system records data irregularly whenever the power value changes. Irregularly recorded data must be transferred into a consistent format to get accurate results. Missing data results from the process having same intervals. For the reason, it was imputed using SMA and kalman filter. The kalman filter has better performance to observed data than SMA. SMA graph is stepped line graph and kalman filter graph is a smoothing line graph. MAPE of SMA prediction is 0.00737%, MAPE of kalman prediction is 0.00078%. But time complexity of SMA is O(N) and time complexity of kalman filter is O(D2) about D-dimensional object. Accordingly we suggest that you pick the best way considering computational power.

데이터 퓨전 : 개념, 문제, 대안

  • 한상훈;하덕주;최종후
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2004.11a
    • /
    • pp.277-281
    • /
    • 2004
  • 최근 마케팅 현업에서 마이크로 마케팅(Micro Marketing)이 마케팅 기법의 화두로 등장하면서 데이터 퓨전(Data Fusion) 또는 데이터 인리치먼트(Data Enrichment)가 각광받는 영역으로 등장하고 있다. 본 연구에서는 데이터 퓨전의 개념과 그를 둘러싸고 있는 통계적 문제와 그 대안에 대하여 논의한다.

  • PDF

Modified BLS Weight Adjustment (수정된 BLS 가중치보정법)

  • Park, Jung-Joon;Cho, Ki-Jong;Lee, Sang-Eun;Shin, Key-Il
    • Communications for Statistical Applications and Methods
    • /
    • v.18 no.3
    • /
    • pp.367-376
    • /
    • 2011
  • BLS weight adjustment is a widely used method for business surveys with non-responses and outliers. Recent surveys show that the non-response weight adjustment of the BLS method is the same as the ratio imputation method. In this paper, we suggested a modified BLS weight adjustment method by imputing missing values instead of using weight adjustment for non-response. Monthly labor survey data is used for a small Monte-Carlo simulation and we conclude that the suggested method is superior to the original BLS weight adjustment method.