• Title/Summary/Keyword: missing data imputation

Considering of the Rainfall Effect in Missing Traffic Volume Data Imputation Method (누락교통량자료 보정방법에서 강우의 영향 고려)

  • Kim, Min-Heon;Oh, Ju-Sam
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.14 no.2 / pp.1-13 / 2015
  • Traffic volume data is basic information used in a wide variety of fields. Existing imputation methods for missing traffic volume data do not take the effect of rainfall into account. This study examines how to consider the rainfall effect when imputing missing traffic volume data, under the following assumption: when traffic volume data are missing on a rainy day, it is more accurate to impute them using only traffic volume data from past rainy days. To test this assumption, the accuracy of the imputed results was compared across three imputation methods (Unconditional Mean, Auto Regression, and the Expectation-Maximization Algorithm). The results show that errors were lower when the rainfall effect was considered.
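
The rainy-day assumption in this abstract can be illustrated with a minimal sketch of the simplest of the three methods, the unconditional mean, restricted to days with the same weather. This is not the authors' implementation: the record layout `(is_rainy, volume)`, the function name `impute_conditional_mean`, and the fallback to the overall mean are all assumptions made for illustration.

```python
from statistics import mean

def impute_conditional_mean(records):
    """Fill each missing volume with the mean of observed volumes
    from days that share the same rain flag (hypothetical layout)."""
    # Collect observed volumes separately for rainy and dry days.
    groups = {True: [], False: []}
    for rainy, volume in records:
        if volume is not None:
            groups[rainy].append(volume)

    # Fallback when no observed day shares the rain flag (an assumption).
    # Raises statistics.StatisticsError if nothing at all was observed.
    overall = mean(v for _, v in records if v is not None)

    filled = []
    for rainy, volume in records:
        if volume is None:
            pool = groups[rainy]
            volume = mean(pool) if pool else overall
        filled.append((rainy, volume))
    return filled

# Illustrative values only: two dry days, two rainy days, one missing rainy day.
history = [(False, 1000.0), (False, 1100.0), (True, 800.0), (True, 900.0), (True, None)]
result = impute_conditional_mean(history)
print(result[-1])  # (True, 850.0)
```

The missing rainy-day value is filled from the rainy-day pool only, so the higher dry-day volumes do not pull the estimate upward.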

Improvement of Collaborative Filtering Algorithm Using Imputation Methods

  • Jeong, Hyeong-Chul;Kwak, Min-Jung;Noh, Hyun-Ju
    • Journal of the Korean Data and Information Science Society / v.14 no.3 / pp.441-450 / 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences, and this matrix frequently has missing data. We introduce two imputation approaches (multiple imputation via the Markov Chain Monte Carlo method and multiple imputation via the bootstrap method) to improve the prediction performance of collaborative filtering, and evaluate their performance using the EachMovie data.

Comparison of missing data methods in clustered survival data using Bayesian adaptive B-Spline estimation

  • Yoo, Hanna;Lee, Jae Won
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.159-172 / 2018
  • In many epidemiological studies, missing values in the outcome arise due to censoring. Such censoring is what makes survival analysis special and distinguishes it from other analytical methods. There are many methods that deal with censored data in survival analysis. However, few studies have dealt with missing covariates in survival data, and such studies are rarer still when the data are clustered. In this paper, we conducted a simulation study to compare the results of several missing data methods when the data had a clustered, multi-structured form with missing covariates. We modeled the unknown baseline hazard and frailty with a Bayesian B-Spline to obtain smoother and more accurate estimates. We also used prior information to achieve more accurate results. We assumed the missing mechanism to be MAR. We compared the performance of five different missing data techniques through simulation studies. We also presented results from a Multi-Center study of Korean IBD patients with Crohn's disease (Lee et al., Journal of the Korean Society of Coloproctology, 28, 188-194, 2012).

REGRESSION FRACTIONAL HOT DECK IMPUTATION

  • Kim, Jae-Kwang
    • Journal of the Korean Statistical Society / v.36 no.3 / pp.423-434 / 2007
  • Imputation using a regression model is a method to preserve the correlation among variables and to provide imputed point estimators. We discuss the implementation of regression imputation using fractional imputation. By a suitable choice of fractional weights, the fractional regression imputation can take the form of hot deck fractional imputation, thus no artificial values are constructed after the imputation. A variance estimator, which extends the method of Kim and Fuller (2004), is also proposed. Results from a limited simulation study are presented.

A Study on Imputation using Adjusted Cohen Method

  • Chung, Sung-Suk;Chun, Young-Min;Lee, Sun-Kyung
    • Journal of the Korean Data and Information Science Society / v.17 no.3 / pp.871-888 / 2006
  • Many studies have developed procedures for dealing with missing values. The most common approach is to assign other observed values to the missing data. The purpose of our study is to propose adjusted Cohen methods and to compare their efficiency with that of other methods through a simulation study. The adjusted Cohen methods use an auxiliary variable to rank the variable with missing values. This leads to a lower mean square error (MSE) than the Cohen method.

A Comparison of BLS Non-Response Adjustment and Cross-Wave Regression Imputation Methods (BLS 무응답 보정법을 이용한 대체법과 이월대체법에 관한 연구)

  • Lee, Sang-Eun;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.909-921 / 2010
  • Cross-wave regression imputation and carry-over imputation are generally used in the analysis of panel data with missing values. It has recently become known that the BLS non-response adjustment method has good statistical properties. In this paper we show that the BLS method can be regarded as an imputation method whose formula is similar to that of a ratio estimator. In addition, we show that carry-over imputation and BLS imputation are approximately the same under the assumption that the data follow a non-stationary process with drift. Small simulation studies and a real data analysis are performed. For the real data analysis, a monthly labor statistic (2007) is used.

Comparison of EM with Jackknife Standard Errors and Multiple Imputation Standard Errors

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.1079-1086 / 2005
  • Most discussions of single imputation methods and the EM algorithm concern point estimation of population quantities with missing values. A second concern is how to obtain standard errors for the point estimates computed from data filled in by single imputation methods or the EM algorithm. We focus on how to estimate standard errors that incorporate the additional uncertainty due to nonresponse. Two general approaches to accounting for this uncertainty are considered: the jackknife, a resampling method, and multiple imputation (MI). These two approaches are reviewed and compared through simulation studies.
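
As a rough illustration of how multiple imputation carries the extra uncertainty into a standard error, the sketch below applies Rubin's combining rules: the total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). The abstract does not give formulas, so this is the standard textbook rule rather than the paper's procedure; the function name `pool_rubin` and the input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their variances
    using Rubin's rules; returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    within = mean(variances)      # average within-imputation variance
    between = variance(estimates) # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Illustrative values only: three imputed data sets.
q_bar, total_var = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
std_err = sqrt(total_var)
print(q_bar)  # 11.0
```

A single-imputation analysis would report only the within-imputation variance; the between-imputation term is what reflects the uncertainty due to nonresponse.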

A Modified Grey-Based k-NN Approach for Treatment of Missing Value

  • Chun, Young-M.;Lee, Joon-W.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.421-436 / 2006
  • In 2004, Huang proposed a grey-based nearest neighbor approach to accurately predict missing attribute values. Our study proposes a way to decide the number of nearest neighbors using not only Deng's grey relational grade (GRG) but also Wen's grey relational grade. In addition, our study uses a weighted mean rather than an arithmetic (unweighted) mean, with the GRG serving as the weight when imputing missing values. There are four methods: DU, DW, WU, and WW. The WW method (Wen's GRG with weighted mean) performs best among them. Huang showed that his method was much better than the mean imputation and multiple imputation methods; the proposed method performs far better than Huang's.
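
The idea of weighting neighbors by their grey relational grade can be sketched as follows. The sketch uses the commonly cited form of Deng's grey relational coefficient with distinguishing coefficient 0.5; it does not implement Wen's GRG or the paper's rule for choosing the number of neighbors. The function names, the fixed `k`, and the data are assumptions for illustration only.

```python
def grey_relational_grades(target, candidates, zeta=0.5):
    """Deng-style grey relational grade of each candidate row
    relative to the target row (complete attributes only)."""
    deltas = [[abs(t - c) for t, c in zip(target, row)] for row in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate matches the target exactly.
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

def impute_weighted(target, candidates, outcomes, k=2):
    """Impute the target's missing attribute as the GRG-weighted mean
    of that attribute over the k candidates with the highest grade."""
    grades = grey_relational_grades(target, candidates)
    nearest = sorted(zip(grades, outcomes), reverse=True)[:k]
    total = sum(g for g, _ in nearest)
    return sum(g * y for g, y in nearest) / total

# Illustrative values only: observed attributes and the attribute to impute.
target = [1.0, 2.0]
candidates = [[1.0, 2.0], [1.0, 3.0], [5.0, 9.0]]
outcomes = [10.0, 20.0, 90.0]
grades = grey_relational_grades(target, candidates)
value = impute_weighted(target, candidates, outcomes, k=2)
```

With a weighted mean, the candidate that matches the target exactly contributes more than the second neighbor, so the imputed value lies closer to its outcome than an unweighted mean would.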

Imputation Model for Link Travel Speed Measurement Using UTIS (UTIS 구간통행속도 결측치 보정모델)

  • Ki, Yong-Kul;Ahn, Gye-Hyeong;Kim, Eun-Jeong;Bae, Kwang-Soo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.10 no.6 / pp.63-73 / 2011
  • Travel speed is an important parameter for measuring road traffic. UTIS (Urban Traffic Information System) was developed as a mobile detector for measuring link travel speeds in South Korea. Our investigation found that UTIS data include missing values caused by a lack of probe vehicles on road segments, system failures, and other factors. Imputation is the practice of filling in missing data with estimated values. In this paper, we propose a new model for imputing missing data in order to provide accurate link travel speeds to the public. In the field test, the new model showed a travel speed measurement accuracy of 93.6%. We therefore conclude that the proposed model significantly improves travel speed measurement accuracy.

Outlier Filtering and Missing Data Imputation Algorithm using TCS Data (TCS데이터를 이용한 이상치제거 및 결측보정 알고리즘 개발)

  • Do, Myung-Sik;Lee, Hyang-Mee;NamKoong, Seong
    • Journal of Korean Society of Transportation / v.26 no.4 / pp.241-250 / 2008
  • With the ever-growing amount of traffic, there is an increasing need for good quality travel time information. Various outlier filtering and missing data imputation algorithms using AVI data have been proposed for interrupted and uninterrupted traffic flow. This paper develops an outlier filtering and missing data imputation algorithm using Toll Collection System (TCS) data. TCS travel time data collected from August to September 2007 were employed. Travel time data from TCS are derived from records of every passing vehicle, so these data have the potential to provide real-time travel time information. However, the authors found that as the distance between entry and exit tollgates increases, the variance of travel time also increases. Time gaps also appeared when the distance between tollgates was long. Finally, after analyzing existing methods, the authors propose a new method for producing representative values once abnormal and "noise" data have been removed. The proposed algorithm is effective.