• Title/Summary/Keyword: MissingData


Estimation using response probability when missing data happen on the second occasion

  • Park, Hyeonah;Na, Seongryong
    • Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.263-269 / 2014
  • When samples are lost in repeated surveys, new samples can often replace the missing values. Estimators based on response probability can be considered for repeated surveys on two occasions in which new samples are selected in place of missing data on the second occasion. We propose a new estimator that uses both respondents and new samples on the second occasion. The simulation setting assumes that missing values can occur on the second occasion and are replaced by new samples. The results show that the proposed estimator is more efficient than one that applies a weighting adjustment to respondents on the second occasion.
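
For readers unfamiliar with weighting adjustment, the comparison method named in the abstract, a minimal sketch of a generic inverse-response-probability weighted mean follows. It is not the estimator proposed in the paper; the function name `ipw_mean`, the response probabilities, and the sample values are all assumptions made for illustration.

```python
from typing import Sequence


def ipw_mean(values: Sequence[float], response_probs: Sequence[float]) -> float:
    """Weighted mean of respondents' values, each weighted by 1 / response probability.

    Generic weighting-adjustment sketch (ratio form); not the paper's estimator.
    """
    if len(values) != len(response_probs):
        raise ValueError("values and response_probs must have the same length")
    weights = [1.0 / p for p in response_probs]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical respondents on the second occasion: units with a low response
# probability stand in for more nonrespondents, so they receive larger weights.
observed = [10.0, 20.0, 30.0]
probs = [0.5, 0.5, 1.0]
print(ipw_mean(observed, probs))
```

With these assumed inputs the first two respondents each count twice as much as the third, which pulls the estimate below the unweighted mean of 20.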

A New Estimation Model for Wireless Sensor Networks Based on the Spatial-Temporal Correlation Analysis

  • Ren, Xiaojun;Sug, HyonTai;Lee, HoonJae
    • Journal of information and communication convergence engineering / v.13 no.2 / pp.105-112 / 2015
  • Estimating missing sensor values is an important problem in sensor network applications, but existing approaches are limited in application scope and estimation accuracy. In this paper, we therefore propose a new estimation model based on spatial-temporal correlation analysis (STCAM). STCAM makes full use of spatial and temporal correlations and can recognize whether the sensor parameters have a spatial or a temporal correlation, and whether the missing sensor data are continuous. Based on the recognition results, STCAM chooses the most suitable algorithm from among the linear interpolation algorithm of temporal correlation analysis (TCA-LI), the multiple regression algorithm of temporal correlation analysis (TCA-MR), spatial correlation analysis (SCA), and spatial-temporal correlation analysis (STCA) to estimate the missing sensor data. STCAM was evaluated on the Intel lab dataset and a traffic dataset, and the simulation results show that STCAM achieves good estimation accuracy.
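
As a rough illustration of temporal linear interpolation, the simplest family of methods the abstract names, the sketch below fills interior gaps in a single sensor's time series. It is not the paper's TCA-LI implementation; the function name `fill_linear`, the evenly spaced sampling, and the readings are assumptions.

```python
from typing import List, Optional


def fill_linear(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None by linear interpolation between the nearest
    observed neighbours. Leading and trailing gaps are left as None because
    they have no neighbour on one side."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


# Hypothetical temperature readings at equally spaced time steps.
readings = [20.0, None, None, 23.0, None, 25.0, None]
print(fill_linear(readings))
```

The trailing gap stays unfilled here, which is one reason a model might fall back on spatial neighbours or regression when a gap is long or sits at the edge of the record.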

A Sparse Data Preprocessing Using Support Vector Regression (Support Vector Regression을 이용한 희소 데이터의 전처리)

  • Jun, Sung-Hae;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.789-792 / 2004
  • Missing values arise in many forms in fields such as web mining, bioinformatics, and statistical data analysis. These values make the training data sparse. Most commonly, missing values are replaced by predicted values such as the mean or mode. More advanced imputation methods, such as the conditional mean, tree methods, and the Markov chain Monte Carlo algorithm, can also be used. However, the predictive accuracy of general imputation models decreases as the ratio of missing values in the training data increases. Moreover, the number of available imputations becomes limited as the missing ratio increases. To address this problem, we propose using statistical learning theory to preprocess missing values. The statistical learning method we use is Vapnik's support vector regression. The proposed method can be applied to sparse training data. We verified the performance of our model using data sets from the UCI Machine Learning Repository.

A Study on the Influence of a Missing Cell in a Class of Central Composite Designs

  • Park, Sung-Hyun;Noh, Hyun-Gon
    • Journal of the Korean Statistical Society / v.27 no.1 / pp.133-152 / 1998
  • The central composite design is widely used in response surface analysis because it can fit the second-order model with a small number of experimental points. In practice, experimental data are not always obtained at all the points. When observations are missing, many problems can arise from the missing cells. In this paper, the influence of a missing cell on the central composite design is discussed. First, the influences of a missing cell on the variances of the estimated regression coefficients are compared as $\alpha$ varies. Second, the effect of a missing cell on the average prediction variance is discussed, and its influence on rotatability is investigated. Third, the influence of a missing cell on optimality, especially D-optimality and A-optimality, is examined.


Application of discrete Weibull regression model with multiple imputation

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.325-336 / 2019
  • In this article, we extend the discrete Weibull regression model to handle missing data. Discrete Weibull regression models can be adapted to various types of dispersion data; however, they are not widely used. Recently, Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) adapted the discrete Weibull regression model using single imputation. We extend that study by using multiple imputation under several settings and compare the results. The purpose of this study is to demonstrate the merit of multiple imputation when missing data are present in discrete count data. We analyzed the 2016 data of the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII) to assess the factors influencing the variable, 1 month hospital stay, and we compared the results of the discrete Weibull regression model with those of the Poisson, negative binomial, and zero-inflated Poisson regression models, which are widely used in count data analyses. The results showed that the discrete Weibull regression model using multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of the discrete Weibull regression using multiple imputation under both under- and over-dispersed distributions, as well as varying missing rates and sample sizes. Sensitivity analysis showed the influence of mis-specification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data will increase explanatory power and is widely applicable to various types of dispersion data with a unified model.
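
Multiple imputation analyses are commonly combined with Rubin's rules: the analysis is run on each imputed data set, and the resulting estimates and variances are pooled. The abstract does not say how the paper pools its results, so the sketch below is only an assumed, generic illustration; the function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m per-imputation estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical regression coefficient estimated on three imputed data sets.
coef, total_var = pool_rubin([1.0, 1.2, 1.1], [0.04, 0.05, 0.03])
print(coef, total_var)
```

The between-imputation term is what single imputation omits: it carries the extra uncertainty that comes from not knowing the missing values.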

A Study on the Imputation for Missing Data in Dual-loop Vehicle Detector System (차량 검지자료 결측 보정처리에 관한 연구 (이력자료 활용방안을 중심으로))

  • Kim, Jeong-Yeon;Lee, Yeong-In;Baek, Seung-Geol;Nam, Gung-Seong
    • Journal of Korean Society of Transportation / v.24 no.7 s.93 / pp.27-40 / 2006
  • Traffic information is provided based on the traffic volume, speed, and occupancy collected through the currently operating Vehicle Detector System (VDS). In addition, the use of traffic information is gradually increasing across various application fields and users. Missing data in vehicle detector data are data series transmitted to the controller without a specific property. Because missing data have no data property, they are excluded from the overall data processing. Hence, an increasing ratio of missing data in VDS data leads to an unreliable representation of the actual traffic situation. This study presents an imputation process for missing data that applies methodologies using adjacent-station reference data and historical data. The imputation methodologies were applied to VDS data from the SeoHaeAn and Kyongbu Expressways, where VDS is currently in operation, after processing the data at a selected missing-data ratio. The imputation was carried out per lane at 30-second intervals and classified into morning, afternoon, and daily time ranges, and the error of the imputed data was analyzed against the actual data. The analysis shows that imputation using historical data produced relatively lower error than the adjacent-station reference method.

Detection and Correction Method of Erroneous Data Using Quantile Pattern and LSTM

  • Hwang, Chulhyun;Kim, Hosung;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.16 no.4 / pp.242-247 / 2018
  • The data of K-Water waterworks are collected from various sensors and used as basic data for the operation and analysis of various devices. The sensor data are therefore very important, but they contain misleading values owing to the characteristics of sensors in external environments. However, cleansing methods for missing data concentrate on predicting the missing values, so research on detection and prediction methods for such data remains limited. This study detects wrong data by converting the collected data into quintiles and patterning them. The accuracy of detecting false data intentionally generated from real data is confirmed to be higher than that of the conventional method in all cases. In future research, we will verify the proposed system's efficiency and accuracy in various environments.

Adjustment System for Outlier and Missing Value using Data Storage (데이터 저장소를 이용한 이상치 및 결측치 보정 시스템)

  • Gwangho Kim;Neunghoe Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.5 / pp.47-53 / 2023
  • With the advent of the 4th Industrial Revolution, a large amount of diverse data has been accumulated. The agricultural community has also used sensors to collect environmental data that affect crop growth in smart farms and open fields. Environmental data have different features depending on where and when they are measured. Studies have used the collected agricultural data to predict growth and yield with statistics and artificial intelligence. The results of these studies vary greatly depending on the data on which they are based, so studies to enhance data quality have also been conducted continuously to improve performance. High performance requires a lot of data, but outliers or missing values in the data can greatly affect the results even when the amount is sufficient. Adjusting outliers and missing values is therefore essential in data preprocessing. This paper integrates data collected from actual farms and, based on them, proposes an adjustment system for outliers and missing values.

Imputation of Medical Data Using Subspace Condition Order Degree Polynomials

  • Silachan, Klaokanlaya;Tantatsanawong, Panjai
    • Journal of Information Processing Systems / v.10 no.3 / pp.395-411 / 2014
  • Temporal medical data are often collected during patient treatments that require personal analysis. Each observation recorded in the temporal medical data is associated with measurements and treatment times. A major problem in the analysis of temporal medical data is missing values, which are caused, for example, by patients dropping out of a study before completion. Therefore, the imputation of missing data is an important step during pre-processing and can provide useful information before the data are mined. For each patient and each variable, this imputation replaces the missing data with a value drawn from an estimated distribution of that variable. In this paper, we propose a new method, called Newton's finite divided difference polynomial interpolation with condition order degree, for dealing with missing values in temporal medical data related to obesity. We compared the new imputation method with three existing subspace estimation techniques: the k-nearest neighbor, local least squares, and natural cubic spline approaches. The performance of each approach was then evaluated using the normalized root mean square error and statistical significance test results. The experimental results demonstrate that the proposed method provides the best fit with the smallest error and is more accurate than the other methods.
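
The building block named in the abstract, Newton's divided-difference polynomial interpolation, can be sketched as follows. This shows only the textbook interpolation step; the paper's "condition order degree" rule for choosing the polynomial order is not described in the abstract and is not reproduced. The function names and the visit data are assumptions.

```python
from typing import List, Sequence


def divided_differences(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Return the Newton coefficients f[x0], f[x0,x1], ..., f[x0,...,xn]."""
    coef = list(ys)
    n = len(xs)
    for order in range(1, n):
        # Work from the end so each step still sees the lower-order values.
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef


def newton_eval(xs: Sequence[float], coef: Sequence[float], x: float) -> float:
    """Evaluate the Newton-form polynomial at x using nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result


# Hypothetical measurements at weeks 0, 1, 2 and 4; the week-3 visit is missing.
weeks = [0.0, 1.0, 2.0, 4.0]
measurements = [0.0, 1.0, 4.0, 16.0]  # synthetic values that follow week**2
coef = divided_differences(weeks, measurements)
print(newton_eval(weeks, coef, 3.0))  # imputed value for week 3
```

Because the synthetic values lie on a quadratic, the interpolant recovers the week-3 value exactly; with real measurements, high-order polynomials can oscillate, which is presumably why a rule for limiting the order matters.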

The study on error, missing data and imputation of the smart card data for the transit OD construction (대중교통 OD구축을 위한 대중교통카드 데이터의 오류와 결측 분석 및 보정에 관한 연구)

  • Park, Jun-Hwan;Kim, Soon-Gwan;Cho, Chong-Suk;Heo, Min-Wook
    • Journal of Korean Society of Transportation / v.26 no.2 / pp.109-119 / 2008
  • The number of card users has grown steadily since the adoption of smart cards. Given the diverse information contained in smart card data, the rising card usage rate has useful implications for travel pattern analysis and transportation policy. One of the most important is that the data make it possible to generate transit O/D tables easily. When generating transit O/D tables from smart card data, it is necessary to filter out data errors and missing data. Correcting missing data is also an important procedure. This study computes the level of data errors and missing data and corrects the missing data for transit O/D generation.