• Title/Summary/Keyword: Missing Value

On the Use of Sequential Adaptive Nearest Neighbors for Missing Value Imputation (순차 적응 최근접 이웃을 활용한 결측값 대치법)

  • Park, So-Hyun; Bang, Sung-Wan; Jhun, Myoung-Shic
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1249-1257 / 2011
  • In this paper, we propose a Sequential Adaptive Nearest Neighbor (SANN) imputation method that combines the Adaptive Nearest Neighbor (ANN) method and the Sequential k-Nearest Neighbor (SKNN) method. When choosing the nearest neighbors of missing observations, the proposed SANN method takes the local features of the missing observations into account and reuses the imputed observations in a sequential manner. Using a Monte Carlo study and a real data example, we demonstrate the characteristics of the SANN method and its potential performance.
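
The sequential reuse of imputed observations described above can be illustrated with a minimal sketch. It is not the authors' SANN method: the adaptive, locally chosen neighborhood is replaced by a fixed `k`, and the function name `sequential_knn_impute`, the Euclidean distance, and the processing order (fewest missing entries first) are all assumptions made for illustration.

```python
import math

def sequential_knn_impute(rows, k=2):
    """Fill None entries row by row, reusing rows completed earlier.

    Illustrative sequential k-NN with a fixed k (an assumption); the
    adaptive neighbor selection of SANN is not reproduced here.
    """
    data = [list(r) for r in rows]  # work on a copy
    # Rows without missing entries form the initial donor pool.
    pool = [r for r in data if None not in r]
    if not pool:
        raise ValueError("at least one complete row is required")
    # Process incomplete rows, fewest missing entries first.
    todo = sorted((r for r in data if None in r), key=lambda r: r.count(None))
    for row in todo:
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(donor):
            # Euclidean distance over the columns observed in this row.
            return math.sqrt(sum((row[j] - donor[j]) ** 2 for j in observed))

        neighbors = sorted(pool, key=dist)[:k]
        for j, v in enumerate(row):
            if v is None:
                row[j] = sum(n[j] for n in neighbors) / len(neighbors)
        pool.append(row)  # the imputed row may now serve as a donor
    return data
```

Because each completed row joins the donor pool, later rows can borrow from earlier imputations, which is the "sequential" idea; the input list itself is left unchanged.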

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-Ju; Kwak, Min-Jung; Han, In-Goo
    • Journal of Intelligence and Information Systems / v.9 no.2 / pp.51-63 / 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences for products. Such a preference matrix can contain many missing values, and this incompleteness is one of the factors that degrade the accuracy of a recommendation system. There are several treatments for the incomplete data problem, such as case deletion and single imputation. These approaches are simple and easy to implement, but they may produce biased results. The multiple imputation method imputes m values for each missing value. It overcomes the flaws of single imputation approaches by accounting for the uncertainty of the missing values. The objective of this paper is to suggest a multiple imputation-based collaborative filtering approach for recommendation systems that improves prediction accuracy. The experiments show that the proposed approach performs better than the traditional collaborative filtering approach, especially when the dataset used for recommendation contains many missing values.
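
As a rough illustration of what "imputing m values for each missing value" means, the sketch below completes a list of ratings m times and pools the m estimates of the mean with Rubin's rules (total variance = within-imputation variance + (1 + 1/m) × between-imputation variance). It is not the paper's collaborative filtering procedure: the function name `multiply_impute_mean` is hypothetical, and drawing imputations from a normal distribution fitted to the observed values is a deliberate simplification.

```python
import random
import statistics

def multiply_impute_mean(values, m=5, seed=0):
    """Estimate a mean from data with None entries via m imputations.

    Assumption: missing entries are drawn from a normal distribution
    fitted to the observed values. Requires m >= 2.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    estimates, variances = [], []
    for _ in range(m):
        # One completed dataset: keep observed values, draw the rest.
        completed = [v if v is not None else rng.gauss(mu, sd) for v in values]
        estimates.append(statistics.mean(completed))
        # Sampling variance of the mean for this completed dataset.
        variances.append(statistics.variance(completed) / len(completed))
    pooled = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)        # within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    total_var = within + (1 + 1 / m) * between  # Rubin's rules
    return pooled, total_var
```

The between-imputation term is what a single imputation cannot provide: it carries the extra uncertainty that comes from not knowing the missing values.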

Imputation of Medical Data Using Subspace Condition Order Degree Polynomials

  • Silachan, Klaokanlaya; Tantatsanawong, Panjai
    • Journal of Information Processing Systems / v.10 no.3 / pp.395-411 / 2014
  • Temporal medical data is often collected during patient treatments that require personal analysis. Each observation recorded in the temporal medical data is associated with measurements and treatment times. A major problem in the analysis of temporal medical data is missing values, caused, for example, by patients dropping out of a study before completion. The imputation of missing data is therefore an important pre-processing step and can provide useful information before the data is mined. For each patient and each variable, this imputation replaces the missing data with a value drawn from an estimated distribution of that variable. In this paper, we propose a new method, called Newton's finite divided difference polynomial interpolation with condition order degree, for dealing with missing values in temporal medical data related to obesity. We compared the new imputation method with three existing subspace estimation techniques: the k-nearest neighbor, local least squares, and natural cubic spline approaches. The performance of each approach was evaluated using the normalized root mean square error and statistical significance tests. The experimental results demonstrate that the proposed method provides the best fit with the smallest error and is more accurate than the other methods.
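
For readers unfamiliar with the underlying technique, the following is a minimal sketch of standard Newton divided-difference interpolation, used here to estimate a value at an unobserved time point. It does not implement the paper's "condition order degree" rule, which the abstract does not specify; the function names are illustrative.

```python
def divided_differences(xs, ys):
    """Return the Newton coefficients f[x0], f[x0,x1], ..., f[x0..xn]."""
    coef = list(ys)
    n = len(xs)
    for order in range(1, n):
        # Update in place from the end so lower-order values stay available.
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton-form polynomial at x using nested multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

# Observations at times 0, 1 and 3; the value at time 2 is missing.
times = [0.0, 1.0, 3.0]
values = [1.0, 2.0, 10.0]
estimate = newton_eval(times, divided_differences(times, values), 2.0)  # 5.0
```

The three sample points lie on x² + 1, so the quadratic interpolant recovers the value 5.0 at time 2 exactly.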

A Study on Nutrition Intake Related to Food Habit and Family Environmental Factor of High School Girls in Seoul (서울시내 일부 여고생의 食行動 및 家族環境과 관련된 營養攝取樣相 조사연구)

  • Kim, Hyong Ran
    • Journal of Environmental Health Sciences / v.12 no.2 / pp.49-66 / 1986
  • The purpose of this study was to investigate the nutrition intake of high school girls in relation to food habits, physical status, and family environmental factors. A survey of 216 high school girls aged 15 to 17 in the Seoul area was conducted from April 21 to 30, 1986. Food habits and family environmental factors were examined by questionnaire, nutrition intake was surveyed by recording the kinds, amounts, and ingredients of the foods the girls ate over two days, and height and weight were measured during the same period. The findings are summarized as follows: 1. The mean height and weight of the girls were 157.6 cm and 50.9 kg. 2. The mean number of family members per household was 5.2. The mean age of fathers was 47.1 and of mothers 43.6. 44.9% of the girls had fathers who were college graduates, 41.6% had mothers who were high school graduates, and 29.2% had mothers who held a job. 3. The rate of skipping breakfast was high; the most common reason was 'no time to eat', and the time spent on breakfast was short. 64.4% of the girls ate meals irregularly. 4. Mean daily intake of all nutrients except vitamin A and riboflavin was higher than the Recommended Dietary Allowances (RDA). Mean caloric intake was 89.8% of the RDA. Intake of energy and most nutrients at breakfast was lower than that from snacks. The mean meal balance score was 47.9 and the mean food diversity score was 13.4. 5. Mother's education level was related to the girls' intake of protein and calcium and to their height. Skipping breakfast and lunch and the number of snacks eaten were related to nutrition intake.

Missing Value Estimation and Sensor Fault Identification using Multivariate Statistical Analysis (다변량 통계 분석을 이용한 결측 데이터의 예측과 센서이상 확인)

  • Lee, Changkyu; Lee, In-Beum
    • Korean Chemical Engineering Research / v.45 no.1 / pp.87-92 / 2007
  • Recently, the development of process monitoring systems that detect and diagnose process abnormalities has drawn attention in process systems engineering. Normal data obtained from processes provide information on process characteristics that can be used for modeling, monitoring, and control. Because modern chemical and environmental processes exhibit high dimensionality, strong correlation, severe dynamics, and nonlinearity, it is not easy to analyze a process through a model-based approach. To overcome the limitations of the model-based approach, many system engineers and academic researchers have focused on statistical approaches combined with multivariate analysis such as principal component analysis (PCA) and partial least squares (PLS). Several multivariate analysis methods have been modified so that they can be applied to chemical processes with specific characteristics such as dynamics and nonlinearity. This paper discusses missing value estimation and sensor fault identification based on process variable reconstruction using dynamic PCA and canonical variate analysis.

On the use of weighted adaptive nearest neighbors for missing value imputation (가중 적응 최근접 이웃을 이용한 결측치 대치)

  • Yum, Yunjin; Kim, Dongjae
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.507-516 / 2018
  • Among the various single imputation methods, k-nearest neighbors (KNN) imputation is widely used because it is robust even when a parametric assumption such as multivariate normality is not satisfied. We propose a weighted adaptive nearest neighbors imputation method that combines two approaches: the adaptive nearest neighbors imputation method, which accounts for the local features of the data within KNN imputation, and the weighted k-nearest neighbors method, which is less sensitive to extreme values or outliers among the k nearest neighbors. We conducted a Monte Carlo simulation study to compare the performance of the proposed imputation method with previous imputation methods.
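
The weighting idea can be sketched as follows. This is not the authors' estimator: inverse-distance weights, a fixed `k`, and the function name `weighted_knn_impute` are assumptions chosen only to show how distant neighbors can be down-weighted.

```python
import math

def weighted_knn_impute(target, donors, missing_col, k=3, eps=1e-9):
    """Impute target[missing_col] as a distance-weighted mean of k donors.

    Assumption: weights are 1 / (distance + eps); donors are complete rows.
    """
    # Columns of the target that are observed and can be used for distance.
    observed = [j for j, v in enumerate(target)
                if v is not None and j != missing_col]
    scored = []
    for d in donors:
        dist = math.sqrt(sum((target[j] - d[j]) ** 2 for j in observed))
        scored.append((dist, d[missing_col]))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    # Closer donors get larger weights, so a far-away donor has little pull.
    weights = [1.0 / (dist + eps) for dist, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

With donors `[[1.0, 10.0], [2.0, 20.0], [100.0, 1000.0]]` and target `[0.0, None]`, the unweighted mean of the three donor values would be dominated by the distant donor, whereas the weighted estimate stays close to the two nearby ones.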

A Study on the Filter of Restoration for Defective Image (손실 영상을 복원하기 위한 여파기에 관한 연구)

  • Lee, Chang-Hee
    • Korean Journal of Digital Imaging in Medicine / v.10 no.1 / pp.41-44 / 2008
  • This paper presents a way to restore defective pixels in order to improve the quality of medical images and to convey the desired information more efficiently. A filter can approximate the value of a damaged pixel, but it is difficult to obtain a value identical to the original. As ways of obtaining values that restore the original image, the paper considers filling in the missing values in a sweater pattern and restoring them using a delta filter, and reports that these compare favorably with the existing method.

A Study on Imputing the Missing Values of Continuous Traffic Counts (상시조사 교통량 자료의 결측 보정에 관한 연구)

  • Lee, Sang Hyup; Shin, Jae Myong
    • KSCE Journal of Civil and Environmental Engineering Research / v.33 no.5 / pp.2009-2019 / 2013
  • Traffic volumes are important basic data used directly in transportation network planning, highway design, highway management, and so forth. They are collected by two types of methods: continuous traffic counts and short duration traffic counts. Continuous traffic counts are conducted 365 days a year using a permanent traffic counter, whereas short duration traffic counts are conducted on specific day(s). In continuous traffic counts, data go missing from time to time because of breakdown or malfunction of the counter. Diverse imputation methods have therefore been developed and applied. In this study, an applied exponential smoothing method, which uses the data from the days before and after the missing day, is proposed and compared with other imputation methods. The comparison shows that the applied exponential smoothing method improves imputation accuracy when the coefficient of variation of traffic volume is low. In addition, it is verified that the variation of traffic volume at the site is an important factor in imputation accuracy. It is therefore necessary to apply different imputation methods depending on site and time in order to raise the reliability of imputation for missing traffic values.
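
The abstract does not give the formula of the applied exponential smoothing method, so the sketch below only illustrates the general idea of using days on both sides of a gap. It assumes simple exponential smoothing run forward over the preceding days and backward over the following days, with the two levels averaged; the names `smooth` and `impute_day` and the default `alpha` are hypothetical.

```python
def smooth(series, alpha):
    """Simple exponential smoothing; returns the final smoothed level."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def impute_day(before, after, alpha=0.5):
    """Estimate a missing daily count from the days around it.

    before: counts in chronological order, ending the day before the gap.
    after:  counts in chronological order, starting the day after the gap.
    Assumption: the forward and backward levels are averaged equally.
    """
    forward = smooth(before, alpha)
    # Reverse so that the day closest to the gap is processed last
    # and therefore receives the largest weight.
    backward = smooth(list(reversed(after)), alpha)
    return (forward + backward) / 2

estimate = impute_day([100.0, 120.0], [130.0, 150.0])  # 125.0
```

In this example the forward level is 110.0 and the backward level is 140.0, giving 125.0 for the missing day; days nearest the gap weigh most on both sides.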

Missing Data Correction and Noise Level Estimation of Observation Matrix (관측행렬의 손실 데이터 보정과 잡음 레벨 추정 방법)

  • Koh, Sung-shik
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.3 / pp.99-106 / 2016
  • In this paper, we discuss a method for correcting missing data in a noisy observation matrix and an uncertainty analysis of the potential noise. When the observation matrix has no missing data, the solution is known to be obtained accurately by SVD (Singular Value Decomposition). Usually, however, several entries of the observation matrix have not been observed and other entries have been perturbed by noise. In this case, it is difficult to find the solution, and 3D reconstruction errors result. Therefore, to minimize the 3D reconstruction error, it is necessary above all to correct the missing data reliably under the noise distribution and to evaluate the corrected results quantitatively. This paper focuses on a method for correcting missing data using the geometrical properties between the 2D projected object and the 3D reconstructed shape, and for estimating the noise level of the observation matrix using the ranks of the SVD, in order to quantitatively evaluate the performance of the correction algorithm.

A Study on the Index Estimation of Missing Real Estate Transaction Cases Using Machine Learning (머신러닝을 활용한 결측 부동산 매매 지수의 추정에 대한 연구)

  • Kim, Kyung-Min; Kim, Kyuseok; Nam, Daisik
    • Journal of the Economic Geographical Society of Korea / v.25 no.1 / pp.171-181 / 2022
  • The real estate price index plays a key role as quantitative data in real estate market analysis. International organizations including the OECD publish real estate price indexes by country, and the Korea Real Estate Board announces metropolitan-level and municipal-level indexes. However, when the index is computed for a spatial unit smaller than the metropolitan or municipal level, missing values become a problem. As the spatial scope narrows, there are periods with few or no transactions, which makes index calculation difficult or even impossible. This study suggests a supervised machine learning model to compensate for missing values that may occur when there are no transactions in a specific area and period. The proposed models are evaluated on their accuracy in predicting both existing values and missing values.