• Title/Summary/Keyword: Missing Value

Adjustment System for Outlier and Missing Value using Data Storage (데이터 저장소를 이용한 이상치 및 결측치 보정 시스템)

  • Gwangho Kim;Neunghoe Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.5
    • /
    • pp.47-53
    • /
    • 2023
  • With the advent of the 4th Industrial Revolution, a large and diverse body of data has been accumulated. The agricultural community has also used sensors to collect environmental data that affect crop growth in smart farms and open fields. Environmental data have different features depending on where and when they are measured. Studies have used the collected agricultural data to predict growth and yield with statistics and artificial intelligence. The results of these studies vary greatly depending on the data on which they are based, so studies to enhance data quality have also been conducted continuously to improve performance. High performance requires a lot of data, but outliers or missing values in the data can greatly affect the results even if the amount is sufficient. Adjusting outliers and missing values is therefore essential in data preprocessing. This paper integrates data collected from actual farms and proposes an adjustment system for outliers and missing values based on the integrated data.

Comparison of Data Reconstruction Methods for Missing Value Imputation (결측값 대체를 위한 데이터 재현 기법 비교)

  • Cheongho Kim;Kee-Hoon Kang
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.1
    • /
    • pp.603-608
    • /
    • 2024
  • Nonresponse and missing values are caused by sample dropouts and by respondents avoiding survey questions. This raises the risk of information loss and biased inference, so missing values need to be replaced with appropriate values. In this paper, we compare several methods for imputing missing values: mean, linear regression, random forest, K-nearest neighbor, autoencoder, and denoising autoencoder based on deep learning. These imputation methods are explained, and each is compared using continuous simulation data and real data. The comparison results confirm that in most cases the random forest and denoising autoencoder imputation methods perform better than the others.
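To make two of the compared methods concrete, here is a minimal sketch of mean imputation and K-nearest-neighbor imputation in plain Python. It is not the authors' implementation: the function names `mean_impute` and `knn_impute`, the use of `None` to mark a missing value, and the distance (root mean squared difference over columns observed in both rows) are all assumptions made for illustration.

```python
import math

def mean_impute(rows):
    """Replace None in each column with the mean of that column's observed values.
    Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_impute(rows, k=2):
    """Replace None with the average of that column over the k nearest donor rows.
    Assumes at least one other row has the column observed."""
    def dist(a, b):
        # Compare only the columns that are observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    out = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is not None:
                continue
            # Donors are the other rows in which column j is observed.
            donors = [(dist(r, o), o[j]) for m, o in enumerate(rows)
                      if m != i and o[j] is not None]
            donors.sort(key=lambda t: t[0])
            nearest = [val for _, val in donors[:k]]
            out[i][j] = sum(nearest) / len(nearest)
    return out

# Hypothetical toy data: the second row is missing its second value.
data = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [10.0, 20.0]]
mean_filled = mean_impute(data)
knn_filled = knn_impute(data, k=2)
```

On this toy data the mean fill is pulled upward by the distant fourth row, while the K-nearest-neighbor fill uses only the two closest rows, which is the kind of difference such comparisons are meant to expose.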

Study on Weather Data Interpolation of a Buoy Based on Machine Learning Techniques (기계 학습을 이용한 항로표지 기상 자료의 보간에 관한 연구)

  • Seong-Hun Jeong;Jun-Ik Ma;Seong-Hyun Jo;Gi-Ryun Lim;Jun-Woo Lee;Jun-Hee Han
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2022.06a
    • /
    • pp.72-74
    • /
    • 2022
  • Several types of data are collected from buoys thanks to the development of hardware technology. However, the collected data are difficult to use because of errors, including missing values and outliers, caused by mechanical faults and the meteorological environment. Therefore, in this study, the missing time data are added and linear interpolation is performed so that machine learning can be applied to the insufficient meteorological data. After the linear interpolation, XGBoost and a KNN regressor are used to forecast the erroneous data, and the suggested model is evaluated using real-world data from a buoy.
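The step of adding missing timestamps and interpolating linearly can be sketched as follows. This is an illustrative sketch only: the function name `fill_missing_timestamps`, the dict-based input, and the integer time grid are assumptions, and the subsequent XGBoost and KNN forecasting stage is not shown.

```python
def fill_missing_timestamps(series, step):
    """Insert the timestamps missing from a regular grid and interpolate linearly.

    series: non-empty dict {time: value} whose keys lie on a grid of spacing `step`.
    Returns a list of (time, value) pairs sorted by time.
    """
    times = sorted(series)
    filled = []
    for t0, t1 in zip(times, times[1:]):
        v0, v1 = series[t0], series[t1]
        filled.append((t0, v0))
        t = t0 + step
        while t < t1:
            # Fraction of the way from t0 to t1.
            w = (t - t0) / (t1 - t0)
            filled.append((t, v0 + w * (v1 - v0)))
            t += step
    filled.append((times[-1], series[times[-1]]))
    return filled

# Hypothetical readings every 10 minutes, with the samples at 20 and 30 missing.
obs = {0: 10.0, 10: 12.0, 40: 18.0}
filled = fill_missing_timestamps(obs, 10)
```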

Error Concealment Method considering Distance and Direction of Motion Vectors in H.264 (움직임벡터의 거리와 방향성을 고려한 H.264 에러 은닉 방법)

  • Son, Nam-Rye;Lee, Guee-Sang
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.1C
    • /
    • pp.37-47
    • /
    • 2009
  • When H.264 encoded video streams are transmitted over a wireless network, packet loss is unavoidable. In response to this environment, we propose methods to recover lost motion vectors in the decoder. First, a candidate vector set for the missing macroblock is estimated from the high correlation between neighboring motion vectors and the missing block's vectors: the algorithm clusters candidate vectors by the distances among the motion vectors of neighboring blocks, and the optimal candidate vector is determined by the median value of the clustered motion vector set. In the next stage, the final candidate vector of the missing block is chosen from the candidate vector set as the one with the minimum distortion value, considering the directions of the neighboring pixels' boundaries. Test results showed that the proposed algorithm reduces the number of candidate motion vectors by 23% to 61% and reduces average processing (decoding) time by 3 to 4 seconds compared with the existing H.264 codec. The PSNR, in terms of visual quality, is similar to that of existing methods.
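The idea of taking the median of a clustered motion vector set can be illustrated with a short sketch. The abstract does not say how the median of two-dimensional vectors is defined, so the component-wise median used here is an assumption, as are the function name `median_vector` and the sample vectors.

```python
from statistics import median

def median_vector(cluster):
    """Component-wise median of a cluster of (dx, dy) motion vectors.
    Assumption: the paper may use a different definition of vector median."""
    xs = [mv[0] for mv in cluster]
    ys = [mv[1] for mv in cluster]
    return (median(xs), median(ys))

# Hypothetical neighboring motion vectors; (12, 7) is an outlier.
cluster = [(4, -2), (5, -1), (3, -2), (12, 7), (4, -3)]
candidate = median_vector(cluster)
```

Unlike a mean, the median is barely affected by the single outlying vector, which is one reason a median is a natural choice for a representative motion vector.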

Reconstruction and Change Analysis for Temporal Series of Remotely-sensed Data (연속 원격탐사 영상자료의 재구축과 변화 탐지)

  • 이상훈
    • Korean Journal of Remote Sensing
    • /
    • v.18 no.2
    • /
    • pp.117-125
    • /
    • 2002
  • Multitemporal analysis with remotely sensed data is complicated by numerous intervening factors, including atmospheric attenuation and the occurrence of clouds, that obscure the relationship between ground and satellite-observed spectral measurements. Using an adaptive reconstruction system, a dynamic compositing approach was developed to recover missing or bad observations. The reconstruction method incorporates temporal variation in the physical properties of targets and anisotropic spatial optical properties into image processing. The adaptive system performs the dynamic compositing by obtaining a composite image as a weighted sum of the observed value and the value predicted according to the local temporal trend. The proposed system was applied to the sequence of AVHRR NDVI images observed over the Korean Peninsula from 1999 to 2000. The experiment shows that the reconstructed series can be used as an estimated series with complete data for observations that include bad or missing values. Additionally, the proposed system generated a gradient image, which represents the amount of temporal change at the corresponding time. It shows temporal variation more clearly than the data image series.
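The compositing rule described above, a weighted sum of the observed value and a trend-based prediction, can be sketched for a single pixel. This is a simplified illustration, not the paper's adaptive system: the names `composite` and `reconstruct`, the linear extrapolation from the two previous reconstructed values, and the hand-picked weights are all assumptions.

```python
def composite(observed, predicted, weight):
    """Weighted sum of observation and prediction.
    weight in [0, 1] is the trust placed in the observation;
    a missing observation (None) falls back to the prediction."""
    if observed is None:
        return predicted
    return weight * observed + (1.0 - weight) * predicted

def reconstruct(series, weights):
    """Reconstruct one pixel's time series.
    Assumes the first observation is valid. The prediction is a linear
    extrapolation of the two most recent reconstructed values."""
    recon = []
    for t, (obs, w) in enumerate(zip(series, weights)):
        if t < 2:
            pred = recon[-1] if recon else obs
        else:
            pred = recon[-1] + (recon[-1] - recon[-2])
        recon.append(composite(obs, pred, w))
    return recon

# Hypothetical NDVI values: index 2 is missing, index 3 is cloud-contaminated.
ndvi = [0.30, 0.34, None, 0.10, 0.46]
trust = [1.0, 1.0, 0.0, 0.1, 1.0]
recon = reconstruct(ndvi, trust)
```

In this sketch the missing value is replaced by the trend prediction and the cloud-contaminated value is pulled most of the way back toward the trend, while trusted observations pass through unchanged.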

Missing Values Estimation for Time Course Gene Expression Data Using the Sequential Partial Least Squares Regression Fitting (순차적 부분최소제곱 회귀적합에 의한 시간경로 유전자 발현 자료의 결측치 추정)

  • Kim, Kyung-Sook;Oh, Mi-Ra;Baek, Jang-Sun;Son, Young-Sook
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.2
    • /
    • pp.275-290
    • /
    • 2008
  • Microarray gene expression data are very large and the observation process is very complex, so missing values occur frequently. In this paper we propose the sequential partial least squares (SPLS) regression fitting method to estimate missing values in time course gene expression data, which have correlations among observations over time points. The SPLS method combines the sequential technique with the partial least squares (PLS) regression fitting method. The usefulness of the proposed method is evaluated through a simulation study on three yeast time course data sets.

Missing Pattern Matching of Rough Set Based on Attribute Variations Minimization in Rough Set (속성 변동 최소화에 의한 러프집합 누락 패턴 부합)

  • Lee, Young-Cheon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.6
    • /
    • pp.683-690
    • /
    • 2015
  • In rough set theory, missing attribute values cause several problems, such as in reduct and core estimation. Further, they do not give a discernible pattern for decision tree construction. Several methods exist, such as substitution of typical attribute values, assignment of every possible value, event covering, C4.5, and the special LEMS algorithm. However, they mainly substitute frequently appearing values or common attribute values. Thus, decision rules with high information loss are derived when important attribute values are missing in pattern matching. In particular, it is difficult to cross-validate the decision rules. In this paper we suggest a new method that substitutes missing attribute values so as to yield high information gain, using the entropy variation among the given attributes, and thereby completes the information table. The suggested method is validated by conducting the same rough set analysis on the incomplete information system using the software ROSE.
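One way to read the idea of filling a missing attribute value for high information gain is sketched below. The abstract does not give the exact criterion, so this is only an assumed interpretation: among the values already observed for the attribute, pick the one that maximizes the attribute's information gain with respect to the decision. The names `entropy`, `information_gain`, and `best_fill` and the toy table are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of decision labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in label entropy after splitting on the attribute values."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def best_fill(values, labels, index):
    """Candidate value that maximizes information gain when placed at `index`.
    Assumes values[index] is the only missing (None) entry."""
    candidates = sorted({v for v in values if v is not None})

    def gain_with(c):
        trial = list(values)
        trial[index] = c
        return information_gain(trial, labels)

    return max(candidates, key=gain_with)

# Hypothetical information table: one condition attribute and a decision.
attr = ["high", "high", "low", "low", None]
decision = ["yes", "yes", "no", "no", "yes"]
fill = best_fill(attr, decision, 4)
```

Here filling with "high" keeps the attribute perfectly aligned with the decision, so it is preferred over "low", which would mix decisions within one attribute value.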

Robust multiple imputation method for missings with boundary and outliers (한계와 이상치가 있는 결측치의 로버스트 다중대체 방법)

  • Park, Yousung;Oh, Do Young;Kwon, Tae Yeon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.6
    • /
    • pp.889-898
    • /
    • 2019
  • Imputing missing values for survey variables with item nonresponse becomes complicated if outliers and logical boundary conditions with other survey items cannot be ignored. If a variable with missing values also has outliers and boundaries, values imputed by previous regression-based imputation methods are likely to be biased and to violate the boundary conditions. In this paper, we approach these difficulties in imputation by combining various robust regression models and multiple imputation methods. Through a simulation study on various scenarios of outliers and boundaries, we find and discuss the optimal combination of robust regression and multiple imputation methods.

Missing values imputation for time course gene expression data using the pattern consistency index adaptive nearest neighbors (시간경로 유전자 발현자료에서 패턴일치지수와 적응 최근접 이웃을 활용한 결측값 대치법)

  • Shin, Heyseo;Kim, Dongjae
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.3
    • /
    • pp.269-280
    • /
    • 2020
  • Time course gene expression data are a large amount of data observed over time in microarray experiments. These data can also simultaneously identify the level of gene expression. However, the experimental process is complex, resulting in frequent missing values due to various causes. In this paper, we propose pattern consistency index adaptive nearest neighbors as a method of missing value imputation. This method combines the adaptive nearest neighbors (ANN) method, which reflects local characteristics, with the pattern consistency index, which considers the degree of consistency of gene expression between observations over time points. We conducted a Monte Carlo simulation study to evaluate the usefulness of the proposed pattern consistency index adaptive nearest neighbors (PANN) method on two yeast time course data sets.

Recovery of Missing Motion Vectors Using Modified ALA Clustering Algorithm (수정된 ALA 클러스터링 알고리즘을 이용한 손실된 움직임 벡터 복원 방법)

  • Son, Nam-Rye;Lee, Guee-Sang
    • The KIPS Transactions:PartB
    • /
    • v.12B no.7 s.103
    • /
    • pp.755-760
    • /
    • 2005
  • To transmit a video bit stream over low-bandwidth channels, such as mobile channels, encoding algorithms for high bit rate like H.263+ are used. In transmitting video bit streams, packet losses cause severe degradation in image quality. This paper proposes a new algorithm for the recovery of missing or erroneous motion vectors when an H.263+ bit stream is transmitted. Considering that the missing or erroneous motion vectors are closely related to those of neighboring blocks, this paper proposes a temporal-spatial error concealment algorithm. In the proposed approach, missing or erroneous motion vectors (MVs) are recovered by clustering the movements of neighboring blocks by their homogeneity. The MVs of neighboring blocks are clustered according to ALA (Average Linkage Algorithm) clustering, and a representative value for each cluster is determined to obtain the candidate MV set. By computing the distortion of the candidates, the MV with the minimum distortion is selected. Experimental results show that the proposed algorithm performs better in subjective and objective evaluation than existing methods.
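The pipeline of clustering neighboring MVs by average linkage, taking one representative per cluster, and selecting the candidate with minimum distortion can be sketched as follows. This is not the paper's modified ALA algorithm: the stopping threshold, the use of the cluster mean as representative, and the stand-in distortion function are assumptions, and a real decoder would measure distortion on pixel data rather than on the vectors themselves. The code uses `math.dist`, which requires Python 3.8 or later.

```python
import math

def average_linkage(a, b):
    """Mean Euclidean distance over all cross-cluster pairs of vectors."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def ala_cluster(vectors, threshold):
    """Agglomerative clustering: repeatedly merge the closest pair of clusters
    until the smallest average-linkage distance exceeds `threshold`."""
    clusters = [[v] for v in vectors]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def representative(cluster):
    """Cluster mean, used here as the representative MV (an assumption)."""
    n = len(cluster)
    return (sum(v[0] for v in cluster) / n, sum(v[1] for v in cluster) / n)

def select_mv(candidates, distortion):
    """Candidate MV with the smallest distortion."""
    return min(candidates, key=distortion)

# Hypothetical MVs of neighboring blocks forming two motion groups.
neighbors = [(1, 1), (2, 1), (1, 2), (9, 9), (10, 8)]
clusters = ala_cluster(neighbors, threshold=3.0)
candidates = [representative(c) for c in clusters]

# Stand-in distortion: distance to an assumed reference vector (9, 8).
chosen = select_mv(candidates, lambda mv: math.dist(mv, (9, 8)))
```

Clustering first reduces five neighboring vectors to two candidates, so the distortion only has to be evaluated twice.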