• Title/Summary/Keyword: 결측정보 (missing information)

Search Results: 137

A Design of Behavior Recognition method through GAN-based skeleton data generation (GAN 기반 관절 데이터 생성을 통한 행동 인식 방법 설계)

  • Kim, Jinah;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.592-593
    • /
    • 2022
  • In multi-source behavior recognition, the missing video data, whose collection radius is comparatively limited, needs to be compensated for. This paper proposes a method that improves behavior recognition performance by generating the missing video data from 6-axis sensor data. Using behavior data collected from acceleration and gyro sensors, a GAN (Generative Adversarial Network) generates data on skeleton movements in video. To this end, joint coordinates are extracted by training a DeepLabCut-based model, and a GRU-based GAN model generates video sequence data of joint coordinates from the preprocessed sensor sequence data. When video data goes missing, the generated video sequence data can be used in its place as input to the behavior recognition model, so a performance improvement can be expected.
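
As a rough illustration of the sensor-to-skeleton mapping described above, the sketch below runs a single GRU over a 6-axis sensor sequence and projects each hidden state to per-frame joint coordinates. All class and variable names are hypothetical, the weights are untrained, and the GAN discriminator and training loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUGenerator:
    """Maps a 6-axis sensor sequence to per-frame joint coordinates."""
    def __init__(self, d_in=6, d_h=16, n_joints=8):
        k = lambda *s: rng.normal(0, 0.1, s)
        self.Wz, self.Uz, self.bz = k(d_h, d_in), k(d_h, d_h), np.zeros(d_h)
        self.Wr, self.Ur, self.br = k(d_h, d_in), k(d_h, d_h), np.zeros(d_h)
        self.Wh, self.Uh, self.bh = k(d_h, d_in), k(d_h, d_h), np.zeros(d_h)
        self.Wo = k(2 * n_joints, d_h)   # project hidden state to (x, y) per joint
        self.d_h = d_h

    def generate(self, sensor_seq):
        h = np.zeros(self.d_h)
        out = []
        for x in sensor_seq:                                  # x: one 6-axis sample
            z = sigmoid(self.Wz @ x + self.Uz @ h + self.bz)  # update gate
            r = sigmoid(self.Wr @ x + self.Ur @ h + self.br)  # reset gate
            h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h) + self.bh)
            h = (1 - z) * h + z * h_tilde
            out.append(self.Wo @ h)                           # per-frame joint coordinates
        return np.stack(out)                                  # shape (T, 2 * n_joints)

gen = GRUGenerator()
frames = gen.generate(rng.normal(size=(30, 6)))  # 30 sensor samples -> 30 skeleton frames
```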

A comparison of imputation methods using nonlinear models (비선형 모델을 이용한 결측 대체 방법 비교)

  • Kim, Hyein;Song, Juwon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.4
    • /
    • pp.543-559
    • /
    • 2019
  • Data often include missing values for various reasons. If the missing-data mechanism is not MCAR, analysis based only on fully observed cases may cause estimation bias and decrease the precision of the estimates, since partially observed cases are excluded. Missing values cause especially serious problems when data include many variables. Many imputation techniques have been suggested to overcome this difficulty. However, imputation methods using parametric models may not fit real data that do not satisfy the model assumptions. In this study, we review imputation methods using nonlinear models such as kernel, resampling, and spline methods, which are robust to model assumptions. In addition, we suggest utilizing imputation classes to improve imputation accuracy, or adding random errors to correctly estimate the variance of the estimates, in nonlinear imputation models. The performance of these imputation methods is compared under various simulated data settings. Simulation results indicate that performance varies as the data settings change; however, imputation based on kernel regression or the penalized spline performs better in most situations. Utilizing imputation classes or adding random errors improves the performance of imputation methods using nonlinear models.
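
A minimal sketch of one of the reviewed nonlinear methods, kernel (Nadaraya-Watson) regression imputation with a Gaussian kernel; the imputation classes and added random errors discussed in the abstract are omitted, and the bandwidth is an arbitrary choice.

```python
import numpy as np

def kernel_impute(x, y, bandwidth=0.5):
    """Nadaraya-Watson kernel imputation: each missing y[i] is a
    Gaussian-kernel weighted average of the observed responses near x[i]."""
    obs = ~np.isnan(y)
    y_filled = y.copy()
    for i in np.where(~obs)[0]:
        w = np.exp(-0.5 * ((x[obs] - x[i]) / bandwidth) ** 2)  # kernel weights
        y_filled[i] = np.sum(w * y[obs]) / np.sum(w)
    return y_filled

x = np.linspace(0.0, 2.0, 21)
y = x ** 2
y[10] = np.nan                          # the point at x = 1.0 is missing
filled = kernel_impute(x, y, bandwidth=0.2)
```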

The Formation of Missing Data through Heavy Rain Damage of the Hydrological Gauging Instrument (수문관측기기 호우피해에 따른 결측자료의 생성)

  • Kim, Dong-Phil;Lee, Dong-Ryul
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2012.05a
    • /
    • pp.305-309
    • /
    • 2012
  • The Sabang-gyo water level station, located in the middle reach of the Seolmacheon catchment (Jeokseong-myeon, Paju-si, Gyeonggi-do), one of the test-bed catchments of the Korea Institute of Civil Engineering and Building Technology's main project "Hydrological survey for flood forecasting in mountainous river basins (2011-2015)", was washed away by the July 2011 heavy rainfall in the central region, an unprecedented event. The water level instruments and ancillary facilities were all damaged, and operation of the station was suspended thereafter. The Jeonjeokbi-gyo station at the catchment outlet was partially damaged but kept functioning normally and operated without missing records. The 2011 water level record of the Sabang-gyo station runs up to 13:30 on 8 July 2011, when the last data before the storm damage were collected; everything afterward is missing because the instruments were lost. This study therefore simulated the discharge of the Sabang-gyo station after 13:30 on 8 July 2011, and produced the final record for the unobserved period by checking upstream/downstream discharge and runoff ratios against the discharge of the Jeonjeokbi-gyo outlet station. The simulated discharge shows a trend very similar to the hydrological behavior of the catchment before 2011, so the result is judged very good. This was possible thanks to the continued operation of the Seolmacheon catchment, the rainfall records of the six rain gauge stations (one rain gauge was lost to the July 2011 storm), and the highly reliable water level and discharge records of the Jeonjeokbi-gyo outlet station. The Seolmacheon hydrological data and catchment information are stored and shared through the catchment website (http://seolmacheon.kict.re.kr) and the "Seolmacheon-Chatancheon hydrological information system" operated alongside it.
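
The abstract does not detail the simulation procedure; a common stand-in for estimating an unobserved upstream discharge from a downstream record is the drainage-area-ratio transfer sketched below. The areas and exponent are hypothetical, not values from the study.

```python
def area_ratio_transfer(q_downstream, area_upstream, area_downstream, exponent=1.0):
    """Scale a downstream discharge record by the drainage-area ratio to
    estimate the unobserved upstream discharge."""
    return q_downstream * (area_upstream / area_downstream) ** exponent

# Hypothetical figures: upstream basin half the downstream area
q_up = area_ratio_transfer(10.0, area_upstream=33.0, area_downstream=66.0)
```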


Proposal to Supplement the Missing Values of Air Pollution Levels in Meteorological Dataset (기상 데이터에서 대기 오염도 요소의 결측치 보완 기법 제안)

  • Jo, Dong-Chol;Hahn, Hee-Il
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.21 no.1
    • /
    • pp.181-187
    • /
    • 2021
  • Recently, various air pollution factors have been measured and analyzed to reduce the damage caused by air pollution. In this process, many missing values occur for various reasons. Compensating for them normally requires a vast amount of training data. This paper proposes a statistical technique that effectively compensates for the missing values arising in the measurement of ozone, carbon dioxide, and ultra-fine dust, using only a small amount of training data. The proposed algorithm first selects a group of meteorological variables expected to help correct the missing values, based on statistical analysis such as the correlation between the meteorological variables and the air pollution factors and its p-value, and then compensates for the missing values efficiently and effectively by analyzing the selected variables. To confirm the performance of the proposed algorithm, we analyze its characteristics through various experiments and compare its performance with that of well-known representative algorithms.
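
A hedged sketch of the screening-then-imputation idea: correlation-based variable selection followed by least-squares regression on the kept columns. The threshold, the use of the correlation coefficient alone (the paper also considers p-values), and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def screen_and_impute(X, y, r_min=0.3):
    """Keep the columns of X whose |Pearson r| with the observed pollution
    values y exceeds r_min, then fill missing y by least-squares regression
    on the kept columns."""
    obs = ~np.isnan(y)
    r = np.array([np.corrcoef(X[obs, j], y[obs])[0, 1] for j in range(X.shape[1])])
    keep = np.abs(r) >= r_min
    A = np.column_stack([np.ones(obs.sum()), X[obs][:, keep]])
    beta, *_ = np.linalg.lstsq(A, y[obs], rcond=None)
    miss = ~obs
    y_filled = y.copy()
    y_filled[miss] = np.column_stack([np.ones(miss.sum()), X[miss][:, keep]]) @ beta
    return y_filled, keep

X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.5        # only column 0 is informative
y[:10] = np.nan                # first ten measurements are missing
filled, kept = screen_and_impute(X, y)
```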

Missing Imputation Methods Using the Spatial Variable in Sample Survey (표본조사에서 공간 변수(SPATIAL VARIABLE)를 이용한 결측 대체(MISSING IMPUTATION)의 효율성 비교)

  • Lee Jin-Hee;Kim Jin;Lee Kee-Jae
    • The Korean Journal of Applied Statistics
    • /
    • v.19 no.1
    • /
    • pp.57-67
    • /
    • 2006
  • In sample surveys, nonresponse tends to occur inevitably. If we use information from respondents only, the estimates will be biased. Various nonresponse imputation methods have been studied to overcome this. If few auxiliary variables are available for imputing the missing values, or spatial autocorrelation exists between respondents and nonrespondents, the spatial autocorrelation can be used for imputation. In this paper, we apply several nonresponse imputation methods, including spatial imputation, to the 2002 farm household economy data of Gangwon-do as an example. We show through numerical simulations that spatial imputation is more efficient than the other methods.
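
One simple way to exploit spatial information, as a sketch: inverse-distance weighting over respondent locations. This illustrates spatial imputation in general and is not necessarily the estimator compared in the paper.

```python
import numpy as np

def idw_impute(coords, y, power=2.0):
    """Inverse-distance-weighted spatial imputation: each nonrespondent
    borrows from respondents, weighted by 1 / distance**power."""
    obs = ~np.isnan(y)
    y_filled = y.copy()
    for i in np.where(~obs)[0]:
        d = np.linalg.norm(coords[obs] - coords[i], axis=1)
        w = 1.0 / np.maximum(d, 1e-9) ** power   # guard against zero distance
        y_filled[i] = np.sum(w * y[obs]) / np.sum(w)
    return y_filled

coords = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.1, 0.1]])
y = np.array([10.0, 12.0, 40.0, np.nan])   # last unit is a nonrespondent
filled = idw_impute(coords, y)             # dominated by the nearest respondent
```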

Comparison of Data Reconstruction Methods for Missing Value Imputation (결측값 대체를 위한 데이터 재현 기법 비교)

  • Cheongho Kim;Kee-Hoon Kang
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.1
    • /
    • pp.603-608
    • /
    • 2024
  • Nonresponse and missing values are caused by sample dropout and the avoidance of answers in surveys. They raise the risk of information loss and biased inference, so missing values need to be replaced with appropriate values. In this paper, as alternatives for missing value imputation, we compare several replacement methods based on the mean, linear regression, random forest, K-nearest neighbor, and the deep-learning-based autoencoder and denoising autoencoder. These imputation methods are explained, and each is compared using continuous simulated data and real data. The comparison confirms that in most cases the random forest and denoising autoencoder imputation methods perform better than the others.
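
Two of the compared baselines can be sketched in a few lines: column-mean imputation and a small K-nearest-neighbor imputer. The random forest and denoising autoencoder methods require trained models and are not shown.

```python
import numpy as np

def mean_impute(X):
    """Column-mean imputation, the simplest baseline in the comparison."""
    out = X.copy()
    for j in range(X.shape[1]):
        out[np.isnan(out[:, j]), j] = np.nanmean(X[:, j])
    return out

def knn_impute(X, k=1):
    """K-nearest-neighbor imputation: distances use the columns the two rows
    share, and the missing cell takes the mean of the k nearest donors."""
    out = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        donors = np.where(~np.isnan(X[:, j]))[0]      # rows with column j observed
        d = np.sqrt(np.nanmean((X[donors] - X[i]) ** 2, axis=1))
        nearest = donors[np.argsort(d)[:k]]
        out[i, j] = X[nearest, j].mean()
    return out

X = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.1, np.nan]])
m = mean_impute(X)      # fills with the column mean
k1 = knn_impute(X, k=1) # fills with the nearest row's value
```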

Default Voting using User Coefficient of Variance in Collaborative Filtering System (협력적 여과 시스템에서 사용자 변동 계수를 이용한 기본 평가간 예측)

  • Ko, Su-Jeong
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.11
    • /
    • pp.1111-1120
    • /
    • 2005
  • In collaborative filtering systems most users do not rate preferences, so the User-Item matrix is very sparse: it has missing values for items not rated by users. Generally, such systems predict the preferences of an active user based on the preferences of a group of users. Default voting methods, however, predict all missing values for all users in the User-Item matrix. One of the most common approaches to predicting default voting values tried two alternatives: using the average rating of a user, or using the average rating of an item. The problem is that these do not consider the characteristics of items and users, or the distribution of the dataset. We replace the missing values in the User-Item matrix by a default voting method that uses the user coefficient of variance. We select the threshold of the user coefficient of variance automatically by equations, and decide according to the threshold when to switch between user averages and item averages. However, the relation between the averages and the thresholds of the user coefficient of variance is not always regular across datasets, because the distribution of the user coefficients of variance in a dataset affects the threshold as well as their average. We therefore decide the threshold of the user coefficient of variance by combining them. We evaluate our method on the MovieLens dataset of user ratings for movies and show that it outperforms previous default voting methods.
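
A minimal sketch of default voting driven by the user coefficient of variance. The paper selects the threshold automatically from the data; the fixed threshold below is an arbitrary assumption for illustration.

```python
import numpy as np

def default_vote(R, cv_threshold=0.5):
    """Fill every missing rating: a user whose coefficient of variance
    (std / mean of own ratings) is below the threshold gets the user
    average; a volatile user gets the item average instead."""
    user_mean = np.nanmean(R, axis=1)
    user_cv = np.nanstd(R, axis=1) / user_mean
    item_mean = np.nanmean(R, axis=0)
    out = R.copy()
    for u, i in zip(*np.where(np.isnan(R))):
        out[u, i] = user_mean[u] if user_cv[u] < cv_threshold else item_mean[i]
    return out

R = np.array([[4.0, 4.0, np.nan],   # consistent user -> own average
              [1.0, 5.0, np.nan],   # volatile user   -> item average
              [2.0, 3.0, 2.0]])
filled = default_vote(R)
```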

Gap-Filling of Sentinel-2 NDVI Using Sentinel-1 Radar Vegetation Indices and AutoML (Sentinel-1 레이더 식생지수와 AutoML을 이용한 Sentinel-2 NDVI 결측화소 복원)

  • Youjeong Youn;Jonggu Kang;Seoyeon Kim;Yemin Jeong;Soyeon Choi;Yungyo Im;Youngmin Seo;Myoungsoo Won;Junghwa Chun;Kyungmin Kim;Keunchang Jang;Joongbin Lim;Yangwon Lee
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_1
    • /
    • pp.1341-1352
    • /
    • 2023
  • The normalized difference vegetation index (NDVI) derived from satellite images is a crucial tool for monitoring forests and agriculture over broad areas because the periodic acquisition of the data is ensured. However, optical-sensor-based vegetation indices (VI) are not accessible in areas covered by clouds. This paper presents a synthetic aperture radar (SAR) based approach to retrieving the optical-sensor-based NDVI using machine learning. A SAR system can observe the land surface day and night in all weather conditions. Radar vegetation indices (RVI) from the Sentinel-1 vertical-vertical (VV) and vertical-horizontal (VH) polarizations, surface elevation, and air temperature are used as input features for an automated machine learning (AutoML) model to conduct the gap-filling of the Sentinel-2 NDVI. The mean bias error was 7.214E-05, and the correlation coefficient (CC) was 0.878, demonstrating the feasibility of the proposed method. This approach can be applied to the construction of gap-free nationwide NDVI from Sentinel-1 and Sentinel-2 images for environmental monitoring and resource management.
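
A sketch of the input side of this pipeline, assuming the common dual-polarization form RVI = 4·VH / (VV + VH) on linear-scale backscatter (the paper's exact formulation may differ), with plain least squares standing in for the AutoML regressor.

```python
import numpy as np

def rvi_dual_pol(vv, vh):
    """Dual-pol radar vegetation index on linear-scale backscatter:
    RVI = 4 * VH / (VV + VH)."""
    return 4.0 * vh / (vv + vh)

def fit_ndvi(features, ndvi):
    """Least-squares stand-in for the AutoML model: regress NDVI on
    [RVI, elevation, temperature] with an intercept."""
    A = np.column_stack([np.ones(len(ndvi)), features])
    beta, *_ = np.linalg.lstsq(A, ndvi, rcond=None)
    return beta

rvi = rvi_dual_pol(np.array([0.30]), np.array([0.10]))
# Hypothetical samples: [RVI, elevation (m), air temperature (C)]
features = np.array([[1.0, 100.0, 20.0],
                     [0.5, 200.0, 15.0],
                     [0.8, 150.0, 18.0],
                     [0.2,  50.0, 25.0]])
ndvi = 0.1 + 0.6 * features[:, 0]   # synthetic target: depends on RVI only
beta = fit_ndvi(features, ndvi)
```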

A Missing Data Imputation by Combining K Nearest Neighbor with Maximum Likelihood Estimation for Numerical Software Project Data (K-NN과 최대 우도 추정법을 결합한 소프트웨어 프로젝트 수치 데이터용 결측값 대치법)

  • Lee, Dong-Ho;Yoon, Kyung-A;Bae, Doo-Hwan
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.273-282
    • /
    • 2009
  • Missing data is one of the common problems in building analysis or prediction models from software project data. Imputation methods are known to handle missing data more effectively than deletion methods for small software project datasets. Although K-nearest-neighbor imputation is a suitable imputation method for software project data, it cannot use the non-missing information of incomplete project instances. In this paper, we propose an approach to missing data imputation for numerical software project data that combines K-nearest neighbor with maximum likelihood estimation; we also extend the average absolute error measure by normalization for accurate evaluation. Our approach overcomes the limitation of K-nearest-neighbor imputation and outperforms it on our real datasets.
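
The extended evaluation measure can be sketched as below; range normalization is one common choice, and whether it matches the paper's exact normalization is an assumption.

```python
import numpy as np

def normalized_mae(y_true, y_imputed):
    """Average absolute imputation error divided by the range of the true
    values, so attributes on different scales are comparable."""
    return np.mean(np.abs(y_true - y_imputed)) / (np.max(y_true) - np.min(y_true))

err = normalized_mae(np.array([0.0, 10.0, 20.0]),
                     np.array([1.0, 10.0, 18.0]))
```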

A Study on the cleansing of water data using LSTM algorithm (LSTM 알고리즘을 이용한 수도데이터 정제기법)

  • Yoo, Gi Hyun;Kim, Jong Rib;Shin, Gang Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.501-503
    • /
    • 2017
  • In the water sector, various data such as flow rate, pressure, water quality, and water level are collected across the whole process of water purification plants and the piping system. The collected data are stored in each water treatment plant's DB, combined in the regional DB, and finally stored in the database server at the head office of the Korea Water Resources Corporation. Various abnormal data can arise while instruments measure data or data are communicated across these processes; they can be classified into missing data and wrong data. Since the causes differ, wrong data and missing data are detected differently, but the method of cleansing the data is the same. In this study, we develop a program that can automatically cleanse missing or wrong data by applying the deep learning LSTM (Long Short-Term Memory) algorithm.
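
The cleansing loop can be sketched as below, with any one-step-ahead predictor plugged in where the trained LSTM would go; np.mean is only a stand-in, and wrong-data detection is a separate step not shown.

```python
import numpy as np

def cleanse(series, predict, window=5):
    """Walk the series in time order and replace each missing (NaN) point
    with a one-step-ahead prediction from the previous `window` values.
    In the paper this predictor is a trained LSTM."""
    s = series.copy()
    for t in np.where(np.isnan(s))[0]:
        if t >= window and not np.isnan(s[t - window:t]).any():
            s[t] = predict(s[t - window:t])
    return s

# np.mean stands in for the trained LSTM here
filled = cleanse(np.array([1.0, 2.0, 3.0, 4.0, 5.0, np.nan, 7.0]), predict=np.mean)
```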
