• Title/Summary/Keyword: 임의결측 (missing at random)


Comparison of Estimation Methods for the Missing Rainfall data in a Urban Sub-drainage Area (도시하천 소배수구역의 결측 강우량 산정 방법 비교)

  • Kim, Chung-Soo; Kim, Hyoung-Seop
    • Proceedings of the Korea Water Resources Association Conference / 2006.05a / pp.701-705 / 2006
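  • Rainfall data are the most fundamental hydrological input to hydrologic modeling. Because rainfall varies greatly in time and space, it is also one of the more complex hydrological phenomena to characterize. In Korea this variability is growing because of the country's mountainous topography, typhoons, the monsoon rainy season, and especially the recent increase in sudden, localized ("guerrilla") downpours. Long-term measured hydrometeorological data are scarce in Korea, so accurate rainfall data must be secured before flood forecasting or the design of hydraulic structures can proceed. Good rainfall data should be obtained by installing and managing gauging stations at appropriate locations. In practice, however, field conditions and other factors still produce ungauged, missing, and anomalous records. Methods for estimating rainfall at such ungauged or missing locations therefore need to be compared and analyzed so that appropriate corrections can be applied. Previous studies addressed point-rainfall correction at ungauged sites or in mountainous areas. This study instead compares several estimation methods for missing data in urban river basins, particularly the sub-drainage areas operated by the Urban Flood Disaster Management Research Center (도시홍수재해관리기술연구사업단), in order to identify a suitable approach. To this end, we used rainfall data from three rain gauges installed in three sub-drainage areas of the Jungnangcheon basin (the Wolgye 1, Gunja, and Children's Grand Park drainage areas) and from two rain gauges under the jurisdiction of the Ministry of Construction and Transportation. We compared five widely used methods for interpolating missing values: the arithmetic average method, the reciprocal distance squared method, the ratio of distance and elevation method, a relationship with nearby stations, and the simple kriging method. For large storm events occurring during the year in the Jungnangcheon sub-drainage areas, an arbitrary rain gauge was treated as missing. Weights were computed from the surrounding gauges with each method, and the rainfall at the "missing" gauge was estimated. Finally, the error between the observed and estimated values for each method was measured with the mean absolute error (MAE) and the root mean squared error (RMSE) to assess the relative efficiency of the methods.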
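Two of the interpolation schemes and both error measures are simple enough to sketch directly. This is a minimal Python sketch under stated assumptions:

- Each neighboring gauge is described only by its horizontal distance to the target gauge (km) and its rainfall for the same event (mm).
- The function names and sample numbers are hypothetical.
- The distance-elevation ratio, nearby-station regression, and simple kriging methods are not shown.

```python
import math

# Hypothetical input format: each neighbor is a (distance_km, rainfall_mm) pair.
Neighbor = tuple[float, float]

def arithmetic_average(neighbors: list[Neighbor]) -> float:
    """Estimate the missing gauge as the plain mean of the neighboring gauges."""
    return sum(rain for _, rain in neighbors) / len(neighbors)

def reciprocal_distance_squared(neighbors: list[Neighbor]) -> float:
    """Weight each neighbor by 1/d^2 so that closer gauges count more."""
    weights = [1.0 / dist ** 2 for dist, _ in neighbors]
    weighted = sum(w * rain for w, (_, rain) in zip(weights, neighbors))
    return weighted / sum(weights)

def mean_absolute_error(observed: list[float], estimated: list[float]) -> float:
    """MAE between observed rainfall and the values estimated for the 'missing' gauge."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)

def root_mean_squared_error(observed: list[float], estimated: list[float]) -> float:
    """RMSE; penalizes large misses more heavily than MAE."""
    squared = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))

if __name__ == "__main__":
    # Hypothetical event: neighboring gauges 1 km and 2 km from the target gauge.
    gauges = [(1.0, 10.0), (2.0, 20.0)]
    print(arithmetic_average(gauges))           # 15.0
    print(reciprocal_distance_squared(gauges))  # 12.0
```

The inverse-distance estimate (12.0 mm) is pulled toward the nearer gauge, whereas the arithmetic mean (15.0 mm) treats both gauges equally. In a validation like the one described, each method's estimates over many events would be scored with MAE and RMSE.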


Pattern-Mixture Model of the Cox Proportional Hazards Model with Missing Binary Covariates (결측이 있는 이산형 공변량에 대한 Cox비례위험모형의 패턴-혼합 모델)

  • Youk, Tae-Mi; Song, Ju-Won
    • The Korean Journal of Applied Statistics / v.25 no.2 / pp.279-291 / 2012
  • When fitting a Cox proportional hazards model with missing covariates, it is inefficient to exclude observations with missing values from the analysis. Furthermore, if the missing-data mechanism is not Missing Completely At Random (MCAR), excluding them may bias the parameter estimates. Many approaches have been proposed for fitting the Cox proportional hazards model when some covariates are missing, but they are based on the selection model. This paper proposes an approach to the Cox proportional hazards model with missing covariates that uses the pattern-mixture model (Little, 1993). The pattern-mixture model is expressed through the joint distribution of the survival time and the missing-data mechanism. The model is not identified without additional assumptions, so many models can be specified by imposing different restrictions. Differences among the results under these restrictions show how sensitive the model is to the missing covariates. A simulation study was conducted to show how the parameter estimates change under different restrictions in a pattern-mixture model. The proposed approach was also applied to mouse leukemia data.
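To make the role of the restrictions concrete, here is a small hypothetical Python sketch. It does not fit a Cox model. It only shows, for a single binary covariate, two things:

- the covariate distribution within the missing-data pattern cannot be estimated from the data and must be fixed by an assumption;
- the marginal result moves as that assumption changes.

The offset parameter `delta` and the log-odds form of the restriction are illustrative choices, not the restrictions used in the paper.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def pattern_mixture_prevalence(observed_x: list[int], n_missing: int, delta: float) -> float:
    """Marginal P(X = 1) for a binary covariate under a pattern-mixture factorization.

    P(X=1) = P(X=1 | R=0) P(R=0) + P(X=1 | R=1) P(R=1), where R=1 marks a missing X.
    P(X=1 | R=1) is not identified by the data, so it is fixed by a hypothetical
    restriction: its log-odds equal those of the observed pattern plus delta.
    """
    n_obs = len(observed_x)
    p_observed = sum(observed_x) / n_obs  # assumed strictly between 0 and 1
    p_missing = expit(logit(p_observed) + delta)
    w_observed = n_obs / (n_obs + n_missing)  # estimated P(R = 0)
    return w_observed * p_observed + (1.0 - w_observed) * p_missing

if __name__ == "__main__":
    x_obs = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical observed covariate values
    for delta in (-1.0, 0.0, 1.0):
        print(delta, round(pattern_mixture_prevalence(x_obs, n_missing=2, delta=delta), 4))
```

With `delta = 0` the missing pattern is assumed to resemble the observed one. Sweeping `delta` over a plausible range is the simplest form of the sensitivity analysis that comparing restrictions provides.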

Performance Evaluation of an Imputation Method based on Generative Adversarial Networks for Electric Medical Record (전자의무기록 데이터에서의 적대적 생성 알고리즘 기반 결측값 대치 알고리즘 성능분석)

  • Jo, Yong-Yeon; Jeong, Min-Yeong; Hwangbo, Yul
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.879-881 / 2019
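  • Large-scale data collected in clinical settings, such as electronic medical records (EMR), have great potential value for clinical interpretation and many possible uses. However, they contain so many missing values that they are highly sparse, which makes them difficult to analyze. In particular, missing values that arise while EMR information is collected occur randomly and arbitrarily. They are a major cause of reduced analytic accuracy and degraded prediction-model performance, so imputing them is indispensable. Recently, deep-learning-based imputation algorithms have been emerging as alternatives to the conventional statistics-based ones. In this paper, we apply Generative Adversarial Imputation Nets (GAIN), a recent imputation algorithm based on Generative Adversarial Networks, to EMR data and analyze its performance.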
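The parts of GAIN that are specific to imputation are the inputs it builds around the two networks:

- a mask marking observed cells;
- a noise-filled copy of the data for the generator;
- a "hint" matrix that tells the discriminator part of the mask;
- a final step that keeps observed values and takes generator output only for missing cells.

The minimal Python sketch below assumes the hint formulation H = B·M + 0.5·(1 − B) and uniform noise on [0, 0.01], as commonly published for GAIN. The function names, `hint_rate`, and `noise_scale` are hypothetical, and the generator and discriminator networks themselves are omitted.

```python
import random
from typing import Optional

Row = list[Optional[float]]

def gain_inputs(rows: list[Row], hint_rate: float = 0.9, noise_scale: float = 0.01, seed: int = 0):
    """Build the mask M, noise-filled data M*X + (1-M)*Z, and hint H for GAIN training.

    Values are assumed to be scaled to [0, 1] beforehand; None marks a missing cell.
    """
    rng = random.Random(seed)
    mask, filled, hint = [], [], []
    for row in rows:
        m = [0 if v is None else 1 for v in row]  # 1 = observed
        # Missing cells get small uniform noise Z; observed cells keep their values.
        x = [rng.uniform(0.0, noise_scale) if v is None else v for v in row]
        b = [1 if rng.random() < hint_rate else 0 for _ in row]  # B ~ Bernoulli(hint_rate)
        # Where B = 1 the hint reveals the true mask value; elsewhere it is 0.5.
        h = [bi * mi + 0.5 * (1 - bi) for bi, mi in zip(b, m)]
        mask.append(m)
        filled.append(x)
        hint.append(h)
    return mask, filled, hint

def combine(rows: list[Row], generator_output: list[list[float]]) -> list[list[float]]:
    """Final imputation: keep observed cells, use the generator's values only where missing."""
    return [
        [g if v is None else v for v, g in zip(row, g_row)]
        for row, g_row in zip(rows, generator_output)
    ]
```

In training, the generator would take `filled` and `mask`, and the discriminator would receive the generator's completed data together with `hint`. `combine` is applied once training ends.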

A longitudinal study for child aggression with Korea Welfare Panel Study data (한국복지패널 자료를 이용한 아동기 공격성에 대한 경시적 자료 분석)

  • Choi, Nayeon; Huh, Jib
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1439-1447 / 2014
  • Most of the literature on child aggression in Korea is based on cross-sectional data. One related study does use a longitudinal data set, but it assumes that the repeated measurements are mutually independent. A longitudinal analysis of child aggression in Korea that accounts for the dependence among repeated measurements is therefore needed. This study uses Korea Welfare Panel Study data, observed three times between 2006 and 2012, to analyze how child development outcomes affect child aggression. The outcomes are academic achievement, self-esteem, depression/anxiety, delinquency, victimization by peers, abuse by parents, and internet use time. Because the Korea Welfare Panel Study data contain missing values, the missing-at-random (MAR) assumption is adopted. A linear mixed-effects model is considered, estimated by restricted maximum likelihood (REML).
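As a reference for what the linear mixed-effects model and REML involve, below is a generic random-intercept formulation. The abstract does not give the paper's actual fixed- and random-effects specification, so the notation here is an assumption:

- Child $i$ contributes only its observed waves ($n_i \le 3$).
- Under MAR, with the usual separability of parameters, likelihood-based estimation from those observed responses remains valid without modeling the missing-data mechanism.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \quad j = 1, \dots, n_i \le 3,\\
\mathbf{V}_i &= \sigma_b^2\,\mathbf{1}_{n_i}\mathbf{1}_{n_i}^{\top} + \sigma^2\,\mathbf{I}_{n_i},\\
\hat{\boldsymbol{\beta}} &= \Bigl(\sum_i \mathbf{X}_i^{\top}\mathbf{V}_i^{-1}\mathbf{X}_i\Bigr)^{-1} \sum_i \mathbf{X}_i^{\top}\mathbf{V}_i^{-1}\mathbf{y}_i,\\
\ell_{\mathrm{REML}}(\sigma_b^2, \sigma^2) &= -\tfrac{1}{2}\sum_i \log\lvert\mathbf{V}_i\rvert
  - \tfrac{1}{2}\log\Bigl\lvert\sum_i \mathbf{X}_i^{\top}\mathbf{V}_i^{-1}\mathbf{X}_i\Bigr\rvert
  - \tfrac{1}{2}\sum_i (\mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol{\beta}})^{\top}\mathbf{V}_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol{\beta}}) + \text{const.}
\end{aligned}
```

Setting $\sigma_b^2 = 0$ makes each $\mathbf{V}_i$ diagonal. That is exactly the independence assumption the abstract criticizes in earlier work.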

A Study on the Imputation for Missing Data in Dual-loop Vehicle Detector System (차량 검지자료 결측 보정처리에 관한 연구 (이력자료 활용방안을 중심으로))

  • Kim, Jeong-Yeon; Lee, Yeong-In; Baek, Seung-Geol; Nam, Gung-Seong
    • Journal of Korean Society of Transportation / v.24 no.7 s.93 / pp.27-40 / 2006
  • Traffic information is provided on the basis of the traffic volume, speed, and occupancy collected by the currently operating Vehicle Detector System (VDS). The use of this information is also growing steadily as it is applied in more fields and by more users. In VDS data, missing data are records transmitted to the controller without any attribute values. Because such records carry no values, they are excluded from the overall data processing. An increasing ratio of missing data in VDS data therefore makes the data an unreliable representation of actual traffic conditions. This study presents an imputation process for missing data that applies two methodologies: one that references adjacent stations and one that uses historical data. Both were applied to data from the VDS currently operating on the SeoHaeAn and Kyongbu Expressways, after missing data had been introduced at selected missing-data ratios. Imputation was performed per lane for 30-second periods, with the time scope classified into morning, afternoon, and daily ranges. The errors of the imputed data were then analyzed against the actual data. The results show that imputation using historical data produced relatively lower errors than the adjacent-station reference methods.
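The two families of imputation compared in this study can be sketched as follows. The abstract does not give the paper's exact formulas, so this minimal Python sketch assumes the simplest version of each:

- Historical imputation averages the same 30-second slot over past days.
- Adjacent-station imputation averages whatever the neighboring detectors reported for the same slot.

All function names and data layouts are hypothetical.

```python
from statistics import mean
from typing import Optional

def impute_from_history(history: list[list[Optional[float]]], slot: int) -> Optional[float]:
    """Historical-data imputation: mean of the same time slot over past days.

    `history` holds one list per past day, indexed by 30-second slot; None = missing.
    """
    values = [day[slot] for day in history if day[slot] is not None]
    return mean(values) if values else None

def impute_from_adjacent(upstream: Optional[float], downstream: Optional[float]) -> Optional[float]:
    """Adjacent-station imputation: mean of the neighboring detectors' values for the slot."""
    values = [v for v in (upstream, downstream) if v is not None]
    return mean(values) if values else None

if __name__ == "__main__":
    # Hypothetical per-slot volumes for one lane on three past days.
    past_days = [[10.0, 12.0], [14.0, None], [None, 16.0]]
    print(impute_from_history(past_days, slot=0))  # 12.0
    print(impute_from_adjacent(20.0, 30.0))        # 25.0
```

Both functions return None when there is nothing to borrow from. A production process would need a fallback rule for that case.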

A comparison of imputation methods using nonlinear models (비선형 모델을 이용한 결측 대체 방법 비교)

  • Kim, Hyein; Song, Juwon
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.543-559 / 2019
  • Data often include missing values for various reasons. If the missing-data mechanism is not MCAR, an analysis based only on fully observed cases may produce biased estimates. It also reduces the precision of those estimates, because partially observed cases are excluded. Missing values cause even more serious problems when data include many variables. Many imputation techniques have been proposed to overcome this difficulty. However, imputation methods that rely on parametric models may fit poorly to real data that do not satisfy the model assumptions. In this study, we review imputation methods based on nonlinear models, such as kernel, resampling, and spline methods, which are robust to violations of model assumptions. We also suggest two refinements for nonlinear imputation models: using imputation classes to improve imputation accuracy, and adding random errors so that the variance of the estimates is estimated correctly. The nonlinear imputation methods are compared under various simulated data settings. The simulation results indicate that the relative performance of the methods depends on the data setting. However, imputation based on kernel regression or penalized splines performs better in most situations. Using imputation classes or adding random errors improves the performance of the nonlinear imputation methods.
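Kernel-regression imputation, and the idea of adding random errors to it, can be illustrated with a Nadaraya-Watson estimator. The minimal Python sketch below rests on these assumptions:

- A Gaussian kernel with a fixed, hand-chosen `bandwidth`.
- Random errors are added by resampling an observed residual.

Neither detail is confirmed by the abstract, imputation classes are not shown, and the function names are hypothetical.

```python
import math
import random

def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u)

def nw_fit(x_obs: list[float], y_obs: list[float], x0: float, bandwidth: float) -> float:
    """Nadaraya-Watson estimate of E[Y | X = x0]: a kernel-weighted mean of observed Y."""
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in x_obs]
    return sum(w * yi for w, yi in zip(weights, y_obs)) / sum(weights)

def kernel_impute(x_obs: list[float], y_obs: list[float], x_missing: list[float],
                  bandwidth: float, add_noise: bool = False, seed: int = 0) -> list[float]:
    """Impute Y at the covariate values of cases whose Y is missing.

    With add_noise=True, a residual drawn at random from the observed cases is added
    to each prediction, so imputed values keep roughly the data's spread instead of
    collapsing onto the smooth curve (which would understate the variance).
    """
    rng = random.Random(seed)
    residuals = [yi - nw_fit(x_obs, y_obs, xi, bandwidth) for xi, yi in zip(x_obs, y_obs)]
    imputed = []
    for x0 in x_missing:
        value = nw_fit(x_obs, y_obs, x0, bandwidth)
        if add_noise:
            value += rng.choice(residuals)
        imputed.append(value)
    return imputed
```

For example, with observed points (0, 0) and (1, 2), `kernel_impute([0.0, 1.0], [0.0, 2.0], [0.5], 1.0)` returns `[1.0]`, because the two points receive equal weight at x = 0.5. With `add_noise=True` the same call returns 1.0 shifted by one of the two observed residuals.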

A Comprehensive Method to Impute Vehicle Trajectory Data Collected in Wireless Traffic Surveillance Environments (무선통신기반 교통정보수집체계하에서의 차량주행궤적정보 결측치 보정방안)

  • Yeon, Ji-Yun; Kim, Hyeon-Mi; O, Cheol; Kim, Won-Gyu
    • Journal of Korean Society of Transportation / v.27 no.4 / pp.175-181 / 2009
  • Intelligent Transportation Systems (ITS) enable road users to travel more efficiently under a variety of traffic conditions. A significant part of ITS is communication technology among vehicles and between vehicles and infrastructure. This technology has been developed to upgrade current traffic data collection through location-based traffic surveillance systems, and it makes a wider and more detailed range of traffic data easy to acquire. However, its performance degrades because of environmental impediments such as large vehicles, buildings, and harsh weather, which often cause wireless communication failures. To impute vehicle trajectory data interrupted by such failures, several existing methods were reviewed and a new method that complements them was devised. The AIMSUN API (Application Programming Interface) was used to simulate vehicle trajectory data, and missing trajectory data were generated randomly to verify the method. The new method was shown to yield more accurate and reliable traffic data than the existing methods.
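For comparison with the existing methods mentioned above, the simplest way to bridge a communication gap in a trajectory is linear interpolation in time between the last point received before the gap and the first point received after it. The sketch below is a hypothetical baseline of that kind, not the new method proposed in the paper, whose details the abstract does not give. Positions are treated as a single along-road coordinate for simplicity.

```python
from typing import Optional

def interpolate_trajectory(times: list[float], positions: list[Optional[float]]) -> list[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation in time between received points.

    Gaps at the start or end of the trajectory have no bracketing observation
    and are left as None.
    """
    filled = list(positions)
    known = [i for i, p in enumerate(positions) if p is not None]
    for left, right in zip(known, known[1:]):
        t0, t1 = times[left], times[right]
        p0, p1 = positions[left], positions[right]
        for i in range(left + 1, right):
            frac = (times[i] - t0) / (t1 - t0)  # share of the gap's duration elapsed
            filled[i] = p0 + frac * (p1 - p0)
    return filled

if __name__ == "__main__":
    # Hypothetical 1 Hz trajectory (metres along the road) with a two-second dropout.
    print(interpolate_trajectory([0, 1, 2, 3, 4], [0.0, None, None, 30.0, None]))
```

Linear interpolation assumes constant speed across the gap. That assumption breaks down during acceleration or queuing, which is one reason more elaborate methods are proposed.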

Evaluation of the DCT-PLS Method for Spatial Gap Filling of Gridded Data (격자자료 결측복원을 위한 DCT-PLS 기법의 활용성 평가)

  • Youn, Youjeong; Kim, Seoyeon; Jeong, Yemin; Cho, Subin; Lee, Yangwon
    • Korean Journal of Remote Sensing / v.36 no.6_1 / pp.1407-1419 / 2020
  • Long time series of gridded data are crucial for analyzing changes in the Earth's environment. Climate reanalyses and satellite images now serve as global-scale, periodic, quantitative information on the atmosphere and land surface. This paper examines the feasibility of DCT-PLS (penalized least squares regression based on the discrete cosine transform) for spatial gap filling of gridded data through experiments on multiple variables. An objective comparison of original and gap-filled data requires gap-free data, so we used LDAPS (Local Data Assimilation and Prediction System) daily data and MODIS (Moderate Resolution Imaging Spectroradiometer) monthly products. In experiments on relative humidity, wind speed, LST (land surface temperature), and NDVI (normalized difference vegetation index), we confirmed that randomly generated gaps were restored to values very similar to the original data. The correlation coefficients exceeded 0.95 for all four variables. The DCT-PLS method requires no ancillary data, can draw on both spatial and temporal information, and computes quickly, so it can be applied to operational systems for satellite data processing.
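The core of DCT-PLS can be shown on a 1-D series. In the DCT basis, a second-order difference penalty becomes diagonal, so each smoothing step is just "transform, scale each coefficient, inverse-transform". Gaps get zero weight and are filled by iterating that step.

This minimal Python sketch rests on several assumptions:

- It uses the common DCT-PLS formulation with filter Γ_k = 1 / (1 + s·Λ_k²), where Λ_k = −2 + 2cos(kπ/n).
- The smoothing parameter `s` is fixed by hand rather than selected automatically.
- It is restricted to 1-D. For 2-D grids the Λ terms of each dimension are summed.
- It uses a slow O(n²) DCT for readability.

The function names are hypothetical.

```python
import math
from typing import Optional

def dct(x: list[float]) -> list[float]:
    """Orthonormal DCT-II (O(n^2); adequate for a small illustration)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out

def idct(c: list[float]) -> list[float]:
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    return [
        sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)) * c[k]
            * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]

def dct_pls_fill(y: list[Optional[float]], s: float = 0.05, n_iter: int = 200) -> list[float]:
    """Fill gaps (None) in a 1-D series with DCT-based penalized least squares."""
    n = len(y)
    w = [0.0 if v is None else 1.0 for v in y]  # weight 0 at gaps, 1 where observed
    y0 = [0.0 if v is None else v for v in y]
    observed = [v for v in y if v is not None]
    start = sum(observed) / len(observed)
    y_hat = [start if v is None else v for v in y]  # crude initial guess for the gaps
    # Eigenvalues of the second-difference operator, which the DCT diagonalizes.
    lam = [-2.0 + 2.0 * math.cos(k * math.pi / n) for k in range(n)]
    gamma = [1.0 / (1.0 + s * lam_k * lam_k) for lam_k in lam]
    for _ in range(n_iter):
        # Observed cells pull toward the data; gaps keep the current estimate.
        z = [wi * (yi - yh) + yh for wi, yi, yh in zip(w, y0, y_hat)]
        y_hat = idct([g * c for g, c in zip(gamma, dct(z))])
    return y_hat

if __name__ == "__main__":
    print([round(v, 2) for v in dct_pls_fill([0.0, 1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0])])
```

A larger `s` smooths more strongly. A small `s` nearly reproduces the observed values and fills the gap from its neighbors. Here the missing fourth value is recovered close to 3.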

Denoising Self-Attention Network for Mixed-type Data Imputation (혼합형 데이터 보간을 위한 디노이징 셀프 어텐션 네트워크)

  • Lee, Do-Hoon; Kim, Han-Joon; Chun, Joonghoon
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.135-144 / 2021
  • Data-driven decision-making has recently become a key technology leading the data industry, and the machine learning behind it requires high-quality training datasets. However, real-world data contain missing values for various reasons, which degrades the performance of prediction models trained on such data. Many studies have therefore sought to impute missing values in the initial training data automatically, so that high-performance models can be built from real-world datasets. Many conventional machine-learning-based imputation techniques are time-consuming and cumbersome, because they apply only to numeric columns or require a separate predictive model for each column. This paper therefore proposes a new imputation technique, the Denoising Self-Attention Network (DSAN), which can be applied to mixed-type datasets containing both numerical and categorical columns. DSAN combines self-attention with denoising to learn robust feature representation vectors. It can also impute multiple missing variables automatically and in parallel through multi-task learning. To verify the technique, we generated missing values arbitrarily in several mixed-type training datasets and performed imputation experiments. We evaluated the results in two ways: by the errors between the original and imputed values, and by the performance of binary classification models trained on the imputed data.
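The denoising idea and the mixed numeric/categorical targets can be illustrated without the attention layers. During training, some observed cells are hidden on purpose so that their true values are known. The model is then scored only on those hidden cells, with a loss suited to each column type.

This hypothetical Python sketch makes the following choices:

- squared error for numeric columns;
- negative log-likelihood for categorical columns;
- an unweighted sum across columns.

These are illustrative assumptions, not DSAN's actual architecture or loss.

```python
import math
import random
from typing import Union

Cell = Union[float, str, None]

def corrupt(row: list[Cell], mask_rate: float, rng: random.Random) -> tuple[list[Cell], list[bool]]:
    """Denoising step: hide a random subset of the *observed* cells.

    Returns the corrupted row and a per-cell flag marking which cells were hidden,
    so the model can be trained to reconstruct values whose truth is known.
    Cells that were already missing stay missing and are never flagged.
    """
    corrupted, hidden = [], []
    for value in row:
        hide = value is not None and rng.random() < mask_rate
        corrupted.append(None if hide else value)
        hidden.append(hide)
    return corrupted, hidden

def mixed_type_loss(true_row: list[Cell], predictions: list, column_types: list[str],
                    hidden: list[bool]) -> float:
    """Sum of per-column losses over hidden cells only.

    Numeric columns: squared error against a predicted float.
    Categorical columns: negative log of the probability assigned to the true
    category, where the prediction is a dict mapping category -> probability.
    """
    total = 0.0
    for truth, pred, kind, was_hidden in zip(true_row, predictions, column_types, hidden):
        if not was_hidden:
            continue
        if kind == "numeric":
            total += (truth - pred) ** 2
        else:
            total += -math.log(pred[truth])
    return total
```

Summing one loss term per column is what lets a single network learn all columns at once, in the multi-task spirit the abstract describes, rather than fitting a separate model for each column.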

Suggestions on the Improvement of the Hydrological Data Operation (수문관측자료 운영 개선방안에 대한 연구)

  • Kim, Hwi-Rin; Cho, Hyo-Seob
    • Proceedings of the Korea Water Resources Association Conference / 2006.05a / pp.706-709 / 2006
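  • Beyond securing a sufficient quantity of hydrological observation data, most engineers now also question the reliability of the data's quality. These doubts stem from two sources. The first is uncertainty caused by erroneous and missing records and by errors inherent in the data series. The second is that raw data of unknown quality are used without screening in design and evaluation. This paper examines the hydrological observation data handled by the Han River Flood Control Office of the Ministry of Construction and Transportation. The data operation is divided, for convenience, into five stages: observation, recording, transmission, quality control, and database construction and informatization. The current status of each stage is assessed, its problems are reviewed, and improvement measures are derived.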
