• Title/Abstract/Keywords: missing data estimation method

87 results (search time: 0.027 s)

A Study on Automatic Missing Value Imputation Replacement Method for Data Processing in Digital Data

  • 김종찬;심춘보;정세훈
    • Journal of Korea Multimedia Society / Vol. 24, No. 2 / pp. 245-254 / 2021
  • We proposed an analysis and prediction model that identifies outliers or abnormalities in the data and then imputes missing values effectively and rapidly. The model is expected to analyze problems in the data efficiently on the basis of the calibrated raw data. Using the introduced KNN + MLE algorithm, we built a system that can make adequate use of the data. The algorithm effectively addresses problems of some existing KNN-based missing-data imputation algorithms, such as ignoring the missing values in some data sections or discarding normal observations. To check the effectiveness and efficiency of the proposed algorithm, it was compared with existing imputation approaches (K-means, KNN, MEI, and MI) under the MCAR, MAR, and NI missing-data mechanisms, and its superiority in all aspects was confirmed.
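The abstract does not spell out the KNN + MLE algorithm, so the sketch below shows only plain KNN imputation, one of the baselines it was compared against, to make the idea concrete. Assumptions (not from the paper): donors are fully observed rows, distance is Euclidean over the columns the incomplete row actually has, and the fill value is the mean of the k nearest donors. The function name knn_impute is hypothetical.

```python
import numpy as np

def knn_impute(x, k=3):
    """Fill NaNs in a 2-D float array with the mean of the k nearest complete rows.

    Hypothetical helper: distances use only the columns observed in the
    incomplete row (a common KNN-imputation convention, assumed here).
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    complete = x[~np.isnan(x).any(axis=1)]  # donor pool: rows with no missing values
    if len(complete) == 0:
        raise ValueError("need at least one complete row to act as a donor")
    for i, row in enumerate(x):
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        # Euclidean distance on the observed coordinates only
        d = np.sqrt(((complete[:, obs] - row[obs]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(d)[:k]]
        # average the donors' values in the columns this row is missing
        out[i, miss] = nearest[:, miss].mean(axis=0)
    return out

data = np.array([[1, 2], [2, 4], [3, 6], [10, 20], [2.5, np.nan]])
print(knn_impute(data, k=2))  # the NaN becomes the mean of 4 and 6 from the two closest rows
```

Plain KNN like this silently depends on having enough complete donor rows, which is one reason hybrid schemes such as the paper's are proposed.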

Comparative Evaluation of the Pollutant Load Estimation Method in the Water Quality Data Missing Intervals

  • 조범준;조홍연;강성현
    • Journal of Korean Society of Coastal and Ocean Engineers / Vol. 19, No. 1 / pp. 45-56 / 2007
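  • Pollutant loads cannot be computed directly over intervals that lack flow and water-quality data, especially water-quality data, so estimating them requires filling in the data for the missing intervals (data filling) with an appropriate method. This study presents various concentration estimation methods for computing pollutant loads in intervals without water-quality data (water-quality missing intervals), compares the estimated concentration patterns and the resulting pollutant-load variations, and finally proposes the most effective and efficient estimation method. It also presents the relative importance of the flow and water-quality factors that affect pollutant loads, and the influencing factors that distinguish the pollutant-load characteristics of coastal rivers. When the pollutant load of the Han River estuary was computed with the various concentration estimation methods, the load estimated by excluding the missing intervals was unrealistically low, and linear interpolation that accounts for the variability of the available data was identified as the most suitable method. The pollutant-load pattern of the Han River estuary was judged to be flow-dominated, and because concentration estimation in missing intervals is unavoidable, using an appropriate estimation method was found to be preferable.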
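To show why dropping the missing intervals underestimates the load, here is a minimal sketch that fills missing concentrations by plain linear interpolation in time and sums flow × concentration. Assumptions: flow is fully observed, the load per time step is simply Q·C (unit conversion omitted), and the study's variability adjustment to linear interpolation is not reproduced. Function names and data are hypothetical.

```python
import numpy as np

def load_with_interpolation(t, flow, conc):
    """Total load sum(Q * C) after filling NaN concentrations by linear interpolation in time."""
    t = np.asarray(t, float)
    flow = np.asarray(flow, float)
    conc = np.asarray(conc, float)
    known = ~np.isnan(conc)
    filled = conc.copy()
    # np.interp holds the end values constant outside the observed time range
    filled[~known] = np.interp(t[~known], t[known], conc[known])
    return float(np.sum(flow * filled)), filled

def load_excluding_gaps(flow, conc):
    """Load summed over observed samples only (the approach that came out unrealistically low)."""
    flow = np.asarray(flow, float)
    conc = np.asarray(conc, float)
    known = ~np.isnan(conc)
    return float(np.sum(flow[known] * conc[known]))

t = [0, 1, 2, 3, 4]
flow = [10, 10, 10, 10, 10]
conc = [1.0, np.nan, np.nan, 4.0, 5.0]
total, filled = load_with_interpolation(t, flow, conc)
print(total, load_excluding_gaps(flow, conc), filled)
```

Because every dropped step contributes zero load, excluding gaps is biased low whenever the missing intervals carry real flow, which is consistent with the flow-dominated pattern reported for the estuary.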

Missing Pattern Analysis of the GOCI-I Optical Satellite Image Data

  • Jeon, Ho-Kun;Cho, Hong Yeon
    • Ocean and Polar Research / Vol. 44, No. 2 / pp. 179-190 / 2022
  • Missing data in optical satellite images, caused by natural variations, have been a crucial barrier to observing the state of the sea surface. Although there have been many attempts to fill the gaps left by non-observation, little research has analyzed the ratio of missing grids to all sea grids or their seasonal patterns. This report introduces a method for quantifying the distribution of missing points and then shows how the missing points exhibit spatial correlation and seasonal trends. Temporal and spatial integration methods are compared to assess how effectively they reduce missing data; temporal integration outperforms spatial integration. Moran's I and the K-function, with statistical hypothesis testing, show that missing grids are clustered and non-randomly distributed in the daily integration. A periodogram-based seasonality test of Moran's I shows dependence on annual, semiannual, and quarterly periods. These results can be used to choose appropriate integration periods for a permissible estimation error.
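To make the clustering claim concrete, a minimal sketch of global Moran's I on a binary missing/observed grid follows. Assumptions: rook (4-neighbour) contiguity with binary symmetric weights; the paper's exact weighting scheme and its significance testing are not shown. Values near +1 indicate clustered missing cells, values near -1 indicate a dispersed (checkerboard-like) pattern.

```python
import numpy as np

def morans_i(grid):
    """Global Moran's I for a 2-D array with rook contiguity and binary weights.

    I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i**2, where z = x - mean(x)
    and W is the sum of all weights.
    """
    x = np.asarray(grid, dtype=float)
    z = x - x.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")
    # cross-products of horizontally and vertically adjacent cells
    h = (z[:, :-1] * z[:, 1:]).sum()
    v = (z[:-1, :] * z[1:, :]).sum()
    # each adjacent pair appears twice in the symmetric weight matrix
    num = 2.0 * (h + v)
    n_pairs = z[:, :-1].size + z[:-1, :].size
    w_sum = 2.0 * n_pairs
    return (x.size / w_sum) * (num / denom)

missing = np.zeros((8, 8), dtype=int)
missing[:, :4] = 1                            # contiguous block of missing cells: clustered
checker = np.indices((8, 8)).sum(axis=0) % 2  # alternating cells: dispersed
print(morans_i(missing), morans_i(checker))   # positive, and about -1 for the checkerboard
```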

Likelihood Ratio Criterion for Testing Sphericity from a Multivariate Normal Sample with 2-step Monotone Missing Data Pattern

  • Choi, Byung-Jin
    • Communications for Statistical Applications and Methods / Vol. 12, No. 2 / pp. 473-481 / 2005
  • The problem of testing the sphericity of the covariance matrix of a multivariate normal distribution is introduced for a sample with a 2-step monotone missing data pattern. Maximum likelihood estimation of the parameters from such a sample is described. Using these estimates, the likelihood ratio criterion for testing sphericity is derived.
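For reference, the complete-data version of this test (Mauchly's sphericity test, H0: Σ = σ²I) is sketched below; the paper's criterion generalizes it to 2-step monotone missing data using MLEs that are not reproduced here. The sketch uses the standard facts λ = W^(n/2) with W = |S| / (tr S / p)^p, where S is the MLE of Σ, and -2 log λ ≈ χ² with p(p+1)/2 - 1 degrees of freedom. The function name is hypothetical.

```python
import numpy as np

def sphericity_lrt(x):
    """Complete-data likelihood-ratio test of H0: Sigma = sigma^2 * I.

    Returns Mauchly's W, the statistic -2 log(lambda) = -n log W, and its
    asymptotic chi-square degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    s = np.cov(x, rowvar=False, bias=True)          # MLE of Sigma (divides by n)
    w = np.linalg.det(s) / (np.trace(s) / p) ** p   # in (0, 1] by the AM-GM inequality
    stat = -n * np.log(w)                           # since lambda = W ** (n / 2)
    df = p * (p + 1) // 2 - 1
    return w, stat, df

rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 3)) * np.array([1.0, 2.0, 0.5])  # unequal variances
print(sphericity_lrt(sample))
```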

Comparing Accuracy of Imputation Methods for Incomplete Categorical Data

  • Shin, Hyung-Won;Sohn, So-Young
    • Korean Statistical Society: Conference Proceedings / Proceedings of the Korean Statistical Society 2003 Spring Conference / pp. 237-242 / 2003
  • Various estimation methods have been developed for imputing missing categorical data, including the modal category method, logistic regression, and association rules. In this study, we propose two imputation methods (neural network fusion and voting fusion) that combine the results of the individual imputation methods. A Monte Carlo simulation is used to compare the performance of these methods. The five factors used to simulate the missing data are (1) the true model for the data, (2) data size, (3) noise size, (4) percentage of missing data, and (5) missing pattern. Overall, neural network fusion performed best; voting fusion outperformed the individual imputation methods but was inferior to neural network fusion. An additional real-data analysis confirms the simulation results.

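A minimal sketch of the two simplest ingredients named above: modal category imputation, and a voting fusion that combines the categories proposed by several imputation methods for one missing cell. Assumptions: the paper's exact voting rule is not given, so ties here go to the method listed first; None marks a missing value; function names are hypothetical.

```python
from collections import Counter

def modal_category(values):
    """Modal category imputation: the most frequent observed category (None = missing)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values")
    counts = Counter(observed)
    best = max(counts.values())
    # deterministic tie-break: the first category seen with the top count
    return next(v for v in observed if counts[v] == best)

def voting_fusion(predictions):
    """Majority vote over per-method imputations for one missing cell.

    predictions: categories in a fixed method-priority order; ties go to the
    category proposed by the earliest method (an assumed rule).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    return next(p for p in predictions if counts[p] == best)

column = ["x", "y", "x", None, "z"]
modal = modal_category(column)
# hypothetical outputs of three methods: modal category, logistic regression, association rule
print(voting_fusion([modal, "y", "x"]))
```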

Neighboring Elemental Image Exemplar Based Inpainting for Computational Integral Imaging Reconstruction with Partial Occlusion

  • Ko, Bumseok;Lee, Byung-Gook;Lee, Sukho
    • Journal of the Optical Society of Korea / Vol. 19, No. 4 / pp. 390-396 / 2015
  • We propose a partial occlusion removal method for computational integral imaging reconstruction (CIIR) based on exemplar-based inpainting. The proposed method improves on the original linear inpainting based CIIR (LI-CIIR), which uses linear inpainting to fill in the missing-data region. LI-CIIR gives good results for images containing objects with smooth surfaces. However, if an object has a textured surface, the LI-CIIR result deteriorates, because linear inpainting cannot recover textured data in the missing-data region well. In this work, we use exemplar-based inpainting to fill in the textured data in the missing-data region. We call the proposed method the neighboring elemental image exemplar based inpainting (NEI-exemplar inpainting) method, since it takes its sources from neighboring elemental images. We also propose an automatic occluding-region extraction method based on the mutual constraint using depth estimation (MC-DE) and level-set-based bimodal segmentation. Experimental results demonstrate the validity of the proposed system.
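To illustrate why linear inpainting suits smooth surfaces but not texture, here is a minimal sketch of diffusion (harmonic) inpainting: each masked pixel is repeatedly replaced by the average of its four neighbours until the fill solves a discrete Laplace equation. This is a generic technique assumed to be representative of "linear inpainting"; it is not the LI-CIIR or NEI-exemplar code, and the function name is hypothetical.

```python
import numpy as np

def linear_inpaint(image, mask, n_iter=2000):
    """Fill pixels where mask is True by Jacobi iteration of the 4-neighbour average."""
    u = np.asarray(image, dtype=float).copy()
    mask = np.asarray(mask, dtype=bool)
    u[mask] = u[~mask].mean()  # neutral initial guess for the missing pixels
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")  # replicate borders so edge pixels have 4 neighbours
        avg = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        u[mask] = avg[mask]            # only missing pixels are updated
    return u

ramp = np.tile(np.arange(10.0), (10, 1))  # smooth surface: a linear intensity ramp
mask = np.zeros_like(ramp, dtype=bool)
mask[4:7, 4:7] = True
damaged = ramp.copy()
damaged[mask] = 0.0
restored = linear_inpaint(damaged, mask)
print(np.abs(restored - ramp).max())  # near zero: a linear ramp is exactly harmonic
```

A smooth ramp is recovered essentially exactly, whereas any texture inside the hole is replaced by a smooth blend, which is the failure mode the exemplar-based method targets.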

Comparing Accuracy of Imputation Methods for Categorical Incomplete Data

  • Shin, Hyung-Won;Sohn, So-Young
    • The Korean Journal of Applied Statistics / Vol. 15, No. 1 / pp. 33-43 / 2002
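  • Various methods, such as the modal category method, logistic regression, and association rules, have been studied for imputing missing values in categorical data. This study proposes neural network fusion and voting fusion, which combine the estimates of these methods, and compares their performance by simulation. The factors characterizing the experimental data were (1) the link function between input and output variables, (2) data size, (3) noise size, (4) the proportion of missing values, and (5) the missingness-generating function. The results are as follows. With small data and a high missing rate, the modal category method, association rules, and neural network fusion performed well. With small data and a missingness probability strongly dependent on the remaining observed variables, logistic regression and neural network fusion performed well. With large data, a low missing rate, high noise, and a missingness probability strongly dependent on the remaining observed variables, neural network fusion performed well.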
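The "missingness-generating function" factor can be illustrated by generating deletion masks under two mechanisms: MCAR, where every value is deleted with the same probability, and MAR, where the deletion probability depends on a fully observed variable. Assumptions: the logistic link, its slope of 2, and the rescaling to a target average rate are illustrative choices, not the study's settings; the function name is hypothetical.

```python
import numpy as np

def missing_mask(x_obs, rate, mechanism, rng):
    """Boolean mask of values to delete from a target variable.

    MCAR: constant deletion probability `rate`.
    MAR:  probability rises with the observed covariate x_obs through a
          logistic curve, rescaled so its mean equals `rate`.
    """
    x_obs = np.asarray(x_obs, float)
    n = x_obs.size
    if mechanism == "MCAR":
        prob = np.full(n, rate)
    elif mechanism == "MAR":
        z = (x_obs - x_obs.mean()) / x_obs.std()
        w = 1.0 / (1.0 + np.exp(-2.0 * z))
        prob = np.clip(rate * w / w.mean(), 0.0, 1.0)
    else:
        raise ValueError("mechanism must be 'MCAR' or 'MAR'")
    return rng.random(n) < prob

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
mcar = missing_mask(x, 0.2, "MCAR", rng)
mar = missing_mask(x, 0.2, "MAR", rng)
print(mcar.mean(), mar[x > 0].mean(), mar[x <= 0].mean())
```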

Imputation of Medical Data Using Subspace Condition Order Degree Polynomials

  • Silachan, Klaokanlaya;Tantatsanawong, Panjai
    • Journal of Information Processing Systems / Vol. 10, No. 3 / pp. 395-411 / 2014
  • Temporal medical data are often collected during patient treatments that require personal analysis. Each observation recorded in temporal medical data is associated with measurements and treatment times. A major problem in analyzing temporal medical data is missing values, caused, for example, by patients dropping out of a study before completion. Imputation of missing data is therefore an important pre-processing step that can provide useful information before the data are mined. For each patient and each variable, imputation replaces the missing data with a value drawn from an estimated distribution of that variable. In this paper, we propose a new method, called Newton's finite divided difference polynomial interpolation with condition order degree, for handling missing values in temporal medical data related to obesity. We compared the new imputation method with three existing subspace estimation techniques: k-nearest neighbor, local least squares, and natural cubic spline. The performance of each approach was evaluated using the normalized root mean square error and statistical significance tests. The experimental results demonstrate that the proposed method provides the best fit with the smallest error and is more accurate than the other methods.
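The core of the proposed method is Newton's divided-difference interpolation; a minimal sketch of that standard technique follows. The paper's "condition order degree" rule for choosing the polynomial degree and the support points is not described in the abstract, so here the caller simply supplies the observed (time, value) pairs. Function names and the toy data (values of f(t) = t² + 1) are hypothetical.

```python
def divided_differences(ts, ys):
    """Newton coefficients c[k] = f[t0, ..., tk], computed in place."""
    n = len(ts)
    coef = [float(y) for y in ys]
    for level in range(1, n):
        # go backwards so coef[i - 1] still holds the previous level's value
        for i in range(n - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (ts[i] - ts[i - level])
    return coef

def newton_eval(ts, coef, t):
    """Evaluate c0 + c1(t - t0) + c2(t - t0)(t - t1) + ... by nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (t - ts[k]) + coef[k]
    return result

# observed visits at times 0, 1, 3; the value at time 2 is missing
times, values = [0, 1, 3], [1, 2, 10]
c = divided_differences(times, values)
print(c, newton_eval(times, c, 2))
```

The Newton form is convenient for imputation because adding one more observed point only appends a coefficient instead of refitting the whole polynomial.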

Large tests of independence in incomplete two-way contingency tables using fractional imputation

  • Kang, Shin-Soo;Larsen, Michael D.
    • Journal of the Korean Data and Information Science Society / Vol. 26, No. 4 / pp. 971-984 / 2015
  • Imputation procedures fill in missing values, thereby enabling complete-data analyses. Fully efficient fractional imputation (FEFI) and multiple imputation (MI) create multiple versions of the missing observations, thereby reflecting uncertainty about their true values. Methods for hypothesis testing with multiple imputation have been described. Fractional imputation assigns weights to the observed data to compensate for missing values. This article develops tests of independence using FEFI for partially classified two-way contingency tables. Wald and deviance tests of independence under FEFI are proposed. Simulations are used to compare type I error rates and power. The partially observed marginal information is useful for estimating the joint distribution of cell probabilities, but not for testing association. FEFI compares favorably with other methods in the simulations.
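A minimal sketch of the estimation step behind fractional imputation for a partially classified two-way table: units observed only by row (or only by column) are split across all compatible cells with fractional weights p_ij / p_i+ (or p_ij / p_+j), and the cell probabilities are re-estimated until they stabilize (an EM iteration). Assumptions: missingness is MAR, and the article's Wald and deviance test statistics are not reproduced. The function name is hypothetical.

```python
import numpy as np

def fractional_em(full, row_only, col_only, n_iter=200):
    """Cell probabilities of an I x J table with partially classified counts.

    full:     I x J counts classified by both variables
    row_only: length-I counts whose column category is missing
    col_only: length-J counts whose row category is missing
    """
    full = np.asarray(full, float)
    row_only = np.asarray(row_only, float)
    col_only = np.asarray(col_only, float)
    total = full.sum() + row_only.sum() + col_only.sum()
    p = np.full(full.shape, 1.0 / full.size)  # uniform starting values
    for _ in range(n_iter):
        w_row = p / p.sum(axis=1, keepdims=True)  # fractional weights P(col j | row i)
        w_col = p / p.sum(axis=0, keepdims=True)  # fractional weights P(row i | col j)
        counts = full + row_only[:, None] * w_row + col_only[None, :] * w_col
        p = counts / total
    return p, counts

p, counts = fractional_em([[20, 10], [5, 15]], row_only=[6, 4], col_only=[3, 7])
print(np.round(p, 4), counts.sum())
```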

A Comparison of the Methods for Estimating the Missing Precipitation Values Ungauged

  • 유주환;최용준;정관수
    • Korea Water Resources Association: Conference Proceedings / Abstracts of the Korea Water Resources Association 2009 Annual Conference / pp. 1427-1430 / 2009
  • The amount and continuity of the precipitation data used in a hydrological analysis can strongly influence the reliability of the analysis. Estimating data missing because of, for example, a breakdown of the rainfall recorder, or extending a short rainfall record, is therefore a fundamental step. This study compares eight widely used estimation methods. The data are 17 years of annual precipitation at the Cheolwon station, which also has a 15-year ungauged period, and at its five surrounding stations. Using the method validated in this comparison, the ungauged precipitation values at the Cheolwon station are estimated, and the 32-year areal average of annual precipitation for the Han River basin is calculated.

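The abstract does not list the eight methods compared, so the sketch below shows two textbook estimators for a missing station value that are commonly used for this task; whether they are among the study's eight is an assumption. The normal ratio method scales each neighbour by the ratio of long-term normals, P_x = (1/n) Σ (N_x / N_i) P_i, and inverse distance weighting uses weights 1/d^k. Function names and numbers are hypothetical.

```python
import numpy as np

def normal_ratio(p_neighbors, normal_neighbors, normal_target):
    """Normal ratio method: mean of neighbour values scaled by N_x / N_i."""
    p = np.asarray(p_neighbors, float)
    n = np.asarray(normal_neighbors, float)
    return float(np.mean(normal_target / n * p))

def inverse_distance(p_neighbors, distances, power=2.0):
    """Inverse distance weighting with weights 1 / d**power."""
    p = np.asarray(p_neighbors, float)
    w = 1.0 / np.asarray(distances, float) ** power
    return float(np.sum(w * p) / np.sum(w))

# hypothetical annual totals (mm) at five surrounding stations
p_nb = [1250.0, 1380.0, 1190.0, 1420.0, 1310.0]
normals_nb = [1300.0, 1400.0, 1250.0, 1450.0, 1350.0]
dist_km = [12.0, 18.0, 25.0, 30.0, 22.0]
print(normal_ratio(p_nb, normals_nb, 1330.0), inverse_distance(p_nb, dist_km))
```

When the neighbours' normals all equal the target's, the normal ratio method reduces to the arithmetic mean; it is generally preferred over a simple average when the normals differ substantially.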