• Title/Summary/Keyword: Incomplete Dataset


Fuzzy Classification Method for Processing Incomplete Dataset

  • Woo, Young-Woon;Lee, Kwang-Eui;Han, Soo-Whan
    • Journal of information and communication convergence engineering / v.8 no.4 / pp.383-386 / 2010
  • Pattern classification is one of the most important topics in machine learning research. However, incomplete data appear frequently in real-world problems and lead to a low learning rate in classification models. Much research has addressed the handling of such incomplete data, but most of it focuses on the training stage. In this paper, we propose two classification methods for incomplete data using triangular fuzzy membership functions. In the proposed methods, missing data in incomplete feature vectors are inferred, learned, and applied to the proposed classifier using triangular fuzzy membership functions. In the experiment, we verified that the proposed methods achieve a higher classification rate than a conventional method.
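
The abstract above does not give the classifier's details, so the following is only a minimal sketch of what a triangular fuzzy membership function looks like and how it could score a feature vector with missing entries. The function names, the `(left, peak, right)` parameterization, and the choice to skip missing features (rather than infer them, as the paper does) are assumptions made for illustration.

```python
def triangular_membership(x, left, peak, right):
    """Degree (0.0 to 1.0) to which x belongs to the triangular set (left, peak, right)."""
    if not (left < peak < right):
        raise ValueError("expected left < peak < right")
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)   # rising edge
    return (right - x) / (right - peak)     # falling edge


def classify(sample, class_sets):
    """Pick the class with the highest mean membership over the observed features.

    `class_sets` maps a label to one (left, peak, right) triple per feature.
    Missing features (None) are simply skipped here; this is a simplification
    for the sketch, not the inference step described in the paper.
    """
    scores = {}
    for label, feature_sets in class_sets.items():
        degrees = [
            triangular_membership(value, *feature_sets[i])
            for i, value in enumerate(sample)
            if value is not None
        ]
        scores[label] = sum(degrees) / len(degrees) if degrees else 0.0
    return max(scores, key=scores.get), scores
```

For example, with hypothetical sets `{"a": [(0, 1, 2), (0, 1, 2)], "b": [(2, 3, 4), (2, 3, 4)]}`, the incomplete sample `[1.0, None]` is assigned to class `"a"` using only its first feature.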

Compressive sensing-based two-dimensional scattering-center extraction for incomplete RCS data

  • Bae, Ji-Hoon;Kim, Kyung-Tae
    • ETRI Journal / v.42 no.6 / pp.815-826 / 2020
  • We propose a two-dimensional (2D) scattering-center-extraction (SCE) method using sparse recovery based on compressive-sensing theory that works even when data are missing from the received radar cross-section (RCS) dataset. First, the proposed method generates a 2D grid via adaptive discretization that is considerably smaller than a fully sampled fine grid. Subsequently, the coarse estimation of 2D scattering centers is performed using both the method of iteratively reweighted least squares and a general peak-finding algorithm. Finally, the fine estimation of 2D scattering centers is performed using the orthogonal matching pursuit (OMP) procedure with an adaptively sampled Fourier dictionary. Measured RCS data, as well as simulation data based on the point-scatterer model, are used to evaluate the 2D SCE accuracy of the proposed method. The results indicate that, for an incomplete RCS dataset with missing data, the proposed method achieves higher SCE accuracy than the conventional OMP, basis pursuit, smoothed L0, and existing discrete spectral estimation techniques.

Metropolis-Hastings Expectation Maximization Algorithm for Incomplete Data (불완전 자료에 대한 Metropolis-Hastings Expectation Maximization 알고리즘 연구)

  • Cheon, Soo-Young;Lee, Hee-Chan
    • The Korean Journal of Applied Statistics / v.25 no.1 / pp.183-196 / 2012
  • Inference for incomplete data, such as missing data, truncated distributions, and censored data, is a problem that arises frequently in statistics. To solve it, the Expectation Maximization (EM), Monte Carlo Expectation Maximization (MCEM), and Stochastic Expectation Maximization (SEM) algorithms have long been used; however, they generally assume known distributions. In this paper, we propose the Metropolis-Hastings Expectation Maximization (MHEM) algorithm for unknown distributions. The performance of the proposed algorithm is investigated on simulated data and a real dataset, KOSPI 200.
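
The abstract does not describe how MHEM combines the two techniques, so the sketch below shows only the generic Metropolis-Hastings building block: a random-walk sampler that needs the target density only up to a normalizing constant. In Monte Carlo variants of EM, samples like these are typically used to approximate the expectation step. The function names, the Gaussian proposal, and the standard normal target are assumptions chosen for illustration, not the authors' algorithm.

```python
import math
import random


def metropolis_hastings(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target.

    `log_density` returns the log of the target density up to an additive
    constant. The Gaussian proposal is symmetric, so the acceptance ratio
    reduces to the ratio of target densities.
    """
    rng = random.Random(seed)
    current = start
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_logp = log_density(proposal)
        log_ratio = proposal_logp - current_logp
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_logp = proposal, proposal_logp
        samples.append(current)  # a rejected proposal repeats the current state
    return samples


def standard_normal_log_density(x):
    """Illustrative target: log-density of N(0, 1) without the constant term."""
    return -0.5 * x * x
```

Calling `metropolis_hastings(standard_normal_log_density, 0.0, 20000)` returns a chain whose sample mean and variance should be close to 0 and 1; in practice an initial burn-in portion is usually discarded.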

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-ju;Kwak, Min-jung;Han, In-goo
    • Proceedings of the KAIS Fall Conference / 2003.11a / pp.105-110 / 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences for products. Such a preference data matrix can contain many missing values. This incomplete data is one of the factors that degrade the accuracy of a recommendation system. The multiple imputation method imputes m values for each missing value. It overcomes the flaws of single imputation approaches by accounting for the uncertainty of the missing values. The objective of this paper is to suggest a multiple imputation-based collaborative filtering approach for recommendation systems that improves prediction accuracy. The experiments show that the proposed approach performs better than the traditional collaborative filtering approach, especially when the dataset used for recommendation contains many missing values.


Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-Ju;Kwak, Min-Jung;Han, In-Goo
    • Journal of Intelligence and Information Systems / v.9 no.2 / pp.51-63 / 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences for products. Such a preference data matrix can contain many missing values. This incomplete data is one of the factors that degrade the accuracy of a recommendation system. There are several treatments for the incomplete data problem, such as case deletion and single imputation. These approaches are simple and easy to implement, but they may produce biased results. The multiple imputation method imputes m values for each missing value. It overcomes the flaws of single imputation approaches by accounting for the uncertainty of the missing values. The objective of this paper is to suggest a multiple imputation-based collaborative filtering approach for recommendation systems that improves prediction accuracy. The experiments show that the proposed approach performs better than the traditional collaborative filtering approach, especially when the dataset used for recommendation contains many missing values.

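
To make the idea of "imputing m values for each missing value" concrete, here is a minimal sketch under stated assumptions: each missing rating is filled by a random draw from the observed ratings, producing m completed copies whose results are then pooled. The abstract does not specify the imputation model, so the draw-from-observed rule and the names `multiple_imputation` and `pooled_mean` are illustrative stand-ins only.

```python
import random
import statistics


def multiple_imputation(values, m=5, seed=0):
    """Return m completed copies of `values`.

    Each None is replaced by a random draw from the observed entries, so the
    copies differ from one another and reflect uncertainty about the gaps.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    return [
        [v if v is not None else rng.choice(observed) for v in values]
        for _ in range(m)
    ]


def pooled_mean(completed_sets):
    """Pool an estimate (here the mean) across the completed copies.

    Returns the average of the per-copy means and the between-copy variance,
    which single imputation cannot provide.
    """
    means = [statistics.mean(copy) for copy in completed_sets]
    between = statistics.variance(means) if len(means) > 1 else 0.0
    return statistics.mean(means), between
```

With ratings such as `[5.0, None, 3.0, 4.0, None]`, each of the m copies keeps the three observed ratings and differs only in the filled positions; the between-copy variance indicates how much the estimate depends on the imputed values.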

Recovering Incomplete Data using Tucker Model for Tensor with Low-n-rank

  • Thieu, Thao Nguyen;Yang, Hyung-Jeong;Vu, Tien Duong;Kim, Sun-Hee
    • International Journal of Contents / v.12 no.3 / pp.22-28 / 2016
  • Tensors with missing or incomplete values are a ubiquitous problem in fields such as biomedical signal processing, image processing, and social network analysis. In this paper, we consider how to reconstruct a dataset with missing values using its tensor form, a process called tensor completion. We apply Tucker factorization to solve tensor completion, which is formulated as an optimization problem. The objective function is built from the components of the Tucker model after decomposition. The weighted least squares metric contains only the known values of the tensor, which has low rank in its modes. A first-order optimization method, Nonlinear Conjugate Gradient, is applied to solve the optimization problem. We demonstrate the effectiveness of the proposed method on EEG signals with about 70% missing entries in comparison with other algorithms. The relative error is used to measure the difference between the original tensor and the output of the process.
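
The abstract does not state its formulas, so the following is only one common way to write a weighted least squares Tucker objective and a relative error. All symbols are assumptions: $\mathcal{X}$ is the data tensor, $\mathcal{W}$ a binary mask that is 1 on known entries and 0 on missing ones, $\mathcal{G}$ the Tucker core, $A^{(n)}$ the mode-$n$ factor matrices, $\ast$ the elementwise product, and $\hat{\mathcal{X}}$ the reconstructed tensor.

```latex
\min_{\mathcal{G},\,A^{(1)},\dots,A^{(N)}}\;
\frac{1}{2}\,\Bigl\| \mathcal{W} \ast \bigl( \mathcal{X}
  - \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)} \bigr) \Bigr\|_F^2,
\qquad
\mathrm{RE} = \frac{\| \mathcal{X} - \hat{\mathcal{X}} \|_F}{\| \mathcal{X} \|_F}
```

Because of the mask, only known entries contribute to the objective, while the relative error is evaluated against the full original tensor.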

Incomplete Cholesky Decomposition based Kernel Cross Modal Factor Analysis for Audiovisual Continuous Dimensional Emotion Recognition

  • Li, Xia;Lu, Guanming;Yan, Jingjie;Li, Haibo;Zhang, Zhengyan;Sun, Ning;Xie, Shipeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.810-831 / 2019
  • Recently, continuous dimensional emotion recognition from audiovisual cues has attracted increasing attention in both theory and practice. The large amount of data involved in recognition processing decreases the efficiency of most bimodal information fusion algorithms. In this paper, a novel algorithm, the incomplete Cholesky decomposition based kernel cross factor analysis (ICDKCFA), is presented and employed for continuous dimensional audiovisual emotion recognition. After the ICDKCFA feature transformation, two basic fusion strategies, namely feature-level fusion and decision-level fusion, are explored to combine the transformed visual and audio features for emotion recognition. Finally, extensive experiments are conducted to evaluate the ICDKCFA approach on the AVEC 2016 Multimodal Affect Recognition Sub-Challenge dataset. The experimental results show that the ICDKCFA method is faster than the original kernel cross factor analysis while achieving comparable performance. Moreover, the ICDKCFA method performs better than other common information fusion methods, such as canonical correlation analysis, kernel canonical correlation analysis, and cross-modal factor analysis based fusion methods.

EVALUATION OF AN ENHANCED WEATHER GENERATION TOOL FOR SAN ANTONIO CLIMATE STATION IN TEXAS

  • Lee, Ju-Young
    • Water Engineering Research / v.5 no.1 / pp.47-54 / 2004
  • Several computer programs have been developed to stochastically generate weather data from observed daily data, but they require a complete dataset to run WGEN. Meteorological data frequently contain sporadic missing values as well as entirely missing records. The modified WGEN includes a data-filling algorithm for incomplete meteorological datasets; no other WGEN model has a data-filling function. The data-filling algorithm is based on the Matalas equation for a first-order autoregressive process on a multidimensional state with known cross-correlations and autocorrelations among the state variables. The parameters of the Matalas equation are derived from the existing dataset, and the derived parameters are used to fill in the missing data. WGEN (Richardson and Wright, 1984) is one of the most widely used weather generators, but it needs to be modified and extended. It uses an exponential distribution to generate precipitation amounts. An exponential distribution makes the distribution of precipitation amounts easy to describe, but precipitation data generated with it have not been represented well. In this paper, precipitation data generated by WGEN and by the modified WGEN are compared with the corresponding measured data in terms of statistical parameters. The modified WGEN adopts a formula from CLIGEN for WEPP (Water Erosion Prediction Project) of the USDA (1985). Results for parameters other than precipitation are not presented in this paper; they will be presented after further verification and review.

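
As a rough illustration of filling gaps with a first-order autoregressive model, the sketch below handles a single variable and fills each missing value with its conditional mean given the previous value. This is a deliberate simplification: the Matalas equation used in the paper is multivariate and includes a random term, and the names `fit_ar1` and `fill_gaps` are hypothetical.

```python
def fit_ar1(series):
    """Estimate the mean and lag-1 autocorrelation from the observed values.

    Gaps are marked with None. Only consecutive pairs in which both values
    are observed contribute to the lagged product sum.
    """
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    mean = sum(observed) / len(observed)
    pairs = [
        (a, b) for a, b in zip(series, series[1:])
        if a is not None and b is not None
    ]
    num = sum((a - mean) * (b - mean) for a, b in pairs)
    den = sum((x - mean) ** 2 for x in observed)
    rho = num / den if den else 0.0
    return mean, rho


def fill_gaps(series):
    """Replace each None with mean + rho * (previous value - mean).

    The previous value may itself be a filled value; a leading gap is filled
    with the mean. No noise term is added, so the result is deterministic.
    """
    mean, rho = fit_ar1(series)
    filled = []
    for x in series:
        if x is None:
            prev = filled[-1] if filled else mean
            x = mean + rho * (prev - mean)
        filled.append(x)
    return filled
```

A stochastic weather generator would add a random disturbance to each filled value so that the filled series keeps a realistic variance; the deterministic version here only shows how the fitted parameters carry information across a gap.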

Statistical Analysis of Bivariate Current Status Data with Informative Censoring Using Frailty Effects

  • Kim, Yang-Jin
    • The Korean Journal of Applied Statistics / v.25 no.1 / pp.115-123 / 2012
  • In animal tumorigenicity data, tumor onsets occur at several sites and onset times cannot be observed exactly. Instead, the existence of tumors is examined only at the death or sacrifice time of the animal. Such an incomplete data structure makes it difficult to investigate the effect of treatment on tumor onset times; in addition, the dependence between death and tumor onset should be considered when censoring due to death is related to tumor onset. A bivariate frailty effect is incorporated to model bivariate tumor onsets and to connect death with tumor. For parameter inference, the EM algorithm is applied, and a real NTP (National Toxicology Program) dataset is analyzed as an illustrative example.

Improve object recognition using UWB SAR imaging with compressed sensing

  • Pham, The Hien;Hong, Ic-Pyo
    • Journal of IKEEE / v.25 no.1 / pp.76-82 / 2021
  • In this paper, the compressed sensing basis pursuit denoising algorithm applied to synthetic aperture radar imaging is investigated to improve object recognition. The compressed sensing algorithm is used to recover data from incomplete datasets before the conventional back-projection algorithm is applied to obtain the synthetic aperture radar images. This method can reduce the number of measurement events needed while scanning the objects. An ultra-wideband (UWB) radar scheme using a stripmap synthetic aperture radar algorithm was used to detect objects hidden behind the box. A UWB radar system covering the 3.1 to 4.8 GHz band and a UWB antenna were implemented to transmit and receive signal data for two conductive cylinders located inside the paper box. The results confirmed that the images can be reconstructed from a randomly selected 30% of the dataset without noticeable distortion compared with the images generated from the full data using the conventional back-projection algorithm.