• Title/Summary/Keyword: missing values


A data extension technique to handle incomplete data (불완전한 데이터를 처리하기 위한 데이터 확장기법)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society, v.12 no.2, pp.7-13, 2021
  • This paper introduces an algorithm that compensates for missing values in training data by first converting incomplete records into a format that can represent the probability of each possible value of a missing variable. The previous method based on this data conversion handled incomplete data by assigning every possible value of a missing variable an equal probability. That method was applied to many problems with good results, but it was criticized for losing information: it ignores all the information that remains about the missing variable and simply assigns new values. In the proposed method, by contrast, only the complete records (those without missing values) are fed into the well-known classification algorithm C4.5, which builds a decision tree during learning. The probabilities of the missing value are then obtained from this decision tree and assigned as the estimated values of the missing variable. In this way, part of the lost information is recovered from the large amount of information that remains intact in the incomplete training data.
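To make the probabilistic data conversion concrete, here is a minimal Python sketch under stated assumptions. Each missing nominal value becomes a probability distribution over its possible values: `uniform_distribution` mirrors the earlier equal-probability conversion, while `tree_distribution` reads the distribution from a tree trained on complete records only. For brevity, a one-level tree (a decision stump) split by information gain stands in for a full C4.5 tree, which would grow deeper and use the gain ratio; the toy weather data and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def uniform_distribution(domain):
    """Earlier conversion: every possible value of the missing variable gets equal probability."""
    return {v: 1 / len(domain) for v in domain}


def fit_stump(rows, target, features):
    """Fit a one-level tree for `target` using complete rows only (stand-in for C4.5)."""
    complete = [r for r in rows if all(r[k] is not None for k in [target] + features)]
    base = entropy([r[target] for r in complete])
    best, best_gain = None, -1.0
    for f in features:
        groups = defaultdict(list)
        for r in complete:
            groups[r[f]].append(r[target])
        remainder = sum(len(g) / len(complete) * entropy(g) for g in groups.values())
        if base - remainder > best_gain:  # information gain of splitting on f
            best, best_gain = f, base - remainder
    leaves = defaultdict(Counter)  # class counts of `target` in each branch
    for r in complete:
        leaves[r[best]][r[target]] += 1
    prior = Counter(r[target] for r in complete)
    return best, leaves, prior


def tree_distribution(row, stump, domain):
    """Probability of each value of the missing variable, read from the stump's leaf."""
    split, leaves, prior = stump
    counts = leaves.get(row[split]) or prior  # unseen or missing branch: overall frequencies
    total = sum(counts.values())
    return {v: counts[v] / total for v in domain}


# Hypothetical toy data: the last record is missing "play".
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": None},
]
domain = ["no", "yes"]
stump = fit_stump(rows, "play", ["outlook", "windy"])
print(uniform_distribution(domain))                # {'no': 0.5, 'yes': 0.5}
print(tree_distribution(rows[-1], stump, domain))  # {'no': 1.0, 'yes': 0.0}
```

The uniform conversion discards what the other attributes say about the record, whereas the tree-based distribution uses the record's observed `outlook` to concentrate the probability mass.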

A Study on Imputation using Adjusted Cohen Method

  • Chung, Sung-Suk;Chun, Young-Min;Lee, Sun-Kyung
    • Journal of the Korean Data and Information Science Society, v.17 no.3, pp.871-888, 2006
  • Many studies have developed procedures for dealing with missing values. The most common approach is to replace the missing data with other values. The purpose of our study is to propose adjusted Cohen methods and to compare their efficiency with that of other methods through a simulation study. The adjusted Cohen methods use an auxiliary variable to rank the units of the variable with missing values, which yields a lower mean square error (MSE) than the original Cohen method.

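The abstract does not state the Cohen formulas, so the following Python sketch is only a simplified illustration of the idea, not the authors' method. It assumes an even number of missing units and imputes half of them at `m + s` and half at `m - s`, where `m` and `s` are the observed mean and sample standard deviation. With these choices the completed data keep the observed mean, and the completed sum of squares becomes (r - 1)s² + (n - r)s² = (n - 1)s², so the sample variance is also preserved. In the spirit of the adjusted methods, a fully observed auxiliary variable decides which units receive the upper value (assuming a positive relationship between the two variables). The function name and example data are hypothetical.

```python
import statistics


def adjusted_mean_sd_impute(y, aux):
    """Hypothetical variance-preserving imputation with auxiliary ranking.

    y   : values of the study variable, None where missing
    aux : fully observed auxiliary variable, same length as y
    Half of the missing units get m + s and half get m - s, where m and s are the
    observed mean and sample standard deviation; the auxiliary variable decides
    which half receives the upper value.
    """
    observed = [v for v in y if v is not None]
    m = statistics.mean(observed)
    s = statistics.stdev(observed)
    missing = [i for i, v in enumerate(y) if v is None]
    if len(missing) % 2:
        raise ValueError("this sketch assumes an even number of missing values")
    ranked = sorted(missing, key=lambda i: aux[i])  # ascending in the auxiliary variable
    half = len(ranked) // 2
    completed = list(y)
    for i in ranked[:half]:
        completed[i] = m - s  # lower-ranked units get the lower value
    for i in ranked[half:]:
        completed[i] = m + s  # higher-ranked units get the upper value
    return completed


completed = adjusted_mean_sd_impute([2.0, 4.0, 6.0, None, None], [1, 2, 3, 10, 0])
print(completed)  # [2.0, 4.0, 6.0, 6.0, 2.0]
print(statistics.mean(completed), statistics.variance(completed))  # 4.0 4.0
```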

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods, v.30 no.3, pp.331-341, 2023
  • Handling missing values is essential for constructing a good prediction model. The easiest way to handle missing values is to use only complete cases, but this can cause information loss and lead to invalid conclusions. Imputation is a technique that replaces missing data with alternative values obtained from information in the dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares imputation techniques for datasets with mixed data types under various conditions, such as data size, missing ratio, and missing mechanism. To evaluate the performance of each method on mixed datasets, we propose a new imputation performance measure (IPM), a unified measure applicable to both numerical and categorical variables. We believe this measure can help identify the best imputation method. Finally, we summarize the comparison results in terms of imputation performance and computation time.
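K-nearest-neighbor imputation, one of the conventional methods named above, can be sketched in a few lines of Python. This illustrates one common variant under stated assumptions, not the implementation used in the paper: each missing numeric entry is replaced by the mean of that column over the k nearest rows that observe it, and distances are computed only over coordinates observed in both rows (root-mean-square difference). It handles numeric columns only and does not compute the proposed IPM; the function names and data are hypothetical.

```python
import math


def masked_distance(a, b):
    """RMS difference over coordinates observed (not None) in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=3):
    """Replace each None with the column mean over the k nearest donor rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows that observe column j.
            donors = [rows[t] for t in range(len(rows)) if t != i and rows[t][j] is not None]
            donors.sort(key=lambda other: masked_distance(row, other))
            nearest = donors[:k]
            if nearest:  # leave the gap if no row observes column j
                filled[i][j] = sum(d[j] for d in nearest) / len(nearest)
    return filled


rows = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [1.05, None]]
print(knn_impute(rows, k=2)[3])  # second entry is about 2.1, the mean of 2.0 and 2.2
```

Donors are always taken from the original rows, so one imputed value never feeds into another.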

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • The Korean Journal of Applied Statistics, v.24 no.2, pp.315-321, 2011
  • Gene expression microarray data often contain many missing values. However, most gene expression analyses, including gene clustering, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces the analyst to discard all the data for a gene, even if the rest of that gene's data is intact, so the quality of the analysis may deteriorate seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To address this dilemma in microarray clustering analysis, this paper compared clustering accuracy with and without imputation on several microarray datasets with different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for incomplete microarray data, it is worthwhile to also check the clustering result obtained in this alternative way, without any imputed data.
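One way to cluster without any imputed data is to let distances ignore missing coordinates. The Python sketch below is an illustrative assumption, not the algorithm proposed in the paper: a k-means loop whose distance uses only coordinates observed in both the gene profile and the centroid, rescaled by p / (number of shared coordinates), and whose centroids average only observed values. The function names and toy profiles are hypothetical.

```python
import math


def partial_sq_distance(a, b):
    """Squared Euclidean distance over coordinates observed in both vectors,
    scaled by p / (number of shared coordinates)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return len(a) / len(shared) * sum((x - y) ** 2 for x, y in shared)


def kmeans_without_imputation(profiles, centroids, iters=10):
    """k-means on incomplete gene profiles; missing entries are never filled."""
    p = len(profiles[0])
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid under the partial distance.
        labels = [
            min(range(len(centroids)), key=lambda c: partial_sq_distance(g, centroids[c]))
            for g in profiles
        ]
        # Update step: average only the observed values of each member.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [g for g, lab in zip(profiles, labels) if lab == c]
            centroid = []
            for j in range(p):
                vals = [g[j] for g in members if g[j] is not None]
                centroid.append(sum(vals) / len(vals) if vals else old[j])
            new_centroids.append(centroid)
        centroids = new_centroids
    return labels, centroids


profiles = [[1.0, 1.0, None], [1.2, None, 0.9], [5.0, 5.1, 4.9], [None, 5.0, 5.2]]
labels, centroids = kmeans_without_imputation(profiles, [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]])
print(labels)  # [0, 0, 1, 1]
```

Every gene keeps its place in the analysis even though none of the incomplete profiles is complete.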

Method of Processing the Outliers and Missing Values of Field Data to Improve RAM Analysis Accuracy (RAM 분석 정확도 향상을 위한 야전운용 데이터의 이상값과 결측값 처리 방안)

  • Kim, In Seok;Jung, Won
    • Journal of Applied Reliability, v.17 no.3, pp.264-271, 2017
  • Purpose: Field operation data contain missing values and outliers arising from various causes in the data collection process, so caution is required when using RAM (reliability, availability, and maintainability) analysis results based on field operation data. The purpose of this study is to present a method that minimizes the RAM analysis error of field data in order to improve accuracy. Methods: Statistical methods are presented for processing the outliers and missing values in field operation data, and after the RAM analysis, the differences between the results before and after applying the technique are discussed. Results: After processing, availability was estimated to be 6.8 to 23.5% lower than before processing, indicating that the treatment of missing values and outliers strongly affects the RAM analysis result. Conclusion: A RAM analysis of the OO weapon system was performed, and suggestions for improving RAM analysis were presented by comparing the new method with the current one. Analyzing data without appropriate treatment of erroneous values may lead to incorrect conclusions and, in turn, to inappropriate decisions and actions.

Simultaneous Approach to Fuzzy Clustering and Quantification of Categorical Data with Missing Values

  • Honda, Katsuhiro;Nakamura, Yoshihito;Ichihashi, Hidetomo
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2003.09a, pp.36-39, 2003
  • This paper proposes the simultaneous application of homogeneity analysis and fuzzy clustering to incomplete data. Taking into account the similarity between the loss of homogeneity in homogeneity analysis and the least squares criterion in principal component analysis, the new objective function is defined in a formulation similar to that of linear fuzzy clustering with missing values. A numerical experiment demonstrates the characteristic properties of the proposed method.


An Extended Version of the CPT-based Estimation for Missing Values in Nominal Attributes

  • Ko, Song;Kim, Dae-Won
    • International Journal of Fuzzy Logic and Intelligent Systems, v.10 no.4, pp.253-258, 2010
  • A causal network represents knowledge about the dependency relationships among all attributes. If a causal network is available, these dependency relationships can be exploited to estimate missing values and thereby improve estimation performance. However, the previous method was limited in that it did not consider the bidirectional character of the causal network. The proposed method accounts for this bidirectionality by applying both prior and posterior conditions and consequently outperforms the previous method.
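For a simple chain Parent → X → Child in a causal network, using both directions amounts to Bayes' rule: P(X | parent, child) ∝ P(X | parent) · P(child | X), because the child is independent of the parent given X. The Python sketch below assumes that the prior condition corresponds to the parent's conditional probability table (CPT) and the posterior condition to the child's CPT; this mapping, the function name, and the probabilities are illustrative assumptions, not the paper's formulation.

```python
def estimate_missing(p_x_given_parent, p_child_given_x, parent_value, child_value):
    """Posterior over a missing nominal X in the chain Parent -> X -> Child."""
    prior = p_x_given_parent[parent_value]  # P(X | parent): prior condition
    # Multiply by P(child | X): posterior condition from the child's CPT.
    scores = {x: p * p_child_given_x[x][child_value] for x, p in prior.items()}
    z = sum(scores.values())
    posterior = {x: s / z for x, s in scores.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical CPTs for X in {"a", "b"}.
p_x_given_parent = {"p1": {"a": 0.6, "b": 0.4}}
p_child_given_x = {"a": {"c1": 0.1, "c2": 0.9}, "b": {"c1": 0.8, "c2": 0.2}}
best, posterior = estimate_missing(p_x_given_parent, p_child_given_x, "p1", "c1")
print(best)  # b
```

With the parent alone, `a` is the more likely value (0.6 versus 0.4); adding the child evidence `c1` reverses the estimate to `b`, which a parent-only (one-directional) estimate would miss.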

Posterior Density of Parameters in Multiresponse Regression Analysis with Missing Values in one Response

  • Kang, Gun-Seog
    • Journal of the Korean Statistical Society, v.19 no.2, pp.145-150, 1990
  • In this article, we develop the marginal posterior density of the model parameters in multiresponse regression models when missing values occur in only one response. The resulting density resolves several problems in the estimation approach proposed by Box, Draper, and Hunter (1970) and provides a general interpretation of the relationship between the estimates of the missing values and the parameters.


Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-ju;Kwak, Min-jung;Han, In-goo
    • Proceedings of the KAIS Fall Conference, 2003.11a, pp.105-110, 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences for products, and such a preference data matrix may contain many missing values. This incompleteness is one of the factors that degrade the accuracy of recommendation systems. The multiple imputation method imputes m values for each missing value, overcoming the flaws of single imputation approaches by accounting for the uncertainty of the missing values. The objective of this paper is to propose a multiple imputation-based collaborative filtering approach that improves the prediction accuracy of recommendation systems. Experimental results show that the proposed approach performs better than the traditional collaborative filtering approach, especially when the dataset used for the recommendation system contains many missing values.

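Multiple imputation produces m completed datasets, and the results from them must be combined so that the uncertainty of the missing values is reflected. A standard way to pool m estimates is Rubin's rules, sketched below in Python: the pooled estimate is the mean of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The abstract does not say how the paper combines its m collaborative-filtering results, so applying these rules to, for example, m predicted ratings for one user-item pair is an assumption. The function name and numbers are hypothetical.

```python
import statistics


def rubin_pool(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)    # pooled point estimate
    u_bar = statistics.mean(variances)    # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    total = u_bar + (1 + 1 / m) * b       # total variance of the pooled estimate
    return q_bar, total


# Hypothetical predicted ratings for one user-item pair from m = 3 imputed matrices.
q, t = rubin_pool([3.0, 3.4, 3.2], [0.10, 0.12, 0.11])
print(q, t)
```

The between-imputation term is what a single imputation cannot provide: it grows when the m imputed matrices disagree, signalling that the prediction depends heavily on the unknown values.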

Symptom Pattern Classification using Neural Networks in the Ubiquitous Healthcare Environment with Missing Values (손실 값을 갖는 유비쿼터스 헬스케어 환경에서 신경망을 이용한 에이전트 기반 증상 패턴 분류)

  • Salvo, Michael Angelo G.;Lee, Jae-Wan;Lee, Mal-Rey
    • Journal of Internet Computing and Services, v.11 no.2, pp.129-142, 2010
  • The ubiquitous healthcare environment is one of the systems that benefit from wireless sensor networks. However, one challenge of wireless sensor networks is their high loss rate during data transmission: data from the biosensors may not reach the base stations, resulting in missing values. This paper proposes the Health Monitor Agent (HMA) to gather data from the base stations, predict missing values, classify symptom patterns into medical conditions, and take appropriate action in case of emergency. This agent is applied in the Ubiquitous Healthcare Environment and uses data from the biosensors and from the patient's medical history as symptom patterns to recognize medical conditions. When data are missing, the HMA uses a predictive algorithm to fill in the missing values in the symptom patterns before classification. Simulation results show that the predictive algorithm used by the HMA classifies symptom patterns more accurately than other methods.
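The abstract does not specify the HMA's predictive algorithm. As a deliberately simple stand-in, the Python sketch below fills gaps in a single sensor stream by linear interpolation between the nearest observed readings and copies the nearest reading at the edges of the stream. The function name and heart-rate values are hypothetical.

```python
def fill_gaps(readings):
    """Linearly interpolate interior None gaps; copy the nearest reading at the edges."""
    out = list(readings)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        return out  # nothing observed: leave the stream unchanged
    for i in range(len(out)):
        if out[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = readings[right]  # leading gap
        elif right is None:
            out[i] = readings[left]   # trailing gap
        else:
            step = readings[right] - readings[left]
            out[i] = readings[left] + step * (i - left) / (right - left)
    return out


# Hypothetical heart-rate samples with packets lost in transmission.
print(fill_gaps([72, None, None, 78, None]))  # [72, 74.0, 76.0, 78, 78]
```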