• Title/Abstract/Keywords: incomplete data

717 results (processing time: 0.03 s)

Design of the Integrated Incomplete Information Processing System based on Rough Set

  • Jeong, Gu-Beom;Chung, Hwan-Mook;Kim, Guk-Boh;Park, Kyung-Ok
    • 한국지능시스템학회논문지 / Vol. 11, No. 5 / pp. 441-447 / 2001
  • Rough Set theory is generally used for classification, inference, and decision analysis of incomplete data, using the concept of an approximation space in an information system. An information system can include quantitative attribute values with interval characteristics, or incomplete data such as multiple or unknown (missing) values. Such incomplete data cause inconsistency in the information system and reduce the classification ability of systems that use Rough Sets. In this paper, we present the various types of incomplete data that may occur in an information system and propose the INcomplete information Processing System (INiPS), which uses Rough Sets to convert an incomplete information system into a complete one.

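The approximation-space idea mentioned in the abstract above can be illustrated with the standard Pawlak lower and upper approximations. This is only a minimal sketch on a complete table: the function name `approximations`, the toy attributes, and the object labels are assumptions made for illustration, and the sketch does not reproduce the INiPS conversion step, which the abstract does not specify.

```python
from collections import defaultdict

def approximations(table, attrs, target):
    """Return (lower, upper) rough-set approximations of `target`.

    table  : dict mapping object id -> dict of attribute values (no missing values)
    attrs  : attributes that define the indiscernibility relation
    target : set of object ids to approximate
    """
    # Objects with identical values on `attrs` are indiscernible.
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper

# Hypothetical toy table: x1 and x2 are indiscernible on (temp, cough).
table = {
    "x1": {"temp": "high", "cough": "yes"},
    "x2": {"temp": "high", "cough": "yes"},
    "x3": {"temp": "low", "cough": "no"},
    "x4": {"temp": "high", "cough": "no"},
}
lower, upper = approximations(table, ["temp", "cough"], {"x1", "x3"})
```

Here `x1` cannot be told apart from `x2`, so it falls in the upper approximation but not the lower one; the gap between the two sets is the inconsistency that incomplete or conflicting records introduce.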

A Study on the Incomplete Information Processing System(INiPS) Using Rough Set

  • Jeong, Gu-Beom;Chung, Hwan-Mook;Kim, Guk-Boh;Park, Kyung-Ok
    • 한국지능시스템학회:학술대회논문집 / 한국퍼지및지능시스템학회 2000년도 추계학술대회 학술발표 논문집 / pp. 243-251 / 2000
  • Rough Set theory is generally used for classification, inference, and decision analysis of incomplete data, using the concept of an approximation space in an information system. An information system can include quantitative attribute values with interval characteristics, or incomplete data such as multiple or unknown (missing) values. Such incomplete data cause inconsistency in the information system and reduce the classification ability of systems that use Rough Sets. In this paper, we present the various types of incomplete data that may occur in an information system and propose the INcomplete information Processing System (INiPS), which uses Rough Sets to convert an incomplete information system into a complete one.


An Efficient Processing Method of Top-k(g) Skyline Group Queries for Incomplete Data

  • 박미라;민준기
    • 정보처리학회논문지D / Vol. 17D, No. 1 / pp. 17-24 / 2010
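  • Interest in skyline queries has been growing recently. Most research on skyline queries assumes that the data contain no null values. However, when data are entered into a database through the web or other tools, incomplete data with null values do occur. Accordingly, various skyline processing techniques for incomplete data have been proposed. Existing skyline query processing techniques for incomplete data, however, consider only incomplete data and do not address environments in which complete and incomplete data coexist. In this paper, we propose a skyline group query processing technique that handles both skyline queries for complete data and skyline queries for incomplete data. To this end, we introduce the top-k(g) skyline group query, which retrieves g skyline groups according to user-defined dimension preferences, and propose a technique for processing it. We demonstrate the performance of the proposed method through simulation experiments.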
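As a rough illustration of skyline queries over data with null values, the sketch below uses a dominance test that is commonly applied to incomplete data: two points are compared only on the dimensions both have observed. This definition, the smaller-is-better convention, and the names `dominates` and `skyline` are assumptions made for illustration; the abstract does not state the paper's dominance rule, and the top-k(g) grouping itself is not shown.

```python
from typing import Optional, Sequence

Point = Sequence[Optional[float]]

def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q on the dimensions observed in both (smaller is better)."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not common:
        return False  # no shared dimension, so the points are incomparable
    return all(a <= b for a, b in common) and any(a < b for a, b in common)

def skyline(points):
    """Points not dominated by any other point.

    Dominance over incomplete data is not transitive in general, so this
    sketch checks every pair instead of pruning dominated points early.
    """
    return [q for q in points
            if not any(dominates(p, q) for p in points if p is not q)]

# Hypothetical data: None marks a null value.
points = [(1, None, 3), (2, 2, None), (None, 1, 4), (3, 3, 5)]
result = skyline(points)
```

With these four points, `(1, None, 3)` beats each of the others on every dimension they share with it, so it is the only skyline point.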

Fuzzy Classification Method for Processing Incomplete Dataset

  • Woo, Young-Woon;Lee, Kwang-Eui;Han, Soo-Whan
    • Journal of information and communication convergence engineering
    • /
    • Vol. 8, No. 4
    • /
    • pp. 383-386
    • /
    • 2010
  • Pattern classification is one of the most important topics in machine learning research. However, incomplete data appear frequently in real-world problems and lead to a low learning rate in classification models. Many studies have addressed the handling of such incomplete data, but most focus on the training stage. In this paper, we propose two classification methods for incomplete data that use triangular fuzzy membership functions. In the proposed methods, missing data in incomplete feature vectors are inferred, learned, and applied to the proposed classifier using triangular fuzzy membership functions. In the experiments, we verified that the proposed methods achieve a higher classification rate than a conventional method.
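For readers unfamiliar with the term, a triangular fuzzy membership function can be sketched as follows. The parameter names `a`, `b`, `c` (left foot, peak, right foot) are assumptions made for illustration; the abstract does not say how the paper sets these parameters or how it infers missing feature values, so neither is shown here.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b.

    Assumes a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)      # falling edge

# Degree to which a feature value of 2.5 belongs to a class whose
# (hypothetical) membership function peaks at 5 and spans 0 to 10.
degree = triangular(2.5, 0.0, 5.0, 10.0)
```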

Estimation of seismicity parameters of the seismic zones of the Korean Peninsula using incomplete and complete data files

  • 이기화
    • 한국지진공학회:학술대회논문집 / 한국지진공학회 1998년도 춘계 학술발표회 논문집 Proceedings of EESK Conference-Spring 1998 / pp. 23-30 / 1998
  • Seismic risk parameters were estimated for the seismic zones of the Korean Peninsula in order to calculate seismic hazard values. Seven seismic source zones were selected in consideration of the seismicity and geology of the Korean Peninsula. The seismicity parameters to be estimated are the maximum intensity, the activity rate, and the b value in the Gutenberg-Richter relation. To compute these parameters, the least squares method or the maximum likelihood method is applied to the earthquake data in two ways: one without maximum intensity and the other with maximum intensity. Earthquake data since the Choseon Dynasty are regarded as complete, and the parameters were estimated for these data in both ways. Recently, a new method was published that estimates the seismicity parameters using mixed data containing large historical events and recent complete observations. This method is therefore applied to the whole earthquake data set of the Korean Peninsula. The b value computed with maximum intensity turns out to be slightly lower than that computed without maximum intensity, and it becomes still lower when the incomplete data prior to the Choseon Dynasty are used. The activity rates obtained without and with maximum intensity are similar, though they are lower when the incomplete data are used. The maximum intensity values are usually lower when the incomplete data are considered. In the seismic source zone that includes the Yangsan Fault zone, however, the values are higher when the incomplete data are considered.


Metropolis-Hastings Expectation Maximization Algorithm for Incomplete Data

  • 전수영;이희찬
    • 응용통계연구 / Vol. 25, No. 1 / pp. 183-196 / 2012
  • Inference problems under incomplete data (incomplete problems), such as missing data, truncated distributions, and censored data, arise frequently in statistics. Such problems can be addressed with the Expectation Maximization, Monte Carlo Expectation Maximization, and Stochastic Expectation Maximization algorithms, but these have the drawback of requiring the assumption of a standard-form distribution. This study proposes the Metropolis-Hastings Expectation Maximization (MHEM) algorithm, which can be used when no standard-form distribution is assumed. The efficiency of the MHEM algorithm was demonstrated through a simulation study using censored data and an empirical analysis of KOSPI 200 returns.

Customer Classification Method for Household Appliances Industries with a Large Number of Incomplete Data

  • 장영순;서종현
    • 산업공학 / Vol. 19, No. 1 / pp. 86-96 / 2006
  • Customer data in some manufacturing industries contain a large number of incomplete records because of customers' infrequent purchasing behavior and the limited customer profile data gathered from sales representatives. As a result, most sophisticated data analysis methods cannot be applied directly. This paper proposes a heuristic data analysis method to classify customers in household appliances industries. The proposed PD (percent of difference) method can be used for the discriminant analysis of incomplete customer data with simple mathematical calculations. The method consists of a variable distribution estimation step, PD measure and cluster score evaluation steps, a variable impact construction step, and a segment assignment step. A real example is also presented.

Predicting Personal Credit Rating with Incomplete Data Sets Using Frequency Matrix technique

  • 배재권;김진화;황국재
    • Journal of Information Technology Applications and Management
    • /
    • Vol. 13, No. 4
    • /
    • pp. 273-290
    • /
    • 2006
  • This study proposes a frequency matrix technique for predicting personal credit ratings more efficiently from incomplete data sets. First, multiple discriminant analysis and logistic regression analysis are tested for predicting personal credit ratings with incomplete data sets; here, missing values are imputed with the mean imputation and regression imputation methods. An artificial neural network and the frequency matrix technique are also tested on their performance in predicting personal credit ratings. A data set of 8,234 customers from 2004, drawn from the personal credit information of Bank A, was collected for the test. The performance of the frequency matrix technique is compared with that of the other methods. The experimental results show that the frequency matrix technique outperforms all other models: MDA-mean, Logit-mean, MDA-regression, Logit-regression, and artificial neural networks.


Reject Inference of Incomplete Data Using a Normal Mixture Model

  • Song, Ju-Won
    • 응용통계연구 / Vol. 24, No. 2 / pp. 425-433 / 2011
  • Reject inference in credit scoring is a statistical approach to adjust for nonrandom sample bias due to rejected applicants. Function estimation approaches are based on the assumption that rejected applicants need not be included in the estimation when the missing data mechanism is missing at random. The density estimation approach using mixture models, on the other hand, indicates that reject inference should include rejected applicants in the model. When mixture models are chosen for reject inference, it is often assumed that the data follow a normal distribution. If the data include missing values, applying the normal mixture model to fully observed cases only may cause another sample bias due to the missing values. We extend reject inference by a multivariate normal mixture model to handle incomplete characteristic variables. A simulation study shows that including incomplete characteristic variables outperforms the function estimation approaches.

Discrete HMM Training Algorithm for Incomplete Time Series Data

  • 신봉기
    • 한국멀티미디어학회논문지 / Vol. 19, No. 1 / pp. 22-29 / 2016
  • The Hidden Markov Model (HMM) is one of the most successful and popular tools for modeling real-world sequential data. Real-world signals come in a variety of shapes and variabilities, among which temporal and spectral ones are the prime targets of the HMM. A new problem that is gaining increasing attention is characterizing missing observations in incomplete data sequences. These sequences are incomplete in that they contain holes or omitted measurements. The standard HMM algorithms have been developed for complete data with a measurement at each regular point in time. This paper presents a modified algorithm for a discrete HMM that allows a substantial amount of omissions in the input sequence. Basically, it is a variant of Baum-Welch that explicitly considers the case of isolated omissions or a number of omissions in succession. The algorithm has been tested on online handwriting samples expressed in direction codes. An extensive set of experiments shows that HMMs so modeled are highly flexible, showing consistent and robust performance regardless of the amount of omissions.
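One common way to let a discrete HMM tolerate omitted measurements is to marginalize the emission term at missing time steps, that is, to treat the emission probability as 1 for every state there. The sketch below applies this to the forward algorithm only. It is an assumption-laden illustration and not the paper's modified Baum-Welch procedure, whose details the abstract does not give; the function name `forward_likelihood`, the use of `None` for an omission, and the toy parameters are all hypothetical.

```python
def forward_likelihood(pi, A, B, obs):
    """Likelihood of a non-empty observation sequence under a discrete HMM.

    pi  : initial state probabilities, length N
    A   : N x N transition matrix, A[i][j] = P(next state j | state i)
    B   : N x M emission matrix, B[i][k] = P(symbol k | state i)
    obs : symbol indices, with None marking an omitted measurement
    """
    n = len(pi)

    def emit(state, symbol):
        # A missing symbol contributes no evidence: sum over all symbols is 1.
        return 1.0 if symbol is None else B[state][symbol]

    alpha = [pi[i] * emit(i, obs[0]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * emit(j, symbol)
            for j in range(n)
        ]
    return sum(alpha)

# Hypothetical two-state, two-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood(pi, A, B, [0, None, 1, None, None, 0])
```

Because a missing step sums over all possible symbols, the likelihood of a sequence ending in an omission equals the likelihood of the same sequence with that step dropped, which is a convenient sanity check for the marginalization.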