• Title/Abstract/Keywords: Kullback Information Distance

17 search results (processing time: 0.02 s)

Direct Divergence Approximation between Probability Distributions and Its Applications in Machine Learning

  • Sugiyama, Masashi;Liu, Song;du Plessis, Marthinus Christoffel;Yamanaka, Masao;Yamada, Makoto;Suzuki, Taiji;Kanamori, Takafumi
    • Journal of Computing Science and Engineering / Vol. 7, No. 2 / pp. 99-111 / 2013
  • Approximating a divergence between two probability distributions from their samples is a fundamental challenge in statistics, information theory, and machine learning. A divergence approximator can be used for various purposes, such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of a divergence between the joint distribution and the product of marginals can be used for independence testing, which has a wide range of applications, including feature selection and extraction, clustering, object matching, independent component analysis, and causal direction estimation. In this paper, we review recent advances in divergence approximation. Our emphasis is that directly approximating the divergence without estimating probability distributions is more sensible than a naive two-step approach of first estimating probability distributions and then approximating the divergence. Furthermore, despite the overwhelming popularity of the Kullback-Leibler divergence as a divergence measure, we argue that alternatives such as the Pearson divergence, the relative Pearson divergence, and the $L^2$-distance are more useful in practice because of their computationally efficient approximability, high numerical stability, and superior robustness against outliers.
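
To make the divergence measures named in this abstract concrete, here is a minimal sketch for two known discrete distributions. It is a plug-in illustration only, not the direct sample-based approximation the paper reviews. The function names, the 1/2 factor in the Pearson divergence, the mixture form of the relative Pearson divergence, and the use of the squared form for the $L^2$-distance are assumptions made for this example.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def pearson_divergence(p, q):
    """PE(p || q) = 0.5 * sum_i q_i * (p_i / q_i - 1)^2 (assumed convention)."""
    return 0.5 * sum(qi * (pi / qi - 1.0) ** 2 for pi, qi in zip(p, q))

def relative_pearson_divergence(p, q, alpha):
    """Pearson divergence from p to the mixture alpha*p + (1 - alpha)*q (assumed form)."""
    mix = [alpha * pi + (1.0 - alpha) * qi for pi, qi in zip(p, q)]
    return pearson_divergence(p, mix)

def squared_l2_distance(p, q):
    """sum_i (p_i - q_i)^2; uses no ratio p_i / q_i at all."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# Toy distributions chosen for illustration; all entries of q must be positive.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print("KL              :", kl_divergence(p, q))
print("Pearson         :", pearson_divergence(p, q))
print("relative Pearson:", relative_pearson_divergence(p, q, alpha=0.5))
print("squared L2      :", squared_l2_distance(p, q))
```

With `alpha > 0` the ratio inside the relative Pearson divergence cannot exceed `1 / alpha`, and the squared $L^2$-distance involves no ratio at all, so neither can blow up when `q` assigns very little mass to some outcome.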

An Analysis of Fuzzy Survey Data Based on the Maximum Entropy Principle

  • 유재휘;유동일
    • Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지) / Vol. 3, No. 2 / pp. 131-138 / 1998
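  • In conventional statistical data analysis, data are treated as fixed (crisp) values when statistical processing is performed. In modern systems, which are becoming ever more complex and large-scale, it is difficult to work only with precisely measured data, and data that reflect subjective human judgment sometimes have to be collected. In this study, we treat such subjectively judged data as fuzzy observation data (with membership functions defined by linguistic variables) and propose a new analysis method based on the maximum entropy principle. We also verify the validity of the proposed model through simulations under more realistic conditions.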

Bayesian Model Selection in the Unbalanced Random Effect Model

  • Kim, Dal-Ho;Kang, Sang-Gil;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 4 / pp. 743-752 / 2004
  • In this paper, we develop a Bayesian model selection procedure using the reference prior for comparing two nested models, such as the independent and intraclass models, using the distance or divergence between the two as the basis of comparison. A suitable criterion for this is the power divergence measure introduced by Cressie and Read (1984). This measure includes the Kullback-Leibler and Hellinger divergence measures as special cases. For this problem, the power divergence measure turns out to be a function solely of $\rho$, the intraclass correlation coefficient. Also, this function is convex, and the minimum is attained at $\rho=0$. We use the reference prior for $\rho$. Due to the duality between hypothesis tests and set estimation, the hypothesis testing problem can also be solved by solving a corresponding set estimation problem. The present paper develops a Bayesian method based on the Kullback-Leibler and Hellinger divergence measures, rejecting $H_0:\rho=0$ when the specified divergence measure exceeds some number d. This number d is chosen so that the resulting credible interval for the divergence measure has the specified coverage probability $1-{\alpha}$. The length of such an interval is compared with the equal two-tailed credible interval and the HPD credible interval for $\rho$ with the same coverage probability, which can also be inverted into acceptance regions of $H_0:\rho=0$. An example is considered where the HPD interval based on the one-at-a-time reference prior turns out to be the shortest credible interval having the same coverage probability.
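
The statement that the power divergence family contains the Kullback-Leibler and Hellinger measures as special cases can be checked numerically. The sketch below uses the Cressie-Read form for discrete distributions; the paper itself works with the densities of the independent and intraclass models, so the discrete setting, the function names, and the toy inputs are assumptions made purely for illustration.

```python
import math

def power_divergence(p, q, lam):
    """Cressie-Read power divergence for discrete p, q with positive entries:
    I(p:q) = sum_i p_i * ((p_i / q_i)**lam - 1) / (lam * (lam + 1))."""
    if abs(lam) < 1e-12:
        # Limit lam -> 0 is KL(p || q).
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    if abs(lam + 1.0) < 1e-12:
        # Limit lam -> -1 is KL(q || p).
        return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    total = sum(pi * ((pi / qi) ** lam - 1.0) for pi, qi in zip(p, q))
    return total / (lam * (lam + 1.0))

def squared_hellinger(p, q):
    """H^2(p, q) = 0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

# Toy distributions chosen for illustration.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# lam = -1/2 reproduces the squared Hellinger distance up to a factor of 4.
print(power_divergence(p, q, -0.5), 4.0 * squared_hellinger(p, q))
# A small nonzero lam is already close to the KL limit at lam = 0.
print(power_divergence(p, q, 1e-6), power_divergence(p, q, 0.0))
```

The factor of 4 at `lam = -0.5` depends on the normalization chosen for the Hellinger distance; other conventions change the constant but not the correspondence.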

On the comparison of cumulative hazard functions

  • Park, Sangun;Ha, Seung Ah
    • Communications for Statistical Applications and Methods / Vol. 26, No. 6 / pp. 623-633 / 2019
  • This paper proposes two distance measures between two cumulative hazard functions, obtained by comparing their difference and their ratio, respectively. We then estimate the measures and present goodness-of-fit test statistics. Since the proposed test statistics are expressed in terms of the cumulative hazard functions, we can easily give more weight to earlier (or later) departures in the cumulative hazards if we wish to emphasize them. We also show that, for an exponential null distribution, these test statistics perform comparably to other well-known test statistics based on the empirical distribution function. The proposed test is an omnibus test that is applicable to many distributions other than the exponential distribution.

Secure and Robust Clustering for Quantized Target Tracking in Wireless Sensor Networks

  • Mansouri, Majdi;Khoukhi, Lyes;Nounou, Hazem;Nounou, Mohamed
    • Journal of Communications and Networks / Vol. 15, No. 2 / pp. 164-172 / 2013
  • We consider the problem of secure and robust clustering for quantized target tracking in wireless sensor networks (WSN), where the observed system is assumed to evolve according to a probabilistic state space model. We propose a new method for jointly activating the best group of candidate sensors that participate in data aggregation, detecting the malicious sensors, and estimating the target position. First, we select the appropriate group in order to balance the energy dissipation and to provide the required data about the target in the WSN. This selection is also based on the transmission power between a sensor node and a cluster head. Second, we detect the malicious sensor nodes based on the information relevance of their measurements. Then, we estimate the target position using the quantized variational filtering (QVF) algorithm. The selection of the candidate sensor group is based on a multi-criteria function, which is computed using the predicted target position provided by the QVF algorithm, while the detection of malicious sensor nodes is based on the Kullback-Leibler distance between the current target position distribution and the predicted sensor observation. The performance of the proposed method is validated by simulation results in target tracking for WSN.

Region-based Multi-level Thresholding for Color Image Segmentation

  • 오준택;김욱현
    • Journal of the Institute of Electronics Engineers of Korea SP (대한전자공학회논문지SP) / Vol. 43, No. 6 / pp. 20-27 / 2006
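  • Multi-level thresholding is widely used as an image segmentation method, but most existing work is either unsuitable for direct use in applications or is not extended to the image segmentation stage. This paper proposes region-based multi-level thresholding as a multi-level thresholding scheme for image segmentation. First, the EWFCM (Entropy-based Weighted Fuzzy C-Means) algorithm is applied to each color component of the image to classify it into two clusters, and a code image is then generated. EWFCM is an improved FCM algorithm that adds spatial information about the pixels and removes noise present in the image. Reducing the number of clusters in the code image yields better segmentation results; the reduction is carried out by reclassifying regions based on the similarity between the regions within one cluster and the remaining clusters. Because many regions still remain in the image, region merging is then performed as a post-processing step by a Bayesian algorithm based on the Kullback-Leibler distance between regions. In experiments, the proposed region-based multi-level thresholding gave better segmentation results than existing methods and than pixel-based or cluster-based multi-level thresholding, and the Bayesian post-processing improved the results further.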

A Study on Particle Filter based on KLD-Resampling for Wireless Patient Tracking

  • Ly-Tu, Nga;Le-Tien, Thuong;Mai, Linh
    • Industrial Engineering and Management Systems / Vol. 16, No. 1 / pp. 92-102 / 2017
  • In this paper, we consider a typical health care system that uses a Wireless Sensor Network (WSN) for wireless patient tracking. The wireless patient tracking module of this system performs localization from samples of Received Signal Strength (RSS) variations and tracking through a Particle Filter (PF) for WSN assisted by multiple transmit-power information. We propose a modified PF, the Kullback-Leibler Distance (KLD)-resampling PF, to ameliorate the effect of RSS variations by generating a sample set near the high-likelihood region, thereby improving the wireless patient tracking. The key idea of this method is to approximate a discrete distribution with an upper bound on the KLD error, reducing both the location error and the number of particles used. To determine this error bound, an optimal algorithm is proposed based on the maximum gap error between the proposed and Sampling Importance Resampling (SIR) algorithms. With these values set, we run a number of simulations on the health care system's data sets, which contain real RSSI measurements, to evaluate the location error for all methods in terms of various power levels and node densities. Finally, we point out the effect of different power levels versus different node densities on wireless patient tracking.
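
The abstract does not state the rule the authors use to tie the number of particles to a bound on the KLD. One well-known rule of this kind is the KLD-sampling bound of Fox, which gives the number of particles needed so that, with probability 1 - delta, the KLD between the sample-based estimate and the true discrete distribution stays below epsilon. The sketch below implements that bound as an illustration; whether the paper uses exactly this expression is an assumption, and the function name and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def kld_sample_size(k, epsilon, delta):
    """Particles needed so that the KLD error is below `epsilon` with
    probability 1 - delta, given `k` occupied histogram bins
    (KLD-sampling bound, based on the Wilson-Hilferty approximation)."""
    if k < 2:
        return 1  # the bound is undefined for a single occupied bin
    z = NormalDist().inv_cdf(1.0 - delta)  # upper (1 - delta) normal quantile
    a = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3
    return math.ceil(n)

# Hypothetical settings: the particle count grows with the number of
# occupied bins and shrinks as the allowed KLD error is relaxed.
for k in (5, 20, 80):
    print(k, kld_sample_size(k, epsilon=0.05, delta=0.01))
```

Because the required count depends on how many bins the particles currently occupy, a filter using this rule draws few particles when the posterior is concentrated and more when it is spread out.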