• Title/Abstract/Keywords: kernel distribution estimation


극단값 분포 추정을 위한 모수적 비모수적 방법 (Parametric nonparametric methods for estimating extreme value distribution)

  • 우승현;강기훈
    • 문화기술의 융합 / Vol. 8, No. 1 / pp.531-536 / 2022
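  • This paper compares the performance of parametric and nonparametric methods for estimating the tail of a heavy-tailed distribution. The parametric methods are the generalized extreme value distribution and the generalized Pareto distribution; the nonparametric method is kernel-type probability density function estimation. To compare the two approaches, the block maxima model and the peaks-over-threshold model are applied to public daily fine dust data from each monitoring station in Seoul from 2014 to 2018, the resulting function estimates are presented together, and the 2-year, 5-year, and 10-year return levels are used to predict the areas where high concentrations of fine dust will occur.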
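The return levels mentioned in this abstract follow from inverting the generalized extreme value (GEV) distribution function. The sketch below is only an illustration under assumptions: the function names `gev_return_level` and `gev_cdf` are hypothetical, the parameter values are made up rather than taken from the paper, and the period is counted in blocks (whatever block length the block maxima model uses).

```python
import math


def gev_return_level(mu, sigma, xi, period):
    """Level exceeded on average once every `period` blocks under GEV(mu, sigma, xi)."""
    if period <= 1:
        raise ValueError("period must be greater than 1")
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        # Gumbel limit (shape parameter equal to zero)
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function, used here to check the return level."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # z lies outside the support of the distribution
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


# Made-up parameters, for illustration only (not estimates from the paper).
for period in (2, 5, 10):
    level = gev_return_level(80.0, 20.0, 0.1, period)
    print(f"{period}-block return level: {level:.1f}")
```

By construction, `gev_cdf` evaluated at the returned level equals `1 - 1/period`, which is a convenient sanity check on fitted parameters.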

Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation

  • Yee, Jaeyong;Park, Taesung;Park, Mira
    • Genomics & Informatics / Vol. 20, No. 2 / pp.17.1-17.11 / 2022
  • Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining the kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
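One generic way to combine kernel density estimation with entropy estimation, as this abstract describes in outline, is a plug-in (resubstitution) estimator: estimate the density with a kernel, average the negative log-density over the sample, and subtract the genotype-conditional entropy from the marginal entropy. The sketch below is an assumption-laden illustration, not the authors' procedure: the names `kde_entropy` and `mutual_information`, the Gaussian kernel, the normal-reference bandwidth, and the simulated data are all hypothetical choices.

```python
import math
import random
import statistics


def kde_entropy(xs):
    """Resubstitution entropy estimate: -mean(log f_hat(x_i)) with a Gaussian KDE."""
    n = len(xs)
    h = 1.06 * statistics.stdev(xs) * n ** (-0.2)  # normal-reference bandwidth (assumption)
    norm = n * h * math.sqrt(2.0 * math.pi)
    total = 0.0
    for xi in xs:
        dens = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in xs) / norm
        total += math.log(dens)
    return -total / n


def mutual_information(traits, genotypes):
    """I(trait; genotype) = H(trait) - sum_g p(g) * H(trait | genotype = g)."""
    n = len(traits)
    groups = {}
    for t, g in zip(traits, genotypes):
        groups.setdefault(g, []).append(t)
    conditional = sum(len(v) / n * kde_entropy(v) for v in groups.values())
    return kde_entropy(traits) - conditional


# Simulated example: the trait mean shifts with the genotype (0, 1, or 2).
rng = random.Random(0)
genotypes = [rng.choice([0, 1, 2]) for _ in range(300)]
traits = [rng.gauss(3.0 * g, 1.0) for g in genotypes]
print(f"estimated mutual information: {mutual_information(traits, genotypes):.3f}")
```

Each genotype group needs at least two observations for the bandwidth to be defined, and the estimate can be slightly negative when there is no association, since it is a difference of two noisy entropy estimates.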

다양한 대역폭 선택법에 따른 커널밀도추정의 비교 연구 (Comparison Study of Kernel Density Estimation according to Various Bandwidth Selectors)

  • 강영진;노유정
    • 한국전산구조공학회논문집 / Vol. 32, No. 3 / pp.173-181 / 2019
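  • KDE is widely used to estimate probability distribution functions from limited experimental data. Depending on the bandwidth selection method, the distribution function obtained by KDE is a kernel estimate that is either smooth or overfitted to the experimental data. This study uses Silverman's rule of thumb, a rule using an adaptive estimate, and the oversmoothing rule to compare the accuracy and conservativeness of each method. For the comparison, true models with unimodal and multimodal distributions were assumed, statistical simulations were performed, and the accuracy and conservativeness of the estimated distribution functions were compared for various numbers of data. In addition, a simple reliability example was used to examine how the distribution estimated by KDE under each bandwidth selection method affects the reliability analysis results.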
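As a rough illustration of how a bandwidth selector feeds into KDE, the sketch below implements only Silverman's rule of thumb with a Gaussian kernel; the adaptive and oversmoothing rules compared in the paper are not shown. The function names `silverman_bandwidth` and `gaussian_kde` and the simulated sample are assumptions made for this example.

```python
import math
import random
import statistics


def silverman_bandwidth(xs):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(xs)
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)


def gaussian_kde(xs, h):
    """Return a function that evaluates the Gaussian kernel density estimate."""
    norm = len(xs) * h * math.sqrt(2.0 * math.pi)

    def density(t):
        return sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs) / norm

    return density


# Simulated sample standing in for limited experimental data.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
h = silverman_bandwidth(sample)
f_hat = gaussian_kde(sample, h)
print(f"bandwidth: {h:.3f}, estimated density at 0: {f_hat(0.0):.3f}")
```

A smaller bandwidth makes the estimate follow the data more closely (toward overfitting), while a larger one smooths it; swapping in a different selector only changes how `h` is computed.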

A Berry-Esseen Type Bound in Kernel Density Estimation for a Random Left-Truncation Model

  • Asghari, P.;Fakoor, V.;Sarmad, M.
    • Communications for Statistical Applications and Methods / Vol. 21, No. 2 / pp.115-124 / 2014
  • In this paper we derive a Berry-Esseen type bound for the kernel density estimator of a random left-truncation model, in which each datum (Y) is randomly left truncated and is sampled if $Y \geq T$, where T is the truncation random variable with an unknown distribution. This unknown distribution is estimated with the Lynden-Bell estimator. In particular, the normal approximation rate, by choice of the bandwidth, is shown to be close to $n^{-1/6}$ modulo a logarithmic term. We have also investigated this normal approximation rate via a simulation study.

Minimum Distance Estimation Based On The Kernels For U-Statistics

  • Park, Hyo-Il
    • Journal of the Korean Statistical Society / Vol. 27, No. 1 / pp.113-132 / 1998
  • In this paper, we consider minimum distance (M.D.) estimation based on kernels for U-statistics. We use a Cramér-von Mises type distance function that measures the discrepancy between the U-empirical distribution function (d.f.) and the modeled d.f. of the kernel. In the distance function, we allow various integrating measures, which can be finite, $\sigma$-finite or discrete. We then derive the asymptotic normality and study the qualitative robustness of the M.D. estimates.


A kernel machine for estimation of mean and volatility functions

  • Shim, Joo-Yong;Park, Hye-Jung;Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society / Vol. 20, No. 5 / pp.905-912 / 2009
  • We propose a doubly penalized kernel machine (DPKM) that uses a heteroscedastic location-scale model as its basic model and estimates both the mean and volatility functions simultaneously by kernel machines. We also present a model selection method that employs generalized approximate cross validation techniques to choose the hyperparameters that affect the performance of DPKM. Artificial examples are provided to indicate the usefulness of DPKM for estimating the mean and volatility functions.


On the Equality of Two Distributions Based on Nonparametric Kernel Density Estimator

  • Kim, Dae-Hak;Oh, Kwang-Sik
    • Journal of the Korean Data and Information Science Society / Vol. 14, No. 2 / pp.247-255 / 2003
  • Hypothesis testing for the equality of two distributions was considered. Nonparametric kernel density estimates were used to test the equality of the distributions, with a cross-validatory choice of bandwidth in the kernel density estimation. The sampling distribution of the test statistic was obtained by a resampling method, the bootstrap. A small-sample Monte Carlo simulation was conducted, and the empirical power of the tests was compared for a variety of distributions.
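A test of this general kind can be sketched as follows, with several simplifying assumptions that differ from the abstract: the statistic is an integrated squared difference between two Gaussian kernel density estimates, the bandwidth comes from a normal-reference rule rather than cross-validation, and the null distribution is approximated by resampling from the pooled sample. All function names here are hypothetical.

```python
import math
import random
import statistics


def kde(xs, grid, h):
    """Gaussian kernel density estimate evaluated at each grid point."""
    c = len(xs) * h * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs) / c for g in grid]


def ise_statistic(a, b, grid, h):
    """Riemann-sum approximation of the integrated squared difference of two KDEs."""
    fa, fb = kde(a, grid, h), kde(b, grid, h)
    step = grid[1] - grid[0]
    return sum((p - q) ** 2 for p, q in zip(fa, fb)) * step


def bootstrap_pvalue(a, b, n_boot=200, seed=0):
    """Bootstrap p-value for H0: both samples come from the same distribution."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    h = 1.06 * statistics.stdev(pooled) * len(pooled) ** (-0.2)  # not cross-validated
    lo, hi = min(pooled) - 3 * h, max(pooled) + 3 * h
    grid = [lo + (hi - lo) * i / 60 for i in range(61)]
    observed = ise_statistic(a, b, grid, h)
    exceed = 0
    for _ in range(n_boot):
        # Under H0 both samples are draws from the pooled empirical distribution.
        ra = rng.choices(pooled, k=len(a))
        rb = rng.choices(pooled, k=len(b))
        if ise_statistic(ra, rb, grid, h) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


# Simulated example: two normal samples whose means differ by 2.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(40)]
y = [rng.gauss(2.0, 1.0) for _ in range(40)]
p_shifted = bootstrap_pvalue(x, y)
print(f"bootstrap p-value: {p_shifted:.3f}")
```

The `+ 1` terms in the p-value keep it strictly positive, which is a common convention for resampling tests; empirical power could be estimated by repeating the whole procedure over many simulated sample pairs.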


ECG Denoising by Modeling Wavelet Sub-Band Coefficients using Kernel Density Estimation

  • Ardhapurkar, Shubhada;Manthalkar, Ramchandra;Gajre, Suhas
    • Journal of Information Processing Systems / Vol. 8, No. 4 / pp.669-684 / 2012
  • Discrete wavelet transforms are extensively preferred in biomedical signal processing for denoising, feature extraction, and compression. This paper presents a new denoising method based on the modeling of discrete wavelet coefficients of ECG in selected sub-bands with kernel density estimation. The modeling provides a statistical distribution of information and noise. A Gaussian kernel with bounded support is used for modeling sub-band coefficients and thresholds and is estimated by placing a sliding window on a normalized cumulative density function. We evaluated this approach on offline noisy ECG records from the Cardiovascular Research Centre of the University of Glasgow and on records from the MIT-BIH Arrhythmia database. Results show that our proposed technique has a more reliable physical basis and provides improvement in the Signal-to-Noise Ratio (SNR) and Percentage RMS Difference (PRD). The morphological information of ECG signals is found to be unaffected after employing denoising. This is quantified by calculating the mean square error between the feature vectors of the original and denoised signals. MSE values are less than 0.05 for most of the cases.

다변량 확률분포함수의 추정을 위한 MKDE-ebd 개발 (Development of MKDE-ebd for Estimation of Multivariate Probabilistic Distribution Functions)

  • 강영진;노유정;임오강
    • 한국전산구조공학회논문집 / Vol. 32, No. 1 / pp.55-63 / 2019
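  • In engineering problems, many random variables are correlated, and the correlation of input variables strongly affects the results of statistical performance analysis of mechanical systems. However, because their joint distribution function is difficult to model, correlated variables are often treated as independent or represented by a specific parametric model, and modeling the joint distribution function accurately is even harder when data are scarce. The multivariate kernel density estimation using bounded data developed in this study is intended for estimating multivariate probability distributions of various shapes with nonlinearity. It combines the given data with bounded data generated from the confidence intervals of the parameters of a uniform distribution function, which makes it less sensitive to the quality and quantity of the data. The proposed method can therefore yield conservative statistical modeling and reliability analysis results, and its performance was verified through statistical simulations and engineering examples.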

Online Probability Density Estimation of Nonstationary Random Signal using Dynamic Bayesian Networks

  • Cho, Hyun-Cheol;Fadali, M. Sami;Lee, Kwon-Soon
    • International Journal of Control, Automation, and Systems / Vol. 6, No. 1 / pp.109-118 / 2008
  • We present two estimators for discrete non-Gaussian and nonstationary probability density estimation based on a dynamic Bayesian network (DBN). The first estimator is for off-line computation and consists of a DBN whose transition distribution is represented in terms of kernel functions. The estimator parameters are the weights and shifts of the kernel functions. The parameters are determined through a recursive learning algorithm using maximum likelihood (ML) estimation. The second estimator is a DBN whose parameters form the transition probabilities. We use an asymptotically convergent, recursive, on-line algorithm to update the parameters using observation data. The DBN calculates the state probabilities using the estimated parameters. We provide examples that demonstrate the usefulness and simplicity of the two proposed estimators.