• Title/Abstract/Keywords: kernel density estimation

Search results: 136 items (processing time: 0.027 s)

The shifted Chebyshev series-based plug-in for bandwidth selection in kernel density estimation

  • Soratja Klaichim;Juthaphorn Sinsomboonthong;Thidaporn Supapakorn
    • Communications for Statistical Applications and Methods / Vol. 31, No. 3 / pp.337-347 / 2024
  • Kernel density estimation is a widely used technique for nonparametric density estimation, enabling estimation directly from the data. It involves two crucial elements: the selection of the kernel function and the determination of an appropriate bandwidth. Bandwidth selection plays an important role in kernel density estimation and has been an active area of development over the past decade. A range of methods is available for selecting the bandwidth, including the plug-in bandwidth. This article introduces a plug-in bandwidth that uses a shifted Chebyshev series-based approximation to determine the optimal bandwidth. A simulation study shows that the proposed bandwidth performs favorably across a wide range of distributions and sample sizes compared to alternative bandwidths. The proposed bandwidth is also applied to kernel density estimation on a real dataset, where it likewise yields a favorable selection. Hence, this article serves as motivation to explore additional plug-in bandwidths that rely on function approximations using alternative series expansions.
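
As a minimal sketch of the two ingredients named in the abstract above (a kernel and a bandwidth), the following evaluates a Gaussian kernel density estimate with a data-driven bandwidth. The shifted Chebyshev plug-in selector itself is not reproduced here; the normal-reference rule h = (4 / (3n))^(1/5) * s is swapped in as a simple stand-in, and the function names are illustrative assumptions.

```python
import math
import random
import statistics


def gaussian_kde(x, data, h):
    """Kernel density estimate at point x with a Gaussian kernel and bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def normal_reference_bandwidth(data):
    """Normal-reference rule (a stand-in, NOT the Chebyshev-based selector)."""
    n = len(data)
    s = statistics.stdev(data)
    return (4.0 / (3.0 * n)) ** 0.2 * s


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
h = normal_reference_bandwidth(sample)

# Evaluate the estimate on a grid and check that it integrates to about 1.
step = 0.05
grid = [-6.0 + step * i for i in range(241)]
density = [gaussian_kde(x, sample, h) for x in grid]
total_mass = sum(density) * step
```

Any other bandwidth selector can be substituted by replacing `normal_reference_bandwidth`; the estimator itself only needs the resulting value of `h`.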

A Note on Nonparametric Density Estimation for the Deconvolution Problem

  • Lee, Sung-Ho
    • Communications for Statistical Applications and Methods / Vol. 15, No. 6 / pp.939-946 / 2008
  • In this paper, the support vector method is presented for estimating a probability density function when the sample observations are contaminated with random noise. The performance of the procedure is compared with kernel density estimates in a simulation study.

Utilizing Order Statistics in Density Estimation

  • Kim, W.C.;Park, B.U.
    • Communications for Statistical Applications and Methods / Vol. 2, No. 2 / pp.227-230 / 1995
  • In this paper, we discuss simple ways of implementing non-basic kernel density estimators, which typically need extra pilot estimation. The methods utilize order statistics at the pilot estimation stages. We focus mainly on the variable location and scale kernel density estimator (Jones, Hu and McKay, 1994), but the same idea can be applied to other methods too.

A Note on Support Vector Density Estimation with Wavelets

  • Lee, Sung-Ho
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 2 / pp.411-418 / 2005
  • We review support vector and wavelet density estimation. The relationship between support vector and wavelet density estimation in reproducing kernel Hilbert space (RKHS) is investigated in order to use wavelets as a variety of support vector kernels in support vector density estimation.

Jackknife Kernel Density Estimation Using Uniform Kernel Function in the Presence of k's Unidentified Outliers

  • Woo, Jung-Soo;Lee, Jang-Choon
    • Journal of the Korean Data and Information Science Society / Vol. 6, No. 1 / pp.85-96 / 1995
  • The purpose of this paper is to propose the kernel density estimator and the jackknife kernel density estimator in the presence of k unidentified outliers, and to compare the small-sample performance of the proposed estimators in terms of mean integrated squared error (MISE).

Transformation in Kernel Density Estimation

  • 석경하
    • Journal of the Korean Data and Information Science Society / Vol. 3, No. 1 / pp.17-24 / 1992
  • The problem of estimating a symmetric probability density with high kurtosis is considered. Such densities are often estimated poorly by a global-bandwidth kernel estimator, since good estimation of the peak of the distribution leads to unsatisfactory estimation of the tails and vice versa. In this paper, we propose applying a transformation before using a global-bandwidth kernel estimator. The performance of the density estimator based on the proposed transformation is investigated through a simulation study. Our method offers a substantial improvement for densities with high kurtosis. However, it performs slightly worse than the ordinary kernel estimator when the kurtosis is not high.
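
The general idea of transforming before smoothing can be sketched as follows: estimate the density of Y = g(X) with a single global bandwidth, then map back with the change-of-variables formula f_X(x) = f_Y(g(x)) |g'(x)|. The transformation used in the paper is not given in the abstract, so g(x) = asinh(x) is a hypothetical choice made only for illustration, as are the function names and the normal-reference bandwidth.

```python
import math
import random
import statistics


def gaussian_kde(y, data, h):
    """Gaussian kernel density estimate at y with one global bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((y - yi) / h) ** 2) for yi in data) / norm


def make_transformed_kde(data):
    """Return an estimate of the density of X built on the scale Y = asinh(X)."""
    y_data = [math.asinh(xi) for xi in data]
    # Normal-reference bandwidth on the transformed scale (an assumed choice).
    h = (4.0 / (3.0 * len(y_data))) ** 0.2 * statistics.stdev(y_data)

    def density(x):
        jacobian = 1.0 / math.sqrt(1.0 + x * x)  # derivative of asinh(x)
        return gaussian_kde(math.asinh(x), y_data, h) * jacobian

    return density


rng = random.Random(1)
# Laplace (double-exponential) sample: symmetric, with kurtosis 6.
sample = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(200)]
f_hat = make_transformed_kde(sample)

# The back-transformed estimate should still integrate to about 1.
step = 0.05
total_mass = sum(f_hat(-60.0 + step * i) for i in range(2401)) * step
```

Because asinh compresses large values, a constant bandwidth on the transformed scale acts like a narrow bandwidth near the peak and a wider one in the tails on the original scale.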

Naive Bayes Approach in Kernel Density Estimation

  • 샹총량;유샹루;아메드 압둘하킴 알-압시;강대기
    • Proceedings of the Korea Institute of Information and Communication Engineering Conference (한국정보통신학회 학술대회논문집) / 2014 Spring Conference / pp.76-78 / 2014
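  • Naive Bayes learning is a well-known, fast, and effective supervised learning method that performs well on labeled datasets containing some noise. However, the conditional independence assumption of naive Bayes places some restrictions on the characteristics needed to handle real-world data. Researchers have proposed methods to relax this conditional independence assumption, including attribute weighting and kernel density estimation. In this paper, we propose a new approach, NB Based on Attribute Weighting in Kernel Density Estimation (NBAWKDE), which uses kernel density estimation and attribute weighting to improve the learning performance of naive Bayes.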
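
One common way to combine the two ingredients named above is to score each class by log P(c) + sum_j w_j log f_hat_j(x_j | c), where f_hat_j is a per-class kernel density estimate of attribute j and w_j is its weight. The sketch below follows that form under stated assumptions: the weights are hypothetical fixed values, the bandwidth is the normal-reference rule, and none of it is taken from the NBAWKDE paper itself.

```python
import math
import random
import statistics


def gaussian_kde(x, data, h):
    """Gaussian kernel density estimate at x."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def fit(rows, labels):
    """Store, per class, the prior, the attribute columns, and one bandwidth per column."""
    model = {}
    for c in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        columns = list(zip(*members))
        bands = [(4.0 / (3.0 * len(col))) ** 0.2 * statistics.stdev(col)
                 for col in columns]
        model[c] = (len(members) / len(rows), columns, bands)
    return model


def predict(model, x, weights):
    """Return the class with the largest weighted log-score."""
    def score(c):
        prior, columns, bands = model[c]
        s = math.log(prior)
        for xj, col, h, w in zip(x, columns, bands, weights):
            # Floor the density to avoid log(0) far from all training points.
            s += w * math.log(max(gaussian_kde(xj, col, h), 1e-300))
        return s
    return max(model, key=score)


rng = random.Random(2)
# Attribute 0 separates the classes; attribute 1 is pure noise.
rows = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
rows += [(rng.gauss(4.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
labels = [0] * 100 + [1] * 100
model = fit(rows, labels)
weights = [1.0, 0.2]  # hypothetical attribute weights, not learned from data
```

With `weights = [1.0, 1.0]` the score reduces to ordinary naive Bayes with kernel density likelihoods.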

On Practical Efficiency of Locally Parametric Nonparametric Density Estimation Based on Local Likelihood Function

  • Kang, Kee-Hoon;Han, Jung-Hoon
    • Communications for Statistical Applications and Methods / Vol. 10, No. 2 / pp.607-617 / 2003
  • This paper offers a practical comparison of efficiency between the local likelihood approach and the conventional kernel approach in density estimation. The local likelihood estimation procedure maximizes a kernel-smoothed log-likelihood function with respect to a polynomial approximation of the log likelihood function. We use two types of data-driven bandwidths for each method and compare the mean integrated squared errors for several densities. Numerical results reveal that the local log-linear approach with a simple plug-in bandwidth performs better than the standard kernel approach for heavy-tailed distributions. For normal mixture densities, the standard kernel estimator with the bandwidth of Sheather and Jones (1991) dominates the others for moderately large sample sizes.

Mutual Information in Naive Bayes with Kernel Density Estimation

  • 샹총량;유샹루;강대기
    • Proceedings of the Korea Institute of Information and Communication Engineering Conference (한국정보통신학회 학술대회논문집) / 2014 Spring Conference / pp.86-88 / 2014
  • The assumption made by naive Bayes often has a harmful effect when classifying real-world data. To relax this assumption, we introduce the Naive Bayes Mutual Information Attribute Weighting with Smooth Kernel Density Estimation (NBMIKDE) approach. NBMIKDE combines a smooth kernel for attributes with an attribute weighting scheme based on mutual information measures.

Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation

  • Yee, Jaeyong;Park, Taesung;Park, Mira
    • Genomics & Informatics / Vol. 20, No. 2 / pp.17.1-17.11 / 2022
  • Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining the kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
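
The chain described in the abstract above (kernel density estimate, then entropy, then mutual information) can be sketched as I(trait; genotype) = H(trait) - sum_g p(g) H(trait | genotype = g). The sketch assumes a Gaussian kernel, a normal-reference bandwidth, and the resubstitution entropy estimate -mean(log f_hat(x_i)); the paper's actual kernels, procedure order, and data are not reproduced, and all names and simulated values are illustrative.

```python
import math
import random
import statistics


def gaussian_kde(x, data, h):
    """Gaussian kernel density estimate at x."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def kde_entropy(data):
    """Resubstitution estimate of differential entropy: -mean(log f_hat(x_i))."""
    h = (4.0 / (3.0 * len(data))) ** 0.2 * statistics.stdev(data)
    return -sum(math.log(gaussian_kde(x, data, h)) for x in data) / len(data)


def mutual_information(trait, genotype):
    """Entropy of the trait minus its genotype-conditional entropy."""
    groups = {}
    for t, g in zip(trait, genotype):
        groups.setdefault(g, []).append(t)
    conditional = sum(len(v) / len(trait) * kde_entropy(v)
                      for v in groups.values())
    return kde_entropy(trait) - conditional


rng = random.Random(3)
genotype = [i % 2 for i in range(400)]  # two simulated genotype classes
trait_assoc = [rng.gauss(3.0 * g, 1.0) for g in genotype]  # trait mean depends on genotype
trait_null = [rng.gauss(0.0, 1.0) for _ in genotype]       # trait unrelated to genotype
mi_assoc = mutual_information(trait_assoc, genotype)
mi_null = mutual_information(trait_null, genotype)
```

The estimate for the associated trait should come out clearly larger than the one for the unrelated trait, which is the kind of contrast an association scan relies on.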