• Title/Summary/Keyword: Kernel Density Estimate

Search Results: 35

Comparison Study of Kernel Density Estimation according to Various Bandwidth Selectors (다양한 대역폭 선택법에 따른 커널밀도추정의 비교 연구)

  • Kang, Young-Jin;Noh, Yoojeong
    • Journal of the Computational Structural Engineering Institute of Korea, v.32 no.3, pp.173-181, 2019
  • To estimate a probability distribution function from experimental data, kernel density estimation (KDE) is most often used when data are insufficient. The distribution estimated with KDE depends on the bandwidth selector, which smooths or overfits the kernel estimator to the experimental data. In this study, various bandwidth selectors, such as Silverman's rule of thumb, the rule using adaptive estimates, and the oversmoothing rule, were compared for accuracy and conservativeness. To this end, statistical simulations were carried out using assumed true models, including unimodal and multimodal distributions, and the accuracy and conservativeness of the estimated distribution functions were compared across various data sets. In addition, it was verified how the distributions estimated with KDE under different bandwidth selectors affect reliability analysis results, using simple reliability examples.
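
A minimal sketch of the rule-of-thumb bandwidth idea compared in this paper, assuming NumPy and a Gaussian kernel; only Silverman's rule is shown, and the helper names are illustrative rather than the authors' code:

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb: h = 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    n = len(x)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(np.std(x, ddof=1), iqr / 1.34) * n ** (-0.2)

def gaussian_kde(x, grid, h):
    """Kernel density estimate with a Gaussian kernel and bandwidth h."""
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
data = rng.normal(size=50)            # small sample, the insufficient-data setting
grid = np.linspace(-4.0, 4.0, 200)
density = gaussian_kde(data, grid, silverman_bandwidth(data))
```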

Bandwidth selection for discontinuity point estimation in density (확률밀도함수의 불연속점 추정을 위한 띠폭 선택)

  • Huh, Jib
    • Journal of the Korean Data and Information Science Society, v.23 no.1, pp.79-87, 2012
  • In the case that the probability density function has a discontinuity point, Huh (2002) estimated the location and jump size of the discontinuity point based on the difference between the right and left kernel density estimators using the one-sided kernel function. In this paper, we consider a cross-validation criterion, built from the right and left maximum likelihood cross-validations, for selecting the bandwidth used to estimate the location and jump size of the discontinuity point. This method is motivated by the one-sided cross-validation of Hart and Yi (1998). The finite-sample performance is illustrated by a simulated example.
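
A minimal sketch of the one-sided-kernel idea the paper builds on, assuming NumPy; the rescaled Epanechnikov kernel, the scan grid, and the fixed bandwidth h = 0.2 are illustrative choices, not the paper's cross-validated selector:

```python
import numpy as np

def one_sided_kde(x, t, h, side):
    """One-sided KDE at point t with a kernel supported on [0, 1]:
    side='right' uses data in [t, t+h]; side='left' uses data in [t-h, t].
    The kernel 1.5*(1 - u^2) on [0, 1] is a renormalized Epanechnikov."""
    u = (x - t) / h if side == "right" else (t - x) / h
    k = np.where((u >= 0) & (u <= 1), 1.5 * (1.0 - u**2), 0.0)
    return k.sum() / (len(x) * h)

def estimate_discontinuity(x, grid, h):
    """Locate the discontinuity as the point maximizing the absolute difference
    between the right and left estimates; the signed difference there is the jump."""
    diffs = np.array([one_sided_kde(x, t, h, "right") - one_sided_kde(x, t, h, "left")
                      for t in grid])
    i = int(np.argmax(np.abs(diffs)))
    return grid[i], diffs[i]

rng = np.random.default_rng(1)
# density with a jump at 0: mix of Uniform(-1, 0) and Uniform(0, 1), unequal weights
x = np.where(rng.random(500) < 0.3, rng.uniform(-1, 0, 500), rng.uniform(0, 1, 500))
loc, jump = estimate_discontinuity(x, np.linspace(-0.9, 0.9, 181), h=0.2)
```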

Problems Occurred with Histogram and a Resolution

  • Park, Byeong Uk;Park, Hong Nae;Song, Moon Sup;Song, Jae Kee
    • Journal of Korean Society for Quality Management, v.18 no.2, pp.127-133, 1990
  • In this article, several problems inherent in the histogram estimate of an unknown probability density function are discussed, including the so-called sharp corners and the bin edge effect. A resolution of these problems is then discussed: the resulting estimate is called the kernel density estimate, which is the estimate most widely used by data analysts. One of the most recent and reliable data-based choices of the scale factor (bandwidth) of the estimate, which is known to be the most crucial choice, is also discussed.

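A minimal sketch of the bin edge effect and its kernel-based resolution, assuming NumPy; the bin width 0.5 and bandwidth 0.4 are arbitrary illustrative values, and the data-based bandwidth choice the article discusses is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=100)

# Bin edge effect: same data, same bin width, but shifted edges can
# produce visibly different histogram shapes.
edges_a = np.arange(-4.0, 4.5, 0.5)
edges_b = edges_a + 0.25
hist_a, _ = np.histogram(data, bins=edges_a, density=True)
hist_b, _ = np.histogram(data, bins=edges_b, density=True)

# A kernel density estimate is smooth and has no bin edges at all.
def kde(x, grid, h):
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

density = kde(data, np.linspace(-4.0, 4.0, 200), h=0.4)
```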

An Algorithm of Score Function Generation using Convolution-FFT in Independent Component Analysis (독립성분분석에서 Convolution-FFT을 이용한 효율적인 점수함수의 생성 알고리즘)

  • Kim, Woong-Myung;Lee, Hyon-Soo
    • The KIPS Transactions:PartB, v.13B no.1 s.104, pp.27-34, 2006
  • In this study, we propose a new algorithm that generates the score function in ICA (Independent Component Analysis) using entropy theory. Generating the score function requires estimating the probability density function of the original signals, and that density estimate must be differentiable; we therefore use the kernel density estimation method. After rewriting the estimator in convolution form to speed up the density estimation, we apply the FFT algorithm, which computes convolutions quickly. The proposed score function generation method reduces the error, namely the density difference between the recovered and original signals. In computer simulations of the blind source separation problem, we estimate density functions more similar to those of the original signals than Extended Infomax and Fixed Point ICA do, and obtain improved performance in the SNR (Signal-to-Noise Ratio) between the recovered and original signals.
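
A minimal sketch of computing a score function from an FFT-based convolution KDE, assuming NumPy; the score is taken as phi = -p'/p, and the grid size, bandwidth, and the `score_function_fft` name are illustrative, not the paper's exact formulation:

```python
import numpy as np

def score_function_fft(x, m=512, h=0.3):
    """Score function phi = -p'/p, where the density p is a KDE computed as
    an FFT-based convolution of a binned-data histogram with a Gaussian kernel."""
    lo, hi = x.min() - 4 * h, x.max() + 4 * h
    edges = np.linspace(lo, hi, m + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    dx = edges[1] - edges[0]
    hist, _ = np.histogram(x, bins=edges, density=True)
    # Gaussian kernel sampled on the grid, centered at index m // 2
    kernel = np.exp(-0.5 * ((np.arange(m) - m // 2) * dx / h) ** 2)
    kernel /= kernel.sum()
    # circular convolution via FFT; safe because the density vanishes at the edges
    p = np.real(np.fft.ifft(np.fft.fft(hist) * np.fft.fft(np.fft.ifftshift(kernel))))
    p = np.maximum(p, 1e-12)
    return centers, -np.gradient(p, dx) / p

rng = np.random.default_rng(3)
centers, phi = score_function_fft(rng.laplace(size=2000))  # true score is sign(x)
```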

Initialization of Fuzzy C-Means Using Kernel Density Estimation (커널 밀도 추정을 이용한 Fuzzy C-Means의 초기화)

  • Heo, Gyeong-Yong;Kim, Kwang-Baek
    • Journal of the Korea Institute of Information and Communication Engineering, v.15 no.8, pp.1659-1664, 2011
  • Fuzzy C-Means (FCM) is one of the most widely used clustering algorithms and has been applied successfully in many settings. However, FCM has some shortcomings, and initial prototype selection is one of them. Because FCM is only guaranteed to converge to a local optimum, different initial prototypes result in different clusterings; therefore, much care should be given to the selection of the initial prototypes. In this paper, a new initialization method for FCM using kernel density estimation (KDE) is proposed to resolve the initialization problem. KDE can be used to estimate a non-parametric data distribution and is useful for estimating local density. In the proposed method, after KDE, one initial point is placed at the densest region and the density of that region is then reduced; iterating this process yields the initial prototypes. The prototypes obtained in this way gave better results than the randomly selected ones commonly used in FCM, as demonstrated by experimental results.
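
A minimal sketch of the pick-densest-then-suppress initialization described above, assuming NumPy; the Gaussian kernel, the subtraction-based density reduction, and the `kde_init` name are illustrative choices rather than the authors' exact scheme:

```python
import numpy as np

def kde_init(X, c, h):
    """Pick c initial prototypes: repeatedly take the densest point, then
    suppress the density around it so the next pick lands elsewhere."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    density = np.exp(-0.5 * d2 / h**2).sum(axis=1)            # KDE evaluated at each point
    prototypes = []
    for _ in range(c):
        i = int(np.argmax(density))
        prototypes.append(X[i])
        density -= density[i] * np.exp(-0.5 * d2[i] / h**2)   # reduce density near the pick
    return np.array(prototypes)

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0, 2, 4)])  # three clusters
centers = kde_init(X, c=3, h=0.5)
```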

A Study on the Trade Area Analysis Model based on GIS - A Case of Huff probability model - (GIS 기반의 상권분석 모형 연구 - Huff 확률모형을 중심으로 -)

  • Son, Young-Gi;An, Sang-Hyun;Shin, Young-Chul
    • Journal of the Korean Association of Geographic Information Studies, v.10 no.2, pp.164-171, 2007
  • This research used GIS spatial analysis models together with the Huff probability model to carry out a trade area analysis of area centers. We constructed base maps, surveyed by type of business, number of households, and so on, using a land registration map from the LMIS (Land Management Information System) for Bokdae-dong, Cheongju-si. A kernel density function and the NNI (Nearest Neighbor Index) were used to estimate the center areas of store distributions in neighborhood life zones, and the center point and scale of each area were estimated from these center areas. The Huff probability model was then applied to the estimated center areas to extract trade areas, which were drawn on the map. This study thus describes how the Huff probability model can be applied through the kernel density function and NNI of GIS spatial analysis techniques. Trade areas can be extracted more exactly by taking advantage of this method, which can aid merchants in founding small-sized enterprises.

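A minimal sketch of the Huff probability model used in the analysis, assuming NumPy; the attractiveness values, distances, and decay exponent are illustrative numbers, and the kernel-density/NNI center detection step is not reproduced:

```python
import numpy as np

def huff_probabilities(S, D, lam=2.0):
    """Huff model: P_ij = (S_j / D_ij^lam) / sum_k (S_k / D_ik^lam) is the
    probability that a consumer in zone i patronizes center j, where S_j is
    the center's attractiveness (e.g., floor area), D_ij the travel distance,
    and lam the distance-decay exponent."""
    utility = S[None, :] / D ** lam
    return utility / utility.sum(axis=1, keepdims=True)

S = np.array([1000.0, 400.0])                        # attractiveness of two centers
D = np.array([[1.0, 2.0], [2.5, 1.0], [1.5, 1.5]])   # distances from three zones
P = huff_probabilities(S, D)                         # each row sums to 1
```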

Simulation of Hourly Precipitation using Nonhomogeneous Markov Chain Model and Derivation of Rainfall Mass Curve using Transition Probability (비동질성 Markov 모형에 의한 시간강수량 모의 발생과 천이확률을 이용한 강우의 시간분포 유도)

  • Choi, Byung-Kyu;Oh, Tae-Suk;Park, Rae-Gun;Moon, Young-Il
    • Journal of Korea Water Resources Association, v.41 no.3, pp.265-276, 2008
  • Observed data covering a sufficiently long period are needed for the design of hydrological works, but most hydrological records are not long enough. Therefore, in this paper, hourly precipitation is generated by a nonhomogeneous Markov chain model using a variable kernel density function. First, the kernel estimator is used to estimate the transition probabilities. Second, wet hours are decided by the transition probabilities and random numbers. Third, the precipitation amount for each wet hour is calculated from the kernel density function estimated from the observed data. In the results, the observed and generated precipitation data have similar statistics. In addition, a rainfall mass curve for the generated hourly precipitation is derived from the calculated transition probabilities.
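
A minimal sketch of the occurrence-plus-amount structure described above, assuming NumPy; the diurnal transition-probability curves, the Gaussian jitter used in place of the paper's variable kernel, and all numbers are illustrative:

```python
import numpy as np

def simulate_hourly(p_wd, p_ww, wet_amounts, n_hours, h=0.5, seed=0):
    """Two-state nonhomogeneous Markov chain for hourly rain occurrence:
    p_wd[t] = P(wet at hour t | previous hour dry), p_ww[t] = P(wet | wet),
    both indexed by hour of day. Wet-hour depths are drawn by a smoothed
    bootstrap: resample an observed depth and jitter it with N(0, h^2)."""
    rng = np.random.default_rng(seed)
    rain = np.zeros(n_hours)
    wet = False
    for t in range(n_hours):
        wet = rng.random() < (p_ww[t % 24] if wet else p_wd[t % 24])
        if wet:
            rain[t] = max(rng.choice(wet_amounts) + h * rng.standard_normal(), 0.1)
    return rain

# illustrative diurnal transition probabilities and observed wet-hour depths (mm)
hours = np.arange(24)
p_wd = 0.05 + 0.04 * np.sin(2 * np.pi * hours / 24)
p_ww = 0.50 + 0.20 * np.sin(2 * np.pi * hours / 24)
obs_depths = np.array([0.5, 1.2, 3.0, 0.8, 5.5, 2.1])
series = simulate_hourly(p_wd, p_ww, obs_depths, n_hours=24 * 30)
```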

Bandwidth selections based on cross-validation for estimation of a discontinuity point in density (교차타당성을 이용한 확률밀도함수의 불연속점 추정의 띠폭 선택)

  • Huh, Jib
    • Journal of the Korean Data and Information Science Society, v.23 no.4, pp.765-775, 2012
  • Cross-validation is a popular method for selecting the bandwidth in all types of kernel estimation. The maximum likelihood cross-validation, the least squares cross-validation, and the biased cross-validation have been proposed for bandwidth selection in kernel density estimation. In the case that the probability density function has a discontinuity point, Huh (2012) proposed a bandwidth selection method using the maximum likelihood cross-validation. In this paper, two forms of cross-validation with the one-sided kernel function are proposed for selecting the bandwidth used to estimate the location and jump size of the discontinuity point of a density. These methods are motivated by the least squares cross-validation and the biased cross-validation. Using simulated examples, the finite-sample performances of the two proposed methods are compared with that of Huh (2012).
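
A minimal sketch of ordinary (two-sided) least squares cross-validation, assuming NumPy and a Gaussian kernel; the one-sided variants the paper proposes are more involved and are not reproduced here:

```python
import numpy as np

def kde(x, grid, h):
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

def lscv(x, h, grid):
    """LSCV(h) = int f_h^2 - (2/n) * sum_i f_{h,-i}(X_i): an estimate, up to a
    constant, of the integrated squared error of the kernel density estimate."""
    n = len(x)
    f = kde(x, grid, h)
    int_f2 = (f**2).sum() * (grid[1] - grid[0])   # Riemann sum for the integral
    u = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(K, 0.0)                      # leave-one-out: drop each point's own kernel
    loo = K.sum(axis=1) / ((n - 1) * h)
    return int_f2 - 2.0 * loo.mean()

rng = np.random.default_rng(5)
x = rng.normal(size=200)
grid = np.linspace(-5.0, 5.0, 400)
hs = np.linspace(0.05, 1.0, 40)
h_best = hs[int(np.argmin([lscv(x, h, grid) for h in hs]))]
```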

Development of MKDE-ebd for Estimation of Multivariate Probabilistic Distribution Functions (다변량 확률분포함수의 추정을 위한 MKDE-ebd 개발)

  • Kang, Young-Jin;Noh, Yoojeong;Lim, O-Kaung
    • Journal of the Computational Structural Engineering Institute of Korea, v.32 no.1, pp.55-63, 2019
  • In engineering problems, many random variables are correlated, and the correlation of the input random variables has a great influence on the reliability analysis results of mechanical systems. However, correlated variables are often treated as independent or modeled by specific parametric joint distributions because joint distributions are difficult to model, and when the correlated data are insufficient, modeling the joint distribution correctly becomes even harder. In this study, multivariate kernel density estimation with bounded data (MKDE-ebd) is proposed to estimate various types of highly nonlinear joint distributions. Because it combines the given data with bounded data, which are generated from confidence intervals of uniform distribution parameters for the given data, it is less sensitive to data quality and to the number of data points. Thus, it yields conservative statistical modeling and reliability analysis results, and its performance is verified through statistical simulations and engineering examples.
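
A minimal sketch of a multivariate product-kernel KDE augmented with extra "bounded data", assuming NumPy; drawing the bounded samples uniformly over the per-dimension sample range is a crude stand-in for the paper's confidence-interval construction, and all helper names are illustrative:

```python
import numpy as np

def mvkde(X, points, H):
    """Multivariate KDE with a diagonal-bandwidth Gaussian product kernel:
    f(x) = (1/n) * sum_i prod_d N((x_d - X_id) / H_d) / H_d."""
    u = (points[:, None, :] - X[None, :, :]) / H          # shape (m, n, d)
    k = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * H)  # per-dimension kernels
    return k.prod(axis=2).mean(axis=1)

def bounded_data(X, m, seed=0):
    """Crude stand-in for the paper's bounded data: extra samples drawn
    uniformly over the per-dimension range of the given data."""
    rng = np.random.default_rng(seed)
    return rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, X.shape[1]))

rng = np.random.default_rng(6)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=30)  # few, correlated
X_aug = np.vstack([X, bounded_data(X, m=10)])               # given data + bounded data
H = X_aug.std(axis=0, ddof=1) * len(X_aug) ** (-1.0 / 6.0)  # rule of thumb, d = 2
density = mvkde(X_aug, np.array([[0.0, 0.0], [1.0, 1.0]]), H)
```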

Stochastic simulation models with non-parametric approaches: Case study for the Colorado River basin

  • Lee, Tae-Sam;Salas, Jose D.;Prairie, James R.;Frevert, Donald;Fulp, Terry
    • Proceedings of the Korea Water Resources Association Conference, 2010.05a, pp.283-287, 2010
  • Stochastic simulation of hydrologic data has been developed extensively over several decades; however, despite the many advances reported in the literature, a number of limitations and problems remain. In the current study, stochastic simulation approaches that tackle some of these existing problems are discussed. The presented models are based on nonparametric techniques such as block bootstrapping, K-nearest neighbor resampling (KNNR), and the kernel density estimate (KDE). The presented stochastic simulation models are (1) a pilot Gamma kernel estimate with KNNR (a single-site case) and (2) enhanced nonparametric disaggregation with a genetic algorithm (a disaggregation case). We applied these models to one of the most challenging and critical river basins in the USA, the Colorado River, and embedded them in a hydrological software package. Pros and cons of the models compared with existing models are presented through basic statistics as well as drought- and storage-related statistics.

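A minimal sketch of K-nearest neighbor resampling (KNNR) for a single annual series, assuming NumPy; the 1/j neighbor weights follow the common Lall-Sharma choice, and the gamma-distributed "historical" flows and the `knnr_simulate` name are illustrative, not the study's Colorado River setup:

```python
import numpy as np

def knnr_simulate(flows, n_years, k=5, seed=0):
    """KNNR: given the current value, find its k nearest historical neighbors
    and resample one of their successors, weighting closer neighbors more
    heavily (weights proportional to 1/j for the j-th nearest)."""
    rng = np.random.default_rng(seed)
    w = 1.0 / np.arange(1, k + 1)
    w /= w.sum()
    out = [flows[rng.integers(len(flows) - 1)]]
    for _ in range(n_years - 1):
        d = np.abs(flows[:-1] - out[-1])     # distance from each historical state
        nn = np.argsort(d)[:k]               # k nearest neighbors, nearest first
        j = rng.choice(nn, p=w)              # pick one, nearest most likely
        out.append(flows[j + 1])             # take that neighbor's successor
    return np.array(out)

rng = np.random.default_rng(7)
hist = rng.gamma(shape=4.0, scale=5.0, size=80)   # illustrative annual flows
synthetic = knnr_simulate(hist, n_years=100)
```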