• Title/Summary/Keyword: statistical approach

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.239-246 / 2003
  • Statisticians, or data miners, are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures depending on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, a comparison that is often made in data mining processes. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis: it perturbs the values of the input variables and monitors the resulting change in the output. The scope of this research is limited to models with continuous output, although it is not difficult to extend the method to supervised learning models with categorical outcomes.
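The perturb-and-monitor idea in the abstract can be sketched generically, because it needs the fitted model only as a black-box prediction function. This is an illustration under assumptions, not the authors' exact measure: the helper name `sensitivity_importance`, the shift of `delta` times each input's standard deviation, and the root-mean-square summary of the output changes are all hypothetical choices.

```python
import random
import statistics
from typing import Callable, List, Sequence


def sensitivity_importance(
    predict: Callable[[Sequence[float]], float],
    X: List[List[float]],
    delta: float = 0.1,
) -> List[float]:
    """Hypothetical sensitivity-based importance score for each input variable.

    For input j, every observation's x_j is shifted by delta * SD(x_j) while the
    other inputs are held fixed; the score is the root-mean-square change in the
    model output. `predict` can wrap any fitted model (regression, network, tree).
    """
    scores = []
    for j in range(len(X[0])):
        step = delta * statistics.pstdev([row[j] for row in X])
        total = 0.0
        for row in X:
            shifted = list(row)
            shifted[j] += step  # perturb only input j
            total += (predict(shifted) - predict(row)) ** 2
        scores.append((total / len(X)) ** 0.5)
    return scores


def linear_model(x: Sequence[float]) -> float:
    # Toy model: x0 matters most, x1 a little, x2 not at all.
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]


rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
scores = sensitivity_importance(linear_model, X)
```

Scaling the shift by each input's standard deviation keeps the scores comparable across inputs measured in different units; for a linear model the score reduces to |β_j| times delta times SD(x_j).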

QUASI-LIKELIHOOD REGRESSION FOR VARYING COEFFICIENT MODELS WITH LONGITUDINAL DATA

  • Kim, Choong-Rak;Jeong, Mee-Seon;Kim, Woo-Chul;Park, Byeong-U.
    • Journal of the Korean Statistical Society / v.33 no.4 / pp.367-379 / 2004
  • This article deals with the nonparametric analysis of longitudinal data when repeated measurements on a given subject may be correlated. We consider a quasi-likelihood regression model in which the regression function, transformed through a link function, is linear in time-varying coefficients. We investigate the local polynomial approach to estimating the time-varying coefficients and derive the asymptotic distribution of the estimators in this quasi-likelihood context. A real data set is analyzed as an illustrative example.
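For intuition about the local polynomial step, here is a minimal local linear sketch in the simplest special case: identity link, a single covariate, and working independence. The within-subject correlation and the general quasi-likelihood link handled in the paper are ignored here; the Epanechnikov kernel, the bandwidth `h`, and the helper names are assumptions for illustration.

```python
from typing import Sequence


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel, supported on (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear_coef(
    t: Sequence[float], x: Sequence[float], y: Sequence[float], t0: float, h: float
) -> float:
    """Local linear estimate of beta(t0) in y_i = x_i * beta(t_i) + error.

    Near t0, beta(t) is approximated by a + b * (t - t0); (a, b) solve a
    kernel-weighted least-squares problem and a is returned as beta_hat(t0).
    Identity link and working independence are assumed.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for ti, xi, yi in zip(t, x, y):
        w = epanechnikov((ti - t0) / h)
        z1, z2 = xi, xi * (ti - t0)  # local design row: [x, x * (t - t0)]
        s11 += w * z1 * z1
        s12 += w * z1 * z2
        s22 += w * z2 * z2
        r1 += w * z1 * yi
        r2 += w * z2 * yi
    det = s11 * s22 - s12 * s12
    if det <= 0.0:
        raise ValueError("kernel window too narrow: increase the bandwidth h")
    return (r1 * s22 - s12 * r2) / det  # Cramer's rule for the intercept a
```

Repeating the fit over a grid of `t0` values traces out the estimated coefficient curve; with a non-identity link the weighted least-squares step would be replaced by an iteratively reweighted local quasi-likelihood fit.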

Nonparametric Bayesian Multiple Change Point Problems

  • Kim, Chansoo;Chung, Younshik
    • Journal of the Korean Statistical Society / v.31 no.1 / pp.1-16 / 2002
  • Since changepoint identification is important in many data analysis problems, we wish to make inferences about the locations of one or more changepoints in a sequence. We consider Bayesian nonparametric inference for the multiple changepoint problem using the Bayesian segmentation procedure proposed by Yang and Kuo (2000). A mixture of products of Dirichlet processes is used as the prior distribution. To decide at each step whether a single change exists, our approach relies on a nonparametric Bayesian Schwarz information criterion. We discuss how to choose the precision parameter (total mass parameter) in the nonparametric setting and show that the discreteness of the Dirichlet process prior can have a large effect on the nonparametric Bayesian Schwarz information criterion, leading to conclusions very different from those of a reasonable parametric model. An example is presented to illustrate this effect.
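To make the role of a Schwarz-type criterion concrete, the following sketch uses its familiar parametric form for a single change in the mean of Gaussian data with a common unknown variance. This is a parametric stand-in, not the nonparametric Bayesian criterion built on the Dirichlet process mixture in the paper; the parameter counts (2 without a change, 3 with one), the minimum segment length, and the function names are assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def _rss(xs: Sequence[float]) -> float:
    """Residual sum of squares around the segment mean."""
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs)


def sic_single_change(
    x: Sequence[float], min_seg: int = 2
) -> Tuple[Optional[int], float, float]:
    """Return (change index or None, SIC without change, best SIC with one change).

    SIC = -2 * max log-likelihood + p * log(n), dropping constants common to both
    models: p = 2 (mean, variance) without a change, p = 3 (two means, one
    variance) with a change after position k.
    """
    n = len(x)
    sic0 = n * math.log(_rss(x) / n) + 2 * math.log(n)
    best_k: Optional[int] = None
    best_sic = math.inf
    for k in range(min_seg, n - min_seg + 1):
        rss = _rss(x[:k]) + _rss(x[k:])
        sic = n * math.log(rss / n) + 3 * math.log(n)
        if sic < best_sic:
            best_k, best_sic = k, sic
    # A change is declared only if the best one-change model beats no change.
    return (best_k if best_sic < sic0 else None), sic0, best_sic
```

Applied recursively to the two resulting segments, a test of this kind yields a binary-segmentation search for multiple changepoints.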

Moment-Based Density Approximation Algorithm for Symmetric Distributions

  • Ha, Hyung-Tae
    • Communications for Statistical Applications and Methods / v.14 no.3 / pp.583-592 / 2007
  • Given the moments of a symmetric random variable, its density and distribution functions can be accurately approximated using the algorithm proposed in this paper. The algorithm is specifically designed for approximating symmetric distributions and comprises four phases. The approach is essentially based on the transformation-of-variable technique and on moment-based density approximants expressed as the product of an appropriate initial approximant and a polynomial adjustment. Probabilistic quantities such as percentage points and percentiles can also be accurately determined from the approximations of the corresponding distribution functions. The algorithm is not only conceptually simple but also easy to implement. As illustrated by the first two numerical examples, the density functions so obtained are in good agreement with the exact values. Moreover, as shown in the last example, the proposed algorithm can provide more accurate quantities than direct approximation.
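The "initial approximant times polynomial adjustment" construction can be illustrated with a normal initial approximant φ: choose coefficients c_j so that the integral of x^i φ(x) Σ_j c_j x^j equals the given moment μ_i for i = 0, ..., d, which is a linear system in the normal moments. This sketch omits the transformation-of-variable step and the four phases of the paper's algorithm; the normal base density with variance μ_2 and the helper names are assumptions. Polynomially adjusted densities of this type are not guaranteed to be nonnegative everywhere.

```python
import math
from typing import Callable, List, Sequence


def normal_moment(k: int, var: float) -> float:
    """E[Z^k] for Z ~ N(0, var): zero for odd k, (k-1)!! * var^(k/2) for even k."""
    if k % 2:
        return 0.0
    double_fact = 1
    for m in range(k - 1, 0, -2):
        double_fact *= m
    return double_fact * var ** (k // 2)


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol


def moment_approximant(moments: Sequence[float]) -> Callable[[float], float]:
    """Density phi(x) * p(x) whose first d moments match moments[0..d] (moments[0] == 1)."""
    d = len(moments) - 1
    var = moments[2]  # base normal shares the target variance
    # Row i enforces: sum_j c_j * E_phi[X^(i+j)] = moments[i]
    A = [[normal_moment(i + j, var) for j in range(d + 1)] for i in range(d + 1)]
    coef = solve_linear(A, list(moments))

    def density(x: float) -> float:
        base = math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        return base * sum(c * x ** j for j, c in enumerate(coef))

    return density


# Example: the symmetric Uniform(-1, 1) law has moments 1, 0, 1/3, 0, 1/5.
f_uniform = moment_approximant([1.0, 0.0, 1.0 / 3.0, 0.0, 0.2])
```

Percentage points would then be read off by numerically integrating the approximant to obtain the distribution function.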

BAYESIAN INFERENCE FOR FIELLER-CREASY PROBLEM USING UNBALANCED DATA

  • Lee, Woo-Dong;Kim, Dal-Ho;Kang, Sang-Gil
    • Journal of the Korean Statistical Society / v.36 no.4 / pp.489-500 / 2007
  • In this paper, we consider a Bayesian approach to the Fieller-Creasy problem using noninformative priors. Specifically, we extend the results of Yin and Ghosh (2000) to the unbalanced case. We develop several noninformative priors, including first- and second-order matching priors and reference priors. We also prove the propriety of the posterior distributions under the derived noninformative priors. We compare these priors by how accurately the coverage probabilities of Bayesian credible intervals match the corresponding frequentist coverage probabilities.
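The criterion used to compare priors, the frequentist coverage of Bayesian credible intervals, can be estimated by simulation. The sketch below does this for θ = μ1/μ2 in an unbalanced design under simplifying assumptions: known σ and a flat prior on (μ1, μ2), which is only a placeholder and not one of the matching or reference priors derived in the paper. Comparing priors would mean swapping in a posterior sampler for each prior; the function names and defaults are hypothetical.

```python
import math
import random
from typing import Tuple


def ratio_credible_interval(
    xbar1: float, xbar2: float, n1: int, n2: int, sigma: float,
    rng: random.Random, draws: int = 1000, level: float = 0.95,
) -> Tuple[float, float]:
    """Equal-tailed credible interval for theta = mu1 / mu2.

    Assumes known sigma and a flat prior on (mu1, mu2), so that a posteriori
    mu_i ~ N(xbar_i, sigma^2 / n_i) independently (a stand-in prior).
    """
    theta = sorted(
        rng.gauss(xbar1, sigma / math.sqrt(n1)) / rng.gauss(xbar2, sigma / math.sqrt(n2))
        for _ in range(draws)
    )
    k = round((1.0 - level) / 2.0 * draws)  # number of draws cut from each tail
    return theta[k], theta[draws - 1 - k]


def frequentist_coverage(
    mu1: float, mu2: float, n1: int, n2: int,
    sigma: float = 1.0, reps: int = 200, seed: int = 1,
) -> float:
    """Monte Carlo estimate of how often the credible interval covers mu1 / mu2."""
    rng = random.Random(seed)
    true_theta = mu1 / mu2
    hits = 0
    for _ in range(reps):
        # Simulate the sufficient statistics of an unbalanced design (n1 != n2 allowed).
        xbar1 = rng.gauss(mu1, sigma / math.sqrt(n1))
        xbar2 = rng.gauss(mu2, sigma / math.sqrt(n2))
        lo, hi = ratio_credible_interval(xbar1, xbar2, n1, n2, sigma, rng)
        hits += lo <= true_theta <= hi
    return hits / reps
```

A prior whose estimated coverage stays close to the nominal level across (μ1, μ2, n1, n2) settings would be preferred under this criterion.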

Bayesian Multiple Change-Point Estimation and Segmentation

  • Kim, Jaehee;Cheon, Sooyoung
    • Communications for Statistical Applications and Methods / v.20 no.6 / pp.439-454 / 2013
  • This study presents a Bayesian multiple change-point detection approach to segment and classify observations that no longer come from the initial population after a certain time. Inference concerns the multiple change-points, that is, the positions in a sequence of random variables at which the probability distribution changes. Bayesian multiple change-point estimation classifies each observation into a segment. We use a truncated Poisson distribution for the number of change-points and conjugate priors for exponential family distributions. The Bayesian method can lead to unsupervised classification of discrete variables, continuous variables, and multivariate vectors based on latent class models; the solution for the change-points therefore corresponds to a stochastic partition of the observed data. We demonstrate segmentation with real data.
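The combination of a truncated Poisson prior on the number of change-points with conjugate segment priors can be illustrated for count data. This sketch assumes Poisson observations with a Gamma(a, b) prior on each segment's rate, a uniform prior over change-point positions given their number, and exact dynamic-programming summation over segmentations instead of whatever sampling scheme the authors use; `lam`, `a`, `b`, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_marginal(seg: Sequence[int], a: float, b: float) -> float:
    """Log marginal likelihood of one segment: y ~ Poisson(rate), rate ~ Gamma(a, rate b)."""
    m, s = len(seg), sum(seg)
    return (a * math.log(b) - math.lgamma(a) + math.lgamma(a + s)
            - (a + s) * math.log(b + m) - sum(math.lgamma(v + 1) for v in seg))


def logsumexp(vals: List[float]) -> float:
    top = max(vals)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in vals))


def posterior_num_changes(y: Sequence[int], k_max: int, lam: float = 1.0,
                          a: float = 1.0, b: float = 1.0) -> List[float]:
    """Posterior P(k change-points | y) for k = 0..k_max (hypothetical helper)."""
    n = len(y)
    if not 0 <= k_max <= n - 1:
        raise ValueError("k_max must lie in [0, len(y) - 1]")
    # seg[i][j]: log marginal likelihood of y[i:j] treated as one homogeneous segment
    seg = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            seg[i][j] = log_marginal(y[i:j], a, b)
    # f[k][j]: log of the sum, over all splits of y[:j] into k + 1 non-empty
    # segments, of the product of the segment marginal likelihoods
    f = [[-math.inf] * (n + 1) for _ in range(k_max + 1)]
    for j in range(1, n + 1):
        f[0][j] = seg[0][j]
    for k in range(1, k_max + 1):
        for j in range(k + 1, n + 1):
            f[k][j] = logsumexp([f[k - 1][i] + seg[i][j] for i in range(k, j)])
    log_post = []
    for k in range(k_max + 1):
        log_prior_k = k * math.log(lam) - math.lgamma(k + 1)  # truncated Poisson, unnormalized
        log_placement = -math.log(math.comb(n - 1, k))  # uniform over change-point positions
        log_post.append(log_prior_k + log_placement + f[k][n])
    z = logsumexp(log_post)
    return [math.exp(v - z) for v in log_post]


post = posterior_num_changes([1] * 20 + [10] * 20, k_max=3)
```

The same recursion, run backward with sampling instead of summation, would draw segmentations and hence the classification of each observation into a segment.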

Multiple imputation for competing risks survival data via pseudo-observations

  • Han, Seungbong;Andrei, Adin-Cristian;Tsui, Kam-Wah
    • Communications for Statistical Applications and Methods / v.25 no.4 / pp.385-396 / 2018
  • Competing risks are commonly encountered in biomedical research. Regression models for competing risks data can be developed from data routinely collected in hospitals or general practices. However, these data sets usually contain missing covariate values. To overcome this problem, multiple imputation is often used to fit regression models under a missing-at-random (MAR) assumption. Here, we introduce a multivariate imputation by chained equations algorithm for competing risks survival data. Using pseudo-observations, we make use of the available outcome information while accommodating the competing risk structure. Lastly, we illustrate the practical advantages of our approach using simulations and two data examples, from coronary artery disease and hepatocellular carcinoma data.
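Pseudo-observations for competing risks are commonly built by jackknifing a nonparametric estimator of the cumulative incidence function: θ_i = n·F̂(t) - (n - 1)·F̂_(-i)(t). The sketch below computes them from the Aalen-Johansen estimator at a single time point t. It does not implement the chained-equations imputation itself, and how the authors feed pseudo-observations into their imputation models is not reproduced here; coding censoring as cause 0 and the function names are assumptions.

```python
from typing import List, Sequence


def cumulative_incidence(times: Sequence[float], causes: Sequence[int],
                         t: float, cause: int = 1) -> float:
    """Aalen-Johansen estimate of P(T <= t, cause) from right-censored data.

    causes[i] == 0 marks a censored observation; any other value is an event type.
    """
    surv = 1.0  # overall Kaplan-Meier survival just before the current event time
    cif = 0.0
    event_times = sorted({u for u, c in zip(times, causes) if c != 0 and u <= t})
    for u in event_times:
        at_risk = sum(1 for v in times if v >= u)
        d_all = sum(1 for v, c in zip(times, causes) if v == u and c != 0)
        d_cause = sum(1 for v, c in zip(times, causes) if v == u and c == cause)
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_all / at_risk
    return cif


def pseudo_observations(times: Sequence[float], causes: Sequence[int],
                        t: float, cause: int = 1) -> List[float]:
    """Jackknife pseudo-values n * F(t) - (n - 1) * F_{-i}(t), one per subject."""
    times, causes = list(times), list(causes)
    n = len(times)
    full = cumulative_incidence(times, causes, t, cause)
    out = []
    for i in range(n):
        loo = cumulative_incidence(times[:i] + times[i + 1:],
                                   causes[:i] + causes[i + 1:], t, cause)
        out.append(n * full - (n - 1) * loo)
    return out
```

Without censoring, the estimator reduces to the empirical proportion, and each pseudo-value equals the indicator that subject i failed from the cause of interest by time t, which is a convenient sanity check.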

Objective Bayesian inference based on upper record values from Rayleigh distribution

  • Seo, Jung In;Kim, Yongku
    • Communications for Statistical Applications and Methods / v.25 no.4 / pp.411-430 / 2018
  • The Bayesian approach is a suitable alternative for constructing appropriate models for observed record values because the number of such values is small. This paper provides an objective Bayesian analysis method for upper record values arising from the Rayleigh distribution. For the objective Bayesian analysis, the Fisher information matrix for the unknown parameters is derived in terms of the second derivative of the log-likelihood function using Leibniz's rule; objective priors are then provided, resulting in proper posterior distributions. We examine whether these priors are probability matching priors (PMPs). In a Monte Carlo simulation study, inference results under the provided priors are compared. Through real data analysis, we reveal a limitation of the confidence interval based on the maximum likelihood estimator for the scale parameter and evaluate the models under the provided priors.
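For a scale-only Rayleigh model with density f(x) = (x/σ²) exp(-x²/(2σ²)), the joint density of the first n upper records r_1 < ... < r_n is f(r_n) times the product of the hazards h(r_i) for i < n, where h(x) = f(x)/(1 - F(x)) = x/σ². The likelihood is therefore proportional to σ^(-2n) exp(-r_n²/(2σ²)), which gives the MLE σ̂² = r_n²/(2n) and, under the scale-invariant prior π(σ) ∝ 1/σ (the Jeffreys prior in this one-parameter case), an inverse-gamma posterior for σ² with shape n and scale r_n²/2. This sketch assumes the scale-only model; the paper's priors, Fisher information matrix, and any additional parameters are not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def upper_records(xs: Sequence[float]) -> List[float]:
    """Upper record values: each observation that exceeds every earlier one."""
    records: List[float] = []
    for v in xs:
        if not records or v > records[-1]:
            records.append(v)
    return records


def rayleigh_scale_from_records(records: Sequence[float]) -> Tuple[float, float]:
    """Return (MLE of sigma^2, posterior mean of sigma^2 under pi(sigma) ~ 1/sigma).

    With n upper records, the likelihood depends on the data only through n and
    the largest record r_n, via sigma^(-2n) * exp(-r_n^2 / (2 sigma^2)).
    """
    n = len(records)
    if n < 2:
        raise ValueError("need at least two records for a finite posterior mean")
    half_sq = records[-1] ** 2 / 2.0
    mle = half_sq / n  # maximizer of -n * log(s2) - half_sq / s2
    post_mean = half_sq / (n - 1)  # sigma^2 | records ~ Inverse-Gamma(n, half_sq)
    return mle, post_mean
```

Because only n and r_n enter, a handful of records carries little information, which is consistent with the abstract's motivation for a Bayesian treatment.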

Computational explosion in the frequency estimation of sinusoidal data

  • Zhang, Kaimeng;Ng, Chi Tim;Na, Myunghwan
    • Communications for Statistical Applications and Methods / v.25 no.4 / pp.431-442 / 2018
  • This paper highlights computational explosion issues in the autoregressive moving average (ARMA) approach to frequency estimation for sinusoidal data with a large sample size. A new algorithm is proposed to circumvent the computational explosion difficulty in the conditional least-squares estimation method. Note that a sinusoidal pattern can be generated by a non-invertible, non-stationary ARMA model. The computational explosion is shown to be closely related to the non-invertibility of the equivalent ARMA model. Simulation studies illustrate the computational explosion phenomenon and show that the proposed algorithm can efficiently overcome the computational explosion difficulty. A real data example of sunspot numbers is provided to illustrate the application of the proposed algorithm to time series data exhibiting a sinusoidal pattern.
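The ARMA connection rests on the identity x_t = 2cos(ω)x_(t-1) - x_(t-2), which every sinusoid A·cos(ωt + φ) satisfies. Adding noise, y_t = x_t + e_t gives y_t - 2cos(ω)y_(t-1) + y_(t-2) = e_t - 2cos(ω)e_(t-1) + e_(t-2), an ARMA(2,2) whose AR and MA polynomials both have their roots e^(±iω) on the unit circle, hence non-stationary and non-invertible. The sketch below only checks the identity by recovering ω from noise-free data with a naive least-squares fit of α = 2cos(ω); it is not the paper's algorithm, and under noise this naive estimate is biased. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def sinusoid(n: int, omega: float, amp: float = 1.0, phase: float = 0.0) -> List[float]:
    """x_t = amp * cos(omega * t + phase) for t = 0..n-1."""
    return [amp * math.cos(omega * t + phase) for t in range(n)]


def frequency_from_recursion(x: Sequence[float]) -> float:
    """Estimate omega by least squares on x_t + x_{t-2} = alpha * x_{t-1}, alpha = 2 cos(omega)."""
    num = sum((x[t] + x[t - 2]) * x[t - 1] for t in range(2, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(2, len(x)))
    alpha = num / den
    # Clamp against rounding before inverting alpha = 2 cos(omega).
    return math.acos(max(-1.0, min(1.0, alpha / 2.0)))
```

In the noisy case, computing residuals recursively through the MA part means solving e_t = (y_t - αy_(t-1) + y_(t-2)) + αe_(t-1) - e_(t-2), a recursion with unit-modulus roots that does not damp earlier errors, which illustrates why non-invertibility matters for long series.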

A Study on Methodology of Framework for Development of Environmental Statistics (환경통계 작성체계의 방법론적 연구)

  • Kang, Sang-Mok
    • Journal of Environmental Impact Assessment / v.6 no.1 / pp.135-149 / 1997
  • Environmental issues are currently at the forefront of the political and economic arena, both globally and nationally. In all spheres of socio-economic development and policy, it has been suggested that there is a need to measure environmental impacts and to produce and disseminate environmental statistics systematically for environmentally sound and sustainable development. In particular, because environmental statistics encompass a wide spectrum of sectors, from the natural to the social sciences, and are dispersed among various agencies, complex fields such as the environment require an organized approach and systematic compilation methods. This article presents a methodology for a framework for developing environmental statistics in order to advance Korean environmental statistics.
