• Title/Summary/Keyword: penalized spline regression

An Outlier Detection Method in Penalized Spline Regression Models (벌점 스플라인 회귀모형에서의 이상치 탐지방법)

  • Seo, Han Son;Song, Ji Eun;Yoon, Min
    • The Korean Journal of Applied Statistics / v.26 no.4 / pp.687-696 / 2013
  • The detection and examination of outliers are important parts of data analysis because some outliers may have a detrimental effect on statistical analysis. Outlier detection methods have been discussed by many authors. In this article, we propose applying Hadi and Simonoff's (1993) method to a penalized spline regression model to detect multiple outliers. Simulated and real data sets are used to illustrate the proposed procedure and to compare it with penalized spline regression and robust penalized spline regression.

A Penalized Spline Based Method for Detecting the DNA Copy Number Alteration in an Array-CGH Experiment

  • Kim, Byung-Soo;Kim, Sang-Cheol
    • The Korean Journal of Applied Statistics / v.22 no.1 / pp.115-127 / 2009
  • The purpose of statistical analyses of array-CGH experiment data is to divide the whole genome into regions of equal copy number, to quantify the copy number in each region, and finally to evaluate whether it differs significantly from two. Several statistical procedures have been proposed, including circular binary segmentation and a Gaussian-based local regression for detecting break points (GLAD) by estimating a piecewise constant function. In this note we propose a penalized spline regression with a simultaneous confidence band (SCB) to evaluate the statistical significance of regions of genetic gain or loss. A region in which the simultaneous confidence band stays above 0 or below 0 can be considered a region of genetic gain or loss, respectively. We compare the performance of the SCB procedure with the GLAD and hidden Markov model approaches through a simulation study in which the data were generated from AR(1) and AR(2) models, to reflect the spatial dependence of array-CGH data, in addition to the independence model. We found that the SCB method is more sensitive in detecting low-level copy number alterations.
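
The decision rule in the abstract above (a region is called a gain where the band stays above 0 and a loss where it stays below 0) can be sketched as follows. This is only an illustration: the function name `scb_regions`, the list inputs, and the numbers are hypothetical, and the band limits are assumed to have been computed already by some penalized spline fit that is not shown here.

```python
def scb_regions(lower, upper):
    """Return (start, end, label) runs where the band excludes zero.

    lower, upper: simultaneous confidence band limits at consecutive
    probe positions (hypothetical inputs). `end` is inclusive.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    regions = []
    start, current = None, None
    for i, (lo, hi) in enumerate(zip(lower, upper)):
        if lo > 0:
            label = "gain"   # whole band above zero
        elif hi < 0:
            label = "loss"   # whole band below zero
        else:
            label = None     # band covers zero: no call
        if label != current:
            if current is not None:
                regions.append((start, i - 1, current))
            start, current = i, label
    if current is not None:
        regions.append((start, len(lower) - 1, current))
    return regions


# Made-up band limits for seven probes, for illustration only.
lower = [-0.3, 0.1, 0.2, -0.1, -0.9, -0.8, -0.2]
upper = [0.3, 0.6, 0.7, 0.4, -0.2, -0.1, 0.3]
regions = scb_regions(lower, upper)
print(regions)
```

Probes whose band covers zero are left uncalled, so neighbouring gain and loss runs are reported separately.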

Semiparametric Bayesian Estimation under Structural Measurement Error Model

  • Hwang, Jin-Seub;Kim, Dal-Ho
    • Communications for Statistical Applications and Methods / v.17 no.4 / pp.551-560 / 2010
  • This paper considers a Bayesian approach to modeling a flexible regression function under a structural measurement error model. The regression function is modeled using semiparametric regression with penalized splines. Model fitting and parameter estimation are carried out in a hierarchical Bayesian framework using Markov chain Monte Carlo methodology. The performance of the resulting estimators is compared with that of estimators under a structural measurement error model without a semiparametric component.

Semiparametric Bayesian estimation under functional measurement error model

  • Hwang, Jin-Seub;Kim, Dal-Ho
    • Journal of the Korean Data and Information Science Society / v.21 no.2 / pp.379-385 / 2010
  • This paper considers a Bayesian approach to modeling a flexible regression function under a functional measurement error model. The regression function is modeled using semiparametric regression with penalized splines. Model fitting and parameter estimation are carried out in a hierarchical Bayesian framework using Markov chain Monte Carlo methodology. The performance of the resulting estimators is compared with that of estimators under a functional measurement error model without a semiparametric component.

Bayesian Curve-Fitting in Semiparametric Small Area Models with Measurement Errors

  • Hwang, Jinseub;Kim, Dal Ho
    • Communications for Statistical Applications and Methods / v.22 no.4 / pp.349-359 / 2015
  • We study a semiparametric Bayesian approach to small area estimation under a nested error linear regression model with an area-level covariate subject to measurement error. We use radial basis functions for the regression spline, with knots on a grid of equally spaced sample quantiles of the covariate measured with error, in the nested error linear regression setup. We develop a hierarchical Bayesian structural measurement error model for small areas and, because some priors are non-informative improper priors, prove the propriety of the joint posterior under the given hierarchical Bayesian framework; the model is fitted using Markov chain Monte Carlo methods. Our methodology is illustrated with numerical examples that compare candidate models using model adequacy criteria; in addition, an analysis of real data is presented.

Negative Binomial Varying Coefficient Partially Linear Models

  • Kim, Young-Ju
    • Communications for Statistical Applications and Methods / v.19 no.6 / pp.809-817 / 2012
  • We propose a semiparametric inference for a generalized varying coefficient partially linear model (VCPLM) for negative binomial data. The VCPLM is useful for modeling real data because varying coefficients are a special type of interaction between explanatory variables and partially linear models fit both parametric and nonparametric terms. The negative binomial distribution often arises in modeling count data, which are usually overdispersed. The varying coefficient function estimators and regression parameters in the generalized VCPLM are obtained by formulating a penalized likelihood through smoothing splines for negative binomial data when the shape parameter is known. The performance of the proposed method is then evaluated by simulations.

Semiparametric Regression Splines in Matched Case-Control Studies

  • Kim, In-Young;Carroll, Raymond J.;Cohen, Noah
    • Proceedings of the Korean Statistical Society Conference / 2003.05a / pp.167-170 / 2003
  • We develop semiparametric methods for matched case-control studies using regression splines. Three methods are developed: an approximate cross-validation scheme to estimate the smoothing parameter inherent in regression splines, as well as Monte Carlo Expectation Maximization (MCEM) and Bayesian methods to fit the regression spline model. We compare the approximate cross-validation, MCEM, and Bayesian approaches using simulation, showing that they appear approximately equally efficient, with the approximate cross-validation method being computationally the most convenient. An example from equine epidemiology that motivated the work is used to demonstrate our approaches.

Bayesian curve-fitting with radial basis functions under functional measurement error model

  • Hwang, Jinseub;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.749-754 / 2015
  • This article presents a Bayesian approach to regression splines with knots on a grid of equally spaced sample quantiles of the independent variables under a functional measurement error model. We consider a small area model that uses penalized splines to capture a non-linear pattern. Specifically, we use radial basis functions as the basis functions of the regression spline. To fit the model and estimate parameters, we suggest a hierarchical Bayesian framework using Markov chain Monte Carlo methodology. Furthermore, we illustrate the method with an application to data. We check convergence with the potential scale reduction factor, and we use the posterior predictive p-value and the mean logarithmic conditional predictive ordinate to compare models.
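
As a rough illustration of the basis construction named in the abstract above (knots on a grid of equally spaced sample quantiles, radial basis functions), the sketch below builds one possible design matrix. The abstract does not state the form of the radial basis or the quantile rule, so the cubic form |x − κ|³, the linear-interpolation quantile, and the helper names `quantile_knots` and `radial_design_row` are all assumptions made for this example.

```python
def quantile_knots(x, num_knots):
    """Knots at the k/(num_knots + 1) sample quantiles, k = 1..num_knots.

    Uses linear interpolation between order statistics (an assumed rule).
    """
    xs = sorted(x)
    n = len(xs)
    knots = []
    for k in range(1, num_knots + 1):
        pos = (n - 1) * k / (num_knots + 1)  # fractional index into sorted data
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        knots.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return knots


def radial_design_row(xi, knots, degree=3):
    """Row [1, x, |x - k_1|^degree, ..., |x - k_K|^degree] (assumed basis)."""
    return [1.0, xi] + [abs(xi - k) ** degree for k in knots]


# Toy covariate values, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
knots = quantile_knots(x, 3)
design = [radial_design_row(xi, knots) for xi in x]
print(knots)
print(design[0])
```

In a penalized spline fit, the coefficients of the radial columns would then be shrunk, for example through a prior in a hierarchical Bayesian model; that step is not shown here.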

Efficient estimation and variable selection for partially linear single-index-coefficient regression models

  • Kim, Young-Ju
    • Communications for Statistical Applications and Methods / v.26 no.1 / pp.69-78 / 2019
  • A structured model with both single-index and varying coefficients is a powerful tool in modeling high dimensional data. It has been widely used because the single-index can overcome the curse of dimensionality and varying coefficients can allow nonlinear interaction effects in the model. For high dimensional index vectors, variable selection becomes an important question in the model building process. In this paper, we propose an efficient estimation and a variable selection method based on a smoothing spline approach in a partially linear single-index-coefficient regression model. We also propose an efficient algorithm for simultaneously estimating the coefficient functions in a data-adaptive lower-dimensional approximation space and selecting significant variables in the index with the adaptive LASSO penalty. The empirical performance of the proposed method is illustrated with simulated and real data examples.

Derivation of a benchmark dose lower bound of lead for attention deficit hyperactivity disorder using a longitudinal data set (경시적 자료의 주의력 결핍 과잉행동 장애를 종점으로 한 납의 벤치마크 용량 하한 도출)

  • Lee, Juhyung;Kim, Si Yeon;Ha, Mina;Kwon, Hojang;Kim, Byung Soo
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1295-1309 / 2016
  • This paper reproduces the result of Kim et al. (2014) by deriving a benchmark dose lower bound (BMDL) of lead based on the 2005 cohort of the Children's Health and Environmental Research (CHEER) data set. The ADHD rating scales in the 2005 cohort were not consistent across the three follow-ups because two different ADHD rating scales were used in the cohort. We first unified the ADHD rating scales in the 2005 cohort by deriving a conversion formula using a penalized linear spline. We then constructed two linear mixed models for the 2005 cohort that reflected the longitudinal characteristics of the data set. The first model introduced random intercept and random slope terms, and the second model assumed a first-order autoregressive structure for the error term. Using these two models, we derived the BMDLs of lead and reconfirmed the "regression to the mean" nature of the ADHD score discovered by Kim et al. (2014). We also noticed a definite difference between the sampling distributions of the two cohorts. Taking this difference into account, we were able to obtain a result consistent with Kim et al. (2014).
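
The abstract above mentions a conversion formula obtained with a penalized linear spline but gives no detail. The sketch below shows one common formulation, not necessarily the authors' fit: a truncated linear basis (x − κ)₊ with a ridge penalty on the knot coefficients. The knots, the smoothing parameter `lam`, the toy data, and all function names are assumptions made for this example. A small Gaussian elimination routine keeps it dependency-free; in practice a linear algebra library would normally be used.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def basis_row(xi, knots):
    """Truncated linear basis: [1, x, (x - k_1)_+, ..., (x - k_K)_+]."""
    return [1.0, xi] + [max(xi - k, 0.0) for k in knots]


def fit_penalized_linear_spline(x, y, knots, lam):
    """Minimise ||y - C beta||^2 + lam * (sum of squared knot coefficients)."""
    rows = [basis_row(xi, knots) for xi in x]
    p = 2 + len(knots)
    # Normal equations: (C'C + lam * D) beta = C'y with D = diag(0, 0, 1, ..., 1).
    ctc = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    cty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for j in range(2, p):
        ctc[j][j] += lam  # intercept and slope are left unpenalised
    return solve(ctc, cty)


def predict(beta, knots, xi):
    return sum(b * v for b, v in zip(beta, basis_row(xi, knots)))


# Toy data with a kink at 5, for illustration only.
x = [float(i) for i in range(11)]
y = [max(xi - 5.0, 0.0) for xi in x]
beta_small = fit_penalized_linear_spline(x, y, [5.0], lam=0.0)
beta_large = fit_penalized_linear_spline(x, y, [5.0], lam=1000.0)
print(beta_small, beta_large)
```

Increasing `lam` shrinks the knot coefficients toward zero and pulls the fit toward a straight line; how the smoothing parameter was chosen in the paper is not stated in the abstract.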