• Title/Summary/Keyword: Generalized Cross-Validation

Testing the Goodness of Fit of a Parametric Model via Smoothing Parameter Estimate

  • Kim, Choongrak
    • Journal of the Korean Statistical Society / v.30 no.4 / pp.645-660 / 2001
  • In this paper we propose a goodness-of-fit test statistic for testing a (null) parametric model against an (alternative) nonparametric model. Most existing nonparametric test statistics are based on the residuals obtained by regressing the data on a parametric model. Our test is based on the bootstrap estimator of the probability that the smoothing parameter estimator is infinite when a cubic smoothing spline is fitted to the residuals. The power of this test is investigated and compared with that of many other tests. Illustrative examples based on real data sets are given.

Variable selection in the kernel Cox regression

  • Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society / v.22 no.4 / pp.795-801 / 2011
  • In machine learning and statistics, it is often the case that some variables are unimportant while others matter more. We propose a novel algorithm for selecting the relevant variables in kernel Cox regression. We employ a weighted version of ANOVA decomposition kernels to choose an optimal subset of relevant variables. Experimental results are then presented to illustrate the performance of the proposed method.

Boundary Corrected Smoothing Splines

  • Kim, Jong-Tae
    • Journal of the Korean Data and Information Science Society / v.9 no.1 / pp.77-88 / 1998
  • Smoothing spline estimators are modified to remove boundary bias effects using the technique proposed in Eubank and Speckman (1991). An O(n) algorithm is developed for the computation of the resulting estimator as well as associated generalized cross-validation criteria, etc. The asymptotic properties of the estimator are studied for the case of a linear smoothing spline and the upper bound for the average mean squared error of the estimator given in Eubank and Speckman (1991) is shown to be asymptotically sharp in this case.
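
For readers unfamiliar with the generalized cross-validation (GCV) criterion mentioned in this abstract, here is a minimal sketch for a generic linear smoother ŷ = A(λ)y, using the usual form GCV(λ) = n·RSS / tr(I − A)². It is not the boundary-corrected O(n) algorithm of the paper: the second-difference penalty smoother, the dense matrix inverse, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def gcv_score(y, hat_matrix):
    """GCV = n * RSS / tr(I - A)^2 for a linear smoother y_hat = A y."""
    n = len(y)
    resid = y - hat_matrix @ y
    return n * float(resid @ resid) / (n - np.trace(hat_matrix)) ** 2

def second_difference_smoother(n, lam):
    """Hat matrix A = (I + lam * D'D)^{-1}, D = second-difference operator.
    This discrete penalty stands in for a smoothing spline (an assumption)."""
    D = np.diff(np.eye(n), n=2, axis=0)  # shape (n - 2, n), rows [1, -2, 1]
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def select_lambda(y, grid):
    """Return the grid value that minimises GCV, plus all scores."""
    scores = [gcv_score(y, second_difference_smoother(len(y), lam)) for lam in grid]
    return grid[int(np.argmin(scores))], scores

# Synthetic data: a noisy sine curve on an equally spaced design.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.size)
grid = [10.0 ** k for k in range(-3, 5)]
best_lam, scores = select_lambda(y, grid)
```

The dense inverse costs O(n³) and is only meant to make the criterion concrete; a banded solver would be needed to approach the linear-time behaviour the abstract describes.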

MULTI-PARAMETER TIKHONOV REGULARIZATION PROBLEM WITH MULTIPLE RIGHT HAND SIDES

  • Oh, SeYoung;Kwon, SunJoo
    • Journal of the Chungcheong Mathematical Society / v.33 no.4 / pp.505-516 / 2020
  • This study shows that image deblurring problems can be transformed into a multi-parameter Tikhonov-type problem with multiple right-hand sides. It also proposes an extension of global generalized cross-validation to choose the regularization parameters for this problem. Experimental results obtained with the preconditioned Gl-CGLS algorithm are analyzed.
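
To make the idea of choosing a Tikhonov regularization parameter by generalized cross-validation concrete, the sketch below treats the simplest case: one parameter, one right-hand side, and a small dense matrix handled through its SVD. It is not the multi-parameter global GCV or the Gl-CGLS solver of the paper; the function name `tikhonov_gcv`, the 1-D Gaussian blur, and the parameter grid are all assumptions for illustration.

```python
import numpy as np

def tikhonov_gcv(A, b, lambdas):
    """Solve min ||Ax - b||^2 + lam^2 ||x||^2 for each lam; keep the GCV minimiser."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    m = A.shape[0]
    # Component of b outside the column space of A is never fitted.
    out_of_range = max(float(b @ b - beta @ beta), 0.0)
    best = None
    for lam in lambdas:
        f = s**2 / (s**2 + lam**2)                   # Tikhonov filter factors
        rss = float(np.sum(((1.0 - f) * beta) ** 2)) + out_of_range
        gcv = rss / (m - np.sum(f)) ** 2             # residual / (trace term)^2
        if best is None or gcv < best[1]:
            x = Vt.T @ (s * beta / (s**2 + lam**2))  # regularised solution
            best = (lam, gcv, x)
    return best

# Hypothetical 1-D deblurring problem: Gaussian blur of a box signal plus noise.
n = 64
t = np.arange(n)
A = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 2.0) ** 2)
A /= A.sum(axis=1, keepdims=True)
x_true = (np.abs(t - 32) < 10).astype(float)
rng = np.random.default_rng(1)
b = A @ x_true + 1e-2 * rng.standard_normal(n)
lam, gcv, x_hat = tikhonov_gcv(A, b, np.logspace(-4, 0, 30))
```

The SVD makes each GCV evaluation cheap once the factorization is available, which is why it is a common choice for small problems; large image problems generally need iterative methods such as the one the abstract names.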

Estimation of nonlinear GARCH-M model (비선형 평균 일반화 이분산 자기회귀모형의 추정)

  • Shim, Joo-Yong;Lee, Jang-Taek
    • Journal of the Korean Data and Information Science Society / v.21 no.5 / pp.831-839 / 2010
  • The least squares support vector machine (LS-SVM) is a kernel-based method that has become popular for regression and classification problems. We use LS-SVM to propose an iterative algorithm for a nonlinear generalized autoregressive conditional heteroscedasticity in the mean (GARCH-M) model that estimates the mean and the conditional volatility of stock market returns. The proposed method combines a weighted LS-SVM for the mean with an unweighted LS-SVM for the conditional volatility. Using real data, we show that nonlinear GARCH-M models perform better than the linear GARCH model and the linear GARCH-M model.

Mixed effects least squares support vector machine for survival data analysis (생존자료분석을 위한 혼합효과 최소제곱 서포트벡터기계)

  • Hwang, Chang-Ha;Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society / v.23 no.4 / pp.739-748 / 2012
  • In this paper we propose a mixed effects least squares support vector machine (LS-SVM) for censored data observed from different groups. We use weights that account for random right censoring in the nonlinear regression. The weights are formed from Kaplan-Meier estimates of the censoring distribution. The proposed model includes a random effects term representing inter-group variation. Furthermore, a generalized cross-validation function is proposed for selecting the optimal values of the hyper-parameters. Experimental results are then presented that compare the performance of the proposed LS-SVM with that of a standard LS-SVM for censored data.
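
The abstract does not give the weight formula, so the following is only a sketch of one common construction: inverse-probability-of-censoring weights δᵢ / G(tᵢ−), where G is the Kaplan-Meier estimate of the censoring survival function. The function name and the toy data are hypothetical, and tie handling between failures and censorings is deliberately simplified.

```python
from collections import Counter

def km_censoring_weights(times, events):
    """Weights delta_i / G(t_i-), with G the Kaplan-Meier estimate of P(C > t).
    events[i] is 1 for an observed failure and 0 for a right-censored case."""
    n_at_time = Counter(times)
    n_censored = Counter(t for t, d in zip(times, events) if d == 0)
    g_before = {}
    g, at_risk = 1.0, len(times)
    for t in sorted(n_at_time):
        g_before[t] = g                      # G(t-): value just before t
        g *= 1.0 - n_censored[t] / at_risk   # censorings play the role of events
        at_risk -= n_at_time[t]
    # Censored observations get weight 0; failures are up-weighted.
    return [d / g_before[t] if d == 1 else 0.0 for t, d in zip(times, events)]

times = [2.0, 3.0, 4.0, 5.0, 7.0]
events = [1, 0, 1, 0, 1]
weights = km_censoring_weights(times, events)
# weights is approximately [1.0, 0.0, 1.3333, 0.0, 2.6667]
```

Later failures receive larger weights because they stand in for the censored cases that dropped out before them; in this toy example the weights sum to the sample size.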

Development and Validation of Generalized Linear Regression Models to Predict Vessel Enhancement on Coronary CT Angiography

  • Masuda, Takanori;Nakaura, Takeshi;Funama, Yoshinori;Sato, Tomoyasu;Higaki, Toru;Kiguchi, Masao;Matsumoto, Yoriaki;Yamashita, Yukari;Imada, Naoyuki;Awai, Kazuo
    • Korean Journal of Radiology / v.19 no.6 / pp.1021-1030 / 2018
  • Objective: We evaluated the effect of various patient characteristics and time-density curve (TDC)-factors on the test bolus-affected vessel enhancement on coronary computed tomography angiography (CCTA). We also assessed the value of generalized linear regression models (GLMs) for predicting enhancement on CCTA. Materials and Methods: We performed univariate and multivariate regression analysis to evaluate the effect of patient characteristics and to compare contrast enhancement per gram of iodine on test bolus ($\Delta HU_{TEST}$) and CCTA ($\Delta HU_{CCTA}$). We developed GLMs to predict $\Delta HU_{CCTA}$. GLMs including independent variables were validated with 6-fold cross-validation using the correlation coefficient and Bland-Altman analysis. Results: In multivariate analysis, only total body weight (TBW) and $\Delta HU_{TEST}$ maintained their independent predictive value (p < 0.001). In validation analysis, the highest correlation coefficient between $\Delta HU_{CCTA}$ and the prediction values was seen in the GLM (r = 0.75), followed by TDC (r = 0.69) and TBW (r = 0.62). The lowest Bland-Altman limit of agreement was observed with GLM-3 (mean difference, $-0.0 \pm 5.1$ Hounsfield units per gram of iodine [HU/gI]; 95% confidence interval [CI], -10.1, 10.1), followed by $\Delta HU_{CCTA}$ ($-0.0 \pm 5.9$ HU/gI; 95% CI, -11.9, 11.9) and TBW ($1.1 \pm 6.2$ HU/gI; 95% CI, -11.2, 13.4). Conclusion: We demonstrated that the patient's TBW and $\Delta HU_{TEST}$ significantly affected contrast enhancement on CCTA images and that the combined use of clinical information and test bolus results is useful for predicting aortic enhancement.

Prediction of movie audience numbers using hybrid model combining GLS and Bass models (GLS와 Bass 모형을 결합한 하이브리드 모형을 이용한 영화 관객 수 예측)

  • Kim, Bokyung;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.447-461 / 2018
  • Domestic film industry sales are increasing every year. Theaters are the primary sales channel for movies, and the size of the theater audience affects subsequent rights sales. Theater audience size is therefore an important factor directly linked to movie industry sales. In this paper we consider a hybrid model that combines a multiple linear regression model and the Bass model to predict the audience numbers for a specific day. By combining the two models, the predicted value of the regression analysis is corrected toward that of the Bass model. Three films with different release dates were used in the analysis. The all-subsets regression method is used to generate all possible combinations of predictors, and 5-fold cross-validation is used to fit each model five times. The predicted value is obtained from the model with the smallest root mean square error and is then combined with the predicted value of the Bass model to obtain the final prediction. It was confirmed that, when past data exist, the weight of the Bass model increases and a correction is added to the predicted value.
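
The all-subsets search scored by 5-fold cross-validation described in this abstract can be sketched as follows. This covers only the regression-selection step, not the Bass model or the combination of the two; the ordinary least squares fit, the synthetic data, and the function names are assumptions for illustration.

```python
import itertools
import numpy as np

def kfold_rmse(X, y, k=5, seed=0):
    """Root mean square error of OLS predictions pooled over k held-out folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xtr = np.column_stack([np.ones(len(train)), X[train]])  # add intercept
        Xte = np.column_stack([np.ones(len(fold)), X[fold]])
        coef, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        sq_err += float(np.sum((y[fold] - Xte @ coef) ** 2))
    return (sq_err / len(y)) ** 0.5

def best_subset(X, y, k=5):
    """Score every non-empty column subset by k-fold RMSE; return (rmse, columns)."""
    p = X.shape[1]
    subsets = [c for r in range(1, p + 1)
               for c in itertools.combinations(range(p), r)]
    return min((kfold_rmse(X[:, list(c)], y, k), c) for c in subsets)

# Synthetic example: only columns 0 and 2 carry signal.
rng = np.random.default_rng(2)
X = rng.standard_normal((120, 4))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.5 * rng.standard_normal(120)
rmse, cols = best_subset(X, y)
```

The number of subsets grows as 2^p − 1, so an exhaustive search like this is only practical for a small number of candidate predictors.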

Estimating Stature and Weight from Anthropometry for the Elderly Who are Limited in Mobility (신체계측방법에 의한 거동이 제한된 노인들의 신장과 체중추정)

  • 한경희
    • Journal of Nutrition and Health / v.28 no.1 / pp.71-83 / 1995
  • The purpose of the study was to develop generalized equations for estimating stature and weight in nonambulatory elderly persons. Height, weight, recumbent knee height, total arm length, midarm, waist, and calf circumferences, and triceps and subscapular skinfolds were measured in 315 ambulatory elderly persons over 60 years old. The equations to predict stature and weight were derived from participants in the validation sample and were applied to participants in the cross-validation sample to test the accuracy and validity of the equations. Stature and weight were significantly and negatively associated with age in women; a similar but weaker pattern was observed in men. Knee height and total arm length were highly correlated with stature, but most of the variance in stature was accounted for by knee height in both men and women. In men, waist circumference was most strongly correlated with weight, followed by arm and calf circumferences, among others. In women, arm circumference showed the highest correlation, followed by waist and calf circumferences, in that order. The candidate predictor variables for stature were knee height, total arm length, and age for both elderly men and women. The predictor variables for weight were recumbent measures of waist, arm, and calf circumferences and knee height for both sexes. Including skinfold thickness measurements did not improve the prediction of weight. When the equations developed in the present study and those from Chumlea's study were applied to the cross-validation samples, the equations derived from the present study showed better accuracy and validity. Presenting prediction equations that use two, three, or four recommended measurements allows an equation to be selected according to the measurements that can be collected from each individual.

A generalized explainable approach to predict the hardened properties of self-compacting geopolymer concrete using machine learning techniques

  • Endow Ayar Mazumder;Sanjog Chhetri Sapkota;Sourav Das;Prasenjit Saha;Pijush Samui
    • Computers and Concrete / v.34 no.3 / pp.279-296 / 2024
  • In this study, ensemble machine learning (ML) models are employed to estimate the hardened properties of Self-Compacting Geopolymer Concrete (SCGC). The input variables used in model development include the constituents of the SCGC, such as the binder material, the age of the specimen, and the ratio of alkaline solution. The output parameters examined include compressive strength, flexural strength, and split tensile strength. The models are trained and validated using a database of 396 records compiled from 132 unique mix trials performed in the laboratory. Several machine learning techniques, notably K-nearest neighbours (KNN), Random Forest, and Extreme Gradient Boosting (XGBoost), are used to construct the models, coupled with Bayesian optimisation (BO) for hyperparameter tuning. Nested cross-validation is applied to mitigate the risk of overfitting. The findings reveal that the BO-XGBoost hybrid model achieves better predictive accuracy than the other models. The $R^2$ values for compressive strength, flexural strength, and split tensile strength are 0.9974, 0.9978, and 0.9937, respectively. The BO-XGBoost hybrid model also exhibits the lowest RMSE values of 0.8712, 0.0773, and 0.0799 for compressive strength, flexural strength, and split tensile strength, respectively. Furthermore, a SHAP dependency analysis was conducted to ascertain the significance of each parameter. The study finds that GGBS, fly ash, and the age of the specimens have a substantial influence on the predicted strengths of geopolymers.
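
Nested cross-validation, mentioned in this abstract, separates hyperparameter tuning (inner folds) from error estimation (outer folds). The sketch below shows the structure only: ridge regression with a small grid search stands in for the XGBoost models and Bayesian optimisation used in the study, and the data, fold counts, and function names are assumptions.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients (no intercept; data assumed centred)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def cv_mse(X, y, alpha, folds):
    """Mean squared error of ridge predictions over the given held-out folds."""
    errs = []
    for test in folds:
        train = np.setdiff1d(np.concatenate(folds), test)
        w = ridge_fit(X[train], y[train], alpha)
        errs.append(np.mean((y[test] - X[test] @ w) ** 2))
    return float(np.mean(errs))

def nested_cv(X, y, alphas, outer_k=5, inner_k=3, seed=0):
    """Outer folds estimate error; inner folds (training rows only) pick alpha."""
    rng = np.random.default_rng(seed)
    outer = np.array_split(rng.permutation(len(y)), outer_k)
    outer_mse, chosen = [], []
    for test in outer:
        train = np.setdiff1d(np.concatenate(outer), test)
        inner = np.array_split(rng.permutation(train), inner_k)
        best = min(alphas, key=lambda a: cv_mse(X, y, a, inner))
        w = ridge_fit(X[train], y[train], best)   # refit on the full outer-train set
        outer_mse.append(float(np.mean((y[test] - X[test] @ w) ** 2)))
        chosen.append(best)
    return float(np.mean(outer_mse)), chosen

# Synthetic regression data for demonstration.
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.3 * rng.standard_normal(100)
alphas = [0.01, 0.1, 1.0, 10.0]
mse, chosen = nested_cv(X, y, alphas)
```

Because each outer test fold is never seen while its hyperparameter is chosen, the averaged outer error is a less optimistic estimate than the best score from a single tuning loop.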