• Title/Summary/Keyword: Models, statistical

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.239-246 / 2003
  • Statisticians and data miners are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures depending on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, which is often done in data mining. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which perturbs the values of the input variables and monitors the change in the output. The research scope is limited to models for continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
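The perturb-and-monitor idea in this abstract can be illustrated with a minimal, model-agnostic sketch. This is not the authors' exact measure: the function name `perturbation_importance`, the Gaussian noise, the `scale` parameter, and the toy `model` are all assumptions made for illustration.

```python
import random
import statistics


def perturbation_importance(predict, rows, scale=0.1, seed=0):
    """Score each input by the mean absolute change in the model output
    when that input alone is perturbed with Gaussian noise whose standard
    deviation is `scale` times the input's own standard deviation."""
    rng = random.Random(seed)
    n_inputs = len(rows[0])
    baseline = [predict(r) for r in rows]
    scores = []
    for j in range(n_inputs):
        sd = statistics.pstdev([r[j] for r in rows])
        changes = []
        for r, y0 in zip(rows, baseline):
            perturbed = list(r)
            perturbed[j] += rng.gauss(0.0, scale * sd)
            changes.append(abs(predict(perturbed) - y0))
        scores.append(statistics.fmean(changes))
    return scores


# Hypothetical fitted model: strong dependence on x0, weak on x1, none on x2.
def model(x):
    return 3.0 * x[0] + 0.5 * x[1]


data_rng = random.Random(1)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
scores = perturbation_importance(model, rows)
print(scores)
```

Because the function only needs a `predict` callable, the same scores can be computed for a regression, a neural network, or a tree, which is what makes cross-model comparison possible.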

Performance Evaluation of the ACD Models for Analysing the Transaction Data of the KOSPI Stocks

  • Kim, Sahm;Jung, Da-Woon
    • Communications for Statistical Applications and Methods / v.16 no.1 / pp.21-29 / 2009
  • Engle and Russell (1998) proposed the ACD (Autoregressive Conditional Duration) model to explain the relationship between the prices and the duration times of stocks. In this paper, we first introduce various types of ACD models, such as the linear ACD, log ACD, and Box-Cox ACD models, and then evaluate their performance in analysing transaction data for stocks in Korea.
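As a rough illustration of the linear ACD(1,1) recursion, the sketch below simulates durations x_i = psi_i * eps_i with psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1}. The unit-exponential innovations and the parameter values are assumptions chosen for the example, not estimates from the paper.

```python
import random


def simulate_linear_acd(n, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    """Simulate n durations from a linear ACD(1,1) model with
    unit-mean exponential innovations (an assumed error distribution)."""
    rng = random.Random(seed)
    # Start the recursion at the unconditional mean duration.
    psi = omega / (1.0 - alpha - beta)
    x_prev = psi
    durations = []
    for _ in range(n):
        psi = omega + alpha * x_prev + beta * psi  # conditional expected duration
        x_prev = psi * rng.expovariate(1.0)        # observed duration
        durations.append(x_prev)
    return durations


durations = simulate_linear_acd(20000)
sample_mean = sum(durations) / len(durations)
print(round(sample_mean, 3))  # should be near omega / (1 - alpha - beta) = 1.0
```

The log ACD variant replaces the recursion on psi with one on log psi, which removes the need for positivity constraints on the coefficients.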

Modeling Aided Lead Design of FAK Inhibitors

  • Madhavan, Thirumurthy
    • Journal of Integrative Natural Science / v.4 no.4 / pp.266-272 / 2011
  • Focal adhesion kinase (FAK) is a potential target for the treatment of primary cancers as well as the prevention of tumor metastasis. To understand the structural and chemical features of FAK inhibitors, we report comparative molecular field analysis (CoMFA) for a series of 7H-pyrrolo(2,3-d)pyrimidines. The CoMFA models showed good correlation between the actual and predicted values for the training set molecules. Our results indicated that the ligand-based alignment produced better statistical results for CoMFA ($q^2$ = 0.505, $r^2$ = 0.950). Both models were validated using test set compounds and gave good predictive values of 0.537. The statistical parameters of the generated 3D-QSAR models indicated that the data are well fitted and that the models have high predictive ability. The contour maps from the 3D-QSAR models explain the structure-activity relationships of FAK inhibitors, and our results provide guidelines for further enhancing the activity of novel inhibitors.

Potential of regression models in projecting sea level variability due to climate change at Haldia Port, India

  • Roshni, Thendiyath;K., Md. Sajid;Samui, Pijush
    • Ocean Systems Engineering / v.7 no.4 / pp.319-328 / 2017
  • Achieving higher prediction efficacy is a challenging task in any field of engineering. Due to global warming, there is a considerable increase in the global sea level. This work attempts to determine the sea level variability due to climate change at Haldia Port, India. Different statistical downscaling techniques are available, and in this paper the authors compare and illustrate the performance of three regression models. The models, Wavelet Neural Network (WNN), Minimax Probability Machine Regression (MPMR), and Feed-Forward Neural Network (FFNN), are used to project the sea level variability due to climate change at Haldia Port, India. Model performance indices such as PI, RMSE, NSE, MAPE, and RSR were evaluated to get a clear picture of model accuracy. All the indices indicate that WNN outperforms the other models in projecting the sea level variability. The findings suggest a strong recommendation for ensembled models, especially wavelet-decomposed neural networks, to improve projection efficiency in any time series modeling.
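Several of the indices named in this abstract have standard textbook definitions, sketched below. The function name `performance_indices` and the sample numbers are illustrative assumptions; PI is omitted because its definition is not given in the abstract, and the paper's exact formulas may differ.

```python
import math


def performance_indices(observed, predicted):
    """Common goodness-of-fit indices for a regression or time series model."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "RMSE": math.sqrt(sse / n),   # root mean squared error
        "NSE": 1.0 - sse / sst,       # Nash-Sutcliffe efficiency (1 is perfect)
        "MAPE": 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted)),
        "RSR": math.sqrt(sse / sst),  # RMSE divided by the std. dev. of observations
    }


# Made-up values purely for demonstration.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(performance_indices(observed, predicted))
```

Note that MAPE is undefined when any observed value is zero, so it suits strictly nonzero series.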

Bootstrap-Based Test for Volatility Shifts in GARCH against Long-Range Dependence

  • Wang, Yu;Park, Cheolwoo;Lee, Taewook
    • Communications for Statistical Applications and Methods / v.22 no.5 / pp.495-506 / 2015
  • Volatility is a variation measure in finance for returns of a financial instrument over time. GARCH models have been a popular tool for analyzing the volatility of financial time series data since Bollerslev (1986). Volatility is said to be highly persistent when the sum of the estimated coefficients of the squared lagged returns and the lagged conditional variance terms in a GARCH model is close to 1. Regarding persistence, numerous methods have been proposed to test whether such persistence is due to volatility shifts in the market or to natural fluctuation explained by stationary long-range dependence (LRD). Recently, Lee et al. (2015) proposed a residual-based cumulative sum (CUSUM) test statistic to test for volatility shifts in GARCH models against LRD. We propose a bootstrap-based approach for the residual-based test and compare the sizes and powers of our bootstrap-based CUSUM test with those of the test in Lee et al. (2015) through simulation studies.
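To give a feel for how a CUSUM statistic reacts to a variance shift, here is a generic sketch applied to squared Gaussian draws. It is not the residual-based statistic of Lee et al. (2015) nor the bootstrap procedure of this paper: the normalisation by the plain sample standard deviation and the simulated series are assumptions.

```python
import math
import random


def cusum_statistic(values):
    """max_k |S_k - (k/n) S_n| / (sqrt(n) * tau), where S_k is the k-th
    partial sum and tau is the sample standard deviation of the values."""
    n = len(values)
    total = sum(values)
    mean = total / n
    tau = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    partial, max_dev = 0.0, 0.0
    for k, v in enumerate(values, start=1):
        partial += v
        max_dev = max(max_dev, abs(partial - k / n * total))
    return max_dev / (math.sqrt(n) * tau)


rng = random.Random(0)
# Squared returns with constant variance versus a variance shift at the midpoint.
stable = [rng.gauss(0, 1) ** 2 for _ in range(1000)]
shifted = ([rng.gauss(0, 1) ** 2 for _ in range(500)]
           + [rng.gauss(0, 2) ** 2 for _ in range(500)])
stat_stable = cusum_statistic(stable)
stat_shifted = cusum_statistic(shifted)
print(stat_stable, stat_shifted)
```

A bootstrap version would replace the asymptotic critical value with quantiles of the statistic recomputed on resampled series generated under the null model.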

Forecasting evaluation via parametric bootstrap for threshold-INARCH models

  • Kim, Deok Ryun;Hwang, Sun Young
    • Communications for Statistical Applications and Methods / v.27 no.2 / pp.177-187 / 2020
  • This article is concerned with the issue of forecasting and evaluation of threshold-asymmetric volatility models for time series of count data. In particular, threshold integer-valued models with conditional Poisson and conditional negative binomial distributions are highlighted. Based on the parametric bootstrap method, some evaluation measures are discussed in terms of one-step ahead forecasting. A parametric bootstrap procedure is explained from which directional measure, magnitude measure and expected cost of misclassification are discussed to evaluate competing models. The cholera data in Bangladesh from 1988 to 2016 is analyzed as a real application.

Bayesian baseline-category logit random effects models for longitudinal nominal data

  • Kim, Jiyeong;Lee, Keunbaik
    • Communications for Statistical Applications and Methods / v.27 no.2 / pp.201-210 / 2020
  • Baseline-category logit random effects models have been used to analyze longitudinal nominal data. The models account for subject-specific variations using random effects. However, the random effects covariance matrix in the models needs to explain subject-specific variations as well as serial correlations for nominal outcomes. To satisfy both requirements, the covariance matrix must be heterogeneous and high-dimensional. However, the random effects covariance matrix is difficult to estimate because of its high dimensionality and the positive-definiteness constraint. In this paper, we exploit the modified Cholesky decomposition to estimate the high-dimensional heterogeneous random effects covariance matrix. Bayesian methodology is proposed to estimate the parameters of interest. The proposed methods are illustrated with real data from the McKinney Homeless Research Project.
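The modified Cholesky decomposition writes a covariance matrix Sigma as T Sigma T' = D, with T unit lower triangular and D diagonal with positive entries, so the free parameters are unconstrained. The pure-Python sketch below computes T and D for a small example; the function names and the 3x3 matrix are illustrative assumptions, not taken from the paper.

```python
def ldl(sigma):
    """Factor sigma = C D C' with C unit lower triangular and D diagonal."""
    n = len(sigma)
    C = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = sigma[j][j] - sum(C[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            C[i][j] = (sigma[i][j]
                       - sum(C[i][k] * C[j][k] * d[k] for k in range(j))) / d[j]
    return C, d


def invert_unit_lower(C):
    """Invert a unit lower-triangular matrix by forward substitution."""
    n = len(C)
    T = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i):
            T[i][j] = -sum(C[i][k] * T[k][j] for k in range(j, i))
    return T


def modified_cholesky(sigma):
    """Return (T, d) such that T sigma T' = diag(d)."""
    C, d = ldl(sigma)
    return invert_unit_lower(C), d


sigma = [[4.0, 2.0, 1.0], [2.0, 3.0, 1.0], [1.0, 1.0, 2.0]]
T, d = modified_cholesky(sigma)
print(T)
print(d)
```

The below-diagonal entries of T are the negatives of the coefficients from regressing each component on its predecessors, and `d` holds the corresponding innovation variances.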

Stochastic structures of world's death counts after World War II

  • Lee, Jae J.
    • Communications for Statistical Applications and Methods / v.29 no.3 / pp.353-371 / 2022
  • This paper analyzes post-World War II death counts of several countries to identify and compare their stochastic structures. The stochastic structures considered are three structural time series models: a local level with a random walk model, a fixed local linear trend model, and a local linear trend model. Structural time series models assume that a time series can be formulated directly in terms of unobserved components such as trend, slope, seasonal, cycle, and daily effect. The random effect of each unobserved component is characterized by its own stochastic structure and the distribution of its irregular component. The structural time series models use the Kalman filter to estimate the unknown parameters of a stochastic model, to predict future data, and to filter the data. This paper identifies the best-fitting stochastic model for three types of death counts (Female, Male, and Total) for each country. Two diagnostic procedures are used to check the validity of the fitted models. Three criteria, AIC, BIC, and SSPE, are used to select the best-fitting valid stochastic model for each type of death count in each country.
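For the simplest of the three structures, the local level model y_t = mu_t + eps_t with mu_{t+1} = mu_t + eta_t, the Kalman filter reduces to a few scalar updates. The sketch below assumes known variances and a nearly diffuse prior; the function name, defaults, and toy series are illustrative and not taken from the paper.

```python
def local_level_filter(y, var_obs, var_level, a0=0.0, p0=1e6):
    """Kalman filter for the local level model.
    a0 and p0 are the prior mean and variance of the level before the first
    observation; a large p0 approximates a diffuse prior."""
    a, p = a0, p0
    filtered = []
    for obs in y:
        p = p + var_level       # predict: the level follows a random walk
        k = p / (p + var_obs)   # Kalman gain
        a = a + k * (obs - a)   # update the level with the prediction error
        p = (1.0 - k) * p       # updated state variance
        filtered.append(a)
    return filtered


# Toy series (made up): noisy observations around a level of 5.
series = [5.2, 4.9, 5.1, 4.8, 5.0, 5.3, 4.7, 5.0]
levels = local_level_filter(series, var_obs=1.0, var_level=0.1)
print(levels)
```

In practice the two variances are unknown and are estimated by maximising the likelihood built from the filter's one-step prediction errors.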

A Kullback-Leibler divergence based comparison of approximate Bayesian estimations of ARMA models

  • Amin, Ayman A
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.471-486 / 2022
  • Autoregressive moving average (ARMA) models involve nonlinearity in the model coefficients because of unobserved lagged errors, which complicates the likelihood function and makes the posterior density analytically intractable. To overcome this problem of posterior analysis, some approximation methods have been proposed in the literature. In this paper, we first review the main analytic approximations proposed to make the posterior density of ARMA models analytically tractable, namely the Newbold, Zellner-Reynolds, and Broemeling-Shaarawy approximations. We then use the Kullback-Leibler divergence to study the relation between these three analytic approximations and to measure the distance between their derived approximate posteriors for ARMA models. In addition, we evaluate the impact of the distance between the approximate posteriors on Bayesian estimates of the mean and precision of the model coefficients by generating a large number of Monte Carlo simulations from the approximate posteriors. The simulation results show that the approximate posteriors of Newbold and Zellner-Reynolds are very close to each other, and that their estimates have higher precision than those of the Broemeling-Shaarawy approximation. The same results are obtained from the application to real-world time series datasets.
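As a reminder of how the Kullback-Leibler divergence quantifies the distance between two densities, the sketch below uses the closed form for two univariate normal distributions. This is only an analogy: the approximate posteriors compared in the paper are not assumed to be univariate normal, and the function name is hypothetical.

```python
import math


def kl_normal(mean_p, sd_p, mean_q, sd_q):
    """KL(P || Q) for P = N(mean_p, sd_p^2) and Q = N(mean_q, sd_q^2)."""
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * sd_q ** 2)
            - 0.5)


# Identical densities have zero divergence; the measure is not symmetric.
print(kl_normal(0.0, 1.0, 0.0, 1.0))
print(kl_normal(0.0, 1.0, 0.0, 2.0))
print(kl_normal(0.0, 2.0, 0.0, 1.0))
```

Because the divergence is asymmetric, comparisons between approximations need a stated direction or a symmetrised version.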

Is it possible to forecast KOSPI direction using deep learning methods?

  • Choi, Songa;Song, Jongwoo
    • Communications for Statistical Applications and Methods / v.28 no.4 / pp.329-338 / 2021
  • Deep learning methods have been developed and used in various fields, and they have shown outstanding performance in many cases. Many studies have predicted daily stock returns, a classic example of time-series data, using deep learning methods. We also applied deep learning methods to Korea's stock market data. We used Korea's stock market index (KOSPI) and several individual stocks to forecast daily returns and directions. We compared several deep learning models with other machine learning methods, including random forest and XGBoost. In regression, long short-term memory (LSTM) and gated recurrent unit (GRU) models perform better than the other prediction models. For the classification applications, there is no clear winner. However, even the best deep learning models cannot predict significantly better than the simple base model. We believe that it is challenging to predict daily stock return data even with the latest deep learning methods.