• Title/Summary/Keyword: Distribution statistical model

Integer-Valued HAR(p) model with Poisson distribution for forecasting IPO volumes

  • SeongMin Yu;Eunju Hwang
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.273-289 / 2023
  • In this paper, we develop a new time series model for predicting IPO (initial public offering) data, which take non-negative integer values. The proposed model is based on the integer-valued autoregressive (INAR) model with a Poisson thinning operator. Just as the heterogeneous autoregressive (HAR) model combines daily, weekly and monthly averages in a cascade, the integer-valued heterogeneous autoregressive (INHAR) model is designed to capture long memory efficiently. The parameters of the INHAR model are estimated using conditional least squares and Yule-Walker estimators. Through simulations, bias and standard error are calculated to compare the performance of the estimators. The fit of the model to Korean IPO data is evaluated using performance measures such as mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE). The results show that the INHAR model performs better than the traditional INAR model. The empirical analysis of Korean IPO data indicates that the proposed model is efficient in forecasting monthly IPO volumes.
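
The three performance measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `forecast_errors` and the decision to skip zero-valued months when computing MAPE are assumptions made here, since MAPE is undefined whenever an observed count is zero.

```python
import math

def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # Assumption: months with an actual count of zero are skipped for MAPE,
    # because the percentage error is undefined there.
    nonzero = [(a, e) for a, e in zip(actual, errors) if a != 0]
    if nonzero:
        mape = 100.0 * sum(abs(e / a) for a, e in nonzero) / len(nonzero)
    else:
        mape = float("nan")
    return mae, rmse, mape
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a sense of whether the forecast errors are dominated by a few months.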

Bayesian Conway-Maxwell-Poisson (CMP) regression for longitudinal count data

  • Morshed Alam ;Yeongjin Gwon ;Jane Meza
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.291-309 / 2023
  • Longitudinal count data are widely collected in biomedical research, public health, and clinical trials. These repeated measurements over time on the same subjects require an appropriate dependence structure. The Poisson regression model is the first choice for modeling the expected count of interest; however, it may not be appropriate when the data exhibit over-dispersion or under-dispersion. Recently, the Conway-Maxwell-Poisson (CMP) distribution has become popular because it offers the flexibility to capture a wide range of dispersion in the data. In this article, we propose a Bayesian CMP regression model to accommodate over- and under-dispersion in modeling longitudinal count data. Specifically, we develop a regression model with random intercept and slope to capture subject heterogeneity and to allow covariate effects to differ across subjects. We implement Bayesian computation via a Hamiltonian MCMC (HMCMC) algorithm for posterior sampling. We then compute Bayesian model assessment measures for model comparison. Simulation studies are conducted to assess the accuracy and effectiveness of our methodology. The usefulness of the proposed methodology is demonstrated on a well-known epilepsy data set.
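
To make the dispersion flexibility concrete, the sketch below evaluates the standard CMP probability mass function, P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), where ν = 1 recovers the Poisson distribution and ν > 1 gives under-dispersion. The function name `cmp_pmf` and the truncation of the normalizing constant at `max_terms` are assumptions made for this illustration; it is not the regression model or the sampler described in the abstract.

```python
import math

def cmp_pmf(y, lam, nu, max_terms=200):
    """CMP probability P(Y = y) with rate lam > 0 and dispersion nu > 0."""
    # Work in log space to avoid overflow in lam**j and (j!)**nu.
    def log_weight(j):
        return j * math.log(lam) - nu * math.lgamma(j + 1)

    # Truncated normalizing constant Z(lam, nu) = sum_j lam^j / (j!)^nu.
    # Assumption: max_terms is large enough for the tail to be negligible.
    log_terms = [log_weight(j) for j in range(max_terms)]
    peak = max(log_terms)
    log_z = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return math.exp(log_weight(y) - log_z)
```

For moderate λ the series converges quickly, so a fixed truncation is adequate here; very large λ combined with small ν would need more terms.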

Variable selection and prediction performance of penalized two-part regression with community-based crime data application

  • Seong-Tae Kim;Man Sik Park
    • Communications for Statistical Applications and Methods / v.31 no.4 / pp.441-457 / 2024
  • Semicontinuous data are characterized by a mixture of a point probability mass at zero and a continuous distribution of positive values. This type of data is often modeled using a two-part model, where the first part models the probability of a dichotomous outcome (zero or positive) and the second part models the distribution of positive values. Despite the two-part model's popularity, variable selection in this model has not been fully addressed, especially for high-dimensional data. The objective of this study is to investigate the variable selection and prediction performance of penalized regression methods in two-part models. The performance of the selected techniques in the two-part model is evaluated via simulation studies. Our findings show that LASSO and ENET tend to select more predictors than SCAD and MCP. Consequently, MCP and SCAD outperform LASSO and ENET in β-specificity, while LASSO and ENET perform better than MCP and SCAD with respect to the mean squared error. We find similar results when applying the penalized regression methods to the prediction of crime incidents using community-based data.
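
The two-part structure can be illustrated without any covariates. The sketch below is an intercept-only simplification, with no penalization and no predictors, so it is not the model studied in the paper; the function names `fit_two_part` and `predict_mean` are hypothetical. It shows only how the two parts combine: E[Y] = P(Y > 0) × E[Y | Y > 0].

```python
def fit_two_part(values):
    """Intercept-only two-part fit: P(Y > 0) and the mean of positive values."""
    if not values:
        raise ValueError("values must be non-empty")
    positives = [v for v in values if v > 0]
    # Part 1: probability that the outcome is positive rather than zero.
    prob_positive = len(positives) / len(values)
    # Part 2: mean of the positive values only.
    mean_positive = sum(positives) / len(positives) if positives else 0.0
    return prob_positive, mean_positive

def predict_mean(prob_positive, mean_positive):
    # E[Y] = P(Y > 0) * E[Y | Y > 0]; the zero part contributes nothing.
    return prob_positive * mean_positive
```

In a regression setting each part would have its own set of predictors, which is why variable selection has to be carried out in both parts.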

Cost Model for Annual Cost Spread Estimation of Space Launch Vehicle Development (발사체 개발의 연차별 비용 추정을 위한 비용모델 개발)

  • Kim, Hong-Rae;Yoo, Dong-Seo;Choi, Jong-Kwon;Chang, Young-Keun
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.39 no.6 / pp.576-584 / 2011
  • To develop a launch vehicle successfully, it is important not only to estimate development costs accurately but also to plan the annual budget. In this paper, a statistical method is used for cost spreading. Several statistical models were evaluated for their suitability for cost spread modeling, and the beta distribution model was selected. The validity of the annual cost estimation model was verified by comparing the actual development cost distribution of the Space Shuttle Main Engine with the estimated cost distribution. In addition, this paper estimates the annual budget required for the successful development of the KSLV-II, based on the currently allocated cost. The cost spread model is expected to be applicable not only to launch vehicle development but also to the development of other large, complex systems.
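
A beta-distribution cost spread can be sketched as follows: project time is normalized to the interval (0, 1), and each year's share of the total cost is the beta probability mass that falls within that year. The function name `annual_cost_spread`, the shape parameters `a` and `b`, and the midpoint-rule integration are all assumptions made for this illustration; the abstract does not state the parameter values or the numerical method the authors used.

```python
def annual_cost_spread(total_cost, years, a, b, steps_per_year=2000):
    """Split total_cost over `years` using a Beta(a, b) spending profile."""
    # Unnormalized beta density on normalized project time x in (0, 1).
    def density(x):
        return x ** (a - 1) * (1 - x) ** (b - 1)

    n = years * steps_per_year
    # Midpoint rule: evaluate the density at the centre of each small step.
    weights = [density((i + 0.5) / n) for i in range(n)]
    total_weight = sum(weights)

    spread = []
    for year in range(years):
        chunk = weights[year * steps_per_year:(year + 1) * steps_per_year]
        # Each year's share is its fraction of the total probability mass.
        spread.append(total_cost * sum(chunk) / total_weight)
    return spread
```

For example, `annual_cost_spread(100.0, 4, 2.0, 2.0)` yields a symmetric profile that peaks in the middle two years. Choosing `a < b` would front-load the spending and `a > b` would back-load it.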

BAYESIAN ROBUST ANALYSIS FOR NON-NORMAL DATA BASED ON A PERTURBED-t MODEL

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.35 no.4 / pp.419-439 / 2006
  • The article develops a new class of distributions by introducing a nonnegative perturbing function to the $t_\nu$ distribution with location and scale parameters. The class is obtained by using transformations and conditioning. The class strictly includes the $t_\nu$ and skew-$t_\nu$ distributions. It provides further models useful for selection modeling and robustness analysis. Analytic forms of the densities are obtained and distributional properties are studied. These developments are followed by an easy method for estimating the distribution using Markov chain Monte Carlo. It is shown that the method is straightforward to specify distributionally and to implement computationally, with output readily adapted for constructing the required criteria. The method is illustrated with a simulation study.

Penalized maximum likelihood estimation with symmetric log-concave errors and LASSO penalty

  • Seo-Young, Park;Sunyul, Kim;Byungtae, Seo
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.641-653 / 2022
  • Penalized least squares methods are important tools for simultaneously selecting variables and estimating parameters in linear regression. Penalized maximum likelihood can be used for the same purpose, assuming that the error distribution falls in a certain parametric family of distributions. However, relying on a particular parametric family can lead to a misspecification problem that undermines estimation accuracy. To give sufficient flexibility to the error distribution, we propose using a symmetric log-concave error distribution with the LASSO penalty. A feasible algorithm for estimating both the nonparametric and parametric components of the proposed model is provided. Numerical studies are also presented, showing that the proposed method produces more efficient estimators than some existing methods, with similar variable selection performance.

A Generation-based Text Steganography by Maintaining Consistency of Probability Distribution

  • Yang, Boya;Peng, Wanli;Xue, Yiming;Zhong, Ping
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.11 / pp.4184-4202 / 2021
  • Text steganography combined with natural language generation has become increasingly popular. Existing methods usually embed secret information in the generated words by controlling the sampling during text generation. A candidate pool is constructed by a greedy strategy, and only high-probability words are encoded, which distorts the statistical properties of the text and seriously affects the security of the steganography. To reduce the influence of the candidate pool on the statistical imperceptibility of steganography, we propose a steganography method based on a new sampling strategy. Instead of using only high-probability words, we build the candidate pool from words that differ relatively little from the actual sample of the language model, thus staying consistent with the probability distribution of the language model. Moreover, we encode the candidate words according to their probability similarity to the target word, which further preserves the probability distribution. Experimental results show that the proposed method outperforms state-of-the-art steganographic methods in terms of security performance.

Grouping stocks using dynamic linear models

  • Sihyeon, Kim;Byeongchan, Seong
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.695-708 / 2022
  • Recently, several studies have been conducted using state space models. In this study, a dynamic linear model in state space form is applied to stock data. The monthly returns of 135 Korean stocks are fitted to a dynamic linear model to obtain estimated time series of the time-varying 𝛽 coefficient. The model equation for the return follows the capital asset pricing model from economics. In particular, the transition equation of the state space form is modified so that the assumptions on the error term are satisfied. k-shape clustering is performed to classify the 135 estimated 𝛽 time series into several groups. The clustering yields four clusters, each consisting of approximately 30 stocks. The distribution is found to differ across groups, indicating that each group has its own characteristics. In addition, a common pattern is observed within each group, which can be interpreted appropriately.

Statistical Analysis of Degradation Data under a Random Coefficient Rate Model (확률계수 열화율 모형하에서 열화자료의 통계적 분석)

  • Seo, Sun-Keun;Lee, Su-Jin;Cho, You-Hee
    • Journal of Korean Society for Quality Management / v.34 no.3 / pp.19-30 / 2006
  • For highly reliable products, it is difficult to assess product lifetime with traditional life tests. Accordingly, a recent approach is to observe the performance degradation of the product during the test rather than the failure time itself. This study compares the performance of three methods (the approximation, analytical and numerical methods) for estimating the parameters and quantiles of the lifetime when the time-to-failure distribution is Weibull or lognormal under a random coefficient degradation rate model. Numerical experiments are also conducted to investigate the effects of model errors, such as measurement errors, in a random coefficient model.
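
The link between a random degradation rate and the lifetime distribution can be sketched under simplifying assumptions that the abstract does not spell out: a linear degradation path D(t) = θt through the origin, a fixed failure threshold, and a lognormally distributed rate θ. In that case T = threshold / θ is itself lognormal, so its quantiles have a closed form that a simulation can confirm. The function names and parameters are hypothetical, and this corresponds to none of the three estimation methods compared in the paper.

```python
import math
import random
from statistics import NormalDist

def lifetime_quantile(p, threshold, mu, sigma):
    """p-th quantile of T = threshold / rate when ln(rate) ~ N(mu, sigma^2)."""
    # ln T = ln(threshold) - ln(rate) is normal with mean ln(threshold) - mu
    # and standard deviation sigma.
    z = NormalDist().inv_cdf(p)
    return math.exp(math.log(threshold) - mu + sigma * z)

def simulated_quantile(p, threshold, mu, sigma, n=20000, seed=1):
    """Monte Carlo check: draw rates, convert to lifetimes, read off a quantile."""
    rng = random.Random(seed)
    times = sorted(threshold / rng.lognormvariate(mu, sigma) for _ in range(n))
    return times[int(p * n)]
```

Comparing `lifetime_quantile(0.1, 10.0, 0.0, 0.5)` with `simulated_quantile(0.1, 10.0, 0.0, 0.5)` shows the closed form and the simulation agreeing up to sampling noise.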

Application of Bootstrap Method to Primary Model of Microbial Food Quality Change

  • Lee, Dong-Sun;Park, Jin-Pyo
    • Food Science and Biotechnology / v.17 no.6 / pp.1352-1356 / 2008
  • The bootstrap method, a computer-intensive statistical technique for estimating the distribution of a statistic, was applied to deal with the uncertainty and variability of experimental data in stochastic prediction modeling of microbial growth on a chill-stored food. Three bootstrapping methods for curve fitting to the microbial count data were compared in determining the parameters of the Baranyi and Roberts growth model: nonlinear regression of the static version of the function, with residuals resampled onto all the experimental microbial count data; regression of the static version onto mean counts at the sampling times; and fitting of the dynamic version (differential equations) onto the bootstrapped mean counts. All the methods produced almost the same mean parameter values but differed in the distributions of those values. Parameter search using the dynamic form of the differential equations resulted in the widest distribution of the model parameters but produced a confidence interval for the predicted microbial count close to that of nonlinear regression with the static equation.
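
The core bootstrap idea, resampling the data with replacement to approximate the distribution of a statistic, can be sketched for the simplest case of a sample mean. This is a generic percentile bootstrap and not any of the three curve-fitting schemes compared in the paper; the function name `bootstrap_mean_ci` and its defaults are assumptions made for this illustration.

```python
import random
import statistics

def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(data, k=len(data))
        means.append(statistics.fmean(resample))
    means.sort()
    # Read the interval bounds off the sorted bootstrap means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

Replacing the mean with a fitted growth-model parameter would give the parameter distributions discussed in the abstract, at the cost of refitting the model on every resample.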