• Title/Summary/Keyword: Regression Statistical Analysis

Comparison of National Occupational Accident Fatality Rates using Statistical Analysis on Economic and Social Indicators (경제⋅사회지표의 다변량 통계 분석을 활용한 국가 간 산업재해 사고사망 상대수준 비교)

  • Kyunghun, Kim;Sudong, Lee
    • Journal of the Korean Society of Safety / v.37 no.6 / pp.128-135 / 2022
  • The comparative evaluation of the occupational accident fatality rates (OAFRs) of different countries is complicated by differences in their levels of socio-economic development. However, such an evaluation is necessary to assess a country's national occupational safety and health system. This study proposes a statistical method for comparing the OAFRs of countries while taking into account differences in their levels of socio-economic development. We first collected data on the socio-economic indicators and OAFRs of 11 countries over a 30-year period. Next, based on a literature survey and statistical correlation analysis, we selected the significant independent variables and built multiple linear regression models to predict OAFR. We also identified groups of countries with heterogeneous relationships between the independent variables and OAFRs, as captured by the regression models. The proposed method is demonstrated by comparing the OAFR of Korea with those of 10 other developed countries.
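
The multiple linear regression step can be illustrated with a minimal sketch. The abstract does not list the selected indicators, so the two predictor columns below are hypothetical placeholders, and the coefficients are obtained by solving the normal equations with plain Gaussian elimination rather than with any particular statistics package.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bp] minimising the sum of squared errors."""
    Z = [[1.0] + [float(v) for v in row] for row in X]  # add intercept column
    k = len(Z[0])
    xtx = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: two made-up indicator columns and a response
# generated exactly as 1 + 2*a - 3*b, so the fit should recover (1, 2, -3).
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [1 + 2 * a - 3 * b for a, b in X]
coef = fit_ols(X, y)
print([round(c, 6) for c in coef])
```

Solving the normal equations directly is adequate for a handful of predictors; a QR- or SVD-based solver is numerically safer when the indicators are strongly correlated.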

An Application of Support Vector Machines to Personal Credit Scoring: Focusing on Financial Institutions in China (Support Vector Machines을 이용한 개인신용평가 : 중국 금융기관을 중심으로)

  • Ding, Xuan-Ze;Lee, Young-Chan
    • Journal of Industrial Convergence / v.16 no.4 / pp.33-46 / 2018
  • Personal credit scoring is an effective tool that helps banks make profitable loan-granting decisions. Many classification algorithms and models have recently been used in personal credit scoring. Personal credit scoring techniques are usually divided into statistical and non-statistical methods. Statistical methods include linear regression, discriminant analysis, logistic regression, and decision trees; non-statistical methods include linear programming, neural networks, genetic algorithms, and support vector machines. However, no consistent conclusion has been reached about which method is best for developing a credit scoring model. In this paper, we compare the performance of the most common scoring techniques, namely logistic regression, neural networks, and support vector machines, using personal credit data from a financial institution in China. Specifically, we build the three models, use each to classify customers, and compare the results. According to the results, the support vector machine performs better than logistic regression and the neural network.
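
The support vector machine side of the comparison can be sketched as a linear SVM trained by Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. This is a swapped-in, simplified training scheme chosen for brevity (the abstract does not say how the SVM was trained or which kernel was used), and the six applicant records with two standardized features are hypothetical, not the Chinese institution's data.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(data: Sequence[Tuple[Sequence[float], int]],
                     lam: float = 0.01, epochs: int = 50) -> List[float]:
    """Pegasos-style subgradient descent on the regularised hinge loss.

    Each feature vector is augmented with a constant 1 so the last weight
    acts as a bias (the bias is regularised too, a simplification).
    """
    dim = len(data[0][0]) + 1
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, label in data:  # labels must be +1 or -1
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            xa = list(x) + [1.0]
            margin = label * sum(wi * xi for wi, xi in zip(w, xa))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularisation step)
            if margin < 1:  # hinge loss is active: move toward the point
                w = [wi + eta * label * xi for wi, xi in zip(w, xa)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical applicants: two standardized features, +1 = good, -1 = bad credit.
applicants = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.0, 3.0), 1),
              ((-2.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1)]
w = train_linear_svm(applicants)
print([predict(w, x) for x, _ in applicants])
```

A logistic regression baseline would use the same loop with the logistic loss gradient in place of the hinge subgradient, which makes the two models easy to compare on identical data.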

Penalized quantile regression tree (벌점화 분위수 회귀나무모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1361-1371 / 2016
  • Quantile regression provides a variety of useful statistical information for examining how covariates influence the conditional quantile functions of a response variable. However, traditional quantile regression, which assumes a linear model, is not appropriate when the relationship between the response and the covariates is nonlinear. Variable selection is also necessary for high-dimensional data or strongly correlated covariates. In this paper, we propose a penalized quantile regression tree model. The split rule of the proposed method is based on residual analysis, which has negligible bias in selecting the split variable and a reasonable computational cost. A simulation study and a real data analysis demonstrate the satisfactory performance and usefulness of the proposed method.
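
Quantile regression estimates conditional quantiles by minimizing the check loss ρτ(u) = u(τ − 1{u < 0}) instead of the squared error. A minimal sketch with made-up data shows that minimizing this loss over a single constant recovers the sample τ-quantile; it illustrates only the loss behind quantile regression, not the penalized tree or its residual-based split rule.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile 'check' loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(y: Sequence[float], tau: float) -> float:
    """Minimise sum_i rho_tau(y_i - c) over a constant c.

    The objective is piecewise linear with kinks at the data points, so
    searching over the observed values is enough to find a minimiser.
    """
    return min(sorted(y), key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quantile_by_check_loss(data, 0.5), quantile_by_check_loss(data, 0.25))
```

Replacing the constant c with a linear or tree-structured function of the covariates gives conditional quantile models; the loss stays the same.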

Principal component regression for spatial data (공간자료 주성분분석)

  • Lim, Yaeji
    • The Korean Journal of Applied Statistics / v.30 no.3 / pp.311-321 / 2017
  • Principal component analysis is a popular statistical method for reducing the dimension of high-dimensional climate data and extracting meaningful climate patterns. Building on principal component analysis, a regression approach termed principal component regression (PCR) can be applied for the linear prediction of future climate. In this paper, we develop a new PCR method based on the regularized principal component analysis for spatial data proposed by Wang and Huang (2016), to account for the spatial features of climate data. We apply the proposed method to temperature prediction in the East Asia region and compare the results with those of conventional PCR.
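
The conventional PCR baseline can be sketched in a few lines: center the predictors, find the leading principal component by power iteration, and regress the response on the component scores. This is ordinary (non-spatial) PCR with one component, not the regularized spatial PCA of Wang and Huang (2016), and the collinear toy data are made up.

```python
import math
from typing import Callable, Sequence


def fit_pcr_1(X: Sequence[Sequence[float]], y: Sequence[float],
              iters: int = 200) -> Callable[[Sequence[float]], float]:
    """Conventional (non-spatial) PCR using a single principal component."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]

    # Power iteration for the leading eigenvector of the covariance matrix
    # (assumes X is not constant and the start vector is not orthogonal to it).
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]

    # Regress the centred response on the component scores.
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    gamma = sum(s * (yi - y_mean) for s, yi in zip(scores, y)) / sum(s * s for s in scores)

    def predict(row: Sequence[float]) -> float:
        score = sum((row[j] - means[j]) * v[j] for j in range(p))
        return y_mean + gamma * score

    return predict


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # second column = 2 * first
y = [1.0, 2.0, 3.0, 4.0]
predict = fit_pcr_1(X, y)
print(predict([5.0, 10.0]))
```

Ordinary least squares on these two perfectly collinear columns would face a singular XᵀX; regressing on the leading component sidesteps that, which is one motivation for PCR on high-dimensional, correlated climate fields.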

AN ASSESSMENT OF UNCERTAINTY ON A LOFT L2-5 LBLOCA PCT BASED ON THE ACE-RSM APPROACH: COMPLEMENTARY WORK FOR THE OECD BEMUSE PHASE-III PROGRAM

  • Ahn, Kwang-Il;Chung, Bub-Dong;Lee, John C.
    • Nuclear Engineering and Technology / v.42 no.2 / pp.163-174 / 2010
  • As pointed out in the OECD BEMUSE Program, when obtaining the relevant output values of a complex physical model (or code) requires long computation times, the number of statistical samples that must be evaluated through it is a critical factor in sampling-based uncertainty analysis. Two alternative methods have been used to avoid the problem associated with the size of these statistical samples: one is based on Wilks' formula, which relies on simple random sampling, and the other is based on the conventional nonlinear regression approach. While both approaches provide a useful means of drawing conclusions about the resulting uncertainty from a limited number of code runs, each also has its own limitations. For example, a conclusion based on Wilks' formula can be highly affected by the sampled values themselves, while the conventional regression approach requires an a priori estimate of the functional form of the regression model. The main objective of this paper is to assess the feasibility of the ACE-RSM approach as a complement to Wilks' formula and conventional regression-based uncertainty analysis. This feasibility was assessed through a practical application of the ACE-RSM approach to the LOFT L2-5 LBLOCA PCT uncertainty analysis, which was carried out as part of the OECD BEMUSE Phase III program.
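
To see why the number of code runs matters, Wilks' formula for a one-sided tolerance limit can be evaluated directly. With n runs, the number of outputs exceeding the γ-quantile follows a Binomial(n, 1 − γ) distribution, so the probability that at least `order` outputs exceed it is 1 − Σ_{j<order} C(n, j)(1 − γ)^j γ^(n − j). The sketch below searches for the smallest n that reaches a target confidence β; the 95%/95% setting used as the default is a common convention in best-estimate plus uncertainty analyses and is an assumption here, not the paper's stated criterion.

```python
import math


def wilks_sample_size(gamma: float = 0.95, beta: float = 0.95, order: int = 1) -> int:
    """Smallest number of code runs n such that, with confidence >= beta,
    the order-th largest of the n outputs bounds the gamma-quantile from above
    (one-sided; order == 1 uses the maximum, order == 2 the second largest)."""
    n = order
    while True:
        # `miss` = probability that fewer than `order` runs exceed the gamma-quantile.
        miss = sum(math.comb(n, j) * (1 - gamma) ** j * gamma ** (n - j) for j in range(order))
        if 1 - miss >= beta:
            return n
        n += 1


print(wilks_sample_size(), wilks_sample_size(order=2))
```

For 95%/95% this yields the familiar 59 runs (first order) and 93 runs (second order), independent of the number of uncertain input parameters. As the abstract notes, the bound itself still depends on which values happen to be sampled.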

A Short Note on Empirical Penalty Term Study of BIC in K-means Clustering Inverse Regression

  • Ahn, Ji-Hyun;Yoo, Jae-Keun
    • Communications for Statistical Applications and Methods / v.18 no.3 / pp.267-275 / 2011
  • Recent studies have proposed the Bayesian information criterion (BIC) for determining the structural dimension of the central subspace through sliced inverse regression (SIR) with high-dimensional predictors. The BIC may also be useful in K-means clustering inverse regression (KIR) with high-dimensional predictors. However, applying the BIC directly to KIR may be problematic, because the slicing scheme in SIR differs from that of KIR. In this paper, we present empirical studies of the BIC penalty term in KIR to identify the most appropriate one. Numerical studies and a real data analysis are presented.
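
The abstract's point that SIR and KIR partition the response differently can be seen on a toy example: SIR cuts the ordered response into slices of (nearly) equal size, whereas KIR, as its name indicates, groups the response with K-means. The sketch below is a univariate illustration with made-up data and a deterministic initialization chosen for reproducibility; it does not implement the BIC penalty terms studied in the paper.

```python
from typing import List, Sequence


def sir_slices(y: Sequence[float], n_slices: int) -> List[int]:
    """SIR-style slicing: sort y and cut it into groups of (nearly) equal size."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank * n_slices // len(y)
    return labels


def kmeans_1d(y: Sequence[float], k: int, iters: int = 100) -> List[int]:
    """KIR-style grouping: Lloyd's k-means on the response values (k >= 2)."""
    ys = sorted(y)
    # Deterministic start: k values spread across the sorted response.
    centers = [ys[int(j * (len(ys) - 1) / (k - 1))] for j in range(k)]
    labels = [0] * len(y)
    for _ in range(iters):
        # Assignment step: nearest centre (ties go to the lower index).
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in y]
        # Update step: centre = mean of its members (kept if the cluster is empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(y, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


y = [0.0, 0.5, 1.0, 10.0, 11.0, 30.0]
print(sir_slices(y, 3))  # equal-size slices
print(kmeans_1d(y, 3))   # clusters follow the gaps in y
```

Because the K-means groups can have very unequal sizes, a penalty tuned for SIR's equal-size slices need not transfer directly to KIR, which is the issue the paper investigates empirically.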

A Local Influence Approach to Regression Diagnostics with Application to Robust Regression

  • Huh, Myung-Hoe;Park, Sung H.
    • Journal of the Korean Statistical Society / v.19 no.2 / pp.151-159 / 1990
  • Regression diagnostics often involve assessing the changes that result from deleting multiple cases. Diagnostic methodology based on global influence measures, however, requires prohibitive computing time. As an alternative, Cook (1986) developed the local influence approach, which checks whether a minor modification of the specification influences the key results of an analysis. Following Cook's development, we propose and study an influence derivative method that yields both the magnitude and the direction of case influences. The utility of our methodology is highlighted when case influence derivatives are plotted in a lower-dimensional space. Such plots are especially effective in unmasking "masked" observations in least squares regression as well as in robust regression. We give several illustrations.

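
For least squares, a standard way to obtain case influence derivatives is to perturb the case weights: if β̂(w) is the weighted least-squares estimate, then at w = 1 the derivative with respect to the weight of case i is (XᵀX)⁻¹xᵢrᵢ, where rᵢ is the ordinary residual. Each derivative is a vector, so it carries both a magnitude and a direction, and for a straight-line fit it can be plotted directly in the (intercept, slope) plane. The sketch below assumes this case-weight perturbation scheme for simple linear regression with made-up data; the paper's own influence derivative may be defined differently, and its robust-regression version is not shown.

```python
from typing import List, Sequence, Tuple


def wls_line(x: Sequence[float], y: Sequence[float],
             w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return yb - slope * xb, slope


def case_weight_derivatives(x: Sequence[float],
                            y: Sequence[float]) -> List[Tuple[float, float]]:
    """d(intercept, slope)/d(w_i) at w = 1, i.e. (X^T X)^{-1} x_i r_i."""
    n = len(x)
    b0, b1 = wls_line(x, y, [1.0] * n)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx  # determinant of X^T X for design rows [1, x_i]
    out = []
    for xi, yi in zip(x, y):
        r = yi - b0 - b1 * xi  # ordinary residual of case i
        out.append(((sxx - sx * xi) * r / det, (n * xi - sx) * r / det))
    return out


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 12.0]
for i, (d0, d1) in enumerate(case_weight_derivatives(x, y)):
    print(i, round(d0, 3), round(d1, 3))
```

Only one fit is needed for all n derivatives, whereas deletion diagnostics for subsets of cases require a refit per subset; that cost difference is the motivation the abstract gives for the local approach.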

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.149-161 / 2019
  • In this article, we suggest the following approach to simultaneous variable selection and outlier detection. First, we determine possible outlier candidates using properties of an intercept estimator in a difference-based regression model, and this outlier information is incorporated into the multiple regression model by adding mean-shift parameters. Second, we use stochastic search variable selection to select the best model among models that include the outlier candidates as predictors. Finally, we evaluate our method using simulations and real data analysis, which yield promising results. In addition, the method needs further development to produce robust estimates, and we also plan to extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.
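
Adding a mean-shift parameter for a single case k means fitting y = β₀ + β₁x + δ·1{i = k} + ε. For least squares, the estimate of δ equals the deleted residual eₖ/(1 − hₖₖ), where eₖ is the ordinary residual and hₖₖ the leverage, that is, how far yₖ lies from the line fitted without case k. The sketch below checks this identity for a straight-line fit on made-up data; it covers only the mean-shift idea, not the difference-based candidate search or the stochastic search variable selection described in the abstract.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    slope = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    return yb - slope * xb, slope


def mean_shift_estimates(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Estimated mean shift delta_k for each case k: e_k / (1 - h_kk)."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    xb = sum(x) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    deltas = []
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - xb) ** 2 / sxx  # hat-matrix diagonal h_kk
        deltas.append((yi - b0 - b1 * xi) / (1.0 - leverage))
    return deltas


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 20.0, 9.0, 11.0]  # case 3 is shifted upward by 13
deltas = mean_shift_estimates(x, y)
candidate = max(range(len(deltas)), key=lambda k: abs(deltas[k]))
print(candidate, round(deltas[candidate], 6))
```

Once candidate cases are chosen, their indicator columns can be appended to the design matrix and treated like ordinary predictors by a variable selection procedure, which is the structure the abstract describes.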

Ensemble variable selection using genetic algorithm

  • Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.629-640 / 2022
  • Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. Best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose solving best subset selection directly via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve variable selection performance, we propose running multiple GAs to solve best subset selection and then synthesizing the results, an approach we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, because the proposed method is essentially best subset selection, it is applicable to a variety of models with different selection criteria. We compare the proposed EGA with existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.
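
Best subset selection by a genetic algorithm can be sketched with subsets encoded as 0/1 vectors: a population of candidate subsets is repeatedly ranked by a selection criterion, the better half is kept, and new subsets are produced by one-point crossover and bit-flip mutation. The ensemble step below simply reruns the GA with different random seeds and reports how often each variable is selected. These operator choices, the population settings, the frequency-based synthesis, and the `toy_score` criterion (a hypothetical stand-in for a real criterion computed from a fitted model) are all assumptions for illustration, not the authors' EGA.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]


def run_ga(fitness: Callable[[Sequence[int]], float], p: int, rng: random.Random,
           pop_size: int = 20, generations: int = 30, mut_rate: float = 0.1) -> Mask:
    """One GA run for best subset selection over p >= 2 candidate variables."""
    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children: List[Mask] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, p)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mut_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


def ensemble_ga(fitness: Callable[[Sequence[int]], float], p: int,
                n_runs: int = 10, seed: int = 0) -> List[float]:
    """Run several independent GAs and return each variable's selection frequency."""
    counts = [0] * p
    for r in range(n_runs):
        best = run_ga(fitness, p, random.Random(seed + r))
        counts = [c + g for c, g in zip(counts, best)]
    return [c / n_runs for c in counts]


# Hypothetical criterion: each variable adds a made-up gain and costs 1.0,
# so the best subset under this score is {0, 1, 4}.
GAINS = [5.0, 4.0, 0.1, 0.2, 3.0]


def toy_score(mask: Sequence[int]) -> float:
    return sum(g - 1.0 for g, m in zip(GAINS, mask) if m)


print(ensemble_ga(toy_score, p=len(GAINS)))
```

Because the GA only needs a score for each subset, the same loop works for linear, Poisson, or Cox models once `toy_score` is replaced by a criterion computed from the corresponding fit.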

Support Vector Median Regression

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society / v.14 no.1 / pp.67-74 / 2003
  • Median regression has robustness properties that make it an attractive alternative to regression based on the mean. Support vector machines (SVMs) are widely used in real-world regression tasks. In this paper, we propose a new SV median regression based on the check function. We also illustrate how the proposed SVM performs and compare it with an SVM based on the absolute deviation loss function.

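
The robustness of median regression relative to mean regression can be seen with the check function at τ = 0.5, which equals half the absolute deviation. The sketch below fits a plain linear median regression by brute force over lines through pairs of observations (when the x values are not all equal, an optimal least absolute deviation line through two data points always exists) and contrasts it with ordinary least squares on made-up data containing one outlier. It omits the kernel and the regularization that make the paper's method a support vector regression.

```python
from itertools import combinations
from typing import Sequence, Tuple


def check_loss(u: float, tau: float) -> float:
    """rho_tau(u) = u * (tau - 1{u < 0}); at tau = 0.5 this is |u| / 2."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def median_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Brute-force linear median regression: try every line through two points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, 0.5) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line for comparison; returns (intercept, slope)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    slope = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return yb - slope * xb, slope


x = [float(k) for k in range(10)]
y = [2.0 * k + 1.0 for k in range(10)]
y[9] = 100.0  # one gross outlier
print("median:", median_line(x, y))
print("least squares:", ols_line(x, y))
```

The median fit recovers the line followed by the nine clean points, while the single outlier pulls the least-squares slope far above 2; the brute-force search is O(n³) and only suitable for tiny examples.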