• Title/Summary/Keyword: the multiple regression analysis

Classification via principal differential analysis

  • Jang, Eunseong;Lim, Yaeji
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.2
    • /
    • pp.135-150
    • /
    • 2021
  • We propose principal differential analysis based classification methods. Computations of squared multiple correlation function (RSQ) and principal differential analysis (PDA) scores are reviewed; in addition, we combine principal differential analysis results with the logistic regression for binary classification. In the numerical study, we compare the principal differential analysis based classification methods with functional principal component analysis based classification. Various scenarios are considered in a simulation study, and principal differential analysis based classification methods classify the functional data well. Gene expression data is considered for real data analysis. We observe that the PDA score based method also performs well.

A Study of Human Resource Efficiency in Public Corporation Medical Centers (지방공사의료원의 인적자원 효율성평가)

  • 남상요
    • Health Policy and Management
    • /
    • v.10 no.4
    • /
    • pp.75-98
    • /
    • 2000
  • This study applied Data Envelopment Analysis (DEA), ratio analysis, and regression analysis to a set of Korean Public Corporation Medical Centers to evaluate their relative human resource efficiency. The output measure was based on the health insurance system, which was used in both inpatient and outpatient departments. Inputs included the working time of doctors, nurses, technicians, and managerial department staff. Based on the input and output data, the analysis showed 23 of the 34 hospitals to be relatively inefficient; a hospital with an efficiency rating of less than 1 was considered relatively inefficient. In addition, managerial strategies based on dual variables were constructed to indicate how inefficient hospitals may be made efficient. A subsequent t-test revealed that the bed occupancy rate, medical revenue per 100 beds, value-added revenue per staff member, and medical revenue per staff member were statistically significant. The results suggest that DEA is a promising tool for evaluating relative human resource efficiency in hospitals that have multiple inputs and outputs and where the efficient production function cannot be specified with any precision. However, efficiency evaluations may be most effectively accomplished by incorporating a combination of methodologies such as ratio analysis and regression analysis.
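
As a point of reference for the efficiency rating mentioned in this abstract, the following is a sketch of the standard input-oriented CCR model in multiplier form. The abstract does not say which DEA variant the authors used, so the choice of CCR and all symbols here are assumptions: $x_{ij}$ is input $i$ of hospital $j$, $y_{rj}$ is output $r$ of hospital $j$, and $o$ is the hospital being evaluated.

```latex
\begin{aligned}
\theta_o = \max_{u,\,v}\quad & \sum_{r} u_r\, y_{ro} \\
\text{subject to}\quad & \sum_{i} v_i\, x_{io} = 1, \\
& \sum_{r} u_r\, y_{rj} - \sum_{i} v_i\, x_{ij} \le 0, \qquad j = 1, \dots, n, \\
& u_r \ge 0, \quad v_i \ge 0 .
\end{aligned}
```

Under this formulation a hospital with $\theta_o = 1$ lies on the efficient frontier, while $\theta_o < 1$ marks it as relatively inefficient, which matches the rule stated in the abstract. The dual variables of this linear program identify the peer hospitals against which an inefficient unit is compared.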

FUZZY REGRESSION TOWARDS A GENERAL INSURANCE APPLICATION

  • Kim, Joseph H.T.;Kim, Joocheol
    • Journal of applied mathematics & informatics
    • /
    • v.32 no.3_4
    • /
    • pp.343-357
    • /
    • 2014
  • In many non-life insurance applications, past data are given in a form known as the run-off triangle. Smoothing such data using parametric crisp regression models has long served as the basis for estimating future claim amounts and the reserves set aside to protect the insurer from future losses. In this article, a fuzzy counterpart of the Hoerl curve, a well-known claim reserving regression model, is proposed to analyze past claim data and to determine the reserves. The fuzzy Hoerl curve is more flexible and general than the one considered in the previous fuzzy literature in that it includes a categorical variable alongside multiple explanatory variables, which requires the development of a fuzzy analysis of covariance, or fuzzy ANCOVA. Using actual insurance run-off claim data, we show that the suggested fuzzy Hoerl curve based on the fuzzy ANCOVA gives reasonable claim reserves without the stringent assumptions needed for the traditional regression approach to claim reserving.
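
To make the regression model behind this abstract concrete, here is a minimal sketch of fitting the ordinary (crisp) Hoerl curve, not the fuzzy version the article proposes. It assumes the common log-linear form log(claim) = a + b·log(t) + c·t for development period t. The function names and the claim figures are hypothetical, and the data are generated from a known curve so the fit can be checked.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_hoerl_curve(dev_periods, claims):
    """Least squares fit of log(claim) = a + b*log(t) + c*t."""
    design = [[1.0, math.log(t), float(t)] for t in dev_periods]
    target = [math.log(y) for y in claims]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(design, target))
           for i in range(3)]
    return solve_linear_system(xtx, xty)


def hoerl(t, a, b, c):
    """Hoerl curve value exp(a) * t**b * exp(c*t)."""
    return math.exp(a + b * math.log(t) + c * t)


# Hypothetical incremental claims, generated from a known curve
# purely for illustration (not data from the article).
true_a, true_b, true_c = 5.0, 1.5, -0.6
periods = [1, 2, 3, 4, 5, 6]
claims = [hoerl(t, true_a, true_b, true_c) for t in periods]

a_hat, b_hat, c_hat = fit_hoerl_curve(periods, claims)
print(a_hat, b_hat, c_hat)
```

Because the curve is linear in its parameters after taking logarithms, an ordinary least squares fit is enough for the crisp case; the fuzzy approach in the article replaces these point estimates with fuzzy coefficients.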

Ensemble variable selection using genetic algorithm

  • Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.6
    • /
    • pp.629-640
    • /
    • 2022
  • Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. Best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving the best subset selection problem via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve variable selection performance, we propose running multiple GAs to solve the best subset selection problem and then synthesizing the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially best subset selection and is hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.

Development of Regression Models for Estimation of Unmeasured Dissolved Organic Carbon Concentrations in Mixed Land-use Watersheds (복합토지이용 유역의 수질 관리를 위한 미측정 용존유기탄소 농도 추정)

  • Min Kyeong Park;Jin a Beom;Minhyuk Jeung;Ji Yeon Jeong;Kwang Sik Yoon
    • Journal of Korean Society on Water Environment
    • /
    • v.39 no.2
    • /
    • pp.162-174
    • /
    • 2023
  • To prevent water pollution caused by organic matter, Total Organic Carbon (TOC) has been adopted as an indicator and monitored. TOC can be divided into Dissolved Organic Carbon (DOC) and Particulate Organic Carbon (POC). POC is largely precipitated and removed during stream flow, which makes DOC environmentally significant. However, few studies have defined the spatio-temporal distribution of DOC in streams affected by various land uses. It is therefore necessary to estimate past DOC concentrations from other water quality indicators in order to evaluate the status of watershed management. In this study, DOC was estimated by correlation and regression analysis using three organic matter indicators monitored in mixed land-use watersheds. The correlation analysis showed that DOC has the highest correlation with TOC. Based on these results, single- and multiple-regression models were developed using Biochemical Oxygen Demand (BOD), Chemical Oxygen Demand (COD), and TOC. A comparison of the prediction accuracy of three regression models showed that the single-regression model with TOC performed better than the multiple-regression models. Trend analysis using the extended average DOC concentration data shows that DOC tends to decrease, reflecting watershed management. This study could contribute to the assessment and management of organic water pollution in mixed land-use watersheds by suggesting methods for estimating unmeasured DOC concentrations.
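
The single-regression approach described in this abstract can be sketched as an ordinary least squares fit of DOC on TOC. This is an illustration only: the function names are hypothetical, and the concentrations below are invented and constructed to lie exactly on a line so the result is easy to check. They are not measurements from the study.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination of the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical TOC and DOC concentrations (mg/L), invented for illustration.
toc = [2.0, 3.0, 4.0, 5.0, 6.0]
doc = [1.6, 2.4, 3.2, 4.0, 4.8]  # constructed as 0.8 * TOC

intercept, slope = fit_simple_regression(toc, doc)
fit_quality = r_squared(toc, doc, intercept, slope)


def estimate_doc(toc_value):
    """Estimate an unmeasured DOC concentration from a TOC measurement."""
    return intercept + slope * toc_value


print(slope, intercept, fit_quality, estimate_doc(3.5))
```

Once fitted on the period where both indicators were monitored, a function like `estimate_doc` can be applied to earlier TOC records to fill in DOC values that were never measured.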

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.2
    • /
    • pp.149-161
    • /
    • 2019
  • In this article, we suggest the following approach to simultaneous variable selection and outlier detection. First, we determine possible outlier candidates using properties of an intercept estimator in a difference-based regression model, and the outlier information is reflected in the multiple regression model by adding mean shift parameters. Second, we select the best model from the model that includes the outlier candidates as predictors, using stochastic search variable selection. Finally, we evaluate our method using simulations and real data analysis, which yield promising results. In addition, our method needs further development to produce robust estimates. We will also extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.
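
The mean shift idea mentioned in this abstract can be written out as follows. This is the generic mean shift outlier model, given as a sketch; the notation ($\gamma_i$, $C$, $e_i$) is an assumption and is not taken from the article.

```latex
y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \gamma_i + \varepsilon_i,
\qquad i = 1, \dots, n,
\qquad \gamma_i = 0 \ \text{unless observation } i \text{ is an outlier.}
```

In matrix form, with $C$ the index set of outlier candidates and $e_i$ the $i$-th unit vector,

```latex
\mathbf{y} = X\boldsymbol{\beta} + \sum_{i \in C} \gamma_i\, e_i + \boldsymbol{\varepsilon}.
```

Each candidate contributes one indicator column $e_i$ as an extra predictor, so deciding whether $\gamma_i$ stays in the model is a variable selection problem of the same kind as selecting the columns of $X$.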

An Empirical Study on the Correlation between TOD Planning Elements and Subway Ridership in Busan Metropolitan City (부산시 역세권 TOD계획요소의 공간특성과 지하철 이용객 수의 상관성에 관한 실증연구)

  • Choi, Don-Jeong;Suh, Yong-Cheol
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.17 no.3
    • /
    • pp.147-159
    • /
    • 2014
  • Public transportation ridership and the walkability of urban districts can be enhanced through high-quality TOD (Transit Oriented Development) elements. Generally, TOD has been evaluated by several physical components, such as the diversity of land use patterns, the accessibility of public transportation, and aspects of urban design around the station area. In particular, the spatial characteristics of TOD planning elements are potentially dependent on the characteristics of Rail Station-Influenced Area Development, which takes place around subway stations. Therefore, researchers should consider how the spatial properties of planning elements vary with the chosen spatial extent and with socioeconomic factors. However, many existing TOD studies do not consider this point. In this paper, the changes in TOD characteristics were analyzed across different spatial units surrounding subway stations in Busan Metropolitan City. Multiple regression analysis was performed on subway ridership data to identify the effective spatial unit of TOD planning elements in this area. In addition, the validity of applying socioeconomic variables was examined by comparing the regression results with those of a multiple regression that included only physical TOD elements. As a result, variation in the spatial properties of TOD planning elements according to the spatial unit was found. Furthermore, the specific spatial unit applicable to TOD elements in this area was derived. The multiple regression model with added socioeconomic variables produced better estimates than the model that included only physical TOD elements.

Empirical seismic fragility rapid prediction probability model of regional group reinforced concrete girder bridges

  • Li, Si-Qi;Chen, Yong-Sheng;Liu, Hong-Bo;Du, Ke
    • Earthquakes and Structures
    • /
    • v.22 no.6
    • /
    • pp.609-623
    • /
    • 2022
  • To study the empirical seismic fragility of reinforced concrete girder bridges, a regression-based rapid fragility prediction model (Gaussian first-order regression probability model) that considers empirical seismic damage is proposed, based on the theory of numerical analysis and probability modelling. A total of 1,069 reinforced concrete girder bridges on 22 highways were used to verify the model, and vulnerability function, plane, surface, and curve models of reinforced concrete girder bridges (simply supported girder bridges and continuous girder bridges) considering the number of samples in multiple intensity regions were established. New empirical seismic damage probability matrix and curve models of observation frequency and damage exceedance probability are developed for multiple intensity regions. A comparative vulnerability analysis between simply supported girder bridges and continuous girder bridges is provided. Based on the theory of the regional mean seismic damage index matrix model, the empirical seismic damage prediction probability matrix is embedded in the multidimensional mean seismic damage index matrix model, and regional rapid prediction matrices and curves for reinforced concrete girder bridges, simply supported girder bridges, and continuous girder bridges in multiple intensity regions are developed based on mean seismic damage index parameters. The established multidimensional group bridge vulnerability model can be used to quantify and predict the fragility of bridges in multiple intensity regions and to support future fragility assessment of regional group reinforced concrete girder bridges.

A Local Influence Approach to Regression Diagnostics with Application to Robust Regression

  • Huh, Myung-Hoe;Park, Sung H.
    • Journal of the Korean Statistical Society
    • /
    • v.19 no.2
    • /
    • pp.151-159
    • /
    • 1990
  • Regression diagnostics often involves assessment of the changes that result from deleting multiple cases. Diagnostic methodology based on global influence measures, however, requires prohibitive computing time. As an alternative, Cook (1986) developed a local influence approach in which one checks whether a minor modification of the specification influences key results of an analysis. In line with Cook's development, we propose and study an influence derivative method that yields both the magnitude and direction of case influences. The utility of our methodology is highlighted when case influence derivatives are plotted in a lower dimensional space. Such plots are especially effective in unmasking "masked" observations in least squares regression and also in robust regression. We give several illustrations.

Optimal fractions in terms of a prediction-oriented measure

  • Lee, Won-Woo
    • Journal of the Korean Statistical Society
    • /
    • v.22 no.2
    • /
    • pp.209-217
    • /
    • 1993
  • The multicollinearity problem in a multiple linear regression model may have deleterious effects on predictions. Thus, it is desirable to consider the optimal fractions with respect to the unbiased estimate of the mean squared errors of the predicted values. Interestingly, the optimal fractions can also be illuminated by the Bayesian interpretation of the general James-Stein estimators.
