• Title/Summary/Keyword: multicollinearity measures


Multicollinearity in Logistic Regression

  • Jong-Han Lee;Myung-Hoe Huh
    • Communications for Statistical Applications and Methods
    • /
    • v.2 no.2
    • /
    • pp.303-309
    • /
    • 1995
  • Many measures for detecting multicollinearity in linear regression have been proposed in the statistics and numerical analysis literature. Among them, the condition number and the variance inflation factor (VIF) are the most popular. In this study, we give new interpretations of the condition number and VIF in linear regression, using geometry on the explanatory space. Along the same lines, we derive natural analogues of the condition number and VIF for logistic regression. These computationally intensive measures can be easily extended to evaluate multicollinearity in generalized linear models.


ILL-CONDITIONING IN LINEAR REGRESSION MODELS AND ITS DIAGNOSTICS

  • Ghorbani, Hamid
    • The Pure and Applied Mathematics
    • /
    • v.27 no.2
    • /
    • pp.71-81
    • /
    • 2020
  • Multicollinearity is a common problem in linear regression models that arises when two or more regressors are highly correlated. It causes serious problems for the ordinary least squares estimates of the parameters as well as for model validation and interpretation. This paper first reviews the problem of multicollinearity, its effects on linear regression, and some important measures for detecting it, and then highlights the role of eigenvalues and eigenvectors in detecting multicollinearity. Finally, a real data set is analyzed and the fitted linear regression model is examined using multicollinearity diagnostics.
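
The diagnostics mentioned in these abstracts can be illustrated with a minimal sketch. It is not taken from either paper: the function names `vif` and `condition_number`, the synthetic data, and the choice to scale columns to unit length without an intercept column are all assumptions made for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    # The diagonal of the inverse correlation matrix equals 1 / (1 - R_j^2),
    # where R_j^2 comes from regressing column j on the remaining columns.
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def condition_number(X):
    """Ratio of the largest to the smallest singular value of column-scaled X."""
    X = np.asarray(X, dtype=float)
    # Scale columns to unit length so the result does not depend on units.
    # (Assumption: no intercept column is added in this simplified version.)
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)
    return s[0] / s[-1]

# Synthetic illustration: x3 is almost an exact linear combination of x1 and x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.01 * rng.normal(size=200)
X_collinear = np.column_stack([x1, x2, x3])
X_independent = np.column_stack([x1, x2, rng.normal(size=200)])

vif_collinear = vif(X_collinear)
vif_independent = vif(X_independent)
cond_collinear = condition_number(X_collinear)
cond_independent = condition_number(X_independent)
```

In this sketch the near-dependent design should show large VIFs and a large condition number, while the independent design should stay close to 1 on both. The small singular values of the scaled matrix are the eigenvalue-based signal of ill-conditioning.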

Comparing Fault Prediction Models Using Change Request Data for a Telecommunication System

  • Park, Young-Sik;Yoon, Byeong-Nam;Lim, Jae-Hak
    • ETRI Journal
    • /
    • v.21 no.3
    • /
    • pp.6-15
    • /
    • 1999
  • Many studies in software reliability have attempted to develop models for predicting the faults of a software module, because good prediction models enable optimal resource allocation during the development period. In this paper, we consider change request data collected from the field test of a software module that incorporate a functional relation between the faults and some software metrics. To this end, we discuss general aspects of the regression method, the problem of multicollinearity, and measures of model evaluation. We consider four possible regression models, including two stepwise regression models and two nonlinear models. The four developed models are evaluated with respect to predictive quality.


Prediction of extreme PM2.5 concentrations via extreme quantile regression

  • Lee, SangHyuk;Park, Seoncheol;Lim, Yaeji
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.3
    • /
    • pp.319-331
    • /
    • 2022
  • In this paper, we develop a new statistical model to forecast the PM2.5 level in Seoul, South Korea. The proposed model is based on the extreme quantile regression model with a lasso penalty. Various meteorological and air pollution variables are considered as predictors in the regression model, and the lasso quantile regression performs variable selection and addresses the multicollinearity problem. The final prediction model is obtained by combining various extreme lasso quantile regression estimators, and we construct a binary classifier based on the model. Prediction performance is evaluated using standard performance measures for binary classification. We observe that the proposed method outperforms the other classification methods and predicts 'very bad' cases of the PM2.5 level well.

Health Information Managers' Job Stress in an Electronic Medical Record Environment

  • Noh, Jin-Won;Choi, Hyo-Jin;Hong, Jin-Hyuk;Boo, Yoo-Kyung
    • International Journal of Contents
    • /
    • v.13 no.2
    • /
    • pp.35-43
    • /
    • 2017
  • This study sought to measure the influence of changes in the work environment of health information managers (HIMs) on their job stress, and to explore measures for improving their job satisfaction. A total of 275 hospital HIMs were surveyed using a structured questionnaire. Variables with a significant impact on job stress were identified using simple linear regression analysis. Multicollinearity was then tested through multiple linear regression analysis. Significant impact factors were identified from among the control variables, and their impact on job stress was measured. The survey revealed that depression scores among HIMs were higher in public hospitals where the EMR system had been in place for a longer period. HIMs' job stress level was found to be affected by the following factors: computerization of their working environment, experience of depression, unemployment, and manpower reduction, as well as their lifestyles, including leisure activities. The results of this study suggest that HIMs' job stress can be reduced by improving their work environment and their personal lifestyle habits.

Impact analysis of Industrial-University cooperation adherency degree and cooperation degree configuration variable on satisfaction (산학협력 밀착도, 협력도 구성변수가 만족도에 미치는 영향 분석)

  • Kim, Young-Bu
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.9
    • /
    • pp.359-368
    • /
    • 2016
  • In the 21st century, the Korean university education system is focused on innovation and change, including cooperation between industry and universities. Fostering an industry-university ecosystem through interactions between universities and industry should be a goal. It is therefore important to measure their relationships and to find suitable ways to measure the final results of industry-university cooperation. This paper sets out the achievements of cooperation and the satisfaction of the participating enterprises, and measures how adherence and cooperation relate to satisfaction with industry-university cooperation. To this end, the research uses regression analysis to examine how factors in the relations between industry and universities influence satisfaction with industry-university cooperation. We examined the multicollinearity problem before running the multiple regression, and it appeared to be of little concern. In particular, the satisfaction variable, which serves as the dependent variable, was constructed in this research as a high-dimensional dependent variable composed of five individual variables. We then analyzed how the adherence factors and the degree-of-cooperation factors influence the individual and overall satisfaction variables. As a result, the degree of realization of locally customized programs emerged as the most significant variable. The biggest factor influencing satisfaction with industry-university cooperation proved to be the degree of realization of programs suited to local conditions, such as education, research, and technical guidance.

A Study on the Prediction of Fuel Consumption of a Ship Using the Principal Component Analysis (주성분 분석기법을 이용한 선박의 연료소비 예측에 관한 연구)

  • Kim, Young-Rong;Kim, Gujong;Park, Jun-Bum
    • Journal of Navigation and Port Research
    • /
    • v.43 no.6
    • /
    • pp.335-343
    • /
    • 2019
  • As the regulations of ship exhaust gas have been strengthened recently, many measures are under consideration to reduce fuel consumption. Among them, research has been performed actively to develop a machine-learning model that predicts fuel consumption by using data collected from ships. However, many studies have not considered the methodology of the main parameter selection for the model or the processing of the collected data sufficiently, and the reckless use of data may cause problems such as multicollinearity between variables. In this study, we propose a method to predict the fuel consumption of the ship by using the principal component analysis to solve these problems. The principal component analysis was performed on the operational data of the 13K TEU container ship and the fuel consumption prediction model was implemented by regression analysis with extracted components. As the R-squared value of the model for the test data was 82.99%, this model would be expected to support the decision-making of operators in the voyage planning and contribute to the monitoring of energy-efficient operation of ships during voyages.
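
The general idea of regressing on principal components to sidestep collinear inputs can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the variables `speed`, `rpm`, and `draft`, the synthetic data, the helper names `fit_pcr`, `predict_pcr`, and `r_squared`, and the choice of two components are all hypothetical.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression: PCA on standardized X, then OLS on the scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components].T
    scores = Z @ components
    # Ordinary least squares with an intercept on the uncorrelated scores.
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "components": components, "coef": coef}

def predict_pcr(model, X):
    """Project new data onto the stored components and apply the fitted coefficients."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    scores = Z @ model["components"]
    return model["coef"][0] + scores @ model["coef"][1:]

def r_squared(y, y_hat):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical operational data: rpm is nearly collinear with speed.
rng = np.random.default_rng(0)
n = 400
speed = rng.normal(15.0, 2.0, size=n)
rpm = 5.0 * speed + rng.normal(0.0, 0.5, size=n)
draft = rng.normal(10.0, 1.0, size=n)
fuel = 3.0 * speed + 0.5 * rpm + 2.0 * draft + rng.normal(0.0, 1.0, size=n)
X = np.column_stack([speed, rpm, draft])

# Fit on the first 300 rows and evaluate on the remaining 100.
model = fit_pcr(X[:300], fuel[:300], n_components=2)
r2_test = r_squared(fuel[300:], predict_pcr(model, X[300:]))
```

Because the component scores are mutually uncorrelated, the regression step does not suffer from the collinearity between `speed` and `rpm`. The number of components to keep is a modeling choice that would normally be tuned on held-out data.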

An Investigation of the Factors Affecting Satisfaction with Cell Broadcast Service(CBS) -Focusing on Users in Incheon- (긴급재난문자 만족도에 영향을 미치는 요인 규명 -인천광역시 서비스 대상자를 중심으로-)

  • Park, Keon-Oh;Park, Jae-Young
    • Journal of Environmental Science International
    • /
    • v.33 no.3
    • /
    • pp.193-203
    • /
    • 2024
  • This study aims to determine the factors affecting the level of satisfaction with the Cell Broadcast Service (CBS) among citizens in Incheon. Partial least squares (PLS) regression, instead of multiple regression, was used for the analysis because it can solve multicollinearity and sample size issues. The analysis results are as follows: The factor with the greatest effect on satisfaction with CBS among Incheon citizens, was the elimination of redundancies (VIP=1.185). Therefore, local governments, government agencies, and public organizations must coordinate their ideas and collectively create guidelines to eliminate redundancies. The second most influential factor was the expansion in the broadcast medium from legal, institutional, and policy aspects (VIP=1.087). This is because differences in generation, age, gender, and personal characteristics were not considered. Therefore, it is necessary to devise a customized messaging tool through the expansion of broadcast media. The broadcast criteria of the legal, institutional, and policy perspectives comprised the third most influential factor, with a high VIP value of 1.053. Consequently, it is essential to devise a plan to avoid distributing unnecessary cell broadcast services, by establishing criteria for areas and sections, time, and the direct and indirect impact zones of a disaster. In the future, this study could be used as base data to develop policies, guidelines, and response measures for Incheon CBS. Given the lack of research on the diverse characteristics of each social class and the city traits of each region, and a lack of concrete empirical research on each factor, continuous and in-depth studies are required in the future.

The Effect of Social Network on Information Sharing in Franchise System (프랜차이즈시스템의 사회연결망 특성이 정보공유에 미치는 영향)

  • Yun, Han-Sung;Bae, Sang-Wook;Noh, Jung-Koo
    • Journal of Distribution Research
    • /
    • v.16 no.2
    • /
    • pp.95-118
    • /
    • 2011
  • The purpose of this study is twofold. First, we empirically investigate the effects of social network properties, such as a franchisee's social network density and centrality, on its information sharing with various parties in the franchise system, such as the franchisor and other franchisees. Second, we explore whether tie strength between a franchisee and its franchisor moderates the relationship between social network properties and information sharing. We established a study model, gathered 200 responses from franchisees in Busan through a questionnaire survey, and used 189 of them in the analysis. To improve data quality, we selected respondents from among franchise owners or managers who had frequent contact with their franchisor and with other franchisees in the franchise system. Our data analysis began with reliability analysis and exploratory and confirmatory factor analysis of the multi-item measures of social network density, social network centrality, tie strength, information sharing, and control variables such as shared goals and ownership, in order to assess the reliability and validity of those measures. The resulting values satisfied the general criteria for reliability and validity. We tested our hypotheses using a hierarchical multiple regression analysis in four steps. Model 1 regressed the dependent variable (information sharing) only on the control variables (shared goals, ownership). Model 2 added the main effect variables (social network density, social network centrality) to Model 1. Model 3 added the moderating variable (tie strength) to Model 2. Finally, Model 4 added interaction terms between the main variables and the moderating variable to Model 3. We mean-centered the main variables and the moderating variable to minimize the multicollinearity problem caused by the interaction terms in Model 4. Two important empirical findings emerge from this study, both showing that the effects of social network properties and tie strength on a franchisee's information sharing depend on the counterpart, that is, the franchisor or other franchisees in the franchise system. First, social network centrality, tie strength, the interaction between social network density and tie strength, and the interaction between social network centrality and tie strength all significantly affect a franchisee's information sharing with its franchisor. Notably, the interaction between social network centrality and tie strength has a negative effect on information sharing, while the interaction between social network density and tie strength has a positive effect. Second, social network centrality significantly and directly affects a franchisee's information sharing with other franchisees in the franchise system. However, tie strength plays no moderating role in this second case. Finally, we suggest the implications of our findings and some avenues for future research.
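
The effect of mean-centering on an interaction term can be seen in a small sketch. The data below are synthetic and hypothetical (uniform scores on a 1 to 7 scale, with the sample size borrowed from the abstract); the variable names are illustrative and nothing here reproduces the study's data or results.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical survey-style scores on a 1-7 scale (not the study's data).
centrality = rng.uniform(1, 7, size=189)
tie_strength = rng.uniform(1, 7, size=189)

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Raw product term: strongly correlated with its own component.
raw_interaction = centrality * tie_strength
raw_corr = corr(centrality, raw_interaction)

# Mean-center both variables before forming the product term.
c = centrality - centrality.mean()
t = tie_strength - tie_strength.mean()
centered_interaction = c * t
centered_corr = corr(c, centered_interaction)
```

With positive-valued scales like these, the raw product is strongly correlated with each of its components, and centering should remove most of that correlation. Centering changes the interpretation of the main-effect coefficients but not the fit of the model with the interaction term included.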
