• Title/Summary/Keyword: Multiple logistic regression

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society / v.36 no.4 / pp.457-469 / 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. However, detecting multiple outliers or influential points is more difficult, owing to masking and swamping problems. Direct procedures for multiple outlier detection in logistic regression have not yet been studied. In this paper we consider direct methods for logistic regression by extending the influence matrix algorithm of Peña and Yohai (1995). We define the influence matrix in logistic regression by using Cook's distance in logistic regression, and test for multiple outliers by using the mean shift model. To show the accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data including multiple outliers with masking and swamping.
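
As a rough illustration of the single-case building block mentioned in this abstract, the sketch below computes a one-step approximation of Cook's distance for logistic regression (the Pregibon-style form based on Pearson residuals and leverages). It is not the paper's influence matrix algorithm: the function names, the simulated data, and the scaling by the number of coefficients are assumptions made for illustration only.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton/IRLS. X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        # Newton step: solve (X^T W X) step = X^T (y - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def cooks_distance_logistic(X, y, beta):
    """One-step Cook's distance approximation and leverages for each observation."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    Xw = X * np.sqrt(w)[:, None]              # W^{1/2} X
    A = np.linalg.solve(Xw.T @ Xw, Xw.T)      # (X^T W X)^{-1} X^T W^{1/2}
    h = np.einsum("ij,ji->i", Xw, A)          # diagonal of the hat matrix
    r = (y - p) / np.sqrt(w)                  # Pearson residuals
    k = X.shape[1]                            # assumed scaling: number of coefficients
    d = r**2 * h / (k * (1.0 - h) ** 2)
    return d, h

# Simulated data (hypothetical; not the data used in the paper).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, -1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic_irls(X, y)
cook_d, leverage = cooks_distance_logistic(X, y, beta_hat)

# Indices of the five observations with the largest approximate influence.
suspects = np.argsort(cook_d)[::-1][:5]
print(suspects, cook_d[suspects])
```

Single-case measures like this one are exactly what masking and swamping can defeat, which is the motivation the abstract gives for multiple outlier methods.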

Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / v.16 no.2 / pp.309-315 / 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Because single deletions may fail to detect outliers or influential observations owing to swamping and masking effects, multiple deletions are needed. We developed conditional deletion diagnostics designed to overcome masking effects. We derived closed forms for several statistics in logistic regression models, which yield useful diagnostics.

Preventing the Musculoskeletal Disorders using Association Rule - Based on Result of Multiple Logistic Regression - (연관규칙을 이용한 근골격계 질환 예방 - 다변량 로지스틱 회귀분석의 결과를 기반으로 -)

  • Park, Seung-Hun;Lee, Seog-Hwan
    • Journal of the Korea Safety Management & Science / v.9 no.4 / pp.29-38 / 2007
  • We applied association rules from data mining to investigate the relations among the factors of musculoskeletal disorders, building on a prevention method based on multiple logistic regression that we proposed in a previous study. With multiple logistic regression alone, it was difficult to establish a prevention method when the factors cannot be managed by the worker himself, namely age, gender, and marital status. To solve this problem, we derived association rules among the factors of musculoskeletal disorders and proposed an interactive prevention method that applies these rules to the multiple logistic regression results of the previous study. The correlation analysis showed that a prevention method for one body part also prevents musculoskeletal disorders in other body parts.

A Study on Factors Affecting the Use of Ambulatory Physician Services (의사방문수 결정요인 분석)

  • 박현애;송건용
    • Health Policy and Management / v.4 no.2 / pp.58-76 / 1994
  • To study the factors affecting the use of ambulatory physician services, Andersen's model of health utilization was modified by adding a health behavior component and examined with three different approaches: a multiple regression model, a logistic regression model, and a LISREL model. For the multiple regression, the dependent variable was reported illness-related visits to a physician during the past year, and the independent variables were various variables measuring the predisposing factor, enabling factor, need factor, and health behavior. For the logistic regression, the dependent variable was visit or no visit to a physician during the past year, and the independent variables were the same as in the multiple regression analysis. For LISREL, five endogenous variables (health utilization, predisposing factor, enabling factor, need factor, and health behavior) and 20 exogenous variables measuring the five endogenous variables were used. According to the multiple regression analysis, the significant explanatory variables for health utilization were chronic illness, health status, and perceived health status for the need factor; residence, sex, age, marital status, and education for the predisposing factor; and health insurance and usual source of medical care for the enabling factor. In the logistic regression analysis, health status, chronic illness, residence, marital status, education, drinking, and use of health aid were found to be significant explanatory variables. In the LISREL model, the need factor affected utilization most, followed by the predisposing factor, enabling factor, and health behavior. For the LISREL model, age, education, and residence for the predisposing factor; health status, chronic illness, and perceived health status for the need factor; medical insurance for the enabling factor; and engaging in any kind of health behavior for health behavior were found to be the significant observed variables for each theoretical variable.

APPLICATION AND CROSS-VALIDATION OF SPATIAL LOGISTIC MULTIPLE REGRESSION FOR LANDSLIDE SUSCEPTIBILITY ANALYSIS

  • LEE SARO
    • Proceedings of the KSRS Conference / 2004.10a / pp.302-305 / 2004
  • The aim of this study is to apply and cross-validate a spatial logistic multiple-regression model at Boun, Korea, using a Geographic Information System (GIS). Landslide locations in the Boun area were identified by interpretation of aerial photographs and field surveys. Maps of the topography, soil type, forest cover, geology, and land use were constructed from a spatial database. The factors that influence landslide occurrence, such as slope, aspect, and curvature of topography, were calculated from the topographic database. Texture, material, drainage, and effective soil thickness were extracted from the soil database, and type, diameter, and density of forest were extracted from the forest database. Lithology was extracted from the geological database, and land use was classified from the Landsat TM satellite image. Landslide susceptibility was analyzed using landslide-occurrence factors by logistic multiple-regression methods. For validation and cross-validation, the result of the analysis was applied both to the study area, Boun, and to another area, Youngin, Korea. The validation and cross-validation results showed satisfactory agreement between the susceptibility map and the existing data with respect to landslide locations. The GIS was used to analyze the vast amount of data efficiently, and statistical programs were used to maintain specificity and accuracy.

On the Logistic Regression Diagnostics

  • Kim, Choong-Rak;Jeong, Kwang-Mo
    • Journal of the Korean Statistical Society / v.22 no.1 / pp.27-37 / 1993
  • Since an analytic expression for a diagnostic in the logistic regression model is not available, one-step estimation is often used from a case-deletion point of view. In this paper, an infinitesimal perturbation approach is used, and it is shown that the scale transformation of the infinitesimal perturbation approach is eventually equal to the weighted perturbation of the local influence approach and the replacement measure. Also, multiple-case deletion for the masking effect is considered.

Sleep Quality and its Associated Factors in Adults (성인의 수면의 질과 관련요인에 관한 연구)

  • Yi, Hyeryeon
    • Journal of Korean Public Health Nursing / v.27 no.1 / pp.76-88 / 2013
  • Purpose: The purpose of this study was to identify the degree of sleep quality and its associated factors in adults. Methods: The data were collected from 986 adults aged 19 to 64 by convenience sampling. Subjects completed a questionnaire composed of the Pittsburgh Sleep Quality Index (PSQI), the Beck Depression Inventory, and other questions on self-rated health and sociodemographic variables. Statistical methods included descriptive statistics, simple logistic regression, and multiple logistic regression analyses. Results: The global PSQI score was 5.7. About 45% of the subjects were poor sleepers (global PSQI score >5). Multiple logistic regression analyses showed that factors significantly associated with sleep quality were depression and poor self-rated health in young and middle-aged adults. Depression was the most significant associated factor. The presence of a spouse was also associated with sleep quality in young adults. Conclusion: These findings suggest that people with poor sleep quality should have their health carefully screened for depression. In addition, we recommend the development of a nursing program for improving sleep quality.

An Introduction to Logistic Regression: From Basic Concepts to Interpretation with Particular Attention to Nursing Domain

  • Park, Hyeoun-Ae
    • Journal of Korean Academy of Nursing / v.43 no.2 / pp.154-164 / 2013
  • Purpose: The purpose of this article is twofold: 1) introducing logistic regression (LR), a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, and 2) examining the use and reporting of LR in the nursing literature. Methods: Textbooks on LR and research articles employing LR as the main statistical analysis were reviewed. Twenty-three articles published between 2010 and 2011 in the Journal of Korean Academy of Nursing were analyzed for proper use and reporting of LR models. Results: Logistic regression was presented from basic concepts such as odds, odds ratio, logit transformation, and logistic curve, through assumptions, fitting, reporting, and interpreting, to cautions. Substantial shortcomings were found in both the use of LR and the reporting of results. For many studies, the sample size was not sufficiently large, calling into question the accuracy of the regression model. Additionally, only one study reported a validation analysis. Conclusion: Nursing researchers need to pay greater attention to guidelines concerning the use and reporting of LR models.
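
The basic quantities named in this abstract (odds, odds ratio, logit transformation, logistic curve) can be summarized in standard textbook form as below. The notation ($p$ for the event probability, $x_1, \dots, x_k$ for the independent variables, $\beta_j$ for the coefficients) is assumed here and is not taken from the article.

```latex
\begin{align*}
\text{odds}(p) &= \frac{p}{1-p}, \\
\operatorname{logit}(p) &= \ln\frac{p}{1-p}
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k, \\
p &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}, \\
\text{OR}_j &= e^{\beta_j}.
\end{align*}
```

Here $\text{OR}_j$ is the odds ratio for a one-unit increase in $x_j$ with the other variables held fixed.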

The health effects of low blood lead level in oxidative stress as a marker, serum gamma-glutamyl transpeptidase level, in male steelworkers

  • Su-Yeon Lee;Yong-Jin Lee;Young-Sun Min;Eun-Chul Jang;Soon-Chan Kwon;Inho Lee
    • Annals of Occupational and Environmental Medicine / v.34 / pp.34.1-34.13 / 2022
  • Background: This study aimed to investigate the association between lead exposure and serum gamma-glutamyl transpeptidase (γGT) levels as an oxidative stress marker in male steelworkers. Methods: Data were collected during the annual health examination of workers in 2020. A total of 1,654 steelworkers were selected, and the variables for adjustment included the workers' general characteristics, lifestyle, and occupational characteristics. The association between the blood lead level (BLL) and serum γGT level was investigated by multiple linear and logistic regression analyses. The natural logarithms of the BLL and serum γGT values were used in multiple linear regression analysis, and the tertile of BLL was used in logistic regression analysis. Results: The geometric means of the participants' BLLs and serum γGT levels were 1.36 μg/dL and 27.72 IU/L, respectively. Their BLLs differed depending on age, body mass index (BMI), smoking status, drinking status, shift work, and working period, while their serum γGT levels differed depending on age, BMI, smoking status, drinking status, physical activity, and working period. In multiple linear regression analysis, the difference in models 1, 2, and 3 was significant, obtaining 0.326, 0.176, and 0.172 (all: p < 0.001), respectively. In the multiple linear regression analysis stratified according to drinking status, BMI, and age, BLLs were positively associated with serum γGT levels. Regarding the logistic regression analysis, the odds ratio of the third BLL tertile in models 1, 2, and 3 (for having an elevated serum γGT level, with the first tertile as the reference) was 2.74, 1.83, and 1.81, respectively. Conclusions: BLL was positively associated with serum γGT levels in male steelworkers even at low lead concentrations (< 5 μg/dL).

Power Failure Sensitivity Analysis via Grouped L1/2 Sparsity Constrained Logistic Regression

  • Li, Baoshu;Zhou, Xin;Dong, Ping
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.8 / pp.3086-3101 / 2021
  • To support precise marketing and differentiated service in the electric power service department, it is important to identify customers who are highly sensitive to power failures. To solve this problem, we propose a novel grouped $\ell_{1/2}$ sparsity constrained logistic regression method for sensitivity assessment of electric power failure. Different from the $\ell_1$ norm and the k-support norm, the proposed grouped $\ell_{1/2}$ sparsity constrained logistic regression method simultaneously imposes the inter-class information and a tighter approximation to the nonconvex $\ell_0$ sparsity, so as to exploit multiple correlated attributes for prediction. First, the attributes or factors for predicting customer sensitivity to power failure are selected from customer sheets, such as customer information, electricity consumption information, electricity bills, 95598 work sheets, and power failure events. Second, all samples with these attributes are clustered into several categories, and samples in the same category are assumed to share similar properties. Then, an $\ell_{1/2}$ norm constrained logistic regression model is built to predict each customer's sensitivity to power failure. Finally, the alternating direction method of multipliers (ADMM) algorithm is employed to solve the problem by splitting it into several sub-problems. Experimental results on an electric power dataset with about one million customer records from a province validate that the proposed method has good prediction accuracy.