• Title/Summary/Keyword: logistic multiple regression

검색결과 1,413건 처리시간 0.028초

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society
    • /
    • 제36권4호
    • /
    • pp.457-469
    • /
    • 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. But the detection of influential points or multiple outliers is more difficult, owing to masking and swamping problems. The multiple outlier detection methods for logistic regression have not been studied from the points of direct procedure yet. In this paper we consider the direct methods for logistic regression by extending the $Pe\tilde{n}a$ and Yohai (1995) influence matrix algorithm. We define the influence matrix in logistic regression by using Cook's distance in logistic regression, and test multiple outliers by using the mean shift model. To show accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data including multiple outliers with masking and swamping.

Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • 제16권2호
    • /
    • pp.309-315
    • /
    • 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Since single deletions may not exactly detect outliers or influential observations due to swamping effects and masking effects, it needs multiple deletions. We developed conditional deletion diagnostics which are designed to overcome problems of masking effects. We derived the closed forms for several statistics in logistic regression models. They give useful diagnostics on the statistics.

연관규칙을 이용한 근골격계 질환 예방 - 다변량 로지스틱 회귀분석의 결과를 기반으로 - (Preventing the Musculoskeletal Disorders using Association Rule - Based on Result of Multiple Logistic Regression -)

  • 박승헌;이석환
    • 대한안전경영과학회지
    • /
    • 제9권4호
    • /
    • pp.29-38
    • /
    • 2007
  • We adapted association rules of data mining in order to investigate the relation among the factors of musculoskeletal disorders and proposed the method of preventing the musculoskeletal disorders associated with multiple logistic regression in previous study. This multiple logistic regression was difficult to establish the method of preventing musculoskeletal disorders in case factors can't be managed by worker himself, i.e., age, gender, marital status. In order to solve this problem, we devised association rules of factors of musculoskeletal disorders and proposed the interactive method of preventing the musculoskeletal disorders, by applying association rules with the result of multiple logistic regression in previous study. The result of correlation analysis showed that prevention method of one part also prevents musculoskeletal disorders of other parts of body.

의사방문수 결정요인 분석 (A Study on Factors Affecting the Use of Ambulatory Physician Services)

  • 박현애;송건용
    • 보건행정학회지
    • /
    • 제4권2호
    • /
    • pp.58-76
    • /
    • 1994
  • In order to study factors affecting the use of the ambulatory physician services. Andersen's model for health utilization was modified by adding the health behavior component and examined with three different approaches. Three different approaches were the multiople regression model, logistic regression model, and LISREL model. For multiple regression, dependent variable was reported illness-related visits to a physician during past one year and independent variables are variaous variables measuring predisposing factor, enabling factor, need factor and health behavior. For the logistic regression, dependent variable was visit or no-visit to a physician during past one year and independent variables were same as the multiple regression analysis. For the LISREL, five endogenous variables of health utiliztion, predisposing factor, enabling factor, need factor, and health behavior and 20 exogeneous variables which measures five endogenous variables were used. According to the multiple regression analysis, chronic illness, health status, perceived health status of the need factor; residence, sex, age, marital status, education of the predisposing factor ; health insurance, usual source for medical care of enabling factor were the siginificant exploratory variables for the health utilization. Out of the logistic regression analysis, health status, chronic illness, residence, marital status, education, drinking, use of health aid were found to be significant exploratory variables. From LISREL, need factor affect utilization most following by predisposing factor, enabling factor and health behavior. For LISREL model, age, education, and residence for predisposing factor; health status, chronic illess, and perceived health status for need factor; medical insurance for enabling factor; and doing any kind of health behavior for the health behavior were found as the significant observed variables for each theoretical variables.

  • PDF

APPLICATION AND CROSS-VALIDATION OF SPATIAL LOGISTIC MULTIPLE REGRESSION FOR LANDSLIDE SUSCEPTIBILITY ANALYSIS

  • LEE SARO
    • 대한원격탐사학회:학술대회논문집
    • /
    • 대한원격탐사학회 2004년도 Proceedings of ISRS 2004
    • /
    • pp.302-305
    • /
    • 2004
  • The aim of this study is to apply and crossvalidate a spatial logistic multiple-regression model at Boun, Korea, using a Geographic Information System (GIS). Landslide locations in the Boun area were identified by interpretation of aerial photographs and field surveys. Maps of the topography, soil type, forest cover, geology, and land-use were constructed from a spatial database. The factors that influence landslide occurrence, such as slope, aspect, and curvature of topography, were calculated from the topographic database. Texture, material, drainage, and effective soil thickness were extracted from the soil database, and type, diameter, and density of forest were extracted from the forest database. Lithology was extracted from the geological database and land-use was classified from the Landsat TM image satellite image. Landslide susceptibility was analyzed using landslide-occurrence factors by logistic multiple-regression methods. For validation and cross-validation, the result of the analysis was applied both to the study area, Boun, and another area, Youngin, Korea. The validation and cross-validation results showed satisfactory agreement between the susceptibility map and the existing data with respect to landslide locations. The GIS was used to analyze the vast amount of data efficiently, and statistical programs were used to maintain specificity and accuracy.

  • PDF

On the Logistic Regression Diagnostics

  • Kim, Choong-Rak;Jeong, Kwang-Mo
    • Journal of the Korean Statistical Society
    • /
    • 제22권1호
    • /
    • pp.27-37
    • /
    • 1993
  • Since the analytic expression for a diagnostic in the logistic regression model is not available, one-step estimation is often used by a case-deletion point of view. In this paper, infinitesimal perturbation approach is used, and it is shown that the scale transformation of infinitesimal perturbation approach is eventually equal to the weighted perturbation of local influence approach and the replacement measure. Also, multiple cases deletion for the masking effect is considered.

  • PDF

성인의 수면의 질과 관련요인에 관한 연구 (Sleep Quality and its Associated Factors in Adults)

  • 이혜련
    • 한국보건간호학회지
    • /
    • 제27권1호
    • /
    • pp.76-88
    • /
    • 2013
  • Purpose: The purpose of this study was to identify the degree of sleep quality and its associated factors in adults. Methods: The data was collected from 986 adults aged 19 to 64 by convenience sampling. Subjects completed a questionnaire composed of Pittsburgh Sleep Quality Index (PSQI), Beck Depression Inventory, and other questions that self-rated health and sociodemographic variables. Statistical methods used included descriptive statistics, simple logistic regression, and multiple logistic regression analyses. Results: The global PSQI score was 5.7. About 45% of the subjects were poor sleepers (global PSQI score >5). Multiple logistic regression analyses showed that factors significantly associated with sleep quality were depression and poor self-rated health in young and middle-aged adults. Depression was the most significant associated factor. The presence of a spouse was also associated with sleep quality in young adults. Conclusion: These findings suggest that people with poor sleep quality should have their health carefully screened for depression. In addition, we recommend the development of a nursing program for improving sleep quality.

An Introduction to Logistic Regression: From Basic Concepts to Interpretation with Particular Attention to Nursing Domain

  • Park, Hyeoun-Ae
    • 대한간호학회지
    • /
    • 제43권2호
    • /
    • pp.154-164
    • /
    • 2013
  • Purpose: The purpose of this article is twofold: 1) introducing logistic regression (LR), a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, and 2) examining use and reporting of LR in the nursing literature. Methods: Text books on LR and research articles employing LR as main statistical analysis were reviewed. Twenty-three articles published between 2010 and 2011 in the Journal of Korean Academy of Nursing were analyzed for proper use and reporting of LR models. Results: Logistic regression from basic concepts such as odds, odds ratio, logit transformation and logistic curve, assumption, fitting, reporting and interpreting to cautions were presented. Substantial shortcomings were found in both use of LR and reporting of results. For many studies, sample size was not sufficiently large to call into question the accuracy of the regression model. Additionally, only one study reported validation analysis. Conclusion: Nursing researchers need to pay greater attention to guidelines concerning the use and reporting of LR models.

Power Failure Sensitivity Analysis via Grouped L1/2 Sparsity Constrained Logistic Regression

  • Li, Baoshu;Zhou, Xin;Dong, Ping
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제15권8호
    • /
    • pp.3086-3101
    • /
    • 2021
  • To supply precise marketing and differentiated service for the electric power service department, it is very important to predict the customers with high sensitivity of electric power failure. To solve this problem, we propose a novel grouped 𝑙1/2 sparsity constrained logistic regression method for sensitivity assessment of electric power failure. Different from the 𝑙1 norm and k-support norm, the proposed grouped 𝑙1/2 sparsity constrained logistic regression method simultaneously imposes the inter-class information and tighter approximation to the nonconvex 𝑙0 sparsity to exploit multiple correlated attributions for prediction. Firstly, the attributes or factors for predicting the customer sensitivity of power failure are selected from customer sheets, such as customer information, electric consuming information, electrical bill, 95598 work sheet, power failure events, etc. Secondly, all these samples with attributes are clustered into several categories, and samples in the same category are assumed to be sharing similar properties. Then, 𝑙1/2 norm constrained logistic regression model is built to predict the customer's sensitivity of power failure. Alternating direction of multipliers (ADMM) algorithm is finally employed to solve the problem by splitting it into several sub-problems effectively. Experimental results on power electrical dataset with about one million customer data from a province validate that the proposed method has a good prediction accuracy.

기계학습을 적용한 자기보고 증상 기반의 어혈 변증 모델 구축 (Machine Learning Approach to Blood Stasis Pattern Identification Based on Self-reported Symptoms)

  • 김현호;양승범;강연석;박영배;김재효
    • Korean Journal of Acupuncture
    • /
    • 제33권3호
    • /
    • pp.102-113
    • /
    • 2016
  • Objectives : This study is aimed at developing and discussing the prediction model of blood stasis pattern of traditional Korean medicine(TKM) using machine learning algorithms: multiple logistic regression and decision tree model. Methods : First, we reviewed the blood stasis(BS) questionnaires of Korean, Chinese, and Japanese version to make a integrated BS questionnaire of patient-reported outcomes. Through a human subject research, patients-reported BS symptoms data were acquired. Next, experts decisions of 5 Korean medicine doctor were also acquired, and supervised learning models were developed using multiple logistic regression and decision tree. Results : Integrated BS questionnaire with 24 items was developed. Multiple logistic regression models with accuracy of 0.92(male) and 0.95(female) validated by 10-folds cross-validation were constructed. By decision tree modeling methods, male model with 8 decision node and female model with 6 decision node were made. In the both models, symptoms of 'recent physical trauma', 'chest pain', 'numbness', and 'menstrual disorder(female only)' were considered as important factors. Conclusions : Because machine learning, especially supervised learning, can reveal and suggest important or essential factors among the very various symptoms making up a pattern identification, it can be a very useful tool in researching diagnostics of TKM. With a proper patient-reported outcomes or well-structured database, it can also be applied to a pre-screening solutions of healthcare system in Mibyoung stage.