• Title/Summary/Keyword: Conditional logistic regression

Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.2
    • /
    • pp.309-315
    • /
    • 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Because swamping and masking effects can prevent single deletions from correctly detecting outliers or influential observations, multiple deletions are needed. We developed conditional deletion diagnostics designed to overcome masking effects, and we derived closed forms for several statistics in logistic regression models. These closed forms yield useful diagnostics.

Variable Selection with Log-Density in Logistic Regression Model (로지스틱회귀모형에서 로그-밀도비를 이용한 변수의 선택)

  • Kahng, Myung-Wook;Shin, Eun-Young
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.1
    • /
    • pp.1-11
    • /
    • 2012
  • We present methods to study the log-density ratio of the conditional densities of the predictors given the response variable in the logistic regression model. This allows us to select which predictors are needed and how they should be included in the model. If the conditional distributions are skewed, the distributions can be considered as gamma distributions. A simulation study shows that the linear and log terms are required in general. If the conditional distributions of x|y for the two groups overlap significantly, we need both the linear and log terms; however, only the linear or log term is needed in the model if they are well separated.
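
The role of the linear and log terms in the abstract above can be illustrated with a short derivation. This is a sketch under an assumed parameterization (shape $\alpha_k$ and rate $\lambda_k$ for group $y = k$, prior probabilities $\pi_k$); the symbols are illustrative and not taken from the paper.

```latex
% Assumption: x | y = k ~ Gamma(shape \alpha_k, rate \lambda_k), k = 0, 1
\begin{align*}
f_k(x) &= \frac{\lambda_k^{\alpha_k}}{\Gamma(\alpha_k)}\, x^{\alpha_k - 1} e^{-\lambda_k x}, \\
\log\frac{f_1(x)}{f_0(x)} &= (\alpha_1 - \alpha_0)\log x \;-\; (\lambda_1 - \lambda_0)\, x
  \;+\; \alpha_1\log\lambda_1 - \alpha_0\log\lambda_0
  - \log\Gamma(\alpha_1) + \log\Gamma(\alpha_0), \\
\operatorname{logit} P(y = 1 \mid x) &= \log\frac{\pi_1}{\pi_0} + \log\frac{f_1(x)}{f_0(x)} .
\end{align*}
```

Under this assumption the logit is linear in $x$ and $\log x$: the log term drops out when the shapes are equal, and the linear term drops out when the rates are equal.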

Improved Exact Inference in Logistic Regression Model

  • Kim, Donguk;Kim, Sooyeon
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.2
    • /
    • pp.277-289
    • /
    • 2003
  • We propose modified exact inferential methods for the logistic regression model. The exact conditional distribution in logistic regression is often highly discrete, so ordinary exact inference is conservative. For exact inference in the logistic regression model we use a modified P-value. The modified P-value cannot exceed the ordinary P-value, so the test of size $\alpha$ based on the modified P-value is less conservative. The modified exact confidence interval maintains at least a fixed confidence level but tends to be much narrower. The approach inverts the results of a test with a modified P-value that uses the test statistic and table probabilities in the logistic regression model.

Statistical micro matching using a multinomial logistic regression model for categorical data

  • Kim, Kangmin;Park, Mingue
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.5
    • /
    • pp.507-517
    • /
    • 2019
  • Statistical matching is a method of combining multiple sources of data that are extracted or surveyed from the same population. It can be used in situations where the variables of interest are not jointly observed. It is a low-cost, potentially high-impact approach because it creates synthetic data from existing sources. In this paper, we propose several statistical micro matching methods using a multinomial logistic regression model for the case where all variables of interest are categorical or categorized, which is common in sample surveys. Under the conditional independence assumption (CIA), a mixed statistical matching method, which is useful when auxiliary information is not available, is proposed. We also propose a statistical matching method with auxiliary information that reduces the bias of the conventional matching methods suggested under the CIA. The proposed micro matching methods and conventional ones are compared through a simulation study, which shows that the proposed methods outperform the existing ones, especially when the CIA does not hold.

Probabilistic Forecasting of Seasonal Inflow to Reservoir (계절별 저수지 유입량의 확률예측)

  • Kang, Jaewon
    • Journal of Environmental Science International
    • /
    • v.22 no.8
    • /
    • pp.965-977
    • /
    • 2013
  • Reliable long-term streamflow forecasting is invaluable for water resource planning and management, which allocates water supply according to the demand of water users. Probabilistic forecasts are necessary to establish risk-based reservoir operation policies, and they may be useful for users who assess and manage risks when making decisions based on forecast results. Probabilistic forecasting of seasonal inflow to Andong dam is performed and assessed using predictors selected from sea surface temperature and 500 hPa geopotential height data. Seasonal inflow is forecast with categorical probability forecasts by Piechota's method and by logistic regression analysis, and with probability forecasts by a conditional probability density function. A kernel density function is used in the categorical probability forecast by Piechota's method and in the probability forecast by the conditional probability density function. The categorical probability forecasts are assessed by the Brier skill score, which shows that they are better than the reference forecasts. The forecasts using the conditional probability density function are assessed qualitatively and after transformation to categorical probability forecasts. The assessment of the transformed forecasts shows that the forecasts by the conditional probability density function are much better than those by Piechota's method and logistic regression analysis, except for winter season data.
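
The Brier skill score mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's procedure: the function names, the three-category example data, and the choice of an equal-probability reference forecast are all assumptions made here.

```python
def brier_score(forecasts, outcomes):
    """Mean multi-category Brier score.

    forecasts: list of probability vectors, one per forecast occasion.
    outcomes:  list of observed category indices, one per occasion.
    """
    total = 0.0
    for probs, observed in zip(forecasts, outcomes):
        # Squared distance between the forecast vector and the
        # one-hot vector of the observed category.
        total += sum(
            (p - (1.0 if k == observed else 0.0)) ** 2
            for k, p in enumerate(probs)
        )
    return total / len(outcomes)


def brier_skill_score(forecasts, reference, outcomes):
    """BSS = 1 - BS / BS_ref; positive values beat the reference."""
    return 1.0 - brier_score(forecasts, outcomes) / brier_score(reference, outcomes)


# Hypothetical data: three occasions, three categories (e.g. low/normal/high).
forecasts = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
outcomes = [0, 1, 2]
# Assumed reference: equal probability for every category.
reference = [[1 / 3, 1 / 3, 1 / 3] for _ in outcomes]

bss = brier_skill_score(forecasts, reference, outcomes)
print(round(bss, 2))  # 0.61
```

A score of 0 means the forecast is no better than the reference, and 1 corresponds to a perfect forecast.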

A study on log-density with log-odds graph for variable selection in logistic regression (로지스틱회귀모형의 변수선택에서 로그-오즈 그래프를 통한 로그-밀도비 연구)

  • Kahng, Myung-Wook;Shin, Eun-Young
    • Journal of the Korean Data and Information Science Society
    • /
    • v.23 no.1
    • /
    • pp.99-111
    • /
    • 2012
  • The log-density ratio of the conditional densities of the predictors given the response variable provides useful information for variable selection in the logistic regression model. In this paper, we consider the predictors that are needed and how they should be included in the model. If the conditional distributions are skewed, the distributions can be considered as gamma distributions. Under this assumption, linear and log terms are generally included in the model. The log-odds graph is a very useful graphical tool in this study. A graphical study is presented which shows that if the conditional distributions of x|y for the two groups overlap significantly, we need both the linear and log terms. Conversely, if they are well separated, only the linear or log term is needed in the model.

A Study on Prediction Techniques through Machine Learning of Real-time Solar Radiation in Jeju (제주 실시간 일사량의 기계학습 예측 기법 연구)

  • Lee, Young-Mi;Bae, Joo-Hyun;Park, Jeong-keun
    • Journal of Environmental Science International
    • /
    • v.26 no.4
    • /
    • pp.521-527
    • /
    • 2017
  • Solar radiation forecasts are important for predicting the amount of ice on roads and the potential solar energy. In an attempt to improve solar radiation predictability in Jeju, we applied machine learning with various data mining techniques such as tree models, conditional inference trees, random forests, support vector machines, and logistic regression. To validate the machine learning models, the simulation results were compared with the solar radiation data observed at the Jeju observation site. According to the model assessment, solar radiation prediction using random forest is the most effective method. The error rate of the random forest model is 17%.

Estimation of Logistic Regression for Two-Stage Case-Control Data (2단계 사례-대조자료를 위한 로지스틱 회귀모형의 추론)

  • 신미영;신은순
    • The Korean Journal of Applied Statistics
    • /
    • v.13 no.2
    • /
    • pp.237-245
    • /
    • 2000
  • In this paper we consider a logistic regression model based on two-stage case-control sampling and study the Weighted Exogenous Sampling Maximum Likelihood (WESML) method for obtaining asymptotically normal estimates of the parameters in a logistic regression model. A numerical example is carried out to demonstrate the differences between the Conditional Maximum Likelihood (CML) estimates and the WESML estimates for two-stage case-control data.

Exploring interaction using 3-D residual plots in logistic regression model (3차원 잔차산점도를 이용한 로지스틱회귀모형에서 교호작용의 탐색)

  • Kahng, Myung-Wook
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.1
    • /
    • pp.177-185
    • /
    • 2014
  • Under bivariate normal distribution assumptions, the interaction and quadratic terms are needed in the logistic regression model with two predictors. However, depending on the correlation coefficient and the variances of two conditional distributions, the interaction and quadratic terms may not be necessary. Although the need for these terms can be determined by comparing the two scatter plots, it is not as useful for interaction terms. We explore the structure and usefulness of the 3-D residual plot as a tool for dealing with interaction in logistic regression models. If predictors have an interaction effect, a 3-D residual plot can show the effect. This is illustrated by simulated and real data.
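
Why the interaction and quadratic terms arise, and when they vanish, can be seen from the log-density ratio. The following is a sketch with assumed notation ($\mu_k$, $\Sigma_k$ for the mean and covariance of the predictors in group $y = k$, with standard deviations $s_{k1}, s_{k2}$ and correlation $\rho_k$); the symbols are illustrative and not taken from the paper.

```latex
% Assumption: x = (x_1, x_2)^\top, x | y = k ~ N_2(\mu_k, \Sigma_k), k = 0, 1
\begin{align*}
\log\frac{f_1(x)}{f_0(x)}
  &= -\tfrac{1}{2}\, x^\top\!\left(\Sigma_1^{-1} - \Sigma_0^{-1}\right) x
     + \left(\Sigma_1^{-1}\mu_1 - \Sigma_0^{-1}\mu_0\right)^{\!\top} x + c, \\
c &= -\tfrac{1}{2}\left(\mu_1^\top \Sigma_1^{-1}\mu_1 - \mu_0^\top \Sigma_0^{-1}\mu_0\right)
     - \tfrac{1}{2}\log\frac{|\Sigma_1|}{|\Sigma_0|}, \\
\text{coefficient of } x_1 x_2
  &= \frac{\rho_1}{(1-\rho_1^2)\, s_{11} s_{12}} - \frac{\rho_0}{(1-\rho_0^2)\, s_{01} s_{02}} .
\end{align*}
```

If $\Sigma_1 = \Sigma_0$, the whole quadratic form cancels and the logit is linear in $x_1$ and $x_2$; the interaction term alone vanishes whenever the two off-diagonal precision entries coincide.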

Associations between Air Pollution and Asthma-related Hospital Admissions in Children in Seoul, Korea: A Case-crossover Study (환자교차 설계 방법을 적용한 소아천식 입원에 대한 도시 대기오염의 급성영향평가)

  • Lee, Jong-Tae
    • Journal of Preventive Medicine and Public Health
    • /
    • v.36 no.1
    • /
    • pp.47-53
    • /
    • 2003
  • Objectives: I used a case-crossover design to investigate the association between air pollution and hospital admissions for asthma among children under the age of 15 years in Seoul, Korea. Methods: I estimated the change in hospitalization risk associated with an interquartile range (IQR) increase in each pollutant concentration, using conditional logistic regression analyses with controls for weather information. Results: Using bidirectional control sampling, a conditional logistic regression model with controls for weather conditions gave estimated relative risks of hospitalization for asthma among children of 1.04 (95% confidence interval (CI), 1.01-1.08) for particulate matter with an aerodynamic diameter less than or equal to 10 μm (IQR = 40.4 μg/m³); 1.05 (95% CI, 1.00-1.09) for nitrogen dioxide (IQR = 14.6 ppb); 1.02 (95% CI, 0.97-1.06) for sulfur dioxide (IQR = 4.4 ppb); 1.03 (95% CI, 0.99-1.08) for ozone (IQR = 21.7 ppb); and 1.03 (95% CI, 0.99-1.08) for carbon monoxide (IQR = 1.0 ppm). Conclusions: This empirical analysis indicates that bidirectional control sampling, by design, successfully controls confounding due to long-term time trends in air pollution. These findings also support the hypothesis that air pollution at levels below the current ambient air quality standards of Korea is harmful to sensitive subjects, such as asthmatic children.
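
The two ingredients named in the abstract above, a conditional logistic likelihood over matched case and control periods and a relative risk expressed per IQR increase, can be sketched as follows. This is a minimal single-pollutant illustration without weather covariates; the function names and exposure values are hypothetical, and the coefficient is back-calculated from the reported PM10 relative risk purely for demonstration.

```python
import math


def conditional_log_likelihood(beta, strata):
    """Conditional log-likelihood for strata with exactly one case each.

    strata: list of (case_exposure, [control_exposures]); in a
    case-crossover design each stratum is one subject, with the case
    period compared against that subject's own control periods.
    """
    ll = 0.0
    for case_x, control_xs in strata:
        denom = math.exp(beta * case_x) + sum(math.exp(beta * x) for x in control_xs)
        ll += beta * case_x - math.log(denom)
    return ll


def relative_risk_per_increment(beta, increment):
    """Relative risk for an `increment` rise in exposure, e.g. one IQR."""
    return math.exp(beta * increment)


# Hypothetical exposures (case period, then two bidirectional control periods).
strata = [(62.0, [48.0, 55.0]), (71.0, [66.0, 80.0]), (39.0, [41.0, 30.0])]
print(conditional_log_likelihood(0.0, strata))  # equals -3 * log(3) at beta = 0

# Illustration only: a beta consistent with RR = 1.04 per IQR of 40.4.
beta = math.log(1.04) / 40.4
print(round(relative_risk_per_increment(beta, 40.4), 2))  # 1.04
```

Because each subject serves as their own control, subject-level constants cancel from the ratio inside the likelihood, which is why time-invariant confounders need not be modeled.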