Search | Korea Science

Effect of zero imputation methods for log-transformation of independent variables in logistic regression

Seo Young Park
- Communications for Statistical Applications and Methods / v.31 no.4 / pp.409-425 / 2024
Logistic regression models are commonly used in medical science and public health research to explain a binary health outcome variable using independent variables such as patient characteristics. Although logistic regression requires no distributional assumption for independent variables, variables with severely right-skewed distributions, such as lab values, are often log-transformed to achieve symmetry or approximate normality. However, lab values often contain zeros due to the limit of detection, which makes log-transformation impossible. Therefore, the zeros must be handled by preprocessing before log-transformation. In this study, five methods that remove zeros (shift by 1, shift by half of the smallest nonzero, shift by the square root of the smallest nonzero, replace zeros with half of the smallest nonzero, replace zeros with the square root of the smallest nonzero) are investigated in the logistic regression setting. To evaluate the performance of these methods, we performed a simulation study based on data randomly generated from a log-normal distribution and a logistic regression model. The shift by 1 method had the worst performance; overall, the shift by half of the smallest nonzero method, the replace zeros with half of the smallest nonzero method, and the replace zeros with the square root of the smallest nonzero method showed comparable and stable performance.
https://doi.org/10.29220/CSAM.2024.31.4.409
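
The five zero-handling methods named in the abstract above can be sketched as follows. This is a minimal illustration based only on the method names in the abstract, not the author's code; the function names and the method keys (`shift_1`, `shift_half_min`, and so on) are hypothetical labels chosen for this example.

```python
import math


def smallest_nonzero(values):
    """Return the smallest strictly positive observation.

    Assumes the data contain at least one positive value.
    """
    return min(v for v in values if v > 0)


def log_with_zero_handling(values, method):
    """Log-transform non-negative data after handling zeros.

    The method keys are hypothetical names for the five approaches
    listed in the abstract.
    """
    m = smallest_nonzero(values)
    if method == "shift_1":
        # Add 1 to every observation.
        adjusted = [v + 1 for v in values]
    elif method == "shift_half_min":
        # Add half of the smallest nonzero value to every observation.
        adjusted = [v + m / 2 for v in values]
    elif method == "shift_sqrt_min":
        # Add the square root of the smallest nonzero value to every observation.
        adjusted = [v + math.sqrt(m) for v in values]
    elif method == "replace_half_min":
        # Replace only the zeros; positive values are left unchanged.
        adjusted = [v if v > 0 else m / 2 for v in values]
    elif method == "replace_sqrt_min":
        # Replace only the zeros with the square root of the smallest nonzero.
        adjusted = [v if v > 0 else math.sqrt(m) for v in values]
    else:
        raise ValueError(f"unknown method: {method}")
    return [math.log(v) for v in adjusted]


# Hypothetical lab values with one zero (e.g. below the limit of detection).
lab_values = [0.0, 0.5, 2.0, 8.0]
for name in ("shift_1", "shift_half_min", "shift_sqrt_min",
             "replace_half_min", "replace_sqrt_min"):
    print(name, log_with_zero_handling(lab_values, name))
```

The shift methods change every observation, whereas the replace methods alter only the zeros and leave the positive values as observed.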

Imputation for Binary or Ordered Categorical Traits Based on the Bayesian Threshold Model (베이지안 분계점 모형에 의한 순서 범주형 변수의 대체)

Lee Seung-Chun
- The Korean Journal of Applied Statistics / v.18 no.3 / pp.597-606 / 2005
Nonresponse in sample surveys causes a problem when analyzing datasets in public-use files, where the user has only complete-data methods available and has limited information about the reasons for nonresponse. Imputation has recently become a standard approach for handling nonresponse, and various imputation methods have been devised. However, most imputation methods are concerned with continuous traits, while many interesting features in sample surveys are measured on binary or ordered categorical scales. In this note, an imputation method for ignorable nonresponse in binary or ordered categorical traits is considered.
https://doi.org/10.5351/KJAS.2005.18.3.597

A Comparative Study of Predictive Factors for Passing the National Physical Therapy Examination using Logistic Regression Analysis and Decision Tree Analysis

Kim, So Hyun;Cho, Sung Hyoun
- Physical Therapy Rehabilitation Science / v.11 no.3 / pp.285-295 / 2022
Objective: The purpose of this study is to use logistic regression and decision tree analysis to identify the factors that affect success or failure in the national physical therapy examination, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 76,727 subjects from the physical therapy national examination data provided by the Korea Health Personnel Licensing Examination Institute. The target variable was pass or fail, and the input variables were gender, age, graduation status, and examination area. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In the logistic regression analysis, subjects in their 20s (Odds ratio, OR=1, reference), expected to graduate (OR=13.616, p<0.001) and from the examination area of Jeju-do (OR=3.135, p<0.001), had a high probability of passing. In the decision tree, the predictive factors with the greatest influence on the passing result were, in order, graduation status (χ²=12366.843, p<0.001) and examination area (χ²=312.446, p<0.001). Logistic regression analysis showed a specificity of 39.6% and a sensitivity of 95.5%, while decision tree analysis showed a specificity of 45.8% and a sensitivity of 94.7%. In classification accuracy, logistic regression and decision tree analysis showed 87.6% and 88.0% prediction, respectively. Conclusions: Both logistic regression and decision tree analysis were adequate to explain the predictive model. Additionally, whether actual test takers passed the national physical therapy examination could be determined by applying the constructed prediction model and prediction rate.
https://doi.org/10.14474/ptrs.2022.11.3.285
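
The sensitivity, specificity, and classification accuracy quoted in the abstract above are all derived from a 2x2 confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from the study, and the function name is chosen only for this example.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and accuracy from confusion counts.

    tp: actual positives predicted positive (e.g. passed, predicted pass)
    fn: actual positives predicted negative
    tn: actual negatives predicted negative
    fp: actual negatives predicted positive
    """
    sensitivity = tp / (tp + fn)          # share of actual positives found
    specificity = tn / (tn + fp)          # share of actual negatives found
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only.
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=40, fp=60)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

When one outcome class is much more common than the other, as with a high pass rate, accuracy can be high even though specificity is low, which is why the three figures are reported together.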

A Comparative Study of Predictive Factors for Hypertension using Logistic Regression Analysis and Decision Tree Analysis

SoHyun Kim;SungHyoun Cho
- Physical Therapy Rehabilitation Science / v.12 no.2 / pp.80-91 / 2023
Objective: The purpose of this study is to identify factors that affect the incidence of hypertension using logistic regression and decision tree analysis, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 9,859 subjects from the Korean health panel annual 2019 data provided by the Korea Institute for Health and Social Affairs and National Health Insurance Service. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In logistic regression analysis, those who were 60 years of age or older (Odds ratio, OR=68.801, p<0.001), those who were divorced/widowed/separated (OR=1.377, p<0.001), those who graduated from middle school or younger (OR=1, reference), those who did not walk at all (OR=1, reference), those who were obese (OR=5.109, p<0.001), and those who had poor subjective health status (OR=2.163, p<0.001) were more likely to develop hypertension. In the decision tree, those over 60 years of age, overweight or obese, and those who graduated from middle school or younger had the highest probability of developing hypertension at 83.3%. Logistic regression analysis showed a specificity of 85.3% and a sensitivity of 47.9%, while decision tree analysis showed a specificity of 81.9% and a sensitivity of 52.9%. In classification accuracy, logistic regression and decision tree analysis showed 73.6% and 72.6% prediction, respectively. Conclusions: Both logistic regression and decision tree analysis were adequate to explain the predictive model. Both analysis methods are considered useful for constructing a predictive model for hypertension.
https://doi.org/10.14474/ptrs.2023.12.2.80
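
The odds ratios reported in the abstract above are exponentiated logistic regression coefficients, which is why a reference category is listed with OR=1 (its coefficient is 0). The sketch below shows that relationship; the function names are hypothetical, and the coefficient is back-calculated from a reported OR purely for illustration, since the abstract does not give the fitted coefficients.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def probability_from_log_odds(log_odds):
    """Inverse logit: convert a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# A reference category has coefficient 0, hence OR = 1.
print(odds_ratio(0.0))

# Illustration only: the coefficient that would correspond to OR = 5.109.
beta_obese = math.log(5.109)
print(beta_obese, odds_ratio(beta_obese))
```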

A Probability Mapping for Land Cover Change Prediction using CLUE Model (토지피복변화 예측을 위한 CLUE 모델의 확률지도 생성)

Oh, Yun-Gyeong;Choi, Jin-Yong;Bae, Seung-Jong;Yoo, Seung-Hwan;Lee, Sang-Hyun
- Journal of Korean Society of Rural Planning / v.16 no.2 / pp.47-55 / 2010
Land cover and land use change data are important in many studies, including climate change and hydrological studies. Although various theories and models have been developed, it is difficult to identify the driving factors of land use change because it is related to policy options and to natural and socio-economic conditions. This study attempts to simulate land cover change using the CLUE model, based on a statistical analysis of land use change. The CLUE model dynamically models the competition among land uses in relation to driving forces, so it depends on the statistical relations between land use change and driving factors. In this study, Yongin, Icheon and Anseong were selected as the study areas, and binary logistic regression and factor analysis were performed and verified with the ROC curve. A land cover probability map was also prepared for comparison with the land cover data; the higher-probability areas matched the present land cover well, demonstrating the applicability of the CLUE model.

Optimization Method of Knapsack Problem Based on BPSO-SA in Logistics Distribution

Zhang, Yan;Wu, Tengyu;Ding, Xiaoyue
- Journal of Information Processing Systems / v.18 no.5 / pp.665-676 / 2022
In modern logistics, the effective use of vehicle volume and loading capacity reduces the logistics cost. Many heuristic algorithms can solve this knapsack problem, but many of them share a drawback: they often fall into locally optimal solutions. This paper proposes a fusion optimization method based on the simulated annealing algorithm (SA) and the binary particle swarm optimization algorithm (BPSO). We establish a logistics knapsack model for the fusion optimization algorithm. Then, a new model of an express logistics simulation system is used to compare three algorithms. The experiment verifies the effectiveness of the proposed algorithm. The experimental results show that the BPSO-SA algorithm can improve the utilization rate and the load rate of logistics distribution vehicles, so the number of vehicles used for distribution and the average driving distance are reduced. The purposes of the logistics knapsack problem optimization are achieved.
https://doi.org/10.3745/JIPS.01.0090

Compliance Level with Therapeutic Regimen of Medication and Life Style among Patients with Hypertension in Rural Communities (일 농촌지역 고혈압 환자의 치료적 요법의 이행수준 - 약물복용과 생활습관을 중심으로 -)

Ahn, Yang-Heui
- Journal of Korean Public Health Nursing / v.21 no.2 / pp.125-133 / 2007
Purpose: To identify the compliance level with therapeutic regimen among patients with hypertension residing in rural communities. Method: A descriptive-retrospective research design was employed. One hundred patients with hypertension using 8 Primary Health Care Posts under W Public Health Center were randomly recruited on the basis of being over 35 years of age. After obtaining written consent, the patients underwent direct interviews with a structured questionnaire carried out by 8 public health practitioners. Descriptive statistics and binary logistic regression were utilized. Results: In a binary logistic regression model adjusted for age, sex, education, income, and occupation, those who were receiving medication (OR=5.34), were undergoing a weight control program (OR=4.45), restricted alcohol (OR=9.93), or smoking cessation (OR=25.59) as recommended by medical or health professionals were more compliant (p<.05) while those under a low salt diet, exercise, and stress management were not significant statistically (p>.05). Conclusions: Further research should be conducted to validate these findings so as to facilitate the development of nursing intervention strategies for improving the compliance of hypertensive patients in respect to medication and life style modification.

Analysis of Stress level of Korean Household Members due to Household Debt (한국국민의 가계 금융부채에 대한 체감도 분석)

Oh, Man-Suk;Hyun, Seung-Me
- The Korean Journal of Applied Statistics / v.22 no.2 / pp.297-307 / 2009
Korean household debt is one of the main sources of the current financial crisis. This paper studies the impact of household attributes, such as type of housing (self-owned or rented), education, age, average monthly income of the head of household, and area of residence, on the stress level of household members due to household debt. We analyze a real data set collected by KB Kookmin Bank in 2004. We treat low versus high stress level as a binary response variable and use a logistic regression model with the attributes of household members as explanatory variables. A simple but well-fitting model is selected by the backward elimination method based on the likelihood statistic for the goodness-of-fit test, and the impact of the attributes on the stress level is studied from the parameter estimates of the selected model. We also perform a similar analysis on a binary response variable that distinguishes households with no debt from the rest. According to the analysis, the stress level tends to be low for households with self-owned houses, high average monthly income, low education level, and young members.
https://doi.org/10.5351/KJAS.2009.22.2.297

Characteristics and Influencing Factors of Red Light Running (RLR) Crashes (신호위반사고의 특성과 영향요인 분석)

Park, Jeong Soon;Jung, Yong Il;Kim, Yun Hwan
- Journal of Korean Society of Transportation / v.32 no.3 / pp.198-206 / 2014
According to the statistics of the National Police Agency, red light running (RLR) crashes represent a significant safety issue throughout Korea. This study deals with the RLR crashes that occurred at signalized intersections in Cheongju. The objectives of this study are to compare the characteristics of RLR crashes and Non-RLR crashes, and to identify influencing factors using a Binary Logistic Regression (BLR) model. To this end, the study pays particular attention to testing the differences between the two groups using data on 2,246 RLR and 3,884 Non-RLR crashes (2007-2011). The main results are as follows. First, many RLR crashes occurred at night and while vehicles were going straight. Second, the differences between RLR and Non-RLR crashes were clearly defined by crash type, maneuver of vehicle before crash, age of driver (30s, 50s), alcohol use and accident pattern. Finally, a statistically significant model (Hosmer and Lemeshow test: 7.052, p-value: 0.531) was developed through the BLR model.
https://doi.org/10.7470/jkst.2014.32.3.198

Comparison Study of Multi-class Classification Methods

Bae, Wha-Soo;Jeon, Gab-Dong;Seok, Kyung-Ha
- Communications for Statistical Applications and Methods / v.14 no.2 / pp.377-388 / 2007
Among multi-class classification methods, the ECOC (Error Correcting Output Coding) method is known to have a low classification error rate. This paper aims to suggest an effective multi-class classification method (1) by comparing various encoding and decoding methods within the ECOC method and (2) by comparing the ECOC method with the direct classification method. Both the SVM (Support Vector Machine) and the logistic regression model were used as binary classifiers in the comparison.
https://doi.org/10.5351/CKSS.2007.14.2.377
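
To make the ECOC idea in the abstract above concrete: each class is assigned a binary codeword, one binary classifier is trained per codeword bit, and a new observation is assigned to the class whose codeword is closest to the vector of classifier outputs. The sketch below shows only one simple decoding rule (minimum Hamming distance); the code matrix and class labels are hypothetical, and the paper compares several encoding and decoding schemes that are not reproduced here.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(predicted_bits, code_matrix):
    """Return the class whose codeword is nearest to the predicted bits.

    predicted_bits: outputs of the binary classifiers, one bit per column.
    code_matrix: dict mapping class label -> codeword (tuple of bits).
    """
    return min(code_matrix,
               key=lambda label: hamming_distance(predicted_bits, code_matrix[label]))


# Hypothetical 4-class, 5-bit code with minimum pairwise distance 3,
# so any single wrong classifier output can still be corrected.
CODE_MATRIX = {
    "a": (0, 0, 0, 0, 0),
    "b": (0, 1, 1, 0, 1),
    "c": (1, 0, 1, 1, 0),
    "d": (1, 1, 0, 1, 1),
}

# Codeword for "b" with the last bit flipped by one erring classifier.
print(ecoc_decode((0, 1, 1, 0, 0), CODE_MATRIX))
```

In practice the five bits would come from five trained binary classifiers, such as the SVM or logistic regression models mentioned in the abstract.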

