Integrated Search | Korea Science

Two-stage imputation method to handle missing data for categorical response variable

Jong-Min Kim;Kee-Jae Lee;Seung-Joo Lee
- Communications for Statistical Applications and Methods
- Vol. 30, No. 6
- pp.577-587
- 2023
Conventional categorical data imputation techniques, such as mode imputation, often encounter issues related to overestimation. If the variable has too many categories, the multinomial logistic regression imputation method may be infeasible due to computational limitations. To address these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify significant variables for the target categorical variable. In the second stage, we use these selected variables as predictors: logistic regression imputes missing data in binary variables, polytomous regression imputes missing data in categorical variables, and predictive mean matching imputes missing data in quantitative variables. Through analysis of both asymmetric and non-normal simulated and real data, we demonstrate that the two-stage imputation method outperforms imputation methods lacking variable selection, as evidenced by accuracy measures. In the analysis of real survey data, we also demonstrate that the proposed two-stage imputation method surpasses the current imputation approach in terms of accuracy.
https://doi.org/10.29220/CSAM.2023.30.6.577
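
The two-stage structure described in this abstract (select variables first, then impute using only the selected ones) can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' method: stage 1 replaces Boruta with a simple "conditional mode accuracy" score, and stage 2 replaces polytomous regression with a conditional mode lookup. All function names, the `min_gain` threshold, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def mode(values):
    """Most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


def conditional_mode_accuracy(rows, predictor, target):
    """Share of rows whose target equals the modal target of their predictor level."""
    by_level = defaultdict(list)
    for r in rows:
        by_level[r[predictor]].append(r[target])
    hits = sum(Counter(v).most_common(1)[0][1] for v in by_level.values())
    return hits / len(rows)


def select_predictors(rows, candidates, target, min_gain=0.05):
    """Stage 1 (stand-in for Boruta, an assumption): keep predictors that
    beat the global-mode baseline accuracy by at least min_gain."""
    baseline = Counter(r[target] for r in rows).most_common(1)[0][1] / len(rows)
    return [p for p in candidates
            if conditional_mode_accuracy(rows, p, target) - baseline >= min_gain]


def impute_target(data, candidates, target):
    """Stage 2 (stand-in for polytomous regression, an assumption): fill each
    missing target with the modal target among complete rows that share the
    same values of the selected predictors."""
    complete = [r for r in data if r[target] is not None]
    selected = select_predictors(complete, candidates, target)
    fallback = mode([r[target] for r in complete])
    groups = defaultdict(list)
    for r in complete:
        groups[tuple(r[p] for p in selected)].append(r[target])
    imputed = []
    for r in data:
        r = dict(r)  # copy so the input rows are left unchanged
        if r[target] is None:
            key = tuple(r[p] for p in selected)
            r[target] = mode(groups[key]) if key in groups else fallback
        imputed.append(r)
    return selected, imputed


# Toy data: "region" determines "grade"; "noise" carries no information.
data = [
    {"region": "north", "noise": "a", "grade": "low"},
    {"region": "north", "noise": "b", "grade": "low"},
    {"region": "north", "noise": "a", "grade": "low"},
    {"region": "north", "noise": "b", "grade": "low"},
    {"region": "south", "noise": "a", "grade": "high"},
    {"region": "south", "noise": "b", "grade": "high"},
    {"region": "south", "noise": "a", "grade": "high"},
    {"region": "south", "noise": "b", "grade": "high"},
    {"region": "south", "noise": "a", "grade": None},
    {"region": "north", "noise": "b", "grade": None},
]
selected, imputed = impute_target(data, ["region", "noise"], "grade")
print(selected)
print([r["grade"] for r in imputed[-2:]])
```

The point of the sketch is only the ordering of the steps: the uninformative predictor is dropped before imputation, so it cannot fragment the groups used to fill in missing values.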

Factors Influencing Adolescent Binge Drinking: Focused on Environmental Variables

이진화;권민;남은정
- 한국학교보건학회지
- Vol. 35, No. 3
- pp.133-142
- 2022
Purpose: The purpose of the study was to investigate the effect of the environment on adolescent binge drinking. Methods: The study was designed as a cross-sectional study. Data came from the 17th (2021) Korea Youth Risk Behavior Web-based Survey; the target population of the raw data was 2,629,588 people, and the final analysis sample comprised 54,848 people. A Rao-Scott χ² test and univariate multinomial logistic regression analysis were performed using IBM SPSS 27.0. Results: The variables related to binge drinking in both the univariate and multivariate logistic regression analyses were gender, school level, academic achievement, sleep satisfaction, current smoking, daily smoking, and alcohol education experience. Conclusion: The analysis identified several variables, across socio-environmental areas such as individuals, communities, and national policies, that increase the likelihood of problematic drinking behavior among Korean adolescents. For effective prevention and intervention, it is necessary to develop programs that build a healthy environmental support system, backed by national policies and encompassing individuals, peer groups, and communities.
https://doi.org/10.15434/kssh.2022.35.3.133 KSCI

Prediction of fine dust PM₁₀ using a deep neural network model

전성현;손영숙
- 응용통계연구
- Vol. 31, No. 2
- pp.265-285
- 2018
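In this study, a deep neural network model was used to predict fine dust PM₁₀ under a four-grade classification ('good, moderate, bad, very bad') and a two-grade classification ('good or moderate, bad or very bad'). On daily fine dust data observed from 2010 to 2015 in six major metropolitan areas in Korea, the deep neural network model achieved higher accuracy than the existing classification techniques that were applied: a neural network model, a multinomial logistic regression model, Support Vector Machine, and Random Forest.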
https://doi.org/10.5351/KJAS.2018.31.2.265 KSCI

Failure Analysis to Derive the Causes of Abnormal Condition of Electric Locomotive Subsystem

소민섭;전홍배;신종호
- 산업경영시스템학회지
- Vol. 41, No. 2
- pp.84-94
- 2018
In recent years, reducing operation and maintenance costs through advanced maintenance technology has attracted the attention of many companies. The heavy machinery industry in particular regards it as a crucial problem, since a failure of heavy machinery incurs high cost and long downtime. To improve the current maintenance process, the heavy machinery industry is trying to develop a methodology that predicts failures in advance and finds their causes using usage data. A better analysis of failure causes requires more data, so various kinds of sensors are attached to machines and an abundant amount of product usage data is collected through the sensor network. However, the systematic analysis of the collected product usage data is still in its infancy. Many previous works have focused on failure occurrence as statistical data for reliability analysis. Fewer works have applied product usage data to root cause analysis of product failure. The product usage data collected while failures occur should be considered in failure cause analysis. To this end, this study proposes a methodology for applying product usage data to failure cause analysis. The proposed methodology is composed of several steps that transform product usage data into failure causes. Various statistical analyses, such as multinomial logistic regression and the t-test, are combined with product usage data for the root cause analysis. The proposed methodology is applied to field data from locomotives in operation, and the analysis result shows its effectiveness.
https://doi.org/10.11627/jkise.2018.41.2.084 KSCI

Statistical Analysis of Factors Associated with Facial Bone Fractures

서용훈;김영준
- 대한두개안면성형외과학회지
- Vol. 13, No. 1
- pp.36-40
- 2012
Purpose: Statistical analysis of facial bone fractures has been performed in various papers. However, reports on risk factors for facial bone fractures are rare. In order to prevent facial bone fractures, it is important to determine the risk factors for their occurrence. This study performs a statistical analysis to identify the risk factors associated with facial bone fractures. Methods: A retrospective chart review was performed to assess facial bone fractures in patients presenting from October 2009 to January 2011. The data collected included age, gender, etiology, and alcohol consumption. Data were analyzed using multinomial logistic regression analysis. The significance level was set at p<0.05 and SAS ver. 9.2 was used. Results: A total of 489 patients were analyzed. The patients' ages ranged from 2 to 85 years (mean age, $31.8 \pm 15.4$ years). The ratio of men to women was 5.0:1. The predominant age group was those under 19 years old (30.9%). The main causes of facial bone fractures were assaults (37.8%), falls (27.2%), and sports accidents (19.5%). On multinomial logistic regression analysis, age, especially in the teen group, was associated with assaults (p<0.05) resulting in facial bone fractures. Alcohol consumption was significantly associated with assaults and falls (p<0.05) leading to facial bone fractures. Conclusion: Facial bone fracture is a challenging problem because of its high incidence and financial cost. The findings of this study indicate that more effective policies aimed at reducing alcohol intake and teenage violence are needed.
https://doi.org/10.7181/acfs.2012.13.1.36

A Study on Influence of Fishing Villages Experience Program Choice by the Tourist Characteristics

이서구;최규철;김정태
- 농촌계획
- Vol. 26, No. 3
- pp.1-12
- 2020
The purpose of this study is to analyze how tourist characteristics influence the choice of fishing village experience programs. Multinomial logistic regression was used as the analysis method. For the dependent variable, about 70 fishing experience programs operated by fishing village experience recreation villages, such as tidal-flat experience, fishery experience, and fishing experience, were grouped into 9 program types. The independent variables consisted of 7 characteristics: gender, age, marital status, presence of children, experience of visiting rural and fishing experience villages, preferred type of experience recreation village, and awareness of fishing village experience villages. The analysis found no significant differences for the groups preferring 'fishing culture experience', 'leports experience', 'ecological craft experience', and 'festival and event experience' relative to the group choosing 'rural experience'. On the other hand, for the 'tidal flat experience', 'married' was about 14 times higher than 'unmarried', and the group preferring 'fishing village experience' was 9.55 times higher than the group preferring 'rural village experience'. For the 'fishery experience' and 'fishing experience', the group preferring 'fishing experience recreation village' was 9.21 times and 14.34 times higher, respectively, than the group preferring 'rural experience recreation village'. For the 'food experience', 'married' was 25 times higher than 'unmarried'.
https://doi.org/10.7851/Ksrp.2020.26.3.001 KSCI
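
For readers interpreting "times higher" figures like those in this abstract, the block below gives the standard baseline-category multinomial logit model. It is offered as background, assuming the reported multiples are odds ratios relative to the reference choice ('rural experience'); the abstract does not state this explicitly, and the symbols $J$, $\alpha_j$, $\beta_j$, and $x_k$ are notation introduced here.

```latex
% Baseline-category logit with reference category J
\log \frac{P(Y = j \mid x)}{P(Y = J \mid x)} = \alpha_j + \beta_j^{\top} x,
\qquad j = 1, \dots, J - 1

% Odds ratio for a binary predictor x_k (e.g. married = 1, unmarried = 0)
\mathrm{OR}_{jk}
  = \frac{P(Y = j \mid x_k = 1) / P(Y = J \mid x_k = 1)}
         {P(Y = j \mid x_k = 0) / P(Y = J \mid x_k = 0)}
  = \exp(\beta_{jk})
```

Under that assumption, a reported multiple of about 14 would correspond to a coefficient of roughly $\ln 14 \approx 2.64$.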

Trajectories of Self-rated Health among One-person Households: A Latent Class Growth Analysis

김은주;김향;윤주영
- 지역사회간호학회지
- Vol. 30, No. 4
- pp.449-459
- 2019
Purpose: The aim of this study is to explore different types of self-rated health trajectories among one-person households in Korea. Methods: We used five time-point data derived from the Korea Health Panel (2011~2015). Latent growth curve modeling was used to assess the overall shape of the self-rated health trajectory in one-person households, and latent class growth modeling was used to determine the number and shape of trajectories. We then applied multinomial logistic regression to each class to explore the predictor variables. Results: We found that the overall slope of self-rated health in one-person households decreases. In addition, latent class analysis identified three classes: 1) High-Decreasing class (i.e., high intercept, significantly decreasing slope), 2) Moderate-Decreasing class (i.e., average intercept, significantly decreasing slope), and 3) Low-Stable class (i.e., low intercept, flat and nonsignificant slope). The multinomial logistic regression analysis showed that the predictors of each class were different. In particular, one-person households with poor initial health were at greater risk of belonging to the Low-Stable class than to the High-Decreasing class. Conclusion: The findings of this study demonstrate that more attention to one-person households is needed to promote their health status. Policymakers may develop different health and welfare programs depending on the characteristics of one-person household trajectory groups in Korea.
https://doi.org/10.12799/jkachn.2019.30.4.449 KSCI

Safety Attitudes among Vietnamese Medical Staff in a Vietnam Disadvantaged Area: Latent Class Analysis

Thang Huu Nguyen;Thanh Hai Pham;Hue Thi Vu;Minh-Nguyet Thi Doan;Huong Thanh Tran;Mai Phuong Nguyen
- 한국의료질향상학회지
- Vol. 30, No. 1
- pp.3-14
- 2024
Purpose: We conducted this study to characterize safety attitudes (SA) among medical staff in a disadvantaged area of Vietnam and to examine factors associated with SA. Methods: A cross-sectional survey was conducted on 442 health staff members at four hospitals in Son La Province from June until August 2021. We used the Vietnamese shortened edition of the Safety Attitudes Questionnaire to measure the SA of study participants. We chose latent class analysis (LCA) to identify the number of latent classes of SA among the study subjects. Multinomial logistic regression was used to examine factors associated with the identified SA classes. Results: The results of our LCA showed that there were three latent classes, namely a high SA group (n=150, 33.9%), a moderate SA group (n=236, 53.4%), and a low SA group (n=56, 12.7%). The multinomial logistic regression analysis found that medical staff who had university education and above, who were nurses, and who served in non-clinical areas were more likely to be in the moderate SA group and in the high SA group than in the low SA group. Conclusion: Based on these results, several recommendations could be made to improve the SA of healthcare workers in disadvantaged areas. Further research with larger sample sizes and more diverse populations is needed to confirm these findings and to develop effective interventions.
https://doi.org/10.14371/QIH.2024.30.1.3

Latent Classes of Depressive Symptom Trajectories of Adolescents and Determinants of Classes

김은주
- 지역사회간호학회지
- Vol. 33, No. 3
- pp.299-311
- 2022
Purpose: Untreated depression in adolescents affects their entire life. It is important to detect depression in adolescence and intervene early, considering that adolescents' depressive symptoms are accompanied by internalization and externalization. The aim of this study was to identify latent classes of depressive symptom trajectories of adolescents in Korea and the determinants of those classes. Methods: The three time-point (2018~2020) data derived from the Korean Children and Youth Panel Survey 2018 were used (N=2,325). Latent Growth Curve Modeling (LGCM) was conducted to explore the depressive symptom trajectories in all adolescents, and Latent Class Growth Modeling (LCGM) was conducted to identify each latent class. Multinomial logistic regression analysis was performed to confirm the determinants of each latent class. Results: The LGCM results showed no statistically significant change in depressive symptoms across all adolescents over 3 years. However, the LCGM results distinguished four latent classes with different trajectories: 1) low-stable (intercept=14.39, non-significant slope), 2) moderate-increasing (intercept=19.62, significantly increasing slope), 3) high-stable (intercept=26.30, non-significant slope), and 4) high-rapidly decreasing (intercept=26.34, significantly rapidly decreasing slope). The multinomial logistic regression analysis showed that the significant determinants (i.e., gender, self-esteem, aggression, somatization, peer relationship) of each latent class were different. Conclusion: When screening adolescents for depression, it is necessary to monitor not only direct depressive symptoms but also self-esteem, aggression, somatization symptoms, and peer relationships. The findings of this study may be valuable for nurses and policy makers developing mental health programs for adolescents.
https://doi.org/10.12799/jkachn.2022.33.3.299 KSCI

Ranking subjects based on paired compositional data with application to age-related hearing loss subtyping

Nam, Jin Hyun;Khatiwada, Aastha;Matthews, Lois J.;Schulte, Bradley A.;Dubno, Judy R.;Chung, Dongjun
- Communications for Statistical Applications and Methods
- Vol. 27, No. 2
- pp.225-239
- 2020
Analysis approaches for single compositional data are well established; however, effective analysis strategies for paired compositional data remain to be investigated. The current project was motivated by studies of age-related hearing loss (presbyacusis), where subjects are classified into four audiometric phenotypes that need to be ranked within these phenotypes based on their paired compositional data. We address this challenge by formulating this problem as a classification problem and integrating a penalized multinomial logistic regression model with compositional data analysis approaches. We utilize Elastic Net for a penalty function, while considering average, absolute difference, and perturbation operators for compositional data. We applied the proposed approach to the presbyacusis study of 532 subjects with probabilities that each ear of a subject belongs to each of four presbyacusis subtypes. We further investigated the ranking of presbyacusis subjects using the proposed approach based on previous literature. The data analysis results indicate that the proposed approach is effective for ranking subjects based on paired compositional data.
https://doi.org/10.29220/CSAM.2020.27.2.225 KSCI
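
The operators named in this abstract can be illustrated with a short sketch. It assumes the textbook definitions: perturbation is the component-wise product re-normalized to sum to one, and the Elastic Net penalty uses the common parametrization λ[α‖β‖₁ + (1 − α)/2 ‖β‖₂²]. The exact forms of the average and absolute-difference operators used in the paper are not given in the abstract, so the versions below, the function names, and the example vectors for the two ears are all assumptions.

```python
import math


def closure(parts):
    """Rescale positive parts so they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]


def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])


def average(x, y):
    """Component-wise arithmetic mean (assumed form), re-closed for safety."""
    return closure([(a + b) / 2 for a, b in zip(x, y)])


def abs_difference(x, y):
    """Component-wise absolute difference (assumed form; not a composition)."""
    return [abs(a - b) for a, b in zip(x, y)]


def elastic_net_penalty(coefs, lam, alpha):
    """lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    l1 = sum(abs(c) for c in coefs)
    l2 = sum(c * c for c in coefs)
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2)


# Hypothetical subtype probabilities for the left and right ear of one subject.
left = [0.7, 0.1, 0.1, 0.1]
right = [0.4, 0.3, 0.2, 0.1]

combined = perturb(left, right)
print(combined, sum(combined))
print(average(left, right))
print(abs_difference(left, right))
print(elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=0.5))
```

Each operator turns a pair of four-part vectors into a single feature vector, which is the kind of input a penalized multinomial classifier could then consume.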

Search results: 133 items (processing time 0.028 seconds)
