Search | Korea Science

A Comparative Study of Microarray Data with Survival Times Based on Several Missing Mechanism

Kim Jee-Yun;Hwang Jin-Soo;Kim Seong-Sun
- Communications for Statistical Applications and Methods / v.13 no.1 / pp.101-111 / 2006
One of the most widely used methods of handling missingness in microarray data is the kNN (k-nearest neighbor) method. Recently, Li and Gui (2004) proposed the so-called PCR (partial Cox regression) method, which handles censored survival times and microarray data efficiently via kNN imputation. In this article, we show that the way missingness is treated affects the subsequent statistical analysis.
https://doi.org/10.5351/CKSS.2006.13.1.101
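
The kNN imputation mentioned in the abstract above can be illustrated with a minimal sketch. It shows the generic idea only (fill a missing entry with the average of that column over the k most similar rows) and is not necessarily the exact variant used by the authors or by Li and Gui (2004). The function name `knn_impute`, the toy matrix, and the choice of a root-mean-squared distance over jointly observed columns are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""

    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours must have column j observed.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if math.isfinite(c[0])]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Toy expression matrix (rows = genes, columns = samples); values are made up.
expression = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.8],
    [5.0, 6.0, 9.0],
]
imputed = knn_impute(expression, k=2)
```

Here the missing entry in the second row is filled from the first and third rows, which are closest on the observed columns; the distant fourth row is ignored.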

Bayesian Pattern Mixture Model for Longitudinal Binary Data with Nonignorable Missingness

Kyoung, Yujung;Lee, Keunbaik
- Communications for Statistical Applications and Methods / v.22 no.6 / pp.589-598 / 2015
In longitudinal studies, missing data are common and require a complicated analysis. There are two popular modeling frameworks for analyzing missing data: pattern mixture models (PMM) and selection models (SM). We focus on the PMM and propose Bayesian pattern mixture models using generalized linear mixed models (GLMMs) for longitudinal binary data. Sensitivity analysis is used under the missing not at random assumption.
https://doi.org/10.5351/CSAM.2015.22.6.589

A Bayesian uncertainty analysis for nonignorable nonresponse in two-way contingency table

Woo, Namkyo;Kim, Dal Ho
- Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1547-1555 / 2015
We study the problem of nonignorable nonresponse in a two-way contingency table in which one or both categories may be missing. We describe a nonignorable nonresponse model for the analysis of a two-way categorical table. One approach to analyzing such data is to construct several tables (one complete and the others incomplete). The incomplete tables contain nonidentifiable parameters. We describe a hierarchical Bayesian model for two-way categorical data. We use a nonignorable nonresponse model with Bayesian uncertainty analysis, placing priors on the nonidentifiable parameters instead of performing a sensitivity analysis for them. To reduce the effects of the nonidentifiable parameters, we project the parameters onto a lower-dimensional space and allow the reduced set of parameters to share a common distribution. We use the griddy Gibbs sampler to fit our models and compute DIC and BPP for model diagnostics. We illustrate our method using data from NHANES III to obtain the finite population proportions.
https://doi.org/10.7465/jkdi.2015.26.6.1547

Comparison of tree-based ensemble models for regression

Park, Sangho;Kim, Chanmin
- Communications for Statistical Applications and Methods / v.29 no.5 / pp.561-589 / 2022
When multiple classification and regression trees are combined, tree-based ensemble models such as random forest (RF) and Bayesian additive regression trees (BART) are produced. In this study, we compare the model structures and performances of various ensemble models in regression settings. RF learns from bootstrapped samples and selects a splitting variable from the predictors gathered at each node. The BART model is specified as a sum of trees and is fitted using the Bayesian backfitting algorithm. Through extensive simulation studies, we investigate the strengths and drawbacks of the two methods in the presence of missing data, high-dimensional data, or highly correlated data. In the presence of missing data, BART performs well in general, whereas RF provides adequate coverage. BART outperforms RF on high-dimensional, highly correlated data. However, in all of the scenarios considered, RF has a shorter computation time. The performance of the two methods is also compared using two real data sets that represent the aforementioned situations, and the same conclusion is reached.
https://doi.org/10.29220/CSAM.2022.29.5.561

Methods for Handling Incomplete Repeated Measures Data (불완전한 반복측정 자료의 보정방법)

Woo, Hae-Bong;Yoon, In-Jin
- Survey Research / v.9 no.2 / pp.1-27 / 2008
Problems of incomplete data are pervasive in statistical analysis. In particular, incomplete data have been an important challenge in repeated measures studies. The objective of this study is to give a brief introduction to missing data mechanisms and to conventional and recent missing data methods, and to assess the performance of various missing data methods under ignorable and non-ignorable missingness mechanisms. Given the inadequate attention paid to longitudinal studies with missing data, this study applied recent advances in missing data methods to repeated measures models and investigated the performance of various missing data methods, such as FIML (Full Information Maximum Likelihood Estimation) and MICE (Multivariate Imputation by Chained Equations), under MCAR, MAR, and MNAR mechanisms. Overall, the results showed that listwise deletion and mean imputation performed poorly compared to other recommended missing data procedures. The better performance of EM, FIML, and MICE was more noticeable under MAR than under MCAR. With non-ignorable missing data, this study showed that missing data methods did not perform well. In particular, this problem was noticeable in slope-related estimates. Therefore, this study suggests that if missing data are suspected to be non-ignorable, developmental research may underestimate true rates of change over the life course. This study also suggests that bias from non-ignorable missing data can be substantially reduced by considering rich information from variables related to missingness.
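
The three mechanisms named in the abstract above (MCAR, MAR, MNAR) can be made concrete with a small simulation. This is a hedged sketch, not the study's design: the data-generating model, the missingness probabilities (0.5, 0.8, 0.2), and the names `simulate` and `complete_case_mean` are assumptions chosen for illustration. It shows how listwise deletion (the complete-case mean) behaves when the probability of a missing `y` is constant (MCAR), depends on the observed covariate `x` (MAR), or depends on `y` itself (MNAR).

```python
import random


def complete_case_mean(values):
    """Mean of the observed (non-None) values, i.e. listwise deletion."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]  # true mean of y is 0

    def drop(prob_missing):
        # Replace y[i] by None with the per-row probability prob_missing(i).
        return [None if rng.random() < prob_missing(i) else y[i] for i in range(n)]

    return {
        # MCAR: missingness is unrelated to any variable.
        "MCAR": complete_case_mean(drop(lambda i: 0.5)),
        # MAR: missingness depends only on the fully observed x.
        "MAR": complete_case_mean(drop(lambda i: 0.8 if x[i] > 0 else 0.2)),
        # MNAR: missingness depends on the unobserved value y itself.
        "MNAR": complete_case_mean(drop(lambda i: 0.8 if y[i] > 0 else 0.2)),
    }


means = simulate()
```

In this setup the complete-case mean stays near the true value of 0 under MCAR but is pulled clearly below 0 under MAR and MNAR. Under MAR the bias can in principle be corrected using `x`; under MNAR it cannot without a model for the missingness itself.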

Filling in Hydrological Missing Data Using Imputation Methods (Imputation Method를 활용한 수문 결측자료의 보정)

Kang, Tae-Ho;Hong, Il-Pyo;Kim, Young-Oh
- Proceedings of the Korea Water Resources Association Conference / 2009.05a / pp.1254-1259 / 2009
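Historically observed hydrological data are analyzed and used for evaluating and forecasting with various hydrological models and for water resources policy decisions. However, the collected data contain missing values because of instrument malfunctions and the limits of the observation range. Vectors containing missing values have simply been excluded, or the mean has been used under the assumption that the data are linear over the interval containing the missing values, but this can distort the statistical properties of the data. This study seeks a way of filling in missing values that minimizes the loss and distortion of the information the data carry. Missingness is broadly classified as missing completely at random (MCAR), missing at random (MAR), and nonrandom missingness. Hydrological data generally fall under MAR, in which the periods containing missing values are not statistically identical to the other periods but the missing values can still be estimated, so the missing values were filled in under this assumption. Local Least Squares Imputation (LLSimpute) was used to estimate the missing values and was compared with linear interpolation, which has commonly been used for its simplicity. To evaluate applicability, 1-5% of missing values were randomly generated in the daily inflow data of Soyanggang Dam. For each amount of missing data, 100 sets were used to compare and evaluate the uncertainty range of the imputation for the applied methods, and the change in imputation performance as missingness increased was examined. Evaluating the two methods with the Normalized Root Mean Squared Error (NRMSE) showed that (1) the lower the proportion of missing data, the more effective the simple linear interpolation was; (2) as the proportion of missing data increased, however, linear interpolation showed progressively larger uncertainty and lower imputation performance, whereas (3) LLSimpute showed a consistent imputation performance and uncertainty range regardless of the increase in missingness.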
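
The evaluation procedure described in the abstract above (mask known values at random, fill them in, score with NRMSE) can be sketched for the linear interpolation baseline. This is an illustrative sketch under stated assumptions: the series is synthetic rather than the Soyanggang Dam inflow data, LLSimpute itself is not implemented, and the NRMSE is taken here as the RMSE over the masked positions divided by the standard deviation of the true values at those positions, which is one common definition and may differ from the one used in the paper. The names `linear_interpolate` and `nrmse` are hypothetical, and the series is assumed to contain at least one observed value.

```python
import math
import random
import statistics


def linear_interpolate(series):
    """Fill None gaps linearly between the nearest observed neighbours."""
    known = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]  # gap at the start: copy first observed value
        elif right is None:
            filled[i] = series[left]   # gap at the end: copy last observed value
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def nrmse(truth, estimate, missing_idx):
    """RMSE over the masked positions, divided by the std of the true values there."""
    squared = [(estimate[i] - truth[i]) ** 2 for i in missing_idx]
    rmse = math.sqrt(sum(squared) / len(squared))
    return rmse / statistics.pstdev([truth[i] for i in missing_idx])


# Synthetic smooth "daily" series; NOT the Soyanggang Dam inflow data.
rng = random.Random(0)
truth = [10 + 5 * math.sin(2 * math.pi * t / 50) for t in range(365)]
missing_idx = set(rng.sample(range(365), k=18))  # about 5% missing
masked = [None if t in missing_idx else v for t, v in enumerate(truth)]
score = nrmse(truth, linear_interpolate(masked), sorted(missing_idx))
```

Repeating the masking step with different seeds and missing fractions would give the spread of scores that the abstract refers to as the uncertainty range.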

Maximum a posteriori estimation based wind fragility analysis with application to existing linear or hysteretic shear frames

Wang, Vincent Z.;Ginger, John D.
- Structural Engineering and Mechanics / v.50 no.5 / pp.653-664 / 2014
Wind fragility analysis provides a quantitative instrument for delineating the safety performance of civil structures under hazardous wind loading conditions such as cyclones and tornados. It has attracted, and is expected to continue to attract, intensive research attention, particularly in the current worldwide context of adapting to the changing climate. One of the challenges hindering effective assessment of the safety performance of existing civil structures is the possible incompleteness of the structural appraisal data. Addressing this data missingness, the study presented in this paper is a first attempt to investigate the feasibility of using the expectation-maximization (EM) algorithm and Bayesian techniques to predict the wind fragilities of existing civil structures. Numerical examples of typical linear or hysteretic shear frames are introduced, with the wind loads derived from a widely used power spectral density function. Specifically, the application of the maximum a posteriori estimates of the distribution parameters for the story stiffness is examined, and a surrogate model is developed and applied to facilitate the nonlinear response computation when studying the fragilities of the hysteretic shear frame involved.
https://doi.org/10.12989/sem.2014.50.5.653

Development of a Machine Learning Model for Imputing Time Series Data with Massive Missing Values (결측치 비율이 높은 시계열 데이터 분석 및 예측을 위한 머신러닝 모델 구축)

Bangwon Ko;Yong Hee Han
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.3 / pp.176-182 / 2024
In this study, we compared and analyzed various methods of missing data handling to build a machine learning model that can effectively analyze and predict time series data with a high percentage of missing values. For this purpose, Predictive State Model Filtering (PSMF), MissForest, and Imputation By Feature Importance (IBFI) methods were applied, and their prediction performance was evaluated using LightGBM, XGBoost, and Explainable Boosting Machines (EBM) machine learning models. The results of the study showed that MissForest and IBFI performed the best among the methods for handling missing values, reflecting the nonlinear data patterns, and that XGBoost and EBM models performed better than LightGBM. This study emphasizes the importance of combining nonlinear imputation methods and machine learning models in the analysis and prediction of time series data with a high percentage of missing values, and provides a practical methodology.
https://doi.org/10.17661/jkiiect.2024.17.3.176

Search Result 8, Processing Time 0.018 seconds
