Search | Korea Science

Analysis of Incomplete Data with Nonignorable Missing Values

Kim, Hyun-Jeong
- Journal of the Korean Data and Information Science Society / v.13 no.2 / pp.167-174 / 2002
With "nonignorable missing data", it is necessary to assume a model for the missing-data mechanism suited to each situation. In this article we consider, as an example, a survey of individuals in which the data are income amounts, and we assume a model in which the probability that a value is missing increases with the value. The method maximizes the likelihood using the EM (Expectation-Maximization) algorithm, based on this missing-data mechanism, for the case of the exponential distribution. The method converged in a few iterations from arbitrary initial values. We varied the missing-data probability and the size of the artificial data set to show the accuracy of the estimates, and we then discuss the properties of the estimates.

BAYES EMPIRICAL BAYES ESTIMATION OF A PROPORTION UNDER NONIGNORABLE NONRESPONSE

Choi, Jai-Won;Nandram, Balgobin
- Journal of the Korean Statistical Society / v.32 no.2 / pp.121-150 / 2003
The National Health Interview Survey (NHIS) is one of the surveys used to assess the health status of the US population. One indicator of the nation's health is the total number of doctor visits made by the household members in the past year. There is substantial nonresponse among the sampled households, and the main issue we address here is that the nonresponse mechanism should not be ignored because respondents and nonrespondents differ. It is standard practice to summarize the number of doctor visits by the binary variable of no doctor visit versus at least one doctor visit by a household, for each of the fifty states and the District of Columbia. We consider a nonignorable nonresponse model that expresses uncertainty about ignorability through the ratio of the odds of a household doctor visit among respondents to the odds of a doctor visit among all households. This is a hierarchical model in which a nonignorable nonresponse model is centered on an ignorable nonresponse model. Another feature of this model is that it permits us to "borrow strength" across states, as in small area estimation; this helps because some of the parameters are weakly identified. However, for simplicity we assume that the hyperparameters are fixed but unknown, and we estimate them by the EM algorithm, which makes our method Bayes empirical Bayes. Our main result is that for some of the states the nonresponse mechanism can be considered nonignorable, and that 95% credible intervals for the probability of a household doctor visit and the probability that a household responds shed important light on the NHIS.

Fitting Hierarchical Generalized Linear Model for Nonignorable Drop-out Data in Longitudinal Study (경시적 연구에서 무시할 수 없는 중도탈락 자료의 분석을 위한 다단계 일반화 선형모형 적합)

Noh Maeng-Seok;Lee Moo-Song;Kang Wee-Chang;Ha Il-Do;Lee Youn-Gjo
- 대한예방의학회:학술대회논문집 (Korean Society for Preventive Medicine: Conference Proceedings) / 2002.10a / pp.197-198 / 2002

Bias-corrected imputation method for non-ignorable nonresponse with heteroscedasticity in super-population model (초모집단 모형의 오차가 이분산일 때 무시할 수 없는 무응답에서 편향수정 무응답 대체)

Yujin Lee;Key-Il Shin
- The Korean Journal of Applied Statistics / v.37 no.3 / pp.283-295 / 2024
Many studies have been conducted to properly handle nonresponse. Recently, many nonresponse imputation methods have been developed and put into practical use. Most imputation methods assume MCAR (missing completely at random) or MAR (missing at random). In contrast, there are relatively few studies on imputation under the assumption of MNAR (missing not at random) or NN (nonignorable nonresponse), in which nonresponse is affected by the study variable. MNAR causes bias and reduces the accuracy of imputation whenever the response probability is not properly estimated. Lee and Shin (2022) proposed a nonresponse imputation method that can be applied to nonignorable nonresponse assuming homoscedasticity in the super-population model. In this paper we propose a generalized version of the imputation method of Lee and Shin (2022) that improves the accuracy of estimation by removing the bias caused by MNAR under heteroscedasticity. In addition, the superiority of the proposed method is confirmed through simulation studies.
https://doi.org/10.5351/KJAS.2024.37.3.283

Banded vector heterogeneous autoregression models (밴드구조 VHAR 모형)

Sangtae Kim;Changryong Baek
- The Korean Journal of Applied Statistics / v.36 no.6 / pp.529-545 / 2023
This paper introduces the Banded-VHAR model, which is suitable for high-dimensional long-memory time series with a band structure. In the Banded-VHAR model, each dimension has nonignorable correlations only with adjacent dimensions because of features of the data, for example, geographical information. A row-wise estimation method is adopted for fast computation. In addition, two methods, namely the BIC and ratio methods, are proposed to estimate the width of the band. We demonstrate the asymptotic consistency of the proposed estimation methods through a simulation study. Real data applications to PM2.5 and apartment trading volume show that the Banded-VHAR model outperforms the traditional sparse VHAR model in forecasting and yields model coefficients that are easy to interpret.
https://doi.org/10.5351/KJAS.2023.36.6.529

Comparison of Trend Tests for Genetic Association on Censored Ages of Onset (미완결 발병연령에 근거한 연관성 추세 검정법의 비교)

Yoon, Hye-Kyoung;Song, Hae-Hiang
- The Korean Journal of Applied Statistics / v.21 no.6 / pp.933-945 / 2008
The genetic association test on an age-of-onset trait aims to detect a putative gene by means of linear rank tests for a significant trend in the onset distributions across genotypes. However, because subjects are selectively recruited with ages less than a pre-specified limit, the genotype groups are subject to substantially different censoring distributions, which is one reason for the low efficiency of the linear rank tests. In testing the equality of two survival distributions, the log-rank statistic is preferred to the Wilcoxon statistic when censored observations are nonignorable. Therefore, for more than two groups, we propose a generalized log-rank test for trend as a genetic association test. Monte Carlo studies are conducted to investigate the performance of the test statistics examined in this paper.
https://doi.org/10.5351/KJAS.2008.21.6.933

An Approach to Survey Data with Nonresponse: Evaluation of KEPEC Data with BMI (무응답이 있는 설문조사연구의 접근법 : 한국노인약물역학코호트 자료의 평가)

Baek, Ji-Eun;Kang, Wee-Chang;Lee, Young-Jo;Park, Byung-Joo
- Journal of Preventive Medicine and Public Health / v.35 no.2 / pp.136-140 / 2002
Objectives: A common problem in analyzing survey data is incomplete data arising from nonresponse or missing values. The mail questionnaire survey conducted in 1996 to collect lifestyle variables on the members of the Korean Elderly Pharmacoepidemiologic Cohort (KEPEC) contains some nonresponse or missing data. A suitable statistical method was applied to evaluate the missing-data pattern of a specific KEPEC data set, which had no missing data in the independent variables and missing data in the response variable, BMI. Methods: The study subjects were 8,689 elderly people. Initially, the BMI and the significant variables that influenced the BMI were categorized. After fitting the log-linear model, the probabilities of people falling in each category were estimated. The EM algorithm was implemented using a log-linear model to determine the missing-data mechanism causing the nonresponse. Results: Age, smoking status, and a preference for spicy hot food were chosen as variables that influenced the BMI. When the nonignorable and ignorable nonresponse log-linear models were fitted with these variables, the difference in deviance between the two models was 0.0034 (df=1). Conclusion: There is considerable risk in making an inference about the variables, even with large samples, without considering the pattern of missing data. On the basis of these results, the missing data in the BMI are ignorable nonresponse. Therefore, when analyzing the BMI in the KEPEC data, inference can be made without modeling the missing-data mechanism.
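As a rough check of why a deviance difference of 0.0034 on 1 degree of freedom points to ignorable nonresponse, the sketch below converts it to a tail probability. It assumes the difference is referred to a chi-square distribution with 1 degree of freedom, as in a likelihood-ratio comparison of nested models; the abstract does not state which test the authors used, and the function name chi2_sf_df1 is hypothetical.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    With 1 df, X = Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Deviance difference reported in the abstract (df = 1).
deviance_difference = 0.0034

# Assumption: the difference is compared with a chi-square(1) reference.
p_value = chi2_sf_df1(deviance_difference)
print(f"approximate p-value: {p_value:.2f}")
```

Under that assumption the tail probability is about 0.95, far above conventional significance levels, so the nonignorable model gives no detectable improvement in fit over the ignorable one.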

