• Title/Summary/Keyword: nonignorable

Search Result 17

Analysis of Incomplete Data with Nonignorable Missing Values

  • Kim, Hyun-Jeong
    • Journal of the Korean Data and Information Science Society / v.13 no.2 / pp.167-174 / 2002
  • In the case of "nonignorable missing data", it is necessary to assume a model for the missing-data mechanism in each situation. For example, when a survey of individuals collects income amounts, we may assume a model in which larger values have a higher probability of being missing. In this article, the likelihood is maximized with the EM (Expectation-Maximization) algorithm, based on the mechanism that generates missing data when the data follow an exponential distribution. The method converged within a few iterations from arbitrary initial values. We varied the missing-data probability and the size of the artificial data set to assess estimation accuracy, and we then discuss the properties of the estimates.
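The abstract does not state the exact missing-data mechanism, so the sketch below assumes a simple hypothetical one for illustration: values come from an exponential distribution with rate `lam`, and any value above a known threshold `c` goes missing with unknown probability `p`, so larger values are more likely to be missing. Under this assumption the E-step is closed-form (a missing value is known to exceed `c`, and by memorylessness its conditional mean is `c + 1/lam`). The function name and parameters are illustrative, not the paper's.

```python
import random

def em_exponential_threshold(y_obs, n_missing, c, lam0=1.0, tol=1e-10, max_iter=1000):
    """EM estimates (lam, p, iterations) for Exp(lam) data in which a value
    above a known threshold c goes missing with unknown probability p.
    This mechanism is a hypothetical stand-in for the one in the paper."""
    n = len(y_obs) + n_missing
    s_obs = sum(y_obs)
    n_obs_above = sum(1 for y in y_obs if y > c)
    # Every missing value exceeds c, so the expected number of units above c
    # is n_obs_above + n_missing; the M-step for p is therefore closed-form.
    denom = n_obs_above + n_missing
    p = n_missing / denom if denom else 0.0
    lam = lam0
    for it in range(1, max_iter + 1):
        # E-step: by memorylessness, y | missing ~ c + Exp(lam),
        # so E[y | missing] = c + 1 / lam.
        expected_sum = s_obs + n_missing * (c + 1.0 / lam)
        # M-step: complete-data MLE of the exponential rate.
        new_lam = n / expected_sum
        if abs(new_lam - lam) < tol:
            return new_lam, p, it
        lam = new_lam
    return lam, p, max_iter

# Simulated data: true rate 0.5, threshold 2.0, 60% of values above 2.0 lost.
rng = random.Random(1)
true_lam, c, true_p = 0.5, 2.0, 0.6
y_obs, n_missing = [], 0
for _ in range(5000):
    y = rng.expovariate(true_lam)
    if y > c and rng.random() < true_p:
        n_missing += 1
    else:
        y_obs.append(y)

lam_hat, p_hat, iters = em_exponential_threshold(y_obs, n_missing, c)
print(lam_hat, p_hat, iters)
```

Under this toy mechanism the EM fixed point also has a closed form, `len(y_obs) / (sum(y_obs) + n_missing * c)`, which makes the iteration easy to check; the iterative version is shown because it mirrors the EM approach described in the abstract and converges from any positive starting value `lam0`.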

BAYES EMPIRICAL BAYES ESTIMATION OF A PROPORTION UNDER NONIGNORABLE NONRESPONSE

  • Choi, Jai-Won;Nandram, Balgobin
    • Journal of the Korean Statistical Society / v.32 no.2 / pp.121-150 / 2003
  • The National Health Interview Survey (NHIS) is one of the surveys used to assess the health status of the US population. One indicator of the nation's health is the total number of doctor visits made by household members in the past year. There is substantial nonresponse among the sampled households, and the main issue we address here is that the nonresponse mechanism should not be ignored because respondents and nonrespondents differ. It is standard practice to summarize the number of doctor visits by a binary variable, no doctor visit versus at least one doctor visit by a household, for each of the fifty states and the District of Columbia. We consider a nonignorable nonresponse model that expresses uncertainty about ignorability through the ratio of the odds of a household doctor visit among respondents to the odds of a doctor visit among all households. This is a hierarchical model in which a nonignorable nonresponse model is centered on an ignorable nonresponse model. Another feature of this model is that it permits us to "borrow strength" across states, as in small area estimation; this helps because some of the parameters are weakly identified. However, for simplicity we assume that the hyperparameters are fixed but unknown and estimate them with the EM algorithm, which makes our method Bayes empirical Bayes. Our main result is that for some of the states the nonresponse mechanism can be considered nonignorable, and that 95% credible intervals for the probability of a household doctor visit and the probability that a household responds shed important light on the NHIS.
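The odds-ratio parameterization can be made concrete with a little arithmetic. The sketch below covers only that piece, not the hierarchical Bayes empirical Bayes model: with `gamma` = odds(visit | respondents) / odds(visit | all households), `gamma = 1` is the ignorable case, and any other value implies that nonrespondents differ from respondents. The helper names and the response-rate decomposition `p_all = r * p_resp + (1 - r) * p_nonresp` are illustrative assumptions.

```python
def proportion_all(p_resp, gamma):
    """Proportion among all households implied by the respondents' proportion
    p_resp and gamma = odds(respondents) / odds(all households)."""
    odds_resp = p_resp / (1.0 - p_resp)
    odds_all = odds_resp / gamma
    return odds_all / (1.0 + odds_all)

def proportion_nonrespondents(p_resp, gamma, response_rate):
    """Hypothetical helper: back out the nonrespondents' proportion from
    p_all = r * p_resp + (1 - r) * p_nonresp."""
    p_all = proportion_all(p_resp, gamma)
    return (p_all - response_rate * p_resp) / (1.0 - response_rate)

print(proportion_all(0.8, 1.0))                   # ignorable case: equals p_resp
print(proportion_all(0.8, 2.0))                   # about 0.667
print(proportion_nonrespondents(0.8, 2.0, 0.75))  # about 0.267
```

With 80% of respondents reporting a visit and `gamma = 2`, the implied proportion among nonrespondents (about 0.27) is far below the respondents' 0.8, which is the kind of respondent/nonrespondent difference the model is designed to capture.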

Bias-corrected imputation method for non-ignorable nonresponse with heteroscedasticity in super-population model (초모집단 모형의 오차가 이분산일 때 무시할 수 없는 무응답에서 편향수정 무응답 대체)

  • Yujin Lee;Key-Il Shin
    • The Korean Journal of Applied Statistics / v.37 no.3 / pp.283-295 / 2024
  • Many studies have been conducted to handle nonresponse properly, and many nonresponse imputation methods have recently been developed and put into practical use. Most imputation methods assume MCAR (missing completely at random) or MAR (missing at random). By contrast, relatively few studies address imputation under MNAR (missing not at random), or NN (nonignorable nonresponse), in which the response depends on the study variable. MNAR nonresponse causes bias and reduces the accuracy of imputation whenever the response probability is not properly estimated. Lee and Shin (2022) proposed a nonresponse imputation method for nonignorable nonresponse assuming homoscedasticity in the super-population model. In this paper we propose a generalized version of the method of Lee and Shin (2022) that improves the accuracy of estimation by removing the bias caused by MNAR under heteroscedasticity. Simulation studies confirm the superiority of the proposed method.
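The claim that MNAR nonresponse biases imputation can be illustrated with a small simulation. This sketch does not reproduce the bias correction of Lee and Shin; it only shows the problem the correction targets. It assumes a hypothetical heteroscedastic super-population model `y = 2x + sqrt(x) * e` (so `Var(y | x)` is proportional to `x`), a logistic response probability that falls as the residual grows, and ratio imputation fitted on respondents, which is model-unbiased under MAR for this variance structure.

```python
import math
import random

def mnar_imputation_bias(n=20000, seed=0):
    """Return (full-sample mean, mean after ratio imputation) for simulated
    data in which the chance of responding falls as y rises (MNAR)."""
    rng = random.Random(seed)
    xs, ys, responded = [], [], []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        e = rng.gauss(0.0, 1.0)
        # Hypothetical heteroscedastic super-population model: Var(y | x) = x.
        y = 2.0 * x + math.sqrt(x) * e
        # Response depends on y given x (through e), so it is MNAR:
        # units with larger residuals are less likely to respond.
        p_respond = 1.0 / (1.0 + math.exp(-(1.0 - 0.5 * e)))
        xs.append(x)
        ys.append(y)
        responded.append(rng.random() < p_respond)
    # Ratio imputation fitted on respondents only: fine under MAR,
    # but the respondents' residuals are shifted downward under MNAR.
    sum_y_resp = sum(y for y, r in zip(ys, responded) if r)
    sum_x_resp = sum(x for x, r in zip(xs, responded) if r)
    beta_hat = sum_y_resp / sum_x_resp
    completed = [y if r else beta_hat * x for x, y, r in zip(xs, ys, responded)]
    return sum(ys) / n, sum(completed) / n

full_mean, imputed_mean = mnar_imputation_bias()
print(full_mean, imputed_mean)  # the imputed mean falls below the full-sample mean
```

Because nonrespondents tend to have large positive residuals that the respondent-based fit never sees, the completed-data mean is pulled downward; a method that models the response probability as a function of `y` is needed to remove this bias.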

Banded vector heterogeneous autoregression models (밴드구조 VHAR 모형)

  • Sangtae Kim;Changryong Baek
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.529-545 / 2023
  • This paper introduces the Banded-VHAR model for high-dimensional long-memory time series with a band structure. In the Banded-VHAR model, each dimension has nonignorable correlations only with adjacent dimensions, owing to features of the data such as geographical information. A row-wise estimation method is adopted for fast computation. In addition, two methods, a BIC method and a ratio method, are proposed to estimate the width of the band. We demonstrate the asymptotic consistency of the proposed estimation methods through a simulation study. Real-data applications to PM2.5 and apartment trading volume show that the Banded-VHAR model outperforms the traditional sparse VHAR model in forecasting and yields easily interpretable model coefficients.
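The two ingredients of the model, the HAR regressors and the band restriction, can be sketched briefly. The sketch assumes the conventional HAR horizons of 1, 5 and 22 observations (daily, weekly, monthly averages) and a band of half-width `bandwidth`, under which row `i` of each coefficient matrix may be nonzero only for series `j` with `|i - j| <= bandwidth`. An interior row then involves only `3 * (2 * bandwidth + 1)` slope coefficients, which is why fitting each row separately is cheap. Band-width selection (the BIC and ratio methods) is not shown, and all function names are hypothetical.

```python
def har_regressors(series, t):
    """Daily, weekly and monthly HAR regressors for forecasting series[t]:
    the last value and the averages of the last 5 and 22 values."""
    if t < 22:
        raise ValueError("need at least 22 past observations")
    past = series[t - 22:t]
    return past[-1], sum(past[-5:]) / 5, sum(past) / 22

def band_mask(dim, bandwidth):
    """mask[i][j] is True when coefficient (i, j) may be nonzero, i.e. |i - j| <= bandwidth."""
    return [[abs(i - j) <= bandwidth for j in range(dim)] for i in range(dim)]

def row_neighbors(i, dim, bandwidth):
    """Series whose HAR regressors enter the row-i regression under the band restriction."""
    return [j for j in range(dim) if abs(i - j) <= bandwidth]

for row in band_mask(4, 1):
    print(["x" if allowed else "." for allowed in row])
print(har_regressors(list(range(30)), 22))
print(row_neighbors(2, 5, 1))
```

Each row regression uses the HAR regressors of the series returned by `row_neighbors` only, so rows can be estimated independently (and in parallel) by ordinary least squares.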

Comparison of Trend Tests for Genetic Association on Censored Ages of Onset (미완결 발병연령에 근거한 연관성 추세 검정법의 비교)

  • Yoon, Hye-Kyoung;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics / v.21 no.6 / pp.933-945 / 2008
  • The genetic association test on an age-of-onset trait aims to detect a putative gene by means of linear rank tests for a significant trend in onset distributions across genotypes. However, because subjects are selectively recruited with ages below a pre-specified limit, the genotype groups are subject to substantially different censoring distributions, which is one reason for the low efficiency of the linear rank tests. In testing the equality of two survival distributions, the log-rank statistic is preferred to the Wilcoxon statistic when censored observations are nonignorable. Therefore, for more than two groups, we propose a generalized log-rank test for trend as a genetic association test. Monte Carlo studies are conducted to investigate the performance of the test statistics examined in this paper.
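For reference, a textbook log-rank test for trend can be sketched as follows; the paper's generalized statistic may differ in its details. Each genotype group `k` gets a score `w_k` (for example 0, 1, 2 for an additive coding, an assumption here). At each distinct event time `t` with `n_t` subjects at risk and `d_t` events, the statistic accumulates `U = sum_t [sum_k w_k d_kt - d_t * wbar_t]` and `V = sum_t d_t (n_t - d_t) / (n_t - 1) * (mean of w^2 - wbar_t^2)` over the risk set, and `Z = U / sqrt(V)` is approximately standard normal under no trend. Subjects censored at an event time are kept in that time's risk set.

```python
import math
from collections import defaultdict

def logrank_trend(times, events, groups, scores):
    """Log-rank test for trend across ordered groups; returns (U, V, Z)."""
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    U = 0.0
    V = 0.0
    for t in event_times:
        at_risk = defaultdict(int)
        deaths = defaultdict(int)
        for ti, ei, gi in data:
            if ti >= t:  # still at risk just before time t
                at_risk[gi] += 1
                if ti == t and ei:
                    deaths[gi] += 1
        n = sum(at_risk.values())
        d = sum(deaths.values())
        # Mean score and mean squared score over the risk set.
        mean_w = sum(scores[g] * m for g, m in at_risk.items()) / n
        mean_w2 = sum(scores[g] ** 2 * m for g, m in at_risk.items()) / n
        # Observed minus expected score among the events at time t.
        U += sum(scores[g] * dg for g, dg in deaths.items()) - d * mean_w
        if n > 1:
            # Hypergeometric variance of the score sum at time t.
            V += d * (n - d) / (n - 1) * (mean_w2 - mean_w ** 2)
    Z = U / math.sqrt(V) if V > 0 else 0.0
    return U, V, Z

# Toy data: three genotype groups scored 0, 1, 2 (event = 1, censored = 0).
times = [1, 4, 2, 3, 1, 2]
events = [1, 0, 1, 1, 1, 0]
groups = [0, 0, 1, 1, 2, 2]
U, V, Z = logrank_trend(times, events, groups, {0: 0, 1: 1, 2: 2})
print(U, V, Z)
```

For the toy data the contributions at times 1, 2 and 3 give `U = 0.5` and `V = 16/15 + 1/2 + 1/4 = 109/60`.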

An Approach to Survey Data with Nonresponse: Evaluation of KEPEC Data with BMI (무응답이 있는 설문조사연구의 접근법 : 한국노인약물역학코호트 자료의 평가)

  • Baek, Ji-Eun;Kang, Wee-Chang;Lee, Young-Jo;Park, Byung-Joo
    • Journal of Preventive Medicine and Public Health / v.35 no.2 / pp.136-140 / 2002
  • Objectives: A common problem in analyzing survey data is incomplete data due to nonresponse or missing values. The mail questionnaire survey conducted in 1996 to collect lifestyle variables on the members of the Korean Elderly Pharmacoepidemiologic Cohort (KEPEC) contains some nonresponse and missing data. An appropriate statistical method was applied to evaluate the missing-data pattern of a specific KEPEC data set, which had no missing data in the independent variables and missing data in the response variable, BMI. Methods: The study subjects were 8,689 elderly people. First, BMI and the variables that significantly influenced BMI were categorized. After fitting a log-linear model, the probabilities of subjects falling in each category were estimated. The EM algorithm was implemented with the log-linear model to determine the missing-data mechanism causing the nonresponse. Results: Age, smoking status, and a preference for spicy hot food were chosen as variables that influenced BMI. Fitting the nonignorable and ignorable nonresponse log-linear models with these variables gave a deviance difference of 0.0034 (df = 1). Conclusion: Making inferences about variables in large samples without considering the pattern of missing data carries considerable risk. On the basis of these results, the missing BMI data can be regarded as ignorable nonresponse. Therefore, when analyzing BMI in the KEPEC data, inferences can be made without considering the missing data.
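The reported deviance difference can be turned into a p-value, assuming the usual likelihood-ratio reference distribution: under the ignorable model, the difference in deviance between the nested ignorable and nonignorable log-linear models is approximately chi-square with 1 degree of freedom. For one degree of freedom the upper tail has the closed form `P(X > x) = erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

deviance_difference = 0.0034  # reported difference, df = 1
p_value = chi2_sf_df1(deviance_difference)
print(p_value)  # roughly 0.95
```

A p-value of about 0.95 gives no evidence against the ignorable model, which is consistent with the abstract's conclusion that the missing BMI values can be treated as ignorable nonresponse.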