Integrated Search | Korea Science

Comparison of EM with Jackknife Standard Errors and Multiple Imputation Standard Errors

Kang, Shin-Soo
- Journal of the Korean Data and Information Science Society
- Vol. 16, No. 4
- pp. 1079-1086
- 2005
Most discussions of single imputation methods and the EM algorithm concern point estimation of population quantities in the presence of missing values. A second concern is how to obtain standard errors for point estimates computed from data filled in by single imputation methods or the EM algorithm. Here we focus on estimating standard errors that incorporate the additional uncertainty due to nonresponse. Two general approaches to accounting for this uncertainty are considered: the jackknife, a resampling method, and multiple imputation (MI). The two approaches are reviewed and compared through simulation studies.
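A minimal sketch of the jackknife side of this comparison, under assumptions that are not from the paper: mean imputation stands in for the single imputation method, the estimand is a population mean, and the data and function names are illustrative. The delete-one jackknife re-imputes inside every replicate, so its standard error reflects the nonresponse, whereas the naive formula treats filled-in values as if they had been observed.

```python
import math


def mean_impute(values):
    """Fill None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def estimate(values):
    """Point estimate: mean of the data after single (mean) imputation."""
    filled = mean_impute(values)
    return sum(filled) / len(filled)


def naive_se(values):
    """Standard error that treats imputed values as if they were observed."""
    filled = mean_impute(values)
    n = len(filled)
    center = sum(filled) / n
    variance = sum((v - center) ** 2 for v in filled) / (n - 1)
    return math.sqrt(variance / n)


def jackknife_se(values):
    """Delete-one jackknife SE, redoing the imputation in every replicate."""
    n = len(values)
    replicates = [estimate(values[:i] + values[i + 1:]) for i in range(n)]
    center = sum(replicates) / n
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in replicates))


# Illustrative data only: None marks a nonrespondent.
data = [4.0, None, 7.0, 5.0, None, 9.0, 6.0, None, 8.0, 3.0]
print(estimate(data), naive_se(data), jackknife_se(data))
```

On this toy data the jackknife standard error is larger than the naive one, which is the direction expected when imputed values are not counted as real observations.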

Imputation Using Factor Score Regression

Lee, Sang-Eun;Hwang, Hee-Jin;Shin, Key-Il
- Communications for Statistical Applications and Methods
- Vol. 16, No. 2
- pp. 317-323
- 2009
Recently, not only government policies but even small-town decisions are based on survey data, so most government agencies and organizations demand sample surveys in each field to obtain more detailed information. In conducting a sample survey, however, nonresponse arises very often and becomes a major issue in judging the accuracy of the survey. One solution is to use administrative data, but unfortunately most administrative data are restricted for ordinary users. The other solution is imputation, and several imputation methods have therefore been studied in various fields. In this study, instead of the commonly used simple regression imputation method, a factor score regression method is applied to incomplete survey data with unit and item missing values. For the simulation study, the Consumer Expenditure Surveys in Korea are used.
https://doi.org/10.5351/CKSS.2009.16.2.317

Imputation Method Using Local Linear Regression Based on Bidirectional k-nearest-components

Yonggeol, Lee
- Journal of Information and Communication Convergence Engineering
- Vol. 21, No. 1
- pp. 62-67
- 2023
This paper proposes an imputation method that uses local linear regression based on a bidirectional k-nearest-components search. The bidirectional k-nearest-components search selects components within a dynamic range of the missing points. Unlike existing methods, which use a fixed-size window, the proposed method can flexibly select adjacent components in an imputation problem. The weight values assigned to the components around the missing points are calculated using local linear regression. The local linear regression method is free from the rank problem in a matrix of dependent variables. In addition, it can calculate weight values that reflect the data flow in a specific environment, such as a blackout. The missing values are estimated from a linear combination of the components and their weights, and the estimates are then used to impute them. In the experimental results, the proposed method outperformed the existing methods when the error between the original data and the imputed data was measured using MAE and RMSE.
https://doi.org/10.56977/jicce.2023.21.1.62
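The idea in this abstract can be illustrated with a rough sketch. It is not the paper's algorithm: the search rule (walk left and right from the gap until k observed points are found on each side), the plain least-squares line, and all function names are assumptions made for illustration. Because the walk skips other missing entries, the span of neighbours varies with the gap, unlike a fixed-size window, and the fitted value is a linear combination of the selected neighbours.

```python
def nearest_observed(series, idx, k, step):
    """Walk from idx in direction step (+1 or -1), collecting up to k observed points."""
    found = []
    j = idx + step
    while 0 <= j < len(series) and len(found) < k:
        if series[j] is not None:
            found.append((j, series[j]))
        j += step
    return found


def local_linear_impute(series, k=2):
    """Impute each None with a least-squares line fitted to nearby observed points."""
    result = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Neighbours are always taken from the original series, never from imputed values.
        points = nearest_observed(series, i, k, -1) + nearest_observed(series, i, k, +1)
        if not points:
            continue  # no observed value anywhere: leave the gap untouched
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        if sxx == 0:
            result[i] = y_bar  # a single neighbour: fall back to its value
            continue
        slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
        result[i] = y_bar + slope * (i - x_bar)
    return result


# Illustrative series with a two-point gap.
print(local_linear_impute([1.0, 2.0, None, None, 5.0, 6.0], k=2))
```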

Jackknife Variance Estimation under Imputation for Nonrandom Nonresponse with Follow-ups

Park, Jinwoo
- Journal of the Korean Statistical Society
- Vol. 29, No. 4
- pp. 385-394
- 2000
This paper provides jackknife variance estimation based on adjusted imputed values for the case where nonresponse is nonrandom and follow-up data are available for a subsample of nonrespondents. Both hot-deck and ratio imputation are considered as imputation methods. The performance of the proposed variance estimator under a nonrandom response mechanism is investigated through numerical simulation.

Comparing the Efficiency of Missing-Data Imputation Using a Spatial Variable in Sample Surveys (Missing Imputation Methods Using the Spatial Variable in Sample Survey)

이진희;김진;이기재
- 응용통계연구 (The Korean Journal of Applied Statistics)
- Vol. 19, No. 1
- pp. 57-67
- 2006
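Nonresponse in sample surveys arises for many reasons, and analyzing only the respondents' information can produce biased results, so many nonresponse imputation methods that use auxiliary variables have been studied. If the auxiliary variables available for imputing missing data are insufficient and a regional correlation exists between respondents and nonrespondents, that correlation can be used for missing data imputation. Using 2002 farm household economy data for the Gangwon region as an example, this paper examines a nonresponse imputation method based on spatial correlation and confirms that the spatial imputation method is efficient when spatial correlation exists.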
https://doi.org/10.5351/KJAS.2006.19.1.057

Missing Value Imputation Technique for Water Quality Dataset

Jin-Young Jun;Youn-A Min
- 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information)
- Vol. 29, No. 4
- pp. 39-46
- 2024
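Many researchers are working to assess water quality using a variety of models. Evaluation models require datasets without missing values, but in practice observational datasets contain many of them. Simply deleting missing values can, in some cases, distort the distribution of the underlying data and risks introducing bias into the model's predictive performance. To find techniques suited to handling missing values in water quality data, this study experimented with several imputation techniques based on the conventional KNN and MICE imputation and on the generative neural network models Autoencoder and Denoising Autoencoder. In the experiments, Combined Imputation, which averages the results of KNN and MICE imputation, produced estimates closest to the measured values. When the observational dataset with missing values handled by this technique was evaluated with support vector machine and ensemble-based classification models, the Accuracy, F1 score, ROC-AUC score, and MCC (Matthews Correlation Coefficient) metrics improved compared with deleting the missing values.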
https://doi.org/10.9708/jksci.2024.29.04.039

Imputation of Missing Data Based on Hot Deck Method Using K-nn

권순창
- 한국IT서비스학회지 (Journal of Information Technology Services)
- Vol. 13, No. 4
- pp. 359-375
- 2014
Researchers cannot avoid missing data when collecting data, because some respondents, arbitrarily or non-arbitrarily, do not answer questions in studies and experiments. Missing data not only inflate and distort standard deviations but also make parameter estimation less convenient and research results less reliable. Although hot deck is widely used, researchers have shown little interest in it because it handles missing data in ambiguous ways. Hot deck can be complemented with k-nn, a machine learning method that can organize donor groups closest to the properties of the missing data. Motivated by this role of k-nn, this study imputes missing data with a hot deck method based on k-nn. The proposed method was compared with listwise deletion and with mean, mode, linear regression, and SVM imputation on nominal and ratio data types, and it produced the data closest to the original values. Simulations using different numbers of neighbors and different distance measures were carried out, and k-nn achieved better performance. This study thus rediscovers hot deck imputation, which has failed to attract the attention of researchers. As a result, it should help in selecting non-parametric methods that are less likely to be affected by the structure of missing data and its causes.
https://doi.org/10.9716/KITS.2014.13.4.359
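A small sketch of what a k-nn based hot deck can look like, assuming Euclidean distance on fully observed auxiliary variables and a random draw from the k nearest donors. These choices, the record layout, and the function name `knn_hot_deck` are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def knn_hot_deck(records, target, features, k=3, seed=0):
    """Fill missing `target` values with a value drawn from the k nearest donors."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    donors = [r for r in records if r[target] is not None]
    completed = []
    for rec in records:
        if rec[target] is not None:
            completed.append(dict(rec))
            continue
        point = [rec[f] for f in features]
        # Donor pool: the k complete records closest on the auxiliary variables.
        pool = sorted(
            donors,
            key=lambda d: math.dist([d[f] for f in features], point),
        )[:k]
        donor = rng.choice(pool)
        # Hot deck copies an actually observed value rather than a model prediction.
        completed.append({**rec, target: donor[target]})
    return completed


# Illustrative records only; None marks item nonresponse on "income".
records = [
    {"age": 25, "hours": 40, "income": 250},
    {"age": 30, "hours": 45, "income": 300},
    {"age": 52, "hours": 38, "income": 520},
    {"age": 31, "hours": 44, "income": None},
    {"age": 50, "hours": 40, "income": None},
]
filled = knn_hot_deck(records, "income", ["age", "hours"], k=1)
print([r["income"] for r in filled])
```

With `k=1` this reduces to nearest-neighbour hot deck; a larger `k` keeps some randomness in the donor choice. In practice the features would usually be standardized before distances are computed.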

Investigation of multiple imputation variance estimation

김재광
- Korean Statistical Society: Conference Proceedings
- Proceedings of the Korean Statistical Society Spring Conference, 2002
- pp. 183-188
- 2002
Multiple imputation, proposed by Rubin, is a procedure for handling missing data. One of its attractive features is the simplicity of the variance estimation formula. Because of this simplicity, it has often been misused beyond its original prescription. This paper derives the bias of the multiple imputation variance estimator for a linear point estimator and discusses when the bias can be safely neglected.
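For reference, the variance formula in question is Rubin's combining rule. The notation below is chosen for this note and is not taken from the paper: M is the number of completed data sets, and the m-th one yields a point estimate and an estimate of its complete-data variance.

```latex
\bar{\theta}_M = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m, \qquad
W_M = \frac{1}{M}\sum_{m=1}^{M}\hat{V}_m, \qquad
B_M = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{\theta}_m-\bar{\theta}_M\bigr)^2,
\qquad
T_M = W_M + \Bigl(1+\frac{1}{M}\Bigr)B_M
```

Here W_M is the within-imputation variance, B_M the between-imputation variance, and T_M the multiple imputation variance estimator for the combined estimate.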

Application of SOLAS to the Multiple Imputation for Missing Data

Moon, Sung-Ho;Kim, Hyun-Jeong;Shin, Jae-Kyoung
- Journal of the Korean Data and Information Science Society
- Vol. 14, No. 3
- pp. 579-590
- 2003
When we analyze incomplete data, i.e., data with missing values, the missing values need to be treated. A common way to deal with this problem is to delete the cases with missing values, but various other methods have been developed. Among them are the EM algorithm and regression methods, which estimate the missing values and impute the estimates for the missing elements. In this paper, we introduce SOLAS, multiple imputation software that generates multiple imputed data sets.

Multiple imputation for competing risks survival data via pseudo-observations

Han, Seungbong;Andrei, Adin-Cristian;Tsui, Kam-Wah
- Communications for Statistical Applications and Methods
- Vol. 25, No. 4
- pp. 385-396
- 2018
Competing risks are commonly encountered in biomedical research. Regression models for competing risks data can be developed based on data routinely collected in hospitals or general practices. However, these data sets usually contain missing covariate values. To overcome this problem, multiple imputation is often used to fit regression models under a MAR assumption. Here, we introduce a multivariate imputation by chained equations algorithm to deal with competing risks survival data. Using pseudo-observations, we make use of the available outcome information by accommodating the competing risk structure. Lastly, we illustrate the practical advantages of our approach using simulations and two data examples, one on coronary artery disease and one on hepatocellular carcinoma.
https://doi.org/10.29220/CSAM.2018.25.4.385
