• Title/Summary/Keyword: Imputing

A Study on Estimating Mean Lifetime After Modifying Censored Observations

  • Kim, Jinh-eum;Kim, Jee-hoon
    • Journal of Korean Society for Quality Management / v.26 no.1 / pp.161-171 / 1998
  • Kim and Kim (1997) developed a method for estimating mean lifetime from augmented data obtained by imputing censored observations. Assuming a linear relationship between lifetime and covariates, and using the procedure of Buckley and James (1979) to estimate the mean lifetimes of censored observations, they proposed a mean lifetime estimator and established its consistency under regularity conditions. In this article, Kim and Kim's estimator is compared with the estimator introduced by Gill (1983) through simulations under various configurations. Their estimator is also illustrated with two real data sets.
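
The imputation step in a Buckley-James-type procedure can be sketched as follows. This is a minimal illustration, not the authors' estimator: it replaces each right-censored value with its conditional mean under a Kaplan-Meier estimate of the distribution. In a full Buckley-James fit the values would be regression residuals and the step would be iterated. The function names and the toy data are assumptions made for this example.

```python
def km_jump_masses(values, observed):
    """Return (value, mass) pairs: the Kaplan-Meier probability mass
    placed on each uncensored value, in ascending order."""
    n = len(values)
    # At ties, process uncensored values before censored ones.
    order = sorted(range(n), key=lambda i: (values[i], not observed[i]))
    surv = 1.0
    masses = []
    for rank, i in enumerate(order):
        at_risk = n - rank
        if observed[i]:
            mass = surv / at_risk
            masses.append((values[i], mass))
            surv -= mass
    return masses


def impute_censored(values, observed):
    """Replace each censored value c by an estimate of E[T | T > c].

    The estimate renormalises the Kaplan-Meier mass above c, which equals
    dividing by S(c) when the largest observation is uncensored. If no
    uncensored value exceeds c, the censored value is left unchanged.
    """
    masses = km_jump_masses(values, observed)
    result = []
    for v, is_observed in zip(values, observed):
        if is_observed:
            result.append(v)
            continue
        above = [(u, m) for u, m in masses if u > v]
        total = sum(m for _, m in above)
        if total > 0:
            result.append(sum(u * m for u, m in above) / total)
        else:
            result.append(v)
    return result


# Hypothetical lifetimes; the second one is right-censored at 2.
lifetimes = [1, 2, 3, 4]
is_event = [True, False, True, True]
augmented = impute_censored(lifetimes, is_event)
```

The mean of `augmented` then serves as a simple mean-lifetime estimate based on the completed data.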

Development of a Machine Learning Model for Imputing Time Series Data with Massive Missing Values (결측치 비율이 높은 시계열 데이터 분석 및 예측을 위한 머신러닝 모델 구축)

  • Bangwon Ko;Yong Hee Han
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.3 / pp.176-182 / 2024
  • In this study, we compared and analyzed various methods of missing data handling to build a machine learning model that can effectively analyze and predict time series data with a high percentage of missing values. For this purpose, Predictive State Model Filtering (PSMF), MissForest, and Imputation By Feature Importance (IBFI) methods were applied, and their prediction performance was evaluated using LightGBM, XGBoost, and Explainable Boosting Machines (EBM) machine learning models. The results of the study showed that MissForest and IBFI performed the best among the methods for handling missing values, reflecting the nonlinear data patterns, and that XGBoost and EBM models performed better than LightGBM. This study emphasizes the importance of combining nonlinear imputation methods and machine learning models in the analysis and prediction of time series data with a high percentage of missing values, and provides a practical methodology.

The political issue on women's unpaid work I : Imputing the Value of Household Work (가사노동의 정책과정 개발에 대한 연구 I :가사노동의 측정을 위한 제안)

  • 문숙재
    • Journal of the Korean Home Economics Association / v.36 no.4 / pp.35-48 / 1998
  • Many methods have been used to impute the monetary value of women's contribution to the informal economy for inclusion in satellite accounts to the formal System of National Accounts. These are based on official labor force statistics and time-use surveys. In this statistical system, household work is not treated as an economic activity (or productive labor). Also, the classification of activities involved in household work differs from that used in sample surveys related to valuation. The measurement of women's unpaid work is one of the important tasks for improving women's status and establishing a development policy. To measure unpaid work in economic terms, the following measures should be taken: 1) develop satellite or other official accounts to measure unpaid work outside the national accounts; 2) conduct a nationwide time-use survey to measure unpaid work; 3) develop a proper classification of activities for time-use statistics; 4) reexamine the minimum time criterion; 5) determine a proper valuation method in line with the legal system.

A modified partial least squares regression for the analysis of gene expression data with survival information

  • Lee, So-Yoon;Huh, Myung-Hoe;Park, Mira
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1151-1160 / 2014
  • In DNA microarray studies, the number of genes far exceeds the number of samples and the gene expression measures are highly correlated. Partial least squares regression (PLSR) is a popular method for dimension reduction, and several studies have shown it to be useful for classifying microarray data. In this study, we suggest a modified version of partial least squares regression for analyzing gene expression data with survival information. The method is designed as a new gene selection method that uses PLSR with an iterative procedure for imputing censored survival times. The mean square error of prediction criterion is used to determine the dimension of the model. To visualize the data, plots of variables superimposed with samples are used. The method is applied to two microarray data sets, both containing survival times. The results show that the proposed method works well for interpreting gene expression microarray data.

Modified BLS Weight Adjustment (수정된 BLS 가중치보정법)

  • Park, Jung-Joon;Cho, Ki-Jong;Lee, Sang-Eun;Shin, Key-Il
    • Communications for Statistical Applications and Methods / v.18 no.3 / pp.367-376 / 2011
  • BLS weight adjustment is a widely used method for business surveys with non-response and outliers. Recent studies show that the non-response weight adjustment of the BLS method is the same as the ratio imputation method. In this paper, we suggest a modified BLS weight adjustment method that imputes missing values instead of using weight adjustment for non-response. Monthly labor survey data are used for a small Monte Carlo simulation, and we conclude that the suggested method is superior to the original BLS weight adjustment method.
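
For readers unfamiliar with ratio imputation, a minimal sketch of its common textbook form follows. The abstract does not give the formula used in the paper, so this is an assumption: each missing value is replaced by an auxiliary value (for example, the previous month's figure) multiplied by the ratio of respondent totals. The function name and the data are hypothetical.

```python
def ratio_impute(y, x):
    """Fill missing entries of y (marked None) with x_i * R, where
    R = sum(y over respondents) / sum(x over respondents)."""
    respondents = [(yi, xi) for yi, xi in zip(y, x) if yi is not None]
    if not respondents:
        raise ValueError("no respondents to estimate the ratio from")
    sum_x = sum(xi for _, xi in respondents)
    if sum_x == 0:
        raise ValueError("auxiliary total of respondents is zero")
    ratio = sum(yi for yi, _ in respondents) / sum_x
    return [yi if yi is not None else ratio * xi for yi, xi in zip(y, x)]


# Hypothetical establishment data: previous-month values are fully observed,
# current-month values are missing for two non-respondents.
previous_month = [100, 200, 50, 150]
current_month = [110, None, 55, None]
completed = ratio_impute(current_month, previous_month)
```

Here the respondent ratio is 165 / 150, so each non-respondent is assigned its previous-month value scaled by that factor.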

Imputation Model for Link Travel Speed Measurement Using UTIS (UTIS 구간통행속도 결측치 보정모델)

  • Ki, Yong-Kul;Ahn, Gye-Hyeong;Kim, Eun-Jeong;Bae, Kwang-Soo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.10 no.6 / pp.63-73 / 2011
  • Travel speed is an important parameter for measuring road traffic. UTIS (Urban Traffic Information System) was developed as a mobile detector for measuring link travel speeds in South Korea. Our investigation found that UTIS data contain missing values caused by a lack of probe vehicles on road segments, system failures, and other factors. Imputation is the practice of filling in missing data with estimated values. In this paper, we suggest a new model for imputing missing data to provide accurate link travel speeds to the public. In the field test, the new model achieved a travel speed measurement accuracy of 93.6%. We therefore conclude that the proposed model significantly improves travel speed measurement accuracy.

A GEE approach for the semiparametric accelerated lifetime model with multivariate interval-censored data

  • Maru Kim;Sangbum Choi
    • Communications for Statistical Applications and Methods / v.30 no.4 / pp.389-402 / 2023
  • Multivariate or clustered failure time data often occur in medical, epidemiological, and socio-economic studies when survival data are collected from several research centers. If the data are periodically observed, as in a longitudinal study, survival times are often subject to various types of interval-censoring, creating multivariate interval-censored data. The event times of interest may then be correlated among individuals who come from the same cluster. In this article, we propose a unified linear regression method for analyzing multivariate interval-censored data. We consider a semiparametric multivariate accelerated failure time model as a statistical analysis tool and develop a generalized Buckley-James method to make inferences by imputing interval-censored observations with their conditional mean values. Since the study population consists of several heterogeneous clusters, where the subjects in the same cluster may be related, we propose a generalized estimating equations approach to accommodate potential dependence within clusters. Our simulation results confirm that the proposed estimator is robust to misspecification of the working covariance matrix and that statistical efficiency can increase when the working covariance structure is close to the truth. The proposed method is applied to a dataset from a diabetic retinopathy study.

Supplementation of Zinc Nutrient Database and Evaluation of Zinc Intake of Korean Adults Living in Rural Area (한국인 상용 식품의 아연함량표를 보완하여 평가한 한국농촌성인의 아연 섭취 실태)

  • 이주연
    • Journal of Nutrition and Health / v.31 no.8 / pp.1324-1377 / 1998
  • This study was conducted for two purposes: (1) to develop a database of zinc levels in commonly used Korean food items; and (2) to calculate the zinc intake of Korean adults living in a rural area. The currently used Korean food composition table was supplemented in terms of zinc content using several methods: (1) analyzing 98 Korean food items frequently consumed by Korean adults living in a rural area; (2) adapting values from U.S. Minnesota for 71 items; and (3) imputing values from similar foods for 282 items. A new zinc nutrient database was constructed that includes the zinc content of 1,195 food items. Zinc intake of rural Korean adults was estimated by a 24-hour recall method from 2,037 adults over 30 years of age in Yeonchon-gun, Kyunggi province, Korea. Mean daily zinc intake of all subjects was 6.1 mg, and the mean intake of males (7.0 mg/day, 46.8% of RDA) was significantly higher than that of females (5.2 mg/day, 43.0% of RDA). Subjects in their 40s had the highest zinc intake, while those over 70 years of age consumed the least zinc. The food group that contributed most to dietary zinc intake was cereals and grain products, supplying 38% of total zinc intake. The next most important group was the meat, poultry, and products group, supplying 26% of total intake. This group was followed by fish and shellfish, legumes and their products, and vegetables. Among individual food items, rice contributed most, supplying 27% of total zinc intake, followed by beef (10%) and pork (9%). Altogether, plant foods supplied 68% of zinc intake, suggesting that the bioavailability of dietary zinc is low. In conclusion, these results show that the zinc intake of rural Korean adults is low and that the sources of dietary zinc are mainly plant foods, suggesting low bioavailability. Further studies are needed to determine the zinc intake and status of the Korean population. The zinc database developed in this study will be very valuable for such studies.

A Study on Imputing the Missing Values of Continuous Traffic Counts (상시조사 교통량 자료의 결측 보정에 관한 연구)

  • Lee, Sang Hyup;Shin, Jae Myong
    • KSCE Journal of Civil and Environmental Engineering Research / v.33 no.5 / pp.2009-2019 / 2013
  • Traffic volumes are important basic data used directly in transportation network planning, highway design, highway management, and so forth. They are collected by two methods: continuous traffic counts and short-duration traffic counts. Continuous traffic counts are conducted 365 days a year using a permanent traffic counter, whereas short-duration traffic counts are conducted on specific day(s). In continuous traffic counts, data go missing from time to time because of counter breakdown or malfunction, so diverse imputation methods have been developed and applied. In this study, the applied exponential smoothing method, which uses data from the days before and after the missing day, is proposed and compared with other imputation methods. The comparison shows that the applied exponential smoothing method improves imputation accuracy when the coefficient of variation of traffic volume is low. In addition, the variation of traffic volume at the site is verified to be an important factor in imputation accuracy. Therefore, different imputation methods should be applied depending on site and time to raise the reliability of imputation for missing traffic values.
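
The idea of smoothing from both sides of a missing day can be sketched as below. The abstract does not state the paper's formula, so this is only one plausible reading: a weighted average of the observed days before and after the gap, with weights that decay exponentially with distance. The function name, the `alpha` and `window` parameters, and the sample counts are assumptions made for illustration.

```python
def two_sided_exp_smooth(series, idx, alpha=0.5, window=3):
    """Estimate series[idx] from observed neighbours on both sides.

    A neighbour k days away gets weight alpha * (1 - alpha) ** (k - 1).
    Neighbours that are themselves missing (None) or out of range are skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(1, window + 1):
        weight = alpha * (1 - alpha) ** (k - 1)
        for j in (idx - k, idx + k):
            if 0 <= j < len(series) and series[j] is not None:
                weighted_sum += weight * series[j]
                weight_total += weight
    if weight_total == 0:
        raise ValueError("no observed neighbours inside the window")
    return weighted_sum / weight_total


# Hypothetical daily traffic counts with one missing day.
daily_counts = [1000, 1040, None, 1060, 1020]
estimate = two_sided_exp_smooth(daily_counts, idx=2, alpha=0.5, window=2)
```

A site with highly variable volumes would make nearby days poor predictors, which is consistent with the abstract's remark that accuracy depends on the variation of traffic volume.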

A comparison of imputation methods for the consecutive missing temperature data (연속적 결측이 존재하는 기온 자료에 대한 결측복원 기법의 비교)

  • Kim, Hee-Kyung;Kang, In-Kyeong;Lee, Jae-Won;Lee, Yung-Seop
    • The Korean Journal of Applied Statistics / v.29 no.3 / pp.549-557 / 2016
  • Consecutive missing values are likely to occur in long climate records because of system errors or defective equipment, and such values are difficult to impute. However, this problem can be overcome by imputing missing values with reference time series. A reference time series must be composed of time series similar to the one that includes missing values. We performed a simulation to compare three imputation methods (the adjusted normal ratio method, the regression method, and the IDW method) for completing the missing values of a time series. A comparison of the three methods for daily mean temperatures at 14 climatological stations indicated that the IDW method was better than the others at south seaside stations. We also found that the regression method was better than the others at most stations (except south seaside stations).
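
As an illustration of the IDW (inverse distance weighting) idea mentioned above, the following sketch imputes a missing value at a target station from the same-day values at reference stations, weighting each by the inverse of its distance raised to a power. The planar coordinates, the `power=2.0` default, and the station data are assumptions for this example; the paper's exact configuration is not given in the abstract.

```python
import math


def idw_impute(target_xy, references, power=2.0):
    """Inverse-distance-weighted estimate at target_xy.

    references: list of ((x, y), value) pairs for stations with observed values.
    A reference located exactly at the target returns its value directly.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in references:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0:
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no reference stations supplied")
    return weighted_sum / weight_total


# Hypothetical daily mean temperatures (deg C) at two reference stations.
reference_stations = [((1.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
imputed_temperature = idw_impute((0.0, 0.0), reference_stations)
```

For a run of consecutive missing days, the same call would be repeated for each day using that day's reference values.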