• Title/Summary/Keyword: Missing Values


A longitudinal study for child aggression with Korea Welfare Panel Study data (한국복지패널 자료를 이용한 아동기 공격성에 대한 경시적 자료 분석)

  • Choi, Nayeon;Huh, Jib
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1439-1447 / 2014
  • Most of the literature on Korean child aggression is based on cross-sectional data sets. Although one related study used a longitudinal data set, it assumed that the repeatedly measured data were mutually independent, so a proper longitudinal analysis of Korean child aggression is needed. This study analyzes the effect of child development outcomes, including academic achievement, self-esteem, depression and anxiety, delinquency, victimization by peers, abuse by parents, and internet use time, on child aggression using Korea Welfare Panel Study data observed three times between 2006 and 2012. Since the Korea Welfare Panel Study data contain missing values, missingness at random is assumed, and a linear mixed effects model is fitted by restricted maximum likelihood estimation.
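The modeling setup in this abstract can be illustrated with a toy simulation. The sketch below is only a minimal stand-in for the paper's REML linear mixed effects fit: it generates a random-intercept panel with MAR-style dropout and recovers the fixed-effect slope by within-child centering, which eliminates the random intercept. All variable names, effect sizes, and the centering shortcut are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_child, n_wave = 200, 3
child = np.repeat(np.arange(n_child), n_wave)
x = rng.normal(size=n_child * n_wave)            # e.g. a depression/anxiety score
b = rng.normal(scale=0.8, size=n_child)[child]   # child-level random intercept
y = 1.0 + 0.5 * x + b + rng.normal(scale=0.5, size=n_child * n_wave)  # aggression

# MAR-style missingness: drop ~15% of wave observations at random
keep = rng.random(n_child * n_wave) > 0.15
child, x, y = child[keep], x[keep], y[keep]

def centered(v):
    """Subtract each child's own mean, removing the random intercept."""
    counts = np.bincount(child, minlength=n_child)
    sums = np.bincount(child, weights=v, minlength=n_child)
    means = np.zeros(n_child)
    means[counts > 0] = sums[counts > 0] / counts[counts > 0]
    return v - means[child]

xc, yc = centered(x), centered(y)
slope = (xc @ yc) / (xc @ xc)   # consistent estimate of the fixed effect (0.5)
```

The within-centering trick recovers only the fixed-effect slope; the paper's REML fit additionally estimates the variance components of the random intercept and residual.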

Reconstruction and Change Analysis for Temporal Series of Remotely-sensed Data (연속 원격탐사 영상자료의 재구축과 변화 탐지)

  • 이상훈
    • Korean Journal of Remote Sensing / v.18 no.2 / pp.117-125 / 2002
  • Multitemporal analysis of remotely sensed data is complicated by numerous intervening factors, including atmospheric attenuation and cloud cover, that obscure the relationship between ground conditions and satellite-observed spectral measurements. An adaptive reconstruction system using a dynamic compositing approach was developed to recover missing or bad observations. The reconstruction method incorporates the temporal variation in the physical properties of targets and anisotropic spatial optical properties into the image processing. The adaptive system performs dynamic compositing by computing a composite image as a weighted sum of the observed value and the value predicted from the local temporal trend. The proposed system was applied to a sequence of AVHRR NDVI images observed over the Korean Peninsula from 1999 to 2000. The experiment shows that the reconstructed series can serve as a complete estimated series for observations containing bad or missing values. Additionally, the system generates a gradient image representing the amount of temporal change at the corresponding time, which shows temporal variation more clearly than the original image series.
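The compositing rule described above (a weighted sum of the observed value and a value predicted from the local temporal trend) can be sketched for a single pixel's time series. Assumptions not in the abstract: NaN marks bad/missing observations, the local trend is a one-step linear extrapolation, and the blending weight is fixed; the paper's system is adaptive and uses spatial information as well.

```python
import numpy as np

def dynamic_composite(series, weight=0.7, window=2):
    """Blend each observation with a local-trend prediction; fill NaN gaps
    from the trend alone."""
    out = series.astype(float).copy()
    for t in range(len(out)):
        past = out[max(0, t - window):t]
        past = past[~np.isnan(past)]
        if len(past) >= 2:
            pred = past[-1] + (past[-1] - past[-2])  # linear extrapolation
        elif len(past) == 1:
            pred = past[-1]
        else:
            pred = np.nan
        if np.isnan(out[t]):
            out[t] = pred                            # recover missing value
        elif not np.isnan(pred):
            out[t] = weight * out[t] + (1 - weight) * pred  # composite
    return out

ndvi = np.array([0.30, 0.35, np.nan, 0.45, 0.50])  # toy NDVI series, one gap
filled = dynamic_composite(ndvi)
```

Because the composite is recursive (each prediction uses already-composited values), a gap is filled consistently with the surrounding temporal trend rather than by simple interpolation of raw, possibly noisy neighbors.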

Predictive Optimization Adjusted With Pseudo Data From A Missing Data Imputation Technique (결측 데이터 보정법에 의한 의사 데이터로 조정된 예측 최적화 방법)

  • Kim, Jeong-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.2 / pp.200-209 / 2019
  • When forecasting future values, a model estimated by minimizing training errors can yield test errors higher than the training errors. This is the over-fitting problem, caused by an increase in model complexity when the model focuses only on a given dataset. Regularization and resampling methods have been introduced to reduce test errors by alleviating this problem, but they are designed to work with only the given dataset. In this paper, we propose a new optimization approach that reduces test errors by transforming a test error minimization problem into a training error minimization problem. This transformation requires additional data beyond the given dataset, termed pseudo data. To generate proper pseudo data, we used three types of missing data imputation techniques. As an optimization tool, we chose the least squares method and combined it with the extra pseudo data instances. We present numerical results supporting the proposed approach, which yields lower test errors than the ordinary least squares method.
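The core idea, augmenting a least squares problem with imputed pseudo data, can be sketched as follows. This is not the paper's procedure: the pseudo inputs, the crude mean-imputed pseudo responses (standing in for the paper's three imputation techniques), and all sizes are invented to show the mechanical effect, which resembles regularization of the ordinary least squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.3, size=30)

# Pseudo data: inputs perturbed near the training inputs, with responses
# "imputed" here by simple mean imputation as a stand-in.
X_pseudo = X + rng.normal(scale=0.1, size=X.shape)
y_pseudo = np.full(len(X_pseudo), y.mean())

# Least squares on the augmented (training + pseudo) problem vs. plain OLS
X_aug = np.vstack([X, X_pseudo])
y_aug = np.concatenate([y, y_pseudo])
beta_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With near-constant pseudo responses, the augmented fit shrinks the coefficients relative to plain OLS, which is one way extra pseudo instances can damp over-fitting.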

Associations of periodontal status in periodontitis and rheumatoid arthritis patients

  • Rovas, Adomas;Puriene, Alina;Punceviciene, Egle;Butrimiene, Irena;Stuopelyte, Kristina;Jarmalaite, Sonata
    • Journal of Periodontal and Implant Science / v.51 no.2 / pp.124-134 / 2021
  • Purpose: The aim of this study was to assess the association between the clinical status of rheumatoid arthritis (RA) and periodontitis (PD) in patients diagnosed with PD and to evaluate the impact of RA treatment on the severity of PD. Methods: The study included 148 participants with PD, of whom 64 were also diagnosed with RA (PD+RA group), while 84 age-matched participants were rheumatologically healthy (PD-only group). PD severity was assessed by the following periodontal parameters: clinical attachment loss, probing pocket depth (PPD), bleeding on probing (BOP), alveolar bone loss, and number of missing teeth. RA disease characteristics and impact of disease were evaluated by the Disease Activity Score 28 using C-reactive protein, disease duration, RA treatment, the RA Impact of Disease tool, and the Health Assessment Questionnaire. Outcome variables were compared using parametric and non-parametric tests and associations were evaluated using regression analysis with the calculation of odds ratios (ORs). Results: Participants in the PD+RA group had higher mean PPD values (2.81 ± 0.59 mm vs. 2.58 ± 0.49 mm, P=0.009) and number of missing teeth (6.27±4.79 vs. 3.93±4.08, P=0.001) than those in the PD-only group. A significant association was found between mean PPD and RA (OR, 2.22; 95% CI, 1.16-4.31; P=0.016). Within the PD+RA group, moderate to severe periodontal disease was significantly more prevalent among participants with higher RA disease activity (P=0.042). The use of biologic disease-modifying antirheumatic drugs (bDMARDs) was associated with a lower BOP percentage (P=0.016). Conclusions: In patients with PD, RA was associated with a higher mean PPD and number of missing teeth. The severity of PD was affected by the RA disease clinical activity and by treatment with bDMARDs, which were associated with a significantly lower mean BOP percentage.

Long-gap Filling Method for the Coastal Monitoring Data (해양모니터링 자료의 장기결측 보충 기법)

  • Cho, Hong-Yeon;Lee, Gi-Seop;Lee, Uk-Jae
    • Journal of Korean Society of Coastal and Ocean Engineers / v.33 no.6 / pp.333-344 / 2021
  • A technique is developed for filling the long gaps that occur frequently in ocean monitoring data. The method estimates the unknown values in a long gap as the sum of an estimated trend component and selected residual components for the given missing interval. It was used to impute month-long missing intervals, such as in the temperature and water temperature records of the Ulleungdo ocean buoy data. The imputed data showed differences depending on the monitoring parameter, but the variation pattern was reproduced appropriately. Although the method introduces bias and variance errors through the estimation of the trend and residual components, it greatly reduces the bias error in statistical measures estimated from series with long missing intervals. The mean and 90% confidence interval of the gap-filling model's RMS errors are 0.93 and 0.35 to 1.95, respectively.
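The trend-plus-residual decomposition can be sketched on a synthetic series. Assumptions not in the abstract: the trend is a first-order harmonic fit, and the residual component is borrowed from the observed interval just before the gap; the paper's residual selection is more elaborate, and all numbers below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(365)
truth = 15 + 8 * np.sin(2 * np.pi * t / 365) + rng.normal(scale=0.8, size=365)
obs = truth.copy()
obs[150:180] = np.nan   # a month-long gap, as in the buoy records

# (1) Trend component: low-order harmonic fitted to the observed part
mask = ~np.isnan(obs)
A = np.column_stack([np.ones(365),
                     np.sin(2 * np.pi * t / 365),
                     np.cos(2 * np.pi * t / 365)])
coef, *_ = np.linalg.lstsq(A[mask], obs[mask], rcond=None)
trend = A @ coef

# (2) Residual component: borrow residuals from a comparable observed span
resid = obs - trend
donor = resid[120:150]          # the 30 days just before the gap
filled = obs.copy()
filled[150:180] = trend[150:180] + donor

rmse = np.sqrt(np.mean((filled[150:180] - truth[150:180]) ** 2))
```

Filling with trend plus a realistic residual, rather than the trend alone, preserves the variance of the series inside the gap, which is why the abstract reports reduced bias in statistics estimated from the gap-filled data.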

Probabilistic penalized principal component analysis

  • Park, Chongsun;Wang, Morgan C.;Mo, Eun Bi
    • Communications for Statistical Applications and Methods / v.24 no.2 / pp.143-154 / 2017
  • A variable selection method based on probabilistic principal component analysis (PCA) using a penalized likelihood method is proposed. The proposed method is a two-step variable reduction method. The first step uses the probabilistic principal component idea to identify principal components, and a penalty function is used to identify the important variables in each component. We then build a model on the original data space, instead of the rotated space of latent variables (principal components), because the proposed method achieves dimension reduction by identifying important observed variables. Consequently, the method is of more practical use. The proposed estimators perform as an oracle procedure and are root-n consistent with a proper choice of regularization parameters. The method can be successfully applied to high-dimensional PCA problems in which a relatively large portion of irrelevant variables is included in the data set. It is straightforward to extend the likelihood method to handle missing observations using EM algorithms, so it can also be applied when some data vectors exhibit one or more values missing at random.
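The idea of a penalty identifying the important variables in a component can be illustrated crudely. The sketch below soft-thresholds the loadings of the first ordinary principal component; this is only a stand-in for the paper's penalized-likelihood estimation within probabilistic PCA, and the data layout (two informative variables driven by one latent factor, four noise variables) is invented.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
z = rng.normal(size=(n, 1))                           # latent factor
informative = np.hstack([z, -z]) + 0.1 * rng.normal(size=(n, 2))
noise = 0.5 * rng.normal(size=(n, 4))                 # irrelevant variables
X = np.hstack([informative, noise])
X = X - X.mean(axis=0)

# First principal component loadings via SVD
U, s, Vt = np.linalg.svd(X, full_matrices=False)
loading = Vt[0]

# Soft-thresholding stands in for the penalized-likelihood step:
# it zeroes the loadings of variables irrelevant to the component.
lam = 0.2
sparse = np.sign(loading) * np.maximum(np.abs(loading) - lam, 0.0)
selected = np.flatnonzero(sparse)
```

Because selection happens on the observed-variable loadings, the reduced model lives in the original data space, which is the practical advantage the abstract emphasizes.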

Household, personal, and financial determinants of surrender in Korean health insurance

  • Shim, Hyunoo;Min, Jung Yeun;Choi, Yang Ho
    • Communications for Statistical Applications and Methods / v.28 no.5 / pp.447-462 / 2021
  • In insurance, the surrender rate is an important variable that threatens the sustainability of insurers and determines the profitability of contracts. Unlike other actuarial assumptions that determine the cash flow of an insurance contract, however, it is driven by endogenous variables such as people's economic, social, and subjective decisions. Therefore, a microscopic approach is required to identify and analyze the factors that determine the lapse rate; specifically, micro-level characteristics of policyholders, including individual, demographic, microeconomic, and household characteristics, are needed. In this study, we select panel survey data from the Korean Retirement Income Study (KReIS), which covers many diverse dimensions, to determine which variables have a decisive effect on lapse, and we apply the lasso regularized regression model to analyze it empirically. As the data contain many missing values, they are imputed using the random forest method. Among the household variables, we find that the absence of elderly dependents, the presence of young dependents, and employed family members increase the surrender rate. Among the individual variables, divorce, non-urban residential areas, living in an apartment, not owning a home, and a bad relationship with siblings increase the lapse rate. Finally, among the financial variables, low income, low expenditure, having children who incur child care expenditure, not expecting a bequest from a spouse, not holding public health insurance, and expecting to benefit from a retirement pension increase the lapse rate. Some of these findings are consistent with the literature.
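The impute-then-lasso pipeline can be sketched with numpy alone. Two loud caveats: the imputation here is plain column-mean imputation, not the random forest method the study uses, and the lasso is a bare cyclic coordinate descent with a hand-picked penalty rather than a tuned regularized regression; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 400, 5
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Inject missing values into one covariate, then impute (mean imputation
# stands in for the study's random-forest imputation).
miss = rng.random(n) < 0.1
X[miss, 2] = np.nan
X[miss, 2] = np.nanmean(X[:, 2])

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso fit by cyclic coordinate descent with soft-thresholding."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

beta = lasso_cd(X, y, lam=0.3)   # irrelevant coefficients shrink to exactly 0
```

The exact zeros are what make the lasso useful for the study's question of which household, personal, and financial variables decisively affect lapse: unselected variables drop out of the model entirely.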

Application Examples Applying Extended Data Expression Technique to Classification Problems (패턴 분류 문제에 확장된 데이터 표현 기법을 적용한 응용 사례)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.9 no.12 / pp.9-15 / 2018
  • The main goal of extended data expression is to develop a data structure suitable for common problems in ubiquitous environments. The most notable feature of this method is that attribute values can be represented with probabilities. Another feature is that each event in the training data carries a weight value representing its importance. After this data structure was developed, an algorithm was devised to learn from it, and it has since been applied to problems in various fields with good results. This paper first introduces the theoretical basis: the extended data expression technique, UChoo, and the rule refinement method. It then presents application examples in areas such as rule refinement, missing data processing, the BEWS problem, and ensemble systems.

An Empirical Comparison of Bagging, Boosting and Support Vector Machine Classifiers in Data Mining (데이터 마이닝에서 배깅, 부스팅, SVM 분류 알고리즘 비교 분석)

  • Lee Yung-Seop;Oh Hyun-Joung;Kim Mee-Kyung
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.343-354 / 2005
  • The goal of this paper is to compare classification performance and to find a better classifier based on the characteristics of the data. The compared methods are CART with two ensemble algorithms, bagging and boosting, and SVM. In an empirical study of twenty-eight data sets, we found that SVM has a smaller error rate than the other methods on most data sets. Comparing bagging, boosting, and SVM by data characteristics, the SVM algorithm suits data with small numbers of observations and no missing values, whereas the boosting algorithm suits data with large numbers of observations and the bagging algorithm suits data with missing values.
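Of the three methods compared, bagging is simple enough to sketch from scratch. The toy below bags depth-1 decision stumps (standing in for the paper's CART base learner) by majority vote over bootstrap resamples; the data, stump learner, and ensemble size are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def stump_fit(X, y):
    """Best single-feature threshold classifier (a depth-1 tree)."""
    best = (0, 0.0, 1, 1.0)                       # feature, threshold, sign, error
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(sign * (X[:, j] - thr) > 0, 1, -1)
                err = np.mean(pred != y)
                if err < best[3]:
                    best = (j, thr, sign, err)
    return best[:3]

def stump_predict(model, X):
    j, thr, sign = model
    return np.where(sign * (X[:, j] - thr) > 0, 1, -1)

def bagging(X, y, n_models=25):
    """Bootstrap-aggregated stumps: fit each on a resample, vote at predict time."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), len(X))     # bootstrap resample
        models.append(stump_fit(X[idx], y[idx]))
    return models

X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)
models = bagging(X, y)
votes = np.sum([stump_predict(m, X) for m in models], axis=0)
acc = np.mean(np.where(votes > 0, 1, -1) == y)
```

Because each stump sees a perturbed resample, the vote averages away part of the base learner's variance, which is the property that makes bagging well suited to noisy data, including data with missing values, per the paper's findings.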

Prediction of Dissolved Oxygen in Jindong Bay Using Time Series Analysis (시계열 분석을 이용한 진동만의 용존산소량 예측)

  • Han, Myeong-Soo;Park, Sung-Eun;Choi, Youngjin;Kim, Youngmin;Hwang, Jae-Dong
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.4 / pp.382-391 / 2020
  • In this study, we used artificial intelligence algorithms to predict dissolved oxygen in Jindong Bay. Missing values in the observational data were imputed with the Bidirectional Recurrent Imputation for Time Series (BRITS) deep learning algorithm, and the dissolved oxygen was predicted with both the Auto-Regressive Integrated Moving Average (ARIMA) model, a widely used time series analysis method, and the Long Short-Term Memory (LSTM) deep learning method, whose accuracies were compared. BRITS determined the missing values with high accuracy in the surface layer, but its accuracy was low in the lower layers and unstable in the middle layer due to the experimental conditions. In the middle and bottom layers, the LSTM model showed higher accuracy than the ARIMA model, whereas the ARIMA model showed superior performance in the surface layer.
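The autoregressive part of the ARIMA comparison can be illustrated on a synthetic dissolved-oxygen-like series. The sketch fits only an AR(1) model to deseasonalized residuals and makes one-step-ahead forecasts; BRITS and LSTM are neural models well beyond a few lines, and the seasonal form, AR coefficient, and noise scale below are all invented.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
t = np.arange(n)
season = 8 + 2 * np.sin(2 * np.pi * t / 100)       # toy seasonal DO cycle (mg/L)
noise = np.zeros(n)
for i in range(1, n):                              # AR(1) disturbance
    noise[i] = 0.7 * noise[i - 1] + rng.normal(scale=0.3)
do = season + noise

# Fit AR(1) on the deseasonalized residual (the "AR" part of ARIMA)
resid = do - season
phi = (resid[:-1] @ resid[1:]) / (resid[:-1] @ resid[:-1])

# One-step-ahead forecasts on the last 100 points
pred = season[400:] + phi * resid[399:-1]
rmse = np.sqrt(np.mean((pred - do[400:]) ** 2))
```

The one-step RMSE approaches the innovation scale of the AR(1) process, which is the best any linear one-step forecaster can do on this series; the study's layer-by-layer comparison asks whether LSTM can beat this kind of linear baseline on real data.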