• Title/Summary/Keyword: missing data (결측데이터)

Undecided inference using bivariate probit models (이변량 프로빗모형을 이용한 미결정자 추론)

  • Hong, Chong-Sun;Jung, Mi-Yang
    • Journal of the Korean Data and Information Science Society / v.22 no.6 / pp.1017-1028 / 2011
  • When a credit score cannot readily be assigned to a loan applicant, the credit evaluation is postponed and the undecided applicant is referred to a specialist for further evaluation. This undecided inference problem arises in most statistical models, including those used in biostatistics and sports statistics as well as in credit evaluation. In this work, the undecided outcome is treated as a missing-data mechanism under the missing not at random (MNAR) assumption, and the bivariate probit model, a type of sample selection model, is used. Two undecided inference methods are proposed: the first uses characteristic variables that represent the state of decided applicants, and the second collects additional, more accurate information and applies these new variables. With an illustrative example, misclassification error rates for undecided and overall applicants are obtained and compared across various characteristic variables, undecided intervals, and thresholds. The misclassification error rates are found to decrease when the undecided interval is widened and more accurate information is added to the model, because the bivariate probit model then reflects the situation of decided applicants more accurately.
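
The roles of the threshold and the undecided interval can be sketched as follows. This is a minimal illustration only: it does not implement the bivariate probit model, and the scores, labels, function names, and the symmetric interval around the threshold are all assumptions made for the example.

```python
def classify(score, threshold, half_width):
    """Return 'good', 'bad', or 'undecided' for one credit score.

    Assumption: the undecided interval is symmetric around the threshold.
    """
    if abs(score - threshold) <= half_width:
        return "undecided"
    return "good" if score > threshold else "bad"


def error_rate(scores, labels, threshold, half_width):
    """Misclassification rate among applicants that receive a decision."""
    decided = []
    for score, label in zip(scores, labels):
        predicted = classify(score, threshold, half_width)
        if predicted != "undecided":
            decided.append((predicted, label))
    if not decided:
        return 0.0
    wrong = sum(1 for predicted, label in decided if predicted != label)
    return wrong / len(decided)


# Hypothetical scores in [0, 1] and true outcomes, for illustration only.
scores = [0.10, 0.35, 0.45, 0.48, 0.52, 0.58, 0.70, 0.90]
labels = ["bad", "bad", "good", "bad", "bad", "good", "good", "good"]

narrow = error_rate(scores, labels, threshold=0.5, half_width=0.0)
wide = error_rate(scores, labels, threshold=0.5, half_width=0.1)
print(narrow, wide)
```

In this toy data the borderline applicants are the ones that are misclassified, so deferring them lowers the error rate among decided applicants. Inferring the outcome of the deferred applicants is the part the paper addresses with the bivariate probit model.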

A longitudinal study for child aggression with Korea Welfare Panel Study data (한국복지패널 자료를 이용한 아동기 공격성에 대한 경시적 자료 분석)

  • Choi, Nayeon;Huh, Jib
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1439-1447 / 2014
  • Most of the literature on Korean child aggression is based on cross-sectional data. Although one related study uses longitudinal data, it assumes that the repeated measurements are mutually independent. A longitudinal data analysis of Korean child aggression is therefore needed. This study analyzes the effect on child aggression of child development outcomes, including academic achievement, self-esteem, depression and anxiety, delinquency, victimization by peers, abuse by parents, and internet use time, using Korea Welfare Panel Study data observed three times between 2006 and 2012. Because the Korea Welfare Panel Study data contain missing values, the values are assumed to be missing at random. A linear mixed-effects model with restricted maximum likelihood estimation is considered.

Generating GAN-based Virtual data to Prevent the Spread of Highly Pathogenic Avian Influenza(HPAI) (고위험성 조류인플루엔자(HPAI) 확산 방지를 위한 GAN 기반 가상 데이터 생성)

  • Choi, Dae-Woo;Han, Ye-Ji;Song, Yu-Han;Kang, Tae-Hun;Lee, Won-Been
    • The Journal of Bigdata / v.5 no.2 / pp.69-76 / 2020
  • This study was conducted with the support of the Information and Communication Technology Promotion Center, funded by the government (Ministry of Science and ICT) in 2019. Highly pathogenic avian influenza (HPAI) is an acute infectious disease of birds caused by infection with the highly pathogenic avian influenza virus, and it causes serious damage to poultry such as chickens and ducks. HPAI occurs mainly in winter rather than year-round, and in some periods it does not occur at all. Because of these characteristics, not enough actual data accumulate. In this study, a generative adversarial network (GAN) was used to generate data resembling the actual data containing missing values, and the process is introduced. The results can be used to measure risk by generating realistic simulation data for periods when HPAI did not occur.

A Study on the Traffic Volume Correction and Prediction Using SARIMA Algorithm (SARIMA 알고리즘을 이용한 교통량 보정 및 예측)

  • Han, Dae-cheol;Lee, Dong Woo;Jung, Do-young
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.20 no.6 / pp.1-13 / 2021
  • In this study, a time series analysis technique was applied to correct and predict traffic data used for various purposes, such as planning, design, maintenance, and research. Existing algorithms are of limited use for data such as traffic data, which show strong periodicity and seasonality or contain irregular values. To overcome these limitations, we applied the SARIMA model, which combines the autoregressive integrated moving average (ARIMA) model with seasonal autoregressive (SAR) and seasonal moving average (SMA) terms. According to the analysis, traffic volume prediction using the SARIMA(4,1,3)(4,0,3)12 model, the optimal parameter combination, showed excellent performance of 85% on average. Beyond traffic data, this study is considered valuable because it can contribute to traffic volume correction and improved forecasting when traffic data are missing, and because the approach applies to the variety of time series data now being collected.
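
For readers unfamiliar with the notation, the general SARIMA(p,d,q)(P,D,Q)s model can be written with the backshift operator B as below. The sign convention for the moving-average polynomials and the reading of the trailing 12 as the seasonal period s are assumptions; the abstract does not state them.

```latex
% General form; \phi_p, \theta_q are non-seasonal and \Phi_P, \Theta_Q seasonal polynomials.
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t

% With (p,d,q) = (4,1,3), (P,D,Q) = (4,0,3) and s = 12, the seasonal
% differencing term drops out because D = 0:
\phi_4(B)\,\Phi_4(B^{12})\,(1-B)\,y_t
  = \theta_3(B)\,\Theta_3(B^{12})\,\varepsilon_t
```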

A point-scale gap filling of the flux-tower data using the artificial neural network (인공신경망 기법을 이용한 청미천 유역 Flux tower 결측치 보정)

  • Jeon, Hyunho;Baik, Jongjin;Lee, Seulchan;Choi, Minha
    • Journal of Korea Water Resources Association / v.53 no.11 / pp.929-938 / 2020
  • In this study, we estimated missing evapotranspiration (ET) data at an eddy-covariance flux tower at the Cheongmicheon farmland site using an artificial neural network (ANN). ANNs have shown excellent performance in numerical analysis and are being applied in a growing range of fields. To evaluate the performance of the ANN-based gap filling, ET was also calculated using the existing gap-filling methods of Mean Diurnal Variation (MDV) and Food and Agriculture Organization Penman-Monteith (FAO-PM). The ET estimates were then evaluated by time series comparison and statistical analysis (coefficient of determination, index of agreement (IOA), root mean squared error (RMSE), and mean absolute error (MAE)). For the validation of each gap-filling model, we used 30-minute data from 2015. Of the 121 missing values, the MDV, FAO-PM, and ANN methods filled 70, 53, and 84, respectively, so the ANN method performed best. The coefficient of determination (0.673, 0.784, and 0.841 for MDV, FAO-PM, and ANN, respectively) and the IOA (0.899, 0.890, and 0.951, respectively) indicated that all three methods showed high correlation and can be considered usable, with the ANN model showing the highest performance and suitability. These results can support further studies of flux tower gap filling using machine learning methods.
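
The MDV baseline and the evaluation statistics named in the abstract can be sketched in a few lines. This is an illustration under assumptions, not the authors' procedure: MDV is read here as the mean diurnal variation approach (filling a gap with the mean of values at the same time of day on neighbouring days), the IOA is taken to be Willmott's index of agreement, and the data and function names are hypothetical.

```python
import math


def mdv_fill(series, steps_per_day, window_days=1):
    """Fill None gaps with the mean of values at the same time of day
    on neighbouring days. Gaps with no valid neighbour stay None."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        neighbours = []
        for d in range(1, window_days + 1):
            for j in (i - d * steps_per_day, i + d * steps_per_day):
                if 0 <= j < len(series) and series[j] is not None:
                    neighbours.append(series[j])
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def index_of_agreement(obs, pred):
    """Willmott's index of agreement; 1 means perfect agreement."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical ET values, 4 steps per day over 3 days (real 30-minute
# data would use steps_per_day=48).
truth = [0.0, 2.0, 4.0, 1.0, 0.0, 3.0, 5.0, 1.0, 0.0, 2.0, 4.0, 1.0]
gaps = [5, 6]
gappy = [None if i in gaps else v for i, v in enumerate(truth)]

filled = mdv_fill(gappy, steps_per_day=4)
obs = [truth[i] for i in gaps]
pred = [filled[i] for i in gaps]
print(rmse(obs, pred), mae(obs, pred), index_of_agreement(obs, pred))
```

Masking known values and scoring the filled estimates against them, as done here, is one common way to validate a gap-filling method; whether the paper validated this way is not stated in the abstract.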

Modified Transformation and Evaluation for High Concentration Ozone Predictions (고농도 오존 예측을 위한 향상된 변환 기법과 예측 성능 평가)

  • Cheon, Seong-Pyo;Kim, Sung-Shin;Lee, Chong-Bum
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.4 / pp.435-442 / 2007
  • To reduce damage from high-concentration ozone in the air, we have studied how to predict high-concentration ozone before it occurs. High-concentration ozone is a rare event, and its reaction mechanism is nonlinear and complex. In this paper, we applied and considered as many methods as we could. We clustered the data using the fuzzy c-means method and used rejection sampling to fill in missing and abnormal data. Next, the correlations between the input components and the output ozone concentration were calculated in order to transform the more highly correlated components with a modified log transformation. We then built prediction models using Dynamic Polynomial Neural Networks. To select the optimal model, we adopted a minimum bias criterion. Finally, to evaluate the suggested models, we compared two models: one trained and tested on the transformed data and the other on the untransformed data. We concluded that the modified transformation yielded good to ideal performance in some evaluations, in particular when the data were related to seasonal characteristics or their variation trends.
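
The clustering step can be illustrated with a minimal one-dimensional fuzzy c-means sketch. It covers only the standard membership and center updates; the rejection sampling, the modified log transformation, and the neural network models are not reproduced, and the data, initial centers, and fuzzifier m = 2 are assumptions chosen for the example.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """1-D fuzzy c-means.

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which points[j] belongs to cluster i.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = [[0.0] * len(points) for _ in centers]
        for j, x in enumerate(points):
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # Point coincides with a center: full membership there.
                memberships[dists.index(0.0)][j] = 1.0
                continue
            for i, d_i in enumerate(dists):
                memberships[i][j] = 1.0 / sum(
                    (d_i / d_k) ** exponent for d_k in dists
                )
        # Move each center to the membership-weighted mean of the points.
        for i in range(len(centers)):
            weights = [u ** m for u in memberships[i]]
            centers[i] = sum(w * x for w, x in zip(weights, points)) / sum(weights)
    return centers, memberships


# Hypothetical measurements forming two loose groups.
points = [0.8, 1.0, 1.2, 8.8, 9.0, 9.2]
centers, memberships = fuzzy_c_means(points, centers=[0.0, 10.0])
print(centers)
```

Unlike hard clustering, every point keeps a graded membership in every cluster, and the memberships of each point sum to one.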

Verification Study for Remotely Sensed Soil Moisture (인공위성 토양수분 자료 검증에 관한 연구)

  • Hur, Yoo-Mi;Choi, Min-Ha;Jung, Sung-Won
    • Proceedings of the Korea Water Resources Association Conference / 2010.05a / pp.1564-1569 / 2010
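  • Soil moisture is one of the key factors in understanding hydrological phenomena, that is, the water cycle, and in accounting for meteorological change. As natural disasters such as droughts and floods caused by recent abnormal climate occur frequently across Korea, the importance of soil moisture for interpreting these phenomena more accurately is increasingly emphasized. Soil moisture is currently being observed and analyzed, but most observation records are short and the equipment is aging, which results in many missing values; even where observations exist, various factors reduce the reliability of analyses based on them. This study therefore compares and verifies Advanced Microwave Scanning Radiometer E (AMSR-E) satellite observation data, which provide accurate soil moisture measurements over a wide area, against existing soil moisture data in order to explore how the satellite data can be used.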

Customer Classification Method for Household Appliances Industries with a Large Number of Incomplete Data (다수의 결측치가 존재하는 가전업 고객 데이터 활용을 위한 고객분류기법의 개발)

  • Chang, Young-Soon;Seo, Jong-Hyen
    • IE interfaces / v.19 no.1 / pp.86-96 / 2006
  • Customer data in some manufacturing industries contain many incomplete records because customers purchase infrequently and because the customer profile data gathered by sales representatives are limited. As a result, most sophisticated data analysis methods cannot be applied directly. This paper proposes a heuristic data analysis method for classifying customers in the household appliance industry. The proposed PD (percent of difference) method can be used for discriminant analysis of incomplete customer data with simple mathematical calculations. The method consists of a variable distribution estimation step, PD measure and cluster score evaluation steps, a variable impact construction step, and a segment assignment step. A real example is also presented.

Lattice Conditional Independence Models Based on the Essential Graph (에센셜 그래프를 바탕으로 한 격자 조건부 독립 모델)

  • Ju Sung, Kim;Myoong Young, Yoon
    • Journal of Korea Society of Industrial Information Systems / v.9 no.2 / pp.9-16 / 2004
  • Recently, lattice conditional independence models (LCIMs) have been introduced for the analysis of non-monotone missing data patterns and of non-nested dependent regression models. This approach has been applied successfully to various problems in data pattern analysis; however, searching for LCIMs is computationally burdensome. To address this drawback, we propose a new scheme for finding LCIMs based on the essential graph. We also show that the class of LCIMs coincides with the class of all transitive acyclic directed graph (TADG) models that are Markov equivalent to a specific acyclic directed graph (ADG) model.
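
The graph property behind the TADG class can be checked directly. The sketch below only tests whether a directed graph is acyclic and transitive; it does not construct essential graphs or test Markov equivalence, and the function names and example graphs are hypothetical.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for a, b in edges:
        children[a].append(b)
        indegree[b] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return visited == len(nodes)


def is_transitive(edges):
    """True if a -> b and b -> c always imply a -> c."""
    edge_set = set(edges)
    return all(
        (a, d) in edge_set
        for a, b in edge_set
        for c, d in edge_set
        if b == c
    )


def is_tadg(nodes, edges):
    """True if the graph is a transitive acyclic directed graph."""
    return is_acyclic(nodes, edges) and is_transitive(edges)


chain_closed = is_tadg([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
chain_open = is_tadg([1, 2, 3], [(1, 2), (2, 3)])
cycle = is_tadg([1, 2], [(1, 2), (2, 1)])
print(chain_closed, chain_open, cycle)
```

The chain 1 -> 2 -> 3 fails the check until the edge 1 -> 3 is added, which is the transitivity requirement in the TADG definition.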
