• Title/Abstract/Keywords: Missing Value


Method of Processing the Outliers and Missing Values of Field Data to Improve RAM Analysis Accuracy

  • 김인석;정원
    • 한국신뢰성학회지:신뢰성응용연구 / Vol. 17, No. 3 / pp.264-271 / 2017
  • Purpose: Field operation data contain missing values or outliers that arise from various causes in the data collection process, so caution is required when using RAM analysis results based on such data. The purpose of this study is to present a method that minimizes RAM analysis error for field data and thereby improves accuracy. Methods: Statistical methods are presented for processing the outliers and missing values of field operation data, and the RAM analysis results before and after applying the technique are compared. Results: Availability was estimated to be 6.8 to 23.5% lower after processing than before, and the processing of missing values and outliers is judged to greatly affect the RAM analysis result. Conclusion: RAM analysis of the OO weapon system was performed, and suggestions for improving RAM analysis were presented through a comparison of the new and current methods. Data analysis without appropriate treatment of erroneous values may yield incorrect conclusions that lead to inappropriate decisions and actions.

Measurement of missing video frames in NPP control room monitoring system using Kalman filter

  • Mrityunjay Chaubey;Lalit Kumar Singh;Manjari Gupta
    • Nuclear Engineering and Technology / Vol. 55, No. 1 / pp.37-44 / 2023
  • Using the Kalman filtering technique, we propose a novel method for estimating the missing video frames to monitor the activities inside the control room of a nuclear power plant (NPP). The purpose of this study is to reinforce the existing security and safety procedures in the control room of an NPP. The NPP control room serves as the nervous system of the plant, with instrumentation and control systems used to monitor and control critical plant parameters. Because the safety and security of the NPP control room are critical, it must be monitored closely by security cameras in order to assess and reduce the onset of any incidents and accidents that could adversely impact the safety of the NPP. However, for a variety of technical and administrative reasons, continuous monitoring may be interrupted. Because of the interruption, one or more frames of the video may be distorted or missing, making it difficult to identify the activity during this time period. This could endanger overall safety. The demonstrated Kalman filter model estimates the value of the missing frame pixel-by-pixel using information from the frame that occurred in the video sequence before it and the frame that will occur in the video sequence after it. The results of the experiment provide evidence of the effectiveness of the algorithm.
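The pixel-by-pixel estimate described in the abstract can be illustrated with a scalar Kalman predict/update step. This is only a minimal sketch under stated assumptions, not the authors' model: it assumes a random-walk state per pixel, treats the frame before the gap as the prior state and the frame after the gap as a noisy measurement, and uses hypothetical noise parameters (`p_prev`, `q`, `r`) and function names.

```python
def kalman_fill_pixel(prev_px, next_px, p_prev=1.0, q=1.0, r=2.0):
    """Estimate one missing pixel with a scalar Kalman predict/update step.

    prev_px: intensity in the frame before the gap (prior state)
    next_px: intensity in the frame after the gap (treated as a measurement)
    p_prev, q, r: prior variance, process noise, measurement noise (assumed)
    """
    # Predict: random-walk model, so the state carries over and
    # its uncertainty grows by the process noise.
    x_pred = prev_px
    p_pred = p_prev + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_est = x_pred + gain * (next_px - x_pred)
    p_est = (1.0 - gain) * p_pred
    return x_est, p_est


def fill_missing_frame(prev_frame, next_frame, **noise):
    """Apply the scalar filter independently to every pixel of a 2-D frame."""
    return [
        [kalman_fill_pixel(a, b, **noise)[0] for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prev_frame, next_frame)
    ]
```

With the default parameters the gain is 0.5, so each estimated pixel is the midpoint of its two neighbours in time; a larger `r` pulls the estimate toward the earlier frame.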

Exploiting Patterns for Handling Incomplete Coevolving EEG Time Series

  • Thi, Ngoc Anh Nguyen;Yang, Hyung-Jeong;Kim, Sun-Hee
    • International Journal of Contents / Vol. 9, No. 4 / pp.1-10 / 2013
  • The electroencephalogram (EEG) time series is a measure of electrical activity received from multiple electrodes placed on the scalp of a human brain. It provides a direct measurement for characterizing the dynamic aspects of brain activities. These EEG signals are formed from a series of spatial and temporal data with multiple dimensions. Missing data can occur due to faulty electrodes. These missing data can cause distortion and repudiation, and further reduce the effectiveness of analysis algorithms. Current methodologies for EEG analysis require a complete EEG data matrix as input. Therefore, an accurate and reliable imputation approach for missing values is necessary to avoid incomplete data sets in analyses and to make better use of existing analysis techniques. This research proposes a new method to automatically recover random consecutive missing data from real-world EEG data based on a Linear Dynamical System. The proposed method aims to capture the optimal patterns based on two main characteristics of coevolving EEG time series: (i) dynamics, by discovering temporally evolving behaviors, and (ii) correlations, by identifying the relationships between multiple brain signals. Exploiting these, the proposed method identifies a few hidden variables and discovers their dynamics to impute missing values. The proposed method offers a robust and scalable approach with computation time linear in the size of the sequences. A comparative study has been performed to assess the effectiveness of the proposed method against interpolation and missing values via Singular Value Decomposition (MSVD). The experimental simulations demonstrate that the proposed method improves reconstruction performance by up to 49% and 67% over the MSVD and interpolation approaches, respectively.

Symptom Pattern Classification using Neural Networks in the Ubiquitous Healthcare Environment with Missing Values

  • 마이클 안젤로 살보;이재완;이말례
    • 인터넷정보학회논문지 / Vol. 11, No. 2 / pp.129-142 / 2010
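  • One of the major application areas of wireless sensor networks is the ubiquitous healthcare system. However, one of the challenges of wireless sensor networks is the high loss rate in the data. Data coming from biosensors may not arrive at the base station, and such values become missing values. This paper proposes a healthcare monitor agent (HMA) that collects data at the base station, handles the missing values, and classifies health status according to symptom patterns so that appropriate action can be taken in an emergency. The agent is applied to a ubiquitous healthcare environment and uses data generated from biosensors and the patient's family history as symptom patterns for recognizing health status. When missing values appear, the HMA runs a prediction algorithm to fill in the missing values of the symptom pattern before classification. Simulation results showed that the prediction algorithm using the HMA classifies symptom patterns more accurately than other methods.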

Different penalty methods for assessing interval from first to successful insemination in Japanese Black heifers

  • Setiaji, Asep;Oikawa, Takuro
    • Asian-Australasian Journal of Animal Sciences / Vol. 32, No. 9 / pp.1349-1354 / 2019
  • Objective: The objective of this study was to determine the best approach for handling missing records of first to successful insemination (FS) in Japanese Black heifers. Methods: Of the 2,367 records used, from heifers born between 2003 and 2015, 206 (8.7%), those of open heifers, were missing. Four penalty methods based on the number of inseminations were set as follows: C1, FS average according to the number of inseminations; C2, constant number of days, 359; C3, maximum number of FS days to each insemination; and C4, average of FS at the last insemination and FS of C2. C5 was generated by adding a constant number (21 d) to the highest number of FS days in each contemporary group. The bootstrap method was used to compare the 5 methods in terms of bias, mean squared error (MSE), and coefficient of correlation between the estimated breeding value (EBV) of non-censored data and censored data. Three percentages (5%, 10%, and 15%) were investigated using the random censoring scheme. The univariate animal model was used to conduct genetic analysis. Results: Heritability of FS in non-censored data was $0.012 \pm 0.016$, slightly lower than the average estimate from the five penalty methods. C1, C2, and C3 showed lower standard errors of estimated heritability but demonstrated inconsistent results for different percentages of missing records. C4 showed moderate but more stable standard errors across all percentages of missing records, whereas C5 showed the highest standard errors compared with non-censored data. The MSE in C4 heritability was $0.633 \times 10^{-4}$, $0.879 \times 10^{-4}$, $0.876 \times 10^{-4}$, and $0.866 \times 10^{-4}$ for 5%, 8.7%, 10%, and 15%, respectively, of the missing records. Thus, C4 showed the lowest and the most stable MSE of heritability; the coefficient of correlation for EBV was 0.88, 0.93, and 0.90 for heifer, sire, and dam, respectively. Conclusion: C4 demonstrated the highest positive correlation with the non-censored data set and was consistent within different percentages of the missing records. We concluded that C4 was the best penalty method for missing records due to the stable value of estimated parameters and the highest coefficient of correlation.
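Two of the penalty rules stated in the abstract are simple enough to sketch directly: C2 (a constant 359 days) and C5 (the highest observed FS in the contemporary group plus 21 d). The record layout below, with hypothetical `group` and `fs` keys and `None` marking a missing record, is an assumption for illustration; the paper's actual data structure is not given here.

```python
def penalize_c2(records, penalty_days=359):
    """C2: replace every missing FS with a constant number of days."""
    return [
        dict(r, fs=penalty_days) if r["fs"] is None else dict(r)
        for r in records
    ]


def penalize_c5(records, extra_days=21):
    """C5: replace a missing FS with (group maximum FS + extra_days)."""
    # Highest observed FS in each contemporary group.
    group_max = {}
    for r in records:
        if r["fs"] is not None:
            current = group_max.get(r["group"], r["fs"])
            group_max[r["group"]] = max(current, r["fs"])

    penalized = []
    for r in records:
        if r["fs"] is None and r["group"] in group_max:
            penalized.append(dict(r, fs=group_max[r["group"]] + extra_days))
        else:
            # Observed records are copied unchanged; a missing record in a
            # group with no observed FS is left as None (an assumption).
            penalized.append(dict(r))
    return penalized
```

Both functions return new lists, so the same raw records can be passed through each rule and the resulting data sets compared.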

Cluster Analysis of Incomplete Microarray Data with Fuzzy Clustering

  • Kim, Dae-Won
    • 한국지능시스템학회논문지 / Vol. 17, No. 3 / pp.397-402 / 2007
  • In this paper, we present a method for clustering incomplete microarray data using alternating optimization, in which a prior imputation method is not required. To reduce the influence of imputation in preprocessing, we take an alternating optimization approach to find better estimates during the iterative clustering process. This method improves the estimates of missing values in each iteration by exploiting cluster information, such as the cluster centroids, together with all available non-missing values. The clustering results of the proposed method are more significantly relevant to the biological gene annotations than those of other methods, indicating its effectiveness and potential for clustering incomplete gene expression data.
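The idea of refining missing-value estimates inside the clustering loop can be sketched with fuzzy c-means in which each missing entry is re-estimated from the centroids on every pass. This is a minimal sketch, assuming the membership-weighted update used in optimal-completion-style fuzzy clustering; it is not confirmed to be the paper's exact algorithm, and the function name, the `None` marker, and the caller-supplied initial centroids are illustrative assumptions.

```python
def fcm_impute(data, centroids, m=2.0, iters=50):
    """Fuzzy c-means that re-estimates missing entries (None) each iteration.

    data: list of rows; centroids: initial centroids; m: fuzzifier (> 1).
    Returns (completed data, centroids, memberships).
    """
    dims = len(data[0])
    # Start each missing entry at the mean of the observed values in its column.
    col_mean = []
    for j in range(dims):
        obs = [row[j] for row in data if row[j] is not None]
        col_mean.append(sum(obs) / len(obs))
    x = [[col_mean[j] if val is None else val for j, val in enumerate(row)]
         for row in data]
    v = [list(c) for c in centroids]
    u = []

    for _ in range(iters):
        # 1) Membership update from squared distances to each centroid.
        u = []
        for row in x:
            d2 = [max(sum((a - b) ** 2 for a, b in zip(row, c)), 1e-12)
                  for c in v]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            u.append([w / total for w in inv])
        # 2) Centroid update: membership-weighted mean of the completed rows.
        for k in range(len(v)):
            wk = [ui[k] ** m for ui in u]
            total = sum(wk)
            v[k] = [sum(w * row[j] for w, row in zip(wk, x)) / total
                    for j in range(dims)]
        # 3) Missing-value update: membership-weighted mean of centroid coords.
        for i, row in enumerate(data):
            wi = [uk ** m for uk in u[i]]
            total = sum(wi)
            for j, val in enumerate(row):
                if val is None:
                    x[i][j] = sum(w * c[j] for w, c in zip(wi, v)) / total
    return x, v, u
```

Observed entries are never modified; only the entries marked `None` move as the centroids and memberships are refined.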

A Naive Multiple Imputation Method for Ignorable Nonresponse

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / Vol. 11, No. 2 / pp.399-411 / 2004
  • A common method of handling nonresponse in sample surveys is to delete the affected cases, which may result in a substantial loss of cases. Thus, in certain situations, it is of interest to create a complete set of sample values. In this case, a popular approach is to impute the missing values in the sample with the mean or the median of the responders. The difficulty with this method, which replaces each missing value with a single imputed value, is that inferences based on the completed dataset underestimate the uncertainty of the inferential procedure. Various suggestions have been made to overcome this difficulty, but they might not be appropriate for public-use files, where the user has only limited information about the reasons for nonresponse. In this note, a multiple imputation method is considered for creating a complete dataset that can be used for all possible inferential procedures without misleading results or underestimated uncertainty.
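The contrast between single and multiple imputation can be made concrete with a small sketch. It is not the method of this note: it assumes an approximate-Bayesian-bootstrap style of hot-deck draw for ignorable nonresponse and combines the completed-data estimates of the mean with Rubin's combining rules (within-imputation variance plus an inflated between-imputation variance). The function name, the `None` marker, and the simple-random-sample variance formula are illustrative assumptions.

```python
import random
import statistics


def multiple_imputation_mean(values, m=5, seed=0):
    """Estimate a mean and its variance from m imputed copies of the data.

    values: list with None for nonresponse; needs at least two entries and
    at least one observed value.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n = len(values)
    estimates, variances = [], []
    for _ in range(m):
        # Resample the responders first, then draw imputations from that
        # resample, so donor-pool uncertainty is carried into each copy.
        donors = [rng.choice(observed) for _ in observed]
        completed = [v if v is not None else rng.choice(donors)
                     for v in values]
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / n)

    q_bar = statistics.mean(estimates)             # pooled point estimate
    within = statistics.mean(variances)            # average within variance
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1.0 + 1.0 / m) * between     # Rubin's total variance
    return q_bar, total
```

Single mean imputation corresponds to keeping only the within term computed on one completed data set, which is why it overstates precision; the between term is what multiple imputation adds.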

A Research on the Energy Data Analysis using Machine Learning

  • 김동주;권성철;문종희;심기도;배문성
    • KEPCO Journal on Electric Power and Energy / Vol. 7, No. 2 / pp.301-307 / 2021
  • With the spread of data collection devices such as smart meters, energy data is increasingly collected in a variety of ways, and its importance continues to grow. However, due to technical or practical limitations, errors such as missing values or outliers occur during the data collection process. Especially in the case of customer-related data, billing problems may occur, so energy companies are conducting various research on processing such data. In addition, efforts are being made to create added value from data, but such services are difficult to provide unless the reliability of the data is guaranteed. To address these challenges, this research analyzes prior research on bad data processing, specifically in the energy field, and proposes new missing value processing methods to improve the reliability and field utilization of energy data.

A Sparse Data Preprocessing Using Support Vector Regression

  • 전성해;박정은;오경환
    • 한국지능시스템학회논문지 / Vol. 14, No. 6 / pp.789-792 / 2004
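  • In many fields, such as web mining, bioinformatics, and statistical data analysis, missing values of widely varying forms occur and make the training data sparse. Missing values are usually replaced during preprocessing with estimates obtained from imputation techniques, ranging from the most basic mean and mode to conditional means, tree models, and Markov chain Monte Carlo methods. However, when the proportion of missing values in the data is large, the predictive accuracy of existing imputation methods tends to decrease. Moreover, as the proportion of missing values increases, the number of usable imputation methods becomes limited. To address these problems, this paper adapts Vapnik's Support Vector Regression, from statistical learning theory, to the data preprocessing step. The proposed method makes it possible to preprocess sparse data with a high proportion of missing values. Its performance was verified using data obtained from the UCI machine learning repository.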
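For reference, the most basic techniques named in the abstract, mean and mode substitution, can be sketched in a few lines. This shows only that baseline, not the Support Vector Regression approach the paper proposes; the function name and the use of `None` as the missing marker are assumptions.

```python
from collections import Counter
from statistics import mean


def impute_column(column):
    """Fill None entries with the mean (numeric) or mode (categorical)."""
    observed = [v for v in column if v is not None]
    if not observed:
        # Nothing to learn from; return the column unchanged.
        return list(column)
    if all(isinstance(v, (int, float)) for v in observed):
        fill = mean(observed)
    else:
        # most_common(1) returns the most frequent value; ties go to the
        # value encountered first.
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in column]
```

Because every gap in a column receives the same value, this baseline flattens the column's variance as the missing ratio grows, which is one way to see the loss of accuracy on sparse data that the abstract describes.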

A study to improve the accuracy of the naive propensity score adjusted estimator using double post-stratification method

  • 여이수;신기일
    • 응용통계연구 / Vol. 36, No. 6 / pp.547-559 / 2023
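  • In sample surveys, appropriate handling of nonresponse improves the accuracy of estimation. Various methods have been studied that handle nonresponse appropriately when the missing mechanism is MCAR (missing completely at random) or MAR (missing at random). The propensity score adjusted estimator is commonly used as the mean estimator when nonresponse occurs; under MAR or MCAR nonresponse, known sample weights and response probabilities estimated by a valid method can be used, so the propensity score adjusted estimator is unbiased. However, under MNAR (missing not at random) nonresponse, in which nonresponse is affected by the value of the variable of interest, it is difficult to obtain accurate response probabilities, so the propensity score adjusted estimator can be biased. Chung and Shin (2017, 2022) proposed a single post-stratification method to improve the accuracy of mean estimation when MNAR nonresponse occurs under a non-informative sampling design. This study proposes a double post-stratification method to improve the accuracy of the naive propensity score adjusted estimator when an informative sampling design is used and MNAR nonresponse occurs. The superiority of the proposed method was confirmed through simulation.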
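A propensity score adjusted mean estimator of the kind discussed in the abstract can be sketched as a ratio of weighted sums over respondents, with each sample weight divided by the estimated response probability. The ratio form, the function name, and the argument layout below are assumptions for illustration; the paper's exact estimator and the double post-stratification step are not reproduced here.

```python
def psa_mean(y, weights, response_prob, responded):
    """Propensity score adjusted mean over respondents (ratio form).

    y: study variable (ignored for nonrespondents)
    weights: known sample weights
    response_prob: estimated response probabilities, each in (0, 1]
    responded: booleans marking which units responded
    """
    numerator = 0.0
    denominator = 0.0
    for yi, wi, pi, ri in zip(y, weights, response_prob, responded):
        if ri:
            # Each respondent also stands in for similar nonrespondents.
            adjusted = wi / pi
            numerator += adjusted * yi
            denominator += adjusted
    return numerator / denominator
```

If the estimated probabilities are inaccurate, as can happen under MNAR nonresponse, the adjusted weights are off and the estimate inherits that bias, which is the problem the abstract sets out to reduce.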