• Title/Summary/Keyword: 이상치 판정 (outlier determination)


Outlier Detection in Time Series Monitoring Datasets using Rule Based and Correlation Analysis Method (규칙기반 및 상관분석 방법을 이용한 시계열 계측 데이터의 이상치 판정)

  • Jeon, Jesung;Koo, Jakap;Park, Changmok
    • Journal of the Korean GEO-environmental Society / v.16 no.5 / pp.43-53 / 2015
  • In this study, outlier detection methods were developed for various kinds of monitoring data that fall into the big data category, and outlier detection was carried out on both artificial data and real field monitoring data. Rule-based methods that apply the rate of change and the probability of error of monitoring data are effective at detecting large-scale short faults and constant faults that show no change over a certain period. However, because they rely on a single independent dataset, they can misjudge normal data with large variation as outliers. Rule-based methods for detecting noise faults are of limited use on real monitoring data because it is difficult to choose a proper data window size and to find a threshold for outlier judgment. Correlation analysis between two different datasets was very effective at detecting localized outliers and abnormal variation in both short-term and long-term monitoring datasets, provided that a reasonable range of training data could be selected.
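The two rule-based checks named in this abstract (rate of change for short faults, absence of change for constant faults) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names, the `max_rate`, `min_run`, and `tol` parameters, and the sample readings are all assumptions.

```python
from typing import List


def short_faults(values: List[float], max_rate: float) -> List[int]:
    """Flag spikes: the jump into a point and the jump out of it both
    exceed max_rate (assumed threshold) and point in opposite directions."""
    flagged = []
    for i in range(1, len(values) - 1):
        d_in = values[i] - values[i - 1]
        d_out = values[i + 1] - values[i]
        if abs(d_in) > max_rate and abs(d_out) > max_rate and d_in * d_out < 0:
            flagged.append(i)
    return flagged


def constant_faults(values: List[float], min_run: int, tol: float = 0.0) -> List[int]:
    """Flag runs of at least min_run readings that stay within tol of the
    first reading of the run (a sensor that appears to be stuck)."""
    flagged = []
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or abs(values[i] - values[start]) > tol:
            if i - start >= min_run:
                flagged.extend(range(start, i))
            start = i
    return flagged


# Hypothetical readings: one spike at index 3, a stuck sensor at indices 5-8.
readings = [10.0, 10.2, 10.1, 25.0, 10.3, 10.4, 10.4, 10.4, 10.4, 10.6]
print(short_faults(readings, max_rate=5.0))
print(constant_faults(readings, min_run=4))
```

Requiring opposite-sign jumps is one way to avoid flagging a genuine step change, which is the kind of misjudgement the abstract attributes to single-dataset rules; the choice of thresholds remains the hard part.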

Development of data processing component module for the flood management in an agricultural watershed (농촌유역 홍수관리를 위한 자료처리 요소모듈 개발)

  • Lee, Do Gil;Kang, Moon Seong;Park, Jihoon;Ryu, Jeong Hoon
    • Proceedings of the Korea Water Resources Association Conference / 2016.05a / pp.289-289 / 2016
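  • Reliable flood management requires input data that have been preprocessed through steps such as trend analysis and outlier determination. Because the presence or absence of a trend differs with the method used, a trend is difficult to judge with a single method. Because the criteria for outlier analysis vary with regional characteristics, a fixed criterion is difficult to apply, so the work is mostly done manually and takes a long time to complete. These problems need to be addressed to reduce the cost and time involved in preprocessing input data. The purpose of this study is therefore to develop data processing component modules for flood management in agricultural watersheds. The modules consist mainly of a module that analyzes trends in meteorological data and a module that detects and judges outliers in water level data. The trend analysis module provides the parametric t-test and the nonparametric Hotelling-Pabst test and Mann-Kendall test, so that the results of all three methods can be compared for a single input dataset. The outlier detection module provides outlier detection using the IQR (interquartile range) rule and rule-based methods. The developed modules were verified by applying them to the Yongdang Reservoir in the Han River basin. The data processing component modules developed in this study are expected to be useful in future flood management and related research.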
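As a rough illustration of two of the methods named in this abstract, the sketch below applies the IQR rule to a short water level series and computes the Mann-Kendall statistic for a trend. It is not the module described by the authors: the function names, the conventional fence multiplier `k = 1.5`, the no-ties variance formula, and the sample values are all assumptions.

```python
import math
from statistics import quantiles
from typing import List, Tuple


def iqr_outliers(values: List[float], k: float = 1.5) -> List[int]:
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


def mann_kendall(values: List[float]) -> Tuple[int, float]:
    """Mann-Kendall S statistic and its normal approximation Z.
    The variance used here ignores ties (a simplifying assumption)."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Hypothetical water levels (m) with one suspicious reading at index 5.
levels = [2.1, 2.2, 2.0, 2.3, 2.2, 5.9, 2.1, 2.2]
print(iqr_outliers(levels))

# Hypothetical annual series with a steady upward trend.
print(mann_kendall([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A |Z| above about 1.96 would indicate a trend at the 5% significance level under the normal approximation.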


The Outlier-Filtering Algorithm for National Highway Continuous Traffic Counts Data (일반국도 상시조사 교통량 자료의 이상치 판정 알고리즘 개발)

  • Shin, Jae Myong;Lee, Sang Hyup;Kim, Hyun Suk
    • KSCE Journal of Civil and Environmental Engineering Research / v.33 no.2 / pp.691-702 / 2013
  • In this study, a quantitative outlier-filtering algorithm was developed using a smoothing method based on the day-of-the-week traffic volume variation pattern. To test its effectiveness, the algorithm was used to identify outliers in traffic volume data collected in 2010 at 14 continuous traffic count sites on national highways. The test results are satisfactory: the filtering rate is 98.2% for normal days and the mis-filtering rate is 8.0% for abnormal days. The algorithm can therefore be used for rough but quick filtering of outliers from collected traffic volume data.
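The abstract does not spell out the smoothing method, so the following is only a simplified stand-in for the general idea of filtering against a day-of-the-week pattern: each daily count is compared with the median of counts for the same weekday. The function name, the `max_rel_dev` threshold, and the sample counts are assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import List, Tuple


def weekday_outliers(counts: List[Tuple[int, float]],
                     max_rel_dev: float = 0.3) -> List[int]:
    """counts holds (weekday, daily_volume) pairs, weekday 0 = Monday.
    A day is flagged when its volume deviates from the median volume of
    the same weekday by more than max_rel_dev (assumed threshold)."""
    by_day = defaultdict(list)
    for weekday, volume in counts:
        by_day[weekday].append(volume)
    baseline = {day: median(vols) for day, vols in by_day.items()}
    return [
        i for i, (weekday, volume) in enumerate(counts)
        if abs(volume - baseline[weekday]) > max_rel_dev * baseline[weekday]
    ]


# Hypothetical counts: four Mondays and four Saturdays; the last Monday is low.
counts = [(0, 1000), (5, 1500), (0, 1020), (5, 1480),
          (0, 980), (5, 1520), (0, 400), (5, 1510)]
print(weekday_outliers(counts))
```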

Outlier detection and treatment in industrial sampling survey (경제조사에서의 이상치 탐지와 처리방법)

  • Joo, Young Sun;Cho, Gyo-Young
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.131-142 / 2016
  • Outliers in surveys can have a large effect on estimates of totals. This is especially true in business surveys, where the populations from which samples are drawn are typically skewed. In this paper, we discuss the practical development and implementation of methods to identify and deal with outliers. The detection method is based on the quartile method, and detected outliers are treated in various ways. The study examines two versions of winsorised estimators, each with three different cut-off thresholds. For the simulation study, four types of weight transformation function were considered.
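To make the idea of a winsorised estimator of a total concrete, here is a minimal sketch. It assumes the two commonly described forms (type I replaces a value above the cut-off with the cut-off; type II blends the value and the cut-off according to the sampling weight), which may or may not be the two versions examined in the paper. The quartile-based cut-off with multiplier `k`, the function names, and the data are likewise assumptions.

```python
from statistics import quantiles
from typing import List


def upper_cutoff(values: List[float], k: float = 3.0) -> float:
    """Quartile-based cut-off Q3 + k*IQR (k is an assumed multiplier)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def winsorised_total(values: List[float], weights: List[float],
                     cutoff: float, type_two: bool = False) -> float:
    """Weighted total after pulling values above the cut-off back toward it.
    Type I:  y* = cutoff
    Type II: y* = y / w + (1 - 1 / w) * cutoff"""
    total = 0.0
    for y, w in zip(values, weights):
        if y > cutoff:
            y = y / w + (1 - 1 / w) * cutoff if type_two else cutoff
        total += w * y
    return total


# Hypothetical turnover figures with one extreme unit; every weight is 5.
turnover = [10, 12, 11, 13, 12, 14, 11, 200]
weights = [5.0] * len(turnover)
cut = upper_cutoff(turnover)

print(cut)
print(sum(w * y for y, w in zip(turnover, weights)))        # unadjusted total
print(winsorised_total(turnover, weights, cut))             # type I
print(winsorised_total(turnover, weights, cut, type_two=True))  # type II
```

Type II keeps the outlier's own value at weight 1 and only winsorises the part of the weight that represents non-sampled units, so it shrinks the estimate less than type I.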

Outlier Detection By Clustering-Based Ensemble Model Construction (클러스터링 기반 앙상블 모델 구성을 이용한 이상치 탐지)

  • Park, Cheong Hee;Kim, Taegong;Kim, Jiil;Choi, Semok;Lee, Gyeong-Hoon
    • KIPS Transactions on Software and Data Engineering / v.7 no.11 / pp.435-442 / 2018
  • Outlier detection is the task of detecting data samples that deviate significantly from the distribution of normal data. Most outlier detection methods calculate an outlier score that indicates how far a data sample is from the normal state and judge the sample to be an outlier when its score is above a given threshold. However, since the range of the outlier score differs from dataset to dataset and outliers occur at a much smaller ratio than normal data, it is very difficult to determine the threshold for an outlier score. Furthermore, in practice it is not easy to acquire data containing enough outliers for learning. In this paper, we propose a clustering-based outlier detection method that constructs a model representing the normal data region using only normal data and performs binary classification of new data samples as outliers or normal data. By dividing the given normal data into chunks and constructing a clustering model for each chunk, we then extend it to an ensemble method that combines the decisions of the models, and we apply it to streaming data with dynamic changes. Experimental results on real and artificial data show the high performance of the proposed method.
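The overall structure described here (a clustering model per chunk of normal data, with the chunk models voting on new samples) might look roughly like the sketch below. The details are assumptions rather than the authors' design: a small k-means with deterministic initialisation, a per-model boundary equal to the largest training distance to the nearest centroid, and a simple majority vote.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Tuple[List[Point], float]  # (centroids, radius of the normal region)


def nearest(p: Point, centroids: Sequence[Point]) -> float:
    """Distance from p to its closest centroid."""
    return min(math.dist(p, c) for c in centroids)


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain k-means; the first k points serve as initial centroids."""
    centroids = list(points[:k])
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[idx].append(p)
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def fit_chunk(points: Sequence[Point], k: int) -> Model:
    """Model of the normal region learned from one chunk of normal data."""
    centroids = kmeans(points, k)
    radius = max(nearest(p, centroids) for p in points)
    return centroids, radius


def is_outlier(p: Point, models: Sequence[Model]) -> bool:
    """Majority vote: outlier if most chunk models place p outside their region."""
    votes = sum(nearest(p, c) > r for c, r in models)
    return votes * 2 > len(models)


# Hypothetical normal data around (0, 0) and (10, 10), split into three chunks.
normal = [(0.0, 0.0), (10.0, 10.0), (0.4, 0.2), (9.7, 10.2),
          (0.1, 0.5), (10.3, 9.8), (0.3, 0.1), (9.9, 9.6),
          (0.2, 0.4), (10.1, 10.4), (0.5, 0.3), (9.6, 10.0)]
models = [fit_chunk(normal[i:i + 4], k=2) for i in range(0, len(normal), 4)]

print(is_outlier((5.0, 5.0), models))
print(is_outlier((0.2, 0.2), models))
```

Because each model is fitted to normal data only, no outlier score threshold has to be tuned on labelled outliers. For streaming data, one could append a model for each new chunk and drop the oldest, though that step is not shown here.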

A sequential outlier detecting method using a clustering algorithm (군집 알고리즘을 이용한 순차적 이상치 탐지법)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.699-706 / 2016
  • Outlier detection methods that do not perform a test often fail to detect multiple outliers because they are structurally vulnerable to masking or swamping effects. This paper considers testing procedures that supplement a clustering-based method which identifies the group containing a minority of the observations as outliers. A common step is to perform some variant of the t-test on individual outlier candidates. This paper proposes a sequential procedure that searches for outliers by changing cutoff values on a cluster tree and performing a test on a set of outlier candidates. The proposed method is illustrated and compared with existing methods through an example and Monte Carlo studies.

Compound Outlier Assessment and Verification for Multiple Field Monitoring Data (다수 계측 데이터에 대한 복합 이상치 평가 및 검증)

  • Jeon, Jesung
    • Journal of the Korean GEO-environmental Society / v.19 no.1 / pp.5-14 / 2018
  • Monitoring data of all kinds at construction sites may contain outliers arising from diverse causes. In this study, generation of a synthesis value, its regression, and final outlier detection and assessment were carried out to distinguish outlier data within extensive time series datasets. The synthesis value, which uses the correlation between multiple datasets of monitoring data as a weight factor, makes it possible to detect outliers by increasing the correlation. A standard artificial dataset with intentionally inserted outliers was used to assess the synthesis value technique. The results showed improved accuracy in detecting outliers and the general tendency in cases involving different time series models. The accuracy of outlier detection increased when more datasets were used and when they showed similar time series patterns.
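One way to read the correlation-weighted synthesis value is sketched below: the other datasets are standardised and averaged with weights equal to their correlation with the target dataset, and the target is flagged wherever it departs from that synthesis. This is a loose interpretation, not the procedure in the paper. The regression step is omitted, and the function names, the `threshold` of 1.5, and the sample series are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def zscores(xs: List[float]) -> List[float]:
    """Standardise a series to zero mean and unit standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    return mean(a * b for a, b in zip(zscores(xs), zscores(ys)))


def synthesis_value(target: List[float], others: List[List[float]]) -> List[float]:
    """Correlation-weighted average of the standardised companion series."""
    weights = [pearson(target, o) for o in others]
    zs = [zscores(o) for o in others]
    total = sum(abs(w) for w in weights)
    return [
        sum(w * z[t] for w, z in zip(weights, zs)) / total
        for t in range(len(target))
    ]


def flag_outliers(target: List[float], others: List[List[float]],
                  threshold: float = 1.5) -> List[int]:
    """Indices where the standardised target departs from the synthesis value."""
    synth = synthesis_value(target, others)
    return [
        t for t, (a, b) in enumerate(zip(zscores(target), synth))
        if abs(a - b) > threshold
    ]


# Hypothetical sensors following the same trend; the target jumps at index 4.
target = [1.0, 2.0, 3.0, 4.0, 15.0, 6.0, 7.0, 8.0]
others = [
    [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 8.0],
    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
]
print(flag_outliers(target, others))
```

Series that track the target closely receive larger weights, which is consistent with the abstract's observation that accuracy improves with more datasets showing similar patterns.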

Development of Integrated Outlier Analysis System for Construction Monitoring Data (건설 계측 데이터에 대한 통합 이상치 분석 시스템 개발)

  • Jeon, Jesung
    • Journal of the Korean GEO-environmental Society / v.21 no.5 / pp.5-11 / 2020
  • Detecting and eliminating outliers in field monitoring data is essential for effectively identifying unusual movement and for short-range and long-range forecasting of the stability and future behavior of various structures. An integrated outlier analysis system for assessing long-term time series data was developed in this study. Outlier analysis can be conducted in two steps: a primary analysis of a single dataset and a secondary multi-dataset analysis using a synthesis value. Combined with a real-time safety management platform, the integrated outlier analysis system provides basic information for evaluating stability and predicting the movement of structures. Field application results showed increased correlation between each single dataset and a synthesis value built from sensors of a similar type showing a consistent trend. Monitoring data showing different trends can also be used for outlier analysis through a correlation-weighted value.

Combined Filtering Model Using Voting Rule and Median Absolute Deviation for Travel Time Estimation (통행시간 추정을 위한 Voting Rule과 중위절대편차법 기반의 복합 필터링 모형)

  • Jeong, Youngje;Park, Hyun Suk;Kim, Byung Hwa;Kim, Youngchan
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.12 no.6 / pp.10-21 / 2013
  • This study proposes a combined filtering model, based on Median Absolute Deviation (MAD) and a Voting Rule, for eliminating outlier travel time data in transportation information systems. The model applies the MAD method as a first filtering step so that the data follow a normal distribution. The Voting Rule is then applied to eliminate the outlier travel time data that remain after MAD filtering. In the Voting Rule, travel time samples are judged to be outliers according to the travel-time difference between the sample data and the mean data, and whether an outlier is eliminated is decided by majority rule. In a case study of National Highway No. 3, the combined filtering model selectively eliminated only the outliers and improved the accuracy of the estimated travel time.
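The first filtering step can be illustrated with a standard MAD filter. The sketch below uses the widely cited modified z-score (0.6745 times the deviation from the median, divided by the MAD) with a cut-off of 3.5; whether the paper uses these constants is not stated, so they are assumptions, as are the function name and the sample travel times. The Voting Rule step is not reproduced because the abstract does not give its details.

```python
from statistics import median
from typing import List


def mad_filter(values: List[float], cutoff: float = 3.5) -> List[float]:
    """Keep values whose modified z-score is within the cut-off.
    Modified z-score = 0.6745 * |x - median| / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values equal the median; nothing to scale by.
        return list(values)
    return [v for v in values if 0.6745 * abs(v - med) / mad <= cutoff]


# Hypothetical link travel times in seconds with two implausible samples.
travel_times = [300, 310, 295, 305, 900, 298, 302, 120]
print(mad_filter(travel_times))
```

The median and MAD are barely affected by the two extreme samples, which is why this step is a reasonable first pass before any mean-based rule.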

Comparative Analysis of Anomaly Detection Models using AE and Suggestion of Criteria for Determining Outliers

  • Kang, Gun-Ha;Sohn, Jung-Mo;Sim, Gun-Wu
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.23-30 / 2021
  • In this study, we present a comparative analysis of major autoencoder (AE)-based anomaly detection methods for quality determination in the manufacturing process, together with a new anomaly discrimination criterion. Owing to the characteristics of manufacturing sites, anomalous instances are few and their types vary greatly. These properties degrade the performance of AI-based anomaly detection models trained on both normal and anomalous data, and obtaining additional data to improve performance costs considerable time and money. To solve this problem, studies are underway on AE-based models such as AE and VAE, which perform anomaly detection using only normal data. In this work, based on Convolutional AE, VAE, and Dilated VAE models, statistics on residual images, MSE, and information entropy were selected as outlier discriminant criteria to compare and analyze the performance of each model. In particular, the range value applied to the Convolutional AE model showed the best performance, with an AUC PRC of 0.9570, an F1 score of 0.8812, an AUC ROC of 0.9548, and an accuracy of 87.60%. This is an accuracy improvement of about 20 percentage points over MSE, which has frequently been used as the standard for determining outliers, and it confirms that model performance can be improved through the choice of criterion for determining outliers.
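A toy example may help show why a range statistic on the residual image can separate cases that MSE cannot. It assumes that "range" means the maximum minus the minimum of the absolute residual between an input image and its reconstruction; the function names and the 4x4 images are hypothetical, and no autoencoder is involved.

```python
from typing import List

Image = List[List[float]]


def residual(image: Image, recon: Image) -> Image:
    """Pixel-wise absolute difference between input and reconstruction."""
    return [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(image, recon)]


def mse_score(res: Image) -> float:
    """Mean squared residual over all pixels."""
    flat = [v for row in res for v in row]
    return sum(v * v for v in flat) / len(flat)


def range_score(res: Image) -> float:
    """Largest residual minus smallest residual (assumed meaning of 'range')."""
    flat = [v for row in res for v in row]
    return max(flat) - min(flat)


# Hypothetical reconstruction: a flat grey 4x4 image.
recon = [[0.5] * 4 for _ in range(4)]

# Normal input: slightly brighter everywhere (uniform small residual).
normal = [[0.6] * 4 for _ in range(4)]

# Defective input: identical to the reconstruction except one bright pixel.
defect = [[0.5] * 4 for _ in range(4)]
defect[1][2] = 0.9

for name, img in (("normal", normal), ("defect", defect)):
    res = residual(img, recon)
    print(name, round(mse_score(res), 4), round(range_score(res), 4))
```

Both images have nearly the same MSE, but only the defective one has a large residual range, so a localised defect stands out under the range criterion.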