• Title/Abstract/Keywords: outlier detection algorithm


대용량 자료 분석을 위한 밀도기반 이상치 탐지 (Density-based Outlier Detection for Very Large Data)

  • 김승;조남욱;강석호
    • 한국경영과학회지 / Vol. 35, No. 2 / pp. 71-88 / 2010
  • Density-based outlier detection methods such as LOF (Local Outlier Factor) find outlying observations by using the density of the space surrounding each observation. Despite the several advantages of density-based methods, their computational complexity has been a major barrier to their application. In this paper, we present an LOF algorithm that reduces the computation time of density-based outlier detection. The proposed method adopts kd-tree indexing and an approximate k-nearest neighbor (ANN) search algorithm. A set of experiments was conducted to examine the performance of the proposed algorithm. The results show that the proposed method can effectively detect local outliers in reduced computation time.
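
To make the LOF score concrete, here is a minimal brute-force sketch of the standard LOF definition. It is not the paper's method: it performs an exact O(n²) neighbor search and omits the kd-tree indexing and approximate nearest neighbor search that the paper uses to cut computation time. The function name `lof_scores`, the parameter `k`, and the sample points are illustrative assumptions.

```python
import math

def lof_scores(points, k=3):
    """Brute-force Local Outlier Factor (hypothetical helper, exact search).

    Assumes at least k + 1 points and no group of more than k identical
    points (otherwise a reachability sum can be zero).
    """
    n = len(points)
    # For every point, keep its k nearest neighbors as (distance, index) pairs.
    # Ties at the k-distance are cut off at exactly k neighbors for simplicity.
    neighbors = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors.append(dists[:k])
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    def local_reachability_density(i):
        # reach-dist(i, j) = max(k-distance(j), d(i, j))
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF = mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Five clustered points and one isolated point (made-up data).
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(sample, k=3)
```

Scores close to 1 indicate a point about as dense as its neighbors; scores well above 1 indicate a local outlier. In this sample the isolated point `(8, 8)` receives by far the largest score.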

Temporal and spatial outlier detection in wireless sensor networks

  • Nguyen, Hoc Thai;Thai, Nguyen Huu
    • ETRI Journal / Vol. 41, No. 4 / pp. 437-451 / 2019
  • Outlier detection techniques play an important role in enhancing the reliability of data communication in wireless sensor networks (WSNs), and many such techniques have been proposed. Unfortunately, most of them still have limitations: (a) a high rate of false positives, (b) high time complexity, and (c) failure to detect outliers online. Moreover, these approaches focus on either temporal outliers or spatial outliers, not both. This paper therefore introduces novel algorithms that detect both temporal and spatial outliers. Our contributions are twofold: (i) modifying the Hampel Identifier (HI) algorithm to achieve high identification accuracy in temporal outlier detection, and (ii) combining the Gaussian process (GP) model with a graph-based outlier detection technique to improve performance in spatial outlier detection. The results demonstrate that our techniques outperform state-of-the-art methods in terms of accuracy and work well with various data types.
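
For reference, the classical (unmodified) Hampel identifier can be sketched as follows. The abstract does not describe the paper's modification, so it is not reproduced here. The function name, the window size, the 3-sigma threshold, and the sensor readings are illustrative assumptions; 1.4826 is the usual constant that scales the median absolute deviation (MAD) to a standard deviation under a Gaussian assumption.

```python
from statistics import median

def hampel_outliers(series, half_window=3, n_sigmas=3.0):
    """Return indices flagged by a sliding-window Hampel identifier.

    A point is flagged when it deviates from the window median by more than
    n_sigmas scaled MADs. If the MAD of a window is 0, any deviation from the
    median is flagged.
    """
    scale = 1.4826  # MAD -> standard deviation for Gaussian data
    flagged = []
    for i, x in enumerate(series):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]  # window is truncated at the series boundaries
        med = median(window)
        mad = median([abs(v - med) for v in window])
        if abs(x - med) > n_sigmas * scale * mad:
            flagged.append(i)
    return flagged

# Made-up temperature readings with one spike at index 4.
readings = [20.1, 20.3, 20.2, 20.4, 35.0, 20.3, 20.2, 20.5, 20.4, 20.3]
flagged = hampel_outliers(readings)
```

Because both the center (median) and the spread (MAD) are robust statistics, a single spike barely changes the threshold used to judge it, which is why the identifier suits temporal outliers in sensor streams.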

Dam Sensor Outlier Detection using Mixed Prediction Model and Supervised Learning

  • Park, Chang-Mok
    • International journal of advanced smart convergence / Vol. 7, No. 1 / pp. 24-32 / 2018
  • This paper describes an outlier detection method that uses a mixed prediction model. The mixed prediction model consists of a time-series model and a regression model. The parameters of the prediction model were estimated by supervised learning, with a genetic algorithm adopted as the learning method. Experiments were performed on artificial and real data sets. Prediction performance was compared with existing prediction methods on the artificial data, and outlier detection was conducted on real sensor measurements from a dam. The experiments showed the validity of the proposed method.

Density-based Outlier Detection in Multi-dimensional Datasets

  • Wang, Xite;Cao, Zhixin;Zhan, Rongjuan;Bai, Mei;Ma, Qian;Li, Guanyu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 12 / pp. 3815-3835 / 2022
  • Density-based outlier detection is one of the hot issues in data mining. A point is judged to be an outlier on the basis of the density of the points near it. Existing density-based detection algorithms have high time complexity. To reduce it, a new outlier detection algorithm, DODMD (Density-based Outlier Detection in Multidimensional Datasets), is proposed. First, the concept of a micro-cluster is introduced on the basis of the ZH-tree. Each leaf node is regarded as a micro-cluster, and computation is carried out per micro-cluster to achieve batch filtering. To obtain n sets of approximate outliers quickly, a greedy method is used to calculate the boundary of LOF, and the minimum value is marked as LOFmin. Second, the outliers can be filtered out by LOFmin, the real outliers are calculated, and the result set is then updated to tighten the boundary. Finally, the accuracy and efficiency of the DODMD algorithm are verified on real and synthetic datasets, respectively.

A Novel Battery State of Health Estimation Method Based on Outlier Detection Algorithm

  • Piao, Chang-hao;Hu, Zi-hao;Su, Ling;Zhao, Jian-fei
    • Journal of Electrical Engineering and Technology / Vol. 11, No. 6 / pp. 1802-1811 / 2016
  • A novel battery state of health (SOH) estimation algorithm based on outlier detection is presented. Battery SOH is one of the most important parameters describing the usability of a power battery system. First, a battery system model with lifetime fading characteristics was established, and the battery characteristic parameters were acquired from the lifetime fading process. Then, an outlier detection method based on angular distribution was used to identify outliers among the battery behaviors. Lastly, the functional relationship between battery SOH and the outlier distribution was obtained by polynomial fitting. The experimental results show that the algorithm identifies the outliers accurately and that the absolute error between the estimated and true SOH values is less than 3%.
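
The abstract does not say which angle-based method is used, so the following is only a sketch of the general idea behind angle-based outlier detection: seen from a point far outside the data, all other points lie in a similar direction and at a large distance, so the distance-weighted angle terms vary little. The sketch uses a simplified, unweighted variance rather than the weighted variance of the published ABOD method, and the function name and sample points are assumptions.

```python
from itertools import combinations
from statistics import pvariance

def angle_variance_scores(points):
    """Simplified angle-based score per point; LOW values suggest outliers.

    For each point A and every pair (B, C) of other points, compute
    <AB, AC> / (|AB|^2 * |AC|^2) and take the population variance.
    Assumes at least three points and no duplicate points.
    """
    scores = []
    for i, a in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        values = []
        for b, c in combinations(others, 2):
            ab = [bj - aj for aj, bj in zip(a, b)]
            ac = [cj - aj for aj, cj in zip(a, c)]
            dot = sum(x * y for x, y in zip(ab, ac))
            norm_ab_sq = sum(x * x for x in ab)
            norm_ac_sq = sum(x * x for x in ac)
            values.append(dot / (norm_ab_sq * norm_ac_sq))
        scores.append(pvariance(values))
    return scores

# Five clustered points and one isolated point (made-up data).
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
angle_scores = angle_variance_scores(sample)
```

Unlike LOF, a smaller score here means a more outlying point, so in this sample the isolated point `(8, 8)` has the minimum score.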

시계열 자료에서의 특이치 발견 (Outlier detection in time series data)

  • 최정인;엄인옥;조형준
    • 응용통계연구 / Vol. 29, No. 5 / pp. 907-920 / 2016
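  • The aim of this paper is to propose an algorithm that detects outliers in time series data using a quantile autoregressive model, to compare its performance with existing methods, and to apply it to real cases of stock price manipulation. Studies on outlier detection have so far dealt mostly with general data, so research on time series data remains scarce. Existing work has also been limited to parametric methods, which are inconvenient because parametric models are complex and require long analysis times. This study therefore presents a new outlier detection algorithm based on the quantile autoregressive model and compares it with existing algorithms through simulations under various scenarios. Outlier detection in time series is particularly useful for uncovering stock price manipulation: if a stock price observed over time suddenly departs from its previous trend and is flagged as an outlier, one may suspect that it was manipulated through artificial intervention. We therefore applied the algorithm to real cases of stock price manipulation to examine how quickly the manipulation could be detected.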

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society / Vol. 36, No. 4 / pp. 457-469 / 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. However, detecting multiple influential points or multiple outliers is more difficult, owing to masking and swamping problems. Multiple outlier detection methods for logistic regression have not yet been studied from the standpoint of direct procedures. In this paper we consider direct methods for logistic regression by extending the influence matrix algorithm of Peña and Yohai (1995). We define the influence matrix in logistic regression by using Cook's distance in logistic regression, and test for multiple outliers by using the mean shift model. To show the accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data that include multiple outliers with masking and swamping.

Outlier detection of GPS monitoring data using relational analysis and negative selection algorithm

  • Yi, Ting-Hua;Ye, X.W.;Li, Hong-Nan;Guo, Qing
    • Smart Structures and Systems / Vol. 20, No. 2 / pp. 219-229 / 2017
  • Outlier detection is an imperative task for identifying abnormal events before structures suffer sudden failure during their service lives. This paper proposes a two-phase method for outlier detection in Global Positioning System (GPS) monitoring data. First, the occurrence of abnormal data is promptly judged by relational analysis, since the data obtained from adjacent locations are related by a certain rule. Then, a negative selection algorithm (NSA) is adopted to localize the abnormal data more accurately. To reduce the computation cost of the NSA, an improved scheme that integrates an adjustable radius into the training stage is designed and implemented. Numerical simulations and experimental verifications demonstrate that the proposed method compares favorably with the original method in efficiency and reliability. The method relies only on the monitoring data, without requiring engineering expertise on the structural operational characteristics, so it can be easily embedded in a software system for the continuous and reliable monitoring of civil infrastructure.
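
As a rough illustration of the negative selection idea, the sketch below trains detectors on normal ("self") samples only: random candidates are kept as detectors when they do not match any self sample, and new data matched by a detector are reported as abnormal. It works on one-dimensional values normalized to [0, 1] with fixed radii, so it does not include the relational analysis phase or the adjustable-radius improvement described in the paper. All names, radii, and sample values are assumptions.

```python
import random

def train_detectors(self_samples, n_detectors=200, self_radius=0.05, seed=0):
    """Generate detectors in [0, 1] that match no self (normal) sample.

    Loops until n_detectors candidates survive, so the self samples must
    leave part of [0, 1] uncovered.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    detectors = []
    while len(detectors) < n_detectors:
        candidate = rng.random()
        # Negative selection: discard candidates that lie close to normal data.
        if all(abs(candidate - s) > self_radius for s in self_samples):
            detectors.append(candidate)
    return detectors

def is_anomalous(value, detectors, detector_radius=0.05):
    """A value is abnormal if it falls within the radius of any detector."""
    return any(abs(value - d) <= detector_radius for d in detectors)

# Made-up normalized readings that represent normal behavior.
normal_readings = [0.40, 0.45, 0.50, 0.55, 0.60]
detectors = train_detectors(normal_readings)
```

With these settings, `is_anomalous(0.5, detectors)` is false because no detector can lie near the normal range, while a value such as 0.9 sits in the region the random detectors cover. Detection of a given abnormal value is probabilistic in general, since coverage depends on how many detectors are generated.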

R을 이용한 이상점 탐지 알고리즘의 구현 (Realization of an outlier detection algorithm using R)

  • 송규문;문지은;박철용
    • Journal of the Korean Data and Information Science Society / Vol. 22, No. 3 / pp. 449-458 / 2011
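  • Illegal waste dumping is one of the urgent problems facing the government. Recently, the relevant agencies have been installing automatic chemical oxygen demand (COD) monitors in rivers and streams to sense water quality continuously in real time. In this paper, we use R to implement an algorithm that analyzes the data produced by these COD monitors with a time series intervention model and detects outliers regarded as dumping times. Because the R implementation avoids manual work in the step-by-step computations, the algorithm can be automated and, going one step further, used in simulation studies.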

실시간 네트워크 침입탐지 시스템을 위한 아웃라이어 클러스터 검출 기법 (An Outlier Cluster Detection Technique for Real-time Network Intrusion Detection Systems)

  • 장재영;박종명;김한준
    • 인터넷정보학회논문지 / Vol. 8, No. 6 / pp. 43-53 / 2007
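  • Recent network intrusion detection systems have advanced further as anomaly detection techniques are combined with existing signature-based (or pattern-based) detection techniques. In general, signature-based intrusion detection systems could detect only previously known intrusion patterns, even when they used machine learning algorithms. To build an ideal network intrusion detection system, the signature database that stores intrusion patterns must always be kept up to date. The system therefore needs to be able to generate signatures for new attacks while monitoring and analyzing incoming network data. To this end, this paper proposes a new outlier cluster detection algorithm that uses a density (or influence) function. The proposed algorithm assumes that a network intrusion pattern is not a single object but an outlier cluster, that is, a set of similar instances. Experiments on the KDD Cup 1999 intrusion detection data set show that, when intrusions occur frequently, the proposed method performs better than an existing outlier detection technique based on Euclidean distance.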
