• Title/Summary/Keyword: Outlier Prediction

Search Result 61, Processing Time 0.024 seconds

Dam Sensor Outlier Detection using Mixed Prediction Model and Supervised Learning

  • Park, Chang-Mok
    • International journal of advanced smart convergence
    • /
    • v.7 no.1
    • /
    • pp.24-32
    • /
    • 2018
  • An outlier detection method using mixed prediction model has been described in this paper. The mixed prediction model consists of time-series model and regression model. The parameter estimation of the prediction model was performed using supervised learning and a genetic algorithm is adopted for a learning method. The experiments were performed in artificial and real data set. The prediction performance is compared with the existing prediction methods using artificial data. Outlier detection is conducted using the real sensor measurements in a dam. The validity of the proposed method was shown in the experiments.

A Binary Prediction Method for Outlier Detection using One-class SVM and Spectral Clustering in High Dimensional Data (고차원 데이터에서 One-class SVM과 Spectral Clustering을 이용한 이진 예측 이상치 탐지 방법)

  • Park, Cheong Hee
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.6
    • /
    • pp.886-893
    • /
    • 2022
  • Outlier detection refers to the task of detecting data that deviate significantly from the normal data distribution. Most outlier detection methods compute an outlier score which indicates the degree to which a data sample deviates from normal. However, setting a threshold for an outlier score to determine if a data sample is outlier or normal is not trivial. In this paper, we propose a binary prediction method for outlier detection based on spectral clustering and one-class SVM ensemble. Given training data consisting of normal data samples, a clustering method is performed to find clusters in the training data, and the ensemble of one-class SVM models trained on each cluster finds the boundaries of the normal data. We show how to obtain a threshold for transforming outlier scores computed from the ensemble of one-class SVM models into binary predictive values. Experimental results with high dimensional text data show that the proposed method can be effectively applied to high dimensional data, especially when the normal training data consists of different shapes and densities of clusters.

Modeling of Strength of High Performance Concrete with Artificial Neural Network and Mahalanobis Distance Outlier Detection Method (신경망 이론과 Mahalanobis Distance 이상치 탐색방법을 이용한 고강도 콘크리트 강도 예측 모델 개발에 관한 연구)

  • Hong, Jung-Eui
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.33 no.4
    • /
    • pp.122-129
    • /
    • 2010
  • High-performance concrete (HPC) is a new terminology used in concrete construction industry. Several studies have shown that concrete strength development is determined not only by the water-to-cement ratio but also influenced by the content of other concrete ingredients. HPC is a highly complex material, which makes modeling its behavior a very difficult task. This paper aimed at demonstrating the possibilities of adapting artificial neural network (ANN) to predict the comprresive strength of HPC. Mahalanobis Distance (MD) outlier detection method used for the purpose increase prediction ability of ANN. The detailed procedure of calculating Mahalanobis Distance (MD) is described. The effects of outlier compared with before and after artificial neural network training. MD outlier detection method successfully removed existence of outlier and improved the neural network training and prediction performance.

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering
    • /
    • v.37 no.6
    • /
    • pp.539-554
    • /
    • 2024
  • This study explores development of prediction model for seismic site classification through the integration of machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and, synthetic minority over-sampling technique (SMOTE) for data balance, and evaluates using seven machine learning models using seismic data from KiK-net. Notably, light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while Multiple linear regression (MLR) and Support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The ensemble of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, with LGBM and local outlier factor yielding the highest F1-score of 0.79. Consistently outperforming other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction, offering insights and highlighting the potential of machine learning in optimizing site classification accuracy.

The Filtering Method to Reduce Corner Outlier Artifacts in HEVC (Corner Outlier Artifacts를 감소시키기 위한 HEVC 필터링 방법)

  • Ko, Kyung-hwan
    • Journal of Broadcast Engineering
    • /
    • v.22 no.3
    • /
    • pp.313-320
    • /
    • 2017
  • The In-loop filtering methods such as de-blocking filter and SAO(Sample Adaptive Offset) applied to the HEVC standard achieves coding efficiency and subjective quality improvement by reducing the blocking artifacts and the ringing artifacts. However, despite the use of In-loop filtering methods, the artifacts called a corner outlier occurring at the corner points of block boundaries are not removed. In this paper, the corner outlier artifacts are reduced by the detection, determination, and filtering processes on the corner outlier pixels. Experimental results show that the proposed method improves the subjective picture quality and slightly increases the coding efficiency in Inter prediction.

A Note on Bayesian Prediction Analysis for the Rayleigh Model in the presence of Outliers

  • Ko, Jeong-Hwan;Kim, Yeung-Hoon
    • 한국데이터정보과학회:학술대회논문집
    • /
    • 2003.05a
    • /
    • pp.171-176
    • /
    • 2003
  • This paper deals with the problem of predicting order statistics in samples from a Rayleigh population when an outlier is present. Bayesian predictive distribution and prediction bounds of the p-th order statistics is obtained where an outlier of type $\theta\delta$ is present. In this connection, some identies are derived.

  • PDF

Big Data Smoothing and Outlier Removal for Patent Big Data Analysis

  • Choi, JunHyeog;Jun, Sunghae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.8
    • /
    • pp.77-84
    • /
    • 2016
  • In general statistical analysis, we need to make a normal assumption. If this assumption is not satisfied, we cannot expect a good result of statistical data analysis. Most of statistical methods processing the outlier and noise also need to the assumption. But the assumption is not satisfied in big data because of its large volume and heterogeneity. So we propose a methodology based on box-plot and data smoothing for controling outlier and noise in big data analysis. The proposed methodology is not dependent upon the normal assumption. In addition, we select patent documents as target domain of big data because patent big data analysis is a important issue in management of technology. We analyze patent documents using big data learning methods for technology analysis. The collected patent data from patent databases on the world are preprocessed and analyzed by text mining and statistics. But the most researches about patent big data analysis did not consider the outlier and noise problem. This problem decreases the accuracy of prediction and increases the variance of parameter estimation. In this paper, we check the existence of the outlier and noise in patent big data. To know whether the outlier is or not in the patent big data, we use box-plot and smoothing visualization. We use the patent documents related to three dimensional printing technology to illustrate how the proposed methodology can be used for finding the existence of noise in the searched patent big data.

Evolutionary Computing Driven Extreme Learning Machine for Objected Oriented Software Aging Prediction

  • Ahamad, Shahanawaj
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.232-240
    • /
    • 2022
  • To fulfill user expectations, the rapid evolution of software techniques and approaches has necessitated reliable and flawless software operations. Aging prediction in the software under operation is becoming a basic and unavoidable requirement for ensuring the systems' availability, reliability, and operations. In this paper, an improved evolutionary computing-driven extreme learning scheme (ECD-ELM) has been suggested for object-oriented software aging prediction. To perform aging prediction, we employed a variety of metrics, including program size, McCube complexity metrics, Halstead metrics, runtime failure event metrics, and some unique aging-related metrics (ARM). In our suggested paradigm, extracting OOP software metrics is done after pre-processing, which includes outlier detection and normalization. This technique improved our proposed system's ability to deal with instances with unbalanced biases and metrics. Further, different dimensional reduction and feature selection algorithms such as principal component analysis (PCA), linear discriminant analysis (LDA), and T-Test analysis have been applied. We have suggested a single hidden layer multi-feed forward neural network (SL-MFNN) based ELM, where an adaptive genetic algorithm (AGA) has been applied to estimate the weight and bias parameters for ELM learning. Unlike the traditional neural networks model, the implementation of GA-based ELM with LDA feature selection has outperformed other aging prediction approaches in terms of prediction accuracy, precision, recall, and F-measure. The results affirm that the implementation of outlier detection, normalization of imbalanced metrics, LDA-based feature selection, and GA-based ELM can be the reliable solution for object-oriented software aging prediction.

Outlier prediction in sensor network data using periodic pattern (주기 패턴을 이용한 센서 네트워크 데이터의 이상치 예측)

  • Kim, Hyung-Il
    • Journal of Sensor Science and Technology
    • /
    • v.15 no.6
    • /
    • pp.433-441
    • /
    • 2006
  • Because of the low power and low rate of a sensor network, outlier is frequently occurred in the time series data of sensor network. In this paper, we suggest periodic pattern analysis that is applied to the time series data of sensor network and predict outlier that exist in the time series data of sensor network. A periodic pattern is minimum period of time in which trend of values in data is appeared continuous and repeated. In this paper, a quantization and smoothing is applied to the time series data in order to analyze the periodic pattern and the fluctuation of each adjacent value in the smoothed data is measured to be modified to a simple data. Then, the periodic pattern is abstracted from the modified simple data, and the time series data is restructured according to the periods to produce periodic pattern data. In the experiment, the machine learning is applied to the periodic pattern data to predict outlier to see the results. The characteristics of analysis of the periodic pattern in this paper is not analyzing the periods according to the size of value of data but to analyze time periods according to the fluctuation of the value of data. Therefore analysis of periodic pattern is robust to outlier. Also it is possible to express values of time attribute as values in time period by restructuring the time series data into periodic pattern. Thus, it is possible to use time attribute even in the general machine learning algorithm in which the time series data is not possible to be learned.

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension (고혈압 예측을 위한 이상치 탐지 알고리즘 및 데이터 통합 기법)

  • Khongorzul Dashdondov;Mi-Hye Kim;Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.417-419
    • /
    • 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, the incidence of hypertension has increased dramatically, not only among the elderly but also among young people. In this regard, the use of machine-learning methods to diagnose the causes of hypertension has increased in recent years. In this study, we improved the prediction of hypertension detection using Mahalanobis distance-based multivariate outlier removal using the KNHANES database from the Korean national health data and the COVID-19 dataset from Kaggle. This study was divided into two modules. Initially, the data preprocessing step used merged datasets and decision-tree classifier-based feature selection. The next module applies a predictive analysis step to remove multivariate outliers using the Mahalanobis distance from the experimental dataset and makes a prediction of hypertension. In this study, we compared the accuracy of each classification model. The best results showed that the proposed MAH_RF algorithm had an accuracy of 82.66%. The proposed method can be used not only for hypertension but also for the detection of various diseases such as stroke and cardiovascular disease.