• Title/Abstract/Keyword: Outlier analysis

Underwater Navigation of AUVs Using Uncorrelated Measurement Error Model of USBL

  • Lee, Pan-Mook;Park, Jin-Yeong;Baek, Hyuk;Kim, Sea-Moon;Jun, Bong-Huan;Kim, Ho-Sung;Lee, Phil-Yeob
    • 한국해양공학회지
    • /
    • Vol. 36, No. 5
    • /
    • pp.340-352
    • /
    • 2022
  • This article presents a modeling method for the uncorrelated measurement error of the ultra-short baseline (USBL) acoustic positioning system for aiding navigation of underwater vehicles. The Mahalanobis distance (MD) and principal component analysis are applied to decorrelate the errors of USBL measurements, which are correlated in the x- and y-directions and vary according to the relative direction and distance between a reference station and the underwater vehicles. The proposed method can decouple the radial-direction error and angular direction error from each USBL measurement, where the former and latter are independent and dependent, respectively, of the distance between the reference station and the vehicle. With the decorrelation of the USBL errors along the trajectory of the vehicles in every time step, the proposed method can reduce the threshold of the outlier decision level. To demonstrate the effectiveness of the proposed method, simulation studies were performed with motion data obtained from a field experiment involving an autonomous underwater vehicle and USBL signals generated numerically by matching the specifications of a specific USBL with the data of a global positioning system. The simulations indicated that the navigation system is more robust in rejecting outliers of the USBL measurements than conventional ones. In addition, it was shown that the erroneous estimation of the navigation system after a long USBL blackout can converge to the true states using the MD of the USBL measurements. The navigation systems using the uncorrelated error model of the USBL, therefore, can effectively eliminate USBL outliers without loss of uncontaminated signals.
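
To make the outlier-gating idea concrete, here is a minimal sketch (not the authors' implementation) of a Mahalanobis-distance gate applied to a 2-D USBL fix after the x/y error covariance has been decorrelated by an eigendecomposition (PCA); the covariance matrix, predicted position, and chi-square gate level are illustrative assumptions.

```python
# Minimal sketch: Mahalanobis-distance gating of a USBL fix with PCA-decorrelated errors.
import numpy as np
from scipy.stats import chi2

def is_usbl_outlier(measurement, predicted, R, alpha=0.01):
    """Flag a 2-D USBL position fix whose Mahalanobis distance from the
    navigation filter's predicted position exceeds a chi-square gate."""
    eigvals, eigvecs = np.linalg.eigh(R)          # PCA of the measurement error covariance
    residual = eigvecs.T @ (np.asarray(measurement) - np.asarray(predicted))
    md2 = np.sum(residual**2 / eigvals)           # squared Mahalanobis distance in decorrelated axes
    return md2 > chi2.ppf(1 - alpha, df=len(measurement))

R = np.array([[4.0, 3.0], [3.0, 9.0]])            # assumed correlated x/y error covariance (m^2)
print(is_usbl_outlier([110.0, 205.0], [100.0, 200.0], R))   # a fix ~11 m off the prediction: True
```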

ON THEIL'S METHOD IN FUZZY LINEAR REGRESSION MODELS

  • Choi, Seung Hoe;Jung, Hye-Young;Lee, Woo-Joo;Yoon, Jin Hee
    • 대한수학회논문집
    • /
    • Vol. 31, No. 1
    • /
    • pp.185-198
    • /
    • 2016
  • Regression analysis is a method for modeling the statistical relationship between explanatory and response variables. This paper proposes a fuzzy regression analysis based on Theil's method, which is not sensitive to outliers. The method estimates the coefficients of the fuzzy regression model using the medians of the rates of increase computed from randomly chosen pairs of each component of the ${\alpha}$-level sets of the fuzzy data. An example and two simulation results show that the fuzzy Theil's estimator is more robust than the fuzzy least squares estimator.
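
For readers unfamiliar with Theil's method, the sketch below shows the crisp (non-fuzzy) Theil estimator that the paper builds on: the slope is the median of pairwise rates of increase, which is what gives the estimator its robustness to outliers. Extending it to the endpoints of each ${\alpha}$-level set, as the paper does for fuzzy data, is omitted, and the data are made up.

```python
# Minimal sketch of the crisp Theil estimator: median of pairwise slopes.
import numpy as np
from itertools import combinations

def theil_estimator(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[i] != x[j]]
    b1 = np.median(slopes)        # robust slope: median of pairwise rates of increase
    b0 = np.median(y - b1 * x)    # robust intercept
    return b0, b1

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 4.0, 6.2, 7.9, 40.0, 12.1]   # one gross outlier at x = 5
print(theil_estimator(x, y))           # slope stays close to 2 despite the outlier
```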

Switching Regression Analysis via Fuzzy LS-SVM

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 17, No. 2
    • /
    • pp.609-617
    • /
    • 2006
  • A new fuzzy c-regression algorithm for switching regression analysis is presented, which combines fuzzy c-means clustering and the least squares support vector machine. The algorithm can detect outliers in switching regression models while simultaneously yielding estimates of the associated parameters together with a fuzzy c-partition of the data. It can be employed for model-free nonlinear regression, which does not assume the underlying form of the regression function. The new approach is illustrated with numerical examples that show how it can be used to fit switching regression models to almost all types of mixed data.
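
The sketch below illustrates only the alternating structure of fuzzy c-regression (membership update plus membership-weighted model fitting); it uses plain weighted least squares for the local models in place of the paper's LS-SVM and omits its outlier handling, so it is an illustration of the general switching-regression scheme rather than the proposed algorithm.

```python
# Simplified fuzzy c-regression with linear local models (weighted least squares stand-in for LS-SVM).
import numpy as np

def fuzzy_c_regression(x, y, c=2, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
    U = rng.dirichlet(np.ones(c), size=len(x))       # random initial memberships (rows sum to 1)
    for _ in range(iters):
        # Fit each local linear model by membership-weighted least squares.
        betas = [np.linalg.solve(X.T @ np.diag(U[:, k] ** m) @ X,
                                 X.T @ np.diag(U[:, k] ** m) @ y) for k in range(c)]
        # Update memberships from the squared residuals of each local model.
        E = np.column_stack([(y - X @ b) ** 2 for b in betas]) + 1e-12
        U = 1.0 / np.sum((E[:, :, None] / E[:, None, :]) ** (1.0 / (m - 1)), axis=2)
    return betas, U

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
# Interleaved samples from two regimes, y = 2x + 1 and y = -3x + 4, plus noise.
y = np.where(np.arange(100) % 2 == 0, 2 * x + 1, -3 * x + 4) + 0.05 * rng.normal(size=100)
betas, U = fuzzy_c_regression(x, y)
print([b.round(2) for b in betas])   # each beta is [intercept, slope]; the two regimes
                                     # should be roughly recovered (order may vary)
```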

L-모멘트법에 의한 극치강우의 빈도분석 (Frequency Analysis of Extreme Rainfall by L-Moments)

  • 맹승진;이순혁;김병준
    • 한국농공학회:학술대회논문집
    • /
    • 한국농공학회 2002년도 학술발표회 발표논문집
    • /
    • pp.225-228
    • /
    • 2002
  • This research derives design rainfalls through the L-moment method, applying tests of homogeneity, independence, and outliers to annual maximum daily rainfall data from 38 Korean rainfall stations. To select the appropriate distribution of annual maximum daily rainfall for each station, the Generalized Extreme Value (GEV), Generalized Logistic (GLO), and Generalized Pareto (GPA) probability distributions were applied, and their goodness of fit was judged using an L-moment ratio diagram and the Kolmogorov-Smirnov (K-S) test. The GEV and GLO distributions were selected as the appropriate distributions. Their parameters were estimated from the observed and simulated annual maximum daily rainfalls using Monte Carlo techniques, and design rainfalls were then derived using the L-moment method. Appropriate design rainfalls were suggested through a comparative analysis of the design rainfalls from the GEV and GLO distributions at each station.
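
As a small illustration of the L-moment machinery underlying the analysis, the sketch below computes sample L-moments (L-mean, L-scale, L-skewness, L-kurtosis) from probability-weighted moments; the rainfall values are made-up numbers, not data from the 38 stations.

```python
# Sample L-moments via probability-weighted moments, as used for L-moment ratio diagrams.
import numpy as np

def sample_l_moments(data):
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l1, l2 = b0, 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2      # mean, L-scale, L-skewness, L-kurtosis

annual_max_rain = [112.5, 98.0, 145.3, 210.1, 87.4, 133.0, 176.8, 90.2, 158.6, 121.7]  # mm, illustrative
print(sample_l_moments(annual_max_rain))
```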

K-means 알고리즘 기반 클러스터링 인덱스 비교 연구 (A Performance Comparison of Cluster Validity Indices based on K-means Algorithm)

  • 심요성;정지원;최인찬
    • Asia pacific journal of information systems
    • /
    • Vol. 16, No. 1
    • /
    • pp.127-144
    • /
    • 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and the simplicity of its practical implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters as well as to assess the clustering results of datasets. In this paper, we present a performance comparison of sixteen indices, selected from forty indices in the literature based on their applicability to nonhierarchical clustering algorithms. The datasets used in the experiment are generated from multivariate normal distributions. In particular, four error types including standardization, outlier generation, error perturbation, and noise dimension addition are considered in the comparison. The experiment also analyzes how varying the number of points, attributes, and clusters affects index performance. The simulation results show that the Calinski-Harabasz index performs best across all datasets and that the Davies-Bouldin index becomes a strong competitor as the number of points in a dataset increases.
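
Two of the compared indices are available off the shelf in scikit-learn; the sketch below shows how they might be used to pick the number of clusters for K-means on synthetic data (the dataset and the range of K are illustrative, not the paper's experimental setup).

```python
# Choosing K for K-means with the Calinski-Harabasz and Davies-Bouldin validity indices.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

X, _ = make_blobs(n_samples=500, centers=4, n_features=3, random_state=0)

for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    ch = calinski_harabasz_score(X, labels)   # higher is better
    db = davies_bouldin_score(X, labels)      # lower is better
    print(f"k={k}: Calinski-Harabasz={ch:.1f}, Davies-Bouldin={db:.3f}")
# Both indices should point to k=4 for this well-separated synthetic data.
```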

Speed-up of the Matrix Computation on the Ridge Regression

  • Lee, Woochan;Kim, Moonseong;Park, Jaeyoung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 15, No. 10
    • /
    • pp.3482-3497
    • /
    • 2021
  • Artificial intelligence has emerged as the core of the 4th industrial revolution, and processing large amounts of data, as in big data technology and rapid data analysis, is inevitable. The most fundamental and universal data interpretation technique is the analysis of information through regression, which is also the basis of machine learning. Ridge regression is a regression technique that reduces sensitivity to unusual or outlying observations. The time-consuming part of the computation, however, is the inversion of a matrix, and as the matrix grows, solving the system becomes a major challenge. In this paper, a new algorithm is introduced to speed up the calculation of the ridge regression estimator through series expansion and computation recycling, without using an inverse matrix or other factorization methods in the calculation process. In addition, the performance of the proposed algorithm and that of the existing algorithm were compared according to matrix size. Overall, an excellent speed-up of the proposed algorithm with good accuracy was demonstrated.
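
One way such an inverse-free series expansion can look is a truncated Neumann series for the ridge matrix (X'X + lambda*I); the sketch below is a hedged illustration of that idea with reuse of partial products, not the paper's exact algorithm.

```python
# Hedged sketch: approximating the ridge estimator with a truncated Neumann series
# instead of an explicit matrix inverse or factorization.
import numpy as np

def ridge_neumann(X, y, lam=1.0, terms=200):
    A = X.T @ X + lam * np.eye(X.shape[1])
    b = X.T @ y
    c = np.trace(A)                    # scaling that keeps the series convergent for this SPD matrix
    M = np.eye(A.shape[0]) - A / c     # spectral radius of M is below 1
    term, acc = b.copy(), b.copy()
    for _ in range(terms):             # acc accumulates sum_k M^k b (computation reuse)
        term = M @ term
        acc += term
    return acc / c                     # approximates (X'X + lam*I)^-1 X'y

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
exact = np.linalg.solve(X.T @ X + np.eye(5), X.T @ y)
print(np.max(np.abs(ridge_neumann(X, y) - exact)))   # should be near machine precision
```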

가공송전 전선 자산데이터의 정제 자동화 알고리즘 개발 연구 (Automatic Algorithm for Cleaning Asset Data of Overhead Transmission Line)

  • Mun, Sung-Duk;Kim, Tae-Joon;Kim, Kang-Sik;Hwang, Jae-Sang
    • KEPCO Journal on Electric Power and Energy
    • /
    • Vol. 7, No. 1
    • /
    • pp.73-77
    • /
    • 2021
  • As big data analysis technologies have developed worldwide, the importance of data-driven asset management for electric power facilities is increasing. Securing data quality is essential, since it determines the performance of the risk evaluation algorithm used for asset management. To improve the reliability of asset management, asset data must be preprocessed; in particular, dirty data must be cleaned, and an algorithm that reduces processing time and improves accuracy is urgently needed. This paper presents an automatic cleaning algorithm specialized for overhead transmission line asset data. The algorithm cleans the data by analyzing the quality and overall patterns of the raw data.
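
The sketch below is a hedged illustration of what rule-based cleaning of transmission-line asset records can look like in practice; the column names, valid ranges, and rules are assumptions for illustration, not the algorithm developed in the paper.

```python
# Illustrative rule-based cleaning of asset records: deduplication, type coercion,
# range checks, and normalization of free-text codes.
import numpy as np
import pandas as pd

def clean_asset_data(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df = df.drop_duplicates(subset=["line_id", "span_no"])       # drop duplicate records
    df["install_year"] = pd.to_numeric(df["install_year"], errors="coerce")
    df["conductor_age"] = 2021 - df["install_year"]              # derive age from install year
    # Flag physically impossible ages as missing rather than dropping the whole row.
    bad_age = (df["conductor_age"] < 0) | (df["conductor_age"] > 100)
    df.loc[bad_age, "conductor_age"] = np.nan
    # Normalize free-text conductor type codes (whitespace, letter case).
    df["conductor_type"] = df["conductor_type"].str.strip().str.upper()
    return df

raw = pd.DataFrame({
    "line_id": ["L1", "L1", "L2"], "span_no": [1, 1, 7],
    "install_year": ["1995", "1995", "20X5"], "conductor_type": [" acsr", "acsr ", "ACSR"],
})
print(clean_asset_data(raw))
```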

전력데이터 분석에서 이상점 추출을 위한 데이터 클러스터링 아키텍처에 관한 연구 (A Novel of Data Clustering Architecture for Outlier Detection to Electric Power Data Analysis)

  • 정세훈;신창선;조용윤;박장우;박명혜;김영현;이승배;심춘보
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • Vol. 6, No. 10
    • /
    • pp.465-472
    • /
    • 2017
  • In the past, supervised machine learning techniques were the main approach to analyzing electric power data, and research focused on pattern detection through data mining. Now that the scale of power data has grown and real-time data feeds have become available, analysis based on those earlier classification and analysis techniques has clear limits. This paper therefore proposes a clustering architecture capable of analyzing large-scale electric power data. The proposed clustering process compensates for the weaknesses of the K-means algorithm, an unsupervised learning technique, and automates the entire pipeline from power data collection to analysis. Power data are classified and analyzed across three levels: the Raw Data Level, the Clustering Level, and the User Interface Level. In addition, to improve clustering efficiency, the paper presents a method for extracting the optimal number of clusters K based on principal component analysis and the normal distribution, together with a modified K-means algorithm that reduces the amount of data classified as outliers.
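
The sketch below illustrates only the clustering stage: PCA for dimensionality reduction, a data-driven choice of K (using the silhouette score as a stand-in for the paper's normal-distribution-based rule), and simple distance-based outlier flagging. The three-level architecture and its automation are not shown.

```python
# Simplified clustering stage: PCA, data-driven K selection, distance-based outlier flags.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

def cluster_power_data(X, k_range=range(2, 10)):
    Z = PCA(n_components=2).fit_transform(X)                    # principal-component scores
    best_k = max(k_range, key=lambda k: silhouette_score(
        Z, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)))
    km = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(Z)
    dist = np.linalg.norm(Z - km.cluster_centers_[km.labels_], axis=1)
    outliers = dist > dist.mean() + 3 * dist.std()              # simple 3-sigma outlier rule
    return best_k, km.labels_, outliers

X = np.random.default_rng(0).normal(size=(300, 6))
X[:150] += 4                                                    # two loosely separated groups
best_k, labels, outliers = cluster_power_data(X)
print(best_k, outliers.sum())
```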

RAM 분석 정확도 향상을 위한 야전운용 데이터의 이상값과 결측값 처리 방안 (Method of Processing the Outliers and Missing Values of Field Data to Improve RAM Analysis Accuracy)

  • 김인석;정원
    • 한국신뢰성학회지:신뢰성응용연구
    • /
    • Vol. 17, No. 3
    • /
    • pp.264-271
    • /
    • 2017
  • Purpose: Field operation data contains missing values and outliers arising from various causes in the data collection process, so caution is required when using RAM analysis results based on such data. The purpose of this study is to present a method that minimizes RAM analysis errors in field data and thereby improves accuracy. Methods: Statistical methods are presented for processing the outliers and missing values of field operation data, and the RAM analysis results before and after applying these techniques are compared. Results: The availability estimated after processing is 6.8 to 23.5% lower than before processing, indicating that the treatment of missing values and outliers greatly affects the RAM analysis result. Conclusion: RAM analysis of the OO weapon system was performed, and suggestions for improving RAM analysis were presented by comparing the new and current methods. Data analysis without appropriate treatment of erroneous values may lead to incorrect conclusions and thus to inappropriate decisions and actions.
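
As a hedged illustration of this kind of preprocessing, the sketch below applies median imputation to missing repair times and an IQR rule to outlying times between failures before estimating availability; the specific rules and numbers are assumptions, not the statistical procedures chosen in the study.

```python
# Illustrative screening of field reliability data before a RAM calculation.
import numpy as np

def screen_field_data(tbf, ttr):
    tbf = np.asarray(tbf, dtype=float)
    ttr = np.asarray(ttr, dtype=float)
    ttr = np.where(np.isnan(ttr), np.nanmedian(ttr), ttr)      # impute missing repair times
    q1, q3 = np.percentile(tbf, [25, 75])
    iqr = q3 - q1
    keep = (tbf >= q1 - 1.5 * iqr) & (tbf <= q3 + 1.5 * iqr)   # drop outlying failure intervals
    return tbf[keep], ttr[keep]

tbf = [120, 135, 110, 980, 125, 140, 118]        # hours between failures (one gross outlier)
ttr = [2.0, np.nan, 1.5, 2.2, 1.8, np.nan, 2.1]  # repair times in hours, with missing values
tbf_c, ttr_c = screen_field_data(tbf, ttr)
mtbf, mttr = tbf_c.mean(), ttr_c.mean()
print(f"availability = {mtbf / (mtbf + mttr):.4f}")            # inherent availability MTBF/(MTBF+MTTR)
```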

A Big Data-Driven Business Data Analysis System: Applications of Artificial Intelligence Techniques in Problem Solving

  • Donggeun Kim;Sangjin Kim;Juyong Ko;Jai Woo Lee
    • 한국빅데이터학회지
    • /
    • Vol. 8, No. 1
    • /
    • pp.35-47
    • /
    • 2023
  • It is crucial to develop effective and efficient big data analytics methods for problem solving in the field of business, in order to improve the performance of data analytics and reduce the costs and risks of analyzing customer data. In this study, a big data-driven data analysis system using artificial intelligence techniques is designed to increase the accuracy of big data analytics, in step with the rapid growth of the field of data science. We present a key direction for big data analysis systems through missing value imputation, outlier detection, feature extraction, the use of explainable artificial intelligence techniques, and exploratory data analysis. Our objective is not only to develop big data analysis techniques for the complex structures of business data but also to bridge the gap between theoretical ideas in artificial intelligence methods and the analysis of real-world data in the field of business.
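
The sketch below wires together two of the stages named above, missing value imputation and outlier detection, using scikit-learn; the choice of SimpleImputer and IsolationForest and the synthetic customer data are illustrative assumptions, not the system's actual components.

```python
# Illustrative pipeline fragment: median imputation followed by isolation-forest outlier detection.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=15.0, size=(1000, 4))          # e.g. customer spend features (made up)
X[rng.integers(0, 1000, size=50), rng.integers(0, 4, size=50)] = np.nan   # inject missing values
X[:5] *= 10                                                    # inject a few gross outliers

X_imputed = SimpleImputer(strategy="median").fit_transform(X)  # missing value imputation
labels = IsolationForest(contamination=0.01, random_state=0).fit_predict(X_imputed)
print("flagged outliers:", int((labels == -1).sum()))          # -1 marks detected outliers
```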