• Title/Abstract/Keywords: Outlier analysis

Search results: 238 items (processing time: 0.024 s)

A Data Mining Tool for Massive Trajectory Data

  • 이재길
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터
    • /
    • Vol. 15, No. 3
    • /
    • pp.145-153
    • /
    • 2009
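  • Trajectory data can easily be found anywhere in the real world. Recently, advances in satellite, sensor, RFID, video, and wireless communication technologies have made it possible to track moving objects systematically and to collect large amounts of trajectory data. Accordingly, the need to analyze trajectory data is growing. This paper develops a mining tool for massive trajectory data. The tool provides the most widely used mining operations: clustering, classification, and outlier detection. Trajectory clustering discovers common movement patterns, trajectory classification predicts the category of a moving object based on its trajectory, and trajectory outlier detection finds trajectories that differ substantially from, or are inconsistent with, the remaining trajectories. The main advantage of the tool is that it exploits partial-trajectory information during data mining. Its effectiveness has been demonstrated on a variety of real trajectory data sets. As a result of this work, practical software for trajectory data mining has been developed, and it is expected to be applicable to many real-world applications.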

A Study on Developing a CER Using Production Cost Data in Korean Maneuver Weapon System

  • 이두현;김각규
    • 한국경영과학회지
    • /
    • Vol. 39, No. 3
    • /
    • pp.51-61
    • /
    • 2014
  • In this paper, we develop a cost estimation relationship (CER) for Korean maneuver weapon systems using historical production costs. To develop the CER, we collected historical production cost data for four tanks and five armored vehicles. We also analyzed the Required Operational Capability (ROC) of the weapon systems and chose cost drivers that allow their operational capabilities to be compared. We used forward selection, backward selection, stepwise regression, and $R^2$ selection to identify the cost drivers with the greatest influence on the dependent variables. We then used principal component regression, robust regression, and weighted regression to deal with multicollinearity and outliers in the data and to develop a more appropriate CER. As a result, we were able to develop a production cost CER for Korean maneuver weapon systems with the lowest cost errors. This research is meaningful in that it develops a CER based on original Korean cost data without foreign data, and these methods will contribute to developing a Korean cost analysis program in the future.

Application of Statistical Geo-Spatial Information Technology to Soil Stratification

  • 김한샘;김현기;신시열;정충기
    • 한국지반공학회논문집
    • /
    • Vol. 27, No. 7
    • /
    • pp.59-68
    • /
    • 2011
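  • Site investigation data carry uncertainty because of ground heterogeneity and because some of the data have low reliability. When such data are used to interpret ground characteristics, appropriate statistical analysis is therefore required for sound engineering judgment. In this study, we developed an expert system that identifies site investigation points deviating from the overall trend in the data for a target area, using a statistical geo-spatial information analysis technique that combines outlier analysis with cross-validation. Using the developed system, we analyzed the soil stratification based on borehole data from the Yeouido area of Seoul. As a result, we identified the borehole data judged to be of low reliability and, excluding those points, determined the distribution of bedrock depth in the Yeouido area.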
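
The combination of cross-validation and outlier analysis described above can be sketched as a leave-one-out check: each borehole is predicted from its neighbors, and the borehole with the largest residual is treated as a candidate for review. The abstract does not name the interpolator, so inverse distance weighting is swapped in here as an assumption, and the coordinates and bedrock depths are hypothetical.

```python
import math

def idw_predict(target_xy, known, power=2.0):
    """Inverse-distance-weighted estimate at target_xy from (x, y, value) tuples."""
    num = den = 0.0
    for x, y, value in known:
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        if d == 0.0:
            return value  # coincident point: use its value directly
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def loo_residuals(boreholes):
    """Leave-one-out residuals: observed depth minus the depth predicted from all other boreholes."""
    residuals = []
    for i, (x, y, depth) in enumerate(boreholes):
        others = boreholes[:i] + boreholes[i + 1:]
        residuals.append(depth - idw_predict((x, y), others))
    return residuals

# Hypothetical boreholes: (x, y, bedrock depth in m). The fifth one is anomalous.
boreholes = [
    (0.0, 0.0, 10.0), (1.0, 0.0, 10.5), (0.0, 1.0, 10.2),
    (1.0, 1.0, 10.8), (0.5, 0.5, 25.0), (2.0, 1.0, 11.0),
]
residuals = loo_residuals(boreholes)
# Index of the borehole whose observation disagrees most with its neighbors.
suspect = max(range(len(boreholes)), key=lambda i: abs(residuals[i]))
```

A real system would apply a statistical threshold to the residuals rather than simply taking the largest one; that choice is left open here.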

Improving the Quality of Response Surface Analysis of an Experiment for Coffee-Supplemented Milk Beverage: I. Data Screening at the Center Point and Maximum Possible R-Square

  • Rheem, Sungsue;Oh, Sejong
    • 한국축산식품학회지
    • /
    • Vol. 39, No. 1
    • /
    • pp.114-120
    • /
    • 2019
  • Response surface methodology (RSM) is a useful set of statistical techniques for modeling and optimizing responses in research studies of food science. As a design for a response surface experiment, a central composite design (CCD) with multiple runs at the center point is frequently used. However, sometimes there exist situations where some among the responses at the center point are outliers and these outliers are overlooked. Since the responses from center runs are those from the same experimental conditions, there should be no outliers at the center point. Outliers at the center point ruin statistical analysis. Thus, the responses at the center point need to be looked at, and if outliers are observed, they have to be examined. If the reasons for the outliers are not errors in measuring or typing, such outliers need to be deleted. If the outliers are due to such errors, they have to be corrected. Through a re-analysis of a dataset published in the Korean Journal for Food Science of Animal Resources, we have shown that outlier elimination resulted in the increase of the maximum possible R-square that the modeling of the data can obtain, which enables us to improve the quality of response surface analysis.
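
The effect of center-point outliers on the maximum possible R-square can be illustrated with a small calculation. This sketch assumes the usual lack-of-fit reading, in which no model can explain the pure-error sum of squares from replicated runs, so the bound is 1 - SS_PE / SS_T; the abstract does not state the formula. It also assumes only the center point is replicated, and the response values are hypothetical.

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def max_possible_r2(other_responses, center_responses):
    """Upper bound on R-square when only the center point is replicated.

    Pure error comes from the center replicates alone (an assumption of this sketch).
    """
    ss_total = sum_sq_dev(other_responses + center_responses)
    ss_pure_error = sum_sq_dev(center_responses)
    return 1.0 - ss_pure_error / ss_total

# Hypothetical responses at the non-center design points.
other = [10.0, 12.0, 14.0, 16.0, 11.0, 13.0, 15.0, 17.0]
# Hypothetical center replicates; the last value is an outlier.
center_raw = [13.4, 13.6, 13.5, 13.5, 9.0]
center_screened = center_raw[:-1]  # outlier removed after examination

r2_before = max_possible_r2(other, center_raw)
r2_after = max_possible_r2(other, center_screened)
```

With these made-up numbers the bound rises once the outlying center run is removed, which is the behavior the abstract reports for the re-analyzed dataset.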

Study on the applicability of the principal component analysis for detecting leaks in water pipe networks

  • 김기민;박수완
    • 상하수도학회지
    • /
    • Vol. 33, No. 2
    • /
    • pp.159-167
    • /
    • 2019
  • In this paper, the potential of the principal component analysis (PCA) technique for detecting leaks in water pipe networks was evaluated. For this purpose, PCA was conducted to evaluate the relevance of the outliers calculated by a PCA model, using the recorded pipe flows and the recorded pipe leak incidents of a case-study water distribution system. The PCA technique was enhanced by applying computational algorithms developed in this study, which were designed to extract a partial set of flow data from the original 24-hour flow data so that the effective outlier detection rate was maximized. The relevance of the calculated outliers of the PCA model to the recorded pipe leak incidents was analyzed. The developed algorithm may be applied in determining further leak detection field work for water distribution blocks that have an effective outlier detection rate of more than 70%. However, the analysis suggested that further development of the algorithm is needed to enhance the applicability of PCA in detecting leaks by considering series of leak reports occurring in a relatively short period.

Development of the Financial Account Pre-screening System for Corporate Credit Evaluation

  • 노태협
    • 한국정보시스템학회지:정보시스템연구
    • /
    • Vol. 18, No. 4
    • /
    • pp.41-57
    • /
    • 2009
  • Although financial information greatly influences the decisions of the groups that use it, detecting management fraud and earnings manipulation is difficult with normal audit procedures and corporate credit evaluation processes, owing to the shortage of knowledge about the characteristics of management fraud and to limits on time and cost. These limitations suggest the need for a systematic process that lets credit evaluators, external auditors, financial analysts, and regulators assess the risk of earnings manipulation effectively. Most research on management fraud has examined how various characteristics of a company's management affect the occurrence of corporate fraud. This study examines the financial characteristics of companies engaged in fraudulent financial reporting and suggests a model and system for detecting GAAP violations to improve the reliability of accounting information and the transparency of management. Since the detection of management fraud has limited proven theory, this study used outlier detection on financial ratios (upper and lower bounds) as a real-field application. The strength of the outlier detection method is its ease of use and understandability. In the suggested model, 14 variables from the 7 useful variable categories among the 76 financial ratio variables are examined through distribution analysis as possible indicators of fraudulent financial statements. The model developed from these variables shows a hit ratio of 80.82% for the holdout sample. This model was developed as a financial outlier detection system for a financial institution. External auditors, financial analysts, regulators, and other users of financial statements might use this model to pre-screen potential earnings manipulators in the credit evaluation system. In particular, this model will help loan evaluators at financial institutions decide on more objective and effective credit ratings and improve the quality of financial statements.
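
The upper- and lower-bound screening idea can be sketched for a single financial ratio. The abstract does not say how the bounds are derived from the distribution analysis, so Tukey's interquartile-range fences are swapped in here as an assumption; the ratio values and the multiplier `k` are hypothetical.

```python
from statistics import quantiles

def tukey_bounds(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = tukey_bounds(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical values of one financial ratio across eight companies.
current_ratios = [0.9, 1.0, 1.1, 1.2, 1.0, 0.95, 1.05, 5.0]
flagged = flag_outliers(current_ratios)  # companies to pre-screen on this ratio
```

In a pre-screening setting the same check would be repeated per ratio, and a company flagged on several ratios would be passed to a human evaluator.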

A Study of the Robust Degradation Model by Analyzing the Filament Lamp Degradation Data

  • 성기우
    • 한국자동차공학회논문집
    • /
    • Vol. 20, No. 6
    • /
    • pp.132-139
    • /
    • 2012
  • Durability and lifetime testing is generally needed when parts are developed with new technology. In this paper, accelerated degradation analysis methods are developed for such testing. This study presents a robust model estimation method that is less affected by outliers in regression model estimation. In addition, the lifetime can be predicted from the degradation-stress relationship at a given stress level.
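
To show what "less affected by outliers" means for a regression fit, here is a minimal sketch using the Theil-Sen estimator (median of pairwise slopes). The abstract does not name the robust estimator used in the paper, so Theil-Sen is an assumption chosen for its simplicity, and the degradation readings are hypothetical.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Robust line fit: slope is the median pairwise slope, intercept the median residual offset."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip pairs with identical x to avoid division by zero
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

# Hypothetical degradation measurements; the last reading is an outlier.
hours = [0, 1, 2, 3, 4, 5, 6]
degradation = [1, 3, 5, 7, 9, 11, 100]
slope, intercept = theil_sen(hours, degradation)
```

An ordinary least-squares fit to the same data would be pulled strongly toward the last reading, while the median-based fit follows the trend of the other six points.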

Frequency Analysis of Extreme Rainfall Using 3 Parameter Probability Distributions

  • 김병준;맹승진;류경식;이순혁
    • 한국농공학회논문집
    • /
    • Vol. 46, No. 3
    • /
    • pp.31-42
    • /
    • 2004
  • This research derives design rainfalls by the L-moment method after testing the annual maximum daily rainfall data from 38 rainfall stations in Korea for homogeneity, independence, and outliers. To select the appropriate distribution for the annual maximum daily rainfall data at each station, the Generalized Extreme Value (GEV), Generalized Logistic (GLO), Generalized Pareto (GPA), Generalized Normal (GNO), and Pearson Type 3 (PT3) probability distributions were applied, and their suitability was judged using an L-moment ratio diagram and the Kolmogorov-Smirnov (K-S) test. Parameters of the appropriate distributions were estimated from the observed annual maximum daily rainfall and from rainfall simulated using Monte Carlo techniques. Design rainfalls were finally derived from the GEV distribution, which proved more appropriate than the other distributions.
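
The L-moment ratios plotted on an L-moment ratio diagram start from sample L-moments, which can be computed from probability-weighted moments of the sorted data. The sketch below uses the standard unbiased estimators for the first three L-moments; the rainfall values are hypothetical, and the paper's own computation may differ in detail.

```python
def sample_l_moments(data):
    """First three sample L-moments via unbiased probability-weighted moments (needs n >= 3)."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((j / (n - 1)) * x[j] for j in range(1, n)) / n
    b2 = sum((j * (j - 1) / ((n - 1) * (n - 2))) * x[j] for j in range(2, n)) / n
    l1 = b0                     # location (the mean)
    l2 = 2 * b1 - b0            # scale
    l3 = 6 * b2 - 6 * b1 + b0   # third L-moment
    return l1, l2, l3

# Hypothetical annual maximum daily rainfall at one station (mm).
rainfall = [98.0, 120.5, 87.3, 210.4, 135.2, 101.8, 156.9, 92.6]
l1, l2, l3 = sample_l_moments(rainfall)
l_cv = l2 / l1        # L-coefficient of variation
l_skew = l3 / l2      # L-skewness, one axis of the L-moment ratio diagram
```

A perfectly symmetric sample such as 1, 2, 3, 4, 5 gives a third L-moment of zero, which is a quick sanity check on the implementation.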

Analysis and Performance Enhancement of Angle-Based Outlier Detection

  • 신용준;박정희
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 2010년도 한국컴퓨터종합학술대회논문집 Vol.37 No.1(C)
    • /
    • pp.452-457
    • /
    • 2010
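  • Angle-Based Outlier Detection (ABOD), proposed as an effective outlier detection method for high-dimensional spaces, uses angles as the measure for comparing objects and therefore achieves better outlier detection performance than typical distance-based outlier measures even in high-dimensional spaces. However, when an outlier is surrounded by other outliers, it is difficult to distinguish it from normal objects. This paper proposes a method that improves on the existing outlier detection method and demonstrates its improved performance through experiments comparing the existing and proposed methods.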
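
The angle-based idea can be sketched as follows: for each point, look at the angles it forms with every pair of other points, and treat a small variance of those angle terms as a sign that the point lies outside the data. This is a simplified, unweighted variant written for illustration (the published ABOD score also weights each pair by distance), it is not the improved method the paper proposes, and the sample points are hypothetical.

```python
from itertools import combinations
from statistics import pvariance

def angle_variance_score(points, i):
    """Variance of the distance-scaled angle term over all pairs of other points.

    Low values suggest an outlier: all other points lie in roughly one direction.
    Assumes no duplicate points (a zero-length vector would divide by zero).
    """
    a = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    terms = []
    for b, c in combinations(others, 2):
        ab = [x - y for x, y in zip(b, a)]
        ac = [x - y for x, y in zip(c, a)]
        dot = sum(u * v for u, v in zip(ab, ac))
        norm_ab_sq = sum(u * u for u in ab)
        norm_ac_sq = sum(v * v for v in ac)
        terms.append(dot / (norm_ab_sq * norm_ac_sq))
    return pvariance(terms)

# Hypothetical 2-D data: a small cluster plus one distant point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
scores = [angle_variance_score(points, i) for i in range(len(points))]
outlier_index = min(range(len(points)), key=scores.__getitem__)
```

Evaluating every pair for every point costs cubic time in the number of points, so this form is only practical for small data sets.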


Automatic Cleaning Algorithm of Asset Data for Transmission Cable

  • Hwang, Jae-Sang;Mun, Sung-Duk;Kim, Tae-Joon;Kim, Kang-Sik
    • KEPCO Journal on Electric Power and Energy
    • /
    • Vol. 7, No. 1
    • /
    • pp.79-84
    • /
    • 2021
  • Data quality is a fundamental requirement for big data analysis, artificial intelligence technologies, and asset management systems, because it directly affects the reliability of the entire system. For this reason, data cleaning work has recently gained momentum, and data cleaning methods are being investigated around the world. In the field of electric power, however, asset data cleaning methods have not been fully established. Therefore, an automatic cleaning algorithm for transmission cable asset data is studied in this paper. The cleaning algorithm consists of missing data treatment and outlier data treatment. Rule-based and expert-opinion-based cleaning methods are combined and applied to these dirty data.