• Title/Summary/Keyword: Outlier Analysis

Statistical Outliers in Florida Counties at the Presidential Election 2000 (2000년 미국대선 플로리다주의 투표결과 분석)

  • 김현철
    • The Korean Journal of Applied Statistics / v.15 no.1 / pp.21-32 / 2002
  • We searched for outliers in the Florida vote data from the 2000 presidential election using multivariate regression analysis. The analysis identified several outlying counties, including Palm Beach County. To argue that the "butterfly ballot" made Palm Beach an outlier, the number of disqualified double-punched ballots should be analyzed along with the votes.
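
The regression-based screening described in this abstract can be illustrated with a minimal sketch. It assumes a single predictor and a cutoff of 2.0 on internally studentized residuals. The abstract gives neither the model specification nor the cutoff, so the function name `ols_outliers`, the threshold, and the toy data are all hypothetical.

```python
import math

def ols_outliers(x, y, threshold=2.0):
    """Fit y = a + b*x by least squares and return the indices whose
    internally studentized residual exceeds the threshold in absolute value."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    flagged = []
    for i, (xi, r) in enumerate(zip(x, resid)):
        h = 1.0 / n + (xi - mx) ** 2 / sxx  # leverage of observation i
        t = r / (s * math.sqrt(1.0 - h))    # internally studentized residual
        if abs(t) > threshold:
            flagged.append(i)
    return flagged

# Toy data (not the Florida data): a linear trend with one inflated value.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2 * xi for xi in x]
y[5] += 20
print(ols_outliers(x, y))
```

With several predictors the same idea applies, but the leverage comes from the hat matrix rather than the one-variable formula used here.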

Minimum Hellinger Distance Based Goodness-of-fit Tests in Normal Models: Empirical Approach

  • Dong Bin Jeong
    • Communications for Statistical Applications and Methods / v.6 no.3 / pp.967-976 / 1999
  • In this paper we study Hellinger distance based goodness-of-fit tests that are analogs of likelihood ratio tests. The minimum Hellinger distance estimator (MHDE) in normal models provides an excellent robust alternative to the usual maximum likelihood estimator. Our simulation results show that the goodness-of-fit test based on the Hellinger deviance test (Simpson 1989) is robust when the data contain outliers. The proposed Hellinger deviance test (Simpson 1989) is a more direct method for obtaining robust inferences than an automated outlier screening method applied before the likelihood ratio test.
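
As background for the quantity these tests are built on, the sketch below computes the Hellinger distance between two univariate normal densities from its closed form. It is not the paper's estimator or test statistic: an MHDE would minimize a distance of this kind between a nonparametric density estimate and the normal family. The function name `hellinger_normal` is hypothetical.

```python
import math

def hellinger_normal(mu1, sigma1, mu2, sigma2):
    """Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Uses H^2 = 1 - BC, where BC is the Bhattacharyya coefficient of the two
    densities, so the result lies in [0, 1].
    """
    var_sum = sigma1 ** 2 + sigma2 ** 2
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    return math.sqrt(max(0.0, 1.0 - bc))

print(hellinger_normal(0.0, 1.0, 0.0, 1.0))  # identical densities
print(hellinger_normal(0.0, 1.0, 3.0, 1.0))  # same spread, shifted mean
```

Because the distance is bounded, a few extreme observations can shift it only by a limited amount, which is the intuition behind the robustness the abstract reports.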

Diagnosis of Observations after Fit of Multivariate Skew t-Distribution: Identification of Outliers and Edge Observations from Asymmetric Data

  • Kim, Seung-Gu
    • The Korean Journal of Applied Statistics / v.25 no.6 / pp.1019-1026 / 2012
  • This paper presents a method for identifying "edge observations," located on a boundary area constructed by a truncation variable, as well as outliers, after fitting a multivariate skew $t$-distribution (MST) to asymmetric data. The detection of edge observations is important in data analysis because it provides information on a certain critical area in the observation space. The proposed method is applied to the Australian Institute of Sport (AIS) dataset, which is well known for its asymmetry in data space.

Analysis of Abnormal Values Obtained from National Groundwater Monitoring Stations (국가지하수 관측소 측정자료의 이상값 분석)

  • Yi Myeong-Jae;Lee Jin-Yong;Kim Gyoo-Bum;Won Jong-Ho
    • Journal of Soil and Groundwater Environment / v.10 no.1 / pp.65-74 / 2005
  • National groundwater monitoring stations are managed throughout the country by the Korea Water Resources Corporation (KOWACO) to monitor variations in the quantity and quality of groundwater resources. A multi-sensor installed in each monitoring well measures groundwater level, water temperature, and electrical conductivity every six hours, and the logged data are automatically transmitted to a host computer at KOWACO. Despite regular station inspection and replacement of deteriorated or broken devices, abnormal values or outliers often occur because of the intrinsic limitations of automatic monitoring and transmission. Prompt recognition of, and response to, these values is therefore required to reduce disturbances and gaps in the data. In this study, the time and frequency of outlier occurrence were analyzed for the water level data obtained from national groundwater monitoring stations within the Han River basin in 2000. The results indicated that the most prominent outlier patterns were a rapid decline for water level, no variation for temperature, and a steep decline for electrical conductivity. This study provides a sample criterion for determining outliers for each parameter.
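
Two of the outlier patterns named in this abstract, a rapid decline and a period of no variation, can be expressed as simple rules on a time series. The sketch below is illustrative only: the abstract does not state the criteria the study derived, so the function names, the `max_drop` and `min_run` thresholds, and the sample readings are hypothetical.

```python
def flag_rapid_decline(values, max_drop):
    """Return indices where a reading fell by more than max_drop
    relative to the previous reading."""
    return [i for i in range(1, len(values)) if values[i - 1] - values[i] > max_drop]

def flag_flatline(values, min_run):
    """Return indices belonging to runs of at least min_run identical
    consecutive readings (a 'no variation' pattern)."""
    flagged = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                flagged.extend(range(start, i))
            start = i
    return flagged

# Hypothetical six-hourly readings.
water_level = [10.2, 10.1, 10.15, 7.0, 10.1]
temperature = [14.1, 14.2, 14.2, 14.2, 14.2, 14.3]
print(flag_rapid_decline(water_level, max_drop=1.0))
print(flag_flatline(temperature, min_run=4))
```

In practice the thresholds would need to be set per station and per parameter, since natural variability differs between wells.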

Outlier Detection Techniques for Biased Opinion Discovery (편향된 의견 문서 검출을 위한 이상치 탐지 기법)

  • Yeon, Jongheum;Shim, Junho;Lee, Sanggoo
    • The Journal of Society for e-Business Studies / v.18 no.4 / pp.315-326 / 2013
  • Users of social media post various types of opinions, such as product reviews and movie reviews, and customers commonly rely on these opinions when making decisions. However, as the use of opinions grows, distorted feedback has also increased. For example, exaggerated positive opinions are posted to promote target products, as are negative opinions that are far from the common evaluation. Finding these biased opinions is important for keeping social media reliable. Opinion mining (or sentiment analysis) techniques have been developed to determine the sentiment polarity of opinionated documents, and they can be used to find biased opinions. However, previous techniques have some drawbacks: they categorize text only as positive or negative, and they need a large amount of training data to build the classifier. In this paper, we propose methods for discovering biased opinions that are skewed from the overall common opinion. The methods are based on angle-based outlier detection and personalized PageRank, and can be applied without training data. We analyze the performance of the proposed techniques by presenting experimental results on a movie review dataset.
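
The idea behind angle-based outlier detection can be sketched as follows. This is a simplified, unweighted variant: for each point it takes the variance, over all pairs of other points, of the dot product of the two difference vectors scaled by their squared lengths. A point far from the rest sees the others within a narrow range of angles and gets a small score. How the paper turns opinion documents into vectors is not described in the abstract, so the 2D points and the function name `abof` are hypothetical, and the points are assumed to be distinct.

```python
from itertools import combinations

def abof(points, i):
    """Simplified angle-based outlier factor of points[i].

    A low value suggests an outlier. Assumes all points are distinct,
    otherwise a zero-length difference vector causes a division by zero.
    """
    a = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    vals = []
    for b, c in combinations(others, 2):
        ab = [bk - ak for ak, bk in zip(a, b)]
        ac = [ck - ak for ak, ck in zip(a, c)]
        dot = sum(u * v for u, v in zip(ab, ac))
        nab = sum(u * u for u in ab)  # squared length of AB
        nac = sum(v * v for v in ac)  # squared length of AC
        vals.append(dot / (nab * nac))
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

# Five clustered points and one distant point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
scores = [abof(points, i) for i in range(len(points))]
most_outlying = min(range(len(points)), key=scores.__getitem__)
print(most_outlying)
```

The exhaustive pairwise loop is cubic in the number of points overall, so larger datasets usually call for an approximation based on nearest neighbours.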

Genetic Variation of Pinus densiflora Populations in South Korea Based on ESTP Markers (ESTP 표지를 이용한 국내 소나무 집단의 유전변이)

  • Ahn, Ji Young;Hong, Kyung Nak;Lee, Jei Wan;Hong, Yong Pyo;Kang, Hoduck
    • Korean Journal of Plant Resources / v.28 no.2 / pp.279-289 / 2015
  • Genetic diversity and genetic differentiation of thirteen Pinus densiflora populations in South Korea were estimated using nine ESTP (Expressed Sequence Tag Polymorphism) markers. The numbers of alleles and effective alleles were 2.2 and 1.8, respectively. The percentage of polymorphic loci (P) was 98.8%. The observed and expected heterozygosities were 0.391 and 0.402, respectively, and eleven populations, all except the Ahngang and Gangneung populations, were in Hardy-Weinberg equilibrium. The level of genetic differentiation (Wright’s FST = 0.057) was higher than that found with isozyme or nSSR markers. Cluster analysis revealed no relationship between genetic distance and the geographic distribution of the populations. Likewise, the genetic differentiation between populations was not correlated with geographic distance (r = 0.017 and P = 0.344 from the Mantel test). In the FST-outlier analysis used to identify loci under selection, six loci were detected at the 99% confidence level by the frequentist method. However, only three loci (sams2+AluⅠ, sams2+RsaⅠ, PtNCS_p14A9+HaeⅢ) were presumed to be outliers by the Bayesian method. The sams2+AluⅠ and sams2+RsaⅠ loci originated from the sams2 gene and appeared to be under balancing selection.

Derivation and Application of Influence Function in Discriminant Analysis for Three Groups (세 집단 판별분석 상황에서의 영향함수 유도 및 그 응용)

  • Lee, Hae-Jung;Kim, Hong-Gie
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.941-949 / 2011
  • The influence function is used to develop criteria for detecting outliers in discriminant analysis. We derive the influence function of observations that estimate the misclassification probability in discriminant analysis for three groups. The proposed measures are applied to facial image data to identify outliers, and the discriminant analysis is then redone with the outliers excluded. The study shows that the derived influence function is more efficient than the discriminant probability approach.

Performance Evaluation of Denoising Algorithms for the 3D Construction Digital Map (건설현장 적용을 위한 디지털맵 노이즈 제거 알고리즘 성능평가)

  • Park, Su-Yeul;Kim, Seok
    • Journal of KIBIM / v.10 no.4 / pp.32-39 / 2020
  • In recent years, the construction industry has grown larger and more complex, making it difficult to acquire point cloud data for construction equipment and workers. Point cloud data is measured using drones and a mobile mapping system (MMS), and the collected data is used to create a 3D digital map. Construction sites in particular are located outdoors and contain much irregular terrain, which makes point cloud data difficult to collect. For these reasons, adopting a noise reduction algorithm suited to the characteristics of the construction industry can improve the analysis accuracy of digital maps. This is related to the various environments and variables of the construction site. This study therefore reviewed and analyzed existing research and techniques on noise reduction algorithms and, based on the literature review, evaluated the performance of major noise reduction algorithms on digital maps of construction sites. In the performance evaluation, the voxel grid algorithm required relatively less execution time than the statistical outlier removal algorithm. In addition, analysis of slopes, spaces, and earth walls in the construction site digital map showed that the voxel grid algorithm was relatively superior to the statistical outlier removal algorithm in both noise removal performance and object preservation. In future work, based on these results, we will develop a noise reduction algorithm for 3D point cloud data that reflects the characteristics of the construction site.
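
The two filters compared in this abstract can be sketched in a few lines of plain Python. The voxel grid filter replaces all points in each cubic cell with their centroid; statistical outlier removal drops points whose mean distance to their k nearest neighbours is far above the average. These are generic textbook versions, not the implementations evaluated in the study, and the parameter values and sample cloud are hypothetical.

```python
import math
from collections import defaultdict

def voxel_grid_filter(points, voxel_size):
    """Replace all points falling in the same cubic voxel with their centroid."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        cells[key].append(p)
    return [tuple(sum(c) / len(ps) for c in zip(*ps)) for ps in cells.values()]

def statistical_outlier_removal(points, k, std_ratio):
    """Keep points whose mean distance to their k nearest neighbours is at
    most the global mean of that quantity plus std_ratio standard deviations."""
    mean_d = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_d.append(sum(d[:k]) / k)
    mu = sum(mean_d) / len(mean_d)
    sigma = math.sqrt(sum((m - mu) ** 2 for m in mean_d) / len(mean_d))
    return [p for p, m in zip(points, mean_d) if m <= mu + std_ratio * sigma]

# Five clustered points and one stray point.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
         (0.1, 0.1, 0.0), (0.05, 0.05, 0.1), (5.0, 5.0, 5.0)]
print(len(voxel_grid_filter(cloud, voxel_size=1.0)))
print(len(statistical_outlier_removal(cloud, k=2, std_ratio=1.0)))
```

The brute-force neighbour search here compares every pair of points, whereas the voxel filter needs only one pass, which is consistent with the execution-time difference the abstract reports. Note that the voxel filter downsamples but does not by itself discard an isolated point.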

Damaged cable detection with statistical analysis, clustering, and deep learning models

  • Son, Hyesook;Yoon, Chanyoung;Kim, Yejin;Jang, Yun;Tran, Linh Viet;Kim, Seung-Eock;Kim, Dong Joo;Park, Jongwoong
    • Smart Structures and Systems / v.29 no.1 / pp.17-28 / 2022
  • The cable component of cable-stayed bridges is gradually affected by weather conditions, vehicle loads, and material corrosion. The stay cable is a critical load-carrying part that closely affects the operational stability of a cable-stayed bridge. Damaged cables might lead to bridge collapse because of their reduced tension capacity. Thus, it is necessary to develop structural health monitoring (SHM) techniques that accurately identify damaged cables. In this work, a combined identification method using three efficient techniques, namely statistical analysis, clustering, and neural network models, is proposed to detect the damaged cable in a cable-stayed bridge. The dataset measured on the bridge was first preprocessed to remove outlier channels. Then, the theory and application of each technique for damage detection were introduced. In general, the statistical approach extracts the parameters representing the damage within the time series, the clustering approach identifies outliers in the data signals as damaged members, and the deep learning approach uses the nonlinear data dependencies in SHM to train the model. The performance of these approaches in classifying the damaged cable was assessed, and the combined identification method was obtained using a voting ensemble. Finally, the combined method was compared with an existing outlier detection algorithm, support vector machines (SVM). The results demonstrate that the proposed method is robust and provides higher accuracy for damaged cable detection in the cable-stayed bridge.

Development of an Observational Settlement Analysis Method Using Outliers (이상치를 이용한 관측적 침하예측기법의 개발)

  • 우철웅;장병욱
    • Magazine of the Korean Society of Agricultural Engineers / v.45 no.5 / pp.140-150 / 2003
  • Observational methods such as Asaoka's method and the hyperbolic method are widely applied to settlement analysis using observed settlement. The most unreliable aspect of these methods arises from the subjective judgment of the initial non-linearity in the linear regression. The initial non-linearity is inevitable because of the settlement behaviour itself. Therefore, an objective method is essential to achieve more reliable results in settlement analysis. It was found that the initial non-linear data are statistical outliers. New automated algorithms for the hyperbolic method and Asaoka's method were developed based on an outlier detection method. The methods are a successive detection of outliers for Asaoka's method and a search for a suitable hyperbolic range for the hyperbolic method. The applicability of the algorithms was verified through case studies.
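
The linear regression at the heart of the hyperbolic method can be sketched as follows, assuming the usual form s(t) = t / (alpha + beta t), so that t/s is linear in t and the ultimate settlement is 1/beta. The `start` argument stands in for the choice of where the linear range begins, which is the subjective step the abstract says the paper automates. The automated search itself is not reproduced here, and the function name and synthetic data are hypothetical.

```python
def fit_hyperbolic(times, settlements, start=0):
    """Fit t/s = alpha + beta*t by least squares on the data from index
    `start` onward and return (alpha, beta, ultimate_settlement)."""
    t = list(times[start:])
    y = [ti / si for ti, si in zip(times[start:], settlements[start:])]
    n = len(t)
    mt = sum(t) / n
    my = sum(y) / n
    beta = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sum(
        (ti - mt) ** 2 for ti in t
    )
    alpha = my - beta * mt
    return alpha, beta, 1.0 / beta

# Synthetic record generated from alpha = 2.0, beta = 0.05 (ultimate settlement 20).
times = [10, 20, 40, 80, 160, 320]
settlements = [ti / (2.0 + 0.05 * ti) for ti in times]
alpha, beta, ultimate = fit_hyperbolic(times, settlements)
print(round(alpha, 6), round(beta, 6), round(ultimate, 6))
```

With field data the early readings typically depart from the straight line, so the fitted ultimate settlement depends on `start`. Treating those early points as regression outliers gives an objective rule for choosing it.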