• Title/Summary/Keyword: Boxplot

Skew Normal Boxplot and Outliers

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.4
    • /
    • pp.591-595
    • /
    • 2012
  • We frequently use Tukey's boxplot to identify outliers in a batch of observations of a continuous variable. In doing so, we implicitly assume that the underlying distribution belongs to the family of normal distributions. Such a practice is often superficial and improper, since in reality many variables exhibit skewness. In this short paper, we build a modified boxplot and set up an outlier identification procedure by assuming that the observations are generated from the skew normal distribution (Azzalini, 1985), which is an extension of the normal distribution. The statistical performance of the proposed procedure is examined with simulated datasets.
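
For orientation, the classical Tukey rule that this abstract takes as its starting point can be sketched as follows. This is only the standard 1.5 × IQR fence rule, not the skew-normal modification proposed in the paper; the function names, the quartile convention (`method="inclusive"`), and the sample batch are assumptions made for illustration.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Quartile convention is an assumption; other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(values, k=1.5):
    """Return the observations that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical right-skewed batch (not data from the paper).
batch = [1.0, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.4, 3.1, 9.5]
print(tukey_fences(batch))
print(tukey_outliers(batch))  # [9.5]
```

Because the fences are symmetric around the quartiles, a skewed sample tends to have its long tail flagged, which is the behavior the paper sets out to correct.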

A Classification Techniques For Quality Improvement

  • Jichao, Xu;Yumin, Liu;Li, Zhang
    • International Journal of Quality Innovation
    • /
    • v.2 no.2
    • /
    • pp.24-33
    • /
    • 2001
  • The quality of a process is technically characterized by its variation: a product or process of the best quality requires as little variation as possible. Variation is usually reduced in various ways, for example by adjusting parameter settings under robust design, which takes many rounds of expensive experiments. Practitioners therefore look for cheap and simple methods to achieve robustness. In this paper, a practical and simple technique for quality improvement, namely reducing variation by data classification, is studied. First, all system factors that may govern the variation are included. Then past observations, their classification, and boxplot charts are used to find the underlying rule relating the variation to the system factor. Next, the setting of the system factor is adjusted according to that rule so that the variation is reduced to some extent. Finally, two typical quality improvement cases based on data classification are presented.

Outlier Detection and Treatment for the Conversion of Chemical Oxygen Demand to Total Organic Carbon (화학적산소요구량의 총유기탄소 변환을 위한 이상자료의 탐지와 처리)

  • Cho, Beom Jun;Cho, Hong Yeon;Kim, Sung
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.26 no.4
    • /
    • pp.207-216
    • /
    • 2014
  • Total organic carbon (TOC) is an important indicator used as a direct biological index in research on the marine carbon cycle. Because available TOC data are scarce compared with Chemical Oxygen Demand (COD) data, COD data can be used to produce sufficient TOC estimates. Outlier detection and treatment (removal) should be carried out reasonably and objectively because the COD-TOC conversion equation directly affects the TOC estimation. This study aims to suggest the optimal regression model using the available salinity, COD, and TOC data observed in the Korean coastal zone. The optimal regression model is selected by comparing, across diverse detection methods for outliers and influential observations, the change in the number of data points before and after removal, the variation coefficients, and the root mean square (RMS) error. The results show that a diagnostic combining the SIQR (Semi-Inter-Quartile Range) boxplot and Cook's distance is the most suitable for outlier detection. The optimal regression function is estimated as TOC (mg/L) = $0.44 \cdot \mathrm{COD}\,(\mathrm{mg/L}) + 1.53$, with a coefficient of determination of 0.47 and an RMS error of 0.85 mg/L. The RMS error and the variation coefficients of the leverage values are greatly reduced, to 31% and 80%, respectively, of their values before outlier removal. The suggested method can provide a more appropriate regression curve because the excessive influence of the outliers frequently included in COD and TOC monitoring data is removed.
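
As a worked illustration of the reported conversion, the regression can be applied directly to a COD reading. The slope and intercept come from the abstract; the function name and the sample COD value are hypothetical. With a coefficient of determination of 0.47, any single estimate should be read as rough.

```python
def toc_from_cod(cod_mg_per_l):
    """Estimate TOC (mg/L) from COD (mg/L) with the regression reported in the abstract."""
    slope = 0.44      # from the abstract
    intercept = 1.53  # from the abstract, in mg/L
    return slope * cod_mg_per_l + intercept


# Hypothetical COD reading of 2.0 mg/L (not a value from the paper).
print(round(toc_from_cod(2.0), 2))  # 2.41
```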

Malicious Users Detection and Nullifying their Effects on Cooperative Spectrum Sensing

  • Prasain, Prakash;Choi, Dong-You
    • Journal of Information Technology Services
    • /
    • v.15 no.1
    • /
    • pp.167-178
    • /
    • 2016
  • Spectrum sensing in cognitive radio (CR) plays a major role in the opportunistic use of idle spectrum, since it is what makes efficient dynamic spectrum access possible. In this research area, collaboration among multiple cognitive radio users has been proposed to improve detection reliability. Although cooperation improves spectrum sensing performance, malicious users that report false results may degrade performance severely. In this article, we study how to detect such malicious users and nullify their harmful effects by applying well-known outlier detection methods based on Grubbs' test, the boxplot method, and Dixon's test in cooperative spectrum sensing. First, the performance of each technique is compared, and the boxplot method is found to outperform both Grubbs' and Dixon's tests when multiple malicious users are present. Second, a new algorithm based on reputation and weight is developed to identify malicious users and cancel out their negative impact on the final decision. Simulation results demonstrate that the proposed scheme effectively identifies the malicious users and suppresses their harmful effects at the fusion center when deciding whether the spectrum is idle.

An Optimization Method for the Calculation of SCADA Main Grid's Theoretical Line Loss Based on DBSCAN

  • Cao, Hongyi;Ren, Qiaomu;Zou, Xiuguo;Zhang, Shuaitang;Qian, Yan
    • Journal of Information Processing Systems
    • /
    • v.15 no.5
    • /
    • pp.1156-1170
    • /
    • 2019
  • In recent years, the problem of data drift in the smart grid caused by manual operation has been widely studied by researchers in related domains. Effectively and reliably finding the reasonable data needed in the Supervisory Control and Data Acquisition (SCADA) system has become an important research topic. This paper analyzes the data composition of the smart grid and explains the power model in two smart grid applications, followed by an analysis of how each parameter of the density-based spatial clustering of applications with noise (DBSCAN) algorithm is applied. The processing effects of the boxplot method, the probability weight analysis method, and the DBSCAN clustering algorithm are then compared on the big-data-driven power grid. According to the comparison results, the DBSCAN algorithm outperforms the other methods in processing effect. The experimental verification shows that the DBSCAN clustering algorithm can effectively screen the power grid data, thereby significantly improving the accuracy and reliability of the calculation result of the main grid's theoretical line loss.
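
To make the role of the two DBSCAN parameters concrete, here is a minimal pure-Python sketch of the textbook algorithm. It is not the paper's implementation: the names `dbscan`, `eps`, and `min_pts`, the brute-force neighbor search, and the sample readings are all assumptions for illustration.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force search; a point counts as its own neighbor.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


# Hypothetical 2-D readings: two dense groups and one isolated reading.
readings = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.1), (1.0, 0.9),
            (5.0, 5.0), (5.1, 5.0), (4.9, 5.1), (5.0, 4.9),
            (9.0, 1.0)]
print(dbscan(readings, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Points labeled `-1` are the ones a screening step would set aside; `eps` controls how close readings must be to count as neighbors, and `min_pts` how many neighbors make a region dense.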

Outlier Detection and Replacement for Vertical Wind Speed in the Measurement of Actual Evapotranspiration (실제증발산 측정 시 연직 풍속 이상치 탐색 및 대체)

  • Park, Chun Gun;Rim, Chang-Soo;Lim, Kwang-Suop;Chae, Hyo-Sok
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.34 no.5
    • /
    • pp.1455-1461
    • /
    • 2014
  • In this study, flux data measured in the Deokgokje reservoir watershed near Deokyu mountain in May, June, and July 2011 were statistically analyzed to detect and replace outliers in vertical wind speed when measuring evapotranspiration with the eddy covariance method. To analyze the outliers of vertical wind speed, an outlier detection method based on the interquartile range (IQR) of the boxplot was employed, and the detected outliers were deleted or replaced with the mean. The measured evapotranspiration before and after outlier replacement was compared. The results showed a difference between evapotranspiration before and after outlier replacement, especially on rainy days. Therefore, based on the study results, outliers should be deleted or replaced in the measurement of evapotranspiration.
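
The detect-then-replace step can be sketched as below. The abstract does not say which mean is used for replacement, so this sketch assumes the mean of the retained (non-outlier) values; the function name, the 1.5 multiplier, the quartile convention, and the sample series are likewise assumptions.

```python
import statistics


def replace_outliers_with_mean(values, k=1.5):
    """Replace values outside the IQR fences with the mean of the remaining values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if lower <= v <= upper]
    fill = statistics.fmean(inliers)  # assumption: mean of the non-outlier values
    return [v if lower <= v <= upper else fill for v in values]


# Hypothetical vertical wind speed samples in m/s with one spike.
w = [0.1, -0.2, 0.0, 0.3, -0.1, 4.0, 0.2, -0.3]
print(replace_outliers_with_mean(w))
```

Deleting instead of replacing amounts to returning `inliers`; replacement keeps the series length unchanged, which matters when the samples must stay aligned with other flux variables.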

Evaluation of Percolated Water Quality of Paddy Fields Using Nonparametric Test (비모수검정을 이용한 논침투수 수질의 평가)

  • Oh, Seung-Young;Kim, Jin-Soo;Oh, Kwang-Young
    • Journal of The Korean Society of Agricultural Engineers
    • /
    • v.47 no.2
    • /
    • pp.99-110
    • /
    • 2005
  • Concentrations of total nitrogen (TN), total phosphorus (TP), and chemical oxygen demand (COD) in percolated water at four paddy field sites (Soro, Odong, Munui, and Boeun) were characterized using a nonparametric test. Percolation rates were measured and percolated water was sampled during irrigation periods at intervals of $5 \sim 10$ days. The normality of the percolation rate and pollutant concentrations was examined using histograms, boxplots, and the Kolmogorov-Smirnov (K-S) test. Pollutant concentrations in percolated water showed positively skewed distributions. The median concentrations were 1.91 mg/L for TN, 0.021 mg/L for TP, and 6.6 mg/L for COD, which were lower than the corresponding arithmetic mean concentrations by 35% for TN, 36% for TP, and 13% for COD. The median concentrations of TN and TP differed significantly among sample sites according to the Kruskal-Wallis test. However, median concentrations were not significantly different among months except for TN and TP at Soro and COD at Odong. The percolation loads of pollutants during irrigation periods in the study area were estimated at $3.12 \sim 7.75$ kg/ha for TN, $0.033 \sim 0.155$ kg/ha for TP, and 10.7 kg/ha for COD, which were much lower than the respective values reported in Japan.

A new method for automatic areal feature matching based on shape similarity using CRITIC method (CRITIC 방법을 이용한 형상유사도 기반의 면 객체 자동매칭 방법)

  • Kim, Ji-Young;Huh, Yong;Kim, Doe-Sung;Yu, Ki-Yun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.29 no.2
    • /
    • pp.113-121
    • /
    • 2011
  • In this paper, we propose a method for automatically matching areal features based on shape similarity using spatial information. We extract candidate matching pairs that intersect between two different spatial datasets and then measure a shape similarity, calculated as a weighted sum of the matching criteria with weights derived automatically by the CRITIC method. Matching pairs are selected when their similarity exceeds a threshold determined by outlier detection with the adjusted boxplot on training data. After applying this method to two distinct spatial datasets, a digital topographic map and a street-name address base map, we confirmed in visual evaluation that the matched buildings are similar in shape and overlap over a large area, and in statistical evaluation that the F-measure is high, at 0.932.

Value of Contrast-Enhanced Ultrasonography in the Differential Diagnosis of Enlarged Lymph Nodes: a Meta-Analysis of Diagnostic Accuracy Studies

  • Jin, Ya;He, Yu-Shuang;Zhang, Ming-Ming;Parajuly, Shyam Sundar;Chen, Shuang;Zhao, Hai-Na;Peng, Yu-Lan
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.16 no.6
    • /
    • pp.2361-2368
    • /
    • 2015
  • Objective: To evaluate the diagnostic accuracy of contrast-enhanced ultrasonography (CEUS) in differentiating between benign and malignant enlarged lymph nodes using meta-analysis. Materials and Methods: PubMed, Embase, SCI and Cochrane databases were searched for studies (up to September 1, 2014) reporting the diagnostic performance of CEUS in discriminating between benign and malignant lymph nodes. Inclusion criteria were: prospective study; histopathology as the reference standard; and sufficient data to construct $2 \times 2$ contingency tables. Methodological quality was assessed using Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2). Patient clinical characteristics, sensitivity and specificity were extracted. The summary receiver operating characteristic curve was used to examine the accuracy of CEUS. A meta-analysis was performed to evaluate the clinical utility in identification of benign and malignant lymph nodes. Sensitivity analysis was performed after omitting outliers identified in a bivariate boxplot, and publication bias was assessed with Egger's test. Results: The pooled sensitivity, specificity and AUROC were 0.92 (95% CI, 0.85-0.96), 0.91 (95% CI, 0.82-0.95) and 0.97 (95% CI, 0.95-0.98), respectively. After omitting 3 outlier studies, heterogeneity decreased. Sensitivity analysis demonstrated no disproportionate influences of individual studies. Publication bias was not significant. Conclusions: CEUS is a promising diagnostic modality in differentiating between benign and malignant lymph nodes and can potentially reduce unnecessary fine-needle aspiration biopsies of benign nodes.
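
The per-study quantities that feed such a meta-analysis come from each study's 2 × 2 contingency table. The sketch below shows only that single-study calculation, not the pooling model used in the paper; the function name and the counts are hypothetical.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the cells of a 2x2 contingency table."""
    sensitivity = tp / (tp + fn)  # malignant nodes correctly called malignant
    specificity = tn / (tn + fp)  # benign nodes correctly called benign
    return sensitivity, specificity


# Hypothetical single-study counts (not taken from any included study).
sens, spec = sensitivity_specificity(tp=46, fp=5, fn=4, tn=45)
print(round(sens, 2), round(spec, 2))  # 0.92 0.9
```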

Outlier Detection Method for Time Synchronization

  • Lee, Young Kyu;Yang, Sung-hoon;Lee, Ho Seong;Lee, Jong Koo;Lee, Joon Hyo;Hwang, Sang-wook
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.9 no.4
    • /
    • pp.397-403
    • /
    • 2020
  • To synchronize a remote system time to a reference time such as Coordinated Universal Time (UTC), the time difference between the two clocks must be compared. The time comparison data may contain outliers, and synchronization performance can be significantly degraded if they are not removed. An effective outlier detection algorithm is therefore required to keep the system time highly accurate. In this paper, an outlier detection method is presented for the time difference data of GNSS time transfer receivers. The time difference data between the system time and the GNSS usually have slopes because the remote system clock runs freely until it is synchronized to the reference clock time. To investigate the outlier detection performance of the proposed algorithm, simulations are performed using the time difference data of a GNSS time transfer receiver corrected to a free-running Cesium clock with intentionally inserted outliers. The simulations show that the proposed algorithm effectively detects the inserted outliers while conventional methods such as the modified Z-score and the adjusted boxplot cannot. Furthermore, the synchronization performance is observed to degrade by more than 15% with 20 outliers compared with the original data without outliers.
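
For reference, the modified Z-score baseline named in this abstract is commonly computed from the median and the median absolute deviation (MAD). The sketch below uses the widely quoted constant 0.6745 and cutoff 3.5; the function names and the sample series are assumptions, and the paper's proposed algorithm is not reproduced here.

```python
import statistics


def modified_z_scores(values):
    """Return 0.6745 * (x - median) / MAD for each observation."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return one boolean per observation: True where |modified Z| exceeds the threshold."""
    return [abs(z) > threshold for z in modified_z_scores(values)]


# Hypothetical time differences in ns with no slope and one inserted spike.
diffs = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 9.8, 10.1]
print(flag_outliers(diffs))
```

This sample has no trend, so the spike stands out clearly. The abstract reports that this baseline does not detect the inserted outliers in the sloped time difference data used in the paper's simulations.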