• Title/Abstract/Keywords: Boxplot

Search results: 14 items (processing time 0.024 s)

Skew Normal Boxplot and Outliers

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 19, No. 4
    • /
    • pp.591-595
    • /
    • 2012
  • Tukey's boxplot is frequently used to identify outliers in a batch of observations of a continuous variable. Doing so implicitly assumes that the underlying distribution belongs to the family of normal distributions. This practice is often superficial and improper, since in reality many variables exhibit skewness. In this short paper, we build a modified boxplot and define an outlier identification procedure by assuming that the observations are generated from the skew normal distribution (Azzalini, 1985), which is an extension of the normal distribution. The statistical performance of the proposed procedure is examined with simulated datasets.
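For orientation, the classical Tukey rule that this paper modifies can be sketched as follows. This is not the skew normal procedure from the paper; the function names, the multiplier `k=1.5`, the quartile convention, and the sample values are illustrative assumptions.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" is one of several quartile conventions; others shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def tukey_outliers(values, k=1.5):
    """Return the observations that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical right-skewed batch of observations.
sample = [1.2, 1.5, 1.7, 1.9, 2.1, 2.4, 2.8, 3.5, 4.9, 12.0]
flagged = tukey_outliers(sample)
print(flagged)
```

Because the two fences sit the same distance (k times the IQR) from their quartiles, the rule tends to over-flag the long tail of a skewed batch, which is the situation the abstract describes.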

A Classification Techniques For Quality Improvement

  • Jichao, Xu;Yumin, Liu;Li, Zhang
    • International Journal of Quality Innovation
    • /
    • Vol. 2, No. 2
    • /
    • pp.24-33
    • /
    • 2001
  • The quality of a process is technically characterized by its variation: a product or process of the best quality must have as little variation as possible. Variation is usually reduced in various ways, for example by adjusting parameter settings under robust design, which requires many rounds of expensive experiments. Practitioners therefore look for cheap and simple methods to achieve robustness. This paper studies a practical and simple technique for quality improvement, namely reducing variation through data classification. First, all system factors that may govern the variation are included. Then past observations, their classification, and boxplot charts are used to find the underlying rule relating the variation to the system factors. Next, the location of each system factor is adjusted according to that rule so that the variation is reduced to some extent. Finally, two typical quality improvement cases based on data classification are presented.

Outlier Detection and Treatment for the Conversion of Chemical Oxygen Demand to Total Organic Carbon

  • 조범준;조홍연;김성
    • 한국해안·해양공학회논문집
    • /
    • Vol. 26, No. 4
    • /
    • pp.207-216
    • /
    • 2014
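  • Total organic carbon (TOC) is an important parameter used as a direct biological indicator in research on the ocean carbon cycle. Because available TOC data are scarce relative to chemical oxygen demand (COD) data, TOC can be estimated from COD data. When COD is converted to TOC, outliers in the COD observations directly affect the TOC estimate, so they must be detected and treated in a rational and objective way. This study presents an optimal regression model for salinity, COD, and TOC data observed in Korean coastal waters. The optimal regression model was selected by diagnosing outliers and influential observations with several detection methods and then comparing the change in the number of data points, the coefficient of variation, and the RMS error before and after removal. The combination of Cook's diagnostic method and the SIQR boxplot method was found to be the most appropriate. The optimal regression function is TOC(mg/L) = $0.44{\cdot}COD(mg/L)+1.53$, with a coefficient of determination of about 0.47 and an RMS error of 0.85 mg/L. The RMS error and the coefficient of variation of the leverage values fell markedly, to 31% and 80% respectively, relative to their values before outlier removal. Because the proposed method diagnoses and removes the excessive influence of outliers and influential observations in the COD and TOC observations, a more appropriate regression equation could be presented.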
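The two diagnostics named in the abstract can be sketched as below. The abstract does not say how the paper combines them, so they are computed separately here. The SIQR fence multiplier `k=3.0` (a common convention for semi-interquartile-range boxplots), the function names, and the data are assumptions; only the regression coefficients 0.44 and 1.53 come from the abstract.

```python
import statistics

def siqr_fences(values, k=3.0):
    """Asymmetric fences: Q1 - k*(Q2 - Q1) and Q3 + k*(Q3 - Q2)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1 - k * (q2 - q1), q3 + k * (q3 - q2)

def cooks_distances(x, y):
    """Cook's distance of each point in a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx  # leverage of this point
        distances.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return distances

def toc_from_cod(cod_mg_per_l):
    """Regression reported in the abstract: TOC = 0.44*COD + 1.53 (mg/L)."""
    return 0.44 * cod_mg_per_l + 1.53

# Hypothetical observations (mg/L); the last pair is deliberately anomalous.
cod = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 9.0]
toc = [2.0, 2.2, 2.4, 2.6, 2.9, 3.1, 3.3, 2.0]

lower, upper = siqr_fences(cod)
cod_outliers = [c for c in cod if c < lower or c > upper]
distances = cooks_distances(cod, toc)
most_influential = distances.index(max(distances))
print(cod_outliers, most_influential, toc_from_cod(2.0))
```

The SIQR fences measure each side of the box from the median, so they suit skewed concentration data better than a single IQR, while Cook's distance picks out points that would move the fitted line if removed.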

Malicious Users Detection and Nullifying their Effects on Cooperative Spectrum Sensing

  • Prasain, Prakash;Choi, Dong-You
    • 한국IT서비스학회지
    • /
    • Vol. 15, No. 1
    • /
    • pp.167-178
    • /
    • 2016
  • Spectrum sensing plays a major role in cognitive radio (CR) because it enables idle spectrum to be used opportunistically and makes dynamic spectrum access efficient. In this research area, collaboration among multiple cognitive radio users has been proposed to improve detection reliability. Although cooperation improves spectrum sensing performance, malicious users that report false results may severely degrade it. In this article, we study how to detect such malicious users and nullify their harmful effects in cooperative spectrum sensing by applying well-known outlier detection methods based on Grubbs' test, the boxplot method, and Dixon's test. First, the performance of each technique is compared, and the boxplot method is found to outperform both Grubbs' and Dixon's tests when multiple malicious users are present. Second, a new algorithm based on reputation and weight is developed to identify malicious users and cancel out their negative impact on the final decision. Simulation results demonstrate that the proposed scheme effectively identifies the malicious users and suppresses their harmful effects at the fusion center when deciding whether the spectrum is idle.

An Optimization Method for the Calculation of SCADA Main Grid's Theoretical Line Loss Based on DBSCAN

  • Cao, Hongyi;Ren, Qiaomu;Zou, Xiuguo;Zhang, Shuaitang;Qian, Yan
    • Journal of Information Processing Systems
    • /
    • Vol. 15, No. 5
    • /
    • pp.1156-1170
    • /
    • 2019
  • In recent years, data drift in the smart grid caused by manual operation has been widely studied. Effectively and reliably finding the reasonable data needed in the Supervisory Control and Data Acquisition (SCADA) system has become an important research topic. This paper analyzes the data composition of the smart grid and explains the power model in two smart grid applications, followed by an analysis of how each parameter is applied in the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The processing effects of the boxplot method, the probability weight analysis method, and the DBSCAN clustering algorithm on big-data-driven power grids are then compared. According to the comparison, the DBSCAN algorithm outperforms the other methods in processing effect. Experimental verification shows that the DBSCAN clustering algorithm can effectively screen the power grid data, thereby significantly improving the accuracy and reliability of the calculated theoretical line loss of the main grid.
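To illustrate how DBSCAN screens data by labelling sparse points as noise, here is a minimal one-dimensional sketch. The abstract does not give the paper's features or parameter values, so the function name, `eps`, `min_pts`, and the readings are hypothetical.

```python
def dbscan_1d(values, eps, min_pts):
    """Label each value with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(values)
    labels = [None] * n  # None means "not yet visited"

    def neighbors(i):
        # Indices within eps of point i; the point itself is included.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical meter readings with one drifted value.
readings = [10.1, 10.2, 10.0, 10.3, 9.9, 10.2, 25.0, 10.1]
labels = dbscan_1d(readings, eps=0.5, min_pts=3)
screened = [r for r, lab in zip(readings, labels) if lab != -1]
print(labels, screened)
```

Unlike a boxplot fence, this approach makes no assumption about where the bulk of the data lies; a point is rejected only because too few other points sit within `eps` of it.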

Outlier Detection and Replacement for Vertical Wind Speed in the Measurement of Actual Evapotranspiration

  • 박천건;임창수;임광섭;채효석
    • 대한토목학회논문집
    • /
    • Vol. 34, No. 5
    • /
    • pp.1455-1461
    • /
    • 2014
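  • Using flux data observed at the Deokgokje reservoir on Mt. Deogyusan in May, June, and July 2011, this study carried out a statistical analysis of the detection and replacement of outliers in vertical wind speed that can arise when evapotranspiration is measured with the eddy covariance method. To identify outliers in vertical wind speed, the interquartile range (IQR) criterion from quartile-based boxplot analysis was applied. Evapotranspiration was then measured using vertical wind speed data corrected by either deleting the outliers or replacing them with the mean value, and the results were compared with the evapotranspiration obtained before correction. The comparison showed a difference between evapotranspiration before and after outlier replacement, with a particularly large difference during rainfall. Therefore, to compensate for outliers that occur during evapotranspiration measurement, it is necessary to delete or replace them before measuring evapotranspiration.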
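The detect-then-replace step described above might look like the following sketch. The abstract does not state which mean is used for replacement, so taking the mean of the non-outlying values is an assumption, as are the function name, the multiplier `k=1.5`, and the sample wind speeds.

```python
import statistics

def replace_outliers_with_mean(values, k=1.5):
    """Replace values outside the IQR fences with the mean of the remaining values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if lower <= v <= upper]
    fill = statistics.fmean(inliers)  # assumed replacement value
    return [v if lower <= v <= upper else fill for v in values]

# Hypothetical vertical wind speeds (m/s) with one spike.
w = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 0.2, -0.3]
cleaned = replace_outliers_with_mean(w)
print(cleaned)
```

Replacing rather than deleting keeps the series at its original length, which matters when the wind speed must stay aligned with other simultaneously sampled variables.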

Evaluation of Percolated Water Quality of Paddy Fields Using Nonparametric Test

  • 오승영;김진수;오광영
    • 한국농공학회논문집
    • /
    • Vol. 47, No. 2
    • /
    • pp.99-110
    • /
    • 2005
  • The concentration characteristics of total nitrogen (TN), total phosphorus (TP), and chemical oxygen demand (COD) in percolated water at four paddy field sites (Soro, Odong, Munui, and Boeun) were investigated using nonparametric tests. Percolation rates were measured and percolated water was sampled during irrigation periods at $5{\sim}10$ day intervals. The normality of the percolation rate and pollutant concentrations was examined using histograms, boxplots, and the Kolmogorov-Smirnov (K-S) test. Pollutant concentrations in percolated water showed positively skewed distributions. The median pollutant concentrations were 1.91 mg/L for TN, 0.021 mg/L for TP, and 6.6 mg/L for COD, which were lower than the corresponding arithmetic means by $35\%$ for TN, $36\%$ for TP, and $13\%$ for COD. The median concentrations of TN and TP differed significantly among sampling sites according to the Kruskal-Wallis test. However, median concentrations did not differ significantly among months, except for TN and TP at Soro and COD at Odong. The percolation loads of pollutants during irrigation periods in the study area were estimated at $3.12{\sim}7.75\;kg/ha$ for TN, $0.033{\sim}0.155\;kg/ha$ for TP, and 10.7 kg/ha for COD, which were much lower than the respective values reported in Japan.

A new method for automatic areal feature matching based on shape similarity using CRITIC method

  • 김지영;허용;김대성;유기윤
    • 한국측량학회지
    • /
    • Vol. 29, No. 2
    • /
    • pp.113-121
    • /
    • 2011
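  • This study proposes an automatic areal feature matching method based on similarity derived from geometric information. To this end, candidate matching pairs that intersect across different spatial datasets are extracted, weights for each matching criterion are generated automatically using the CRITIC method, and the shape similarity between the extracted candidate pairs is measured as a linear combination. A candidate pair is identified as a match when its similarity is at or above a threshold derived by applying adjusted-boxplot outlier detection to training data. When the proposed method was applied to part of the area covered by heterogeneous spatial datasets (Digital Map 2.0 and the Road Name Address base map), building objects that were visually similar in shape and had a large overlapping area were matched, and the F-measure was high at 0.932.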
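The weighting step can be sketched with the standard CRITIC formulation: each criterion is min-max normalized, and its weight is proportional to its standard deviation times its total conflict (one minus correlation) with the other criteria. The three criteria, their scores, and the function names below are hypothetical, and using the population standard deviation is an assumption; the abstract does not list the paper's criteria.

```python
import statistics

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def critic_weights(columns):
    """CRITIC weights for a list of criterion columns (one list per criterion)."""
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        normalized.append([(v - lo) / (hi - lo) for v in col])
    information = []
    for cj in normalized:
        # Conflict with every criterion; the self term contributes 1 - 1 = 0.
        conflict = sum(1.0 - pearson(cj, ck) for ck in normalized)
        information.append(statistics.pstdev(cj) * conflict)
    total = sum(information)
    return [c / total for c in information]

def shape_similarity(scores, weights):
    """Linear combination of per-criterion similarity scores."""
    return sum(w * s for w, s in zip(weights, scores))

# Hypothetical similarity scores of five candidate pairs on three criteria.
criteria = [
    [0.9, 0.4, 0.7, 0.2, 0.8],
    [0.8, 0.5, 0.6, 0.3, 0.9],
    [0.3, 0.9, 0.5, 0.7, 0.4],
]
weights = critic_weights(criteria)
first_pair = shape_similarity([col[0] for col in criteria], weights)
print(weights, first_pair)
```

Because the weights come from the data's own spread and correlation, no manual weighting is needed, which matches the abstract's emphasis on automatic weight generation.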

Value of Contrast-Enhanced Ultrasonography in the Differential Diagnosis of Enlarged Lymph Nodes: a Meta-Analysis of Diagnostic Accuracy Studies

  • Jin, Ya;He, Yu-Shuang;Zhang, Ming-Ming;Parajuly, Shyam Sundar;Chen, Shuang;Zhao, Hai-Na;Peng, Yu-Lan
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 16, No. 6
    • /
    • pp.2361-2368
    • /
    • 2015
  • Objective: To evaluate the diagnostic accuracy of contrast-enhanced ultrasonography (CEUS) in differentiating between benign and malignant enlarged lymph nodes using meta-analysis. Materials and Methods: The PubMed, Embase, SCI, and Cochrane databases were searched for studies (up to September 1, 2014) reporting the diagnostic performance of CEUS in discriminating between benign and malignant lymph nodes. Inclusion criteria were: prospective study; histopathology as the reference standard; and sufficient data to construct $2{\times}2$ contingency tables. Methodological quality was assessed using Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2). Patient clinical characteristics, sensitivity, and specificity were extracted. The summary receiver operating characteristic curve was used to examine the accuracy of CEUS. A meta-analysis was performed to evaluate the clinical utility in identification of benign and malignant lymph nodes. Sensitivity analysis was performed after omitting outliers identified in a bivariate boxplot, and publication bias was assessed with Egger's test. Results: The pooled sensitivity, specificity, and AUROC were 0.92 (95% CI, 0.85-0.96), 0.91 (95% CI, 0.82-0.95), and 0.97 (95% CI, 0.95-0.98), respectively. After omitting 3 outlier studies, heterogeneity decreased. Sensitivity analysis demonstrated no disproportionate influence of individual studies. Publication bias was not significant. Conclusions: CEUS is a promising diagnostic modality in differentiating between benign and malignant lymph nodes and can potentially reduce unnecessary fine-needle aspiration biopsies of benign nodes.

Outlier Detection Method for Time Synchronization

  • Lee, Young Kyu;Yang, Sung-hoon;Lee, Ho Seong;Lee, Jong Koo;Lee, Joon Hyo;Hwang, Sang-wook
    • Journal of Positioning, Navigation, and Timing
    • /
    • Vol. 9, No. 4
    • /
    • pp.397-403
    • /
    • 2020
  • To synchronize a remote system time to a reference time such as Coordinated Universal Time (UTC), the time difference between the two clocks must be compared. The time comparison data may contain outliers, and time synchronization performance can degrade significantly if they are not removed. An effective outlier detection algorithm is therefore required to keep the system time highly accurate. In this paper, an outlier detection method is presented for the time difference data of GNSS time transfer receivers. The time difference data between the system time and the GNSS usually have a slope because the remote system clock runs freely until it is synchronized to the reference clock. To investigate the outlier detection performance of the proposed algorithm, simulations are performed using the time difference data of a GNSS time transfer receiver corrected to a free-running cesium clock, with intentionally inserted outliers. The simulations show that the proposed algorithm effectively detects the inserted outliers, whereas conventional methods such as the modified Z-score and the adjusted boxplot cannot. Furthermore, synchronization performance is observed to degrade by more than 15% with 20 outliers compared with the original data without outliers.
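One of the conventional baselines named above, the modified Z-score, can be sketched as follows. This is not the paper's proposed algorithm. The constant 0.6745 and the cutoff 3.5 follow the commonly cited convention for this score, while the function names and the sample data (which have no slope) are assumptions.

```python
import statistics

def modified_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD for each value."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]

def flag_outliers(values, threshold=3.5):
    """Return indices whose absolute modified Z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]

# Hypothetical time differences (ns) with one inserted outlier and no drift.
time_diff = [10.0, 10.2, 9.9, 10.1, 10.0, 14.0, 9.8, 10.1]
flagged = flag_outliers(time_diff)
print(flagged)
```

The score compares every point with a single global median, so on data with a pronounced slope the drift itself inflates the spread, which is consistent with the abstract's observation that such methods fail on free-running clock data.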