• Title/Abstract/Keywords: Outlier Analysis

Efficient Speaker Identification Based on Robust VQ-PCA

  • 이기용
    • 인터넷정보학회논문지 / Vol. 5, No. 3 / pp. 57-62 / 2004
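  • This paper proposes a robust vector quantization principal component analysis (VQ-PCA) for efficient speaker identification. The method addresses two problems in training feature vectors for speaker identification: high dimensionality and outliers. First, it partitions the data space into several disjoint regions by robust vector quantization (VQ) based on M-estimation. Within each region, a robust principal component analysis (PCA) is then obtained from the covariance matrix. Finally, each speaker's Gaussian mixture model (GMM) is estimated from the feature vectors that the regional robust PCA has transformed into a reduced dimension. Compared with the conventional GMM with diagonal covariance matrices at the same performance level, the proposed method ran faster, required less data storage, and was more robust when outliers were present.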
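
The abstract relies on M-estimation to make the vector quantizer robust to outliers. The sketch below shows that idea in its simplest form, a Huber M-estimate of a one-dimensional location computed by iteratively reweighted least squares; it is not the authors' VQ-PCA. The Huber loss, the tuning constant `c = 1.345`, and the MAD-based scale are assumptions chosen for illustration.

```python
import statistics

def huber_location(xs, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares (IRLS)."""
    med = statistics.median(xs)
    # Robust scale: median absolute deviation, rescaled to match a Gaussian sigma
    mad = statistics.median(abs(x - med) for x in xs) * 1.4826
    if mad == 0:
        return med
    mu = med
    for _ in range(max_iter):
        # Huber weights: 1 inside the threshold, c/|r| outside, so large residuals are down-weighted
        weights = []
        for x in xs:
            r = abs(x - mu) / mad
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

With `[1.0, 1.2, 0.9, 1.1, 1.0, 50.0]` the ordinary mean is 9.2, while `huber_location` stays near 1.1 because the value 50.0 ends up with a weight of roughly 0.006.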

Estimation of Design Floods Using 3 and 4 Parameter Kappa Distributions

  • 맹승진;김병준;김형산
    • 한국농공학회논문집 / Vol. 51, No. 4 / pp. 49-55 / 2009
  • This paper derives design floods through L-moments with 3- and 4-parameter Kappa distributions, including tests of independence by Wald-Wolfowitz, homogeneity by Mann-Whitney, and outliers by Grubbs-Beck, on annual maximum flood flows at 9 water level gaging stations on the Han, Nakdong, and Geum Rivers of South Korea. After the suitability of the annual maximum flood flow data was checked with the Kolmogorov-Smirnov test, the 3- and 4-parameter Kappa distributions were applied and their appropriateness was judged. The parameters of both distributions were estimated by the L-moment method, and design floods were calculated for each water level gaging station. A comparative analysis using the relative root mean square error (RRMSE) and relative absolute error (RAE) of the 3- and 4-parameter Kappa distributions with 4 plotting position formulas showed that the design floods from the 4-parameter Kappa distribution with the Weibull and Cunnane plotting position formulas are closer to the observed data than those from the 3-parameter Kappa distribution with any of the 4 plotting position formulas, or from the 4-parameter Kappa distribution with the Hazen and Gringorten formulas.
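
The Grubbs-Beck screening mentioned above is usually applied to the base-10 logarithms of the annual maxima: a flow is flagged when its log lies outside mean ± K_N·s. The sketch below uses the approximation K_N ≈ -0.9043 + 3.345·sqrt(log10 N) - 0.4046·log10 N for a one-sided test at the 10% significance level, as commonly quoted for U.S. Bulletin 17B. The abstract does not state the significance level or table the authors used, and the function names are hypothetical.

```python
import math
import statistics

def grubbs_beck_kn(n):
    """Approximate 10%-significance Grubbs-Beck K_N for a sample of size n."""
    lg = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(lg) - 0.4046 * lg

def grubbs_beck_thresholds(flows):
    """Low and high outlier thresholds, computed in log10 space and returned in flow units."""
    logs = [math.log10(q) for q in flows]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation (n - 1 denominator)
    k = grubbs_beck_kn(len(flows))
    return 10 ** (mean - k * sd), 10 ** (mean + k * sd)

def grubbs_beck_outliers(flows):
    low, high = grubbs_beck_thresholds(flows)
    return [q for q in flows if q < low or q > high]
```

For N = 10 the approximation gives K_N ≈ 2.036. Applied to a hypothetical record of ten annual maxima near 1000 m³/s plus one year of 20 m³/s, only the 20 m³/s value falls below the low threshold.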

데이터 오·결측 저감 정제 알고리즘 (Data Cleansing Algorithm for reducing Outlier)

  • 이종원;김호성;황철현;강인식;정회경
    • 한국정보통신학회:학술대회논문집 / 2018 Fall Conference of 한국정보통신학회 / pp. 342-344 / 2018
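  • This paper investigates, through a range of analyses, methods for cleansing the various anomalous data measured in water treatment processes, examining whether existing statistical techniques for erroneous and missing data, such as mean imputation, correlation-coefficient analysis, graph-based correlation analysis, and review by statistical experts, can serve as substitutes. To establish the reliability and validity of an algorithm that reduces erroneous and missing values in water-information data, it also models a system driven by quantile patterns and a deep-learning LSTM algorithm, and studies a framework that can be implemented with open-source libraries such as Keras, Theano, and TensorFlow.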
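
Of the techniques listed in the abstract, mean imputation and quantile-based screening are simple enough to sketch. The function below is a hypothetical illustration, not the authors' algorithm: it treats `None` as a missing reading, flags values outside Tukey's fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR), and replaces both kinds of bad value with the mean of the remaining valid readings. The 1.5 multiplier and the default quartile method of Python's `statistics.quantiles` are assumptions.

```python
import statistics

def clean_series(values, k=1.5):
    """Impute missing (None) and out-of-fence readings with the mean of the valid readings."""
    observed = [v for v in values if v is not None]
    # Quartiles of the observed readings (statistics.quantiles, default 'exclusive' method)
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    valid = [v for v in observed if low <= v <= high]
    fill = statistics.mean(valid)  # mean imputation from the readings that passed the fence check
    cleaned = [v if v is not None and low <= v <= high else fill for v in values]
    return cleaned, (low, high)
```

For hypothetical pH-like readings `[7.0, 7.1, None, 6.9, 7.2, 99.0, 7.0, 7.1]`, both the missing value and 99.0 are replaced by 7.05, the mean of the six remaining readings.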

Bayesian forecasting approach for structure response prediction and load effect separation of a revolving auditorium

  • Ma, Zhi;Yun, Chung-Bang;Shen, Yan-Bin;Yu, Feng;Wan, Hua-Ping;Luo, Yao-Zhi
    • Smart Structures and Systems / Vol. 24, No. 4 / pp. 507-524 / 2019
  • A Bayesian dynamic linear model (BDLM) is presented for data-driven response prediction and load effect separation of a revolving auditorium structure whose main loads are self-weight and dead loads, temperature load, and audience load. The analyses are based on long-term monitoring data of static strains on several key members of the structure. Three improvements are introduced to the ordinary regression BDLM: a classificatory regression term to address the temporary audience load effect, improved inference that continuously updates the variance of the observation noise, and component discount factors for effective load effect separation. The effects of these improvements are evaluated in terms of the root mean square errors, standard deviations, and 95% confidence intervals of the predictions. Bayes factors are used to evaluate the probability distributions of the predictions, which are essential to structural condition assessments such as outlier identification and reliability analysis. The performance of the present BDLM was successfully verified with simulated data and with real data from the structural health monitoring system installed on the revolving structure.
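
A BDLM flags outliers by comparing each new observation with its one-step-ahead forecast distribution. The sketch below reduces this to a local-level model (a random walk observed with noise) whose evolution variance is set by a discount factor `delta`, and it flags points whose standardized forecast error exceeds 3. The model form, the known observation variance `v`, and the threshold are assumptions; the paper's regression terms, variance learning, and Bayes-factor monitoring are not reproduced.

```python
import math

def local_level_filter(ys, m0, c0, v, delta=0.95):
    """One-step forecasts of a local-level DLM with discount-factor evolution variance."""
    m, c = m0, c0
    out = []
    for y in ys:
        r = c / delta      # prior variance: discounting inflates uncertainty instead of a fixed W
        f, q = m, r + v    # one-step forecast mean and variance
        e = y - f          # forecast error
        a = r / q          # adaptive (Kalman) gain
        m = m + a * e      # posterior mean of the level
        c = r - a * a * q  # posterior variance, equal to r * v / q
        out.append((f, q, e / math.sqrt(q)))
    return out

def flag_outliers(ys, m0, c0, v, delta=0.95, z=3.0):
    """Indices whose standardized one-step forecast error exceeds z in absolute value."""
    results = local_level_filter(ys, m0, c0, v, delta)
    return [i for i, (_, _, u) in enumerate(results) if abs(u) > z]
```

For a flat series of 10.0 with a single spike of 30.0 at index 10, `flag_outliers(ys, 10.0, 1.0, 1.0)` returns `[10]`. The spike still enters the update and pulls the level estimate upward for the next few steps; a full monitoring scheme would intervene instead of absorbing it.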

Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

  • P. Antony Seba;J. V. Bibal Benifa
    • ETRI Journal / Vol. 45, No. 3 / pp. 448-461 / 2023
  • This article performs a detailed data scrutiny on a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis to reduce the number of rows. Column reduction is achieved by ranking the attributes with feature selection methodologies, namely the extra-trees classifier, recursive feature elimination, the chi-squared test, analysis of variance, and mutual information. These methodologies are ranked via the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) using weight optimization to identify the optimal features for model building from the CKD dataset, facilitating better prediction when diagnosing the severity of the disease. An efficient hybrid ensemble and novel similarity-based classifiers are built using the pruned dataset, and the results are compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields a prediction accuracy of 98.31% for the features selected by the extra-trees classifier (ETC), which TOPSIS ranks as the best.
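
TOPSIS, which the authors use to rank the feature selection methods, scores each alternative by its relative closeness to an ideal solution. The sketch below follows the textbook steps (vector normalization, weighting, ideal and anti-ideal points, Euclidean distances); the criteria, weights, and example numbers are hypothetical, and the paper's weight-optimization step is not reproduced.

```python
import math

def topsis(matrix, weights, benefit):
    """Closeness scores in [0, 1] for alternatives (rows) over criteria (columns); higher is better."""
    n_crit = len(weights)
    # 1. Vector-normalize each criterion column, then apply the criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 2. Ideal (best) and anti-ideal (worst) value per criterion
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(zip(*v))]
    # 3. Closeness: distance to worst / (distance to best + distance to worst)
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores
```

Here `benefit[j]` is `True` where larger is better (for example, accuracy) and `False` where smaller is better (for example, runtime). For the hypothetical matrix `[[0.98, 2.0], [0.95, 1.0], [0.90, 5.0]]` with weights `[0.7, 0.3]` and `benefit = [True, False]`, the second alternative scores about 0.95, the first about 0.75, and the third 0 because it coincides with the anti-ideal point.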

Statistical Analysis for Chemical Characterization of Fall-Out Particles

  • 김현섭;허정숙;김동술
    • 한국대기환경학회지 / Vol. 14, No. 6 / pp. 631-642 / 1998
  • Fall-out particles were collected with modified British deposit gauges at 35 sampling sites in the Suwon area from January to November 1996. Twenty chemical species (Al, Ba, Cd, Cr, K, Pb, Sb, Zn, Cu, Fe, Ni, V, F-, Cl-, NO3-, SO42-, Na+, NH4+, Mg2+, and Ca2+) were analyzed by AAS and IC. The purpose of this study was to qualitatively estimate the various emission sources of the fall-out particles by applying multivariate statistical techniques such as factor analysis, multiple regression analysis, and discriminant analysis. During the study, outlier sites were identified by a z-score method. Cl-, Na+, Mg2+, and SO42- were highly correlated owing to their common marine source. Wind speed was the most influential factor for the deposition fluxes of both the particles themselves and all the chemical species. Factor analysis qualitatively identified 8 source patterns: marine, soil, oil burning, Cr-related, tire, Cd-related, agricultural, and F- related sources. The multiple regression analysis suggested that some chemical compounds may exist in the fall-out particles in the form of CaSO4, NaNO3, NaCl, MgCl2, (NH4)2SO4, NaF, and CaCl2. Finally, a spatial and seasonal classification by discriminant analysis showed that SO42-, Ca2+, Cl-, and Fe were dominant in the spatial pattern, whereas SO42-, Cl-, Al, and V were dominant in the seasonal pattern.
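
The z-score screening of sampling sites can be sketched as below. A site is flagged when its value lies more than `z_max` sample standard deviations from the mean over all sites; the cutoff of 3 is an assumption, since the abstract does not state the one used. Because the outlier itself inflates the standard deviation, the largest attainable |z| with n sites is (n - 1)/sqrt(n), about 5.75 for 35 sites, so a strict cutoff can mask outliers in small networks.

```python
import statistics

def zscore_outliers(values, z_max=3.0):
    """Indices of values more than z_max sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > z_max]
```

For 19 hypothetical sites with a deposition flux of 10.0 and one site at 100.0, the last site has z ≈ 4.25 and is the only one flagged.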

Comparisons of Different Step-drawdown Test Analysis Methods; Implication for Improved Analysis for Step-drawdown Test Data

  • 안효원;하규철;이은희;도병희
    • 한국지하수토양환경학회지:지하수토양환경 / Vol. 25, No. 4 / pp. 35-47 / 2020
  • The step-drawdown test is one of the most widely used aquifer test methods for evaluating aquifer and well losses. Various approaches have been suggested for estimating well losses from step-drawdown test data, but uncertainties in data interpretation and analysis remain. In this study, we applied three step-drawdown test analysis methods, Jacob (1947), Labadie and Helweg (1975), and Gupta (1989), to step-drawdown test data from Seobu-myeon, Hongseong-gun, South Korea, and estimated aquifer and well losses. The comparison revealed that the estimated well losses differed depending on the method applied, and these variations are likely related to the limitations of each method's assumptions. Based on a detailed analysis of the time-drawdown data, we repeated the step-drawdown test analysis after removing outlier data from the initial stage of the test. Using the revised time-drawdown data substantially decreased both the analysis error and the variation in well losses estimated by the different methods.
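
Jacob's (1947) method, the first of the three compared, models drawdown in the pumping well as s = BQ + CQ², where BQ is the aquifer loss and CQ² the well loss. Dividing by Q gives the straight line s/Q = B + CQ, so B and C follow from an ordinary least-squares fit across the steps. The sketch below assumes one representative drawdown per step; the function names and units are hypothetical, and the outlier removal described in the abstract would be applied to the time-drawdown data before this fit.

```python
def jacob_step_drawdown(q, s):
    """Fit Jacob's s = B*Q + C*Q**2 by least squares on the linearized form s/Q = B + C*Q."""
    y = [si / qi for si, qi in zip(s, q)]  # specific drawdown s/Q for each step
    n = len(q)
    mx, my = sum(q) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in q)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(q, y))
    c = sxy / sxx    # slope: well-loss coefficient C
    b = my - c * mx  # intercept: aquifer-loss coefficient B
    return b, c

def well_loss_fraction(b, c, q):
    """Share of total drawdown attributed to well loss at pumping rate q."""
    return c * q * q / (b * q + c * q * q)
```

For synthetic steps Q = 100, 200, 300, 400 with s = 150, 500, 1050, 1800 (generated from B = 0.5 and C = 0.01), the fit recovers B = 0.5 and C = 0.01, and at Q = 300 the well loss is about 86% of the total drawdown.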

Study on Health Behavior of Private Security Guards Applying Planned Behavioral Theory

  • 김혜선;곽한병
    • 시큐리티연구 / No. 43 / pp. 99-120 / 2015
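  • The main purpose of this study is to analyze the health behavior of private security guards by applying the theory of planned behavior. To this end, private security guards living in Seoul and Gyeonggi Province were selected by purposive sampling. After insincere responses and outliers were excluded, data from 187 respondents were analyzed. Exploratory factor analysis (EFA) and polyserial correlation analysis were performed, and multiple regression and logistic regression analyses were used to estimate the causal relationships among the variables. The results are summarized as follows. First, attachment, attitude toward the behavior, subjective norms, and perceived behavioral control had positive (+) effects on the intention to continue health behavior. Second, attachment had no significant effect on attitude toward the behavior. Third, attachment had a positive (+) effect on the intention to continue health behavior. Fourth, perceived behavioral control had a positive (+) effect on whether health behavior was practiced: each one-unit increase in perceived behavioral control raised the odds of practicing health behavior by about 62.9%. Fifth, the intention to continue health behavior had a positive (+) effect on whether health behavior was practiced: each one-unit increase in this intention raised the odds of practicing health behavior by about 72.3%.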
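
In logistic regression, a coefficient β multiplies the odds by exp(β) for each one-unit increase in the predictor, so an effect of "about 62.9%" most plausibly corresponds to an odds ratio of 1.629 (β = ln 1.629 ≈ 0.488), and 72.3% to an odds ratio of 1.723 (β ≈ 0.544). This reading is an assumption, since the abstract does not report the coefficients, and the helper names below are hypothetical.

```python
import math

def odds_change_pct(beta, delta=1.0):
    """Percent change in the odds for a delta-unit increase in a predictor with logistic coefficient beta."""
    return (math.exp(beta * delta) - 1.0) * 100.0

def beta_from_pct(pct):
    """Logistic coefficient implied by a reported percent change in the odds per unit."""
    return math.log(1.0 + pct / 100.0)
```

Because the effect is multiplicative on the odds, a two-unit increase in perceived behavioral control would multiply the odds by 1.629² ≈ 2.654, roughly a 165% increase rather than 2 × 62.9%. The corresponding change in probability also depends on the baseline probability.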

Value of Contrast-Enhanced Ultrasonography in the Differential Diagnosis of Enlarged Lymph Nodes: a Meta-Analysis of Diagnostic Accuracy Studies

  • Jin, Ya;He, Yu-Shuang;Zhang, Ming-Ming;Parajuly, Shyam Sundar;Chen, Shuang;Zhao, Hai-Na;Peng, Yu-Lan
    • Asian Pacific Journal of Cancer Prevention / Vol. 16, No. 6 / pp. 2361-2368 / 2015
  • Objective: To evaluate the diagnostic accuracy of contrast-enhanced ultrasonography (CEUS) in differentiating between benign and malignant enlarged lymph nodes using meta-analysis. Materials and Methods: PubMed, Embase, SCI, and Cochrane databases were searched for studies (up to September 1, 2014) reporting the diagnostic performance of CEUS in discriminating between benign and malignant lymph nodes. Inclusion criteria were: prospective study; histopathology as the reference standard; and sufficient data to construct 2×2 contingency tables. Methodological quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2). Patient clinical characteristics, sensitivity, and specificity were extracted. The summary receiver operating characteristic curve was used to examine the accuracy of CEUS. A meta-analysis was performed to evaluate its clinical utility in identifying benign and malignant lymph nodes. Sensitivity analysis was performed after omitting outliers identified in a bivariate boxplot, and publication bias was assessed with Egger's test. Results: The pooled sensitivity, specificity, and AUROC were 0.92 (95% CI, 0.85-0.96), 0.91 (95% CI, 0.82-0.95), and 0.97 (95% CI, 0.95-0.98), respectively. After omitting 3 outlier studies, heterogeneity decreased. Sensitivity analysis demonstrated no disproportionate influence of individual studies. Publication bias was not significant. Conclusions: CEUS is a promising diagnostic modality for differentiating between benign and malignant lymph nodes and can potentially reduce unnecessary fine-needle aspiration biopsies of benign nodes.
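
Egger's test regresses each study's standardized effect (effect / standard error) on its precision (1 / standard error); an intercept far from zero suggests funnel-plot asymmetry, one sign of publication bias. The sketch below computes the intercept, the slope, and the intercept's standard error by ordinary least squares, so that intercept / se_intercept can be compared with a t distribution on n - 2 degrees of freedom. Which effect measure the authors tested (for example, the log diagnostic odds ratio) is not stated in the abstract, so the inputs here are hypothetical.

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression: standardized effect = intercept + slope * precision, fitted by OLS."""
    x = [1.0 / se for se in std_errors]                 # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom, then the standard error of the intercept
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mx * mx / sxx))
    return intercept, slope, se_intercept
```

In this regression the slope estimates the underlying effect, while the intercept captures how much small, imprecise studies deviate from it.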

A Comparative Study on Spatial Lattice Data Analysis - A Case Where Outlier Exists -

  • 김수정;최승배;강창완;조장식
    • Communications for Statistical Applications and Methods / Vol. 17, No. 2 / pp. 193-204 / 2010
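  • Researchers in many fields that require spatial analysis have recently taken a strong interest in spatial statistics. Statistics itself has seen extensive research, together with the argument that data collected in space should be analyzed spatially when spatial autocorrelation is present. Among the data types handled in spatial statistics, spatial lattice data analysis proceeds through three steps: (1) defining spatial neighbors, (2) defining spatial neighbor weights, and (3) applying a spatial model. This study shows that, when spatial lattice data containing outliers are analyzed using the trimmed mean squared error, spatial statistical methods outperform conventional (non-spatial) statistical methods in terms of prediction. To demonstrate the validity of this result, the spatial and conventional methods were compared through simulation. The usefulness of the spatial statistical method using the trimmed mean squared error was then examined through an application to actual crime data from Busanjin-gu.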
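
The trimmed mean squared error used in this study limits the influence of a few extreme prediction errors. The abstract does not give the trimming rule, so the sketch below assumes symmetric trimming: the squared errors are sorted and a fraction `alpha` is dropped from each end before averaging. The function names and `alpha = 0.1` are hypothetical.

```python
def mse(actual, predicted):
    """Ordinary mean squared error."""
    sq = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(sq) / len(sq)

def trimmed_mse(actual, predicted, alpha=0.1):
    """Mean of squared errors after dropping a fraction alpha from each end of the sorted list."""
    sq = sorted((a - p) ** 2 for a, p in zip(actual, predicted))
    k = int(alpha * len(sq))  # number of squared errors trimmed from each end
    kept = sq[k:len(sq) - k]
    return sum(kept) / len(kept)
```

With nine predictions off by 1 and one off by 100, the ordinary MSE is 1000.9, whereas `trimmed_mse(..., alpha=0.1)` drops the smallest and largest squared errors and returns 1.0.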