• Title/Abstract/Keywords: Multivariate Dataset

66 search results (processing time: 0.03 s)

Study on the Relationship between Weather Conditions, Sewage and Operational Variables of WWTPs using Multivariate Statistical Methods

  • 이재현
    • Journal of Korean Society on Water Environment / Vol. 28, No. 2 / pp. 285-291 / 2012
  • In combined sewer systems, rainfall and the influent of wastewater treatment plants (WWTPs) are generally strongly related. Because variation in influent quantity and quality is the most common and significant disturbance, the factors that affect sewage characteristics need to be identified. In this paper, the relationship between weather conditions (humidity, temperature, and rainfall) and influent flow rate and contaminant concentrations was analysed using factor analysis. Additionally, three influent types were derived using cluster analysis, and the distributions of operational variables were compared across the groups by one-way ANOVA. The dataset was clustered into three groups with similar weather and influent conditions. These different conditions can lead to different operating conditions at WWTPs. Group 1 corresponds to high humidity and rainfall: the DO concentration in the reactor was very high, but the MLSS concentration was very low because of the excessive flow rate. Group 3 corresponds to low humidity, temperature, and rainfall: the SRT was the longest and the SVI was the highest, reflecting the worst settleability of the year, in winter.
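
The one-way ANOVA comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes three groups of measurements of a single operational variable; the function name `one_way_anova_f` and the sample values are hypothetical and are not taken from the paper.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")

    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    if ss_within == 0:
        return float("inf")               # no within-group variation at all
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up values for three clusters (illustrative only, not data from the paper).
f_stat = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat)  # F = 27.0 for these made-up values
```

A large F value suggests that the group means differ by more than the within-group scatter would explain; a p-value would additionally require the F distribution with (k - 1, n - k) degrees of freedom.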

The Use of Joint Hierarchical Generalized Linear Models: Application to Multivariate Longitudinal Data

  • 이동환;유재근
    • The Korean Journal of Applied Statistics / Vol. 28, No. 2 / pp. 335-342 / 2015
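  • Longitudinal data, in which each patient is measured repeatedly over time, are widely used in cohort studies. This study analyzes multivariate longitudinal data using joint hierarchical generalized linear models, which can account for correlation among response variables. The model is fitted to cohort data from the Korean Genome and Epidemiology Study and the results are interpreted. Model selection is performed using the conditional Akaike information criterion, and the estimates of the random effects are explained.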

Application of multivariate statistics towards the geochemical evaluation of fluoride enrichment in groundwater at Shilabati river bank, West Bengal, India

  • Ghosh, Arghya;Mondal, Sandip
    • Environmental Engineering Research / Vol. 24, No. 2 / pp. 279-288 / 2019
  • To gain insight into the geochemical processes controlling fluoride enrichment in groundwater of the villages near the Shilabati river bank, West Bengal, India, multivariate statistical techniques were applied to a subgroup of the dataset generated from major ion analysis of groundwater samples. Water quality analysis of major ion chemistry revealed elevated fluoride concentrations in groundwater. Factor analysis (FA) of fifteen hydrochemical parameters demonstrated that fluoride occurrence was due to the weathering and dissolution of fluoride-bearing minerals in the aquifer. A strong positive loading (> 0.75) of fluoride with pH and bicarbonate in the FA indicates that an alkaline-dominated environment is responsible for leaching fluoride from the source material. Mineralogical analysis of soil sediment shows the presence of fluoride-bearing minerals in the underground geology. Hierarchical cluster analysis (HCA) was carried out to group the sampling sites according to groundwater quality, and it separated the sites into three clusters. The abundance of fluoride in the higher-elevation area of the three clusters revealed that recharging water had more opportunity for contact with the minerals present in the aquifer during infiltration through the vadose zone.

Investigating the performance of different decomposition methods in rainfall prediction from LightGBM algorithm

  • Narimani, Roya;Jun, Changhyun;Nezhad, Somayeh Moghimi;Parisouj, Peiman
    • Proceedings of the Korea Water Resources Association Conference / Korea Water Resources Association 2022 Conference / p. 150 / 2022
  • This study investigates the effect of decomposition methods on the accuracy of daily rainfall prediction with the light gradient boosting machine (LightGBM) algorithm. Empirical mode decomposition (EMD) and singular spectrum analysis (SSA) were used to decompose and reconstruct the input time series into trend terms, fluctuating terms, and noise components. The time series decomposed by EMD and SSA were used as input data for LightGBM in two hybrid models: the empirical mode-based light gradient boosting machine (EMDGBM) and the singular spectrum analysis-based light gradient boosting machine (SSAGBM), respectively. Four parameters (temperature, humidity, wind speed, and rainfall) at a daily scale from 2003 to 2017 were used as input data for daily rainfall prediction. The statistical performance indicators show that the SSAGBM model performs better than both the EMDGBM model and the original LightGBM algorithm without decomposition. This indicates that the SSA method improved the accuracy of LightGBM in rainfall prediction on a multivariate dataset.

Evaluation of Water Quality and Phytoplankton Community Using a Multivariate Analysis in Bukhan River

  • 김헌년;윤석제;변명섭;유순주;임종권
    • Journal of Korean Society on Water Environment / Vol. 35, No. 1 / pp. 19-27 / 2019
  • The purpose of this study is to evaluate the water quality and phytoplankton community in the Bukhan River, which accounts for 44.4 % of the total inflow into Lake Paldang, using multivariate statistical techniques (i.e., correlation analysis and principal component analysis (PCA)/factor analysis (FA)). Water samples were collected from March to November 2015, and the following parameters were measured: water temperature, pH, DO, EC, SS, BOD, Chl-a, COD, TN, $NO_3-N$, $NH_3-N$, TP, DTP, $PO_4-P$, and phytoplankton community. The water quality of the main stream and the tributaries did not differ significantly, apart from the relatively high concentrations of BOD, COD and nutrients recorded at MH. The highest cell density of phytoplankton, dominated by Stephanodiscus hantzschii and Merismopedia glauca, was observed at PD. Based on the correlation analysis, total phytoplankton and cyanophyceae were highly correlated with BOD, COD and nutrients. PCA/FA resulted in four main factors accounting for 82.240 % of the total variance in the water quality dataset. Component 1 (TN, DTN, DO, $NO_3-N$, water temperature) and component 2 ($PO_4-P$, TP, DTP, SS) were classified as nutrient factors, whereas component 3 (Chl-a, COD, BOD, $NH_3-N$, pH) was related to organic substances. Identifying the main potential environmental pollution factors in the Bukhan River will help policy makers make better-informed decisions on how to improve the water quality.

Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas

  • 홍고르출;김미혜
    • Journal of the Korea Convergence Society / Vol. 11, No. 12 / pp. 23-30 / 2020
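  • In this study, the risk level of gas leakage was predicted, without directly measuring gas leakage data, by applying machine learning algorithms to the relationship between natural gas (NG) data and gas-related environmental factors. The study is based on open data provided by a server, from the specifications of an IoT-based, remotely controlled Picarro gas sensor. Natural gas leaks into the air and poses serious problems for air pollution, the environment, and health. The proposed method is a multivariate outlier removal method based on Random Forest classification for predicting the risk of natural gas leakage. After unsupervised k-means clustering, the experimental dataset is imbalanced. We therefore focus on how well the proposed model predicts the medium and high risk levels. The receiver operating characteristic (ROC) curve, accuracy, and mean squared error (MSE) were compared for each classification model. In the experiments, MOL_RF achieved an accuracy of 99.71%, an area under the ROC curve (AUC) of 99.57%, and an MSE of 0.0016.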

Continuous Conditional Random Field Model for Predicting the Electrical Load of a Combined Cycle Power Plant

  • Ahn, Gilseung;Hur, Sun
    • Industrial Engineering and Management Systems / Vol. 15, No. 2 / pp. 148-155 / 2016
  • Existing power plants may consume significant amounts of fuel and require high operating costs, partly because of poor electrical power output estimates. This paper suggests a continuous conditional random field (C-CRF) model to predict more precisely the full-load electrical power output of a base load operated combined cycle power plant. We introduce three feature functions to model association potential and one feature function to model interaction potential. Together, these functions compose the C-CRF model, and the model is transformed into a multivariate Gaussian distribution with which the operation parameters can be modeled more efficiently. The performance of our model in estimating power output was evaluated by means of a real dataset and our model outperformed existing methods. Moreover, our model can be used to estimate confidence intervals of the predicted output and calculate several probabilities.

Global and Local Views of the Hilbert Space Associated to Gaussian Kernel

  • Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / Vol. 21, No. 4 / pp. 317-325 / 2014
  • Consider a nonlinear transform $\Phi(x)$ of $x$ in $\mathbb{R}^p$ to a Hilbert space $H$ and assume that the dot product between $\Phi(x)$ and $\Phi(x^{\prime})$ in $H$ is given by $\langle \Phi(x), \Phi(x^{\prime}) \rangle = K(x, x^{\prime})$. The aim of this paper is to propose a mathematical technique to take screen shots of the multivariate dataset mapped to Hilbert space $H$, particularly suited to the Gaussian kernel $K(\cdot,\cdot)$, which is defined by $K(x,x^{\prime})=\exp(-\sigma\|x-x^{\prime}\|^2)$, $\sigma > 0$. Several numerical examples are given.
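
The Gaussian kernel defined in this abstract can be evaluated directly without constructing the transform explicitly. The sketch below is only an illustration of that definition, not the paper's visualization technique; the function names and the sample points are hypothetical. It also uses the standard identity that the squared distance between two mapped points equals K(x, x) - 2K(x, x') + K(x', x'), which for this kernel reduces to 2 - 2K(x, x').

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-sigma * ||x - y||^2), with sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)


def feature_space_sq_distance(x, y, sigma=1.0):
    """Squared distance between the mapped points in H, from kernel values only."""
    return (gaussian_kernel(x, x, sigma)
            - 2.0 * gaussian_kernel(x, y, sigma)
            + gaussian_kernel(y, y, sigma))


def gram_matrix(points, sigma=1.0):
    """Matrix of pairwise kernel values K(p_i, p_j)."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]


# Hypothetical points in R^2 (illustrative only).
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
for row in gram_matrix(points, sigma=0.5):
    print(row)
```

Because K(x, x) = 1 for every x, all mapped points have unit norm in H, and the squared distance between any two of them is always below 2.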

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm

  • 심요성;정지원;최인찬
    • Asia pacific journal of information systems / Vol. 16, No. 1 / pp. 127-144 / 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and the simplicity of its practical implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters as well as to evaluate the clustering results. In this paper, we present a performance comparison of sixteen indices, selected from forty indices in the literature on the basis of their applicability to nonhierarchical clustering algorithms. The data sets used in the experiment are generated from multivariate normal distributions. In particular, four error types (standardization, outlier generation, error perturbation, and noise dimension addition) are considered in the comparison. The experiment analyzes the effects of varying the number of points, attributes, and clusters on performance. The results of the simulation experiment show that the Calinski and Harabasz index performs best across all datasets and that the Davies and Bouldin index becomes a strong competitor as the number of points in the dataset increases.
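
As an illustration of what a cluster validity index measures, the sketch below computes the Calinski and Harabasz index for a given labeling: the between-cluster dispersion divided by (k - 1), over the within-cluster dispersion divided by (n - k). It assumes the labels come from some earlier clustering step such as K-means; the helper names and the toy points are hypothetical and do not reproduce the paper's experiment.

```python
def _mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index; higher values indicate better-separated clusters."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("index needs 2 <= number of clusters < number of points")

    overall = _mean(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = _mean(members)
        between += len(members) * _sq_dist(centroid, overall)
        within += sum(_sq_dist(p, centroid) for p in members)

    if within == 0:
        return float("inf")   # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))


# Toy data: two well-separated pairs of points (illustrative only).
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = calinski_harabasz(points, [0, 0, 1, 1])   # 50.0
bad = calinski_harabasz(points, [0, 1, 0, 1])    # about 0.08
print(good, bad)
```

In practice the index would be evaluated for several candidate numbers of clusters, and the labeling with the highest value would be preferred.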

DWT-PCA Combination for Noise Detection in Wireless Sensor Networks

  • 당 띠엔 빈;;김문성;추현승
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2020 Fall Conference / pp. 144-146 / 2020
  • Discrete Wavelet Transform (DWT) is an effective technique that is commonly used for detecting noise in the data collected by an individual sensor. In addition, the detection accuracy can be significantly improved by exploiting the correlation in the data of neighboring sensors in Wireless Sensor Networks (WSNs). Principal component analysis (PCA) is a powerful technique for analyzing the correlation in multivariate data. In this paper, we propose a DWT-PCA combination scheme for noise detection (DWT-PCA-ND). Experimental results on a real dataset show that DWT-PCA-ND performs remarkably better than the conventional PCA scheme in detecting noise, a common anomaly in data collected by WSNs.
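
To give a sense of how a wavelet transform can expose noise in a single sensor's readings, the sketch below applies a one-level Haar DWT and flags pairs of samples whose detail coefficient is large. This is not the DWT-PCA-ND scheme from the paper: the choice of the Haar wavelet, the fixed threshold, the function names, and the readings are all assumptions made for illustration, and the PCA step across neighboring sensors is omitted.

```python
import math


def haar_dwt(signal):
    """One-level Haar DWT of an even-length sequence: (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def flag_noisy_pairs(signal, threshold):
    """Indices of sample pairs whose detail coefficient exceeds the threshold."""
    _, detail = haar_dwt(signal)
    return [i for i, d in enumerate(detail) if abs(d) > threshold]


# Hypothetical temperature readings with one spike at sample 5 (illustrative only).
readings = [20.0, 20.0, 20.0, 20.0, 20.0, 35.0, 20.0, 20.0]
print(flag_noisy_pairs(readings, threshold=1.0))  # [2], the pair holding samples 4 and 5
```

Detail coefficients capture abrupt changes between adjacent samples, so a smooth signal yields values near zero while an isolated spike stands out. A one-level transform only compares samples within each pair, so a practical detector would use more levels or an undecimated transform.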