• Title/Summary/Keyword: multivariate data


Efficient Compression Algorithm with Limited Resource for Continuous Surveillance

  • Yin, Ling;Liu, Chuanren;Lu, Xinjiang;Chen, Jiafeng;Liu, Caixing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.11 / pp.5476-5496 / 2016
  • Energy efficiency is critical for resource-constrained wireless sensor networks in applications such as real-time monitoring and surveillance. To reduce energy consumption, time series data can be compressed before transmission. However, most compression algorithms for time series data were developed only for univariate scenarios, whereas in practice an application often has multiple sensor nodes and the collected data form a multivariate time series. In this paper, we propose compressing time series data by Lasso (least absolute shrinkage and selection operator) approximation. We show that our approach extends naturally to multivariate time series data. The extension is novel in that it constructs an optimal projection of the original multivariate series under which the best energy efficiency can be realized. The two algorithms are named ULasso (Univariate Lasso) and MLasso (Multivariate Lasso), and we provide practical guidance for selecting their parameters. Finally, we evaluate them empirically on several publicly available real-world data sets from different application domains, quantifying performance by approximation error, compression ratio, and computational complexity. The results show that the compression performance of ULasso and MLasso is superior, or at least equivalent, to that of LTC and PLAMlis. In particular, MLasso can significantly reduce the volume of smooth multivariate time series data without breaking the major trends and important changes of the sensor network system.

Estimation of Spatial Distribution Using the Gaussian Mixture Model with Multivariate Geoscience Data (다변량 지구과학 데이터와 가우시안 혼합 모델을 이용한 공간 분포 추정)

  • Kim, Ho-Rim;Yu, Soonyoung;Yun, Seong-Taek;Kim, Kyoung-Ho;Lee, Goon-Taek;Lee, Jeong-Ho;Heo, Chul-Ho;Ryu, Dong-Woo
    • Economic and Environmental Geology / v.55 no.4 / pp.353-366 / 2022
  • Spatial estimation of geoscience data (geo-data) is challenging because of spatial heterogeneity, data scarcity, and high dimensionality, so a spatial estimation method that accounts for these characteristics is needed. In this study, we propose applying the Gaussian mixture model (GMM), a machine learning algorithm, to multivariate data for robust spatial prediction. The performance of the proposed approach was tested on soil chemical concentration data from a former smelting area. The concentrations of As and Pb determined by ex-situ ICP-AES were the primary variables to be interpolated, while the other metal concentrations from ICP-AES and all data determined by in-situ portable X-ray fluorescence (PXRF) were used as auxiliary variables in GMM and ordinary cokriging (OCK). Among the multidimensional auxiliary variables, the important ones were selected with a variable selection method based on the random forest. Compared with ordinary kriging and OCK using univariate or bivariate data, GMM with the important multivariate auxiliary data decreased the root mean-squared error (RMSE) down to 0.11 for As and 0.33 for Pb and increased the correlations (r) up to 0.31 for As and 0.46 for Pb. The use of GMM improved the spatial interpretation of anthropogenic metals in soil. The multivariate spatial approach can be applied to understand complex and heterogeneous geological and geochemical features.
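The two scores quoted in the abstract above, RMSE and the correlation r, can be computed as in the following minimal sketch. The function names and the toy numbers are illustrative assumptions, not taken from the paper; r is taken here to be the Pearson correlation between observed and predicted values.

```python
import math

def rmse(observed, predicted):
    """Root mean-squared error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Toy values (hypothetical), only to show the call pattern.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(observed, predicted), pearson_r(observed, predicted))
```

A lower RMSE and a higher r both indicate predictions closer to the held-out measurements.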

Prediction of ultimate load capacity of concrete-filled steel tube columns using multivariate adaptive regression splines (MARS)

  • Avci-Karatas, Cigdem
    • Steel and Composite Structures / v.33 no.4 / pp.583-594 / 2019
  • In areas highly exposed to earthquakes, concrete-filled steel tube columns (CFSTCs) are known to offer superior structural properties such as (i) high strength for good seismic performance, (ii) high ductility, (iii) enhanced energy absorption, (iv) confining pressure on the concrete, and (v) high section modulus. Numerous studies have reported on the behavior of CFSTCs under axial compression loading. This paper presents an analytical model that uses multivariate adaptive regression splines (MARS) to predict the ultimate load capacity of CFSTCs with circular sections under axial load. MARS is a nonlinear, non-parametric regression methodology. After a careful study of the literature, 150 experimental results from previous studies were examined to prepare a data set, and the dependent variables, such as the geometrical and mechanical properties of the circular CFST system, were identified. In essence, a MARS model establishes a relation between the predictors and the dependent variables, forming separate regression lines through a divide-and-conquer strategy. About 70% of the consolidated data was used to develop the model and the rest to validate it. Care was taken that the input data cover the full range of each variable. The predicted ultimate axial load capacity of the CFSTCs is found to match the corresponding experimental observations in the literature.
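The "separate regression lines" mentioned in the abstract above come from the hinge basis functions that MARS is built on. The sketch below shows one mirrored hinge pair, a one-knot piecewise-linear model, and a roughly 70/30 split. All function names, the knot, and the coefficients are hypothetical illustrations; this is not the paper's fitted model, and the paper does not say how its split was drawn.

```python
import random

def hinge_pair(x, knot):
    """Return the two mirrored hinge values max(0, x - knot) and max(0, knot - x)."""
    return max(0.0, x - knot), max(0.0, knot - x)

def piecewise_predict(x, intercept, knot, coef_right, coef_left):
    """Evaluate a one-knot MARS-style model: two line segments joined at the knot."""
    right, left = hinge_pair(x, knot)
    return intercept + coef_right * right + coef_left * left

def split_70_30(rows, seed=0):
    """Shuffle a copy of rows and split it into about 70% training and 30% validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# 150 records, as in the abstract, represented here by placeholder indices.
train, valid = split_70_30(range(150))
print(len(train), len(valid))
```

Each additional knot adds another pair of hinges, so the fitted surface stays piecewise linear while adapting its slope region by region.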

The System for Checking Multivariate Normality and Outliers

  • 강명래;최용석
    • Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.253-255 / 2000
  • Multivariate analysis techniques require that the data satisfy the normality assumption. In this study, we introduce a system, built in a GUI (graphical user interface) environment, that supports normality tests for univariate and multivariate data, outlier removal, and variable transformation, so that users can carry out these steps more conveniently.


Robust Estimation and Outlier Detection

  • Myung Geun Kim
    • Communications for Statistical Applications and Methods / v.1 no.1 / pp.33-40 / 1994
  • The conditional expectation of a random variable in a multivariate normal random vector is a multiple linear regression on its predecessors. Using this fact, the least median of squares estimation method developed for multiple linear regression is adapted to multivariate data to identify influential observations. The resulting method clearly detects outliers and avoids the masking effect.

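The fact the abstract above relies on can be written out explicitly. As a sketch in notation that is assumed here rather than taken from the paper, let X = (X_1, ..., X_p) be multivariate normal with mean vector \mu, and write X_{<j} for the predecessors (X_1, ..., X_{j-1}) of X_j:

```latex
\[
E\left[ X_j \mid X_{<j} = x \right]
  = \mu_j + \sigma_j^{\top} \, \Sigma_{<j}^{-1} \left( x - \mu_{<j} \right),
\qquad
\sigma_j = \operatorname{Cov}\left( X_{<j}, X_j \right), \quad
\Sigma_{<j} = \operatorname{Cov}\left( X_{<j} \right).
\]
```

The right-hand side is linear in x with coefficient vector \Sigma_{<j}^{-1} \sigma_j, which is why a regression estimator such as least median of squares can be applied one variable at a time.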

Exponentially Weighted Moving Average Control Charts for Dispersion Matrix

  • Chang, Duk-Joon;Shin, Jae-Kyoung
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.633-644 / 2004
  • An exponentially weighted moving average (EWMA) control chart for the variance-covariance matrix of several quality characteristics, based on the accumulate-combine approach, is proposed. Numerical computations show that the multivariate EWMA chart based on the accumulate-combine approach is more efficient than the corresponding multivariate EWMA chart based on the combine-accumulate approach.

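For readers unfamiliar with EWMA charts, the basic recursion is Z_t = λ·x_t + (1 − λ)·Z_{t−1}. The sketch below shows only this generic scalar recursion. It is not the dispersion-matrix chart proposed in the paper, and the function name, the default smoothing constant, and the starting value are assumptions made for illustration.

```python
def ewma(values, lam=0.2, start=0.0):
    """Return the EWMA sequence Z_t = lam * x_t + (1 - lam) * Z_{t-1}, with Z_0 = start."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    z = start
    smoothed = []
    for x in values:
        # Recent observations get weight lam; older ones decay geometrically.
        z = lam * x + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed

# Hypothetical standardized observations with a shift in the last three points.
print(ewma([0.1, -0.2, 0.0, 1.5, 1.7, 1.6], lam=0.2))
```

A small λ gives the statistic a long memory, which tends to help it detect small sustained shifts; λ = 1 reduces it to plotting each observation on its own.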

Permutation P-values for Inter-rater Agreement Measures

  • Um, Yonghwan
    • Journal of the Korea Society of Computer and Information / v.20 no.12 / pp.169-174 / 2015
  • Permutation p-values are provided for measures of agreement among many raters on multivariate interval data. Three agreement measures (Berry and Mielke's measure, Janson and Olsson's measure, and Um's measure) are described and compared. Exact and resampling permutation methods are used to compute p-values and empirical quantile limits for the three measures. Comparisons of the p-values demonstrate that resampling permutation methods closely approximate the exact p-values, and that Berry and Mielke's measure and Um's measure perform similarly in measuring agreement.
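The difference between exact and resampling permutation p-values can be sketched as follows. The disagreement statistic used here (mean absolute difference between two raters) is a stand-in chosen for brevity and is an assumption. It is none of the three measures compared in the paper, and the function names are hypothetical.

```python
import random
from itertools import permutations

def mean_abs_diff(a, b):
    """Disagreement between two raters: mean absolute difference of paired scores."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def exact_p_value(rater_a, rater_b):
    """Enumerate every reordering of rater_b; feasible only for small samples."""
    observed = mean_abs_diff(rater_a, rater_b)
    orderings = list(permutations(rater_b))
    hits = sum(1 for p in orderings if mean_abs_diff(rater_a, p) <= observed)
    return hits / len(orderings)

def resampling_p_value(rater_a, rater_b, n_resamples=2000, seed=0):
    """Approximate the exact p-value from randomly shuffled reorderings of rater_b."""
    rng = random.Random(seed)
    observed = mean_abs_diff(rater_a, rater_b)
    shuffled = list(rater_b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if mean_abs_diff(rater_a, shuffled) <= observed:
            hits += 1
    # Counting the observed arrangement itself keeps the estimate above zero.
    return (hits + 1) / (n_resamples + 1)
```

Exact enumeration grows factorially with the number of subjects, which is why resampling is the practical choice beyond small samples.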

Effect of Dimension in Optimal Dimension Reduction Estimation for Conditional Mean Multivariate Regression (다변량회귀 조건부 평균모형에 대한 최적 차원축소 방법에서 차원수가 결과에 미치는 영향)

  • Seo, Eun-Kyoung;Park, Chong-Sun
    • Communications for Statistical Applications and Methods / v.19 no.1 / pp.107-115 / 2012
  • Yoo and Cook (2007) developed an optimal sufficient dimension reduction methodology for the conditional mean in multivariate regression. Their method is known to be asymptotically optimal, and its test statistic asymptotically follows a chi-squared distribution under the null hypothesis. To check how the dimension used in estimation affects the regression coefficients and the explanatory power of the conditional mean model in multivariate regression, we applied their method to several simulated data sets with various dimensions. A small simulation study showed that, when searching for an appropriate dimension for a given data set, it is quite helpful to use the asymptotic test for the dimension together with the estimation results for several dimensions.

Fault Detection, Monitoring and Diagnosis of Sequencing Batch Reactor for Integrated Wastewater Treatment Management System

  • Yoo, Chang-Kyoo;Vanrolleghem, Peter A.;Lee, In-Beum
    • Environmental Engineering Research / v.11 no.2 / pp.63-76 / 2006
  • Multivariate analysis and batch monitoring of a pilot-scale sequencing batch reactor (SBR) are described for an integrated wastewater treatment management system, in which a batchwise multiway independent component analysis (MICA) method is used to extract meaningful hidden information from non-Gaussian wastewater treatment data. The three-way batch data of the SBR are unfolded batch-wise, and a non-Gaussian multivariate monitoring method is then used to capture the non-Gaussian characteristics of normal batches in a biological wastewater treatment plant. The method is successfully applied to an 80 L SBR for biological wastewater treatment, which is characterized by a variety of error sources with non-Gaussian characteristics. On this wastewater treatment plant (WWTP) application, the batchwise multivariate monitoring of the pilot-scale SBR showed more powerful monitoring performance than the conventional method, since it can extract independent non-Gaussian source signals and the cross-correlation of variables.
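Batch-wise unfolding, as mentioned in the abstract above, turns a three-way array (batches × time points × variables) into a two-way matrix with one row per batch, so that ordinary multivariate methods can be applied. A minimal sketch follows; the [batch][time][variable] nesting order and the function name are assumptions, since the abstract does not specify the data layout.

```python
def unfold_batchwise(batches):
    """Flatten each batch's time-by-variable table into a single row.

    batches[i][k][j] is variable j at time point k of batch i; the result has
    one row per batch and (time points * variables) columns.
    """
    return [
        [value for time_slice in batch for value in time_slice]
        for batch in batches
    ]

# Two hypothetical batches, each with 2 time points and 2 variables.
three_way = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
print(unfold_batchwise(three_way))
```

After unfolding, each row summarizes one complete batch run, which is the unit being judged as normal or faulty in batch monitoring.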