• Title/Summary/Keyword: 다변량 데이터

Search Result 212, Processing Time 0.027 seconds

Application of functional ANOVA and functional MANOVA (단변량 및 다변량 함수 데이터에 대한 분산분석의 활용)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.5
    • /
    • pp.579-591
    • /
    • 2022
  • Functional data is collected in various fields. It is often necessary to test whether there are differences among groups of functional data. In this case, it is not appropriate to explain using the point-wise ANOVA method, and we should present not the point-wise result but the integrated result. Various studies on functional data analysis of variance have been proposed, and recently implemented those methods in the package fdANOVA of R. In this paper, I first explain ANOVA and multivariate ANOVA, then I will introduce various methods of analysis of variance for univariate and multivariate functional data recently proposed. I also describe how to use the R package fdANOVA. This package is used to test equality of weekly temperatures in Seoul and Busan through univariate functional data ANOVA, and to test equality of multivariate functional data corresponding to handwritten images using multivariate function data ANOVA.

Development of MKDE-ebd for Estimation of Multivariate Probabilistic Distribution Functions (다변량 확률분포함수의 추정을 위한 MKDE-ebd 개발)

  • Kang, Young-Jin;Noh, Yoojeong;Lim, O-Kaung
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.32 no.1
    • /
    • pp.55-63
    • /
    • 2019
  • In engineering problems, many random variables have correlation, and the correlation of input random variables has a great influence on reliability analysis results of the mechanical systems. However, correlated variables are often treated as independent variables or modeled by specific parametric joint distributions due to difficulty in modeling joint distributions. Especially, when there are insufficient correlated data, it becomes more difficult to correctly model the joint distribution. In this study, multivariate kernel density estimation with bounded data is proposed to estimate various types of joint distributions with highly nonlinearity. Since it combines given data with bounded data, which are generated from confidence intervals of uniform distribution parameters for given data, it is less sensitive to data quality and number of data. Thus, it yields conservative statistical modeling and reliability analysis results, and its performance is verified through statistical simulation and engineering examples.

Feature Extraction of CNN-GRU based Multivariate Time Series Data for Regional Clustering (지역 군집화를 위한 CNN-GRU 기반 다변량 시계열 데이터의 특성 추출)

  • Kim, Jinah;Lee, Ji-Hoon;Choi, Dong-Wook;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.10a
    • /
    • pp.950-951
    • /
    • 2019
  • 시계열 데이터에 대한 군집화 관련 연구는 주로 통계 분석을 통해 이뤄지기 때문에 데이터가 갖는 특성을 완전히 반영하는 데 한계를 갖는다. 본 논문에서는 다변량 데이터에서의 군집화를 위하여 변수별로 시간에 따른 변화와 특징을 추출하기 위한 CNN-GRU(Convolutional Neural Network - Gated Recurrent Unit) 기반의 신경망 모델을 제안한다. CNN을 활용하여 변수별로 갖는 특성을 파악하고자 하였으며, GRU을 통해 전체 시간에 따른 소비 추세를 도출하고자 하였다. 지역별로 업종에 따라 사용된 2년 치의 실제 카드 데이터를 활용하였으며, 유사한 소비 추세를 보이는 지역을 군집화하는데 이를 적용하였다. 결과적으로, 다변량 시계열 데이터를 통해 전체적인 흐름을 반영하여 패턴화했다는 점에서 의의를 갖는다.

Prediction of arrhythmia using multivariate time series data (다변량 시계열 자료를 이용한 부정맥 예측)

  • Lee, Minhai;Noh, Hohsuk
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.5
    • /
    • pp.671-681
    • /
    • 2019
  • Studies on predicting arrhythmia using machine learning have been actively conducted with increasing number of arrhythmia patients. Existing studies have predicted arrhythmia based on multivariate data of feature variables extracted from RR interval data at a specific time point. In this study, we consider that the pattern of the heart state changes with time can be important information for the arrhythmia prediction. Therefore, we investigate the usefulness of predicting the arrhythmia with multivariate time series data obtained by extracting and accumulating the multivariate vectors of the feature variables at various time points. When considering 1-nearest neighbor classification method and its ensemble for comparison, it is confirmed that the multivariate time series data based method can have better classification performance than the multivariate data based method if we select an appropriate time series distance function.

Detection of the Change in Blogger Sentiment using Multivariate Control Charts (다변량 관리도를 활용한 블로거 정서 변화 탐지)

  • Moon, Jeounghoon;Lee, Sungim
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.6
    • /
    • pp.903-913
    • /
    • 2013
  • Social network services generate a considerable amount of social data every day on personal feelings or thoughts. This social data provides changing patterns of information production and consumption but are also a tool that reflects social phenomenon. We analyze negative emotional words from daily blogs to detect the change in blooger sentiment using multivariate control charts. We used the all the blogs produced between 1 January 2008 and 31 December 2009. Hotelling's T-square control chart control chart is commonly used to monitor multivariate quality characteristics; however, it assumes that quality characteristics follow multivariate normal distribution. The performance of a multivariate control chart is affected by this assumption; consequently, we introduce the support vector data description and its extension (K-control chart) suggested by Sun and Tsung (2003) and they are applied to detect the chage in blogger sentiment.

Evaluation of Multivariate Stream Data Reduction Techniques (다변량 스트림 데이터 축소 기법 평가)

  • Jung, Hung-Jo;Seo, Sung-Bo;Cheol, Kyung-Joo;Park, Jeong-Seok;Ryu, Keun-Ho
    • The KIPS Transactions:PartD
    • /
    • v.13D no.7 s.110
    • /
    • pp.889-900
    • /
    • 2006
  • Even though sensor networks are different in user requests and data characteristics depending on each application area, the existing researches on stream data transmission problem focus on the performance improvement of their methods rather than considering the original characteristic of stream data. In this paper, we introduce a hierarchical or distributed sensor network architecture and data model, and then evaluate the multivariate data reduction methods suitable for user requirements and data features so as to apply reduction methods alternatively. To assess the relative performance of the proposed multivariate data reduction methods, we used the conventional techniques, such as Wavelet, HCL(Hierarchical Clustering), Sampling and SVD (Singular Value Decomposition) as well as the experimental data sets, such as multivariate time series, synthetic data and robot execution failure data. The experimental results shows that SVD and Sampling method are superior to Wavelet and HCL ia respect to the relative error ratio and execution time. Especially, since relative error ratio of each data reduction method is different according to data characteristic, it shows a good performance using the selective data reduction method for the experimental data set. The findings reported in this paper can serve as a useful guideline for sensor network application design and construction including multivariate stream data.

Inverter-Based Solar Power Prediction Algorithm Using Artificial Neural Network Regression Model (인공 신경망 회귀 모델을 활용한 인버터 기반 태양광 발전량 예측 알고리즘)

  • Gun-Ha Park;Su-Chang Lim;Jong-Chan Kim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.2
    • /
    • pp.383-388
    • /
    • 2024
  • This paper is a study to derive the predicted value of power generation based on the photovoltaic power generation data measured in Jeollanam-do, South Korea. Multivariate variables such as direct current, alternating current, and environmental data were measured in the inverter to measure the amount of power generation, and pre-processing was performed to ensure the stability and reliability of the measured values. Correlation analysis used only data with high correlation with power generation in time series data for prediction using partial autocorrelation function (PACF). Deep learning models were used to measure the amount of power generation to predict the amount of photovoltaic power generation, and the results of correlation analysis of each multivariate variable were used to increase the prediction accuracy. Learning using refined data was more stable than when existing data were used as it was, and the solar power generation prediction algorithm was improved by using only highly correlated variables among multivariate variables by reflecting the correlation analysis results.

KCYP data analysis using Bayesian multivariate linear model (베이지안 다변량 선형 모형을 이용한 청소년 패널 데이터 분석)

  • Insun, Lee;Keunbaik, Lee
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.6
    • /
    • pp.703-724
    • /
    • 2022
  • Although longitudinal studies mainly produce multivariate longitudinal data, most of existing statistical models analyze univariate longitudinal data and there is a limitation to explain complex correlations properly. Therefore, this paper describes various methods of modeling the covariance matrix to explain the complex correlations. Among them, modified Cholesky decomposition, modified Cholesky block decomposition, and hypersphere decomposition are reviewed. In this paper, we review these methods and analyze Korean children and youth panel (KCYP) data are analyzed using the Bayesian method. The KCYP data are multivariate longitudinal data that have response variables: School adaptation, academic achievement, and dependence on mobile phones. Assuming that the correlation structure and the innovation standard deviation structure are different, several models are compared. For the most suitable model, all explanatory variables are significant for school adaptation, and academic achievement and only household income appears as insignificant variables when cell phone dependence is a response variable.

Fault Detection Method for Multivariate Process using ICA (독립성분분석을 이용한 다변량 공정에서의 고장탐지 방법)

  • Jung, Seunghwan;Kim, Minseok;Lee, Hansoo;Kim, Jonggeun;Kim, Sungshin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.2
    • /
    • pp.192-197
    • /
    • 2020
  • Multivariate processes, such as large scale power plants or chemical processes are operated in very hazardous environment, which can lead to significant human and material losses if a fault occurs. On-line monitoring technology, therefore, is essential to detect system faults. In this paper, the ICA-based fault detection method is conducted using three different multivariate process data. Fault detection procedure based on ICA is divided into off-line and on-line processes. The off-line process determines a threshold for fault detection by using the obtained dataset when the system is normal. And the on-line process computes statistics of query vectors measured in real-time. The fault is detected by comparing computed statistics and previously defined threshold. For comparison, the PCA-based fault detection method is also implemented in this paper. Experimental results show that the ICA-based fault detection method detects the system faults earlier and better than the PCA-based method.

Multivariate volatility for high-frequency financial series (다변량 고빈도 금융시계열의 변동성 분석)

  • Lee, G.J.;Hwang, Sun Young
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.1
    • /
    • pp.169-180
    • /
    • 2017
  • Multivariate GARCH models are interested in conditional variances (volatilities) as well as conditional correlations between return time series. This paper is concerned with high-frequency multivariate financial time series from which realized volatilities and realized conditional correlations of intra-day returns are calculated. Existing multivariate GARCH models are reviewed comparatively with the realized volatility via canonical correlations and value at risk (VaR). Korean stock prices are analysed for illustration.