• Title/Summary/Keyword: Multivariate Data

Asymmetric CCC Modelling in Multivariate-GARCH with Illustrations of Multivariate Financial Data (금융시계열 분석을 위한 다변량-GARCH 모형에서 비대칭-CCC의 도입 및 응용)

  • Park, R.H.;Choi, M.S.;Hwan, S.Y.
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.821-831 / 2011
  • Asymmetric features have not yet been fully incorporated into multivariate GARCH processes in the field of financial time series (McAleer et al., 2009). Retaining the constant conditional correlation (CCC) structure, this article introduces asymmetric GARCH modelling for analysing multivariate volatilities in time series from a practical point of view. Multivariate Korean financial time series are analyzed in detail to compare our theory with conventional methodologies, including GARCH and EGARCH.
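
As a rough illustration of the CCC structure named in this abstract, the conditional covariance matrix can be assembled as H_t = D_t R D_t, where D_t holds the conditional standard deviations and R is a fixed correlation matrix. The sketch below assumes a GJR-type asymmetric variance recursion, which is one common asymmetric GARCH form and not necessarily the one the authors use; the function names and all parameter values are hypothetical.

```python
import math

def gjr_variance(prev_var, prev_shock, omega, alpha, gamma, beta):
    """One step of a GJR-type asymmetric GARCH(1,1) recursion (assumed form)."""
    # Negative shocks receive the extra weight gamma; positive shocks do not.
    leverage = gamma if prev_shock < 0 else 0.0
    return omega + (alpha + leverage) * prev_shock ** 2 + beta * prev_var

def ccc_covariance(variances, corr):
    """Build H_t = D_t R D_t from conditional variances and a constant correlation matrix."""
    sd = [math.sqrt(v) for v in variances]
    k = len(sd)
    return [[corr[i][j] * sd[i] * sd[j] for j in range(k)] for i in range(k)]

# Hypothetical parameters: same-sized shocks of opposite sign on two series.
h1 = gjr_variance(prev_var=1.0, prev_shock=-1.0, omega=0.1, alpha=0.05, gamma=0.1, beta=0.8)
h2 = gjr_variance(prev_var=1.0, prev_shock=1.0, omega=0.1, alpha=0.05, gamma=0.1, beta=0.8)
R = [[1.0, 0.5], [0.5, 1.0]]  # constant conditional correlation (assumed value)
H = ccc_covariance([h1, h2], R)
```

With these inputs the negative shock yields the larger conditional variance, which is the asymmetry the model is meant to capture, while the correlation between the two series stays fixed at the chosen value.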

Choice of frequency via principal component in high-frequency multivariate volatility models (주성분을 이용한 다변량 고빈도 실현 변동성의 주기 선택)

  • Jin, M.K.;Yoon, J.E.;Hwang, S.Y.
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.747-757 / 2017
  • We investigate multivariate volatilities based on high-frequency time series. Principal component analysis (PCA) is employed to achieve dimension reduction in multivariate volatility. Multivariate realized volatilities (RV) at various frequencies are calculated from high-frequency data, and an "optimum" frequency is suggested using PCA. Specifically, RVs at various frequencies are compared with existing daily volatilities such as Cholesky, EWMA and BEKK after dimension reduction via PCA. An analysis of high-frequency stock prices of KOSPI, Samsung Electronics and Hyundai Motor Company is presented as an illustration.
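
To make the idea of realized volatility at different sampling frequencies concrete, the following minimal sketch computes a realized covariance matrix as the sum of outer products of intraday log returns, sampling every `step`-th price. The price paths, the `step` values, and the function name are hypothetical, and the PCA-based frequency selection described in the abstract is not reproduced here.

```python
import math

def realized_covariance(price_paths, step=1):
    """Realized covariance matrix from intraday prices, sampled every `step`-th tick."""
    returns = []
    for path in price_paths:
        sampled = path[::step]
        returns.append([math.log(b / a) for a, b in zip(sampled, sampled[1:])])
    k = len(returns)
    # Entry (i, j) is the sum over the day of r_i * r_j.
    return [[sum(x * y for x, y in zip(returns[i], returns[j])) for j in range(k)]
            for i in range(k)]

# Hypothetical intraday prices for two assets over one trading day.
asset_a = [100.0, 100.2, 99.9, 100.1, 100.4, 100.3, 100.6, 100.5, 100.7]
asset_b = [50.0, 50.1, 50.0, 50.2, 50.1, 50.3, 50.2, 50.4, 50.5]
rv_by_step = {step: realized_covariance([asset_a, asset_b], step) for step in (1, 2, 4)}
```

Each entry of `rv_by_step` is one candidate daily volatility measure; comparing them across `step` values is the kind of frequency choice the abstract refers to.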

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo;Kang, Jae-Woo;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.33 no.2 / pp.163-174 / 2006
  • In distributed wireless sensor networks, it is difficult to transmit and analyze the entire data stream because network bandwidth, power, and processing capacity are limited. It is therefore preferable to classify the continuous stream data first and then apply alternative stream data processing. We propose a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a standard text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms. For supervised classification, we tested a Bayesian classifier and SVM; for unsupervised classification, we tested Jaccard, TFIDF, Jaro, and Jaro-Winkler. In our experiments, SVM and TFIDF outperformed the other classification methods. In particular, we observed that classification accuracy improves when the correlation of attributes is considered along with the n-gram tokens of symbols.
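
The preprocessing step described above can be sketched as follows. The symbol alphabet (U for up, D for down, S for steady), the tolerance parameter, and the sample sensor stream are assumptions made for illustration; the paper's actual discretization scheme is not given in the abstract.

```python
def sliding_windows(stream, size, stride=1):
    """Yield consecutive windows of `size` rows from a multivariate stream."""
    for start in range(0, len(stream) - size + 1, stride):
        yield stream[start:start + size]

def discretize(values, tol=0.0):
    """Encode changes between consecutive readings as U (up), D (down) or S (steady)."""
    symbols = []
    for prev, curr in zip(values, values[1:]):
        diff = curr - prev
        symbols.append("U" if diff > tol else "D" if diff < -tol else "S")
    return "".join(symbols)

def ngrams(text, n=2):
    """Split a symbol string into overlapping n-gram tokens."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Hypothetical two-attribute stream; each row is (temperature, humidity).
stream = [(20.0, 50), (20.5, 50), (21.0, 49), (20.8, 49), (20.8, 51)]
tokens_per_window = []
for window in sliding_windows(stream, size=4):
    per_attribute = [discretize([row[a] for row in window]) for a in range(2)]
    tokens_per_window.append([ngrams(s, 2) for s in per_attribute])
```

The resulting token lists can then be passed to any text classifier, which is what allows the framework to reuse standard text classification algorithms.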

VaR Estimation of Multivariate Distribution Using Copula Functions (Copula 함수를 이용한 이변량분포의 VaR 추정)

  • Hong, Chong-Sun;Lee, Jae-Hyung
    • The Korean Journal of Applied Statistics / v.24 no.3 / pp.523-533 / 2011
  • The method most preferred in finance for market risk management is to estimate Value at Risk (VaR). In many practical cases, the VaRs of both univariate and multivariate distributions must be obtained from multivariate data. In this work, copula functions are used to explore the dependence of non-normal random variables and to generate the corresponding multivariate distribution functions. We estimate Archimedean copula functions, including the Clayton, Gumbel, and Frank copulas, that are fitted to the multivariate earning rate distribution, and then obtain their VaRs. With these copula functions, we estimate the VaRs of both an integrated industry and the individual industries. The parameters of the three kinds of copula functions are estimated for illustrative stock data from two Korean industries to obtain the VaR of the bivariate distribution and those of the corresponding univariate distributions. These VaRs are compared with those obtained from other methods to discuss the accuracy of the estimates.
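
For reference, the three Archimedean copulas named above have simple closed-form bivariate distribution functions. The sketch below implements the standard textbook forms for inputs u, v in (0, 1]; the parameter values are hypothetical, and the parameter estimation and VaR computation from the paper are not reproduced.

```python
import math

def clayton(u, v, theta):
    """Clayton copula CDF, valid for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel(u, v, theta):
    """Gumbel copula CDF, valid for theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def frank(u, v, theta):
    """Frank copula CDF, valid for theta != 0."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

# Joint probability that both returns fall below their own 5% quantiles
# (theta values are assumed, not fitted).
joint_tail = {
    "clayton": clayton(0.05, 0.05, 2.0),
    "gumbel": gumbel(0.05, 0.05, 2.0),
    "frank": frank(0.05, 0.05, 2.0),
}
```

Comparing the three joint tail probabilities shows how the choice of copula changes the estimated chance of simultaneous losses, which in turn drives the VaR of the combined position.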

Water Temperature Prediction Study Using Feature Extraction and Reconstruction based on LSTM-Autoencoder

  • Gu-Deuk Song;Su-Hyun Park
    • Journal of the Korea Society of Computer and Information / v.28 no.11 / pp.13-20 / 2023
  • In this paper, we propose a water temperature prediction method using feature extraction and reconstructed data based on an LSTM-Autoencoder. We used multivariate time series data, including sea surface water temperature in the Naksan area of the East Sea, where the cold water zone phenomenon occurred, together with wind direction and wind speed, which affect water temperature. Using the LSTM-Autoencoder model, we prepared three types of data: feature data extracted through dimensionality reduction of the original data combined with multivariate data of the original data, reconstructed data, and original data. An LSTM model was trained on each of the three types of data to predict sea surface water temperature, and its accuracy was evaluated. As a result, the prediction using feature extraction by the LSTM-Autoencoder achieved the best performance, with MAE 0.3652, RMSE 0.5604, and MAPE 3.309%. The results of this study are expected to help prevent damage from natural disasters by improving the prediction accuracy for rapid changes in sea surface temperature, such as the cold water zone.
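
The three accuracy measures reported above follow their usual definitions, sketched here for clarity. The temperature values are hypothetical and are not taken from the paper; MAPE as written assumes that no actual value is zero.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical observed and predicted sea surface temperatures (degrees Celsius).
observed = [18.2, 17.9, 15.4, 14.8, 16.1]
forecast = [18.0, 18.1, 15.9, 14.5, 16.0]
scores = {"MAE": mae(observed, forecast),
          "RMSE": rmse(observed, forecast),
          "MAPE": mape(observed, forecast)}
```

RMSE is never smaller than MAE on the same data, so a large gap between the two indicates that a few large errors dominate, as can happen during abrupt cold water events.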

The Evaluation of Water Quality Using a Multivariate Analysis in Changnyeong-Haman weir section (다변량 통계분석을 이용한 낙동강 창녕함안보 구간의 수질 특성 평가)

  • Gwak, Bo-ra;Kim, Il-kyu
    • Journal of Korean Society of Water and Wastewater / v.29 no.6 / pp.625-632 / 2015
  • A study of the water environment system in the Changnyeong-Haman weir section was conducted using multivariate analysis. The purpose of this study is to establish a better understanding of water quality in the Changnyeong-Haman weir section, which can provide useful information. The data consisted of water quality data and algae data, including WT (water temperature), pH, DO, EC, COD, SS, T-N, $NH_3-N$, T-P, $PO_4-P$, Chl-a, TOC, d-silica, t-silica, Cyanobacteria, Diatoms, and Green algae. The statistical analyses used in this study were correlation analysis, principal component analysis, and factor analysis. Correlation analysis showed that the correlation coefficient between COD and TOC was 0.843. In addition, a negative correlation was observed between diatoms and d-silica. Furthermore, principal component analysis classified the overall water quality into four main factors with a contribution rate of 81.071%.
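
A correlation coefficient such as the one reported between COD and TOC is presumably the Pearson coefficient, which can be computed as in the sketch below. The readings are hypothetical and chosen only to show the calculation; they are not the study's data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical COD and TOC readings (mg/L) from the same sampling dates.
cod = [5.1, 6.3, 5.8, 7.2, 6.9]
toc = [3.9, 4.6, 4.1, 5.3, 5.2]
r = pearson(cod, toc)
```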

Using SEER Data to Quantify Effects of Low Income Neighborhoods on Cause Specific Survival of Skin Melanoma

  • Cheung, Min Rex
    • Asian Pacific Journal of Cancer Prevention / v.14 no.5 / pp.3219-3221 / 2013
  • Background: This study used receiver operating characteristic (ROC) curves to screen Surveillance, Epidemiology and End Results (SEER) skin melanoma data to identify and quantify the effects of socioeconomic factors on cause specific survival. Methods: 'SEER cause-specific death classification' was used as the outcome variable. The area under the ROC curve was used to select the best pretreatment predictors for further multivariate analysis with socioeconomic factors. Race and other socioeconomic factors, including rural-urban residence, county level % college graduate, and county level family income, were used as predictors. Univariate and multivariate analyses were performed to identify and quantify the independent socioeconomic predictors. Results: This study included 49,999 patients. The mean follow up time (SD) was 59.4 (17.1) months. SEER staging (ROC area of 0.08) was the most predictive factor. Race, lower county family income, rural residence, and lower county education attainment were significant in univariate analysis, but rural residence was not significant in multivariate analysis. Living in poor neighborhoods was associated with a 2-4% disadvantage in actuarial cause specific survival. Conclusions: Racial and socioeconomic factors have a significant impact on the survival of melanoma patients. This generates the hypothesis that ensuring access to cancer care may eliminate these outcome disparities.
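
The area under the ROC curve used here for predictor screening can be computed without drawing the curve, as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The sketch below uses that rank-based definition with ties counted as one half; the scores and labels are hypothetical, and the function name is not from the paper.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictor values and outcomes (1 = cause-specific death).
predictor = [0.1, 0.4, 0.35, 0.8]
outcome = [0, 0, 1, 1]
auc = roc_auc(predictor, outcome)
```

An AUC of 0.5 means the predictor is no better than chance, and values near 1 indicate strong discrimination, so ranking candidate predictors by AUC gives a simple screening rule.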

Artificial Neural Networks for Interest Rate Forecasting based on Structural Change : A Comparative Analysis of Data Mining Classifiers

  • Oh, Kyong-Joo
    • Journal of the Korean Data and Information Science Society / v.14 no.3 / pp.641-651 / 2003
  • This study proposes hybrid models for interest rate forecasting using structural changes (or change points). The basic concept of the proposed model is to obtain significant intervals caused by change points, to identify them as change-point groups, and to reflect them in interest rate forecasting. The model is composed of three phases. The first phase detects successive structural changes in the U.S. Treasury bill rate dataset. The second phase forecasts the change-point groups with data mining classifiers. The final phase forecasts interest rates with backpropagation neural networks (BPN). Based on this structure, we propose three hybrid models that differ in their data mining classifier: (1) a multivariate discriminant analysis (MDA)-supported model, (2) a case-based reasoning (CBR)-supported model, and (3) a BPN-supported model. Subsequently, we compare these models with a neural network model alone and, in addition, determine which of the three classifiers (MDA, CBR and BPN) performs better. For interest rate forecasting, this study then examines the ability of the hybrid models to reflect the structural change in their predictions.

INVITED PAPER MULTIVARIATE ANALYSIS FOR THE CASE WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE

  • Fujikoshi, Yasunori
    • Journal of the Korean Statistical Society / v.33 no.1 / pp.1-24 / 2004
  • This paper is concerned with statistical methods for multivariate data when the number $p$ of variables is large compared to the sample size $n$. Such data appear typically in the analysis of DNA microarrays, curve data, financial data, etc. However, there is little statistical theory for high dimensional data. On the other hand, there are some asymptotic results under the assumption that both $n$ and $p$ tend to $\infty$ in some ratio $p/n \rightarrow c$. These studies suggest that the new asymptotic results are more useful and insightful than the classical large sample asymptotics. The main purpose of this paper is to review some asymptotic results for high dimensional statistics as well as classical statistics under a high dimensional asymptotic framework.

Regional Geological Mapping by Principal Component Analysis of the Landsat TM Data in a Heavily Vegetated Area (식생이 무성한 지역에서의 Principal Component Analysis 에 의한 Landsat TM 자료의 광역지질도 작성)

  • 朴鍾南;徐延熙
    • Korean Journal of Remote Sensing / v.4 no.1 / pp.49-60 / 1988
  • Principal Component Analysis (PCA) was applied for regional geological mapping to a multivariate data set of Landsat TM data in the heavily vegetated and topographically rugged Chungju area. The multivariate data set was selected by statistical analysis based on the magnitude of regression of squares in multiple regression, and it includes R1/2/R3/4, R2/3, R5/7/R4/3, R1/2, R3/4, R4/3, and R4/5. The PCA results show that some of the later principal components (PC 3 and PC 5 in this study) are geologically more significant than the earlier major components, PC 1 and PC 2. The first two components, which comprise 96% of the total information in the data set, mainly represent vegetation reflectance and topographic effects. The remaining components represent only 3% of the total information, which statistically suggests that their information is unstable; nevertheless, the geological significance of PC 3 and PC 5 in this study implies that applying the technique in more favorable areas should lead to much better results.