• Title/Summary/Keyword: multivariate data

A Bayesian Approach for the Analysis of Times to Multiple Events : An Application on Healthcare Data

  • Seok, Junhee;Kang, Yeong Seon
    • Journal of the Korean Operations Research and Management Science Society / v.39 no.4 / pp.51-69 / 2014
  • Times to multiple events (TMEs) are a major data type in large-scale business and medical data. Despite their importance, TME data have not been well studied because censored observations make them difficult to analyze. To address this difficulty, we have developed a Bayesian multivariate survival analysis method that can estimate the joint probability density of survival times. In this work, we extend this method to analyze precedence, dependency, and causality among multiple events. We applied it to the electronic health records of 2,111 patients at a children's hospital in the US, and the analysis revealed the relation between the times to two types of hospital visits for different medical issues. The overall results suggest that multivariate survival analysis is useful for large-scale data in a variety of areas, including marketing, human resources, and e-commerce. Lastly, we suggest future research directions based on the multivariate survival analysis method.

Applications of response dimension reduction in large p-small n problems

  • Minjee Kim;Jae Keun Yoo
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.191-202 / 2024
  • The goal of this paper is to show how response dimension reduction facilitates multivariate regression analysis with high-dimensional responses. Multivariate regression, characterized by multi-dimensional response variables, is increasingly prevalent across diverse fields such as repeated measures, longitudinal studies, and functional data analysis. A key challenge in analyzing such data is managing the response dimensions, which complicate the analysis through a rapid increase in the number of parameters. Although response dimension reduction methods have been developed, practical illustrations for various types of data, such as so-called large p-small n data, are lacking. This paper aims to fill this gap by showcasing how response dimension reduction can enhance the analysis of high-dimensional response data, thereby assisting statistical practitioners and contributing to advancements in multiple scientific domains.

Multivariate Process Control Chart for Controlling the False Discovery Rate

  • Park, Jang-Ho;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.11 no.4 / pp.385-389 / 2012
  • With the development of computer storage and the rapidly growing ability to process large amounts of data, multivariate control charts have received increasing attention. Existing univariate and multivariate control charts take a single-hypothesis testing approach to the process mean or variance, plotting a single statistic. This paper proposes a multiple-hypothesis approach to developing a new multivariate control scheme. The plotted Hotelling's $T^2$ statistics are used to compute the corresponding p-values, and a procedure for controlling the false discovery rate in multiple hypothesis testing is applied to the proposed control scheme. Numerical simulations were carried out to compare the performance of the proposed control scheme with the ordinary multivariate Shewhart chart in terms of average run length. The results show that the proposed control scheme outperforms the existing multivariate Shewhart chart for all mean shifts considered.

Evaluation of Multivariate Stream Data Reduction Techniques

  • Jung, Hung-Jo;Seo, Sung-Bo;Cheol, Kyung-Joo;Park, Jeong-Seok;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.13D no.7 s.110 / pp.889-900 / 2006
  • Although sensor networks differ in user requests and data characteristics depending on the application area, existing research on the stream data transmission problem focuses on improving the performance of its own methods rather than on the inherent characteristics of stream data. In this paper, we introduce a hierarchical or distributed sensor network architecture and data model, and then evaluate multivariate data reduction methods suited to user requirements and data features so that reduction methods can be applied selectively. To assess the relative performance of the multivariate data reduction methods, we used conventional techniques, namely Wavelet, HCL (Hierarchical Clustering), Sampling, and SVD (Singular Value Decomposition), on experimental data sets comprising multivariate time series, synthetic data, and robot execution failure data. The experimental results show that SVD and Sampling outperform Wavelet and HCL with respect to relative error ratio and execution time. In particular, since the relative error ratio of each data reduction method varies with the data characteristics, selectively applying the appropriate reduction method gives good performance on the experimental data sets. The findings reported in this paper can serve as a useful guideline for designing and building sensor network applications that handle multivariate stream data.
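
As an illustration of the SVD technique evaluated above, the sketch below keeps only the $k$ largest singular triplets of a window of multivariate sensor readings and measures the reconstruction error. The error measure $\lVert X - \hat X \rVert_F / \lVert X \rVert_F$ is an assumption; the paper's "relative error ratio" may be defined differently. The synthetic two-signal data and all function names are hypothetical.

```python
import numpy as np


def svd_reduce(x, k):
    """Keep the k largest singular triplets of a (samples x variables) window."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def svd_reconstruct(u_k, s_k, vt_k):
    """Rebuild the rank-k approximation U_k diag(s_k) V_k'."""
    return (u_k * s_k) @ vt_k


def relative_error(x, x_hat):
    """Assumed error measure: Frobenius norm of the residual over that of x."""
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)


# Hypothetical window: four correlated sensor channels driven by two latent
# signals plus small noise, 100 readings each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 100)
latent = np.column_stack([np.sin(t), np.cos(2.0 * t)])
mixing = np.array([[1.0, 0.5, -0.3, 2.0], [0.2, -1.0, 0.8, 0.1]])
x = latent @ mixing + 0.01 * rng.standard_normal((100, 4))

u_k, s_k, vt_k = svd_reduce(x, k=2)
err = relative_error(x, svd_reconstruct(u_k, s_k, vt_k))
stored = u_k.size + s_k.size + vt_k.size  # 100*2 + 2 + 2*4 values instead of 100*4
print(stored, err)
```

Transmitting the truncated factors instead of the raw window roughly halves the payload here, at a small reconstruction error because the channels are driven by only two underlying signals. How well this works depends on the data's effective rank, which is consistent with the abstract's point that the best reduction method varies with data characteristics.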

A readjustment procedure in the multivariate integrated process control

  • Cho, Gyo-Young;Park, Jong-Suk
    • Journal of the Korean Data and Information Science Society / v.22 no.6 / pp.1123-1135 / 2011
  • This paper considers a multivariate integrated process control procedure for detecting special causes in a multivariate IMA(1, 1) process. When the multivariate control chart signals, the special cause is detected and eliminated from the process. However, when eliminating the special cause is costly or not practically possible, an alternative action is to readjust the process with an approximately modified adjustment scheme. In this paper, we propose a readjustment procedure to be applied after a true signal, and show that readjustment can reduce the deviation of the process from the target.

A Test for Multivariate Normality Focused on Elliptical Symmetry Using Mahalanobis Distances

  • Park, Cheol-Yong
    • Korean Data and Information Science Society: Conference Proceedings / 2006.04a / pp.203-212 / 2006
  • A chi-squared test of multivariate normality is proposed that focuses mainly on detecting deviations from elliptical symmetry. The test uses the Mahalanobis distances of the observations to gain power against deviations from multivariate normality. We derive the limiting distribution of the test statistic by a conditional limit theorem. A simulation study is conducted to assess the accuracy of the limiting distribution in finite samples. Finally, we compare the power of our method with that of other popular tests of multivariate normality under two non-normal distributions.
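
The ingredients of a Mahalanobis-distance chi-squared test can be sketched as below: squared distances $d_i^2 = (x_i - \bar x)' S^{-1} (x_i - \bar x)$ are binned into cells that are equally probable under $\chi^2_p$, and a Pearson-type statistic is formed. This is a generic construction, not the paper's statistic. Because the mean and covariance are estimated, its null distribution is not the naive $\chi^2$ with (cells minus 1) degrees of freedom; the paper's limiting distribution, derived via a conditional limit theorem, is not reproduced, so no p-value is computed. The number of cells, the divisor $n$ in $S$, and all function names are assumptions.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), integer df (incomplete-gamma recurrence)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    sf, s = (math.erfc(math.sqrt(y)), 0.5) if df % 2 == 1 else (math.exp(-y), 1.0)
    while s < df / 2.0:
        sf += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return sf


def chi2_ppf(prob, df, tol=1e-12):
    """Chi-square quantile found by bisection on the CDF 1 - chi2_sf."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_sf(hi, df) < prob:
        hi *= 2.0  # expand the bracket until it contains the quantile
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_sf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def squared_mahalanobis(x):
    """d_i^2 with the sample mean and covariance (divisor n) of x."""
    diff = x - x.mean(axis=0)
    s = diff.T @ diff / x.shape[0]
    return np.einsum("ij,ij->i", diff, np.linalg.solve(s, diff.T).T)


def pearson_cell_statistic(x, n_cells=5):
    """Pearson-type statistic on d_i^2 over equal-probability chi-square cells."""
    n, p = x.shape
    d2 = squared_mahalanobis(x)
    cuts = [chi2_ppf(j / n_cells, p) for j in range(1, n_cells)]
    observed = np.bincount(np.searchsorted(cuts, d2), minlength=n_cells)
    expected = n / n_cells
    return float(((observed - expected) ** 2 / expected).sum()), observed


rng = np.random.default_rng(1)
sample = rng.standard_normal((200, 3))
stat, counts = pearson_cell_statistic(sample, n_cells=5)
print(stat, counts)
```

A useful sanity check follows from the definitions: with divisor $n$, $\sum_i d_i^2 = \operatorname{tr}(S^{-1} \cdot nS) = np$ for any data set, so the distances carry information about the shape of the distribution rather than its overall scale.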

Implementation of Integrated Control Chart Using Zone, Multivariate $T^2$ and ARIMA

  • Choi, Sung-Woon
    • Proceedings of the Safety Management and Science Conference / 2010.04a / pp.259-265 / 2010
  • The research discusses the implementation of MINITAB's control chart tools, which are classified according to the type of data and the presence of subgrouping, weighting, and multivariate covariance. The paper presents three integrated models using the zone chart, the multivariate $T^2$-GV (Generalized Variance) chart, and ARIMA (Autoregressive Integrated Moving Average) modeling.
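
The zone component mentioned above can be sketched as a univariate zone chart. The weights 0, 2, 4, 8 for points within 1, 2, 3, and beyond 3 standard deviations, the reset of the cumulative score when a point crosses the center line, and the signal threshold equal to the outermost weight are assumptions based on our understanding of MINITAB's default zone chart; the `reset_after_signal` option is a hypothetical choice. The sketch does not cover the $T^2$-GV or ARIMA parts of the integrated models.

```python
def zone_chart(values, center, sigma, weights=(0, 2, 4, 8), reset_after_signal=True):
    """Cumulative zone scores and signals for a univariate zone chart.

    Assumed scheme: a point scores weights[z] where z = min(int(|x - center| / sigma), 3);
    scores accumulate while points stay on one side of the center line, the total
    resets when a point crosses it, and a signal is raised when the total reaches
    weights[3].
    """
    scores, signals = [], []
    total, side = 0, 0
    for v in values:
        z = (v - center) / sigma
        zone = min(int(abs(z)), 3)
        current_side = 1 if z >= 0 else -1
        if current_side != side:
            total = 0  # crossing the center line restarts the accumulation
        side = current_side
        total += weights[zone]
        scores.append(total)
        signals.append(total >= weights[3])
        if reset_after_signal and total >= weights[3]:
            total = 0
    return scores, signals


scores, signals = zone_chart([0.5, 1.5, 2.5, -0.5, 3.5, 2.5, 2.5], center=0.0, sigma=1.0)
print(scores)   # [0, 2, 6, 0, 8, 4, 8]
print(signals)  # [False, False, False, False, True, False, True]
```

The weighting lets several moderately large deviations on the same side accumulate into a signal (the last two points), much like the classic run rules, while a single point beyond 3 standard deviations signals on its own.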

Review of Data-Driven Multivariate and Multiscale Methods

  • Park, Cheolsoo
    • IEIE Transactions on Smart Processing and Computing / v.4 no.2 / pp.89-96 / 2015
  • In this paper, two time-frequency analysis algorithms, empirical mode decomposition and local mean decomposition, are reviewed, and their applications to nonlinear and nonstationary real-world data are discussed. In addition, their generic extensions to the complex domain are addressed for the analysis of multichannel data. Simulations of these algorithms on synthetic data illustrate the fundamental structure of the algorithms and how they are designed for the analysis of nonlinear and nonstationary data. Applying the complex versions of the algorithms to synthetic data also demonstrates their benefit for accurate frequency decomposition of multichannel data.

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.28 no.1 / pp.89-97 / 2021
  • Because of their boundedness and constant-sum constraint, compositional data are often log-ratio transformed, and the transformed data are then used in traditional binary classification or discriminant analysis. However, directly applying traditional multivariate approaches to the transformed data may be problematic because the class distributions are not Gaussian and the Bayes decision boundaries are not polynomial in the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that the flexible approaches outperform traditional multivariate classification or discriminant analysis.
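
The log-ratio step mentioned above can be sketched with two common transformations, the centered log-ratio (clr) and the additive log-ratio (alr). Which transformation the paper uses is not stated in the abstract, so these are illustrative choices; the function names are hypothetical, and all parts are assumed strictly positive.

```python
import math


def closure(x):
    """Rescale a vector of positive parts so that it sums to 1."""
    total = sum(x)
    return [v / total for v in x]


def clr(x):
    """Centered log-ratio: log(x_j / g(x)), with g the geometric mean of the parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


def alr(x):
    """Additive log-ratio: log(x_j / x_D) for j < D, using the last part as reference."""
    ref = math.log(x[-1])
    return [math.log(v) - ref for v in x[:-1]]


raw = [1.0, 2.0, 7.0]
print(clr(raw))           # unchanged by closure(raw), since only ratios matter
print(alr(closure(raw)))  # D - 1 = 2 unconstrained coordinates
```

Both transformations depend only on ratios between parts, so rescaling a composition leaves them unchanged. The clr coordinates sum to zero, which makes their covariance matrix singular; alr drops one dimension instead. Either output can then be passed to a classifier, and the abstract's argument is that a flexible classifier suits this transformed space better than Gaussian-based discriminant analysis.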

On Assessing Inter-observer Agreement Independent of Variables' Measuring Units

  • Um, Yong-Hwan
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.529-536 / 2006
  • Investigators use either the Euclidean distance or the volume of a simplex composed of data points as an agreement index to measure chance-corrected agreement among observers for multivariate interval data. The agreement coefficient proposed by Um (2004) is based on the volume of a simplex and does not depend on the variables' measuring units. We compare Um's (2004) agreement coefficient with others based on two unit-free distance measures, the Pearson distance and the Mahalanobis distance. The comparison is made using a hypothetical data set.
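
The point about measuring units above can be illustrated by rescaling one variable and comparing distances between two observations. This sketch does not implement Um's (2004) coefficient; it only shows that the Euclidean distance changes with units while the Pearson distance (taken here, as an assumption, to be the Euclidean distance after dividing each variable by its standard deviation) and the Mahalanobis distance do not. The height and weight data and all function names are hypothetical.

```python
import numpy as np


def euclidean(a, b):
    return float(np.linalg.norm(a - b))


def pearson_distance(a, b, sd):
    """Standardized Euclidean distance: each variable divided by its standard deviation."""
    return float(np.linalg.norm((a - b) / sd))


def mahalanobis(a, b, cov):
    """sqrt((a - b)' C^{-1} (a - b)) for covariance matrix C."""
    diff = a - b
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def distances(data, i, j):
    """Distances between rows i and j, with sd and covariance estimated from data."""
    sd = data.std(axis=0, ddof=1)
    cov = np.cov(data, rowvar=False)
    a, b = data[i], data[j]
    return euclidean(a, b), pearson_distance(a, b, sd), mahalanobis(a, b, cov)


# Hypothetical measurements: height in cm and weight in kg.
data = np.array([[170.0, 65.0], [160.0, 58.0], [182.0, 80.0],
                 [175.0, 72.0], [168.0, 61.0]])
in_cm = distances(data, 0, 1)
in_mm = distances(data * np.array([10.0, 1.0]), 0, 1)  # height now recorded in mm
print(in_cm)
print(in_mm)
```

Rescaling a variable by a factor $c$ multiplies its differences, standard deviation, and covariances by matching powers of $c$, which cancel in the Pearson and Mahalanobis distances but not in the Euclidean one.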