• Title/Summary/Keyword: Multivariate Data

A Bayesian Approach for the Analysis of Times to Multiple Events: An Application on Healthcare Data

  • 석준희;강영선
    • 한국경영과학회지 / Vol. 39, No. 4 / pp. 51-69 / 2014
  • Times to multiple events (TMEs) are a major data type in large-scale business and medical data. Despite their importance, TME data have not been well studied because censored observations make the analysis difficult. To address this difficulty, we have developed a Bayesian multivariate survival analysis method that can successfully estimate the joint probability density of survival times. In this work, we extend this method to the analysis of precedence, dependency and causality among multiple events. We applied the method to the electronic health records of 2,111 patients in a children's hospital in the US, and the proposed analysis successfully shows the relation between the times to two types of hospital visits for different medical issues. The overall result implies that the multivariate survival analysis method is useful for large-scale data in a variety of areas, including marketing, human resources, and e-commerce. Lastly, we suggest future research directions based on the multivariate survival analysis method.

Applications of response dimension reduction in large p-small n problems

  • Minjee Kim;Jae Keun Yoo
    • Communications for Statistical Applications and Methods / Vol. 31, No. 2 / pp. 191-202 / 2024
  • The goal of this paper is to show how response dimension reduction facilitates multivariate regression analysis with high-dimensional responses. Multivariate regression, characterized by multi-dimensional response variables, is increasingly prevalent across diverse fields such as repeated measures, longitudinal studies, and functional data analysis. One of the key challenges in analyzing such data is managing the response dimensions, which can complicate the analysis due to an exponential increase in the number of parameters. Although response dimension reduction methods have been developed, there has been no practically useful illustration for various types of data, such as so-called large p-small n data. This paper aims to fill this gap by showcasing how response dimension reduction can enhance the analysis of high-dimensional response data, thereby providing significant assistance to statistical practitioners and contributing to advancements in multiple scientific domains.

Multivariate Process Control Chart for Controlling the False Discovery Rate

  • Park, Jang-Ho;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / Vol. 11, No. 4 / pp. 385-389 / 2012
  • With the development of computer storage and the rapidly growing ability to process large amounts of data, multivariate control charts have received increasing attention. The existing univariate and multivariate control charts take a single-hypothesis-testing approach to the process mean or variance, using a plot of a single statistic. This paper proposes a multiple-hypothesis approach to developing a new multivariate control scheme. The plotted Hotelling's $T^2$ statistics are used to compute the corresponding p-values, and a procedure for controlling the false discovery rate in multiple hypothesis testing is applied to the proposed control scheme. Numerical simulations were carried out to compare the performance of the proposed control scheme with the ordinary multivariate Shewhart chart in terms of the average run length. The results show that the proposed control scheme outperforms the existing multivariate Shewhart chart for all mean shifts.
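
The idea in the abstract above can be illustrated with a minimal sketch; it is not the authors' procedure. It assumes a bivariate process with known in-control mean and covariance, so that $T^2$ follows a chi-squared distribution with 2 degrees of freedom and the p-value has the closed form $e^{-T^2/2}$. It then applies the Benjamini-Hochberg step-up rule as one standard way to control the false discovery rate. All function names and sample observations are hypothetical.

```python
import math

def t2_statistic(x, mean, cov):
    """T^2 for one bivariate observation, assuming known mean and covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), 2x2 inverse written out.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def chi2_sf_2df(t2):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-t2 / 2.0)

def benjamini_hochberg(pvalues, q):
    """Return the sorted indices rejected by the BH step-up rule at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank  # largest rank whose p-value is under its threshold
    return sorted(order[:k])

# Hypothetical in-control parameters and plotted observations.
mean = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 1.0))
observations = [(0.1, 0.2), (3.0, 3.0), (0.5, -0.5), (-4.0, 0.0), (1.0, 1.0)]

pvalues = [chi2_sf_2df(t2_statistic(x, mean, cov)) for x in observations]
flagged = benjamini_hochberg(pvalues, q=0.05)
print(flagged)  # indices of observations signalled as out of control: [1, 3]
```

With estimated rather than known parameters, the reference distribution of $T^2$ changes (a scaled F or beta distribution, depending on the phase), so the p-value function would have to be replaced accordingly.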

Evaluation of Multivariate Stream Data Reduction Techniques

  • 정훈조;서성보;최경주;박정석;류근호
    • 정보처리학회논문지D / Vol. 13D, No. 7 / pp. 889-900 / 2006
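  • Although data characteristics and user requirements in sensor networks vary across application areas, existing research on stream data reduction focuses on improving the performance of particular reduction techniques rather than on the intrinsic characteristics of the data. This paper introduces a hierarchical/distributed sensor network architecture and data model, and comparatively evaluates multivariate data reduction techniques suited to the data characteristics and user requirements, so that reduction techniques can be applied selectively. To compare the performance of multivariate data reduction techniques, we use standard multivariate reduction techniques such as wavelets, HCL (Hierarchical Clustering), SVD (Singular Value Decomposition), and sampling. The experiments use multidimensional time series data and robot sensor data. In the experiments, SVD and sampling outperformed wavelets and HCL in terms of relative error ratio and execution performance. In particular, because the relative error ratio of each reduction technique depends on the characteristics of the input data, applying the reduction techniques selectively gave good performance. This paper will be useful for applications that design and build sensor networks in which multidimensional sensor data are collected.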
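
To make the notion of a relative error for a reduction technique concrete, here is a minimal sketch for one of the techniques named above, sampling. It is an illustration under assumptions, not the paper's evaluation setup: it keeps every `step`-th value of a single channel, rebuilds the series by linear interpolation, and reports the relative error as the ratio of the Euclidean norm of the reconstruction error to the norm of the original. The function names, the interpolation choice, and the error definition are all hypothetical; a multivariate stream would apply the same steps per channel.

```python
import math

def downsample(series, step):
    """Keep every step-th value (systematic sampling)."""
    return series[::step]

def reconstruct(samples, step, length):
    """Rebuild a series of the given length by linear interpolation."""
    out = []
    for t in range(length):
        i, r = divmod(t, step)
        if i + 1 < len(samples):
            left, right = samples[i], samples[i + 1]
            out.append(left + (right - left) * r / step)
        else:
            out.append(samples[i])  # past the last kept sample: hold its value
    return out

def relative_error(original, approx):
    """||original - approx|| / ||original|| in the Euclidean norm."""
    num = math.sqrt(sum((o - a) ** 2 for o, a in zip(original, approx)))
    den = math.sqrt(sum(o ** 2 for o in original))
    return num / den

# A smooth channel is recovered exactly; an oscillating one is not.
smooth = [float(t) for t in range(10)]
spiky = [0.0, 10.0, 0.0, 10.0]
print(relative_error(smooth, reconstruct(downsample(smooth, 3), 3, len(smooth))))
print(relative_error(spiky, reconstruct(downsample(spiky, 2), 2, len(spiky))))
```

The two test series show why the abstract recommends choosing a technique by data characteristics: the same sampling rate loses nothing on a linear trend and everything on a signal that alternates between samples.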

A readjustment procedure in the multivariate integrated process control

  • 조교영;박종숙
    • Journal of the Korean Data and Information Science Society / Vol. 22, No. 6 / pp. 1123-1135 / 2011
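  • The basic procedure of multivariate integrated process control is to apply adjustments to a process with inherent noise so that the process deviation is converted into white noise, thereby minimizing the squared process deviation. When an assignable cause occurs in the process during this adjustment activity, it is detected through a control chart and removed. The adjusted process is white noise before an assignable cause occurs, but afterwards it changes into various forms of time series models. If monitoring of the adjusted process produces a signal of an assignable cause, the cause should be removed through corrective action; however, when correction is structurally impossible or the corrective action is costly, the adjustment activity should instead be readjusted to account for the effect of the assignable cause. Assuming a multivariate IMA(1,1) model as the process model, this paper proposes a readjustment procedure to apply after an out-of-control signal occurs while the multivariate integrated process control procedure is in use.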

A Test for Multivariate Normality Focused on Elliptical Symmetry Using Mahalanobis Distances

  • 박철용
    • 한국데이터정보과학회:학술대회논문집 / 한국데이터정보과학회 2006 PROCEEDINGS OF JOINT CONFERENCE OF KDISS AND KDAS / pp. 203-212 / 2006
  • A chi-squared test of multivariate normality is suggested which is mainly focused on detecting deviations from elliptical symmetry. This test uses Mahalanobis distances of observations to have some power for deviations from multivariate normality. We derive the limiting distribution of the test statistic by a conditional limit theorem. A simulation study is conducted to study the accuracy of the limiting distribution in finite samples. Finally, we compare the power of our method with those of other popular tests of multivariate normality under two non-normal distributions.

Implementation of Integrated Control Chart Using Zone, Multivariate $T^2$ and ARIMA

  • 최성운
    • 대한안전경영과학회:학술대회논문집 / 대한안전경영과학회 2010 Spring Conference / pp. 259-265 / 2010
  • The research discusses the implementation of the control chart tools of MINITAB, which are classified according to the type of data and the existence of subgrouping, weight and multivariate covariance. The paper presents three integrated models that use zone, multivariate $T^2$-GV (Generalized Variance) and ARIMA (Autoregressive Integrated Moving Average).

Review of Data-Driven Multivariate and Multiscale Methods

  • Park, Cheolsoo
    • IEIE Transactions on Smart Processing and Computing / Vol. 4, No. 2 / pp. 89-96 / 2015
  • In this paper, two time-frequency analysis algorithms, empirical mode decomposition and local mean decomposition, are reviewed, and their applications to nonlinear and nonstationary real-world data are discussed. In addition, their generic extensions to the complex domain are addressed for the analysis of multichannel data. Simulations of these algorithms on synthetic data illustrate the fundamental structure of the algorithms and how they are designed for the analysis of nonlinear and nonstationary data. Applications of the complex versions of the algorithms to the synthetic data also demonstrate the benefit of the algorithms for the accurate frequency decomposition of multichannel data.

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 28, No. 1 / pp. 89-97 / 2021
  • Because of their boundedness and sum constraint, compositional data are often transformed by a logratio transformation, and the transformed data are then passed to traditional binary classification or discriminant analysis. However, it may be problematic to apply traditional multivariate approaches directly to the transformed data, because the class distributions are not Gaussian and the Bayes decision boundary is not polynomial in the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification or discriminant analysis.
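
The abstract does not say which logratio transformation is used. As an illustration only, the sketch below implements the centered logratio (clr) transform, one common choice, which maps a composition with strictly positive parts to log-parts centered by their mean. The function names are hypothetical, and zero parts are assumed absent because their logarithm is undefined.

```python
import math

def closure(parts):
    """Rescale positive parts so they sum to 1 (the sum constraint)."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(parts):
    """Centered logratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# The transform depends only on ratios, so rescaling the input changes nothing.
z1 = clr([0.2, 0.3, 0.5])
z2 = clr([2.0, 3.0, 5.0])
print(z1)
print(z2)
```

The transformed coordinates sum to zero, so they lie in a hyperplane; any classifier is then trained on these coordinates rather than on the raw proportions.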

On Assessing Inter-observer Agreement Independent of Variables' Measuring Units

  • Um, Yong-Hwan
    • Journal of the Korean Data and Information Science Society / Vol. 17, No. 2 / pp. 529-536 / 2006
  • Investigators use either the Euclidean distance or the volume of a simplex formed by the data points as an agreement index to measure chance-corrected agreement among observers for multivariate interval data. The agreement coefficient proposed by Um (2004) is based on the volume of a simplex and does not depend on the variables' measuring units. We compare Um (2004)'s agreement coefficient with others based on two unit-free distance measures, the Pearson distance and the Mahalanobis distance. The comparison is made using a hypothetical data set.