• Title/Summary/Keyword: methods of data analysis

Perspectives and Current Developments for NVH Data Acquisition and Analysis

  • Hobelsberger, Josef / Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2012.04a / pp.439-440 / 2012
  • New analysis methods complement classical approaches in vehicle NVH development by reducing and accelerating the iteration steps needed to obtain a target sound. This requires tools that allow an integrative approach to sound engineering and structural analysis and enable precise simulation and modification based on measured data. Response Modification Analysis (RMA) is such a hybrid solution: it identifies relevant transfer paths by taking into account the sensitivity of response channels to modifications of reference channels.
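The transfer-path idea behind this abstract can be illustrated with a toy model. This is a minimal sketch with made-up transfer-function magnitudes and source amplitudes, not the RMA method itself: the response at a target channel is a weighted sum of path contributions, and the sensitivity to modifying a reference channel is the corresponding transfer function.

```python
import numpy as np

# Toy transfer-path model (illustrative values, not RMA): the response at a
# target channel is the sum of path contributions, each a transfer-function
# magnitude times a source amplitude at a reference channel.
frf = np.array([0.8, 0.3, 0.1])      # transfer functions of three paths
source = np.array([1.0, 2.0, 5.0])   # source amplitudes at reference channels

response = frf @ source              # total response at the target channel

# Sensitivity of the response to modifying each reference channel: the
# partial derivative w.r.t. each source amplitude is the transfer function.
sensitivity = frf.copy()

# Ranking paths by contribution points to the dominant transfer path.
contribution = frf * source
dominant = int(np.argmax(contribution))
```

Modifying the dominant path's source or transfer function changes the response most, which is the kind of indication a transfer-path analysis provides.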

Experimental Analysis of Equilibrization in Binary Classification for Non-Image Imbalanced Data Using Wasserstein GAN

  • Wang, Zhi-Yong; Kang, Dae-Ki / International Journal of Internet, Broadcasting and Communication / v.11 no.4 / pp.37-42 / 2019
  • In this paper, we explore three classic data augmentation methods and two oversampling methods based on generative models. The three classic methods are random sampling (RANDOM), the Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). The two generative approaches are the Conditional Generative Adversarial Network (CGAN) and the Wasserstein Generative Adversarial Network (WGAN). In imbalanced data, the instances are divided into a majority class, which occupies most of the training set, and a minority class, which includes only a few instances. Generative models have the advantage of producing more plausible samples that follow the distribution of the minority class, and we adopt CGAN to compare its augmentation performance with the other methods. The experimental results show that WGAN-based oversampling is more stable than the other approaches (RANDOM, SMOTE, ADASYN, and CGAN), even with very limited training data. However, when the imbalance ratio is too small, the generative approaches cannot match the performance of the conventional data augmentation techniques. These results suggest a direction for future research.
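The core SMOTE step named above can be sketched in a few lines of plain NumPy. This is a simplified stand-in for library implementations such as imbalanced-learn, with made-up data and parameters: each synthetic minority sample interpolates between a minority instance and one of its k nearest minority neighbours.

```python
import numpy as np

# Minimal SMOTE sketch (illustrative, not a library implementation):
# each synthetic sample is a random interpolation between a minority
# instance and one of its k nearest minority neighbours.
def smote(X_min, n_new, k=3, rng=None):
    rng = np.random.default_rng(rng)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbours
    samples = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))           # pick a minority instance
        j = rng.choice(neighbours[i])          # pick one of its neighbours
        lam = rng.random()                     # interpolation weight in [0, 1)
        samples.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(samples)

# four minority-class points at the corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote(X_min, n_new=5, k=2, rng=0)
```

Because every synthetic point lies on a segment between two minority points, it stays inside the convex hull of the minority class, which is what makes the samples "plausible" in the sense discussed above.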

Operation Modes Classification of Chemical Processes for History Data-Based Fault Diagnosis Methods (데이터 기반 이상진단법을 위한 화학공정의 조업모드 판별)

  • Lee, Chang Jun; Ko, Jae Wook; Lee, Gibaek / Korean Chemical Engineering Research / v.46 no.2 / pp.383-388 / 2008
  • The safe and efficient operation of chemical processes has become one of the primary concerns of chemical companies, and a variety of fault diagnosis methods have been developed to diagnose faults when abnormal situations arise. Recently, many research efforts have focused on quantitative history data-based methods such as statistical models. However, when history data-based models trained on data from one operation mode are applied to another operating condition, they can repeatedly produce wrong diagnoses, which limits their application to real chemical processes with various operation modes. To classify the operation modes of chemical processes, this study considers three multivariate models, Euclidean distance, FDA (Fisher's discriminant analysis), and PCA (principal component analysis), and integrates them with process dynamics to obtain dynamic Euclidean distance, dynamic FDA, and dynamic PCA. A case study of the TE (Tennessee Eastman) process, which has six operation modes, shows that the dynamic PCA model gives the best classification performance.
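The dynamic PCA idea can be sketched on synthetic data (illustrative only, not the Tennessee Eastman process): lagged measurements are appended to each observation, a PCA model is fitted per operation mode, and a new observation is assigned to the mode whose principal subspace reconstructs it with the smallest error.

```python
import numpy as np

def add_lags(X, lags=1):
    # stack x(t), x(t-1), ..., x(t-lags) into one augmented row
    return np.hstack([X[lags - i:len(X) - i] for i in range(lags + 1)])

def fit_pca(X, n_comp):
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:n_comp]                       # mean and principal directions

def recon_error(x, model):
    mu, v = model
    xc = x - mu
    return np.linalg.norm(xc - (xc @ v.T) @ v)   # distance to the PCA subspace

rng = np.random.default_rng(0)
# two synthetic "operation modes": same dynamics, different operating levels
mode_a = rng.normal(0.0, 0.1, size=(200, 3))
mode_b = rng.normal(5.0, 0.1, size=(200, 3))
models = [fit_pca(add_lags(m), 2) for m in (mode_a, mode_b)]

x_new = add_lags(mode_b[:2])[0]                  # an observation from mode B
mode = int(np.argmin([recon_error(x_new, m) for m in models]))
```

The lag augmentation is what makes the PCA "dynamic": correlations between consecutive samples enter the model, not just cross-correlations among variables at one time instant.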

Higher-order solutions for generalized canonical correlation analysis

  • Kang, Hyuncheol / Communications for Statistical Applications and Methods / v.26 no.3 / pp.305-313 / 2019
  • Generalized canonical correlation analysis (GCCA) extends canonical correlation analysis (CCA) to more than two sets of variables, and there have been many studies on how two-set canonical solutions can be generalized. In this paper, we derive stationary equations that lead to the higher-order solutions of several GCCA methods and suggest an iterative procedure for obtaining the canonical coefficients. In addition, with some numerical examples we present methods for graphical display, which are useful for interpreting the GCCA results obtained.
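The iterative flavour of such procedures can be sketched as follows. This is a generic Horst-style sum-of-correlations iteration on synthetic data with a shared latent signal, not the stationary equations derived in the paper: the block scores are alternately regressed on a common score and averaged until convergence.

```python
import numpy as np

# Generic first-order GCCA iteration (Horst-style sketch, illustrative only):
# alternate between regressing each block on the common score and averaging
# the normalized block scores.
def gcca_first_order(blocks, n_iter=100):
    blocks = [b - b.mean(axis=0) for b in blocks]
    n = blocks[0].shape[0]
    t = np.random.default_rng(0).normal(size=n)       # common score
    for _ in range(n_iter):
        scores = []
        for X in blocks:
            w = np.linalg.lstsq(X, t, rcond=None)[0]  # block weights
            s = X @ w                                 # block canonical variate
            scores.append(s / np.linalg.norm(s))
        t = np.mean(scores, axis=0)
        t /= np.linalg.norm(t)
    return t

rng = np.random.default_rng(1)
z = rng.normal(size=100)                              # shared latent signal
blocks = [np.column_stack([z + 0.1 * rng.normal(size=100),
                           rng.normal(size=100)]) for _ in range(3)]
t = gcca_first_order(blocks)
# the common score should correlate strongly with the shared signal
corr = abs(np.corrcoef(t, z)[0, 1])
```

Higher-order solutions, the paper's topic, would be obtained by deflating each block against the first canonical variate and repeating such an iteration under orthogonality constraints.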

PLS Path Modeling to Investigate the Relations between Competencies of Data Scientist and Big Data Analysis Performance : Focused on Kaggle Platform (데이터 사이언티스트의 역량과 빅데이터 분석성과의 PLS 경로모형분석 : Kaggle 플랫폼을 중심으로)

  • Han, Gyeong Jin; Cho, Keuntae / Journal of Korean Institute of Industrial Engineers / v.42 no.2 / pp.112-121 / 2016
  • This paper focuses on the competencies of data scientists and the behavioral intentions that affect big data analysis performance. The study examined nine core factors required of data scientists. To investigate this, we surveyed 103 data scientists who participated in big data competitions on the Kaggle platform and used factor analysis and PLS-SEM as the analysis methods. The results show that several key competency factors have a significant effect on big data analysis performance. This study provides a new theoretical basis for related research by analyzing the structural relationship between individual competencies and performance, and, practically, identifies the priorities among the core competencies that data scientists must have.
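PLS-SEM estimates a full path model among latent variables, which is beyond a short sketch; as an illustration of the underlying partial least squares step only, here is a one-target NIPALS PLS regression in plain NumPy linking synthetic competency indicators to a performance outcome (all variable names and data are made up, not the paper's survey):

```python
import numpy as np

# One-target NIPALS PLS regression (a building block of PLS path modeling;
# illustrative sketch on synthetic data).
def pls1(X, y, n_comp=2):
    X, y = X - X.mean(axis=0), y - y.mean()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector
        t = X @ w                       # latent score
        p = X.T @ t / (t @ t)           # X loadings
        q = y @ t / (t @ t)             # y loading
        X = X - np.outer(t, p)          # deflate
        y = y - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # regression coefficients in the original X space
    return W @ np.linalg.inv(P.T @ W) @ Q

rng = np.random.default_rng(0)
skill = rng.normal(size=(103, 1))                    # latent "competency"
X = skill @ np.ones((1, 4)) + 0.2 * rng.normal(size=(103, 4))  # 4 indicators
y = skill[:, 0] + 0.1 * rng.normal(size=103)         # analysis performance
beta = pls1(X, y)
pred = (X - X.mean(axis=0)) @ beta + y.mean()
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

In full PLS-SEM this kind of latent-score extraction is applied per construct and combined with inner path estimation among the constructs.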

Local linear regression analysis for interval-valued data

  • Jang, Jungteak; Kang, Kee-Hoon / Communications for Statistical Applications and Methods / v.27 no.3 / pp.365-376 / 2020
  • Interval-valued data, a type of symbolic data, arises when an observation is given as an interval rather than a single value. Such data can also occur frequently when large databases are aggregated into a form that is easy to manage. Various regression methods for interval-valued data have been proposed relatively recently. In this paper, we introduce a nonparametric regression model using a kernel function and a nonlinear regression model for interval-valued data, and we propose applying the local linear regression model, one of the nonparametric methods, to such data. Simulations based on several distributions of the center point and the range are conducted for each of the methods presented. Under various conditions, the proposed local linear estimator performs better than the others.
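One common way to apply a smoother to interval-valued data is the centre-and-range idea: fit the smoother separately to the interval centres and to the interval ranges. The sketch below does this with a Gaussian-kernel local linear smoother on synthetic data; the paper's exact simulation settings are not reproduced here.

```python
import numpy as np

# Local linear smoother: kernel-weighted least-squares line through x0.
def local_linear(x, y, x0, h=0.3):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)     # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]                             # fitted value at x0

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
centre = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=100)
half_range = 0.2 + 0.1 * x + 0.01 * rng.normal(size=100)

x0 = 0.25
c_hat = local_linear(x, centre, x0, h=0.05)        # smoothed centre
r_hat = local_linear(x, half_range, x0, h=0.05)    # smoothed half-range
interval_hat = (c_hat - r_hat, c_hat + r_hat)      # predicted interval
```

Smoothing centres and ranges separately keeps the predicted lower bound below the upper bound as long as the fitted half-range stays positive, which is one design consideration the interval-valued regression literature discusses.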

Bayesian pooling for contingency tables from small areas

  • Jo, Aejung; Kim, Dal Ho / Journal of the Korean Data and Information Science Society / v.27 no.6 / pp.1621-1629 / 2016
  • This paper studies Bayesian pooling for the analysis of categorical data from small areas. Many surveys consist of categorical data collected in a contingency table for each area. Statistical inference for small areas requires considerable care because the subpopulation sample sizes are usually very small. Typically, a hierarchical Bayesian model is used to pool subpopulation data; however, customary hierarchical Bayesian models may specify more exchangeability than is warranted. We therefore investigate the effects of pooling in hierarchical Bayesian modeling of contingency tables from small areas. Specifically, this paper focuses on methods of direct and indirect pooling of the categorical data through Dirichlet priors. We compare the pooling effects of the hierarchical Bayesian models by fitting simulated data. The analysis is carried out using Markov chain Monte Carlo methods.
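The contrast between no pooling and direct pooling under a conjugate Dirichlet prior can be shown in a few lines. This is an illustrative sketch with made-up counts; the paper's hierarchical, indirectly pooled models require MCMC rather than this closed-form update.

```python
import numpy as np

# Direct pooling for small-area contingency tables with a conjugate
# Dirichlet prior (illustrative counts, not the paper's simulation).
counts = np.array([[2, 1, 0],        # area 1: cell counts of a 1x3 table
                   [1, 0, 1],        # area 2
                   [3, 2, 1]])       # area 3

alpha = np.ones(3)                   # uniform Dirichlet prior

# No pooling: each area gets its own Dirichlet posterior.
post_area = alpha + counts                            # per-area parameters
p_area = post_area / post_area.sum(axis=1, keepdims=True)

# Direct pooling: all areas are assumed to share one table, so counts sum.
post_pool = alpha + counts.sum(axis=0)
p_pool = post_pool / post_pool.sum()                  # pooled posterior mean
```

Direct pooling assumes full exchangeability across areas; the hierarchical models the paper studies sit between these two extremes, shrinking each area's estimate toward the pooled one.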

Bayesian Pattern Mixture Model for Longitudinal Binary Data with Nonignorable Missingness

  • Kyoung, Yujung; Lee, Keunbaik / Communications for Statistical Applications and Methods / v.22 no.6 / pp.589-598 / 2015
  • In longitudinal studies missing data are common and complicate the analysis. There are two popular modeling frameworks for analyzing missing data: pattern mixture models (PMM) and selection models (SM). We focus on the PMM and propose Bayesian pattern mixture models using generalized linear mixed models (GLMMs) for longitudinal binary data. Sensitivity analysis is used under the missing-not-at-random assumption.
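The pattern-mixture idea can be caricatured without the Bayesian GLMM machinery: stratify subjects by missingness (dropout) pattern, estimate within each pattern, and average the pattern-specific estimates with the pattern probabilities. This sketch uses synthetic data and simple means, not the paper's model.

```python
import numpy as np

# Pattern-mixture sketch on synthetic longitudinal binary data
# (illustrative only; the paper fits Bayesian GLMMs).
rng = np.random.default_rng(0)

# binary outcomes for 200 subjects at 3 visits; the last observed visit
# defines the dropout pattern (NaN marks missing values)
y = rng.binomial(1, 0.6, size=(200, 3)).astype(float)
dropout = rng.integers(1, 4, size=200)            # last observed visit: 1..3
for t in range(3):
    y[dropout <= t, t] = np.nan

patterns = [y[dropout == d] for d in (1, 2, 3)]
weights = np.array([len(p) for p in patterns]) / len(y)

# pattern-specific mean at visit 1 (observed in every pattern), then the
# marginal estimate as the pattern-probability weighted average
means = np.array([np.nanmean(p[:, 0]) for p in patterns])
marginal = weights @ means
```

Because later visits are unidentified in the early-dropout patterns, assumptions must link them to the observed patterns; varying those assumptions is exactly the sensitivity analysis mentioned in the abstract.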

A Study on the Comparison of Optimal Solutions by Major Forecasting Methods - For the case of the cement product - (주요(主要) 수요예측기법(需要豫測技法)에 의한 최적해(最適解)의 비교연구(比較硏究) - 시멘트제품(製品)의 경우(境遇)를 중심(中心)으로 -)

  • Jeong, Bok-Su / Journal of Korean Society for Quality Management / v.12 no.2 / pp.25-32 / 1984
  • The purpose of this paper is to compare several forecasting methods for the case of the cement product through an analysis of the forecasting data and a study of the major forecasting methods: Trend Projection, Exponential Smoothing, and Multiple Regression Analysis. The results suggest that Multiple Regression Analysis is the optimal model for the cement product. In addition, it is important to consider future circumstances when forecasting and to improve the forecasting results through precise analysis of the collected data.
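The comparison described above can be sketched on synthetic monthly data (illustrative only; the cement data and the paper's exact model specifications are not available here): a least-squares trend line, simple exponential smoothing, and a regression on an explanatory variable are scored by mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
t = np.arange(n, dtype=float)
# hypothetical explanatory variable and demand series (made up)
construction = 100 + 2 * t + 5 * rng.normal(size=n)
demand = 10 + 0.5 * construction + 2 * rng.normal(size=n)

def mse(pred):
    return np.mean((demand - pred) ** 2)

# trend projection: least-squares line in time
trend = np.polyval(np.polyfit(t, demand, 1), t)

# simple exponential smoothing (one-step-ahead forecasts)
alpha, ses = 0.3, np.empty(n)
ses[0] = demand[0]
for i in range(1, n):
    ses[i] = alpha * demand[i - 1] + (1 - alpha) * ses[i - 1]

# multiple regression on the explanatory variable
X = np.column_stack([np.ones(n), construction])
beta = np.linalg.lstsq(X, demand, rcond=None)[0]
reg = X @ beta

errors = {"trend": mse(trend), "ses": mse(ses), "regression": mse(reg)}
best = min(errors, key=errors.get)
```

When demand is driven by an observable explanatory variable, regression captures variation the purely time-based methods cannot, which mirrors the paper's conclusion for the cement case.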

Advanced Big Data Analysis, Artificial Intelligence & Communication Systems

  • Jeong, Young-Sik; Park, Jong Hyuk / Journal of Information Processing Systems / v.15 no.1 / pp.1-6 / 2019
  • Recently, big data and artificial intelligence (AI) based on communication systems have become among the hottest issues in the technology sector, and methods of analyzing big data using AI approaches are now considered essential. This paper presents diverse paradigms across research areas such as image segmentation, fingerprint matching, human tracking techniques, malware distribution networks, intrusion detection methods, digital image watermarking, wireless sensor networks, probabilistic neural networks, query processing of encrypted data, the semantic web, decision-making, software engineering, and so on.