• Title/Summary/Keyword: data correlation


Correlation Measure for Big Data (빅데이터에서의 상관성 측도)

  • Jeong, Hai Sung
    • Journal of Applied Reliability / v.18 no.3 / pp.208-212 / 2018
  • Purpose: The three Vs (volume, velocity, and variety) are commonly used to characterize different aspects of Big Data. Volume refers to the amount of data, variety to the number of types of data, and velocity to the speed of data processing. Given these characteristics, the size of Big Data varies rapidly, some data buckets will contain outliers, and buckets may differ in size. Correlation plays a large role in Big Data, so a measure better suited to it than the usual correlation measures is needed. Methods: The correlation measures offered by traditional statistics are compared, and conditions that a measure must meet to suit the characteristics of Big Data are suggested. Finally, the correlation measure that satisfies the suggested conditions is recommended. Results: Mutual Information satisfies the suggested conditions. Conclusion: This article builds on traditional correlation measures to analyze the correlation between two variables. Conditions that correlation measures must meet to suit the characteristics of Big Data are suggested, and the measure that satisfies them, Mutual Information, is recommended.
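
To illustrate the measure this abstract recommends, here is a minimal sketch of a plug-in estimate of mutual information for paired discrete samples. The function name and the toy data are assumptions made for illustration; the article's own estimator and conditions are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: y depends on x through a symmetric, nonlinear rule,
# so a linear correlation coefficient would be zero while MI is not.
xs = [-2, -1, 0, 1, 2] * 20
ys = [x * x for x in xs]
print(mutual_information(xs, ys))
```

Continuous variables would first need binning or a different estimator, which this sketch does not cover.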

Nonlinear Canonical Correlation Analysis for Paralysis Disease Data

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.515-521 / 2004
  • Categorical data are common in oriental medical research. Nonlinear canonical correlation analysis does not assume an interval level of measurement. In this paper, we apply nonlinear canonical correlation analysis to quantify paralysis disease data and explain how similar the sets of variables are to one another.


Secure Multi-Party Computation of Correlation Coefficients (상관계수의 안전한 다자간 계산)

  • Hong, Sun-Kyong;Kim, Sang-Pil;Lim, Hyo-Sang;Moon, Yang-Sae
    • Journal of KIISE / v.41 no.10 / pp.799-809 / 2014
  • In this paper, we address the problem of computing Pearson correlation coefficients and Spearman's rank correlation coefficients securely, so that data providers preserve the privacy of their own data in a distributed environment. For data mining or data analysis in a distributed environment, data providers (data owners) need to share their original data with each other. However, the original data often contain very sensitive information, so data providers prefer not to disclose them. In this paper, we formally define secure correlation computation (SCC) as the problem of computing correlation coefficients in a distributed computing environment while preserving the data privacy of multiple data providers (i.e., without disclosing the sensitive data). We then present SCC solutions for Pearson and Spearman's correlation coefficients using the secure scalar product. We show the correctness and security of the proposed solutions by presenting theorems and proving them formally. We also show empirically that the proposed solutions perform well enough for practical applications.
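
The sketch below shows why a scalar product is a natural building block here: when one party holds x and another holds y, the Pearson coefficient needs only per-party sums plus the single cross term, the scalar product of x and y. This is not the paper's SCC protocol. `scalar_product` is a plain, insecure stand-in for a secure scalar product, and all names are hypothetical. Whether the per-party summaries may be revealed is a question this sketch does not address.

```python
import math


def scalar_product(x, y):
    # Stand-in only: a plain dot product, NOT a secure protocol.
    return sum(a * b for a, b in zip(x, y))


def local_summary(values):
    """Quantities one party can compute alone: n, sum, sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)


def pearson_from_summaries(summary_x, summary_y, sxy):
    """Combine per-party summaries with the cross term sxy = x . y."""
    n, sx, sxx = summary_x
    n_y, sy, syy = summary_y
    if n != n_y:
        raise ValueError("both parties must hold the same number of records")
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return numerator / denominator


# Hypothetical data held by two different providers.
x_party_a = [1, 2, 3, 4]
y_party_b = [2, 4, 6, 8]
r = pearson_from_summaries(
    local_summary(x_party_a),
    local_summary(y_party_b),
    scalar_product(x_party_a, y_party_b),
)
print(r)
```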

A Study on Prediction of Linear Relations Between Variables According to Working Characteristics Using Correlation Analysis

  • Kim, Seung Jae
    • International Journal of Internet, Broadcasting and Communication / v.14 no.4 / pp.228-239 / 2022
  • Many countries around the world are adopting a variety of ICT technologies to keep pace with the 4th industrial revolution, and various algorithms and systems have been developed accordingly. Among them, many industries and researchers are investing in unmanned automation systems based on AI. As new technologies and algorithms are developed, decision-making based on big data analysis applied to AI systems must become more sophisticated. We apply Pearson's correlation analysis to six independent variables to examine the job satisfaction that office workers feel according to their job characteristics. First, a correlation coefficient is obtained to find the degree of correlation for each variable. Second, the presence or absence of correlation in the data is verified through hypothesis testing. Third, after visualization based on the size of the correlation coefficient, the degree of correlation between the data is investigated. Fourth, the degree of correlation between variables is verified based on the correlation coefficient obtained through the experiment and the results of the hypothesis test.
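
The first two steps described above (obtaining a coefficient, then testing it) can be sketched as follows. The data are hypothetical, not the study's survey variables. The test statistic is the standard t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which is assumed here because the abstract does not name the test used.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no linear correlation; df = n - 2 (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical scores for one job characteristic and job satisfaction.
characteristic = [1, 2, 3, 4, 5]
satisfaction = [2, 1, 4, 3, 5]
r = pearson_r(characteristic, satisfaction)
t = t_statistic(r, len(characteristic))
print(r, t)
```

The t value is then compared with a Student t critical value (or converted to a p-value) at the chosen significance level.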

A study on Applicability through Comparison of Weather Data based on Micro-climate with existing Weather Data for Building Performative Design (건물 성능디자인을 위한 미기후 기반 기상데이터의 기존 기상데이터와 비교를 통한 활용 가능성 연구)

  • Kim, Eon-Yong;Jun, Han-Jong
    • KIEAE Journal / v.11 no.6 / pp.101-108 / 2011
  • Weather data play an important role in performative building design. The closer the data location is to the building site, the more accurate the result of performative design can be. The data currently used in Korea come from the U.S. Department of Energy (DOE) and the Korea Solar Energy Society (KSES), but they cover only a few locations in Korea (4 for DOE and 11 for KSES), although some hold the opinion that they can serve building design effectively even though the data are limited. However, weather data for micro-climates do exist: Green Building Studio Virtual Weather Station (GBS VWS) and Meteonorm weather data. The weather data sets are generated by different methods: TMY2, TRY, MM5, and extrapolation. In this research, the micro-climate weather data are compared with the DOE and KSES data to check their correlation. The results show that the correlation for Dry Bulb Temp. and Dew Point Temp. is around 0.9, so both are highly correlated, but for Wind Speed there is effectively no correlation (around 0.2). Overall, the data correlate with DOE and KSES, with correlation values of 0.648 for GBS VWS and 0.656 for Meteonorm. Even though the correlation value is not high, the patterns of difference in each weather element are similar in the scatter plots.

CMP cross-correlation analysis of multi-channel surface-wave data

  • Hayashi Koichi;Suzuki Haruhiko
    • Geophysics and Geophysical Exploration / v.7 no.1 / pp.7-13 / 2004
  • In this paper, we demonstrate that Common Mid-Point (CMP) cross-correlation gathers of multi-channel and multi-shot surface waves give accurate phase-velocity curves, and enable us to reconstruct two-dimensional (2D) velocity structures with high resolution. Data acquisition for CMP cross-correlation analysis is similar to acquisition for a 2D seismic reflection survey. Data processing seems similar to Common Depth-Point (CDP) analysis of 2D seismic reflection survey data, but differs in that the cross-correlation of the original waveform is calculated before making CMP gathers. Data processing in CMP cross-correlation analysis consists of the following four steps: First, cross-correlations are calculated for every pair of traces in each shot gather. Second, correlation traces having a common mid-point are gathered, and those traces that have equal spacing are stacked in the time domain. The resultant cross-correlation gathers resemble shot gathers and are referred to as CMP cross-correlation gathers. Third, a multi-channel analysis is applied to the CMP cross-correlation gathers for calculating phase velocities of surface waves. Finally, a 2D S-wave velocity profile is reconstructed through non-linear least squares inversion. Analyses of waveform data from numerical modelling and field observations indicate that the new method could greatly improve the accuracy and resolution of subsurface S-velocity structure, compared with conventional surface-wave methods.
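
The first two processing steps described above (pairwise cross-correlation within each shot gather, then gathering by common mid-point and stacking equal spacings) can be sketched roughly as below. The data layout (each shot as a dict from receiver position to trace) and all names are assumptions for illustration. Normalization, the multi-channel analysis, and the inversion are not covered.

```python
from collections import defaultdict


def cross_correlate(a, b):
    """Time-domain cross-correlation c[lag] = sum_t a[t] * b[t + lag].

    Returns values for lag = -(n-1) .. n-1, so index = lag + (n - 1).
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("traces must have equal length")
    out = []
    for lag in range(-(n - 1), n):
        total = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                total += a[t] * b[u]
        out.append(total)
    return out


def cmp_cross_correlation_gathers(shots):
    """Group pairwise correlations by midpoint and stack equal spacings."""
    gathers = defaultdict(dict)  # midpoint -> {spacing: stacked correlation}
    for shot in shots:
        positions = sorted(shot)
        for i, xa in enumerate(positions):
            for xb in positions[i + 1:]:
                cc = cross_correlate(shot[xa], shot[xb])
                midpoint = (xa + xb) / 2
                spacing = xb - xa
                stacked = gathers[midpoint].get(spacing)
                if stacked is None:
                    gathers[midpoint][spacing] = cc
                else:
                    gathers[midpoint][spacing] = [s + c for s, c in zip(stacked, cc)]
    return gathers


# Hypothetical shot gather: an impulse arriving one sample later per receiver.
shot = {0: [1, 0, 0, 0], 1: [0, 1, 0, 0], 2: [0, 0, 1, 0]}
gathers = cmp_cross_correlation_gathers([shot, shot])
```

In this toy case the correlation for the pair at positions 0 and 2 peaks at a lag of two samples, and stacking the two identical shots doubles its amplitude.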

An Analysis of Correlation between Personality and Visiting Place using Spearman's Rank Correlation Coefficient

  • Song, Ha Yoon;Park, Seongjin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.1951-1966 / 2020
  • Recent advancements in mobile device technology have enabled real-time positioning, so that people's mobility patterns and favored locations can be identified, and related research has become plentiful. One field of research is the relationship between object properties and the favored locations to visit. The object properties of a person include personality, which is a major property, as well as job, income, gender, and age. In this study, we analyzed the relationship between human personality and the preference for locations to visit. We used Spearman's rank correlation coefficient, one of the many methods that can be used to determine the correlation between two variables. Instead of using actual data values, Spearman's rank correlation coefficient deals with the ranks of the two data sets. In our research, the personality and location data sets are used. Our personality data are ranked in five ranks and the location data in eight ranks. Spearman's rank correlation coefficient showed better results than the Pearson linear correlation coefficient and the Kendall rank correlation coefficient. Using Spearman's correlation coefficient, the degree of the relationship between personality and location preference is found to be 43%.
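
As a rough illustration of working with ranks rather than raw values, the sketch below computes Spearman's coefficient as the Pearson correlation of average ranks. Ties share the mean of their positions, which matters when data fall into only a few ranks. The function names and sample values are hypothetical and are not the study's data.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: a monotone but nonlinear relation gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```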

A New Estimation Model for Wireless Sensor Networks Based on the Spatial-Temporal Correlation Analysis

  • Ren, Xiaojun;Sug, HyonTai;Lee, HoonJae
    • Journal of information and communication convergence engineering / v.13 no.2 / pp.105-112 / 2015
  • The estimation of missing sensor values is an important problem in sensor network applications, but existing approaches are limited in application scope and estimation accuracy. Therefore, in this paper, we propose a new estimation model based on spatial-temporal correlation analysis (STCAM). STCAM makes full use of spatial and temporal correlations and can recognize whether the sensor parameters have a spatial or a temporal correlation, and whether the missing sensor data are continuous. According to the recognition results, STCAM chooses the most suitable algorithm to estimate the missing sensor data from among the linear interpolation algorithm of temporal correlation analysis (TCA-LI), the multiple regression algorithm of temporal correlation analysis (TCA-MR), spatial correlation analysis (SCA), and spatial-temporal correlation analysis (STCA). STCAM was evaluated on the Intel lab dataset and a traffic dataset, and the simulation results show that STCAM has good estimation accuracy.
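
To give a feel for the temporal side of such estimation, here is a generic linear interpolation over gaps in a single sensor's time series. It is a plain textbook interpolation under assumed names, not the paper's TCA-LI algorithm, and it does not include STCAM's selection logic or any spatial estimation.

```python
def interpolate_missing(readings):
    """Fill None gaps by linear interpolation between the nearest known values.

    Gaps at the start or end have no neighbour on one side and stay None.
    """
    filled = list(readings)
    known = [i for i, v in enumerate(readings) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            frac = (i - left) / gap  # position of i inside the gap, 0..1
            filled[i] = readings[left] + frac * (readings[right] - readings[left])
    return filled


# Hypothetical temperature readings with two consecutive missing samples.
print(interpolate_missing([20.0, None, None, 23.0, None]))
```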

Correlation Analysis for Correlation Dimension of EEG and Cold-heat Score (뇌파의 상관차원과 한열설문지와의 상관분석)

  • Bas, No-Soo;Park, Young-Jae;Oh, Hwan-Sup;Park, Young-Bae
    • The Journal of the Society of Korean Medicine Diagnostics / v.11 no.2 / pp.116-127 / 2007
  • Background and Purpose: According to chaos theory, irregular electroencephalogram (EEG) signals can be interpreted by nonlinear methods. Chaotic nonlinear dynamics in EEG can be studied by calculating the correlation dimension. The aim of this study is to analyze EEG using the correlation dimension and to perform a correlation analysis between the correlation dimension and the cold-heat score. Method: Raw EEG data were measured for 15 minutes, and 40 seconds were selected. We calculated the correlation dimension and used the surrogate data method to check for nonlinearity in the data. We then performed a correlation analysis. Result and Conclusion: The correlation dimensions of channel 7 and channel 8 showed significant correlation with the cold score.
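
For readers unfamiliar with the correlation dimension, the sketch below follows the common Grassberger-Procaccia idea: embed the signal with time delays, count the fraction of point pairs closer than a radius r (the correlation sum), and estimate the dimension from the slope of log C(r) against log r. The abstract does not state the embedding parameters or radii used, so every parameter and name here is an assumption.

```python
import math


def delay_embed(series, dim, tau):
    """Time-delay embedding: points (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    count = len(series) - (dim - 1) * tau
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(count)]


def correlation_sum(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, r_small, r_large):
    """Two-radius slope of log C(r) vs log r; both sums must be non-zero."""
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small)
    )


# Hypothetical check: points spread evenly along a line should give a
# dimension estimate close to 1.
line = delay_embed([float(i) for i in range(200)], dim=1, tau=1)
print(correlation_dimension(line, 10.5, 20.5))
```

In practice the slope is usually fitted over a range of radii rather than taken from two points, and the pairwise loop is quadratic in the number of samples.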


Correlation over Nonlinear Analysis of EEG and POMS Factor (뇌파와 POMS(Profile of Mood States)의 상관성 연구)

  • Kim, Dong-Won;Park, Young-Bae;Park, Young-Jae;Heo, Young
    • The Journal of the Society of Korean Medicine Diagnostics / v.11 no.2 / pp.68-83 / 2007
  • Background and Purpose: According to chaos theory, irregular electroencephalogram (EEG) signals can be interpreted by nonlinear methods. Chaotic nonlinear dynamics in EEG can be studied by calculating the correlation dimension. The aim of this study is to analyze EEG using the correlation dimension and to perform a correlation analysis between the correlation dimension and the K-POMS factor scores. Method: Raw EEG data were measured for 15 minutes, and 40 seconds were selected. We calculated the correlation dimension and used the surrogate data method to check for nonlinearity in the data. We then performed a correlation analysis. Result and Conclusion: The correlation dimensions of channel 6, channel 7, and channel 8 showed significant correlation with the vigor factor.
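
The surrogate data idea mentioned in the method can be sketched in its simplest form: shuffle the signal to destroy temporal structure while keeping its amplitude distribution, then see how often a statistic computed on the shuffled copies matches or exceeds the original's. The abstract does not say which surrogate variant or statistic was used. Shuffled surrogates only test against temporally uncorrelated noise, and the lag-1 autocorrelation below is a hypothetical stand-in for the correlation dimension.

```python
import math
import random


def shuffled_surrogates(series, count, seed=0):
    """Random permutations of the series; each keeps the same set of values."""
    rng = random.Random(seed)
    surrogates = []
    for _ in range(count):
        copy = list(series)
        rng.shuffle(copy)
        surrogates.append(copy)
    return surrogates


def lag1_autocorrelation(series):
    """Stand-in statistic: correlation between consecutive samples."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


def surrogate_exceedance(series, statistic, count=99, seed=0):
    """Fraction of surrogates whose statistic is >= that of the original."""
    original = statistic(series)
    values = [statistic(s) for s in shuffled_surrogates(series, count, seed)]
    return sum(v >= original for v in values) / count


# Hypothetical signal with strong temporal structure: a sampled sine wave.
signal = [math.sin(2 * math.pi * i / 20) for i in range(200)]
print(surrogate_exceedance(signal, lag1_autocorrelation))
```

A small fraction suggests the original signal's structure is unlikely to arise from the shuffled null model.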
