• Title/Summary/Keyword: Statistical Analysis Data

The types and characteristics of statistical big-data graphics with emphasis on the cognitive discouragements (빅데이터 통계그래픽스의 유형 및 특정 - 인지적 방해요소를 중심으로 -)

  • Sim, Mihee;You, Sicheon
    • Smart Media Journal / v.3 no.3 / pp.26-35 / 2014
  • Statistical graphics is a design field that focuses on user perception in order to deliver information correctly and support effective understanding, using quantitative data processed through information analysis, extraction, and visualization. Statistical graphics that incorporate big-data components are termed statistical big-data graphics. In statistical graphics, visual factors are used to reduce perceptual errors and to deliver information successfully. In statistical big-data graphics, however, the visual factors of the enormous data cause cognitive discouragements. The purpose of this study is to extract the cognitive discouragement factors from statistical big-data graphics. We categorized statistical big-data graphics by compositional shape into 'network type', 'segment type', and 'mixed type', and explored the characteristics of each type. In particular, based on the main visual factors in statistical big-data graphics, we extracted the cognitive discouragement factors that appear under high visualization as four categories: 'multi-dimensional cases', 'various color', 'information overlap', and 'legibility of the writing'.

Analysis of massive data in astronomy (천문학에서의 대용량 자료 분석)

  • Shin, Min-Su
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1107-1116 / 2016
  • Recent astronomical survey observations have produced substantial amounts of data and have completely changed conventional methods of analyzing astronomical data. Both classical statistical inference and modern machine learning methods have been used in every step of data analysis, ranging from data calibration to the inference of physical models. Machine learning methods are growing in popularity for classical problems of astronomical data analysis, owing to low-cost data acquisition with cheap large-scale detectors and to fast computer networks that enable us to share large volumes of data. It is common to consider the effects of inhomogeneous spatial and temporal coverage in the analysis of big astronomical data. The growing size of the data requires us to use parallel distributed computing environments as well as machine learning algorithms. Distributed data analysis systems have not been adopted widely for the general analysis of massive astronomical data. Gathering adequate training data is observationally expensive, and training data in astronomy are generally collected from multiple data sources; therefore, semi-supervised and ensemble machine learning methods will become important for the analysis of big astronomical data.

Classification of Microarray Gene Expression Data by MultiBlock Dimension Reduction

  • Oh, Mi-Ra;Kim, Seo-Young;Kim, Kyung-Sook;Baek, Jang-Sun;Son, Young-Sook
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.567-576 / 2006
  • In this paper, we apply multiblock dimension reduction methods to the classification of tumors based on microarray gene expression data. The procedure involves clustering selected genes, multiblock dimension reduction, and classification using linear discriminant analysis and quadratic discriminant analysis.

Comparison of Methods for Reducing the Dimension of Compositional Data with Zero Values

  • Song, Taeg-Youn;Choi, Byung-Jin
    • Communications for Statistical Applications and Methods / v.19 no.4 / pp.559-569 / 2012
  • Compositional data consist of compositions, which are non-negative vectors of proportions subject to the unit-sum constraint. In disciplines such as petrology and archaeometry, statistical analysis of this type of data is fundamental. Aitchison (1983) introduced log-contrast principal component analysis, which operates on log-ratio transformed data, as a dimension-reduction technique for understanding and interpreting the structure of compositional data. However, the analysis is not usable when zero values are present in the data, because the logarithm of zero is undefined. In this paper, we introduce four possible methods for reducing the dimension of compositional data with zero values. Two real data sets are analyzed using the methods, and the results are compared.
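
The zero-value problem described in this abstract can be illustrated with a minimal sketch. It assumes the centered log-ratio (clr) as the log-ratio transform, which is one common choice; the abstract does not say which transform or which four methods the paper uses, and the two compositions below are hypothetical.

```python
import math

def clr(composition):
    """Centered log-ratio transform of one composition (assumed transform)."""
    # math.log raises ValueError for a part equal to 0, which is the failure
    # mode that makes log-ratio analysis unusable on data with zeros.
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical compositions; each sums to 1.
positive = [0.6, 0.3, 0.1]
with_zero = [0.7, 0.3, 0.0]

transformed = clr(positive)  # clr coordinates always sum to zero

try:
    clr(with_zero)
    zero_ok = True
except ValueError:
    zero_ok = False  # the transform is undefined when any part is zero
```

Any workaround for the zero parts, such as replacing them with small positive values, would be an additional assumption and is deliberately left out of the sketch.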

A GEE approach for the semiparametric accelerated lifetime model with multivariate interval-censored data

  • Maru Kim;Sangbum Choi
    • Communications for Statistical Applications and Methods / v.30 no.4 / pp.389-402 / 2023
  • Multivariate or clustered failure time data occur in many medical, epidemiological, and socio-economic studies when survival data are collected from several research centers. If the data are observed periodically, as in a longitudinal study, survival times are often subject to various types of interval censoring, creating multivariate interval-censored data. The event times of interest may then be correlated among individuals who come from the same cluster. In this article, we propose a unified linear regression method for analyzing multivariate interval-censored data. We consider a semiparametric multivariate accelerated failure time model as a statistical analysis tool and develop a generalized Buckley-James method that makes inferences by imputing interval-censored observations with their conditional mean values. Since the study population consists of several heterogeneous clusters, where subjects in the same cluster may be related, we propose a generalized estimating equations approach to accommodate potential dependence within clusters. Our simulation results confirm that the proposed estimator is robust to misspecification of the working covariance matrix and that statistical efficiency can increase when the working covariance structure is close to the truth. The proposed method is applied to a dataset from a diabetic retinopathy study.
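
The ingredients named in this abstract can be written out in generic textbook form. This is a sketch under assumed notation, not the paper's own formulation: i indexes clusters, j indexes subjects within a cluster, F is the unspecified error distribution, (L_ij, R_ij] is the observed censoring interval, and V_i is a working covariance matrix.

```latex
% Semiparametric accelerated failure time model (error distribution F left unspecified)
\log T_{ij} = x_{ij}^{\top}\beta + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim F

% Buckley-James-type imputation: conditional mean of \log T_{ij} given T_{ij} \in (L_{ij}, R_{ij}]
Y_{ij}^{*} = x_{ij}^{\top}\beta
  + \frac{\int_{e^{L}_{ij}}^{e^{R}_{ij}} u \, dF(u)}{F(e^{R}_{ij}) - F(e^{L}_{ij})},
\qquad
e^{L}_{ij} = \log L_{ij} - x_{ij}^{\top}\beta, \quad
e^{R}_{ij} = \log R_{ij} - x_{ij}^{\top}\beta

% Generalized estimating equation with working covariance V_i for cluster i
\sum_{i=1}^{n} X_i^{\top} V_i^{-1} \bigl( Y_i^{*} - X_i \beta \bigr) = 0
```

Because the imputed responses depend on both the regression coefficients and F, estimators of this kind are typically computed iteratively; how the paper estimates F and chooses V_i is not stated in the abstract.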

Prediction of Extreme Sloshing Pressure Using Different Statistical Models

  • Cetin, Ekin Ceyda;Lee, Jeoungkyu;Kim, Sangyeob;Kim, Yonghwan
    • Journal of Advanced Research in Ocean Engineering / v.4 no.4 / pp.185-194 / 2018
  • In this study, the extreme sloshing pressure was predicted using four statistical models: the three-parameter Weibull distribution, the generalized Pareto distribution, the generalized extreme value distribution, and the three-parameter log-logistic distribution. Estimating the sloshing impact pressure is important in the design of liquid cargo tanks for severe sea states. To obtain the extreme values of local impact pressures, many model tests have been carried out and statistical analyses have been performed. The three-parameter Weibull distribution and the generalized Pareto distribution are widely used for the statistical analysis of sloshing, and this study adds the generalized extreme value distribution and the three-parameter log-logistic distribution. Additionally, the statistical distributions are fitted to peak pressure data using three different parameter estimation methods. The data were obtained from a three-dimensional sloshing model test conducted at Seoul National University. The loading conditions were 20%, 50%, and 95% of the tank height, and the analysis was based on the impact pressure measured on four significant panels with large sloshing impacts. The fits were compared using probability of exceedance diagrams and the probability plot correlation coefficient test for goodness of fit.
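
The two comparison tools mentioned in this abstract, the probability of exceedance and the probability plot correlation coefficient (PPCC), can be sketched for the three-parameter Weibull case. Everything concrete below is an assumption for illustration: the parameter values, the plotting positions (i + 0.5) / n, and the synthetic "peaks", which stand in for measured data. The paper's three parameter estimation methods are not reproduced.

```python
import math

def weibull3_exceedance(x, shape, scale, loc):
    """P(X > x) for a three-parameter Weibull distribution."""
    if x <= loc:
        return 1.0
    return math.exp(-(((x - loc) / scale) ** shape))

def weibull3_quantile(p, shape, scale, loc):
    """Inverse CDF of the three-parameter Weibull distribution, 0 <= p < 1."""
    return loc + scale * (-math.log(1.0 - p)) ** (1.0 / shape)

def ppcc(sample, quantile_fn):
    """Correlation between the ordered sample and theoretical quantiles."""
    ordered = sorted(sample)
    n = len(ordered)
    # Plotting positions (i + 0.5) / n are an assumed convention.
    theo = [quantile_fn((i + 0.5) / n) for i in range(n)]
    mean_t = sum(theo) / n
    mean_o = sum(ordered) / n
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(theo, ordered))
    sxx = sum((t - mean_t) ** 2 for t in theo)
    syy = sum((o - mean_o) ** 2 for o in ordered)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical parameters and synthetic peak pressures (not measured data).
params = dict(shape=1.5, scale=2.0, loc=0.5)
peaks = [weibull3_quantile((i + 0.5) / 50, **params) for i in range(50)]

# A candidate with the generating shape fits the probability plot well;
# a candidate with a very different shape gives a lower coefficient.
r_good = ppcc(peaks, lambda p: weibull3_quantile(p, **params))
r_bad = ppcc(peaks, lambda p: weibull3_quantile(p, shape=0.5, scale=2.0, loc=0.5))
```

A PPCC close to 1 means the ordered data fall nearly on a straight line against the candidate distribution's quantiles, which is why it can be used to rank competing distributions.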

Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis

  • Li, Xing;Zhang, Panpan;Feng, Qunqiang
    • Communications for Statistical Applications and Methods / v.29 no.1 / pp.103-125 / 2022
  • In this paper, we analyze time series data on the case and death counts of COVID-19, which broke out in China in December 2019. The study period covers the lockdown of Wuhan. We use functional data analysis methods to analyze the collected time series. The analysis is divided into three parts. First, functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out functional canonical correlation analysis to explore the relationship between confirmed cases and deaths. Finally, we apply a clustering method based on the Expectation-Maximization (EM) algorithm to the counts of confirmed cases, where the number of clusters is determined via cross-validation. In addition, we compare the clustering results with publicly available migration data.

AUTOMATED ELECTROFACIES DETERMINATION USING MULTIVARIATE STATISTICAL ANALYSIS

  • Kim Jungwhan;Lim Jong-Se
    • 한국석유지질학회:학술대회논문집 / spring / pp.10-14 / 1998
  • A systematic methodology is developed for determining electrofacies from wireline log data using multivariate statistical analysis. To account for the contribution of each log and to reduce the computational dimension, the multivariate logs are transformed into a single variable through principal components analysis. The resulting principal component logs are segmented using the statistical zonation method to enhance the efficiency and quality of the interpreted results. Hierarchical cluster analysis is then used to group the segments into electrofacies. The optimal number of groups is determined on the basis of core data and the ratio of within-group variance to total variance. This technique is applied to wells in the Korea Continental Shelf. The results of the field application demonstrate that the lithology predicted from the electrofacies classification matches the core and cuttings data well, with high reliability. This methodology for electrofacies classification can be used to define reservoir characteristics that are helpful for reservoir management.
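
The grouping criterion mentioned in this abstract, the ratio of within-group variance to total variance, can be sketched as follows. The sketch assumes the ratio is computed from sums of squares on a single variable; the scores and group labels are hypothetical, and the zonation and clustering steps themselves are not reproduced.

```python
def variance_ratio(values, labels):
    """Within-group sum of squares divided by total sum of squares."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)

    # Collect the values belonging to each group label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)
    return within_ss / total_ss

# Hypothetical first-principal-component scores for eight log segments.
pc_scores = [0.1, 0.2, 0.15, 2.0, 2.1, 1.9, 4.0, 4.2]

ratio_one_group = variance_ratio(pc_scores, [0] * 8)
ratio_three_groups = variance_ratio(pc_scores, [0, 0, 0, 1, 1, 1, 2, 2])
```

The ratio equals 1 when all segments are placed in one group and falls toward 0 as the groups become internally homogeneous, so the point where adding groups stops reducing it noticeably is a natural candidate for the number of electrofacies.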

Comparison of Parameter Estimation Methods in the Analysis of Multivariate Categorical Data with Logit Models

  • Song, Hae-Hiang
    • Journal of the Korean Statistical Society / v.12 no.1 / pp.24-35 / 1983
  • In fitting models to data, selecting the most desirable estimation method and determining the adequacy of the fitted model are the central issues. This paper compares the maximum likelihood estimators and the minimum logit chi-square estimators, both of which are best asymptotically normal, when logit models are fitted to infant mortality data. The chi-square goodness-of-fit test and the likelihood ratio test are also compared. The analysis of the infant mortality data shows that outlying observations do not necessarily have the same impact on the different goodness-of-fit measures.
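
The two goodness-of-fit measures compared in this abstract can be sketched for grouped binary data. The sketch assumes the standard Pearson chi-square and likelihood ratio (deviance) statistics; the counts and fitted probabilities are hypothetical, and the fitting of the logit model by either estimation method is not reproduced.

```python
import math

def goodness_of_fit(successes, totals, fitted_probs):
    """Return (Pearson chi-square, likelihood ratio statistic) for grouped binary data."""
    pearson = 0.0
    likelihood_ratio = 0.0
    for y, n, p in zip(successes, totals, fitted_probs):
        # Each group contributes a success cell and a failure cell.
        for observed, expected in ((y, n * p), (n - y, n * (1.0 - p))):
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero count contributes nothing to this sum
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)
    return pearson, likelihood_ratio

# Hypothetical grouped data: events out of 100 subjects in two groups.
events = [10, 30]
subjects = [100, 100]

perfect = goodness_of_fit(events, subjects, [0.10, 0.30])
misfit = goodness_of_fit(events, subjects, [0.15, 0.25])
```

The two statistics agree at zero for a perfect fit but weight discrepancies differently elsewhere, which is one reason an outlying cell can affect them unequally.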

Discriminant Analysis of Binary Data by Using the Maximum Entropy Distribution

  • Lee, Jung Jin;Hwang, Joon
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.909-917 / 2003
  • Although many classification models have been used to classify binary data, none of them dominates across all circumstances, which vary with the number of variables and the size of the data (Asparoukhov and Krzanowski, 2001). This paper proposes a classification model that uses information on the marginal distributions of sub-variables and the corresponding maximum entropy distribution. Classification experiments using simulation are discussed.