• Title/Summary/Keyword: statistical data analysis

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.28 no.1 / pp.89-97 / 2021
  • Because compositional data are bounded and subject to a sum constraint, they are often log-ratio transformed, and the transformed data are then fed into traditional binary classification or discriminant analysis. However, directly applying traditional multivariate approaches to the transformed data can be problematic, because the class distributions are not Gaussian and the Bayes decision boundary is not polynomial in the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification or discriminant analysis.
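
As an illustration of the log-ratio step mentioned in the abstract above, here is a minimal sketch of the centered log-ratio (clr) transform. The abstract does not say which log-ratio variant the authors use, so clr is an assumption made here for illustration; the function name `clr` and the sample values are hypothetical.

```python
import math


def clr(composition):
    """Centered log-ratio transform of one composition.

    Assumes every part is strictly positive; zeros need separate handling.
    """
    if any(part <= 0 for part in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical three-part composition that sums to 1.
sample = [0.2, 0.3, 0.5]
transformed = clr(sample)
```

The transformed values sum to zero and do not change when the composition is rescaled, so the sum constraint no longer restricts the data to a bounded simplex.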

Explanatory Analysis for South Korea's Political Website Linking - Statistical Aspects

  • Choi, Kyoung-Ho;Park, Han-Woo
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.899-911 / 2005
  • This paper conducts an explanatory analysis of the web sphere produced by National Assemblymen in South Korea, using statistical methods. First, descriptive metrics were employed. Next, two traditional multivariate methods, multidimensional scaling and correspondence analysis, were applied to the data. Finally, cross-sectional data were compared to examine change over time.

An Exploratory Study on the Approaches to the Statistical Yield and Analysis of Family Data (가족 데이터의 통계적 산출 및 분석방법에 관한탐색적 고찰)

  • 유계숙
    • Journal of Families and Better Life / v.14 no.1 / pp.11-20 / 1996
  • When data collected from more than one family member are used, family researchers must take into account the correlation among family members' perception, behavior, or attitude scores, viewing the couple or family as a unit of interdependent members. This paper presents a framework for categorizing family data based on the unit of analysis, along with several alternatives for the statistical analysis of family variables using individual-, dyadic-, and family-level data.

Rainstorm Tracking Using Statistical Analysis Method (통계적 기법을 이용한 국지성집중호우의 이동경로 분석)

  • Kim Sooyoung;Nam Woo-Sung;Heo Jun-Haeng
    • Proceedings of the Korea Water Resources Association Conference / 2005.05b / pp.194-198 / 2005
  • Although rainstorms cause local damage on a large scale, it is difficult to predict their movement exactly. To reduce rainstorm damage, it is necessary to analyze the path of a rainstorm using various statistical methods. In addition, an efficient time interval of rainfall observation for analyzing rainstorm movement can be derived by applying various statistical methods to rainfall data. In this study, rainstorm tracking using statistical methods is performed for various types of rainfall data. To track the rainstorm, the methods of temporal distribution, inclined plane equations, and cross correlation were applied to various types of data, including electromagnetic rainfall gauge data and AWS data. The speed and direction obtained from each method were compared with those of the real rainfall movement. In addition, the effective time interval of rainfall observation for analyzing rainstorm movement was investigated for the selected time intervals of 10, 20, 30, 40, 50, and 60 minutes. As a result, the absolute relative errors of the inclined plane equations method are smaller than those of the other methods for electromagnetic rainfall gauge data, and the absolute relative errors of the cross correlation method are smaller than those of the other methods for AWS data. The absolute relative errors for time intervals of 30 minutes or less are smaller than those for the other time intervals.
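
To make the cross correlation idea in the abstract above concrete, the following sketch estimates the travel time of a rain cell between two gauges as the lag that maximizes the raw (unnormalized) cross-correlation of their rainfall series. This is only an assumed, simplified reading of the method, not the authors' procedure; the function `best_lag`, the gauge readings, the 10-minute interval, and the 12 km gauge spacing are all hypothetical.

```python
def best_lag(upstream, downstream, max_lag):
    """Return the lag (in samples) that maximizes the raw cross-correlation.

    Assumes both series have the same length and the same sampling interval.
    """
    def score(lag):
        # Pair each upstream value with the downstream value `lag` steps later.
        return sum(upstream[i] * downstream[i + lag]
                   for i in range(len(upstream) - lag))

    return max(range(max_lag + 1), key=score)


# Hypothetical rainfall depths (mm) at two gauges, sampled every 10 minutes.
upstream = [0, 1, 5, 9, 4, 1, 0, 0, 0, 0]
downstream = [0, 0, 0, 1, 5, 9, 4, 1, 0, 0]

interval_min = 10     # assumed observation interval
distance_km = 12.0    # assumed distance between the two gauges

lag = best_lag(upstream, downstream, max_lag=5)
travel_hours = lag * interval_min / 60
# A lag of zero would mean no measurable travel time at this resolution.
speed_kmh = distance_km / travel_hours if lag > 0 else None
```

A coarser observation interval gives fewer candidate lags, which is one way to see why the choice of time interval affects the tracking error.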

Application of Multivariate Statistical Analysis Technique in Landfill Investigation (매립물 특성 조사를 위한 다변량 통계분석 기법의 응용)

  • Kwon, Byung-Doo;Kim, Cha-Soup
    • Journal of the Korean earth science society / v.18 no.6 / pp.515-521 / 1997
  • To investigate the nature of the waste materials in the Nanjido Landfill, we conducted a multivariate statistical analysis of a geophysical data set comprising magnetic, gravity, Landsat TM thermal band, and surface depression measurement data. Because these data sets respond differently to depth, we transformed the observed total field magnetic data and gravity data into residual reduced-to-pole (RTP) magnetic anomalies and three-dimensional density anomalies, respectively, and used only the information about the upper shallow part of the landfill in the subsequent processing. For the statistical analysis at the depression measurement points, the magnetic, density, and Landsat data values at these points were determined by interpolation. Since the multivariate statistical analysis technique uses a clustering algorithm to classify the data set, and we measured the dissimilarity between objects using Euclidean distance, standardization was applied before the distance calculation to eliminate scaling effects due to the different measurement units of each data set. The hierarchical grouping technique was used to construct the dendrogram. The optimum number of statistical groups (clusters), classified on the basis of geophysical and geotechnical characteristics, appeared to be six on the resulting dendrogram. The results of this study suggest that the dimension and nature of multicomponent waste landfills can be identified by applying the multivariate statistical analysis technique to integrated geophysical data sets.
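
The standardization step described in the abstract above can be sketched as follows: each variable is converted to z-scores before Euclidean distances are computed, so that no variable dominates merely because of its units. The measurement values and the helper names `standardize` and `euclidean` are hypothetical, and the use of the population standard deviation is an assumption; the abstract does not state these details.

```python
import math
import statistics


def standardize(columns):
    """Z-score each variable so measurement units do not dominate distances.

    Assumes no variable is constant (a zero standard deviation would fail).
    """
    result = []
    for values in columns:
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        result.append([(v - mean) / sd for v in values])
    return result


def euclidean(a, b):
    """Euclidean distance between two equally long points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical values at four measurement points.
magnetic = [120.0, 340.0, 560.0, 130.0]   # magnetic anomaly, arbitrary units
density = [0.10, 0.12, 0.30, 0.11]        # density anomaly, arbitrary units

z = standardize([magnetic, density])
points = list(zip(*z))                     # one tuple per measurement point
dist = [[euclidean(p, q) for q in points] for p in points]
```

A matrix like `dist` is the usual input to a hierarchical clustering routine that builds a dendrogram; the linkage rule is left open here because the abstract does not name one.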

Analysis of Statistical Methods and Errors in the Articles Published in the Korean Journal of Pain

  • Yim, Kyoung-Hoon;Nahm, Francis Sahn-Gun;Han, Kyoung-Ah;Park, Soo-Young
    • The Korean Journal of Pain / v.23 no.1 / pp.35-41 / 2010
  • Background: Statistical analysis is essential for obtaining objective reliability in medical research. However, medical researchers do not have enough statistical knowledge to properly analyze their study data. To help understand and potentially alleviate this problem, we analyzed the statistical methods and errors in articles published in the Korean Journal of Pain (KJP), with the intention of improving the statistical quality of the journal. Methods: All articles published in the KJP from 2004 to 2008, except case reports and editorials, were reviewed. The types of statistical methods applied and the errors in the articles were evaluated. Results: One hundred and thirty-nine original articles were reviewed. Inferential statistics and descriptive statistics were used in 119 papers and 20 papers, respectively. Only 20.9% of the papers were free from statistical errors. The most commonly adopted statistical method was the t-test (21.0%), followed by the chi-square test (15.9%). Errors of omission were encountered 101 times in 70 papers. Among the errors of omission, "no statistics used even though statistical methods were required" was the most common (40.6%). Errors of commission were encountered 165 times in 86 papers, among which "parametric inference for nonparametric data" was the most common (33.9%). Conclusions: We found various types of statistical errors in the articles published in the KJP. This suggests that meticulous attention should be given not only to applying statistical procedures but also to the reviewing process in order to improve the value of the articles.

Functional Data Classification of Variable Stars

  • Park, Minjeong;Kim, Donghoh;Cho, Sinsup;Oh, Hee-Seok
    • Communications for Statistical Applications and Methods / v.20 no.4 / pp.271-281 / 2013
  • This paper considers the problem of classifying variable stars based on functional data analysis. For a better understanding of galaxy structure and stellar evolution, various approaches to the classification of variable stars have been studied. Several features that describe the characteristics of variable stars (such as color index, amplitude, period, and Fourier coefficients) have usually been used to classify them. Excluding other factors and focusing only on the curve shapes of variable stars, Deb and Singh (2009) proposed a classification procedure using multivariate principal component analysis. However, this approach is limited in its ability to accommodate some features of light curve data, which are unequally spaced in the phase domain and have functional properties. In this paper, we propose a light curve estimation method that is suitable for functional data analysis, and provide a classification procedure for variable stars that combines the features of a light curve with existing functional data analysis methods. To evaluate its practical applicability, we apply the proposed classification procedure to the variable star data sets from the STellar Astrophysics and Research on Exoplanets (STARE) project.

A Suggestion to Establish Statistical Treatment Guideline for Aircraft Manufacturer (국산 복합재료 시험데이터 처리지침 수립을 위한 제언)

  • Suh, Jangwon
    • Journal of Aerospace System Engineering / v.8 no.4 / pp.39-43 / 2014
  • This paper examines the statistical process that should be performed with caution in composite material qualification and equivalency, and describes statistically significant considerations for outlier detection and handling, data pooling through normalization, review of data distributions, and determination of design allowables for structural analysis. Based on these considerations, the need for guidance on statistical processing for aircraft manufacturers who use composite material property databases is proposed.

An Analysis on Error Types of Graphs for Statistical Literacy Education: Ethical Problems at Data Analysis in the Statistical Problem Solving (통계적 소양 교육을 위한 그래프 오류 유형 분석: 자료 분석 단계에서의 통계 윤리 문제)

  • Tak, Byungjoo;Kim, Dabin
    • Journal of Elementary Mathematics Education in Korea / v.24 no.1 / pp.1-30 / 2020
  • This study was carried out to identify the error types of statistical graphs for statistical literacy education. We analyze the meaning of using graphs in statistical problem solving, and identify categories, frequencies, and contexts as the components of statistical graphs. Errors in representing categories and frequencies lead statistics consumers to see incorrect distributions of data, owing to the subjective viewpoint of statistics producers and to visual illusions. Errors in providing contexts hinder the interpretation of statistical information by concealing or distorting the contexts of the data. Moreover, the findings show that tasks already provide a standardized frame for drawing graphs in order to avoid errors, and that they focus on the process of drawing the graph rather than on the statistical literacy needed to analyze data. We suggest some implications for statistical literacy education, ethical problems, and knowledge for teaching to be considered when teaching statistical graphs in elementary mathematics classes.

Analyzing seventh graders' statistical thinking through statistical processes by phases and instructional settings (통계적 과정의 학습에서 나타난 중학교 1학년 학생들의 단계별·수업 형태별 통계적 사고 분석)

  • Kim, Ga Young;Kim, Rae Young
    • The Mathematical Education / v.58 no.3 / pp.459-481 / 2019
  • This study aims to investigate students' statistical thinking through statistical processes in two instructional settings: teacher-centered instruction and student-centered learning. We first developed instructional materials that allowed students to experience all the processes of statistics, including data collection, data analysis, data representation, and interpretation of the results. Using the instructional materials in four classes, we collected and analyzed data from the discourse and artifacts of 57 seventh graders in the two instructional settings, using an analytic framework generated on the basis of a literature review. The results showed that students had difficulty particularly in data collection and graph representation. In addition, even though data description has been heavily emphasized for data analysis in statistics education, students surprisingly had difficulty understanding the relationship between data and representations. There were also relationships between students' statistical thinking and instructional settings. Even though both groups of students had difficulty in data collection and graph representation of the data, there were significant differences between the groups in their performance. Whereas students in the student-centered learning class did better in making decisions that involve verification and justification, students in the teacher-centered lecture class did better in problems requiring accuracy. The results of the study provide meaningful implications for developing curricula and instructional methods for statistics education.