• Title/Summary/Keyword: data measures

Impact of Outliers on the Statistical Measures of the Environmental Monitoring Data in Busan Coastal Sea (이상자료가 연안 환경자료의 통계 척도에 미치는 영향)

  • Cho, Hong-Yeon;Lee, Ki-Seop;Ahn, Soon-Mo
    • Ocean and Polar Research / v.38 no.2 / pp.149-159 / 2016
  • Statistical measures of coastal environmental data are used in a variety of statistical inferences, hypothesis tests, and data-driven models. If the measures are biased, the resulting estimates and models may also be biased, and the potential for bias is large when the data contain outliers, defined as extraordinarily large or small values. This study suggests more robust statistical measures as alternatives to the commonly used ones and assesses their performance through a quantitative comparison with the typical measures of location, spread, and shape, using environmental monitoring data from the Busan coastal sea. Outliers were detected using Rosner's test, and about 5-10% of the nutrient data were identified as outliers. After removal (zero-weighting) of the outliers, the relative changes in the mean and standard deviation between the before- and after-removal conditions were 13% and 33%, respectively. The skewness and kurtosis decreased by 1.36 and 8.11, respectively. In contrast, the change ratios for the robust counterparts of the mean and standard deviation were 3.7-10.5%, and the changes in robust skewness and kurtosis were only about 2-4% of those of the non-robust measures. The robust measures can therefore be regarded as outlier-resistant, given their relatively small changes between the before- and after-removal conditions.
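The before/after comparison described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the sample values and outlier flags are made up, the outlier is marked by hand rather than by Rosner's test, and the median and median absolute deviation (MAD) are assumed stand-ins for the paper's robust measures, which the abstract does not name.

```python
import statistics


def mad(values):
    """Median absolute deviation, a robust measure of spread."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def relative_change(before, after):
    """Relative change in percent between two values of the same measure."""
    return abs(after - before) / abs(before) * 100.0


def compare(values, outlier_flags):
    """Percent change of each measure after zero-weighting flagged values."""
    cleaned = [v for v, is_out in zip(values, outlier_flags) if not is_out]
    measures = {
        "mean": statistics.mean,
        "stdev": statistics.stdev,
        "median": statistics.median,
        "mad": mad,
    }
    return {name: relative_change(f(values), f(cleaned))
            for name, f in measures.items()}


# Hypothetical concentrations; the last value is flagged by hand as an outlier.
sample = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 9.0]
flags = [False] * 9 + [True]
changes = compare(sample, flags)

for name, pct in changes.items():
    print(f"{name:>6}: {pct:6.1f}% change after outlier removal")
```

In this toy sample the mean and standard deviation shift far more than the median and MAD, which is the qualitative pattern the abstract reports.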

Computation Procedures of Reliability Measures for Interval Data (구간 데이터에 대한 신뢰성 척도 산정 절차)

  • Choi, Sung-Woon
    • Journal of the Korea Safety Management & Science / v.9 no.2 / pp.149-159 / 2007
  • This paper proposes two computation procedures for reliability measures of large interval data. The first method is efficient for verifying the relationships among four reliability measures: F(t), R(t), f(t), and $\lambda(t)$. The second method is effective for interpreting the concepts behind the various reliability measures. The study also reinterprets and recomputes errors in the four reliability measures found in reliability textbooks. Various numerical examples illustrate the application of the two proposed procedures.
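As a rough illustration of how the four measures relate for interval (grouped) failure data, the sketch below uses the standard empirical definitions. It is not the paper's procedure, and the failure counts, unit count, and interval width are hypothetical.

```python
def interval_reliability(failures, n_units, width):
    """Empirical F(t), R(t), f(t) and lambda(t) for equal-width intervals.

    failures[i] is the number of units failing in interval i.
    F and R are evaluated at the start of each interval.
    """
    rows = []
    survivors = n_units
    for i, d in enumerate(failures):
        if survivors == 0:
            break  # nothing left at risk, so the hazard is undefined
        R = survivors / n_units            # reliability at interval start
        F = 1.0 - R                        # unreliability at interval start
        f = d / (n_units * width)          # failure density over the interval
        lam = d / (survivors * width)      # hazard (failure) rate
        rows.append({"start": i * width, "F": F, "R": R, "f": f, "lambda": lam})
        survivors -= d
    return rows


# Hypothetical test: 100 units, failures counted in 100-hour intervals.
table = interval_reliability([10, 20, 30, 25, 15], n_units=100, width=100.0)

for row in table:
    print(row)
```

Each row satisfies F(t) + R(t) = 1 and lambda(t) = f(t) / R(t), which gives a quick consistency check on any hand-computed table.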

Correlation Measure for Big Data (빅데이터에서의 상관성 측도)

  • Jeong, Hai Sung
    • Journal of Applied Reliability / v.18 no.3 / pp.208-212 / 2018
  • Purpose: The three Vs of volume, velocity, and variety are commonly used to characterize different aspects of Big Data. Volume refers to the amount of data, variety to the number of types of data, and velocity to the speed of data processing. Given these characteristics, the size of Big Data varies rapidly, some data buckets will contain outliers, and buckets may differ in size. Correlation plays a major role in Big Data analysis, which calls for something better than the usual correlation measures. Methods: The correlation measures offered by traditional statistics are compared, and conditions that a measure must meet to suit the characteristics of Big Data are suggested. Finally, a correlation measure that satisfies the suggested conditions is recommended. Results: Mutual Information satisfies the suggested conditions. Conclusion: This article builds on traditional correlation measures to analyze the co-relation between two variables. Conditions that correlation measures must meet to suit the characteristics of Big Data are suggested, and the measure that satisfies them, Mutual Information, is recommended.
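For readers unfamiliar with the recommended measure, here is a minimal sketch of a plug-in estimate of mutual information for two discrete variables. The function name and the toy data are illustrative assumptions. The article's own estimator is not described in the abstract.

```python
from collections import Counter
import math


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# y is fully determined by x, but the relationship is not linear.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
mi_dependent = mutual_information(xs, ys)

# Every (x, y) combination occurs equally often: no dependence.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])

print(mi_dependent, mi_independent)
```

In the first example the covariance of x and y is zero because x is symmetric about zero, so a linear correlation coefficient would miss the dependence that mutual information detects.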

Leverage Measures in Nonlinear Regression

  • Kahng, Myung-Wook
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.229-235 / 2007
  • Measures of leverage in nonlinear regression models are discussed by extending the notion of leverage in linear regression models. The connection between measures of leverage and the nonlinearity of the models is explored. An illustrative example based on real data is presented.
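One common way to carry leverage over from linear to nonlinear regression is to replace the design matrix with the Jacobian of the model function at the fitted parameters. The sketch below assumes that tangent-plane definition and a hypothetical one-parameter model f(x; theta) = exp(-theta * x). The abstract does not say which measures the paper uses, so this is only an illustration of the general idea.

```python
import math


def tangent_plane_leverage(xs, theta):
    """Leverages h_i = v_i^2 / sum_j v_j^2 for a one-parameter model.

    v_i is the derivative of the assumed model exp(-theta * x_i)
    with respect to theta, so the vector v plays the role of the
    design matrix in the linear hat matrix V (V'V)^-1 V'.
    """
    grads = [-x * math.exp(-theta * x) for x in xs]
    total = sum(g * g for g in grads)
    return [g * g / total for g in grads]


xs = [0.5, 1.0, 2.0, 4.0, 8.0]
lev_a = tangent_plane_leverage(xs, theta=0.2)
lev_b = tangent_plane_leverage(xs, theta=1.0)

print([round(h, 3) for h in lev_a])
print([round(h, 3) for h in lev_b])
```

Unlike the linear case, the leverages here depend on the parameter value: with theta = 0.2 the point at x = 4.0 has the highest leverage, while with theta = 1.0 it is the point at x = 1.0.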

The Development of Relative Interestingness Measure for Comparing with Degrees of Association

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1269-1279 / 2008
  • Data mining is a technique for finding useful information in huge databases. One of the well-studied problems in data mining is the exploration of association rules. An association rule technique finds relations among items in massive databases by means of several interestingness measures. An important and useful classification scheme for interestingness measures may be based on user involvement, which results in two categories: objective and subjective measures. This paper presents some relative interestingness measures for comparing the degrees of association of two groups. A comparative study of these relative interestingness measures is shown through a numerical example. The results show that the relative net confidence is the best relative interestingness measure.
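To make "interestingness measures" concrete, the sketch below computes three standard objective measures (support, confidence, and lift) for a single rule. The transactions are invented, and the paper's relative measures, including relative net confidence, are not reproduced here because the abstract does not define them.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    transactions is a list of sets; antecedent and consequent are sets.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_b = sum(1 for t in transactions if consequent <= t)
    n_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
result = rule_measures(baskets, {"bread"}, {"milk"})
print(result)
```

A lift below 1, as in this toy data, indicates that the antecedent makes the consequent slightly less likely than its baseline frequency.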

Big Data Analytics of Construction Safety Incidents Using Text Mining (텍스트 마이닝을 활용한 건설안전사고 빅데이터 분석)

  • Jeong Uk Seo;Chie Hoon Song
    • Journal of the Korean Society of Industry Convergence / v.27 no.3 / pp.581-590 / 2024
  • This study aims to extract key topics through text mining of incident records (incident history, post-incident measures, preventive measures) from construction safety accident case data available on the public data portal. It also seeks to provide fundamental insights contributing to the establishment of manuals for disaster prevention by identifying correlations between these topics. After pre-processing the input data, we used the LDA-based topic modeling technique to derive the main topics. Consequently, we obtained five topics related to incident history, and four topics each related to post-incident measures and preventive measures. Although no dominant patterns emerged from the topic pattern analysis, the study holds significance as it provides quantitative information on the follow-up actions related to the incident history, thereby suggesting practical implications for the establishment of a preventive decision-making system through the linkage between accident history and subsequent measures for recurrence prevention.

Exploration of relationship between confirmation measures and association thresholds (기준 확인 측도와 연관성 평가기준과의 관계 탐색)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.4 / pp.835-845 / 2013
  • The association rule is a data mining technique for quantifying the relevance between sets of items in a big database, and it has been applied in various fields such as manufacturing, shopping malls, healthcare, insurance, and education. Philosophers of science have proposed interestingness measures for various kinds of patterns, analyzed their theoretical properties, evaluated them empirically, and suggested strategies for selecting appropriate measures for particular domains and requirements. Such interestingness measures are divided into objective, subjective, and semantic measures. Objective measures are based on the data used in the discovery process and are typically motivated by statistical considerations. Subjective measures take into account not only the data but also the knowledge and interests of the users who examine the pattern, while semantic measures additionally take into account utility and actionability. In a very different context, researchers have devoted a lot of attention to measures of confirmation or evidential support. This paper focuses on asymmetric confirmation measures and compares them with basic association thresholds using simulated data. As a result, we could distinguish the direction of an association rule by means of confirmation measures and interpret the degree of association operationally. Furthermore, the results showed that the measure by Rips and that by Kemeny and Oppenheim were better than the other confirmation measures.
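One of the two measures singled out here, the Kemeny and Oppenheim measure, is usually written as F(H, E) = [P(E|H) - P(E|not H)] / [P(E|H) + P(E|not H)]. The sketch below computes it from a 2x2 table of counts. Reading a rule A -> B as evidence E = A for hypothesis H = B is an assumption made for illustration, and the counts are invented.

```python
def kemeny_oppenheim(n_h_e, n_h_not_e, n_not_h_e, n_not_h_not_e):
    """Kemeny-Oppenheim confirmation F(H, E) from 2x2 cell counts.

    Returns a value in [-1, 1]: positive when E supports H,
    negative when E counts against H, zero under independence.
    """
    p_e_given_h = n_h_e / (n_h_e + n_h_not_e)
    p_e_given_not_h = n_not_h_e / (n_not_h_e + n_not_h_not_e)
    return (p_e_given_h - p_e_given_not_h) / (p_e_given_h + p_e_given_not_h)


# Hypothetical counts: A&B = 30, A&notB = 10, notA&B = 20, notA&notB = 40.
# Rule A -> B (assumed mapping): hypothesis B, evidence A.
f_a_to_b = kemeny_oppenheim(30, 20, 10, 40)
# Rule B -> A: hypothesis A, evidence B.
f_b_to_a = kemeny_oppenheim(30, 10, 20, 40)
# Independent table: no confirmation either way.
f_indep = kemeny_oppenheim(10, 10, 10, 10)

print(f_a_to_b, f_b_to_a, f_indep)
```

The two directions give different values for the same table, which is the asymmetry that lets a confirmation measure distinguish the direction of a rule.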

Learning City Performance Measurement and Performance Measure Weighting Decision based on DEA Method (DEA를 활용한 성과평가 지표의 가중치 결정모형 구축 : 평생학습도시 성과평가 지표 적용 사례를 중심으로)

  • Lim, Hwan;Sohn, Myung-Ho
    • Journal of Information Technology Services / v.9 no.4 / pp.109-121 / 2010
  • Most organizations adopt their own performance measurement systems and select performance measures to meet their goals. However, organizations can give only a limited description of what performance measures are. Kaplan and Norton suggest the Balanced Scorecard (BSC) as a complement to conventional performance measures. The BSC can provide a management system with a comprehensive strategic vision and integrate non-financial measures with financial measures, and it is widely used for measuring corporate performance. This paper investigates how BSC-based performance measures can be applied to a Learning City, and suggests performance measures and a strategy map for the Learning City on the basis of the BSC. The paper adopts the AR (assurance region)-DEA model, which can limit the range of the weights on performance measures to prevent each BSC perspective from having unlimited elasticity. The proposed model is based on the CCR model, whose unit-invariance property allows the data to be used without normalization.
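For orientation, the standard input-oriented CCR multiplier model with assurance-region constraints can be written as below. The notation is assumed rather than taken from the paper: unit $o$ is the one being evaluated, $x_{ij}$ and $y_{rj}$ are the inputs and outputs of unit $j$, $v_i$ and $u_r$ are their weights, and $L_r$, $U_r$ are hypothetical bounds on the weight ratios.

```latex
\begin{aligned}
\max_{u,v}\quad & \theta_o = \sum_{r=1}^{s} u_r\, y_{ro} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i\, x_{io} = 1, \\
& \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0, \quad j = 1, \dots, n, \\
& L_r \le \frac{u_r}{u_1} \le U_r, \quad r = 2, \dots, s, \\
& u_r \ge 0,\; v_i \ge 0 .
\end{aligned}
```

Without the ratio constraints, each unit is free to put nearly all weight on its most favorable measure. The assurance region restricts that freedom, which is the weight-limiting role the abstract describes.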

Analysis of latent growth model using repeated measures ANOVA in the data from KYPS (청소년패널자료 분석에서의 반복측정분산분석을 활용한 잠재성장모형)

  • Lee, Hwa-Jung;Kang, Suk-Bok
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1409-1419 / 2013
  • We analyzed the data from KYPS using the latent growth model, which has been widely studied as a method for analyzing longitudinal data. In this study, we applied repeated measures ANOVA to the unconditional model in order to select the unconditional latent growth model more quickly. We also compared the six types of models, the quadratic model, and the model to which repeated measures ANOVA was applied.

Overview of Reliability Rank Measures for Small Sample (소표본인 경우 신뢰성 순위 척도의 고찰)

  • Choi, Sung-Woon
    • Journal of the Korea Safety Management & Science / v.9 no.2 / pp.161-169 / 2007
  • This paper presents three methods for expressing reliability measures for large and small data sets. The first method is parametric estimation from cardinal reliability measure data for large samples, which requires a large number of observations. The second is nonparametric distribution classification of ordinal reliability measure data for small samples; however, it is difficult for field users to understand. The last method is parametric estimation from ordinal reliability measure data for small samples. Because this method requires only a small sample and is comprehensible, we recommend it among the proposed methods. Various reliability rank measures are presented.
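The abstract does not list the rank measures it covers. As an assumption, the sketch below shows two widely used rank-based estimates of the cumulative failure probability for small samples: the mean rank i / (n + 1) and Benard's approximation to the median rank, (i - 0.3) / (n + 0.4). The failure times are hypothetical.

```python
def mean_ranks(n):
    """Mean rank estimate of F at the i-th ordered failure, i = 1..n."""
    return [i / (n + 1) for i in range(1, n + 1)]


def median_ranks(n):
    """Benard's approximation to the median rank, i = 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


# Hypothetical ordered failure times (hours) from a sample of five units.
failure_times = [120.0, 340.0, 560.0, 910.0, 1500.0]
n = len(failure_times)
table = list(zip(failure_times, mean_ranks(n), median_ranks(n)))

for t, mean_rank, median_rank in table:
    print(f"t = {t:7.1f}  mean rank = {mean_rank:.3f}  median rank = {median_rank:.3f}")
```

Both estimates keep F strictly between 0 and 1 even at the first and last failure, which is why they are preferred to the naive i / n for small samples.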