• Title/Summary/Keyword: data for analysis

Automatic Generation of Issue Analysis Report Based on Social Big Data Mining (소셜 빅데이터 마이닝 기반 이슈 분석보고서 자동 생성)

  • Heo, Jeong;Lee, Chung Hee;Oh, Hyo Jung;Yoon, Yeo Chan;Kim, Hyun Ki;Jo, Yo Han;Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering / v.3 no.12 / pp.553-564 / 2014
  • In this paper, we propose a system for automatically generating issue analysis reports based on social big data mining, with the aim of resolving three problems of earlier social media analysis and analytic report generation technologies: the isolation of analyses, the subjectivity of experts, and the inaccessibility of information caused by high prices. The system comprises natural language query analysis, issue analysis, social big data analysis, social big data correlation analysis, and automatic report generation. To evaluate the usefulness of the reports, two big data analysis experts rated them on a Likert scale. The results show that the reports are comparatively useful and reliable. Because of the low cost of report generation, the correlation analysis of social big data, and the objectivity of social big data analysis, the proposed system should help popularize social big data analysis.

Analysis of Combined Yeast Cell Cycle Data by Using the Integrated Analysis Program for DNA chip (DNA chip 통합분석 프로그램을 이용한 효모의 세포주기 유전자 발현 통합 데이터의 분석)

  • 양영렬;허철구
    • KSBB Journal / v.16 no.6 / pp.538-546 / 2001
  • An integrated data analysis program for DNA chips, containing normalization, FDM analysis, various clustering methods, PCA, and SVD, was applied to analyze combined yeast cell cycle data. This paper compares several clustering algorithms, such as K-means, SOM, and fuzzy c-means, and their results. For further analysis, the clustering results from the integrated analysis program were used to assign functions to each cluster and for motif analysis. These results provide an integrated view of DNA chip data analysis.
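
As a rough illustration of the K-means step named in this abstract, the sketch below clusters a few made-up expression profiles. It is not the authors' program: the function name `kmeans`, the toy `profiles`, the deterministic choice of the first `k` points as starting centroids, and the fixed iteration count are all assumptions made for the example.

```python
def kmeans(points, k, iterations=20):
    """Plain K-means with squared Euclidean distance.

    Assumption: the first k points are used as initial centroids so that
    the example is deterministic; real tools usually initialize randomly.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles (one row per gene, one column per time point).
profiles = [
    [1.0, 0.9, -1.0, -0.9],
    [-1.0, -0.8, 1.0, 0.9],
    [0.9, 1.1, -0.9, -1.0],
    [-0.9, -1.0, 1.1, 1.0],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Genes with similar time courses end up with the same label, which is the grouping that later steps such as function assignment or motif analysis would work from.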

Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems / v.14 no.4 / pp.807-823 / 2018
  • The performance of a speaker verification system depends on each speaker's utterance, from which the important information has to be captured. Under the constraint of limited data, where the training and testing data last only a few seconds, speaker verification becomes a challenging task. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers. This leads to poor speaker modeling during training and may not yield good decisions during testing. The problem can be addressed by increasing the number of feature vectors extracted from training and testing data of the same duration. To that end, we use multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques for speaker verification under limited data conditions. These techniques extract relatively more feature vectors during training and testing and thus improve modeling and testing for limited data. To demonstrate this, we use mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features. The Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) are used to model the speaker. The database used is NIST-2003. The experimental results indicate that MFS, MFR, and MFSR analysis perform radically better than SFSR analysis, and that LPCC-based MFSR analysis performs better than the other analysis and feature extraction techniques.
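
The idea of getting more feature vectors from the same short utterance can be sketched as a framing step. The code below is only an illustration under assumed values: the helper names `frame_signal` and `multi_frame`, the frame sizes, and the frame shifts are hypothetical, and the MFCC or LPCC computation that would follow each frame is omitted.

```python
def frame_signal(samples, frame_size, frame_shift):
    """Split samples into overlapping frames of length frame_size,
    advancing frame_shift samples between consecutive frames."""
    last_start = len(samples) - frame_size
    return [samples[start:start + frame_size]
            for start in range(0, last_start + 1, frame_shift)]


def multi_frame(samples, frame_sizes, frame_shifts):
    """Pool the frames obtained for every (size, shift) combination.

    One size with several shifts corresponds to MFR, several sizes with
    one shift to MFS, and several of both to MFSR.
    """
    frames = []
    for size in frame_sizes:
        for shift in frame_shifts:
            frames.extend(frame_signal(samples, size, shift))
    return frames


# Stand-in for a short utterance: 1000 samples (values are irrelevant here).
samples = list(range(1000))

sfsr = frame_signal(samples, frame_size=200, frame_shift=100)
mfsr = multi_frame(samples, frame_sizes=[100, 200], frame_shifts=[50, 100])

print(len(sfsr), len(mfsr))  # 9 frames versus 55 frames
```

Each frame would yield one feature vector, so the pooled MFSR set gives the speaker model several times more vectors than the single-setting SFSR set from the same data.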

Proposed Data Literacy Competency Framework through Literature Analysis

  • Hyo-suk Kang;Suntae Kim
    • International Journal of Knowledge Content Development & Technology / v.14 no.3 / pp.115-140 / 2024
  • With the advent of the Fourth Industrial Revolution and the era of big data, the ability to handle data has become essential. This has heightened the importance and necessity of data literacy competencies. The purpose of this study is to propose a framework for data literacy competencies. To achieve this goal, data literacy frameworks from eight countries and twelve pieces of literature on data literacy competencies were analyzed and synthesized, resulting in five categories and twenty-three competencies. The five categories are: data understanding and ethics, data collection and management, data analysis and evaluation, data utilization, and data governance and systems. It is hoped that the data literacy competency framework proposed in this study will serve as a foundational resource for policies, curricula, and the enhancement of individual data literacy competencies.

Saliency Score-Based Visualization for Data Quality Evaluation

  • Kim, Yong Ki;Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.4 / pp.289-294 / 2015
  • Data analysts explore collections of data to search for valuable information using various techniques and tricks. "Garbage in, garbage out" is a well-recognized idiom that emphasizes the importance of data quality in data analysis. It is therefore crucial to validate data quality at an early stage of analysis, and an effective method of evaluating it is required. In this paper, a method to visually characterize the quality of data using the notion of a saliency score is introduced. The saliency score is a measure comprising five indexes that capture certain aspects of data quality. Some experimental results are presented to show the applicability of the proposed method.

A Profile Analysis about Thermal Life Data of Electrical insulating materials at Accelerated Life Test

  • Bark, Shim-Kyu
    • Journal of Korea Multimedia Society / v.13 no.12 / pp.1814-1819 / 2010
  • Since 1987, when a statistical analysis guide for thermal life tests in accelerated life testing (ALT) was proposed as ANSI/IEEE Std 101, the guide has been widely used for many experimental data sets. Shim (2004) performed a Monte Carlo simulation to compare the lives of two different systems or materials, based on statistics obtained from ANSI/IEEE Std 101 data. In this study, a profile analysis is proposed for comparing the lives of two different systems or materials, and some examples using pre-existing data are given.

A Study on Analysis of Superlarge Manufacturing Process Data for Six Sigma (6 시그마 위한 대용량 공정데이터 분석에 관한 연구)

  • 박재홍;변재현
    • Proceedings of the Korean Operations and Management Science Society Conference / 2001.10a / pp.411-415 / 2001
  • Advances in computer and sensor technology have made it possible to obtain superlarge manufacturing process data in real time, allowing us to extract meaningful information from these superlarge data sets. We propose a systematic data analysis procedure that field engineers can easily apply to manufacture quality products. The procedure consists of a data cleaning stage and a data analysis stage. The data cleaning stage constructs a database suitable for statistical analysis from the original superlarge manufacturing process data. For the data analysis stage, we suggest a graphical, easy-to-implement approach to extracting practical information from the cleaned database. This study will help manufacturing companies achieve six sigma quality.

Investigating the underlying structure of particulate matter concentrations: a functional exploratory data analysis study using California monitoring data

  • Montoya, Eduardo L.
    • Communications for Statistical Applications and Methods / v.25 no.6 / pp.619-631 / 2018
  • Functional data analysis continues to attract interest because advances in technology across many fields have increasingly permitted measurements to be made from continuous processes on a discretized scale. Particulate matter is among the most harmful air pollutants affecting public health and the environment, and levels of PM10 (particles less than 10 micrometers in diameter) for regions of California remain among the highest in the United States. The relatively high frequency of particulate matter sampling enables us to regard the data as functional data. In this work, we investigate the dominant modes of variation of PM10 using functional data analysis methodologies. Our analysis provides insight into the underlying data structure of PM10, and it captures the size and temporal variation of this underlying data structure. In addition, our study shows that certain aspects of size and temporal variation of the underlying PM10 structure are associated with changes in large-scale climate indices that quantify variations of sea surface temperature and atmospheric circulation patterns.

Analysis of the Core Concepts of Middle School Informatics Textbook Using Big Data Analysis Techniques (빅데이터 분석 방법을 이용한 중학교 정보 교과서 핵심 개념 분석)

  • Woon, Daewoong;Choe, Hyunjong
    • Journal of Creative Information Culture / v.5 no.2 / pp.157-164 / 2019
  • Big data has recently been used and developed in many areas of society. Big data analysis techniques are frequently applied in politics, the economy, and society to uncover the meanings hidden in data. However, although big data analysis has been used in some case studies that analyze educational data, analysis of curricula and their direction is still inadequate. Therefore, this study aims to identify and analyze the core concepts of middle school informatics textbooks using big data analysis techniques. Text mining was used as the big data analysis method for the textbooks. Through the core concepts identified with this technique, we could confirm the concepts to be emphasized in the textbooks and the possibility of using big data in the field of education.

Probabilistic Graphical Model for Transaction Data Analysis (트랜잭션 데이터 분석을 위한 확률 그래프 모형)

  • Ahn, Gil Seung;Hur, Sun
    • Journal of Korean Institute of Industrial Engineers / v.42 no.4 / pp.249-255 / 2016
  • Transaction data are now accumulating rapidly everywhere. Association analysis methods are usually applied to analyze transaction data, but these methods have several problems. For example, they can only consider one-way relations among items and cannot incorporate domain knowledge into the analysis process. To overcome these shortcomings, we suggest a transaction data analysis method based on a probabilistic graphical model (PGM). The suggested method has several advantages over association analysis methods. For example, it is highly flexible and can answer various probability questions about transaction data involving relationships among items.
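
For context, the association analysis baseline that this abstract contrasts with can be sketched with two standard measures, support and confidence. The transactions and the helper names `support` and `confidence` below are hypothetical, and the sketch covers only the baseline, not the PGM method the authors propose.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, which is an
    estimate of P(consequent | antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]

bread_to_milk = confidence(transactions, {"bread"}, {"milk"})
milk_to_bread = confidence(transactions, {"milk"}, {"bread"})
print(bread_to_milk, milk_to_bread)
```

Each rule is a single directed statement with its own confidence, so the two directions differ here; this is the kind of one-way relation the abstract refers to when motivating a model that can answer more general probability queries.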