• Title/Abstract/Keywords: analysis data

Search results: 85,083 items; processing time: 0.085 seconds

Improving Interpretability of Multivariate Data Through Rotations of Artificial Variates

  • Hwang, S.Y.;Park, A.M.
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 2 / pp. 297-306 / 2004
  • Multivariate data analysis usually produces a small number of related artificial variates for data reduction. Examples include MDS (multidimensional scaling), MDPREF (multidimensional preference analysis), CDA (canonical discriminant analysis), CCA (canonical correlation analysis), and FA (factor analysis). Varimax rotation of artificial variates, originally devised in FA to ease interpretation, is applied to the diverse multivariate techniques mentioned above. Real data analysis is performed to demonstrate that rotation improves the interpretation of artificial variates.

Web-Based Data Analysis Service for Smart Farms

  • 정지민;이지현;노혜민
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 11, No. 9 / pp. 355-362 / 2022
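  • Smart farms, which integrate information and communication technology into agriculture, are moving beyond simple monitoring of the growing environment toward a form of agriculture that discovers the optimal environment for crop growth and supports autonomous control. Collecting relevant data is important for this, but farmers with cultivation experience and knowledge also need to analyze the collected data from various perspectives to derive information useful for controlling the crop growth environment. In this study, we developed a web service that lets farmers who want to obtain information from crop growth data perform data analysis easily. The web-based data analysis service uses the R language for data analysis and is built on the Express web application framework for Node.js. When the service was applied together with a growth environment monitoring system already in operation, users were able to perform data analysis by uploading a CSV file or entering data directly on the web and then running the R scripts for data analysis provided by the server. Service providers were able to offer a variety of data analysis services easily, and we confirmed that a new data analysis service can be added simply by adding a new R script, without modifying the application.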

A Study on the Big Data Analysis and Predictive Models for Quality Issues in Defense C5ISR

  • 허형조;고수진;백승현
    • 품질경영학회지 / Vol. 51, No. 4 / pp. 551-571 / 2023
  • Purpose: This study proposes practical suggestions by analyzing the cause-and-effect relationship between the quality failure rate and process variables in the C5ISR domain of the defense industry. Methods: Data collected through in-house systems were analyzed using big data analysis. Quality data and A/S history data were analyzed following the CRISP-DM (Cross-Industry Standard Process for Data Mining) process. Results: After evaluating candidate models for the influence of inspection data and A/S history data, logistic regression was selected as the final model because it performed relatively well compared to the decision tree, with an accuracy of 82%/67% and an AUC of 0.66/0.57. Based on this model, the coefficients were estimated using R, a data analysis tool; a specific variable (continuous maximum discharge current time) had a statistically significant effect on the A/S quality failure rate, and the analysis indicated that 82% of the failure rate could be predicted. Conclusion: As the first case of applying big data analysis to quality issues in the defense industry, this study confirms that the market failure rates of defense products can be improved by focusing on the measured values of the main failure causes derived through the big data analysis process. It also identifies improvements, such as the number of data samples and data collection limitations, to be addressed in subsequent studies for a more reliable analysis model.

TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data

  • Lim, Jae Hyun;Lee, Soo Youn;Kim, Ju Han
    • Genomics & Informatics / Vol. 15, No. 1 / pp. 51-53 / 2017
  • High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline and are difficult to integrate with one another because of their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

Finding a plan to improve recognition rate using classification analysis

  • Kim, SeungJae;Kim, SungHwan
    • International journal of advanced smart convergence / Vol. 9, No. 4 / pp. 184-191 / 2020
  • With the emergence of the 4th Industrial Revolution, the core technologies that will lead it, such as AI (artificial intelligence), big data, and the Internet of Things (IoT), have become a focus of public attention. In particular, there are growing attempts to present future visions by applying these technologies to big data analysis of data collected in a specific field, discovering new models, and using those models to infer and predict new values. To obtain reliable and sophisticated statistics from big data analysis, it is necessary to analyze the meaning of each variable, the correlation between the variables, and multicollinearity. If the data are classified in a way that does not match the hypothesis being tested from the outset, the results will be unreliable even if the analysis is performed well. In other words, prior to big data analysis, it is necessary to ensure that the data are well classified according to the purpose of the analysis. Therefore, in this study, data are classified using the decision tree and random forest techniques, which are classification methods in machine learning, a field that implements AI technology. By evaluating the degree of classification of the data, we try to find a way to improve the classification and analysis rate of the data.

Big Data Analysis of the Women Who Score Goal Sports Entertainment Program: Focusing on Text Mining and Semantic Network Analysis.

  • Hyun-Myung, Kim;Kyung-Won, Byun
    • International Journal of Internet, Broadcasting and Communication / Vol. 15, No. 1 / pp. 222-230 / 2023
  • The purpose of this study is to provide basic data on sports entertainment programs by collecting unstructured data generated on Naver and Google about the SBS entertainment program 'Women Who Score Goal', which began regular broadcast in June 2021, and analyzing public perceptions through data mining, a semantic matrix, and CONCOR analysis. Data were collected using Textom, and 27,911 cases were accumulated over 16 months from June 16, 2021 to October 15, 2022. From the collected data, 80 key keywords related to 'Kick a Goal' were derived through simple frequency and TF-IDF analysis. Semantic network analysis was then conducted to analyze the relationships among these top 80 keywords. Centrality was derived with the UCINET 6.0 program, and NetDraw of UCINET 6.0 was used to understand the characteristics of the network and to visualize the connections between keywords clearly. CONCOR analysis was conducted to derive clusters of words with similar characteristics based on the semantic network. The analysis yielded a 'Program' cluster related to the broadcast content of 'Kick a Goal' and a 'Soccer' cluster related to the sport featured in 'Kick a Goal'. In addition to scenes of the cast's games, it yielded an 'Everyday Life' cluster about training and daily life and a 'Broadcast Manipulation' cluster about the manipulation of game content that disappointed viewers.

Finding the Optimal Data Classification Method Using LDA and QDA Discriminant Analysis

  • Kim, SeungJae;Kim, SungHwan
    • 통합자연과학논문집 / Vol. 13, No. 4 / pp. 132-140 / 2020
  • With the recent introduction of artificial intelligence (AI) technology, the use of data is rapidly increasing, as is the volume of newly generated data. To obtain the desired results from these data, the first step is to classify the data well. However, if only one machine learning classification technique is applied to classify and analyze the data, overfitting errors can result. To reduce or minimize problems caused by misclassification, such as overfitting, it is necessary to derive an optimal classification by applying several classification techniques and comparing their results. Interpreting the data with only one classification technique leads to poor reasoning and poor predictions. This study seeks a method for optimally classifying data, as a step that precedes data analysis, by looking at the data from various perspectives and applying various linear and nonlinear classification techniques such as LDA and QDA. To obtain reliable and sophisticated statistics from big data analysis, it is necessary to analyze the meaning of each variable and the correlation between the variables. If the data are classified in a way that does not match the hypothesis being tested from the outset, the results will be unreliable even if the analysis is performed well. In other words, prior to big data analysis, it is necessary to ensure that the data are well classified to suit the purpose of the analysis. This process must be performed before results are derived from the data, and it may serve as a method of optimal data classification.

Failure Analysis to Derive the Causes of Abnormal Condition of Electric Locomotive Subsystem

  • 소민섭;전홍배;신종호
    • 산업경영시스템학회지 / Vol. 41, No. 2 / pp. 84-94 / 2018
  • In recent years, reducing operation and maintenance costs through advanced maintenance technology has attracted the attention of many companies. The heavy machinery industry in particular regards it as a crucial problem, since a failure of heavy machinery incurs high cost and long downtime. To improve the current maintenance process, the heavy machinery industry is trying to develop a methodology that predicts failures in advance and finds their causes using usage data. Better analysis of failure causes requires more data, so various kinds of sensors are attached to machines and abundant product usage data are collected through the sensor network. However, systematic analysis of the collected product usage data is still in its infancy. Many previous works have focused on failure occurrence as statistical data for reliability analysis. Fewer works have applied product usage data to root cause analysis of product failure. The product usage data collected while failures occur should be considered in failure cause analysis. To this end, this study proposes a methodology for applying product usage data to failure cause analysis. The proposed methodology is composed of several steps that transform product usage data into failure causes. Various statistical analyses, such as multinomial logistic regression and the t-test, are combined with product usage data for the root cause analysis. The proposed methodology is applied to field data from locomotives in operation, and the analysis results show its effectiveness.

Big Data Analysis on the Perception of Home Training According to the Implementation of COVID-19 Social Distancing

  • Hyun-Chang Keum;Kyung-Won Byun
    • International Journal of Internet, Broadcasting and Communication / Vol. 15, No. 3 / pp. 211-218 / 2023
  • Due to the implementation of COVID-19 distancing, interest in 'home training' and the number of its users are rapidly increasing. Therefore, the purpose of this study is to identify the perception of 'home training' through big data analysis of social media channels and to provide basic data to the related business sector. Big data were collected from various news and social content provided on the Naver and Google sites. Data covering three years from March 22, 2020, when COVID-19 distancing was implemented in Korea, were collected. The collected data included 4,000 Naver blog posts, 2,673 news items, 4,000 cafe posts, 3,989 Knowledge iN posts, and 953 Google channel news items. TF and TF-IDF were analyzed through text mining, and semantic network analysis was then conducted on 70 keywords. Big data analysis programs such as Textom and Ucinet were used for the social big data analysis, and NetDraw was used for visualization. In the text mining analysis, 'home training' had the highest TF, appearing 4,045 times, followed by 'exercise', 'Homt', 'house', 'apparatus', 'recommendation', and 'diet'. By TF-IDF, the main keywords were 'exercise', 'apparatus', 'home', 'house', 'diet', 'recommendation', and 'mat'. Based on these results, 70 high-frequency keywords were extracted, and semantic indicators and centrality were then analyzed. Finally, CONCOR analysis grouped the keywords into a 'purchase cluster', an 'equipment cluster', a 'diet cluster', and an 'execute method cluster'. From these four clusters, basic data for the 'home training' business sector were presented, based on consumers' main perceptions of 'home training' and analysis of the semantic network.
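
The difference between the TF and TF-IDF keyword rankings reported in this abstract can be illustrated with a minimal sketch. The toy documents, the function name `tf_idf`, and the plain `tf * log(N / df)` weighting are assumptions for illustration only; the study used Textom, and its exact weighting scheme is not stated in the abstract.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return (tf, tfidf) dictionaries for a list of tokenized documents.

    tf[w]    = total count of term w across all documents
    tfidf[w] = tf[w] * log(N / df[w]), where N is the number of documents
               and df[w] is the number of documents containing w.
               This is one common variant, assumed here for illustration.
    """
    n_docs = len(docs)
    tf = Counter()
    df = Counter()
    for tokens in docs:
        tf.update(tokens)       # raw term frequency
        df.update(set(tokens))  # document frequency (count each doc once)
    tfidf = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    return dict(tf), tfidf

# Hypothetical toy corpus, not data from the study.
docs = [
    ["home", "training", "exercise", "mat"],
    ["home", "training", "diet"],
    ["home", "training", "exercise", "apparatus"],
]
tf, tfidf = tf_idf(docs)

# Rank terms by each score, highest first.
by_tf = sorted(tf, key=tf.get, reverse=True)
by_tfidf = sorted(tfidf, key=tfidf.get, reverse=True)
```

Under this variant, a term that appears in every document has a TF-IDF of zero, which is one way a term can top the TF list yet be absent from the TF-IDF list.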

Utilization of Log Data Reflecting User Information-Seeking Behavior in the Digital Library

  • Lee, Seonhee;Lee, Jee Yeon
    • Journal of Information Science Theory and Practice / Vol. 10, No. 1 / pp. 73-88 / 2022
  • This exploratory study aims to understand the potential of log data analysis and expand its utilization in user research methods. Transaction log data are records of electronic interactions between users and web services, reflecting information-seeking behavior in the context of digital libraries, where users interact with the service system while searching for information. Three days of log data from South Korea's National Digital Science Library (NDSL), comprising 150,000 records, were analyzed in two ways: a log pattern analysis and a log context analysis using statistics. First, the pattern-based analysis examined the general paths of usage by logged-in and non-logged-in users. The correlation between paths was analyzed through a χ² analysis. The subsequent log context analysis assessed the data of 30 identified users using basic statistics and visualized individual users' information-seeking behavior while accessing NDSL. The visualization showed 30 diverse paths for the 30 cases. Log analysis provided insight into general and individual user information-seeking behavior. The results of log analysis can enhance the understanding of user actions and can therefore be used as basic data to improve the design of services and systems in the digital library to meet users' needs.
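
As a rough illustration of the χ² analysis mentioned in this abstract, the sketch below computes Pearson's chi-square statistic for a contingency table of user type by usage path. The function name `chi_square_statistic` and the counts are hypothetical; the abstract does not give the actual table or the software used.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Invented counts for illustration: rows are user types, columns are paths.
observed = [
    [30, 10],  # e.g. logged-in users: path A, path B
    [20, 40],  # e.g. other users:     path A, path B
]
stat, dof = chi_square_statistic(observed)
```

The statistic would then be compared against a chi-square distribution with `dof` degrees of freedom to judge whether path choice is associated with user type.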