• Title/Summary/Keyword: Data and Analysis

Search Result 85,372

Analysis of ADS-B ground trajectory data using non-aviation approval public data (공공용 정보를 이용한 ADS-B 지상 항적 자료 분석)

  • Ku, SungKwan;Baik, Hojong
    • Journal of the Korean Society for Aviation and Aeronautics / v.23 no.4 / pp.6-11 / 2015
  • In this study, we analyzed ADS-B ground trajectory data obtained from publicly available, non-aviation-approved sources. The analysis used non-aviation public data together with a commercial ADS-B receiver. The results show that ADS-B ground trajectory data can be used for airfield surveillance within a limited range, and confirm that non-aviation public data are usable for aviation research.

Text Mining and Visualization of Unstructured Data Using Big Data Analytical Tool R (빅데이터 분석 도구 R을 이용한 비정형 데이터 텍스트 마이닝과 시각화)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.9 / pp.1199-1205 / 2021
  • In the era of big data, it is very important to effectively analyze not only structured data well organized in databases, but also unstructured big data such as web documents, e-mails, and social data generated in real time on the Internet, social network services, and mobile environments. Big data analysis is the process of creating new value by discovering meaningful new correlations, patterns, and trends in big data held in data storage. We summarize and visualize the results of a frequency analysis of unstructured article data using the R language, a big data analysis tool. The data used in this study comprised a total of 104 papers from the Mon-May 2021 issues of the journal of the Korea Institute of Information and Communication Engineering. In the final analysis, the most frequently mentioned keyword was "Data", which ranked first with 1,538 occurrences. Based on these results, the limitations of the study and its theoretical implications are discussed.
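
The TF-IDF weighting behind the frequency analysis above can be sketched briefly. The study itself uses R, so this is only a minimal Python illustration with invented toy documents:

```python
import math
from collections import Counter

# Invented toy documents standing in for the journal abstracts; the study
# itself performs this frequency analysis in R, so this fragment is only
# a minimal sketch of TF and TF-IDF.
docs = [
    "big data analysis creates new value from data",
    "text mining of unstructured data with visualization",
    "frequency analysis of data finds patterns and trends",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Term frequency: raw counts summed over the corpus.
tf = Counter()
for doc in tokenized:
    tf.update(doc)

# TF-IDF: weight each term by log(N / document frequency), so terms that
# appear in every document (like "data" here) are weighted down to zero.
df = Counter()
for doc in tokenized:
    df.update(set(doc))
tfidf = {term: tf[term] * math.log(n_docs / df[term]) for term in tf}

print(tf.most_common(3))          # "data" is the most frequent raw term
print(max(tfidf, key=tfidf.get))  # a term unique to a single document
```

As in the paper's result, the raw-frequency ranking is dominated by a ubiquitous keyword, while TF-IDF promotes terms concentrated in few documents.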

Complex Segregation Analysis of Categorical Traits in Farm Animals: Comparison of Linear and Threshold Models

  • Kadarmideen, Haja N.;Ilahi, H.
    • Asian-Australasian Journal of Animal Sciences / v.18 no.8 / pp.1088-1097 / 2005
  • The main objectives of this study were to investigate the accuracy, bias, and power of linear and threshold model segregation analysis methods for the detection of major genes affecting categorical traits in farm animals. The Maximum Likelihood Linear Model (MLLM), Bayesian Linear Model (BALM), and Bayesian Threshold Model (BATM) were applied to simulated data on normal, categorical, and binary scales, as well as to disease data in pigs. Simulated data on the underlying normally distributed liability (NDL) were used to create the categorical and binary data. The MLLM method was applied to data on all scales (normal, categorical, and binary), and the BATM method was developed and applied only to binary data. The MLLM analyses underestimated parameters for binary as well as categorical traits compared to normal traits, with the bias being very severe for binary traits. The accuracy of major gene and polygene parameter estimates was also very low for binary data compared with categorical data; the latter gave results similar to normal data. When disease incidence (on the binary scale) is close to 50%, segregation analysis has higher accuracy and less bias than for diseases with rare incidence. NDL data were always better than categorical data. Under the MLLM method, the test statistics for categorical and binary data were consistently and unusually high (while the opposite is expected, owing to the loss of information in categorical data), indicating high false discovery rates of major genes if linear models are applied to categorical traits. With Bayesian segregation analysis, the 95% highest probability density regions of the major gene variances were checked to see whether they included the value of zero (a boundary parameter); by the nature of this difference between likelihood and Bayesian approaches, the Bayesian methods are likely to be more reliable for categorical data. The BATM segregation analysis of binary data also showed a significant advantage over MLLM in terms of higher accuracy.
Based on the results, threshold models are recommended when the trait distributions are discontinuous. Further, segregation analysis could be used in an initial scan of the data for evidence of major genes before embarking on molecular genome mapping.
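
The NDL-to-binary construction described above can be illustrated directly: a binary trait is obtained by thresholding a normally distributed liability, so disease incidence equals 1 - Φ(t) for threshold t. A minimal sketch with illustrative values (not the paper's simulation):

```python
import math
import random

def incidence_from_threshold(t):
    """P(liability > t) for a standard normal liability, i.e. 1 - Phi(t)."""
    return 1 - 0.5 * (1 + math.erf(t / math.sqrt(2)))

# Simulate liabilities and dichotomize at the threshold, mirroring the
# NDL -> binary construction described in the abstract (illustrative only).
random.seed(0)
t = 0.0  # a threshold at the mean gives ~50% incidence, the favourable case
liabilities = [random.gauss(0, 1) for _ in range(100_000)]
binary = [1 if x > t else 0 for x in liabilities]

print(round(incidence_from_threshold(0.0), 3))  # 0.5
print(round(sum(binary) / len(binary), 3))      # close to 0.5
```

Raising the threshold makes the disease rarer, which is exactly the regime where the abstract reports segregation analysis losing accuracy and gaining bias.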

A Study on the Meaning of The First Slam Dunk Based on Text Mining and Semantic Network Analysis

  • Kyung-Won Byun
    • International Journal of Advanced Smart Convergence / v.12 no.1 / pp.164-172 / 2023
  • In this study, we examine the public perception of 'The First Slam Dunk', a sports-based cartoon gaining popularity, through big data analysis of social media channels, and provide basic data for the development of various kinds of content in the sports industry. Social big data were collected from news articles provided on the Naver and Google sites. Data were collected from January 1, 2023 to February 15, 2023, a period chosen with reference to the Korean release date of 'The First Slam Dunk'. The collected data comprised 2,106 Naver news articles and 1,019 Google news articles. TF and TF-IDF were computed through text mining of these data, and semantic network analysis was then conducted on 60 keywords. Big data analysis programs such as Textom and UCINET were used for the social big data analysis, and NetDraw was used for visualization. Considering both TF and TF-IDF, the most frequent keyword related to the subject was 'The First Slam Dunk', which appeared 4,079 times, followed by 'Slam Dunk', 'Movie', 'Premiere', 'Animation', 'Audience', and 'Box-Office'. Based on these results, the 60 most frequent keywords were extracted, and semantic network and centrality analyses were conducted. Finally, a total of six clusters (competing movie, cartoon, passion, premiere, attention, box-office) were formed through CONCOR analysis. Based on this semantic network analysis of 'The First Slam Dunk', basic data for a sports content development plan were provided.
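
The co-occurrence network behind the semantic network analysis above can be sketched as follows. The study uses Textom and UCINET; this Python fragment with invented headlines only illustrates the idea of linking keywords that appear in the same document and ranking them by weighted degree centrality:

```python
from collections import Counter
from itertools import combinations

# Invented toy "headlines" standing in for the collected news articles.
headlines = [
    ["slam", "dunk", "movie", "premiere"],
    ["slam", "dunk", "box-office", "audience"],
    ["movie", "box-office", "record"],
]

# Undirected co-occurrence network: keywords in the same document are
# linked, with edge weight = number of documents they share.
edges = Counter()
for doc in headlines:
    for a, b in combinations(sorted(set(doc)), 2):
        edges[(a, b)] += 1

# Weighted degree centrality: sum of edge weights touching each keyword.
centrality = Counter()
for (a, b), w in edges.items():
    centrality[a] += w
    centrality[b] += w

print(centrality.most_common(2))  # the title keywords dominate the network
```

Clustering this adjacency structure (as CONCOR does on the real data) would then group keywords with similar connection patterns.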

Investigating the underlying structure of particulate matter concentrations: a functional exploratory data analysis study using California monitoring data

  • Montoya, Eduardo L.
    • Communications for Statistical Applications and Methods / v.25 no.6 / pp.619-631 / 2018
  • Functional data analysis continues to attract interest because advances in technology across many fields have increasingly permitted measurements to be made from continuous processes on a discretized scale. Particulate matter is among the most harmful air pollutants affecting public health and the environment, and levels of PM10 (particles less than 10 micrometers in diameter) for regions of California remain among the highest in the United States. The relatively high frequency of particulate matter sampling enables us to regard the data as functional data. In this work, we investigate the dominant modes of variation of PM10 using functional data analysis methodologies. Our analysis provides insight into the underlying data structure of PM10, and it captures the size and temporal variation of this underlying data structure. In addition, our study shows that certain aspects of size and temporal variation of the underlying PM10 structure are associated with changes in large-scale climate indices that quantify variations of sea surface temperature and atmospheric circulation patterns.

Comparing Methodology of Building Energy Analysis - Comparative Analysis from steady-state simulation to data-driven Analysis - (건물에너지 분석 방법론 비교 - Steady-state simulation에서부터 Data-driven 방법론의 비교 분석 -)

  • Cho, Sooyoun;Leigh, Seung-Bok
    • KIEAE Journal / v.17 no.5 / pp.77-86 / 2017
  • Purpose: Because of growing concern over fossil fuel use and increasing demand for greenhouse-gas emission reduction since the 1990s, the building energy analysis field has produced various methods, which are being applied more often and more broadly than ever. Research on building energy simulation has been actively pursued around the world for over 50 years. However, in the last 20 years there have been only a few studies in which the trends in building energy analysis are examined, estimated, or compared. This research investigates trends in building energy analysis, focusing on the methodology and characteristics of each method. Method: The research papers addressing building energy analysis are classified into two types of method: engineering analysis and algorithmic estimation. In particular, the EPG (Energy Performance Gap), which limits both the existing engineering methods and single-algorithm estimation methods, results from comparing data of two different levels, namely real-time data and simulation data. Result: When one or more ensemble algorithms are used, more accurate estimates of energy consumption and performance are produced, thereby mitigating the energy performance gap.

A Study on Design of Real-time Big Data Collection and Analysis System based on OPC-UA for Smart Manufacturing of Machine Working

  • Kim, Jaepyo;Kim, Youngjoo;Kim, Seungcheon
    • International Journal of Internet, Broadcasting and Communication / v.13 no.4 / pp.121-128 / 2021
  • In order to design a real-time big data collection and analysis system for manufacturing data in a smart factory, it is important to establish an appropriate wired/wireless communication system and protocol. This paper introduces the client/server functions of OPC-UA (Open Platform Communications Unified Architecture), a recent communication protocol, and applies user-interface technology to configure a network for real-time data collection through IoT integration. A database is then designed in the MES (Manufacturing Execution System) based on an analysis table that reflects user requirements among the data extracted from a new cutting-process automation line, a bush inner-diameter indentation measurement system, and a tool monitoring/inspection system. In summary, the big data analysis system introduced in this paper performs SPC (Statistical Process Control) analysis and visualization analysis through an OPC-UA-based wired/wireless communication interface. Quality and visualization analysis are carried out through AI learning models built with the XGBoost (eXtreme Gradient Boosting) and LR (Linear Regression) algorithms, with storage in and connection to the cloud.
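
The SPC step mentioned above can be illustrated with a basic Shewhart chart: control limits are set from in-control baseline data, and later measurements falling outside mean ± 3σ are flagged. A minimal sketch with invented measurements (the paper's actual pipeline runs over OPC-UA and an MES database, which is not reproduced here):

```python
import statistics

# Phase I: hypothetical in-control bush inner-diameter readings (mm, invented).
baseline = [12.01, 12.03, 11.98, 12.00, 12.02, 11.99, 12.01, 12.02]
mean = statistics.mean(baseline)
sigma = statistics.stdev(baseline)
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma  # Shewhart 3-sigma limits

# Phase II: flag incoming measurements that fall outside the control limits.
incoming = [12.00, 12.04, 12.55, 11.99]
flags = [x for x in incoming if not (lcl <= x <= ucl)]
print(flags)  # the 12.55 reading falls outside the limits
```

Computing limits from a clean baseline, rather than from the stream being judged, keeps a single gross outlier from inflating σ and masking itself.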

Categorical Data Analysis by Means of Echelon Analysis with Spatial Scan Statistics

  • Moon, Sung-Ho
    • Journal of the Korean Data and Information Science Society / v.15 no.1 / pp.83-94 / 2004
  • In this study we analyze categorical data by means of spatial scan statistics and echelon analysis. To do this, we first determine the hierarchical structure of a given contingency table using an echelon dendrogram; we then detect candidate hotspots, given as the top echelons of the dendrogram. Next, we evaluate spatial scan statistics for the zones of significantly high or low rates based on the likelihood ratio. Finally, we detect hotspots of any size and shape based on the spatial scan statistics.
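
The likelihood-ratio evaluation described above is commonly the Kulldorff-style Poisson scan statistic; assuming that form, a candidate zone with c observed and E expected cases out of C total cases scores c·ln(c/E) + (C-c)·ln((C-c)/(C-E)) when c > E. A toy sketch (invented counts, not the paper's data):

```python
import math

def poisson_llr(c, E, C):
    """Kulldorff-style log-likelihood ratio for a candidate zone with
    c observed cases, E expected cases, and C total cases overall."""
    if c <= E:  # score only zones with elevated rates
        return 0.0
    return c * math.log(c / E) + (C - c) * math.log((C - c) / (C - E))

# Invented (observed, expected) counts for three candidate zones, C = 100.
C = 100
zones = {"zone1": (30, 15), "zone2": (12, 10), "zone3": (5, 8)}
scores = {name: poisson_llr(c, e, C) for name, (c, e) in zones.items()}

hotspot = max(scores, key=scores.get)
print(hotspot, round(scores[hotspot], 2))  # zone1 scores highest
```

In practice the top-scoring zone's significance is then assessed by Monte Carlo replication rather than a fixed cutoff.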


Bounding Worst-Case Data Cache Performance by Using Stack Distance

  • Liu, Yu;Zhang, Wei
    • Journal of Computing Science and Engineering / v.3 no.4 / pp.195-215 / 2009
  • Worst-case execution time (WCET) analysis is critical for hard real-time systems to ensure that different tasks can meet their respective deadlines. While significant progress has been made for WCET analysis of instruction caches, the data cache timing analysis, especially for set-associative data caches, is rather limited. This paper proposes an approach to safely and tightly bounding data cache performance by computing the worst-case stack distance of data cache accesses. Our approach can not only be applied to direct-mapped caches, but also be used for set-associative or even fully-associative caches without increasing the complexity of analysis. Moreover, the proposed approach can statically categorize worst-case data cache misses into cold, conflict, and capacity misses, which can provide useful insights for designers to enhance the worst-case data cache performance. Our evaluation shows that the proposed data cache timing analysis technique can safely and accurately estimate the worst-case data cache performance, and the overestimation as compared to the observed worst-case data cache misses is within 1% on average.
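
The stack distance the paper bounds is the number of distinct memory blocks referenced between two accesses to the same block; in a fully-associative LRU cache with A lines, an access hits exactly when its stack distance is less than A. A minimal sketch of computing it over an illustrative trace (not from the paper):

```python
def stack_distances(trace):
    """Return the stack (reuse) distance of each access: the number of
    distinct blocks touched since the previous access to the same block,
    or None for a cold (first) access."""
    stack = []   # LRU stack, most-recently-used block last
    dists = []
    for block in trace:
        if block in stack:
            # distance = number of distinct blocks above it in the LRU stack
            d = len(stack) - 1 - stack.index(block)
            stack.remove(block)
            dists.append(d)
        else:
            dists.append(None)  # cold miss
        stack.append(block)     # promote to most-recently-used
    return dists

trace = ["a", "b", "c", "a", "b", "b", "d", "a"]
print(stack_distances(trace))  # [None, None, None, 2, 2, 0, None, 2]
```

With A = 2 lines, only the access with distance 0 would hit, independent of the cache's mapping details, which is what makes the measure usable for associativity-agnostic worst-case bounds.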

Problems of Big Data Analysis Education and Their Solutions (빅데이터 분석 교육의 문제점과 개선 방안 -학생 과제 보고서를 중심으로)

  • Choi, Do-Sik
    • Journal of the Korea Convergence Society / v.8 no.12 / pp.265-274 / 2017
  • This paper examines the problems of big data analysis education and suggests ways to solve them. Big data is a trend that the characteristic of big data is evolving from V3 to V5. For this reason, big data analysis education must take V5 into account. Because increased uncertainty can increase the risk of data analysis, internal and external structured/semi-structured data as well as disturbance factors should be analyzed to improve the reliability of the data. And when using opinion mining, error that is easy to perceive is variability and veracity. The veracity of the data can be increased when data analysis is performed against uncertain situations created by various variables and options. It is the node analysis of the textom(텍스톰) and NodeXL that students and researchers mainly use in the analysis of the association network. Social network analysis should be able to get meaningful results and predict future by analyzing the current situation based on dark data gained.