• Title/Summary/Keyword: Big Data Processing Technology

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People now create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, which greatly influences society. This phenomenon is unmatched in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the defining conditions of data amount (volume), data input and output speed (velocity), and variety of data types (variety). Trends of issues discovered in SNS Big Data can serve as an important new source of value because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) show the importance of a topic through a treemap based on the score system and frequency; and (4) visualize the daily time-series graph of a keyword found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. In addition, such analysis requires recent big data technology to rapidly process a large amount of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single-node computing to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB does not use fixed schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as its visualization tool. This library is designed to create Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM); interaction with the data is easy, which is useful for presenting real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets from Korea during March 2013.
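
The stop-word removal and daily keyword ranking mentioned in this abstract can be illustrated with a minimal sketch. It uses plain word frequency, not the topic modeling technique of the paper, and the stop-word list, function names, and sample tweets are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "in", "on", "of", "and", "to"}

def tokenize(text):
    """Split on whitespace, strip punctuation, lowercase, drop stop words."""
    tokens = (w.strip(".,!?#@:;\"'()").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def daily_top_keywords(tweets, k=3):
    """tweets: iterable of (date_string, text) pairs.

    Returns {date: [(word, count), ...]} with the k most frequent words per day.
    """
    counts = defaultdict(Counter)
    for day, text in tweets:
        counts[day].update(tokenize(text))
    return {day: counter.most_common(k) for day, counter in counts.items()}

# Synthetic sample tweets (not from the study's data set).
tweets = [
    ("2013-03-01", "The election debate is on TV"),
    ("2013-03-01", "Election results and the debate"),
    ("2013-03-02", "Spring festival in Seoul"),
]
ranking = daily_top_keywords(tweets, k=2)
```

A real pipeline would replace the frequency count with a topic model and would need a language-specific tokenizer for noun extraction.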

Development of Distributed Smart Data Monitoring System for Heterogeneous Manufacturing Machines Operation (이종 공작기계 운용 관리를 위한 분산 스마트 데이터 모니터링 시스템 개발)

  • Lee, Young-woon;Choi, Young-ju;Lee, Jong-Hyeok;Kim, Byung-Gyu;Lee, Seung-Woo;Park, Jong-Kweon
    • Journal of Digital Contents Society / v.18 no.6 / pp.1175-1182 / 2017
  • A recent trend in the manufacturing industry is convergence with IoT and Big Data, driven by the emergence of the 4th Industrial Revolution. To realize a smart factory, the proposed system, based on MTConnect technology, collects and integrates status information of machines from many production facilities, including heterogeneous devices. It can also distribute the acquired status of heterogeneous manufacturing machines to remote devices. As a key technology for a flexible automated production line, the proposed system makes it possible to manage important information, such as error detection and processing-state management, in an unmanned automation line.

Big Data based Epidemic Investigation Support System using Mobile Network Data (이동통신 데이터를 활용한 빅데이터 기반 역학조사지원 시스템)

  • Lee, Min-woo;Kim, Ye-ji;Yi, Jae-jin;Moon, Kyu-hwan;Hwang, SeonBae;Jun, Yong-joo;Hahm, Yu-Kun
    • The Journal of Bigdata / v.5 no.2 / pp.187-199 / 2020
  • The World Health Organization declared COVID-19 a pandemic on March 11, 2020. South Korea recorded 27,000 cases of the coronavirus illness, and more than 50 million cases were confirmed worldwide. Epidemiological investigation has become important once again due to the spread of COVID-19 infections. However, the large number of confirmed cases in Daegu and Gyeongbuk exposed the limitations of existing epidemiological investigation methods. The Korea Disease Control and Prevention Agency developed the Epidemiological Investigation Support System (EISS) using smart city data hub technology and applied it in epidemiological investigations. As a part of EISS, the proposed system is a big-data-based epidemiological investigation support system that processes mobile carriers' network data. It handles abnormal values in mobile carriers' data, which was impossible with existing staff, and creates hotspot regions where more than two people were in contact with an infected person. As a result, our system processes outliers in mobile network data in 30 seconds and computes hotspots in around 10 minutes. As the first adoption of a big data system in support of epidemiological investigation, our system demonstrates the practical utility of big data systems for epidemiological investigation.
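
The hotspot idea in this abstract (regions where more than two people were in contact with an infected person) can be sketched as follows. This is a simplified illustration, not the EISS implementation: the record layout, the definition of contact as sharing a cell in the same time slot, and all identifiers are assumptions.

```python
from collections import defaultdict

def find_hotspots(records, infected, min_contacts=3):
    """Return the cells where at least min_contacts distinct people shared
    a time slot with an infected person.

    records: iterable of (person_id, cell_id, time_slot) tuples (assumed layout).
    infected: set of person_ids known to be infected.
    min_contacts=3 encodes "more than two people".
    """
    present = defaultdict(set)  # (cell, slot) -> people seen there
    for person, cell, slot in records:
        present[(cell, slot)].add(person)

    contacts = defaultdict(set)  # cell -> distinct non-infected contacts
    for (cell, _slot), people in present.items():
        if people & infected:
            contacts[cell] |= people - infected

    return {cell for cell, people in contacts.items() if len(people) >= min_contacts}

# Synthetic records with hypothetical identifiers.
records = [
    ("p1", "cellA", 9), ("p2", "cellA", 9), ("p3", "cellA", 9),
    ("p4", "cellA", 10), ("inf", "cellA", 9), ("inf", "cellA", 10),
    ("p5", "cellB", 9), ("inf", "cellB", 9),
]
hotspots = find_hotspots(records, infected={"inf"})
```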

Efficient Query Retrieval from Social Data in Neo4j using LIndex

  • Mathew, Anita Brigit
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.5 / pp.2211-2232 / 2018
  • Unstructured and semi-structured big data in social networks poses new challenges for query retrieval. Meeting them requires measures that improve retrieval time, such as indexing. Because of the huge volume of stored data, efficient index algorithms are needed to speed up query processing. However, conventional algorithms fail to index the huge amount of frequently arriving information in real time and fall short of providing a scalable indexing service. In this paper, a new LIndex algorithm, a heuristic on Lucene, is built on the Neo4jHA architecture that holds the social network big data. LIndex is a flexible and simplified adaptive indexing scheme that uses decomposed shortest paths around term neighbors as its basic indexing unit. The new index proves effective in pruning the query space of the graph database Neo4j and scalable in index construction and deployment. A graph query is processed and optimized, moving beyond the traditional time-based Lucene approach to a more efficient path-based method in LIndex. The algorithm significantly reduces query fetch time without compromising the quality of results. Experiments are conducted to confirm the efficiency of the proposed query retrieval in the Neo4j graph NoSQL database.

Transfer Learning-Based Feature Fusion Model for Classification of Maneuver Weapon Systems

  • Jinyong Hwang;You-Rak Choi;Tae-Jin Park;Ji-Hoon Bae
    • Journal of Information Processing Systems / v.19 no.5 / pp.673-687 / 2023
  • Convolutional neural network-based deep learning technology is the most commonly used approach in image identification, but it requires large-scale data for training. Therefore, application in specific fields in which data acquisition is limited, such as the military, may be challenging. In particular, the identification of ground weapon systems is a very important mission, and high identification accuracy is required. Accordingly, various studies have been conducted to achieve high performance using small-scale data. Among them, the ensemble method, which achieves excellent performance by averaging the predictions of pre-trained models, is the most representative; however, it requires considerable time and effort to find the optimal combination of ensemble models. In addition, there is a performance limitation in the prediction results obtained by using an ensemble method. Furthermore, it is difficult to obtain the ensemble effect using models with imbalanced classification accuracies. In this paper, we propose a transfer learning-based feature fusion technique for heterogeneous models that extracts and fuses the features of pre-trained heterogeneous models and finally fine-tunes the hyperparameters of the fully connected layer to improve the classification accuracy. The experimental results of this study indicate that it is possible to overcome the limitations of the existing ensemble methods by improving the classification accuracy through feature fusion between heterogeneous models based on transfer learning.

Access efficiency of small sized files in Big Data using various Techniques on Hadoop Distributed File System platform

  • Alange, Neeta;Mathur, Anjali
    • International Journal of Computer Science & Network Security / v.21 no.7 / pp.359-364 / 2021
  • Hadoop usage has grown steadily in recent years. Organizations across the globe are eager to adopt technology that provides speedy access to data, and dependence on computers keeps increasing. Big data is growing exponentially as the entire world works online, producing large amounts of data that are very difficult to handle and process within a short time. Industry now widely uses the Hadoop framework to store and process the huge amounts of data placed on servers and to produce results within a specified time. Processing huge amounts of data that consist of small files, and optimizing their storage, is a major problem. Various techniques have already been proposed, including HDFS, Sequence files, HAR, and NHAR. In this paper, we discuss existing techniques developed for accessing and storing small files efficiently. Among these techniques, we specifically attempt to implement the HDFS-HAR and NHAR techniques.
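
The general idea behind archive-based small-file techniques, packing many small files into one large object plus an index, can be sketched in memory as follows. This is a conceptual illustration only; it does not reproduce the actual HAR or NHAR on-disk formats, and the function names and sample files are hypothetical.

```python
def pack(files):
    """Concatenate small files into one blob.

    files: dict mapping file name -> bytes.
    Returns (blob, index) where index[name] = (offset, length).
    """
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = (len(blob), len(data))
        blob += data
    return bytes(blob), index

def read(blob, index, name):
    """Fetch one file from the packed blob with a single index lookup."""
    offset, length = index[name]
    return blob[offset:offset + length]

# Synthetic small files.
small_files = {"a.log": b"alpha", "b.log": b"bravo!", "c.log": b""}
blob, index = pack(small_files)
```

Storing one blob and one index instead of many tiny files is what reduces per-file metadata overhead; the cost is the extra lookup on every read.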

Job-aware Network Scheduling for Hadoop Cluster

  • Liu, Wen;Wang, Zhigang;Shen, Yanming
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.1 / pp.237-252 / 2017
  • In recent years, data centers have become the core infrastructure for big data processing. For big data applications, network transmission has become one of the most important factors affecting performance. To improve network utilization and reduce job completion time, this paper proposes job-aware priority scheduling based on real-time monitoring from the application layer. Our approach takes the correlations of flows in the same job into account, and flows in the same job are assigned the same priority. Therefore, we expect flows in the same job to finish their transmissions at about the same time, avoiding lagging flows. To achieve load balancing, two approaches (Flow-based and Spray) using ECMP (Equal-Cost Multi-Path routing) are presented. We implemented our scheme using the NS-2 simulator. In our evaluations, we emulate a real network environment by setting background traffic, scheduling delay, and link failures. The experimental results show that our approach can enhance Hadoop job execution efficiency in the shuffle stage and significantly reduce the network transmission time of the highest-priority job.
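
Two ideas from this abstract, giving all flows of a job the same priority and flow-based ECMP path selection, can be sketched as below. This is an illustration under assumptions, not the authors' NS-2 implementation: the flow fields, the CRC32 hash, and the job priorities are all hypothetical.

```python
import zlib

def assign_priorities(flows, job_priority):
    """Give every flow the priority of the job it belongs to."""
    return {flow["id"]: job_priority[flow["job"]] for flow in flows}

def ecmp_path(flow, num_paths):
    """Flow-based ECMP: hash the flow's endpoints so that every packet of
    the flow takes the same one of num_paths equal-cost paths."""
    key = f'{flow["src"]}:{flow["dst"]}:{flow["port"]}'.encode()
    return zlib.crc32(key) % num_paths

# Synthetic flows from two hypothetical jobs.
flows = [
    {"id": "f1", "job": "job1", "src": "10.0.0.1", "dst": "10.0.1.1", "port": 5001},
    {"id": "f2", "job": "job1", "src": "10.0.0.2", "dst": "10.0.1.1", "port": 5002},
    {"id": "f3", "job": "job2", "src": "10.0.0.3", "dst": "10.0.1.2", "port": 5003},
]
priorities = assign_priorities(flows, {"job1": 0, "job2": 1})
paths = {flow["id"]: ecmp_path(flow, num_paths=4) for flow in flows}
```

A per-packet scheme would instead choose a path for each packet, which balances load more evenly but can reorder packets within a flow.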

A Study on the Anomaly Prediction System of Drone Using Big Data (빅데이터를 활용한 드론의 이상 예측시스템 연구)

  • Lee, Yang-Kyoo;Hong, Jun-Ki;Hong, Sung-Chan
    • Journal of Internet Computing and Services / v.21 no.2 / pp.27-37 / 2020
  • Big data is rapidly emerging as a core technology of the 4th industrial revolution, and the use of and demand for drones are continuously increasing along with it. However, as drone usage increases, so does the risk of drones falling. Because of their simple structure, drones can fall easily even with small problems. In this paper, to predict and prevent drone falls, an ESC (Electronic Speed Control) is attached integrally to the drone's driving motor and an acceleration sensor is installed to collect vibration data in real time. By processing and monitoring the data in real time and analyzing the collected big data with a Fast Fourier Transform (FFT) algorithm, we propose a prediction system that minimizes the risk of drone falls.
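
To illustrate how an FFT exposes the dominant vibration frequency in accelerometer samples, here is a minimal standard-library sketch. The sample rate, the synthetic 50 Hz and 120 Hz signal, and the function names are assumptions for illustration; the paper's actual analysis and thresholds are not given in the abstract.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest non-DC spectral component."""
    spectrum = fft(samples)
    half = len(samples) // 2
    peak_bin = max(range(1, half), key=lambda i: abs(spectrum[i]))
    return peak_bin * sample_rate / len(samples)

SAMPLE_RATE = 256  # Hz, assumed
N = 256            # number of samples (power of two)

# Synthetic vibration: strong 50 Hz component plus a weaker 120 Hz component.
signal = [
    math.sin(2 * math.pi * 50 * i / SAMPLE_RATE)
    + 0.3 * math.sin(2 * math.pi * 120 * i / SAMPLE_RATE)
    for i in range(N)
]
peak_hz = dominant_frequency(signal, SAMPLE_RATE)
```

A monitoring system could track how such spectral peaks shift over time and flag deviations from a motor's normal profile.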

Big Data-based Medical Clinical Results Analysis (빅데이터 기반 의료 임상 결과 분석)

  • Hwang, Seung-Yeon;Park, Ji-Hun;Youn, Ha-Young;Kwak, Kwang-Jin;Park, Jeong-Min;Kim, Jeong-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.1 / pp.187-195 / 2019
  • Recently, the development of big data technology has made it possible to collect, store, process, and analyze data generated in various fields. When these big data technologies are used for clinical results analysis, optimizing clinical trial design can reduce the costs associated with health care. Therefore, in this paper, we analyze clinical results and present guidelines that can reduce the period and cost of clinical trials. First, we use Sqoop to collect clinical results data from relational databases and store it in HDFS, and use Hive, a Hadoop-based processing tool, to process the data. Finally, we use R, a big data analysis tool widely used in fields such as the public sector and business, to analyze associations.
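
The association analysis mentioned at the end of this abstract rests on two standard measures, support and confidence. The sketch below computes them in Python rather than R, on made-up toy records; the item names are hypothetical and carry no clinical meaning.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from co-occurrence counts."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Toy records, invented for illustration only.
records = [
    {"drug_a", "nausea"},
    {"drug_a", "nausea", "headache"},
    {"drug_a"},
    {"drug_b", "headache"},
]
rule_support = support(records, {"drug_a", "nausea"})
rule_confidence = confidence(records, {"drug_a"}, {"nausea"})
```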

Enhancing the performance of taxi application based on in-memory data grid technology (In-memory data grid 기술을 활용한 택시 애플리케이션 성능 향상 기법 연구)

  • Choi, Chi-Hwan;Kim, Jin-Hyuk;Park, Min-Kyu;Kwon, Kaaen;Jung, Seung-Hyun;Nazareno, Franco;Cho, Wan-Sup
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1035-1045 / 2015
  • Recent studies in Big Data analysis show promising results from utilizing main memory for rapid data processing. In-memory computing technology can be highly advantageous when used with high-performance servers that have tens of gigabytes of RAM and multi-core processors. The network constraints in such infrastructure can be lessened by combining in-memory technology with distributed parallel processing. This paper applies this concept to a test taxi-hailing application while preserving its underlying RDBMS structure. Applying IMDG technology in the application's backend API, without restructuring the database schema, yields a 6- to 9-fold increase in data processing performance and throughput. Specifically, the change in throughput is very small even as the data processing load increases.
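
The pattern described here, serving reads from memory in the backend API while leaving the relational schema untouched, can be sketched as a read-through cache. This is a single-process stand-in, not a distributed IMDG product: a Python dict plays the role of the grid, an in-memory SQLite database plays the role of the RDBMS, and the table and class names are hypothetical.

```python
import sqlite3

class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the database on a miss."""

    def __init__(self, conn):
        self.conn = conn
        self.store = {}    # stand-in for the in-memory data grid
        self.db_reads = 0  # counts how often the database was queried

    def get_taxi(self, taxi_id):
        if taxi_id in self.store:
            return self.store[taxi_id]
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT id, status FROM taxi WHERE id = ?", (taxi_id,)
        ).fetchone()
        if row is not None:
            self.store[taxi_id] = row
        return row

# The schema stays as it is; only the access path changes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxi (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO taxi VALUES (1, 'available')")

cache = ReadThroughCache(conn)
first = cache.get_taxi(1)   # miss: reads from the database
second = cache.get_taxi(1)  # hit: served from memory
```

A production setup would also need cache invalidation or write-through on updates, which this sketch omits.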