• Title/Summary/Keyword: Big data streaming

Research of Knowledge Management and Reusability in Streaming Big Data with Privacy Policy through Actionable Analytics (스트리밍 빅데이터의 프라이버시 보호 동반 실용적 분석을 통한 지식 활용과 재사용 연구)

  • Paik, Juryon; Lee, Youngsook
    • Journal of Korea Society of Digital Industry and Information Management / v.12 no.3 / pp.1-9 / 2016
  • The current meaning of "Big Data" refers not only to management tools but to all the techniques for value extraction and actionable analytics. In particular, advances in wireless sensor networks yield diverse patterns of digital records. These records are mostly semi-structured and unstructured data, usually beyond the capabilities of conventional management tools, and they grow rapidly because of their complex data structures. The complex type effectively supports data exchangeability and heterogeneity, which is the main reason their volumes keep growing in sensor networks. However, because managing solutions for this complex data model are rarely provided in current big data environments, applications suffer from many errors and problems. To solve these problems and differentiate our work, we aim to provide a solution for actionable analytics and semantic reusability over sensor-web-based streaming big data with a new data structure, and thereby strengthen competitiveness.

Hazelcast Vs. Ignite: Opportunities for Java Programmers

  • Bartkov, Maxim; Katkova, Tetiana; Kruglyk, Vladyslav S.; Murtaziev, Ernest G.; Kotova, Olha V.
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.406-412 / 2022
  • Storing large amounts of data has been a major problem since the beginning of computing history. With today's large-scale distributed applications handling enormous volumes of data, traditional databases are no longer viable; Big Data technologies were introduced to store, process, and analyze data at high speed and to cope with daily growth in users and data. Big Data has substantially improved business processes, for example by identifying customers' needs through prediction models based on web and social-media search. To process data continuously in real time, data-streaming technologies were developed. The main purpose of big data stream processing frameworks is to let programmers query a continuous stream directly without dealing with lower-level mechanisms: programmers write the processing code against these runtime libraries (also called Stream Processing Engines). Several such streaming platforms are freely available on the Internet, but selecting the most appropriate one is not easy. In this paper, we present a detailed description of two state-of-the-art and popular streaming frameworks, Apache Ignite and Hazelcast, and compare their performance using selected attributes.
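To make the comparison concrete, here is a minimal, hypothetical sketch of the programming style these Stream Processing Engines offer, written against Hazelcast's Jet pipeline API (Hazelcast 5.x); the synthetic test source, event rate, and filter predicate are illustrative, not taken from the paper.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.test.TestSources;

public class JetStreamSketch {
    public static void main(String[] args) {
        // Declarative pipeline: the engine handles partitioning and
        // back-pressure; the programmer only describes the dataflow.
        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.itemStream(100))   // synthetic 100 events/sec
         .withoutTimestamps()
         .filter(e -> e.sequence() % 2 == 0)      // keep even-numbered events
         .map(e -> "event-" + e.sequence())
         .writeTo(Sinks.logger());

        HazelcastInstance hz = Hazelcast.bootstrappedInstance();
        hz.getJet().newJob(p).join();             // runs until cancelled
    }
}
```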

Application Of Open Data Framework For Real-Time Data Processing (실시간 데이터 처리를 위한 개방형 데이터 프레임워크 적용 방안)

  • Park, Sun-ho; Kim, Young-kil
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.10 / pp.1179-1187 / 2019
  • In today's technology environment, most big data-based applications and solutions rest on real-time processing of streaming data, so real-time processing and analysis of big data streams plays an important role in their development. In the maritime data-processing environment in particular, the explosion of data makes it increasingly urgent to develop technology capable of rapidly processing and analyzing large amounts of real-time data. This paper therefore analyzes the characteristics of NiFi, Kafka, and Druid as suitable open-source choices among the various open data technologies for processing big data, and provides up-to-date information on the external linkage needed for maritime service analysis in the Korean e-Navigation service. In doing so, we lay the foundation for applying an open data framework to real-time data processing.
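As a concrete illustration of the ingestion side of such a pipeline, the sketch below publishes one record to Kafka with the standard Java producer client; the broker address, topic name, and payload are hypothetical, not from the paper.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MaritimeFeedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "vessel-positions" is an illustrative topic name, not from the paper.
            producer.send(new ProducerRecord<>("vessel-positions",
                    "MMSI-440000001", "{\"lat\":35.1,\"lon\":129.0}"));
        }
    }
}
```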

Performance Evaluation and Analysis of Multiple Scenarios of Big Data Stream Computing on Storm Platform

  • Sun, Dawei; Yan, Hongbin; Gao, Shang; Zhou, Zhangbing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.7 / pp.2977-2997 / 2018
  • In the big data era, fresh data grows rapidly every day: more than 30,000 gigabytes are created every second, and the rate is accelerating. Many organizations rely heavily on real-time streams, and big data stream computing helps them spot opportunities and risks in real-time big data. Storm, one of the most common online stream computing platforms, has been used for big data stream computing with response times ranging from milliseconds to sub-seconds. Although Storm's performance plays a crucial role across application scenarios, few studies have evaluated it. In this paper, we investigate the performance of Storm under different application scenarios. Our experimental results show that Storm's throughput and latency are greatly affected by the number of instances of each vertex in the task topology and by the number of available resources in the data center, while its fault-tolerance mechanism works well in most big data stream computing environments. We therefore suggest that a dynamic topology, an elastic scheduling framework, and a memory-based fault-tolerance mechanism are necessary for providing high-throughput, low-latency services on the Storm platform.
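The sketch below illustrates the knob the experiments focus on: the parallelism hint passed when wiring a Storm topology, which sets the number of executor instances per vertex. EventSpout, ParseBolt, and CountBolt are hypothetical placeholder components, not from the paper.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // The last argument is the parallelism hint: the number of executor
        // instances for that vertex, which the paper finds drives throughput
        // and latency. The spout/bolt classes are hypothetical.
        builder.setSpout("events", new EventSpout(), 2);
        builder.setBolt("parse", new ParseBolt(), 4).shuffleGrouping("events");
        builder.setBolt("count", new CountBolt(), 4)
               .fieldsGrouping("parse", new Fields("key"));

        Config conf = new Config();
        conf.setNumWorkers(2); // worker JVMs drawn from data-center resources
        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("scenario", conf, builder.createTopology());
            Thread.sleep(60_000);
        }
    }
}
```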

Squall: A Real-time Big Data Processing Framework based on TMO Model for Real-time Events and Micro-batch Processing (Squall: 실시간 이벤트와 마이크로-배치의 동시 처리 지원을 위한 TMO 모델 기반의 실시간 빅데이터 처리 프레임워크)

  • Son, Jae Gi; Kim, Jung Guk
    • Journal of KIISE / v.44 no.1 / pp.84-94 / 2017
  • Recently, the importance of velocity, one of the 5V characteristics of big data (Volume, Variety, Velocity, Veracity, and Value), has been emphasized in data processing, prompting several studies on real-time stream processing, a technology for fast and accurate processing and analysis of big data. In this paper, we propose the Squall framework, built on Time-triggered Message-triggered Object (TMO) technology, a model widely used for real-time processing, and we describe the framework and its operation on a single node. TMO is an object model that supports both non-regular real-time processing of events under certain conditions and regular periodic processing at fixed intervals. Squall supports real-time event streams and micro-batch processing of big data with performance superior to Apache Storm and Spark Streaming, although it still requires additional development for multi-node stream processing, which most frameworks already provide. In conclusion, the advantages of the TMO model can overcome the drawbacks of Apache Storm and Spark Streaming in real-time big data processing, and the model shows promise for this domain.
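As a rough conceptual illustration (not Squall's actual API), the following plain-JDK sketch pairs a time-triggered periodic method with a message-triggered handler, mirroring TMO's two method types: a spontaneous method that fires on a fixed period and a service method invoked per incoming message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TmoSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    // Time-triggered (SpM-like) method: runs on a fixed period,
    // here standing in for a micro-batch flush.
    void startPeriodic() {
        timer.scheduleAtFixedRate(() -> {
            List<String> batch = new ArrayList<>();
            events.drainTo(batch);
            if (!batch.isEmpty())
                System.out.println("micro-batch of " + batch.size() + " events");
        }, 0, 100, TimeUnit.MILLISECONDS);
    }

    // Message-triggered (SvM-like) method: invoked per incoming event.
    void onEvent(String e) {
        events.add(e); // real-time events queue up for the next deadline
    }
}
```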

An Analysis of Big Video Data with Cloud Computing in Ubiquitous City (클라우드 컴퓨팅을 이용한 유시티 비디오 빅데이터 분석)

  • Lee, Hak Geon; Yun, Chang Ho; Park, Jong Won; Lee, Yong Woo
    • Journal of Internet Computing and Services / v.15 no.3 / pp.45-52 / 2014
  • The Ubiquitous-City (U-City) is a smart, intelligent city that satisfies people's desire to enjoy IT services with any device, anytime, anywhere; it is a future city model based on the Internet of Things (IoT). A U-City includes many networked video cameras that, together with sensors, serve as main inputs to U-City services, and they continuously generate a huge amount of video information: genuinely big data. The U-City usually must manipulate this big data in real time, which is not easy at all. Often the accumulated video data must also be analyzed to detect an event or find a figure, which requires a great deal of computational power and time. Research efforts are under way to reduce the processing time of big video data, and cloud computing is a good solution to this problem; among the many applicable cloud methodologies, MapReduce is attractive, has many advantages, and is gaining popularity in many areas. As video cameras evolve, their resolution improves sharply, leading to exponential growth in the data produced by the networked cameras; with high-quality cameras we are dealing with truly big data. Video surveillance systems were of limited use before cloud computing, but they are now spreading widely in U-Cities thanks to these methodologies. Because video data are unstructured, good research results on analyzing them with MapReduce are hard to find. This paper presents an analysis system for video surveillance: a cloud-computing-based video data management system that is easy to deploy, flexible, and reliable. It consists of a video manager, video monitors, storage for the video images, a storage client, and a streaming-IN component. The video monitor consists of a video translator and a protocol manager, and the storage contains the MapReduce analyzer; all components were designed according to the functional requirements of a video surveillance system. The streaming-IN component receives video data from the networked cameras, delivers it to the storage client, and manages network bottlenecks to smooth the data stream. The storage client receives the video data from the streaming-IN component, stores it, and helps other components access the storage. The video monitor transfers video data by smooth streaming and manages the protocols: its video-translator sub-component lets users manage the resolution, codec, and frame rate of the video, while its protocol sub-component handles the Real Time Streaming Protocol (RTSP) and the Real Time Messaging Protocol (RTMP). We use the Hadoop Distributed File System (HDFS) as the cloud-computing storage: Hadoop stores the data in HDFS and provides a platform that can process it with the simple MapReduce programming model. We suggest our own methodology for analyzing video images with MapReduce, presenting and explaining the video-analysis workflow in detail. The performance evaluation showed that our proposed system worked well. Using our cluster system with compressed 1920×1080 (FHD), H.264-encoded video stored in HDFS, we measured the processing time as a function of the number of frames per mapper; tracing the optimal input split size and the processing time against the number of nodes, we found that system performance scales linearly.
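A minimal sketch of the frames-per-mapper idea, assuming a custom InputFormat that hands each mapper call one decoded frame; the class name and the detection step are hypothetical stand-ins for the paper's actual workflow.

```java
import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: each input value is assumed to hold one decoded
// frame (or a small group of frames) produced by a custom InputFormat.
public class FrameMapper
        extends Mapper<LongWritable, BytesWritable, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable frameNo, BytesWritable frame, Context ctx)
            throws IOException, InterruptedException {
        // detectEvent() stands in for the actual image-analysis step.
        if (detectEvent(frame.getBytes())) {
            ctx.write(new Text("event"), ONE); // a reducer sums detections
        }
    }

    private boolean detectEvent(byte[] pixels) {
        return pixels.length > 0; // placeholder logic
    }
}
```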

Machine Learning based Bandwidth Prediction for Dynamic Adaptive Streaming over HTTP

  • Yoo, Soyoung; Kim, Gyeongryeong; Kim, Minji; Kim, Yeonjin; Park, Soeun; Kim, Dongho
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.33-48 / 2020
  • Through digital transformation, new technologies such as machine learning (ML), big data, cloud, and VR/AR are being applied to video streaming. We chose ML to provide optimal QoE (Quality of Experience) under varying network conditions; in other words, ML helps DASH (Dynamic Adaptive Streaming over HTTP) deliver uninterrupted video streaming. In DASH, the source video is segmented into short chunks of 2-10 seconds, each encoded at several different bitrate levels and resolutions. We built prototypes applying five different machine-learning algorithms to DASH and compared their performance. The prototypes consist of dash.js, a video-processing server, web servers, data sets, and the five machine-learning models.
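For intuition, here is a deliberately simple stand-in for such a predictor: an exponentially weighted moving average over past chunk throughputs, with bitrate selection from a ladder. It is not one of the paper's five ML models; the smoothing factor and the ladder values are assumptions.

```java
import java.util.List;

// Simple stand-in for an ML bandwidth predictor: smooth past chunk
// throughputs with an EWMA, then pick the highest bitrate the
// prediction can sustain (ladder assumed sorted ascending).
public class BandwidthPredictor {
    private double ewmaKbps = -1;
    private final double alpha = 0.3; // smoothing factor (assumed)

    void observe(double chunkKbps) {
        ewmaKbps = (ewmaKbps < 0)
                ? chunkKbps
                : alpha * chunkKbps + (1 - alpha) * ewmaKbps;
    }

    int selectBitrate(List<Integer> ladderKbps) {
        int best = ladderKbps.get(0); // fall back to the lowest rung
        for (int rate : ladderKbps)
            if (rate <= ewmaKbps && rate > best) best = rate;
        return best;
    }

    public static void main(String[] args) {
        BandwidthPredictor p = new BandwidthPredictor();
        for (double kbps : new double[]{4200, 3900, 2500}) p.observe(kbps);
        // Prediction ~3627 kbps, so the 2400 kbps rung is chosen.
        System.out.println(p.selectBitrate(List.of(800, 1500, 2400, 4000)));
    }
}
```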

RDFS Rule based Parallel Reasoning Scheme for Large-Scale Streaming Sensor Data (대용량 스트리밍 센서데이터 환경에서 RDFS 규칙기반 병렬추론 기법)

  • Kwon, SoonHyun; Park, Youngtack
    • Journal of KIISE / v.41 no.9 / pp.686-698 / 2014
  • Recently, large-scale streaming sensor data has emerged, driven by the explosive spread of smartphones, the diffusion of IoT and cloud computing technology, and the generalization of IoT devices. Research on combining it with semantic web technology is also advancing, pushed by growing requirements to create new value from data through sharing and mash-up in large-scale environments. However, the large scale and streaming nature of the data raise serious issues for inference aimed at creating new knowledge. We therefore propose an RDFS rule-based parallel reasoning scheme for serving large-scale streaming sensor data with semantic web technology. The proposed scheme runs each job of the Rete network algorithm, an existing rule-inference algorithm, in parallel, and shares data through HBase, a Hadoop database, used as common storage. We implemented the system and evaluated its performance on AWS data from the weather center as large-scale streaming sensor data.
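To ground the idea, the sketch below applies one RDFS entailment rule (rdfs9) in plain Java; the paper's system evaluates such rules through a parallelized Rete network with HBase as shared storage, which this toy example does not attempt. The triples are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One RDFS entailment rule (rdfs9): (x rdf:type C1) + (C1 rdfs:subClassOf C2)
// => (x rdf:type C2). Triples are plain strings for brevity.
public class Rdfs9Sketch {
    record Triple(String s, String p, String o) {}

    public static void main(String[] args) {
        List<Triple> triples = List.of(
            new Triple("sensor42", "rdf:type", "TempSensor"),
            new Triple("TempSensor", "rdfs:subClassOf", "Sensor"));

        // Index subClassOf facts (the rule's "alpha memory", loosely).
        Map<String, List<String>> superClasses = new HashMap<>();
        for (Triple t : triples)
            if (t.p().equals("rdfs:subClassOf"))
                superClasses.computeIfAbsent(t.s(), k -> new ArrayList<>()).add(t.o());

        // Join type facts against the index to derive new triples.
        List<Triple> inferred = new ArrayList<>();
        for (Triple t : triples)
            if (t.p().equals("rdf:type"))
                for (String sup : superClasses.getOrDefault(t.o(), List.of()))
                    inferred.add(new Triple(t.s(), "rdf:type", sup));

        inferred.forEach(System.out::println); // sensor42 rdf:type Sensor
    }
}
```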

Research on Big Data Integration Method

  • Kim, Jee-Hyun; Cho, Young-Im
    • Journal of the Korea Society of Computer and Information / v.22 no.1 / pp.49-56 / 2017
  • In this paper we propose an approach to big data integration for analyzing, visualizing, and predicting market trends: an integrated data model combining the R language for statistical computing with Hadoop for parallel data processing. Four methods of combining R and Hadoop are examined: the ff package in R, R with the Hadoop Streaming utility, and the Rhipe and RHadoop interface packages. We describe and analyze the strengths and weaknesses of the four methods, and propose Rhipe and RHadoop as a complete data integration model. Integrating R, which is popular for implementing statistical algorithms, with Hadoop, which provides a distributed file system, a resource-management platform, and the MapReduce programming model, yields an environment in which R code can be written and deployed in Hadoop without any data movement. This model enables high-performance predictive analysis and a deeper understanding of big data.

An Analysis of Utilization on Virtualized Computing Resource for Hadoop and HBase based Big Data Processing Applications (Hadoop과 HBase 기반의 빅 데이터 처리 응용을 위한 가상 컴퓨팅 자원 이용률 분석)

  • Cho, Nayun; Ku, Mino; Kim, Baul; Xuhua, Rui; Min, Dugki
    • Journal of Information Technology and Architecture / v.11 no.4 / pp.449-462 / 2014
  • In the big data era, systems for capturing, storing, and analyzing stored or streaming data involve many considerations. Unlike traditional data handling systems, a big data processing system must take into account the characteristics (format, velocity, and volume) of the data it handles. In this situation, a virtualized computing platform is an emerging platform for handling big data effectively, since virtualization technology makes it possible to manage computing resources dynamically and elastically with minimal effort. In this paper, we analyze the utilization of virtualized computing resources to discover suitable deployment models for an Apache Hadoop and HBase-based big data processing environment. We find that the TaskTracker service shows high CPU utilization and high disk-I/O overhead during MapReduce phases; that the HRegion service shows high network consumption from transferring data from the DataNode to the TaskTracker; and that the DataNode shows high memory utilization and disk-I/O overhead while reading stored data.