• Title/Summary/Keyword: Big data Processing

Search Result 1,063, Processing Time 0.026 seconds

A MVC Framework for Visualizing Text Data (텍스트 데이터 시각화를 위한 MVC 프레임워크)

  • Choi, Kwang Sun;Jeong, Kyo Sung;Kim, Soo Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.39-58
    • /
    • 2014
  • As the importance of big data and related technologies continues to grow in the industry, it has become highlighted to visualize results of processing and analyzing big data. Visualization of data delivers people effectiveness and clarity for understanding the result of analyzing. By the way, visualization has a role as the GUI (Graphical User Interface) that supports communications between people and analysis systems. Usually to make development and maintenance easier, these GUI parts should be loosely coupled from the parts of processing and analyzing data. And also to implement a loosely coupled architecture, it is necessary to adopt design patterns such as MVC (Model-View-Controller) which is designed for minimizing coupling between UI part and data processing part. On the other hand, big data can be classified as structured data and unstructured data. The visualization of structured data is relatively easy to unstructured data. For all that, as it has been spread out that the people utilize and analyze unstructured data, they usually develop the visualization system only for each project to overcome the limitation traditional visualization system for structured data. Furthermore, for text data which covers a huge part of unstructured data, visualization of data is more difficult. It results from the complexity of technology for analyzing text data as like linguistic analysis, text mining, social network analysis, and so on. And also those technologies are not standardized. This situation makes it more difficult to reuse the visualization system of a project to other projects. We assume that the reason is lack of commonality design of visualization system considering to expanse it to other system. In our research, we suggest a common information model for visualizing text data and propose a comprehensive and reusable framework, TexVizu, for visualizing text data. At first, we survey representative researches in text visualization era. And also we identify common elements for text visualization and common patterns among various cases of its. And then we review and analyze elements and patterns with three different viewpoints as structural viewpoint, interactive viewpoint, and semantic viewpoint. And then we design an integrated model of text data which represent elements for visualization. The structural viewpoint is for identifying structural element from various text documents as like title, author, body, and so on. The interactive viewpoint is for identifying the types of relations and interactions between text documents as like post, comment, reply and so on. The semantic viewpoint is for identifying semantic elements which extracted from analyzing text data linguistically and are represented as tags for classifying types of entity as like people, place or location, time, event and so on. After then we extract and choose common requirements for visualizing text data. The requirements are categorized as four types which are structure information, content information, relation information, trend information. Each type of requirements comprised with required visualization techniques, data and goal (what to know). These requirements are common and key requirement for design a framework which keep that a visualization system are loosely coupled from data processing or analyzing system. Finally we designed a common text visualization framework, TexVizu which is reusable and expansible for various visualization projects by collaborating with various Text Data Loader and Analytical Text Data Visualizer via common interfaces as like ITextDataLoader and IATDProvider. And also TexVisu is comprised with Analytical Text Data Model, Analytical Text Data Storage and Analytical Text Data Controller. In this framework, external components are the specifications of required interfaces for collaborating with this framework. As an experiment, we also adopt this framework into two text visualization systems as like a social opinion mining system and an online news analysis system.

NoSQL-based Distributed Processing System for Processing BigData Security Events (빅데이터 보안이벤트 처리를 위한 NoSQL 기반 분산 처리 시스템)

  • Han, HyoJoon;Kang, JiWon;Jung, Yong-Hwan;Kim, Yangwoo
    • Annual Conference of KIPS
    • /
    • 2017.04a
    • /
    • pp.90-93
    • /
    • 2017
  • 인터넷과 클라우드 서비스 사용이 증가하면서 패킷의 양과 사이버 위협이 증가하였다. 본 논문에서는 빅데이터를 처리하기 위해 사용되는 NoSQL을 보안이벤트의 신속한 처리를 위한 침입탐지시스템에 적용하였다. 다양한 데이터 모델 유형의 NoSQL 데이터베이스 중에서 빅데이터 보안이벤트를 처리하는데 가장 적합한 시스템을 찾기 위해 세 가지 유형의 Snort 룰 기반 보안이벤트 분산 처리 프로토타입 시스템들을 구축하였고 각 시스템의 성능을 평가하였다. 그 결과로 MongoDB 기반의 보안이벤트 분산 처리 시스템이 가장 속도가 빠른 것을 확인하였다.

Research on Data Preprocessing Techniques for Efficient Decision-Making in Food Import Procedures (식품 수입 절차에서의 효율적 의사결정을 위한 데이터 전처리 기술에 관한 연구)

  • Jae-Hyeong Park;Yong-Uk Song;Ju-Young Kang
    • The Journal of Bigdata
    • /
    • v.8 no.1
    • /
    • pp.61-71
    • /
    • 2023
  • With the development of data-driven decision-making and sophisticated big data processing technique, there is a growing demand for information on how to process data. However, recent studies with data preprocessing mentioned only as a means to achieve a result. Therefore, in this study, we aimed to write in detail about the data processing pipeline, include preprocessing data. In particular, we shares the context and domain knowledge to aid fluent understand of the research.

K Nearest Neighbor Joins for Big Data Processing based on Spark (Spark 기반 빅데이터 처리를 위한 K-최근접 이웃 연결)

  • JIAQI, JI;Chung, Yeongjee
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.9
    • /
    • pp.1731-1737
    • /
    • 2017
  • K Nearest Neighbor Join (KNN Join) is a simple yet effective method in machine learning. It is widely used in small dataset of the past time. As the number of data increases, it is infeasible to run this model on an actual application by a single machine due to memory and time restrictions. Nowadays a popular batch process model called MapReduce which can run on a cluster with a large number of computers is widely used for large-scale data processing. Hadoop is a framework to implement MapReduce, but its performance can be further improved by a new framework named Spark. In the present study, we will provide a KNN Join implement based on Spark. With the advantage of its in-memory calculation capability, it will be faster and more effective than Hadoop. In our experiments, we study the influence of different factors on running time and demonstrate robustness and efficiency of our approach.

Efficient Processing of Multiple Group-by Queries in MapReduce for Big Data Analysis (맵리듀스에서 빅데이터 분석을 위한 다중 Group-by 질의의 효율적인 처리 기법)

  • Park, Eunju;Park, Sojeong;Oh, Sohyun;Choi, Hyejin;Lee, Ki Yong;Shim, Junho
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.5
    • /
    • pp.387-392
    • /
    • 2015
  • MapReduce is a framework used to process large data sets in parallel on a large cluster. A group-by query is a query that partitions the input data into groups based on the values of the specified attributes, and then evaluates the value of the specified aggregate function for each group. In this paper, we propose an efficient method for processing multiple group-by queries using MapReduce. Instead of computing each group-by query independently, the proposed method computes multiple group-by queries in stages with one or more MapReduce jobs in order to reduce the total execution cost. We compared the performance of this method with the performance of a less sophisticated method that computes each group-by query independently. This comparison showed that the proposed method offers better performance in terms of execution time.

An Exploratory Study on the Uniqueness of Crowds and its Implications for Public Data Platform Design from the Information Processing View (정보처리관점에서 본 대중의 특성과 공공데이터 플랫폼 설계 시사점에 관한 탐색 연구)

  • Kim, Sang Wook;Lee, Jae Sook
    • Journal of the Korea Convergence Society
    • /
    • v.8 no.11
    • /
    • pp.373-381
    • /
    • 2017
  • As the viewpoint of data evolves from business resources to value resources, the focus on the data processing is rapidly shifting from the cartesian (alphanumeric) to the contextual. In response to this trend, the government is actively promoting the utilization of public data by the private sectors. With a focus on the fact that the mainstream of public data users is the crowds, this study, therefore, seeks to identify differences between the members and the crowds from information processing view, followed by presenting directions and main points for the design of data platform. This study is perhaps to be a starting point for finding practical alternatives for creating 'collective intelligence ecosystem' and to be a basis for building a big data base in response to the opening of public data.

Enhancing the performance of taxi application based on in-memory data grid technology (In-memory data grid 기술을 활용한 택시 애플리케이션 성능 향상 기법 연구)

  • Choi, Chi-Hwan;Kim, Jin-Hyuk;Park, Min-Kyu;Kwon, Kaaen;Jung, Seung-Hyun;Nazareno, Franco;Cho, Wan-Sup
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.5
    • /
    • pp.1035-1045
    • /
    • 2015
  • Recent studies in Big Data Analysis are showing promising results, utilizing the main memory for rapid data processing. In-memory computing technology can be highly advantageous when used with high-performing servers having tens of gigabytes of RAM with multi-core processors. The constraint in network in these infrastructure can be lessen by combining in-memory technology with distributed parallel processing. This paper discusses the research in the aforementioned concept applying to a test taxi hailing application without disregard to its underlying RDBMS structure. The application of IMDG technology in the application's backend API without restructuring the database schema yields 6 to 9 times increase in performance in data processing and throughput. Specifically, the change in throughput is very small even with increase in data load processing.

Range Segmentation of Dynamic Offloading (RSDO) Algorithm by Correlation for Edge Computing

  • Kang, Jieun;Kim, Svetlana;Kim, Jae-Ho;Sung, Nak-Myoung;Yoon, Yong-Ik
    • Journal of Information Processing Systems
    • /
    • v.17 no.5
    • /
    • pp.905-917
    • /
    • 2021
  • In recent years, edge computing technology consists of several Internet of Things (IoT) devices with embedded sensors that have improved significantly for monitoring, detection, and management in an environment where big data is commercialized. The main focus of edge computing is data optimization or task offloading due to data and task-intensive application development. However, existing offloading approaches do not consider correlations and associations between data and tasks involving edge computing. The extent of collaborative offloading segmented without considering the interaction between data and task can lead to data loss and delays when moving from edge to edge. This article proposes a range segmentation of dynamic offloading (RSDO) algorithm that isolates the offload range and collaborative edge node around the edge node function to address the offloading issue.The RSDO algorithm groups highly correlated data and tasks according to the cause of the overload and dynamically distributes offloading ranges according to the state of cooperating nodes. The segmentation improves the overall performance of edge nodes, balances edge computing, and solves data loss and average latency.

An Open Source Mobile Cloud Service: Geo-spatial Image Filtering Tools Using R (오픈소스 모바일 클라우드 서비스: R 기반 공간영상정보 필터링 사례)

  • Kang, Sanggoo;Lee, Kiwon
    • Spatial Information Research
    • /
    • v.22 no.5
    • /
    • pp.1-8
    • /
    • 2014
  • Globally, mobile, cloud computing or big data are the recent marketable key terms. These trend technologies or paradigm in the ICT (Information Communication Technology) fields exert large influence on the most application fields including geo-spatial applications. Among them, cloud computing, though the early stage in Korea now, plays a important role as a platform for other trend technologies uses. Especially, mobile cloud, an integrated platform with mobile device and cloud computing can be considered as a good solution to overcome well known limitations of mobile applications and to provide more information processing functionalities to mobile users. This work is a case study to design and implement the mobile application system for geo-spatial image filtering processing operated on mobile cloud platform built using OpenStack and various open sources. Filtering processing is carried out using R environment, recently being recognized as one of big data analysis technologies. This approach is expected to be an element linking geo-spatial information for new service model development and the geo-spatial analysis service development using R.

Performance Analysis of Distributed Hadoop Systems (분산 하둡 시스템의 성능 비교 분석)

  • Bae, Byoung-Jin;Kim, Young-Joo;Kim, Young-Kuk
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2014.05a
    • /
    • pp.479-482
    • /
    • 2014
  • Nowadays open-source hadoop systems have been using widely to efficiently manage a fast-growing big data. Hadoop systems consist of distributed file processing system called HDFS (Hadoop Distributed File System) and distributed parallel processing system called MapReduce. The MapReduce reads and processes big data from HDFS and then processed results are written in HDFS again by the MapReduce. Such a processing method has different system structure respectively according to hadoop version. Therefore, this paper shows analysis results for performance of hadoop systems. For this, we devise a way which monitors hadoop systems and measure occurrence frequency of processes, threads, and variables generated in hadoop system itself using the devised way. So, by using the measured results as analysis indicator, we help the indicator predict inner performance of hadoop systems.

  • PDF