• Title/Summary/Keyword: Big data Processing


Design and Implementation of an Efficient Web Services Data Processing Using Hadoop-Based Big Data Processing Technique (하둡 기반 빅 데이터 기법을 이용한 웹 서비스 데이터 처리 설계 및 구현)

  • Kim, Hyun-Joo
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.1 / pp.726-734 / 2015
  • Relational databases, which manage data by imposing structure on it, are currently the most widely used technology for data management. However, their service slows as data volumes grow because of constraints on the read and write operations used to store and query data. Furthermore, each newly added task grows the database and consequently demands additional infrastructure, such as parallel configurations of hardware, CPU, memory, and network, to keep operation smooth. In this paper, to improve web information services that slow down as data accumulates in relational databases, we implement a model that extracts large amounts of data quickly and safely for users by sending the data to the Hadoop Distributed File System (HDFS) and then unifying, reconstructing, and processing the HDFS files. We applied the model to a web-based civil affairs system that stores image files, i.e., irregular data. The proposed system processed data 0.4 seconds faster than a relational database system. We therefore find that a Hadoop-based big data processing technique can support web information services that must handle large amounts of data, as conventional relational databases do. Furthermore, since Hadoop is open source, the model has the advantage of reducing software costs. The proposed system is expected to serve as a model for web services that need fast information processing in organizations whose conventional relational databases have grown too large for efficient processing.
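
As a rough illustration of the HDFS-based flow described in this abstract, the following Python sketch uploads service records to HDFS over WebHDFS and reads them back for reconstruction. It assumes the third-party `hdfs` client package; the NameNode URL, user name, file path, and record format are placeholders, not values from the paper.

```python
# A minimal sketch of moving service data into HDFS and reading it back,
# assuming a WebHDFS endpoint and the third-party `hdfs` Python package.
from hdfs import InsecureClient

# Placeholder NameNode address and user; adjust for the actual cluster.
client = InsecureClient('http://namenode.example.com:9870', user='webservice')

records = [
    '2015-01-07,civil-affairs,image_0001.jpg,accepted',
    '2015-01-07,civil-affairs,image_0002.jpg,pending',
]

# Write the raw service records as one HDFS file.
client.write('/webservice/raw/records.csv',
             data='\n'.join(records).encode('utf-8'),
             overwrite=True)

# Read the file back and "reconstruct" it, here by grouping images per status.
grouped = {}
with client.read('/webservice/raw/records.csv') as reader:
    for line in reader.read().decode('utf-8').splitlines():
        date, service, image, status = line.split(',')
        grouped.setdefault(status, []).append(image)

print(grouped)
```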

A Study on Practical Classes for Healthcare Administration Education Program Using Health and Medical Big Data (보건의료 빅데이터를 활용한 보건행정 교육프로그램 실무수업에 관한 고찰)

  • Ok-Yul Yang;Yeon-Hee Lee
    • Journal of the Health Care and Life Science / v.10 no.1 / pp.1-14 / 2022
  • This study examines the feasibility of big data-related education programs that use health and medical big data in departments related to health and medical administration. The paper considers health and medical big data from five perspectives. First, in addition to the aforementioned 'Health and Medical Big Data Open System', it examines the characteristics and applicable technologies of public big data released through the 'Korea Welfare Panel', 'Public Big Data', 'Seoul City Big Data', 'Statistical Office Big Data', and similar sources. Second, it assesses whether such health and medical big data are suitable as practical teaching material in the regular courses of health and medical administration and health information departments at junior colleges. Third, it selects the most appropriate tool for classroom lectures from existing statistical processing packages and programming languages. Fourth, using the verified health and medical big data and the selected tools, it tests whether graphs and similar outputs can be produced in class, along with the steps leading to report writing. Fifth, it describes the relative advantages of the R language in terms of portability, ease of installation, cost effectiveness, compatibility, and potential for big data processing.

A Study on The Real-Time Data Collection/Analysis/Processing Intelligent IoT (실시간 데이터 수집/분석/처리를 위한 지능형 IoT)

  • Kim, Hee-Chul
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.2 / pp.317-322 / 2019
  • This study presents an intelligent IoT-based processing system, backed by a big data store, for real-time collection, analysis, and processing of data, creative analysis of data assets, and real-time monitoring of distribution stages. The mobile terminal uses the SDK provided for the device to measure data on the production, distribution, and consumption of specific seafood products. The oneM2M protocol is used to store the various kinds of information needed for seafood production, and a DB server and an administrator UI are implemented so that the administrator can manage the system.
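
The abstract does not give implementation details, but a typical way to push a sensor reading into a oneM2M-compliant server is to create a contentInstance under a container over the HTTP binding. The sketch below does this with Python's requests library; the CSE address, container path, originator ID, and reading payload are placeholder assumptions, not values from the paper.

```python
# A minimal sketch of registering one sensor reading with a oneM2M CSE
# over the HTTP binding. Host, CSE name, container, and originator are
# placeholder assumptions for illustration only.
import json
import requests

CSE_URL = 'http://onem2m-server.example.com:7579/Mobius'   # CSE base (assumed)
CONTAINER = 'seafood-sensor/temperature'                    # container path (assumed)

headers = {
    'X-M2M-Origin': 'SsensorAE',               # originator AE-ID (assumed)
    'X-M2M-RI': 'req-0001',                    # request identifier
    'Content-Type': 'application/json;ty=4',   # ty=4 -> contentInstance
    'Accept': 'application/json',
}

reading = {'m2m:cin': {'con': json.dumps({'temp_c': 3.8, 'ts': '2019-04-01T09:00:00Z'})}}

resp = requests.post(f'{CSE_URL}/{CONTAINER}', headers=headers, data=json.dumps(reading))
print(resp.status_code, resp.text)
```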

Advanced Resource Management with Access Control for Multitenant Hadoop

  • Won, Heesun;Nguyen, Minh Chau;Gil, Myeong-Seon;Moon, Yang-Sae
    • Journal of Communications and Networks / v.17 no.6 / pp.592-601 / 2015
  • Multitenancy has gained growing importance with the development and evolution of cloud computing technology. In a multitenant environment, multiple tenants with different demands can share a variety of computing resources (e.g., CPU, memory, storage, network, and data) within a single system, while each tenant remains logically isolated. This multitenancy concept offers enterprises that require similar environments for data processing and management highly efficient and cost-effective systems without wasting computing resources. In this paper, we propose a novel approach that adds multitenancy features to Apache Hadoop, a large-scale distributed system commonly used for processing big data. We first analyze the Hadoop framework, focusing on "yet another resource negotiator" (YARN), which is responsible for resource management, application runtime, and access control in the latest version of Hadoop. We then define the problems in supporting multitenancy and formally derive the requirements for solving them. Based on these requirements, we design the details of multitenant Hadoop. We also present experimental results that validate the data access control and evaluate the performance enhancement of multitenant Hadoop.
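
The paper's own design is not reproduced here, but per-tenant isolation in YARN is commonly expressed as separate scheduler queues with capacity shares and submit ACLs. The following Python sketch is a purely conceptual model of that idea; the tenant names, shares, and users are invented for illustration, and this is neither the authors' implementation nor the YARN API.

```python
# Conceptual sketch: per-tenant queues with capacity shares and submit ACLs,
# in the spirit of YARN's CapacityScheduler. All names and numbers are invented.
from dataclasses import dataclass, field

@dataclass
class TenantQueue:
    name: str
    capacity_share: float                      # fraction of cluster resources
    allowed_users: set = field(default_factory=set)

    def can_submit(self, user: str) -> bool:
        # Access control: only users listed for the tenant may submit jobs.
        return user in self.allowed_users

CLUSTER_VCORES = 100

queues = [
    TenantQueue('tenantA', 0.6, {'alice', 'bob'}),
    TenantQueue('tenantB', 0.4, {'carol'}),
]

def submit(user: str, queue_name: str, requested_vcores: int) -> str:
    queue = next(q for q in queues if q.name == queue_name)
    if not queue.can_submit(user):
        return f'REJECTED: {user} may not submit to {queue_name}'
    limit = int(queue.capacity_share * CLUSTER_VCORES)
    granted = min(requested_vcores, limit)
    return f'ACCEPTED: {user} granted {granted} vcores in {queue_name} (limit {limit})'

print(submit('alice', 'tenantA', 80))   # capped at tenantA's 60-vcore share
print(submit('carol', 'tenantA', 10))   # rejected by the submit ACL
```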

Research on Big Data Integration Method

  • Kim, Jee-Hyun;Cho, Young-Im
    • Journal of the Korea Society of Computer and Information / v.22 no.1 / pp.49-56 / 2017
  • In this paper we propose an approach to big data integration for analyzing, visualizing, and predicting market trends: an integrated data model that combines the R language, widely used for statistics, with Hadoop, a parallel data processing framework. Four ways of combining R and Hadoop are examined: the ff package in R, R with Hadoop Streaming, and Rhipe and RHadoop as R-Hadoop interface packages. The strengths and weaknesses of the four methods are described and analyzed, and Rhipe and RHadoop are proposed as a complete data integration model. Integrating R, which is popular for implementing statistical algorithms, with Hadoop, which provides a distributed file system, a resource management platform, and the MapReduce programming model, yields an environment in which R code can be written and deployed on Hadoop without any data movement. This model enables predictive analysis with high performance and a deeper understanding of big data.
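
One of the four combinations discussed above, Hadoop Streaming, lets any executable act as mapper and reducer over stdin/stdout. As a sketch of that mechanism (written in Python rather than R, to keep the examples in this listing in one language), the script below computes per-key averages; the file name and the sample key/value format are assumptions. It would be passed to the hadoop-streaming jar as both `-mapper` and `-reducer`, whereas Rhipe and RHadoop hide this plumbing behind R functions.

```python
#!/usr/bin/env python3
# streaming_avg.py -- one script usable as both mapper and reducer with Hadoop
# Streaming (e.g. -mapper 'streaming_avg.py map' -reducer 'streaming_avg.py reduce').
# Input lines are assumed to look like "<key>,<numeric value>".
import sys

def mapper():
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        key, value = line.split(',', 1)
        # Emit tab-separated key/value pairs, as Hadoop Streaming expects.
        print(f'{key}\t{value}')

def reducer():
    # Hadoop sorts mapper output by key, so equal keys arrive contiguously.
    current_key, total, count = None, 0.0, 0
    for line in sys.stdin:
        key, value = line.rstrip('\n').split('\t', 1)
        if key != current_key:
            if current_key is not None:
                print(f'{current_key}\t{total / count}')
            current_key, total, count = key, 0.0, 0
        total += float(value)
        count += 1
    if current_key is not None:
        print(f'{current_key}\t{total / count}')

if __name__ == '__main__':
    mapper() if sys.argv[1:] == ['map'] else reducer()
```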

On Efficient Processing of Continuous Reverse Skyline Queries in Wireless Sensor Networks

  • Yin, Bo;Zhou, Siwang;Zhang, Shiwen;Gu, Ke;Yu, Fei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.4 / pp.1931-1953 / 2017
  • The reverse skyline query plays an important role in information-searching applications. This paper deals with continuous reverse skyline queries in sensor networks, which retrieve the reverse skylines, together with the set of nodes that reported them, over consecutive sampling epochs. Designing an energy-efficient approach to answering continuous reverse skyline queries is non-trivial because the reverse skyline query is not decomposable and a huge number of unqualified nodes would otherwise need to report their sensor readings. In this paper, we develop a new algorithm that avoids transmitting updates from nodes that cannot influence the reverse skyline. We propose a data mapping scheme to estimate sensor readings and determine their dominance relationships without having to know the true values. We also theoretically analyze the properties of reverse skyline computation and propose efficient pruning techniques that guarantee the correctness of the answer. An extensive experimental evaluation demonstrates the efficiency of our approach.
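
The dominance relation mentioned above is the building block of (reverse) skyline computation: one reading dominates another if it is at least as good in every dimension and strictly better in at least one. The sketch below shows that check and a straightforward skyline filter in Python; it is a generic illustration (assuming smaller values are better), not the paper's energy-aware in-network pruning algorithm.

```python
# Generic dominance test and skyline filter (smaller is better in every
# dimension). This illustrates the relation used by (reverse) skyline queries.
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p <= q in all dimensions and p < q in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

readings = [(3.0, 40.0), (2.5, 55.0), (4.0, 35.0), (5.0, 60.0)]  # e.g. (temperature, humidity)
print(skyline(readings))   # (5.0, 60.0) is dominated by (3.0, 40.0) and dropped
```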

Apriori Based Big Data Processing System for Improve Sensor Data Throughput in IoT Environments (IoT 환경에서 센서 데이터 처리율 향상을 위한 Apriori 기반 빅데이터 처리 시스템)

  • Song, Jin Su;Kim, Soo Jin;Shin, Young Tae
    • KIPS Transactions on Computer and Communication Systems / v.10 no.10 / pp.277-284 / 2021
  • Recently, the smart home environment has been expected to become a platform that collects, integrates, and utilizes various data through convergence with wireless information and communication technology, and the number of smart devices with various sensors inside smart homes keeps growing. The amount of data these devices generate is growing as well, and big data processing systems are actively being introduced to handle it effectively. In traditional big data processing systems, however, all requests are directed to the cluster driver before being allocated to distributed nodes, so the driver that manages task partitioning becomes a bottleneck and cluster-wide performance degrades. The delay is especially pronounced for smart home devices that constantly issue small processing requests. In this paper, we therefore design an Apriori-based big data system for effective data processing in smart home environments where many requests arrive at the same time. In the performance evaluation, the proposed system reduced data processing time by between 19.2% and 38.6% compared with the existing system. This result is related to the type of data being measured: because the amount of data collected in a smart home environment is large, the cache server plays a major role in data processing, and association analysis with the Apriori algorithm keeps the most closely related sensor data in the cache.
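
As background for the association analysis mentioned above, the following is a minimal Apriori sketch in Python that mines frequently co-requested sensor IDs from request logs; the sensor names, transactions, and support threshold are invented for illustration. Frequent itemsets found this way would be the candidates for caching together.

```python
# Minimal Apriori sketch: find frequently co-requested sensors so that their
# data can be cached together. Transactions and threshold are illustrative.
transactions = [
    {'door', 'light', 'thermostat'},
    {'door', 'light'},
    {'camera', 'door', 'light'},
    {'thermostat', 'camera'},
    {'door', 'light', 'camera'},
]
MIN_SUPPORT = 3  # absolute support threshold (number of transactions)

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent single sensors.
items = {i for t in transactions for i in t}
frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT}]

# Level k: join frequent (k-1)-itemsets and keep those meeting the threshold.
k = 2
while frequent[-1]:
    prev = frequent[-1]
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    frequent.append({c for c in candidates if support(c) >= MIN_SUPPORT})
    k += 1

for level in frequent:
    for itemset in level:
        print(sorted(itemset), 'support =', support(itemset))
```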

Processing Method of Mass Small File Using Hadoop Platform (하둡 플랫폼을 이용한 대량의 스몰파일 처리방법)

  • Kim, Chang-Bok;Chung, Jae-Pil
    • Journal of Advanced Navigation Technology / v.18 no.4 / pp.401-408 / 2014
  • Hadoop consists of the MapReduce programming model for distributed processing and the HDFS distributed file system. Hadoop is a suitable framework for big data processing, but processing a large number of small files causes several problems: one mapper is created per file, and the NameNode needs a large amount of memory to store the metadata of every file. This paper compares and evaluates several methods for processing masses of small files on the Hadoop platform. Processing files in an ordinary compression format is inadequate because each file is handled by a single mapper regardless of its size. Processing with SequenceFiles or Hadoop archive files removes the NameNode memory problem by compressing and combining the small files, and Hadoop archive files are faster than SequenceFiles in terms of the time needed to combine the small files. Processing with the CombineFileInputFormat class does not require combining the small files in advance, and its speed is similar to that of ordinary big data processing.
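
To make the "combine before processing" idea above concrete, the sketch below packs a directory of small files into a single large file of key/value records (file name as key, bytes as value) plus an index, which is the basic idea behind SequenceFile- and HAR-style consolidation. It is a generic, local illustration in Python, not the actual Hadoop on-disk formats; the paths are placeholders.

```python
# Generic "pack many small files into one big file" sketch, mimicking the
# key/value idea behind SequenceFile/HAR consolidation. Not the actual Hadoop
# on-disk formats; paths are placeholders.
import json
import os

def pack(small_file_dir: str, packed_path: str, index_path: str) -> None:
    index = {}  # file name -> (offset, length) inside the packed file
    with open(packed_path, 'wb') as packed:
        for name in sorted(os.listdir(small_file_dir)):
            with open(os.path.join(small_file_dir, name), 'rb') as f:
                data = f.read()
            index[name] = (packed.tell(), len(data))
            packed.write(data)
    with open(index_path, 'w') as f:
        json.dump(index, f)

def read_one(packed_path: str, index_path: str, name: str) -> bytes:
    with open(index_path) as f:
        offset, length = json.load(f)[name]
    with open(packed_path, 'rb') as packed:
        packed.seek(offset)
        return packed.read(length)

# Example usage (placeholder paths):
# pack('./small_files', 'packed.bin', 'packed.idx')
# print(read_one('packed.bin', 'packed.idx', 'sensor_0001.log'))
```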

Design of a Disaster Big Data Platform for Collecting and Analyzing Social Media (소셜미디어 수집과 분석을 위한 재난 빅 데이터 플랫폼의 설계)

  • Nguyen, Van-Quyet;Nguyen, Sinh-Ngoc;Nguyen, Giang-Truong;Kim, Kyungbaek
    • Annual Conference of KIPS / 2017.04a / pp.661-664 / 2017
  • Recently, emergencies during disasters have been handled better thanks to the early posting of disaster-related notifications on social media networks (e.g., Twitter or Facebook). With their characteristics (e.g., real-time operation and mobility) and large communities whose users can act as volunteers, social networks have proved to play a crucial role in disaster response. However, the amount of data transmitted during disasters is an obstacle to filtering informative messages, because the messages are diverse, voluminous, and very noisy. This large volume of data can be seen as Social Big Data (SBD). In this paper, we propose a big data platform for collecting and analyzing disaster data from SBD. First, we designed a collection module that can rapidly extract disaster information from Twitter, using big data frameworks that support streaming data on distributed systems, such as Kafka and Spark. Second, we developed an analysis module that learns from SBD to distinguish useful information from irrelevant information. Finally, we designed a real-time web interface for visualizing the results of the analysis phase. To show the viability of our platform, we ran the collection and analysis phases for 10 days on both real-time and historical tweets about disasters that happened in South Korea. The results show that, from 21,000 collected tweets, our big data platform can supply disaster information systems with a large amount of relevant data, which can be used to infer affected regions and victims in disaster situations.
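
As a rough sketch of the collection pipeline described above, the PySpark snippet below reads tweets from a Kafka topic with Structured Streaming and keeps only messages containing disaster-related keywords. The broker address, topic name, and keyword list are assumptions rather than the paper's configuration, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Sketch: consume a tweet stream from Kafka with Spark Structured Streaming
# and keep only messages that mention disaster-related keywords.
# Broker, topic, and keywords are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName('disaster-tweet-collector').getOrCreate()

tweets = (spark.readStream
          .format('kafka')
          .option('kafka.bootstrap.servers', 'kafka.example.com:9092')
          .option('subscribe', 'tweets')
          .load()
          .selectExpr('CAST(value AS STRING) AS text'))

keywords = ['earthquake', 'flood', 'typhoon', 'fire']
pattern = '|'.join(keywords)

disaster_tweets = tweets.filter(F.lower(F.col('text')).rlike(pattern))

# For the sketch, just print matching tweets; a real pipeline would persist
# them (e.g., to HDFS or a database) for the analysis module.
query = (disaster_tweets.writeStream
         .format('console')
         .outputMode('append')
         .start())
query.awaitTermination()
```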

A Study on Medical Information Platform Based on Big Data Processing and Edge Computing for Supporting Automatic Authentication in Emergency Situations (응급상황에서 자동인증지원을 위한 빅데이터 처리 및 에지컴퓨팅 기반의 의료정보플랫폼 연구)

  • Ham, Gyu-Sung;Kang, Mingoo;Joo, Su-Chong
    • Journal of Internet Computing and Services / v.23 no.3 / pp.87-95 / 2022
  • Recently, with the development of smart technology, medical information platforms can measure patients' biometric data in real time, accumulate it in a database, and determine whether a patient is in an emergency situation. Medical staff can easily access patient information from a mobile terminal after a simple authentication step. However, when medical information is accessed from a mobile terminal, authentication must be designed with both the patient's situation and the mobile terminal in mind. In this paper, we study a medical information platform based on big data processing and edge computing that supports automatic authentication in emergency situations. The automatic authentication system we studied performs user authentication and mobile terminal authentication simultaneously in an emergency and grants higher-level access rights to the certified medical staff and mobile terminal. Big data processing and analysis techniques are applied to the proposed platform to determine emergency situations while taking patient conditions such as high blood pressure and diabetes into account. To determine a patient's emergency situation quickly, edge computing nodes are placed in front of the medical information server so that they, rather than the server, decide on the patient's situation. The medical information server derives emergency decision values from the entered patient information and the accumulated biometric data and transmits them to the edge nodes, which use them to recognize patient-specific emergencies. In conclusion, the proposed medical information platform determines emergencies quickly through big data processing and edge computing while considering the patient's conditions, enables rapid authentication in emergencies through automatic authentication, and protects patient information by granting access rights according to the patient's situation and the role of the medical staff.
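
To illustrate the division of work described above, the sketch below shows an edge-side check that compares incoming vital signs against patient-specific decision values supplied by the server. The threshold values, vital-sign names, and patient record are invented for illustration and do not come from the paper.

```python
# Conceptual sketch of the edge-side emergency check: the server supplies
# patient-specific decision values, and the edge node compares each incoming
# vital-sign sample against them. All values are invented placeholders.
from dataclasses import dataclass

@dataclass
class DecisionValues:
    """Per-patient thresholds derived by the medical information server."""
    max_systolic: float
    min_spo2: float
    max_glucose: float

def is_emergency(sample: dict, dv: DecisionValues) -> bool:
    """Return True if any vital sign crosses the patient's decision values."""
    return (sample['systolic'] > dv.max_systolic
            or sample['spo2'] < dv.min_spo2
            or sample['glucose'] > dv.max_glucose)

# Decision values for a hypothetical hypertensive, diabetic patient.
dv = DecisionValues(max_systolic=160.0, min_spo2=92.0, max_glucose=250.0)

sample = {'systolic': 172.0, 'spo2': 95.0, 'glucose': 180.0}
if is_emergency(sample, dv):
    # In the platform, this is where automatic authentication would be
    # triggered and elevated access rights granted to the medical staff.
    print('EMERGENCY: notify medical staff and trigger automatic authentication')
```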