Title/Summary/Keyword: Massive Data Processing

Introduction to general purpose GPU computing (GPU를 이용한 범용 계산의 소개)

  • Yu, Donghyeon;Lim, Johan
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.1043-1061 / 2013
  • Recent advances in computer technology have introduced massive data, and their analysis has become important. High performance computing is one of the most essential parts of massive data analysis. In this paper, we review general purpose computing on the graphics processing unit and its application to parallel computing, which has been of great interest in the statistics community.
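
The data-parallel style this review surveys can be pictured in a few lines. Below is a minimal Monte Carlo sketch, assuming the CuPy library (our choice of illustration, not the paper's code) as a stand-in for hand-written CUDA; it falls back to NumPy on machines without a GPU.

```python
# Minimal sketch of GPU data-parallel computing, assuming the CuPy
# library as a stand-in for hand-written CUDA (not the paper's code).
import numpy as np

try:
    import cupy as xp   # GPU array library with a NumPy-compatible API
except ImportError:
    xp = np             # fall back to CPU NumPy if no GPU is available

def monte_carlo_pi(n: int) -> float:
    """Estimate pi by sampling n points; all n samples are processed in parallel."""
    x = xp.random.uniform(0.0, 1.0, size=n)
    y = xp.random.uniform(0.0, 1.0, size=n)
    inside = (x * x + y * y) <= 1.0          # elementwise test over all samples
    return 4.0 * float(inside.sum()) / n     # reduce on device, copy scalar back

print(monte_carlo_pi(10_000_000))
```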

A Method to Provide Context from Massive Data Processing in Context-Aware System (상황인지 시스템에서 대용량의 데이터 처리결과를 컨텍스트 정보로 제공하기 위한 방법)

  • Park, Yoo Sang;Choi, Jong Sun;Choi, Jae Young
    • KIPS Transactions on Software and Data Engineering / v.8 no.4 / pp.145-152 / 2019
  • Unlike a single value from a sensor device, a massive data set varies across several processing aspects: input data may arrive in different formats, its size varies, and the time needed to analyze it is not predictable. Context-aware systems therefore may contain complex modules, and these modules can be implemented and used in different ways. To address these problems, we propose a method for deriving context information from the results of analyzing massive data. The proposed method treats analysis work as another way of abstracting context and suggests a representation for the resulting context information. In experiments, we demonstrate step by step that the context processing engine works properly on healthcare services.
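
As a concrete picture of the proposed idea, the sketch below wraps the result of a massive-data analysis job in a uniform context record that an engine could consume like any sensor reading; all names and fields are hypothetical, not the paper's schema.

```python
# Hypothetical sketch: representing an analysis result as context
# information with a uniform schema (names are illustrative only).
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

@dataclass
class Context:
    source: str                      # e.g. a "sleep-analysis" healthcare job
    value: Any                       # the analysis result, not a raw sensor value
    produced_at: datetime
    meta: Dict[str, Any] = field(default_factory=dict)  # input format, size, runtime

def contextualize(source: str, result: Any, **meta: Any) -> Context:
    """Wrap an analysis result so the engine can treat it like sensor context."""
    return Context(source, result, datetime.now(timezone.utc), meta)

ctx = contextualize("sleep-analysis", {"avg_hr": 61.5},
                    input_format="csv", rows=1_200_000, runtime_s=42.7)
print(ctx.source, ctx.value)
```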

Two-Tier Storage DBMS for High-Performance Query Processing

  • Eo, Sang-Hun;Li, Yan;Kim, Ho-Seok;Bae, Hae-Young
    • Journal of Information Processing Systems / v.4 no.1 / pp.9-16 / 2008
  • This paper describes the design and implementation of a two-tier DBMS for handling massive data and providing faster response time. Today, the main requirements of a DBMS are twofold: handling large amounts of data and providing fast response time. Traditional DBMSs cannot fulfill both requirements. A disk-oriented DBMS can handle massive data, but its response time is relatively slower than that of a memory-resident DBMS; a memory-resident DBMS provides fast response time, but it has inherent restrictions on database size. In this paper, to meet both requirements, a two-tier DBMS is proposed. Cold data, which does not require fast response times, is managed by the disk storage manager, while hot data, which does require fast response times, is handled by the memory storage manager as snapshots. As a result, the proposed system performs significantly better than a disk-oriented DBMS, with the added advantage of managing massive data at the same time.
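
The hot/cold split can be pictured with a toy store in which a dict plays the memory storage manager's snapshots and SQLite plays the disk storage manager; this is our own illustration, not the paper's two-tier DBMS.

```python
# Toy two-tier store illustrating the hot/cold split described above;
# sqlite3 plays the disk storage manager, a dict the memory snapshots.
import sqlite3

class TwoTierStore:
    def __init__(self, path: str, hot_threshold: int = 3):
        self.disk = sqlite3.connect(path)
        self.disk.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        self.hot: dict[str, str] = {}       # memory-resident snapshots of hot data
        self.hits: dict[str, int] = {}      # access counts drive promotion
        self.hot_threshold = hot_threshold

    def put(self, k: str, v: str) -> None:
        self.disk.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (k, v))
        self.hot.pop(k, None)               # invalidate a stale snapshot

    def get(self, k: str) -> str | None:
        if k in self.hot:                   # fast path: memory tier
            return self.hot[k]
        row = self.disk.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()
        if row is None:
            return None
        self.hits[k] = self.hits.get(k, 0) + 1
        if self.hits[k] >= self.hot_threshold:
            self.hot[k] = row[0]            # promote cold data that turned hot
        return row[0]

store = TwoTierStore(":memory:")
store.put("sensor:42", "12.7")
for _ in range(4):                          # promoted on the 3rd read; the
    print(store.get("sensor:42"))           # 4th is served from memory
```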

Study on Data Processing of the IOT Sensor Network Based on a Hadoop Cloud Platform and a TWLGA Scheduling Algorithm

  • Li, Guoyu;Yang, Kang
    • Journal of Information Processing Systems / v.17 no.6 / pp.1035-1043 / 2021
  • An Internet of Things (IoT) sensor network is an effective solution for monitoring environmental conditions. However, IoT sensor networks generate massive data, so storing, processing, and querying massive data become technical challenges. To solve this problem, a Hadoop cloud platform is proposed. Using the time and workload genetic algorithm (TWLGA), the data processing platform lets one node share its work with other nodes, which not only raises the efficiency of a single node but also provides compatibility support that reduces the possible risk of software and hardware faults. In this experiment, a Hadoop cluster platform with the TWLGA scheduling algorithm is developed, and the performance of the platform is tested. The results show that the Hadoop cloud platform is suitable for the big data processing requirements of IoT sensor networks.
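
A compact sketch of genetic-algorithm task scheduling in the spirit of TWLGA follows: a chromosome assigns each task to a node, and fitness is the makespan (the finish time of the slowest node). The operators and parameters below are simplified guesses for illustration, not the published algorithm.

```python
# Simplified genetic-algorithm scheduler in the spirit of TWLGA; the
# operators and fitness below are our illustration, not the paper's.
import random

TASKS = [random.uniform(1, 10) for _ in range(40)]   # task processing times
NODES = 5

def makespan(assign: list[int]) -> float:
    load = [0.0] * NODES
    for task, node in zip(TASKS, assign):
        load[node] += task
    return max(load)                                  # time the slowest node finishes

def evolve(pop_size: int = 50, generations: int = 200) -> list[int]:
    pop = [[random.randrange(NODES) for _ in TASKS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)
        survivors = pop[: pop_size // 2]              # selection by fitness
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(len(TASKS))
            child = a[:cut] + b[cut:]                 # one-point crossover
            if random.random() < 0.1:                 # occasional mutation
                child[random.randrange(len(TASKS))] = random.randrange(NODES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=makespan)

best = evolve()
print("best makespan:", round(makespan(best), 2))
```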

Parallelization of Raster GIS Operations Using PC Clusters (PC 클러스터를 이용한 래스터 GIS 연산의 병렬화)

  • 신윤호;박수홍
    • Spatial Information Research / v.11 no.3 / pp.213-226 / 2003
  • With the increasing demand for processing massive geographic data, conventional GISs based on single-processor architectures appear problematic. In particular, performing complex GIS operations on massive geographic data is very time consuming and sometimes impossible, because growth in processor speed has not kept up with the volume of data to be processed. In the GIS field, PC clustering is one of the emerging technologies for handling massive geographic data effectively. In this study, an MPI (Message Passing Interface)-based parallel processing approach was taken to implement existing raster GIS operations that typically require massive geographic data sets, in order to improve processing capability and performance. Four types of raster GIS operations that Tomlin (1990) introduced for the systematic analysis of raster GIS operations were selected, and a data decomposition method was designed and implemented for them.
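
Row-wise data decomposition for a local (per-cell) raster operation can be sketched with mpi4py as below; the raster and the reclassification operation are illustrative, not the study's implementation.

```python
# Minimal sketch of MPI row decomposition for a local raster operation,
# assuming mpi4py and NumPy (illustrative, not the study's code).
# Run with e.g.: mpiexec -n 4 python raster_mpi.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

blocks = None
if rank == 0:
    raster = np.random.rand(1000, 1000)            # the "massive" input raster
    blocks = np.array_split(raster, size, axis=0)  # one row block per process

rows = comm.scatter(blocks, root=0)                # distribute row blocks
local = np.where(rows > 0.5, 1.0, 0.0)             # a local (per-cell) GIS operation

parts = comm.gather(local, root=0)                 # collect partial results
if rank == 0:
    result = np.vstack(parts)
    print("reclassified raster:", result.shape)
```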

A Study of designing Parallel File System for Massive Information Processing (대규모 정보처리를 위한 병렬 화일시스템 설계에 관한 연구)

  • Jang, Si-Ung;Jeong, Gi-Dong
    • The Transactions of the Korea Information Processing Society / v.4 no.5 / pp.1221-1230 / 1997
  • In this study, the performance of a parallel file system (N-PFS), implemented using conventional disks as disk arrays on a workstation cluster, is analyzed using an analytical method and actual values from experiments. N-PFS can be used as a high-performance file server in small-scale server systems and can efficiently process massive data I/Os such as multimedia and scientific data. In this paper, an analytical model is suggested, and the correctness of the suggested model is verified by analyzing the experimental values on a real system. The appropriate striping unit for processing massive data on a workstation cluster with 8 disks is 64-128 Kbytes, and the maximum throughput achieved is 15.8 Mbytes/sec. In addition, the performance of the parallel file system on massive data is bounded by the time required to copy data between buffers.
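
To illustrate striping, the helper below maps a file byte offset to a (disk, stripe, offset) triple under a round-robin layout, using the 64 Kbyte unit at the low end of the range the study found effective; it is a schematic of declustered I/O, not N-PFS code.

```python
# Schematic of round-robin file striping across disks (not N-PFS code);
# the 64 Kbyte striping unit matches the range the study found effective.
STRIPE_UNIT = 64 * 1024
DISKS = 8

def locate(offset: int) -> tuple[int, int, int]:
    """Map a file byte offset to (disk, stripe index on that disk, byte in stripe)."""
    stripe = offset // STRIPE_UNIT          # global stripe number
    disk = stripe % DISKS                   # round-robin placement
    return disk, stripe // DISKS, offset % STRIPE_UNIT

# A sequential 1 Mbyte read touches all 8 disks, so transfers overlap:
touched = {locate(o)[0] for o in range(0, 1024 * 1024, STRIPE_UNIT)}
print(sorted(touched))                      # [0, 1, 2, 3, 4, 5, 6, 7]
```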

Design and Implementation of the Massive Underwater Acoustic Database System (대용량 해상시험자료 데이터베이스 시스템 설계 및 구현)

  • Jeong, Gi-Hyeon;Choe, Jae-Yong;Do, Gyeong-Cheol;Kim, Eung-Beom
    • The Transactions of the Korea Information Processing Society / v.6 no.8 / pp.2020-2030 / 1999
  • In underwater acoustic measurement, about 53 gigabytes of data are acquired per naval vessel; such data are massive, expensive, and impossible to retest, so they need to be managed systematically. In this paper, we describe MUADS (Massive Underwater Acoustic Database System), which was developed to store data for a long time, manage it systematically, and supply both raw and analyzed data. We also analyze client/server performance on 6,003 individual input data items in MUADS, which consists of a Unix server with a massive DC-jukebox and Windows clients.

ECPS: Efficient Cloud Processing Scheme for Massive Contents (클라우드 환경에서 대규모 콘텐츠를 위한 효율적인 자원처리 기법)

  • Na, Moon-Sung;Kim, Seung-Hoon;Lee, Jae-Dong
    • Journal of Korea Society of Industrial Information Systems / v.15 no.4 / pp.17-27 / 2010
  • Major IT vendors expect cloud computing technology to shorten the content service cycle, speed up application deployment, skip the installation process, reduce operational costs, enable proactive management, and so on. However, a cloud computing environment for massive content services requires high-performance data processing to reduce the time spent on data processing and analysis. In this study, an Efficient Cloud Processing Scheme (ECPS) is proposed for allocating resources for massive content services. For high-performance service, an optimized resource allocation plan is presented using MapReduce programming techniques and the association rules used to detect hidden patterns in data mining, built on the Hadoop platform (Infrastructure as a Service). The proposed ECPS brought more than a 20% improvement in performance and speed compared to traditional methods.
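
The MapReduce side of such a scheme can be pictured with a toy in-process example; the access log and keys below are hypothetical, and ECPS itself runs on Hadoop rather than plain Python.

```python
# Minimal in-process MapReduce sketch (illustrative; ECPS runs on Hadoop).
from collections import defaultdict

logs = ["u1 videoA", "u2 videoB", "u1 videoA", "u3 videoA"]  # hypothetical access log

def mapper(line: str):
    user, content = line.split()
    yield content, 1                      # emit one count per content request

def reducer(pairs):
    totals = defaultdict(int)
    for key, value in pairs:              # shuffle/sort is implicit in this toy
        totals[key] += value
    return dict(totals)

counts = reducer(kv for line in logs for kv in mapper(line))
print(counts)                             # {'videoA': 3, 'videoB': 1}
```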

Efficient Top-K Queries Computation for Encrypted Data in the Cloud (클라우드 환경에서의 암호화 데이터에 대한 효율적인 Top-K 질의 수행 기법)

  • Kim, Jong Wook
    • Journal of Korea Multimedia Society / v.18 no.8 / pp.915-924 / 2015
  • With the growing popularity of cloud computing services, users can more easily manage massive amounts of data by outsourcing them to the cloud, or more efficiently analyze large amounts of data by leveraging the IT infrastructure the cloud provides. This, however, raises security concerns for sensitive data. To provide data security, it is essential to encrypt sensitive data before uploading it to cloud computing services. Although data encryption helps provide data security, it hurts the performance of massive data analytics because it forbids the use of indexes and of mathematical operations on encrypted data. In this paper, we therefore propose a novel algorithm that efficiently processes large amounts of encrypted data. In particular, we propose a top-k processing algorithm for massive amounts of encrypted data in cloud computing environments, and we verify the performance of the proposed approach with experiments on real data.
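
One standard way to enable server-side top-k over ciphertexts is order-preserving encryption: because ciphertext order matches plaintext order, the server can pick the k largest items without decrypting anything. The affine encoding below is a deliberately insecure placeholder for illustration and is not claimed to be the paper's scheme.

```python
# Toy order-preserving "encryption" (insecure placeholder, not the
# paper's scheme): ciphertext order equals plaintext order, so an
# untrusted server can run top-k on ciphertexts alone.
import heapq

A, B = 7919, 104729          # secret affine key held by the data owner

def encrypt(x: int) -> int:
    return A * x + B          # strictly increasing, hence order preserving

def decrypt(c: int) -> int:
    return (c - B) // A

scores = [42, 17, 99, 3, 77]                  # plaintexts stay client-side
ciphertexts = [encrypt(x) for x in scores]    # only these go to the cloud

top3 = heapq.nlargest(3, ciphertexts)         # server-side top-k, no decryption
print([decrypt(c) for c in top3])             # client recovers [99, 77, 42]
```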

Agent Based Cinder Monitoring System Supporting PDA

  • Han, Jung-Soo
    • International Journal of Contents / v.3 no.1 / pp.24-28 / 2007
  • This paper presents an agent-based cinder monitoring system that supports PDAs (Personal Digital Assistants). The monitoring system manages data automatically through data-managing agents such as a state managing agent, a location managing agent, a badness managing agent, and a circumstances managing agent, and it uses a massive data processing agent to handle massive data. The development of an agent-based data monitoring system for stable cinder reuse offers a practical way to turn the mechanized or manual processes that are still widespread into real-time automated processes.
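
The role-based dispatch the abstract describes can be sketched skeletally as below; the agent roles follow the abstract, while the interfaces and records are our guesses, not the paper's design.

```python
# Skeletal role-based agent dispatch (interfaces are illustrative only;
# the agent roles follow the abstract above).
from typing import Callable, Dict, List

agents: Dict[str, Callable[[dict], None]] = {
    "state":        lambda rec: print("state agent:", rec),
    "location":     lambda rec: print("location agent:", rec),
    "badness":      lambda rec: print("badness agent:", rec),
    "circumstance": lambda rec: print("circumstance agent:", rec),
}

def massive_data_agent(batch: List[dict]) -> None:
    """Bulk agent: fan a large batch out to the per-aspect managing agents."""
    for rec in batch:
        agents[rec["aspect"]](rec)

massive_data_agent([{"aspect": "state", "line": 3, "temp_c": 812},
                    {"aspect": "badness", "line": 3, "defect": "crack"}])
```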