• Title/Summary/Keyword: Parallel data processing

Search Result 751, Processing Time 0.031 seconds

Development of Real time Air Quality Prediction System

  • Oh, Jai-Ho;Kim, Tae-Kook;Park, Hung-Mok;Kim, Young-Tae
    • Proceedings of the Korean Environmental Sciences Society Conference
    • /
    • 2003.11a
    • /
    • pp.73-78
    • /
    • 2003
  • In this research, we implement Realtime Air Diffusion Prediction System which is a parallel Fortran model running on distributed-memory parallel computers. The system is designed for air diffusion simulations with four-dimensional data assimilation. For regional air quality forecasting a series of dynamic downscaling technique is adopted using the NCAR/Penn. State MM5 model which is an atmospheric model. The realtime initial data have been provided daily from the KMA (Korean Meteorological Administration) global spectral model output. It takes huge resources of computation to get 24 hour air quality forecast with this four step dynamic downscaling (27km, 9km, 3km, and lkm). Parallel implementation of the realtime system is imperative to achieve increased throughput since the realtime system have to be performed which correct timing behavior and the sequential code requires a large amount of CPU time for typical simulations. The parallel system uses MPI (Message Passing Interface), a standard library to support high-level routines for message passing. We validate the parallel model by comparing it with the sequential model. For realtime running, we implement a cluster computer which is a distributed-memory parallel computer that links high-performance PCs with high-speed interconnection networks. We use 32 2-CPU nodes and a Myrinet network for the cluster. Since cluster computers more cost effective than conventional distributed parallel computers, we can build a dedicated realtime computer. The system also includes web based Gill (Graphic User Interface) for convenient system management and performance monitoring so that end-users can restart the system easily when the system faults. Performance of the parallel model is analyzed by comparing its execution time with the sequential model, and by calculating communication overhead and load imbalance, which are common problems in parallel processing. Performance analysis is carried out on our cluster which has 32 2-CPU nodes.

  • PDF

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.7
    • /
    • pp.305-317
    • /
    • 2008
  • As a mobile computing environment is rapidly changing, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and sire. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which considers that color components are perceptually less significant, supports parallel operations on two-packed compressed 16-bit YCbCr (6 bit Y and 5 bits Cb, Cr) data in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor performance. This is in contrast to MMX (a representative Intel's multimedia extensions), which achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves the performance and efficiency with a mere 3% increase in the system area and a 5% increase in the system power, while MMX requires a 14% increase in the system area and a 16% increase in the system power.

Real Time Distributed Parallel Processing to Visualize Noise Map with Big Sensor Data and GIS Data for Smart Cities (스마트시티의 빅 센서 데이터와 빅 GIS 데이터를 융합하여 실시간 온라인 소음지도로 시각화하기 위한 분산병렬처리 방법론)

  • Park, Jong-Won;Sim, Ye-Chan;Jung, Hae-Sun;Lee, Yong-Woo
    • Journal of Internet Computing and Services
    • /
    • v.19 no.4
    • /
    • pp.1-6
    • /
    • 2018
  • In smart cities, data from various kinds of sensors are collected and processed to provide smart services to the citizens. Noise information services with noise maps using the collected sensor data from various kinds of ubiquitous sensor networks is one of them. This paper presents a research result which generates three dimensional (3D) noise maps in real-time for smart cities. To make a noise map, we have to converge many informal data which include big image data of geographical Information and massive sensor data. Making such a 3D noise map in real-time requires the processing of the stream data from the ubiquitous sensor networks in real-time and the convergence operation in real-time. They are very challenging works. We developed our own methodology for real-time distributed and parallel processing for it and present it in this paper. Further, we developed our own real-time 3D noise map generation system, with the methodology. The system uses open source softwares for it. Here in this paper, we do introduce one of our systems which uses Apache Storm. We did performance evaluation using the developed system. Cloud computing was used for the performance evaluation experiments. It was confirmed that our system was working properly with good performance and the system can produce the 3D noise maps in real-time. The performance evaluation results are given in this paper, as well.

Data Assimilation of Real-time Air Quality Forecast using CUDA (CUDA를 이용한 실시간 대기질 예보 자료동화)

  • Bae, Hyo-Sik;Yu, Suk-Hyun;Kwon, Hee-Yong
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.2
    • /
    • pp.271-277
    • /
    • 2017
  • As a result of rapid industrialization, air pollutants are seriously threatening the health of the people, the forecast is becoming more and more important. In forecasting air quality, it is very important to create a reliable initial field because the initial field input to the air quality forecasting model affects the accuracy of the forecast. There are several methods for enhancing the initial field input. One of the necessary techniques is data assimilation. The number of operations and the time required for such data assimilation is exponentially increased as the forecasting area is widened and the number of observation sites increases. Therefore, as the forecast size increases, it is difficult to apply the existing sequential processing method to a field requiring fast processing speed. In this paper, we propose a method that can process Cresman's method, which is one of the data assimilation techniques, in real time using CUDA. As a result, the proposed parallel processing method using CUDA improved at least 35 times faster than the conventional sequential method and other parallel processing methods.

Sequential Pattern Mining with Optimization Calling MapReduce Function on MapReduce Framework (맵리듀스 프레임웍 상에서 맵리듀스 함수 호출을 최적화하는 순차 패턴 마이닝 기법)

  • Kim, Jin-Hyun;Shim, Kyu-Seok
    • The KIPS Transactions:PartD
    • /
    • v.18D no.2
    • /
    • pp.81-88
    • /
    • 2011
  • Sequential pattern mining that determines frequent patterns appearing in a given set of sequences is an important data mining problem with broad applications. For example, sequential pattern mining can find the web access patterns, customer's purchase patterns and DNA sequences related with specific disease. In this paper, we develop the sequential pattern mining algorithms using MapReduce framework. Our algorithms distribute input data to several machines and find frequent sequential patterns in parallel. With synthetic data sets, we did a comprehensive performance study with varying various parameters. Our experimental results show that linear speed up can be achieved through our algorithms with increasing the number of used machines.

A Parallel Algorithm for Merging Relaxed Min-Max Heaps (Relaxed min-max 힙을 병합하는 병렬 알고리즘)

  • Min, Yong-Sik
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.5
    • /
    • pp.1162-1171
    • /
    • 1998
  • This paper presents a data structure that implements a mergable double-ended priority queue : namely an improved relaxed min-max-pair heap. By means of this new data structure, we suggest a parallel algorithm to merge priority queues organized in two relaxed heaps of different sizes, n and k, respectively. This new data-structure eliminates the blossomed tree and the lazying method used to merge the relaxed min-max heaps in [9]. As a result, employing max($2^{i-1}$,[(m+1/4)]) processors, this algorithm requires O(log(log(n/k))${\times}$log(n)) time. Also, on the MarPar machine, this method achieves a 35.205-fold speedup with 64 processors to merge 8 million data items which consist of two relaxed min-max heaps of different sizes.

  • PDF

Parameter analysis for gas hydrate data of East sea using Geobit (지오빗을 이용한 동해 가스하이드레이트 탄성파 자료처리 매개변수 분석)

  • Kim, Young-Wan;Jang, Seong-Hyung;Kim, Hyun-Tae;Yoon, Wang-Joong
    • 한국신재생에너지학회:학술대회논문집
    • /
    • 2006.06a
    • /
    • pp.377-381
    • /
    • 2006
  • A seismic survey for gas hydrate have performed over the East sea by the KIGAM since 1997. General indicator of gas hydrate in seismic data is commonly inferred from the BSR(Bottom Simulating Reflector) that occurred parallel to the sea floor, amplitude decrease at the top of the BSR, amplitude blanking at the bottom of the BSR, decrease of the interval velocity and the reflection phase reversal at the BSR. In this paper we had analyzed optimum parameters of the field data to detect the 9as hydrate. Shot delay correction is applied 95ms, spherical divergence correction is applied velocity library 3, bandpass filter is applied 25-30-115-120Hz deconvolution operator length is applied 60ms, lag is 6ms and accurate velocity analysis NMO correction, stack is performed. Geobit 2.11.0 developed by the KIGAM was used for all data processing. Processing results say that the BSR occurred parallel to the sea floor were shown at 3,150m/s of two way travel time from the sea floor through shot point 5,000-5,610, and identified the interval velocity decrease around BSR and the reflection phase reversal corresponding to the reflection at the sea floor.

  • PDF

Implementation of Parallel Processing Interpolation Algorithm for Multicore GPU (다중코어 GPU를 위한 병렬처리 보간 알고리즘 구현)

  • Lee, Kwang-Yeob;Kim, Chi-Yong
    • Journal of IKEEE
    • /
    • v.16 no.4
    • /
    • pp.304-309
    • /
    • 2012
  • As resolution for displays is recently more and more increasing, the amount of data abd calculation that graphic hardware needs to process are also increasing. Especially the amount of data processing by rasterizer is rapidly increasing. This paper used an algorism using coordinates in center of gravity and area for triangle instead of using bilinear algorism[1] used by conventional interpolation, which is to make it easier for parallel processing by rasterizer. This paper implemented designed rasterizer under FPGA environment, and compared it with conventional rasterizer and verified it. This rasterizer is proved to have approximately 50% higher performance compared to conventional one.

The Development of High Resolution Film Scanner Using DSP (DSP를 이용한 고해상도 스캐너 개발)

  • 김태현;최은석;백중환
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.149-152
    • /
    • 2000
  • A scanner is an output device that scans documents, photographs, films etc, and convert them to digital data. Especially, a film scanner is used for scanning negative/positive films. In this paper, we design step motor control part, image sensor part, and Aか converter part which are components of the scanner and use DSP for fast signal processing. We also design the interface circuits using EPLD between these peripherals and DSP. The PC interface circuits between scanner and PC are designed by using parallel port to control and transfer the scanned data from scanner to PC. For 35mm film, we design hardwares which obtain high resolution more than 9 million pixels (horizontal resolution is 3835 and vertical resolution is 2592).

  • PDF

Dynamic Block Reassignment for Load Balancing of Block Centric Graph Processing Systems (블록 중심 그래프 처리 시스템의 부하 분산을 위한 동적 블록 재배치 기법)

  • Kim, Yewon;Bae, Minho;Oh, Sangyoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.5
    • /
    • pp.177-188
    • /
    • 2018
  • The scale of graph data has been increased rapidly because of the growth of mobile Internet applications and the proliferation of social network services. This brings upon the imminent necessity of efficient distributed and parallel graph processing approach since the size of these large-scale graphs are easily over a capacity of a single machine. Currently, there are two popular parallel graph processing approaches, vertex-centric graph processing and block centric processing. While a vertex-centric graph processing approach can easily be applied to the parallel processing system, a block-centric graph processing approach is proposed to compensate the drawbacks of the vertex-centric approach. In these systems, the initial quality of graph partition affects to the overall performance significantly. However, it is a very difficult problem to divide the graph into optimal states at the initial phase. Thus, several dynamic load balancing techniques have been studied that suggest the progressive partitioning during the graph processing time. In this paper, we present a load balancing algorithms for the block-centric graph processing approach where most of dynamic load balancing techniques are focused on vertex-centric systems. Our proposed algorithm focus on an improvement of the graph partition quality by dynamically reassigning blocks in runtime, and suggests block split strategy for escaping local optimum solution.