• Title/Summary/Keyword: Parallel data processing

An efficient parallel solution algorithm on the linear second-order partial differential equations with large sparse matrix being based on the block cyclic reduction technique (Block Cyclic Reduction 기법에 의한 대형 Sparse Matrix 선형 2계편미분방정식의 효율적인 병렬 해 알고리즘)

  • 이병홍;김정선
    • The Journal of Korean Institute of Communications and Information Sciences / v.15 no.7 / pp.553-564 / 1990
  • The coefficient matrix of a general linear second-order partial differential equation is partitioned into (n-1)×(n-1) submatrices and transformed into a block tridiagonal system. Cyclic odd-even reduction is then applied to this system with large-grain data granularity, yielding a block cyclic reduction algorithm that solves for the unknown vectors. However, this block cyclic reduction technique is poorly suited to parallel processing systems because its degree of parallelism changes at every computing stage. A new algorithm for solving linear second-order partial differential equations is therefore presented, based on a block cyclic reduction technique modified to keep its parallelism constant and to greatly reduce its execution time. The two algorithms are compared and analyzed.
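
To make the odd-even reduction step concrete, here is a minimal scalar sketch in Python. It is not the authors' block algorithm: it assumes ordinary numbers instead of (n-1)×(n-1) blocks and a system size of 2^k - 1, and the name `cyclic_reduction` is illustrative. The comment in the reduction loop marks where the shrinking parallelism described above appears.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by odd-even cyclic reduction (scalar sketch).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    Assumes len(b) == 2**k - 1; a[0] and c[-1] are ignored.
    """
    n = len(b)
    a = [0.0] + list(a[1:])   # first row has no x[-1] term
    c = list(c[:-1]) + [0.0]  # last row has no x[n] term
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Each odd row is reduced independently, so these iterations could run
    # in parallel, but their number halves at every level.
    for i in range(1, n - 1, 2):
        alpha = -a[i] / b[i - 1]  # eliminates x[i-1] using row i-1
        gamma = -c[i] / b[i + 1]  # eliminates x[i+1] using row i+1
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + gamma * a[i + 1])
        rc.append(gamma * c[i + 1])
        rd.append(d[i] + alpha * d[i - 1] + gamma * d[i + 1])
    x_odd = cyclic_reduction(ra, rb, rc, rd)
    x = [0.0] * n
    for k, i in enumerate(range(1, n - 1, 2)):
        x[i] = x_odd[k]
    # Back-substitute the even-indexed unknowns (also independent of each other).
    for j in range(0, n, 2):
        left = a[j] * x[j - 1] if j > 0 else 0.0
        right = c[j] * x[j + 1] if j < n - 1 else 0.0
        x[j] = (d[j] - left - right) / b[j]
    return x
```

For seven unknowns the reduction loop runs three independent row reductions, then one, which is the stage-dependent parallelism the modified algorithm is meant to avoid.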

Problem space based search algorithm for manufacturing process with rework probabilities affecting product quality and tardiness (Rework 확률이 제품의 품질과 납기준수에 영향을 주는 공정을 위한 문제공간기반 탐색 알고리즘)

  • Kang, Yong-Ha;Lee, Young-Sup;Shin, Hyun-Joon
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.7 / pp.1702-1710 / 2009
  • In this paper, we propose a problem space based search (PSBS) algorithm for a parallel machine scheduling problem that considers rework probabilities. For each machine and job type pair, the rework probability of a job on that machine can be obtained from historical data. Neighborhoods are generated by perturbing four problem data vectors (processing times, due dates, setup times, and rework probabilities) and are evaluated with an efficient dispatching heuristic (EDDR). The proposed algorithm is evaluated in terms of maximum lateness and the number of reworked jobs. We show that PSBS considerably improves on the results obtained by EDDR.
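
A problem space search perturbs the input data rather than the schedule, runs the base heuristic on the perturbed data, and scores the resulting schedule on the original data. The sketch below assumes identical parallel machines and uses a simple earliest-due-date list scheduler (`edd_schedule`, hypothetical) as a stand-in for EDDR, whose rules are not given here; setup times and rework probabilities are omitted, so only two of the four data vectors are perturbed.

```python
import random

def edd_schedule(p, due, m):
    """Hypothetical stand-in for EDDR: order jobs by due date, give each to the machine free first."""
    order = sorted(range(len(p)), key=lambda j: due[j])
    free = [0.0] * m
    assign = []
    for j in order:
        k = min(range(m), key=lambda i: free[i])
        free[k] += p[j]
        assign.append((j, k))
    return assign

def max_lateness(assign, p, due, m):
    """Replay the job/machine sequence on the ORIGINAL data; return max(completion - due)."""
    free = [0.0] * m
    worst = float("-inf")
    for j, k in assign:
        free[k] += p[j]
        worst = max(worst, free[k] - due[j])
    return worst

def psbs(p, due, m, iters=200, eps=0.2, seed=0):
    """Problem space search: perturb the data, reschedule, keep the best schedule found."""
    rng = random.Random(seed)
    best = edd_schedule(p, due, m)
    best_val = max_lateness(best, p, due, m)
    for _ in range(iters):
        # Perturb the problem data (processing times, due dates), not the solution.
        pp = [x * (1 + rng.uniform(-eps, eps)) for x in p]
        dd = [x * (1 + rng.uniform(-eps, eps)) for x in due]
        cand = edd_schedule(pp, dd, m)
        val = max_lateness(cand, p, due, m)
        if val < best_val:
            best, best_val = cand, val
    return best, best_val
```

Because the search starts from the plain heuristic schedule and only accepts improvements, its result is never worse than the base heuristic on the original data.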

NAWM Bus Architecture of High Performance for SoC (SoC를 위한 고성능 NAWM 버스 아키텍처)

  • Lee, Kook-Pyo;Yoon, Yung-Sup
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.9 / pp.26-32 / 2008
  • A conventional shared bus architecture can process only one data transaction at a time. In this paper, we propose the NAWM (No Arbitration Wild Master) bus architecture, which can process several data transactions simultaneously. After designing the master and slave wrappers of the NAWM bus architecture for an AMBA system, we confirm that most IPs of the AMBA system can be applied without modification and that the added timing delay is negligible. From simulation, we find that more than 50% parallel processing is possible when several masters initiate transactions to slaves in the NAWM bus architecture.

A Programmable Doppler Processor Using a Multiple-DSP Board (다중 DSP 보드를 이용한 프로그램 가능한 도플러 처리기)

  • 신현익;김환우
    • Journal of the Institute of Electronics Engineers of Korea SC / v.40 no.5 / pp.333-340 / 2003
  • Doppler processing is the heart of pulsed Doppler radar: it performs clutter elimination and coherent integration. As digital signal processors (DSPs) have improved, DSP-based implementations have become widely used in radar systems. In general, for a Doppler processor to process input data in real time, parallel processing with multiple DSPs is required. This paper implements a programmable Doppler processor, consisting of an MTI filter, a DFB, and a square-law detector, using eight ADSP21060s. By formulating the distribution time of the input data, the transfer time of the output data, and the computation time of each algorithm, it estimates the total processing time and the number of DSPs required. Finally, the performance of the implemented Doppler processor is evaluated using the TSG, which provides radar control pulses and simulated target signals.
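
The MTI, DFB, and square-law chain can be sketched for one range cell as below. This is an illustrative Python version, assuming a two-pulse canceller for the MTI filter and a plain DFT for the Doppler filter bank; the paper's actual filter designs and its ADSP21060 code are not reproduced. Because each range cell is processed independently, cells could be distributed across DSPs.

```python
import cmath

def mti_two_pulse(x):
    """Assumed MTI filter: two-pulse canceller y[n] = x[n] - x[n-1] (nulls zero-Doppler clutter)."""
    return [x[n] - x[n - 1] for n in range(1, len(x))]

def doppler_filter_bank(y):
    """Assumed DFB: direct DFT across pulses, one output per Doppler filter."""
    N = len(y)
    return [sum(y[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def square_law(Y):
    """Square-law detector: power |Y|^2 of each filter output."""
    return [abs(v) ** 2 for v in Y]

def doppler_process(pulses):
    """Complex slow-time samples of one range cell -> power per Doppler filter."""
    return square_law(doppler_filter_bank(mti_two_pulse(pulses)))
```

For example, 17 pulses containing constant clutter plus a target tone at Doppler bin 3 yield 16 filter outputs whose peak is in bin 3, with the clutter removed by the canceller.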

Dynamic Available-Resource Reallocation based Job Scheduling Model in Grid Computing (그리드 컴퓨팅에서 유효자원 동적 재배치 기반 작업 스케줄링 모델)

  • Kim, Jae-Kwon;Lee, Jong-Sik
    • Journal of the Korea Society for Simulation / v.21 no.2 / pp.59-67 / 2012
  • A grid computing system consists of physical resources that process large-scale jobs. With the recent rapid growth of data, grid computing needs a parallel processing method to handle such jobs. In general, a requested large-scale task is divided among the physical resources, and the processing time of each part varies with the efficiency and distance of each resource. Even when a resource completes its part, it stays idle until every divided part is finished; only when all resources finish does each one start the next job. This paper therefore proposes a dynamic resource reallocation scheduling model (DDRSM). DDRSM finds a waiting resource and reallocates unfinished work to it according to the resource's efficiency and distance. DDRSM is an efficient method for processing multiple large-scale jobs.
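
The idle-resource problem and the reallocation idea can be compared with a toy model. Assumptions (not from the paper): a job is split into equal chunks, each resource has a speed (efficiency) and a fixed per-chunk delay standing in for distance, and the dynamic policy gives each chunk to the resource that would finish it earliest. The function names are hypothetical, and this is not DDRSM itself.

```python
def static_makespan(chunks, speeds, delays):
    """Static split: chunks dealt round-robin up front; fast resources then idle until all finish."""
    finish = [0.0] * len(speeds)
    for i, w in enumerate(chunks):
        r = i % len(speeds)
        finish[r] += delays[r] + w / speeds[r]
    return max(finish)

def dynamic_makespan(chunks, speeds, delays):
    """Dynamic reallocation (assumed policy): each chunk goes to the resource that would finish it first."""
    free = [0.0] * len(speeds)
    for w in chunks:
        r = min(range(len(speeds)), key=lambda i: free[i] + delays[i] + w / speeds[i])
        free[r] += delays[r] + w / speeds[r]
    return max(free)
```

With twelve unit chunks, two slow resources (speed 1), and one fast resource (speed 4), all with delay 0.1, the static split takes 4.4 time units while the dynamic policy finishes in 2.8, because the fast resource keeps picking up work instead of waiting.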

Efficient Interleaving Scheme of Volume Holographic Memory (체적 홀로그래픽 메모리에서의 효율적인 인터리빙 기법)

  • Seunghoon Han;Kim, Minseung;Byungchoon Yang;Lee, Byoungho
    • Proceedings of the Optical Society of Korea Conference / 2003.02a / pp.72-73 / 2003
  • In volume holographic memory (VHM), a two-dimensional data array (i.e., a data page) is used for recording and retrieval with the aid of a spatial light modulator (SLM) and a CCD camera. Because of this two-dimensional parallel data processing, burst errors in this system also have two-dimensional characteristics within the data page. In this paper, we present a channel model of burst noise and burst errors for a VHM system using disk-type recording media. (omitted)
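
The interleaving scheme itself is omitted from the abstract, so the following is only a generic illustration of why two-dimensional interleaving counters two-dimensional burst errors. Assuming pixel (r, c) of a data page is assigned to codeword (r mod b, c mod b), any b×b burst puts at most one error into each codeword. The mapping and function names are assumptions, not the paper's scheme.

```python
def codeword_of(r, c, b):
    """Assumed 2-D block interleaving: pixel (r, c) belongs to codeword (r mod b, c mod b)."""
    return (r % b, c % b)

def errors_per_codeword(burst_r, burst_c, size, b):
    """Count how many pixels of a size x size burst land in each codeword."""
    counts = {}
    for r in range(burst_r, burst_r + size):
        for c in range(burst_c, burst_c + size):
            k = codeword_of(r, c, b)
            counts[k] = counts.get(k, 0) + 1
    return counts
```

A 4×4 burst with b = 4 touches 16 codewords once each; without interleaving, if each page row were one codeword, the same burst would put four errors into each of four codewords.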

Efficient Implementation of Convolutional Neural Network Using CUDA (CUDA를 이용한 Convolutional Neural Network의 효율적인 구현)

  • Ki, Cheol-Min;Cho, Tai-Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.6 / pp.1143-1148 / 2017
  • Artificial intelligence and deep learning are currently prominent topics, and these technologies are being applied in many fields. Among the many algorithms in artificial intelligence, the convolutional neural network (CNN) is an effective one. A CNN adds convolution layers to a multilayer neural network. When a CNN is used on a small amount of data, or when its layer structure is simple, speed is not a concern. However, training takes a long time when the training data set is large and the layer structure is complex, and in these cases GPU-based parallel processing is often needed. In this paper, we implement a CNN using CUDA and show that its training is faster and more efficient than training with some other frameworks or programs.
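
The core operation a CNN adds is the convolution layer. The plain-Python sketch below (function names are illustrative, not the paper's code) computes a "valid" convolution followed by ReLU; every output element depends only on the input, which is why a CUDA implementation can assign one thread per output element.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers (cross-correlation, no kernel flip)."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            # Each (i, j) output is independent: one GPU thread per element in a CUDA port.
            s = 0.0
            for u in range(kh):
                for v in range(kw):
                    s += image[i + u][j + v] * kernel[u][v]
            row.append(s)
        out.append(row)
    return out

def relu(fmap):
    """Element-wise ReLU activation applied to a feature map."""
    return [[max(0.0, x) for x in row] for row in fmap]
```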

A Data Transfer Method of the Sub-Cluster Group based on the Distributed and Shared Memory (분산 공유메모리를 기반으로 한 서브 클러스터 그룹의 자료전송방식)

  • Lee, Kee-Jun
    • The KIPS Transactions:PartA / v.10A no.6 / pp.635-642 / 2003
  • Rapid advances in network technology provide the foundation for building fast, inexpensive cluster systems. Conventional cluster systems are generally built above a fixed performance level on stable, high-speed local networks. A multi-distributed web cluster group is a web cluster model that achieves high performance, high efficiency, and high availability with low-cost, low-speed system nodes on a network, through effective job division, cooperative work among system nodes, parallel execution of a given job, and the shared memory of the SC-Server. To this end, the multi-distributed web cluster group builds sub-cluster groups in which multiple system nodes are bound into a single virtual network, and it uses the web distributed shared memory of the system nodes for efficient data transmission within each sub-cluster group. Because the model applies load balancing and parallel computing to the large-scale work requested by users, it can maximize processing efficiency.

Design of Luma and Chroma Sub-pixel Interpolator for H.264 Motion Estimation (H.264 움직임 예측을 위한 Luma와 Chroma 부화소 보간기 설계)

  • Lee, Seon-Young;Cho, Kyeong-Soon
    • The KIPS Transactions:PartA / v.18A no.6 / pp.249-254 / 2011
  • This paper describes an efficient design of an interpolation circuit that generates luma and chroma sub-pixels for H.264 motion estimation. The circuit based on the proposed architecture requires no input data buffering and performs horizontal, vertical, and diagonal sub-pixel interpolation in parallel. Its performance is further improved by processing the 1/2-pixel and 1/4-pixel interpolations for luma components and the 1/8-pixel interpolations for chroma components simultaneously. To reduce circuit size, the intermediate data required to process all interpolations in parallel are stored in internal SRAMs instead of registers. We described the proposed circuit at the register transfer level and verified its operation on an FPGA board. We also synthesized the gate-level circuit using a 130 nm CMOS standard cell library. It consists of 20,674 gates and has a maximum operating frequency of 244 MHz. The circuit uses a total of 3,232 SPSRAM bits. The size of our circuit (including logic gates and SRAMs) is smaller than that of other designs, while its performance remains comparable.
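
For reference, the per-sample arithmetic such a circuit computes is fixed by the H.264 standard: a 6-tap filter (1, -5, 20, 20, -5, 1) with rounding for luma half-pixels, rounded averaging for luma quarter-pixels, and bilinear weighting in 1/8 steps for chroma. The Python functions below are an illustrative sketch of those formulas, not the authors' hardware; they assume 8-bit samples and ignore picture-border padding.

```python
def clip255(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))

def luma_half(row, x):
    """Half-pel sample between row[x] and row[x+1] using the 6-tap filter (1, -5, 20, 20, -5, 1)."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * row[x - 2 + k] for k, t in enumerate(taps))
    return clip255((acc + 16) >> 5)

def luma_quarter(a, b):
    """Quarter-pel sample: rounded average of two neighbouring integer/half-pel samples."""
    return (a + b + 1) >> 1

def chroma_eighth(A, B, C, D, dx, dy):
    """Chroma 1/8-pel bilinear interpolation of the four surrounding samples; dx, dy in 0..7."""
    return ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B
            + (8 - dx) * dy * C + dx * dy * D + 32) >> 6
```

On a linear ramp 10, 20, ..., 60, `luma_half(row, 2)` returns 35, the midpoint of 30 and 40. Horizontal, vertical, and diagonal positions only differ in which samples are fed in, so they can be evaluated in parallel as the abstract describes.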

Performance improvement for Streaming of High Capacity Panoramic Video (대용량 파노라마 비디오 스트리밍의 성능개선)

  • Kim, Young-Back;Kim, Tae-Ho;Lee, Dae-Gyu;Kim, Jae-Joon
    • Journal of Internet Computing and Services / v.11 no.2 / pp.143-153 / 2010
  • Delivering high-quality panoramic video over the Internet, mobile communications, and broadcasting requires a video codec that provides both high compression efficiency and random access. High compression efficiency is needed to stream high-volume panoramic data, and random access lets users move the viewpoint and direction freely. In this paper, we propose a cell-based parallel processing scheme, built on H.264/AVC with its high compression rate, to improve the performance of streaming large-screen panoramic video over 10 Mbps bandwidth. The improved algorithm divides the screen into cells no larger than 256×256, encodes them, and decodes only the cells in the current view, with encoding and decoding processed in parallel per cell. Because only the cells included in the current view are packed and transmitted, experiments show that processing is possible without extracting the remaining blocks.
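
Selecting and decoding only the cells in the current view can be sketched as follows. Assumptions: a fixed square cell size of at most 256, a viewport given in pixel coordinates that does not wrap around the panorama edge, and a hypothetical `decode_cell` placeholder standing in for the per-cell H.264/AVC decoder; the thread pool only illustrates that cells are decoded independently.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CELL = 256  # cells no larger than 256x256

def cells_in_view(pano_w, pano_h, cell, vx, vy, vw, vh):
    """Return (row, col) of every cell overlapping the viewport [vx, vx+vw) x [vy, vy+vh)."""
    if not 0 < cell <= MAX_CELL:
        raise ValueError("cell size must be in 1..256")
    first_col, first_row = vx // cell, vy // cell
    last_col = (min(vx + vw, pano_w) - 1) // cell
    last_row = (min(vy + vh, pano_h) - 1) // cell
    return [(r, c) for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

def decode_cell(rc):
    """Hypothetical placeholder for decoding one independently encoded cell."""
    row, col = rc
    return (row, col)

def decode_view(cells):
    """Decode the visible cells concurrently; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(decode_cell, cells))
```

For a 2048×1024 panorama with 256-pixel cells, a 640×360 viewport at (300, 100) needs only six of the 32 cells: rows 0 to 1 and columns 1 to 3.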