• Title/Summary/Keyword: 병렬프로세서 (parallel processor)

Programmable Multimedia Platform for Video Processing of UHD TV (UHD TV 영상신호처리를 위한 프로그래머블 멀티미디어 플랫폼)

  • Kim, Jaehyun;Park, Goo-man
    • Journal of Broadcast Engineering / v.20 no.5 / pp.774-777 / 2015
  • This paper introduces the world's first programmable video-processing platform for enhancing the video quality of 8K (7680x4320) UHD (Ultra High Definition) TV at up to 60 frames per second. To provide the required computing capacity and memory bandwidth, the platform implements several key features: a symmetric multi-cluster architecture for parallel data processing, a ring data path between the clusters for data pipelining, and hardware accelerators for filter operations. Built on an RP (Reconfigurable Processor), the platform runs video quality enhancement algorithms and effectively handles new UHD broadcasting standards and display panels.
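The computing-capacity requirement can be made concrete with simple arithmetic. The sketch below computes the raw pixel rate of 8K video at 60 frames per second and the share each cluster would carry if frames were divided evenly among symmetric clusters. The cluster count of 4 is a hypothetical value chosen for illustration, not a figure from the paper.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second that must be processed at this resolution and frame rate."""
    return width * height * fps


def per_cluster_rate(width: int, height: int, fps: int, clusters: int) -> float:
    """Pixel rate per cluster, assuming work is split evenly across symmetric clusters."""
    return pixel_rate(width, height, fps) / clusters


rate = pixel_rate(7680, 4320, 60)
share = per_cluster_rate(7680, 4320, 60, 4)  # 4 clusters is a hypothetical count
print(f"{rate:,} pixels/s total, {share:,.0f} pixels/s per cluster")
```

Every per-pixel enhancement operation is applied at this rate, so the load grows linearly with the number of filter taps per pixel.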

A Study on the design of RNS Multiplier to speed up the Graphic Process (고속 그래픽 처리를 위한 잉여수계 승산기 설계에 관한 연구)

  • Kim, Yong-Sung;Cho, Won-Kyung
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.1 / pp.25-37 / 1996
  • Real-time computer graphics requires high-speed arithmetic units (multipliers and adders). RNS (Residue Number System) is an integer number system that supports parallel, high-speed operation. Moreover, both a high-speed multiplier and a high-speed adder can be designed in RNS, because a cyclic group provides an isomorphism between multiplication and addition. This paper therefore proposes DRNS (Double Residue Number System) and uses it for a multiplier and an adder, designed with a circulative code, for a high-speed RNS graphics processor. The designed multiplier is expected to operate at 87 MHz in TTL using 74S09 and 74S32 devices.

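A minimal sketch of generic RNS arithmetic, assuming three hypothetical pairwise-coprime moduli (7, 11, 13): a number is split into residues, each residue channel is multiplied or added independently with no carries between channels, and the result is recovered with the Chinese Remainder Theorem. The last function illustrates the cyclic-group property the abstract relies on: for a prime modulus p with primitive root g, multiplying nonzero residues is equivalent to adding their indices (discrete logarithms) modulo p - 1. This is not the paper's DRNS or its circulative-code hardware. It needs Python 3.8+ for `math.prod` and `pow(x, -1, m)`.

```python
from math import prod

MODULI = (7, 11, 13)  # hypothetical pairwise-coprime moduli; dynamic range 0..1000


def to_rns(x, moduli=MODULI):
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli=MODULI):
    # Each channel multiplies independently: no carry propagation between channels.
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def rns_add(a, b, moduli=MODULI):
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def from_rns(r, moduli=MODULI):
    """Chinese Remainder Theorem reconstruction."""
    big_m = prod(moduli)
    total = 0
    for ri, mi in zip(r, moduli):
        partial = big_m // mi
        total += ri * partial * pow(partial, -1, mi)  # modular inverse of partial mod mi
    return total % big_m


def mul_by_index(x, y, p, g):
    """Multiply mod prime p by adding indices (exponents of primitive root g) mod p-1."""
    if x % p == 0 or y % p == 0:
        return 0
    index = {pow(g, k, p): k for k in range(p - 1)}  # residue -> exponent
    return pow(g, (index[x % p] + index[y % p]) % (p - 1), p)


product = from_rns(rns_mul(to_rns(23), to_rns(31)))
print(product)  # 713
```

In hardware, the index form lets a residue multiplier be built from a small adder plus lookup tables, which is the kind of multiplication/addition equivalence the abstract describes.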

Preprocessing Methods for Effective Modulo Scheduling on High Performance DSPs (고성능 디지털 신호 처리 프로세서상에서 효율적인 모듈로 스케쥴링을 위한 전처리 기법)

  • Cho, Doo-San;Paek, Yun-Heung
    • Journal of KIISE:Software and Applications / v.34 no.5 / pp.487-501 / 2007
  • To achieve high resource utilization on multi-issue DSPs, production compilers commonly include variants of the iterative modulo scheduling algorithm. However, the excessive cyclic data dependences found in communication and media processing loops unduly restrict modulo scheduling freedom. As a result, the replicated functional units of multi-issue DSPs are often under-utilized. To address this under-utilization problem, this paper describes a novel compiler preprocessing strategy for effective modulo scheduling. The strategy relies on two new transformations, called cloning and dismantling, and has been validated by an implementation in the StarCore SC140 DSP compiler.
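Why cyclic dependences leave replicated units idle can be seen from the standard lower bound that iterative modulo scheduling starts from: MII = max(ResMII, RecMII), where ResMII comes from operation and unit counts and RecMII from each dependence cycle's total latency divided by its total iteration distance. The loop below is hypothetical (8 ALU operations on a machine with 4 identical ALUs and one recurrence of latency 6 across 1 iteration), not an SC140 measurement, and cloning and dismantling themselves are not modeled.

```python
import math


def res_mii(op_counts, unit_counts):
    """Resource bound: operations per unit class divided by the number of such units."""
    return max(math.ceil(op_counts[u] / unit_counts[u]) for u in op_counts)


def rec_mii(recurrences):
    """Recurrence bound: each dependence cycle is (total latency, total iteration distance)."""
    return max((math.ceil(lat / dist) for lat, dist in recurrences), default=1)


def min_ii(op_counts, unit_counts, recurrences):
    """Lower bound on the initiation interval (II) for modulo scheduling."""
    return max(res_mii(op_counts, unit_counts), rec_mii(recurrences))


def utilization(op_counts, unit_counts, ii):
    """Fraction of functional-unit issue slots filled at initiation interval ii."""
    return sum(op_counts.values()) / (sum(unit_counts.values()) * ii)


# Hypothetical loop: 8 ALU ops, 4 ALUs, one recurrence (latency 6, distance 1).
ops, units, recs = {"alu": 8}, {"alu": 4}, [(6, 1)]
ii = min_ii(ops, units, recs)
print(res_mii(ops, units), rec_mii(recs), ii, utilization(ops, units, ii))
```

Here the recurrence forces II = 6 although the ALUs alone would allow II = 2, so only a third of the issue slots are used; this is the under-utilization that preprocessing transformations aim to relieve.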

Performance Analysis of a Multiprocessor System Using Simulator Based on Parsec (Parsec 기반 시뮬레이터를 이용한 다중처리시스템의 성능 분석)

  • Lee Won-Joo;Kim Sun-Wook;Kim Hyeong-Rae
    • Journal of the Korea Society of Computer and Information / v.11 no.2 s.40 / pp.35-42 / 2006
  • In this paper, we use Parsec to implement a new simulator for the performance analysis of parallel digital signal processing systems built as distributed shared memory multiprocessors. The simulator is well suited to systems that use the DMA function of the TMS320C6701 DSP chip and local memory with fast access time. Because performance parameters are easy to modify and hardware components are easy to reconfigure, system performance can be analyzed in various execution environments. The simulations use FFT, 2D FFT, matrix multiplication, and FIR filtering, which are widely used DSP algorithms. Results were recorded for different numbers of processors, data sizes, and hardware elements, and the performance of the simulator was verified by comparing these results.

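Simulators of this kind are typically discrete-event simulators: pending events are kept in timestamp order and processed one at a time. The Python sketch below does not use Parsec; it models a hypothetical system in which each processor alternates between computing on a block in local memory and moving it by DMA over a single shared bus. The cycle counts and the single-bus assumption are illustrative parameters, not TMS320C6701 figures.

```python
import heapq


def simulate(num_procs, compute_cycles, dma_cycles, blocks_per_proc):
    """Discrete-event model; returns the cycle at which the last processor finishes.

    Hypothetical system: each processor computes on one block in local memory,
    then transfers it by DMA over one shared bus that carries a single transfer
    at a time (first come, first served).
    """
    # Event = (time, processor id, kind); heapq keeps events in timestamp order.
    events = [(compute_cycles, p, "dma_request") for p in range(num_procs)]
    heapq.heapify(events)
    remaining = [blocks_per_proc] * num_procs
    bus_free_at = 0
    finish = 0
    while events:
        time, proc, kind = heapq.heappop(events)
        if kind == "dma_request":
            start = max(time, bus_free_at)  # wait while the bus is busy
            bus_free_at = start + dma_cycles
            heapq.heappush(events, (bus_free_at, proc, "dma_done"))
        else:  # "dma_done"
            remaining[proc] -= 1
            if remaining[proc] > 0:
                heapq.heappush(events, (time + compute_cycles, proc, "dma_request"))
            else:
                finish = max(finish, time)
    return finish


for procs in (1, 2, 4, 8):
    print(procs, simulate(procs, compute_cycles=100, dma_cycles=40, blocks_per_proc=16))
```

Sweeping `num_procs` or `dma_cycles` shows how bus contention limits speedup, which mirrors the kind of parameter variation the abstract describes.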

Efficient short-length running convolution algorithm using filter banks (필터 뱅크를 사용한 효율적인 short-length running convolution 알고리즘)

  • Jang Young-Beom;Oh Se-Man;Lee Won-Sang
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.6 / pp.187-194 / 2005
  • In this paper, an efficient and fast algorithm is proposed to reduce the amount of computation in FIR (Finite Impulse Response) filtering. The proposed algorithm supports parallel processing of arbitrary size, and its structures are easily derived. It is shown that the number of multiplications per sample is reduced, as is the number of instructions on a MAC (Multiply-Accumulate) processor. To quantify the theoretical improvement, the numbers of subfilters are compared with those of the conventional algorithm. In addition, the number of elements required for a hardwired implementation is shown to be lower than for the conventional algorithm.
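For context on how parallel FIR structures save multiplications, the sketch below implements the classic 2-parallel fast FIR algorithm, a conventional technique and not the filter-bank algorithm proposed in this paper. The filter and input are split into even and odd polyphase components, and three half-length subfilter products, H0·X0, H1·X1 and (H0 + H1)(X0 + X1), replace the four a direct 2-parallel structure needs, cutting multiplications by about a quarter at the cost of a few extra additions.

```python
def convolve(a, b):
    """Direct full linear convolution (reference implementation)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fast_fir_2parallel(h, x):
    """2-parallel fast FIR: three half-length subfilter convolutions instead of four."""
    n_out = len(h) + len(x) - 1
    # Pad to even length so both polyphase components have the same size.
    h = list(h) + [0] * (len(h) % 2)
    x = list(x) + [0] * (len(x) % 2)
    h0, h1 = h[0::2], h[1::2]  # even / odd polyphase components of the filter
    x0, x1 = x[0::2], x[1::2]  # even / odd polyphase components of the input
    p0 = convolve(h0, x0)
    p1 = convolve(h1, x1)
    p2 = convolve([a + b for a, b in zip(h0, h1)], [a + b for a, b in zip(x0, x1)])
    m = len(p0)
    y = [0] * (2 * m + 1)
    for k in range(m + 1):
        # Even outputs: H0*X0 plus H1*X1 delayed by one decimated sample.
        y[2 * k] = (p0[k] if k < m else 0) + (p1[k - 1] if k > 0 else 0)
    for k in range(m):
        # Odd outputs: H0*X1 + H1*X0, recovered as P2 - P0 - P1.
        y[2 * k + 1] = p2[k] - p0[k] - p1[k]
    return y[:n_out]


print(fast_fir_2parallel([1, 2, 3, 4], [5, 6, 7, 8, 9]))
```

Applying the same decomposition recursively gives larger parallel factors, which is the usual route to arbitrary-size parallel FIR structures in the conventional approach.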

Implementation of Ray Tracing Processor for the Parallel Processing (병렬처리를 위한 고속 Ray Tracing 프로세서의 설계)

  • Choe, Gyu-Yeol;Jeong, Deok-Jin
    • The Transactions of the Korean Institute of Electrical Engineers A / v.48 no.5 / pp.636-642 / 1999
  • Synthesis of 3D images is the most important part of virtual reality, and ray tracing is the best method for realism in 3D graphics. However, ray tracing requires long computation times to synthesize 3D images, so we implement it in both software and hardware. In particular, we design the hit-test unit for ray tracing with an FPGA tool; the hit-test unit is a key part for improving ray-tracing speed. In this paper, we propose a new hit-test algorithm and apply a parallel architecture to the hit-test unit to improve its speed. Because the critical path of the hit-test unit lies in the multiplication part, we optimized the arithmetic unit: we used the Booth and Baugh-Wooley algorithms to reduce the partial products, and adopted CSA (carry-save adder) and CLA (carry-lookahead adder) to add the partial products efficiently. The new ray-tracing processor produces an image in about 512 ms per frame and can be adapted to real-time applications with only 10 parallel processors.

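The abstract names the Booth algorithm as one way to reduce partial products. As a software illustration only (not the authors' circuit), the sketch below applies radix-4 (modified) Booth recoding: an n-bit two's-complement multiplier is recoded into n/2 digits in {-2, -1, 0, 1, 2}, so only n/2 partial products are summed instead of n. In hardware those partial products would typically be reduced with carry-save adders and a final carry-lookahead adder; Baugh-Wooley is not shown.

```python
def booth_radix4_digits(b, n=8):
    """Radix-4 (modified) Booth recoding of an n-bit two's-complement multiplier b.

    n must be even and -2**(n-1) <= b < 2**(n-1). Returns n/2 digits in
    {-2, -1, 0, 1, 2}, least significant first.
    """
    bits = [(b >> i) & 1 for i in range(n)]  # two's-complement bits of b
    digits = []
    prev = 0  # implicit bit b[-1] = 0
    for i in range(0, n, 2):
        # Overlapping triplet (b[i+1], b[i], b[i-1]) selects the digit.
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def booth_multiply(a, b, n=8):
    """Multiply by summing n/2 partial products (d_i * a) shifted by 2*i bits."""
    return sum((d * a) << (2 * i) for i, d in enumerate(booth_radix4_digits(b, n)))


print(booth_radix4_digits(7))   # [-1, 2, 0, 0], i.e. -1 + 2*4 = 7
print(booth_multiply(-45, 93))  # -4185
```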

A Concurrent Incremental Evaluation Technique Using Multitasking (멀티태스킹에 의한 병행 점진 평가 방법)

  • Han, Jung-Lan
    • The KIPS Transactions:PartA / v.17A no.2 / pp.73-80 / 2010
  • As hardware has become more powerful, there has been extensive research on concurrent processing using multitasking. Incremental evaluation reevaluates only the affected parts of a program, instead of the whole program, when the program changes. Further study is needed to improve the efficiency of concurrent incremental evaluation that relies on multitasking with Java multithreading rather than on parallel execution across multiple processors. In this paper, dependencies in the dependency chart are defined on the attribute that describes the actual value of a variable directly affecting the semantics, which makes evaluation efficient. Using these dependencies, this paper presents a concurrent incremental evaluation algorithm for the Java language, proves its correctness, and analyzes its efficiency by simulation.
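The core idea of reevaluating only affected parts can be sketched as a dependency graph in which changing one value triggers recomputation of its transitive dependents only, with mutually independent dependents evaluated concurrently in waves. This is a hypothetical Python illustration that uses `concurrent.futures.ThreadPoolExecutor` in place of Java threads; the class and method names are invented, and it does not implement the paper's attribute-based dependency chart or its correctness proof.

```python
from concurrent.futures import ThreadPoolExecutor


class IncrementalEvaluator:
    """Hypothetical dependency-graph evaluator that recomputes only affected values."""

    def __init__(self):
        self.values = {}      # name -> current value
        self.rules = {}       # name -> (dependency names, function)
        self.dependents = {}  # name -> names that read it

    def set_input(self, name, value):
        self.values[name] = value
        self.dependents.setdefault(name, set())

    def define(self, name, deps, fn):
        """Define a derived value; its dependencies must already exist."""
        self.rules[name] = (deps, fn)
        self.dependents.setdefault(name, set())
        for d in deps:
            self.dependents[d].add(name)
        self.values[name] = self._eval(name)

    def _eval(self, name):
        deps, fn = self.rules[name]
        return fn(*(self.values[d] for d in deps))

    def _affected(self, changed):
        """All transitive dependents of the changed value."""
        seen, stack = set(), [changed]
        while stack:
            for dep in self.dependents[stack.pop()]:
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

    def update(self, name, value):
        """Change an input and reevaluate only what depends on it, in concurrent waves."""
        self.values[name] = value
        pending = self._affected(name)
        recomputed = set()
        with ThreadPoolExecutor() as pool:
            while pending:
                # A node is ready once none of its dependencies is still pending.
                ready = [n for n in pending
                         if not any(d in pending for d in self.rules[n][0])]
                if not ready:
                    raise ValueError("cyclic dependency")
                for n, v in zip(ready, pool.map(self._eval, ready)):
                    self.values[n] = v
                pending.difference_update(ready)
                recomputed.update(ready)
        return recomputed


ev = IncrementalEvaluator()
ev.set_input("a", 1)
ev.set_input("b", 2)
ev.define("c", ["a"], lambda a: a + 1)
ev.define("d", ["b"], lambda b: b * 2)
ev.define("e", ["c", "d"], lambda c, d: c + d)
recomputed = ev.update("a", 5)
print(sorted(recomputed), ev.values["e"])
```

Changing `a` recomputes only `c` and `e`; `d` keeps its cached value. Under CPython, the threads give concurrency rather than true parallelism, which loosely matches the multitasking (not multiprocessor) setting of the abstract.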

Evaluation of Alignment Methods for Genomic Analysis in HPC Environment (HPC 환경의 대용량 유전체 분석을 위한 염기서열정렬 성능평가)

  • Lim, Myungeun;Jung, Ho-Youl;Kim, Minho;Choi, Jae-Hun;Park, Soojun;Choi, Wan;Lee, Kyu-Chul
    • KIPS Transactions on Software and Data Engineering / v.2 no.2 / pp.107-112 / 2013
  • With the progress of NGS (next-generation sequencing) technologies, the volume of genome data has exploded recently, and analyzing such data effectively requires HPC techniques. In this paper, we organized a genome analysis pipeline that calls SNPs from NGS data. To organize the pipeline efficiently in an HPC environment, we analyzed the CPU utilization pattern of each pipeline step and found that sequence alignment is compute-intensive and suitable for parallelization. We also analyzed the performance of parallel open-source alignment tools and found that alignment methods that use many-core processors can improve the performance of the genome analysis pipeline.

An Aggressive Register Allocation Algorithm for EPIC Architectures (EPIC 아키텍쳐를 위한 적극적 레지스터 할당 알고리듬)

  • Choe, Jun-Gi;Lee, Sang-Jeong
    • The Transactions of the Korea Information Processing Society / v.6 no.2 / pp.497-511 / 1999
  • Recently, many parallel processing technologies have been developed, and the performance of ILP (Instruction Level Parallelism) processors has grown very rapidly. In particular, EPIC (Explicitly Parallel Instruction Computing) architectures attempt to enhance performance through hardware support for predicated and speculative execution. In this paper, a new register allocation algorithm is proposed that exploits the characteristics of EPIC architectures to increase code-scheduling opportunities. Experiments show that the proposed algorithm is more efficient than the conventional scheme when predicated execution is applied, with a performance improvement of about 19%. These results indicate that it is an effective register allocation method.

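One reason predication matters for register allocation can be sketched with interval interference: two live ranges guarded by complementary predicates (p and !p), each defined and used only under its own predicate, are never live on the same execution path, so they can share a register even when their lifetimes overlap. The greedy allocator below is a hypothetical illustration of that effect, not the algorithm proposed in the paper; the live ranges and predicate names are invented.

```python
def exclusive(p, q):
    """True if predicates p and q can never both be true (e.g. 'p' and '!p')."""
    return p is not None and q is not None and (p == "!" + q or q == "!" + p)


def interferes(a, b, predicate_aware):
    """Ranges interfere if their [start, end) intervals overlap, unless
    predicate_aware is set and they are guarded by complementary predicates."""
    (s1, e1, p1), (s2, e2, p2) = a, b
    overlap = s1 < e2 and s2 < e1
    return overlap and not (predicate_aware and exclusive(p1, p2))


def allocate(ranges, predicate_aware=True):
    """Greedy coloring in order of start point; returns name -> register index."""
    assignment = {}
    for name in sorted(ranges, key=lambda n: ranges[n][0]):
        taken = {assignment[o] for o in assignment
                 if interferes(ranges[name], ranges[o], predicate_aware)}
        reg = 0
        while reg in taken:
            reg += 1
        assignment[name] = reg
    return assignment


# Hypothetical live ranges: (start, end, guarding predicate or None)
ranges = {"x": (0, 4, "p"), "y": (1, 5, "!p"), "z": (2, 3, None)}
aware = allocate(ranges)
naive = allocate(ranges, predicate_aware=False)
print(len(set(aware.values())), len(set(naive.values())))  # registers used
```

Freeing registers this way leaves more room for the scheduler to overlap instructions, which is the general kind of benefit the abstract attributes to EPIC-aware allocation.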

Large Eddy Simulation of Turbulent Flows over Backward-facing Steps (후향 계단에서 난류 유동에 대한 대와동모사)

  • Hwang, Cheol-Hong;Kum, Sung-Min
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.3 / pp.507-514 / 2009
  • A large eddy simulation (LES) code was developed to predict turbulent flows over backward-facing steps, including the recirculating flow phenomenon. A localized dynamic k_sgs-equation model was employed as the LES subgrid model, and the LES solver was implemented on a parallel computer with 16 processors to reduce computational costs. The laminar-flow results showed qualitative and quantitative agreement between the current simulations and experimental results available in the literature, and the turbulent-flow simulations also yielded reasonable results. These results suggest that the developed LES code will be very useful for analyzing combustion instabilities and noise in practical combustors in the future.
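For reference, k_sgs-equation subgrid models are built on the standard one-equation eddy-viscosity closure below, which ties the subgrid eddy viscosity to the subgrid kinetic energy and the filter width. This is the textbook form, not necessarily the exact formulation used in this paper; in a localized dynamic model the coefficient C_ν is computed locally from the resolved field rather than fixed. Here an overbar denotes the filtered quantity, \bar{\Delta} the filter width, and repeated indices are summed.

```latex
\tau_{ij} = \overline{u_i u_j} - \bar{u}_i \bar{u}_j, \qquad
k_{sgs} = \tfrac{1}{2}\,\tau_{kk} = \tfrac{1}{2}\left(\overline{u_k u_k} - \bar{u}_k \bar{u}_k\right),
\\[4pt]
\nu_t = C_\nu \, \bar{\Delta} \, k_{sgs}^{1/2}, \qquad
\tau_{ij} - \tfrac{2}{3} k_{sgs} \delta_{ij} = -2 \nu_t \bar{S}_{ij}, \qquad
\bar{S}_{ij} = \tfrac{1}{2}\left(\frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i}\right)
```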