• Title/Summary/Keyword: parallelization (병렬화)

A Device of Parallelism Control in POSIX Based Parallelization of Recursive Algorithms (POSIX스레드에 의한 재귀적 알고리즘의 병렬화에서 병렬성 제어 방안)

  • Lee, Hyung-Bong;Baek, Chung-Ho
    • The KIPS Transactions: Part A / v.9A no.2 / pp.249-258 / 2002
  • One of the major purposes of a multiprocessor system is to achieve highly efficient performance improvement. In most cases, however, special programming languages or tools are unavoidable for making full use of a multiprocessor system. In general, the loops and recursive calls of an algorithm are considered the typical candidates for parallelization. Recursive calls in particular are conceptually easy to parallelize without the support of any special language or tool. It is difficult, however, to control the degree of parallelism, because deep recursion can lead to an execution crash. This paper proposes a device to control parallelism in POSIX-thread-based parallelization of recursive algorithms. To this end, we define the concepts of thread and process in a UNIX system, and analyze the results of experimentally applying the device to the quicksort algorithm.
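
The abstract does not describe the control device itself, so the following is only a minimal sketch of one common way to bound parallelism in a recursive sort: spawn a new thread for a recursive call only while the recursion depth is below a limit, and recurse serially after that. The function name `parallel_quicksort` and the `max_depth` parameter are assumptions made for illustration, and Python's `threading` module stands in for POSIX threads (under CPython's global interpreter lock the threads show the control structure rather than a real speedup).

```python
import threading

def parallel_quicksort(data, max_depth=3):
    """Sort `data` in place, spawning threads only while depth < max_depth."""
    def sort(lo, hi, depth):
        if lo >= hi:
            return
        # Lomuto partition around the last element.
        pivot = data[hi]
        i = lo
        for j in range(lo, hi):
            if data[j] < pivot:
                data[i], data[j] = data[j], data[i]
                i += 1
        data[i], data[hi] = data[hi], data[i]

        if depth < max_depth:
            # Within the thread budget: the left half gets a new thread,
            # the right half stays on the current one.
            left = threading.Thread(target=sort, args=(lo, i - 1, depth + 1))
            left.start()
            sort(i + 1, hi, depth + 1)
            left.join()
        else:
            # Budget exhausted: fall back to ordinary serial recursion.
            sort(lo, i - 1, depth + 1)
            sort(i + 1, hi, depth + 1)

    sort(0, len(data) - 1, 0)
    return data
```

With this scheme at most `2 ** max_depth` threads run at once, regardless of how deep the recursion goes.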

Parallelization of Feature Detection and Panorama Image Generation using OpenCL and Embedded GPU (OpenCL 및 Embedded GPU를 이용한 영상 특징 추출 및 파노라마 영상 생성의 병렬화)

  • Kang, Seung Heon;Lee, Seung-Jae;Lee, Man Hee;Park, In Kyu
    • Journal of Broadcast Engineering / v.19 no.3 / pp.316-328 / 2014
  • In this paper, we parallelize the popular feature detection algorithms SIFT and SURF, and their application to fast panoramic image generation, on the latest embedded GPU. The parallelized algorithms are implemented using the recently developed OpenCL as the embedded GPGPU software platform. We compare the implementation efficiency and speed of the conventional OpenGL Shading Language (GLSL) and OpenCL. Experimental results show that the OpenCL implementation performs comparably to GLSL. Compared with the embedded CPU in the same application processor, the embedded GPU runs 3 to 4 times faster. As an example of using feature extraction, panorama image synthesis is performed on the embedded GPU by applying image matching to the detected features.

Parallelized Matrix Operation for Fast Computations of Antenna Characteristics (안테나 특성 고속 계산을 위한 병렬화 행렬 연산)

  • Cho, Yong-Heui
    • Proceedings of the Korea Contents Association Conference / 2015.05a / pp.61-62 / 2015
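  • We propose a parallel matrix computation method to speed up the analysis of large antennas used in the millimeter-wave band. To parallelize conventional Gaussian elimination, we use matrix decomposition and an iterative method. To improve the convergence of the iteration, we also present a scheme that builds the decomposed matrix by partially reusing the previous matrix solution. The proposed method can be used together with parallel approaches such as OpenMP, MPI, and CUDA.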
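
The abstract does not specify which decomposition or iteration is used, so the sketch below substitutes the textbook Jacobi method as a stand-in to show why splitting the matrix and iterating parallelizes more easily than elimination: each row update reads only the previous iterate, so all rows can be computed independently. The name `jacobi_solve` and its parameters are assumptions for illustration, and convergence is only guaranteed for suitable matrices (for example, strictly diagonally dominant ones).

```python
from concurrent.futures import ThreadPoolExecutor

def jacobi_solve(a, b, iterations=50, workers=4):
    """Approximate the solution of a*x = b with Jacobi iteration."""
    n = len(b)
    x = [0.0] * n

    def update(i, x_old):
        # Row i only reads the previous iterate, so rows are independent.
        s = sum(a[i][j] * x_old[j] for j in range(n) if j != i)
        return (b[i] - s) / a[i][i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            x_old = x
            # All row updates of one sweep can run concurrently.
            x = list(pool.map(lambda i: update(i, x_old), range(n)))
    return x
```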

Data Dependency Elimination for Parallelism in nested Loops (중첩루프에서 병렬화를 위한 자료 종속성제거)

  • Song, Wol-Bong;Park, Du-Sun
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1494-1506 / 1998
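  • This paper presents a new unified technique for extracting parallelism for the efficient parallel execution of loop structures, applicable to both invariant and variable dependence distances. It is a procedure for automatically transforming sequential loops into nested DOALL loops at compile time: an algorithm that effectively eliminates data dependences by executing statements repeatedly, so that the nested loop as a whole can be parallelized. The performance evaluation shows that the proposed method performs very well.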
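
The paper's algorithm is not given in the abstract, so the following is not its method. It is only a sketch of the simplest case such techniques build on, a loop with a constant dependence distance `d`. Iteration `i` depends only on iteration `i - d`, so the iterations fall into `d` independent chains that can run as a parallel (DOALL) outer loop with a sequential inner loop. The function names are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def sequential(a, d):
    """Reference loop: a[i] depends on a[i - d] (dependence distance d >= 1)."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + 1
    return a

def doall_by_chains(a, d):
    """Same result, restructured as d independent chains."""
    a = list(a)

    def run_chain(r):
        # Chain r touches only indices congruent to r modulo d,
        # so no two chains read or write the same element.
        for i in range(d + r, len(a), d):
            a[i] = a[i - d] + 1

    with ThreadPoolExecutor(max_workers=d) as pool:
        list(pool.map(run_chain, range(d)))
    return a
```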

A Parallel Processor System for Cultural Assets Image Retrieval (문화재 검색을 위한 병렬처리기 구조)

  • Yoon, Hee-Jun;Lee, Hyung;Han, Ki-Sun;Park, Jong-Won
    • Journal of Korea Multimedia Society / v.1 no.2 / pp.154-161 / 1998
  • This paper proposes a parallel processor system that runs a cultural assets image recognition and retrieval algorithm in real time. A serial algorithm is parallelized for the parallel processor system. The system consists of a control unit, 100 PEs (processing elements), and 10 Park's multi-access memory systems, each of which has 11 memory modules. The system is simulated with CADENCE Verilog-XL, a hardware simulation package. The parallel algorithm produces the same results as the serial one and runs 81 times faster. The proposed parallel processor system is quite effective for cultural assets image processing.

Computation-Communication Overlapping in AES-CCM Using Thread-Level Parallelism on a Multi-Core Processor (멀티코어 프로세서의 쓰레드-수준 병렬성을 활용한 AES-CCM 계산-통신 중첩화)

  • Lee, Eun-Ji;Lee, Sung-Ju;Chung, Yong-Wha;Lee, Myung-Ho;Min, Byoung-Ki
    • Journal of KIISE: Computing Practices and Letters / v.16 no.8 / pp.863-867 / 2010
  • Multi-core processors are becoming increasingly popular. As they are widely adopted in embedded systems as well as desktop PCs, many multimedia applications are being parallelized on multi-core platforms. However, it is difficult to parallelize applications with inherent data dependencies, such as encryption algorithms for multimedia data. To overcome this limit, we propose a technique that overlaps computation and communication using an otherwise idle core. In particular, we treat multimedia computation and communication as a pipeline design problem at the application program level, and derive the optimal number of pipeline stages.
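
As a rough illustration of overlapping computation and communication, the sketch below runs a two-stage pipeline: one thread transforms blocks while a second thread consumes finished blocks from a bounded queue. Everything here is an assumption for illustration. `encrypt_block` is a hypothetical XOR stand-in rather than AES-CCM, the `sent` list stands in for a network, and the paper's derivation of the optimal stage count is not reproduced.

```python
import queue
import threading

def encrypt_block(block, key=0x5A):
    # Hypothetical stand-in for AES-CCM: XOR each byte with a fixed key.
    return bytes(b ^ key for b in block)

def run_pipeline(blocks):
    handoff = queue.Queue(maxsize=2)   # bounded buffer between the two stages
    sent = []                          # stands in for the network

    def compute_stage():
        for block in blocks:
            handoff.put(encrypt_block(block))
        handoff.put(None)              # sentinel: no more blocks

    def communicate_stage():
        while True:
            item = handoff.get()
            if item is None:
                break
            sent.append(item)          # a real stage would transmit here

    stages = [threading.Thread(target=compute_stage),
              threading.Thread(target=communicate_stage)]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return sent
```

The single FIFO queue preserves block order, so the data dependency between consecutive blocks inside the compute stage is untouched while transmission of block `k` overlaps with computation of block `k + 1`.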

An Efficient Parallelization Implementation of PU-level ME for Fast HEVC Encoding (고속 HEVC 부호화를 위한 효율적인 PU레벨 움직임예측 병렬화 구현)

  • Park, Soobin;Choi, Kiho;Park, Sang-Hyo;Jang, Euee Seon
    • Journal of Broadcast Engineering / v.18 no.2 / pp.178-184 / 2013
  • In this paper, we propose an efficient parallelization technique for PU-level motion estimation (ME) in the next-generation video coding standard, High Efficiency Video Coding (HEVC), to reduce the time complexity of video encoding. Real-time video encoding is difficult because ME is highly complex (i.e., 80 percent of the encoder's complexity). Various techniques have been studied to solve this problem, among them parallelization, which must be considered carefully in algorithm-level ME design. In this regard, a merge estimation method using the merge estimation region (MER), which enables ME to be performed in parallel, has been proposed; however, MER-based parallel ME still has unaddressed problems that prevent an ideal implementation in the HEVC test model (HM). We therefore propose two strategies for implementing stable parallel ME using MER in HM. Experimental results show that the proposed method reduces encoding time by 25.64 percent on average compared with HM using sequential ME.

Study of parallelization methods for real-time HEVC encoder implementation (실시간 HEVC 인코더 구현을 위한 병렬화 기법에 관한 연구)

  • Ahn, Yongjo;Hwang, Taejin;Lee, Dongkyu;Kim, Sangmin;Oh, Seoung-Jun;Sim, Dong-Gyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2013.06a / pp.119-122 / 2013
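  • HEVC (High Efficiency Video Coding), which is being standardized by the JCT-VC (Joint Collaborative Team on Video Coding) formed jointly by ITU-T VCEG and ISO/IEC MPEG, achieves about twice the compression efficiency of H.264/AVC. However, the increase in encoder complexity caused by hierarchically structured variable-size blocks and a recursive coding structure has been pointed out as a problem to be addressed. This paper introduces various parallelization techniques for real-time implementation of the HEVC encoder currently under standardization, such as data-level parallelization using SIMD instructions and multi-threading using the CPU and GPU. It also introduces the computation and function modules to which these techniques are suited in an HEVC encoder. Applied to HM (HEVC reference model), the techniques yielded encoding speeds of 20-30 fps for $832 \times 480$ video and 5-10 fps for $1920 \times 1080$ video.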

A Performance Evaluation on Parallel Sorting Algorithm in Multicore Environment (멀티 코어 환경에서 병렬 정렬 알고리즘 성능 평가)

  • Won, Jong-Min;Joo, Young-Hyun;Eom, Young-Ik
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.33-35 / 2012
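  • For a long time after personal computers became widespread, CPUs advanced mainly through higher clock speeds. Recently, however, CPU performance has been improved by increasing the number of cores operating within the CPU. With the arrival of the multicore era, existing algorithms need to be parallelized to make full use of the CPU. In this paper, we parallelize sorting algorithms, one of the most widely used kinds of algorithm, and evaluate their performance in a multicore environment. The results show a trend different from what is known for conventional single-threaded sorting algorithms, and this effect is expected to become more pronounced as CPUs become more parallel.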
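
The abstract does not say which sorting algorithms were parallelized or how. As one generic illustration, the sketch below splits the input into chunks, sorts the chunks concurrently, and merges the sorted runs. The name `parallel_sort` and the `workers` parameter are assumptions for illustration, and in CPython a thread pool shows the structure only (a process pool would be needed for actual multicore speedup).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(values, workers=4):
    """Sort chunks concurrently, then k-way merge the sorted runs."""
    if not values:
        return []
    size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # heapq.merge lazily merges already-sorted inputs.
    return list(heapq.merge(*sorted_chunks))
```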

Parallelism for Single Loops with Non-uniform Dependences (비균일 단일루프에서의 병렬화)

  • Jeong, Sam-Jin
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.565-569 / 2006
  • This paper reviews existing loop partitioning techniques for exploiting parallelism in single loops, such as loop splitting by thresholds and Polychronopoulos' loop splitting method. We propose an improved loop splitting method that maximizes the parallelism of single loops with non-constant dependence distances. Using the distance for the source of the first dependence together with the theorems we define, we present generalized and optimal algorithms for single loops with non-uniform dependences. The algorithms give a general way to transform single loops into parallel loops.
