• 제목/요약/키워드: parallel architectures

검색결과 127건 처리시간 0.022초

Discrete Cosine Transform Algorithms for the VLSI Parallel Implementation (VLSI 병렬 연산을 위한 여현 변환 알고리듬)

  • 조남익;이상욱
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • 제25권7호
    • /
    • pp.851-858
    • /
    • 1988
  • In this paper, we propose two different VLSI architectures for the parallel computation of DCT (discrete cosine transform) algorithm. First, it is shown that the DCT algorithm can be implemented on the existing systolic architecture for the DFT(discrete fourier transform) by introducing some modification. Secondly, a new prime factor DCT algorithm based on the prime factor DFT algorithm is proposed. And it is shown that the proposed algorihtm can be implemented in parallel on the systolic architecture for the prime factor DFT. However, proposed algorithm is only applicable to the data length which can be decomposed into relatively prime and odd numbers. It is also found that the proposed systolic architecture requires less multipliers than the structures implementing FDCT(fast DCT) algorithms directly.

  • PDF

Full Search Equivalent Motion Estimation Algorithm for General-Purpose Multi-Core Architectures

  • Park, Chun-Su
    • Journal of the Semiconductor & Display Technology
    • /
    • 제12권3호
    • /
    • pp.13-18
    • /
    • 2013
  • Motion estimation is a key technique of modern video processing that significantly improves the coding efficiency significantly by exploiting the temporal redundancy between successive frames. Thread-level parallelism is a promising method to accelerate the motion estimation process for multithreading general-purpose processors. In this paper, we propose a parallel motion estimation algorithm which parallelizes the motion search process of the current H.264/AVC encoder. The proposed algorithm is implemented using the OpenMP application programming interface (API) and can be easily integrated into the current encoder. The experimental results show that the proposed parallel algorithm can reduce the processing time of the motion estimation up to 65.08% without any penalty in the rate-distortion (RD) performance.

Design and Implementation of a Massively Parallel Multithreaded Architecture: DAVRID

  • Sangho Ha;Kim, Junghwan;Park, Eunha;Yoonhee Hah;Sangyong Han;Daejoon Hwang;Kim, Heunghwan;Seungho Cho
    • Journal of Electrical Engineering and information Science
    • /
    • 제1권2호
    • /
    • pp.15-26
    • /
    • 1996
  • MPAs(Massively Parallel Architectures) should address two fundamental issues for scalability: synchronization and communication latency. Dataflow architecture faces problems of excessive synchronization overhead and inefficient execution of sequential programs while they offer the ability to exploit massive parallelism inherent in programs. In contrast, MPAs based on von Neumann computational model may suffer from inefficient synchronization mechanism and communication latency. DAVRID (DAtaflow/Von Neumann RISC hybrID) is a massively parallel multithreaded architecture which takes advantages of von Neumann and dataflow models. It has good single thread performance as well as tolerates synchronization and communication latency. In this paper, we describe the DAVRID architecture in detail and evaluate its performance through simulation runs over several benchmarks.

  • PDF

Recent Successive Cancellation Decoding Methods for Polar Codes

  • Choi, Soyeon;Lee, Youngjoo;Yoo, Hoyoung
    • Journal of Semiconductor Engineering
    • /
    • 제1권2호
    • /
    • pp.74-80
    • /
    • 2020
  • Due to its superior error correcting performance with affordable hardware complexity, the Polar code becomes one of the most important error correction codes (ECCs) and now intensively examined to check its applicability in various fields. However, Successive Cancellation (SC) decoding that brings the advanced Successive Cancellation List (SCL) decoding suffers from the long latency due to the nature of serial processing limiting the practical implementation. To mitigate this problem, many decoding architectures, mainly divided into pruning and parallel decoding, are presented in previous manuscripts. In this paper, we compare the recent SC decoding architectures and analyze them using a tree structure.

Heterogeneous Parallel Architecture for Face Detection Enhancement

  • Albssami, Aishah;Sharaf, Sanaa
    • International Journal of Computer Science & Network Security
    • /
    • 제22권2호
    • /
    • pp.193-198
    • /
    • 2022
  • Face Detection is one of the most important aspects of image processing, it considers a time-consuming problem in real-time applications such as surveillance systems, face recognition systems, attendance system and many. At present, commodity hardware is getting more and more heterogeneity in terms of architectures such as GPU and MIC co-processors. Utilizing those co-processors along with the existing traditional CPUs gives the algorithm a better chance to make use of both architectures to achieve faster implementations. This paper presents a hybrid implementation of the face detection based on the local binary pattern (LBP) algorithm that is deployed on both traditional CPU and MIC co-processor to enhance the speed of the LBP algorithm. The experimental results show that the proposed implementation achieved improvement in speed by 3X when compared to a single architecture individually.

Performance Analysis of a Parallel Mesh Smoothing Algorithm using Graph Coloring and OpenMP (그래프 컬러링과 OpenMP를 이용한 병렬 메쉬 스무딩 알고리즘의 성능 분석)

  • Shin, Myeonggyu;Kim, Jibum
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • 제53권6호
    • /
    • pp.80-87
    • /
    • 2016
  • We propose a parallel mesh smoothing algorithm using graph coloring and OpenMP library for shared memory many core computer architectures. The proposed algorithm partitions a mesh into independent sets and performs a parallel mesh smoothing using OpenMP library. We study the effect of using various graph coloring and color reordering algorithms on the efficiency of performing the proposed parallel mesh smoothing algorithm. We also investigate the influence of using various OpenMP loop scheduling methods on the parallel mesh smoothing efficiency.

Design of Parallel Algorithms for Conventional Matched-Field Processing over Array of DSP Processors (다중 DSP 프로세서 기반의 병렬 수중정합장처리 알고리즘 설계)

  • Kim, Keon-Wook
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • 제44권4호통권316호
    • /
    • pp.101-108
    • /
    • 2007
  • Parallel processing algorithms, coupled with advanced networking and distributed computing architectures, improve the overall computational performance, dependability, and versatility of a digital signal processing system In this paper, novel parallel algorithms are introduced and investigated for advanced sonar algorithm, conventional matched-field processing (CMFP). Based on a specific domain, each parallel algorithm decomposes the sequential workload in order to obtain scalable parallel speedup. Depending on the processing requirement of the algorithm, the computational performance of the parallel algorithm reveals different characteristics. The high-complexity algorithm, CMFP shows scalable parallel performance on the array of DSP processors. The impact on parallel performance due to workload balancing, communication scheme, algorithm complexity, processor speed, network performance, and testbed configuration is explored.

Parallel Testing Circuits with Versatile Data Patterns for SOP Image SRAM Buffer (SOP Image SRAM Buffer용 다양한 데이터 패턴 병렬 테스트 회로)

  • Jeong, Kyu-Ho;You, Jae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • 제46권9호
    • /
    • pp.14-24
    • /
    • 2009
  • Memory cell array and peripheral circuits are designed for system on panel style frame buffer. Moreover, a parallel test methodology to test multiple blocks of memory cells is proposed to overcome low yield of system on panel processing technologies. It is capable of faster fault detection compared to conventional memory tests and also applicable to the tests of various embedded memories and conventional SRAMs. The various patterns of conventional test vectors can be used to enhance fault coverage. The proposed testing method is also applicable to hierarchical bit line and divided word line, one of design trends of recent memory architectures.

Performance Analysis of the XMESH Topology for the Massively Parallel Computer Architecture (대규모 병렬컴퓨터를 위한 교차메쉬구조 및 그의 성능해석)

  • 김종진;최흥문
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • 제32B권5호
    • /
    • pp.720-729
    • /
    • 1995
  • We proposed a XMESH(crossed-mesh) topology as a suitable interconnection for the massively parallel computer architectures, and presented performance analysis of the proposed interconnection topology. Horizontally, the XMESH has the same links as those of the toroidal mesh(TMESH) or toroid, but vertically, it has diagonal cross links instead of the vertical links. It reveals desirable interconnection characteristics for the massively parallel computers as the number of nodes increases, while retaining the same structural advantages of the TMESH such as the symmetric structure, periodic placement of subsystems, and constant degree, which are highly recommended features for VLSI/WSI implementations. Furthermore, n*k XMESH can be easily expanded without increasing the diameter as long as n.leq.k.leq.n+4. Analytical performance evaluations show that the XMESH has a shorter diameter, a shorter mean internode distance, and a higher message completion rate than the TMESH or the diagonal mesh(DMESH). To confirm these results, an optimal self-routing algorithm for the proposed topology is developed and is used to simulate the average delay, the maximum delay, and the throughput in the presence of contention. In all cases, the XMESH is shown to outperform the TMESH and the DMESH regardless of the communication load conditions or the number of nodes of the networks, and can provide an attractive alternative to those networks in implementing massively parallel computers.

  • PDF

Technical Trend and View of Neural Networks for Factory Automation (공장 자동화에 적용되는 Neural Networks의 기술동향 및 전망)

  • Lee, Jin-Seop;Ha, Jae-Hun
    • Proceedings of the KIEE Conference
    • /
    • 대한전기학회 1991년도 하계학술대회 논문집
    • /
    • pp.892-895
    • /
    • 1991
  • In this study, it has been refering that disposal of rapidly international information society and artificial intelligence neural networks of the vanguard software technology. This paper is human brain cell structure modeling in order to neural networks realization for order language and computer embodiment of parallel processing. And it is shown that the usage extreme of time saving and correct judgement for business services, Overviews some of the currently popular neural networks architectures, and describes the current state of the neural networks technology.

  • PDF