• Title/Summary/Keyword: Parallel Implementation


Implementation and Performance Evaluation of an Object-Oriented Parallel Programming Environment with Multithreaded Computational Model (다중스레드 계산 모델을 이용한 병렬 객체 지향 프로그래밍 환경의 구현 및 성능 평가)

  • Song, Jong-Hun;Kim, Heung-Hwan;Han, Sang-Yeong
    • Journal of KIISE: Computing Practices and Letters / v.5 no.6 / pp.708-718 / 1999
  • This paper proposes an object-oriented programming environment built on a multithreaded computation model for general parallel processing hardware. To implement it efficiently, we present several techniques in both the compiler and the runtime system: in the compiler, member-variable access analysis and method parallelism analysis; in the runtime system, run-time thread/message merging and frame sharing. The environment is implemented on top of the MPI message-passing interface, and its performance is evaluated by running benchmark programs. The results show that the runtime-system techniques contribute substantially to performance, and that they are also applicable to general parallel processing systems.

Design and Implementation of a TMN Agent Platform based on a Multi-thread Parallel Processing Architecture (멀티쓰레드 기반 병렬처리 구조를 이용한 TMN 에이젼트 플랫폼 설계 및 구현)

  • Kim, Seong-U;Kim, Yeong-Tak
    • Journal of KIISE: Computing Practices and Letters / v.5 no.6 / pp.793-800 / 1999
  • A TMN Agent Platform models the operational status and resources of network elements, such as switching nodes and transmission systems, as managed objects (MOs) according to GDMO, maintains the current state of those resources, and performs the management functions requested by a manager. The performance of the agent therefore directly affects the performance of network management as a whole. If the agent is implemented as a single sequential process, agent processing can be delayed or blocked depending on the status of the real resources; parallel and distributed processing can solve this problem. In this paper, we analyze the functional requirements of a TMN agent and, based on them, propose a functional architecture for a TMN Agent Platform that uses multithreaded parallel processing to improve performance. We also present an efficient message-transfer scheme between the agent and the various real resources, and analyze the performance of the implemented platform.

Design and Parallel Operation of 30 kW SiC MOSFET-Based High Frequency Switching LLC Converter With a Wide Voltage Range for EV Fast Charger (전기자동차 급속충전기용 넓은 전압 범위를 갖는 30kW급 SiC MOSFET 기반 고속 스위칭 LLC 컨버터 설계 및 병렬 운전)

  • Lee, Gi-Young;Min, Sung-Soo;Park, Su-Seong;Cho, Young-Chan;Lee, Sang-Taek;Kim, Rae-Young
    • The Transactions of the Korean Institute of Power Electronics / v.27 no.2 / pp.165-173 / 2022
  • The trend toward electrified mobility grows every year owing to the development of power semiconductor and battery technology. Accordingly, fast chargers for electric vehicles (EVs) need to be developed and deployed. In this study, we propose a design and implementation method for an LLC converter for fast chargers. Two 15 kW LLC converters are configured in parallel to provide 30 kW of rated output power, and the control algorithm and driving sequence are designed accordingly and verified. In addition, improved power conversion efficiency is confirmed through zero-voltage switching (ZVS) of the LLC converter and a reduction of turn-off loss by means of snubber capacitors. The implemented 30 kW LLC converters show a wide output voltage range of 200-950 V. Experiments under various load conditions verify the converter performance.

Implementation of Euclidean Calculation Circuit with Two-Way Addressing Method for Reed-Solomon Decoder (Reed-Solomon decoder를 위한 Two-way addressing 방식의 Euclid 계산용 회로설계)

  • Ryu, Jee-Ho;Lee, Seung-Jun
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.6 / pp.37-43 / 1999
  • A two-way addressing method is proposed for efficient VLSI implementation of the Euclidean calculation circuit in a pipelined Reed-Solomon decoder. The new circuit operates on a single clock while exploiting maximum parallelism, and uses register addressing instead of register shifting to minimize switching power. Logic synthesis shows that the circuit with the new scheme takes 3,000 logic gates, about a 40% reduction from the previous 5,000-gate implementation. Computer simulation also shows a power consumption of about 3 mW, compared with about 5 mW for the previous multiple-clock implementation.


FPGA-Based Hardware Accelerator for Feature Extraction in Automatic Speech Recognition

  • Choo, Chang;Chang, Young-Uk;Moon, Il-Young
    • Journal of information and communication convergence engineering / v.13 no.3 / pp.145-151 / 2015
  • In this paper, we describe a hardware-based scheme for improving the speed of a real-time automatic speech recognition (ASR) system by designing a parallel feature extraction algorithm on a Field-Programmable Gate Array (FPGA). A computationally intensive block in the algorithm is identified and implemented in hardware logic on the FPGA. One such block is the mel-frequency cepstrum coefficient (MFCC) algorithm used in the feature extraction process. We demonstrate that the FPGA platform can perform feature extraction for the speech recognition system more efficiently than general-purpose CPUs, including the ARM processor. The Xilinx Zynq-7000 System on Chip (SoC) platform is used for the MFCC implementation. With this implementation, we confirmed that the FPGA platform is approximately 500× faster than a sequential CPU implementation and 60× faster than a sequential ARM implementation. We thus verified that a parallelized and optimized MFCC architecture on the FPGA platform can significantly improve the execution time of an ASR system compared with the CPU and ARM platforms.

On the Design Technique and VLSI Structure for a Multiplierless Quincuncial Interpolation Filter (무곱셈 대각 보간 필터의 설계 및 VLSI 구현에 관한 연구)

  • 최진우;이상욱
    • Journal of the Korean Institute of Telematics and Electronics B / v.29B no.8 / pp.54-65 / 1992
  • 2-D filtering of image data requires a very large number of multiplications, which makes it difficult to implement a real-time quincuncial interpolator. In this paper, an efficient design technique and VLSI structures for a 2-D multiplierless filter are presented. In the filter design, an efficient scheme for discretizing the frequency response of the prototype filter is introduced, and it is shown to save a significant part of the computational burden required by conventional techniques such as local search and branch-and-bound. For a 5×5 filter, the design technique described in this paper saves about 80% of the computation time compared to the conventional methods while providing comparable performance. For hardware implementation, two different VLSI structures for the 2-D multiplierless filter are also introduced: one for block parallel processing and the other for scan-line parallel processing. In both structures, the AP (area-period) figure improves over Wu's structure [4].


An Alternating Implicit Block Overlapped FDTD (AIBO-FDTD) Method and Its Parallel Implementation

  • Pongpaibool, Pornanong;Kamo, Atsushi;Watanabe, Takayuki;Asai, Hideki
    • Proceedings of the IEEK Conference / 2002.07a / pp.137-140 / 2002
  • In this paper, a new algorithm for the two-dimensional (2-D) finite-difference time-domain (FDTD) method is presented. With this new method, the maximum time step size can be increased beyond the limit imposed by the Courant-Friedrichs-Lewy (CFL) condition. The new algorithm is adapted from the Alternating-Direction Implicit FDTD (ADI-FDTD) method. However, unlike the ADI-FDTD algorithm, the alternation is performed with respect to blocks of fields rather than with respect to each coordinate direction. Moreover, this method can be efficiently simulated with parallel computation, and it is more efficient than the conventional FDTD method in terms of CPU time. Numerical formulations are shown and simulation results are presented to demonstrate the effectiveness and efficiency of the proposed method.
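
For context, the CFL restraint that the abstract refers to bounds the time step of the conventional explicit 2-D FDTD scheme by Δt ≤ 1 / (c · sqrt(1/Δx² + 1/Δy²)). The following is a minimal sketch of that bound only; it does not implement AIBO-FDTD, and the function name and the 1 mm grid spacing are illustrative assumptions, not values from the paper.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def cfl_max_time_step(dx, dy, c=C0):
    """Largest stable time step of the explicit 2-D FDTD scheme (CFL bound)."""
    return 1.0 / (c * math.sqrt(1.0 / dx**2 + 1.0 / dy**2))


# Assumed example grid: square cells with 1 mm spacing.
dt_limit = cfl_max_time_step(1e-3, 1e-3)
print(f"CFL-limited time step: {dt_limit:.3e} s")
```

An implicit scheme such as the one described in the abstract aims to run with time steps larger than `dt_limit` while remaining stable.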


Weighted Least-Squares Design and Parallel Implementation of Variable FIR Filters

  • Deng, Tian-Bo
    • Proceedings of the IEEK Conference / 2002.07a / pp.686-689 / 2002
  • This paper proposes a weighted least-squares (WLS) method for designing variable one-dimensional (1-D) FIR digital filters with simultaneously variable magnitude and variable non-integer phase-delay responses. First, the coefficients of a variable FIR filter are represented as two-dimensional (2-D) polynomials of a pair of spectral parameters: one for tuning the magnitude response, and the other for varying the non-integer phase-delay response. Then the optimal coefficients of the 2-D polynomials are found by minimizing the total weighted squared error of the variable frequency response. Finally, we show that the resulting variable FIR filter can be implemented in a parallel form, which is suitable for high-speed signal processing.
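
One way to see why a polynomial coefficient representation leads to a parallel structure: if each tap is h[n] = Σᵢ Σⱼ a[n][i][j] ψⁱ φʲ, the output can be computed as a sum of fixed sub-filter outputs, each scaled by ψⁱ φʲ. The sketch below checks this equivalence numerically. The coefficient values, the parameter names `psi` and `phi`, and the helper functions are hypothetical; the paper's actual structure and WLS design step are not reproduced here.

```python
def fir(x, taps):
    """Direct-form FIR filtering: y[k] = sum_n taps[n] * x[k - n]."""
    return [
        sum(taps[n] * x[k - n] for n in range(len(taps)) if k - n >= 0)
        for k in range(len(x))
    ]


def variable_taps(a, psi, phi):
    """Evaluate h[n] = sum_i sum_j a[n][i][j] * psi**i * phi**j."""
    return [
        sum(a_n[i][j] * psi**i * phi**j
            for i in range(len(a_n)) for j in range(len(a_n[i])))
        for a_n in a
    ]


def parallel_form(x, a, psi, phi):
    """Sum the outputs of fixed sub-filters, each weighted by psi**i * phi**j."""
    y = [0.0] * len(x)
    for i in range(len(a[0])):
        for j in range(len(a[0][0])):
            branch = fir(x, [a_n[i][j] for a_n in a])  # fixed-coefficient branch
            weight = psi**i * phi**j
            y = [yk + weight * bk for yk, bk in zip(y, branch)]
    return y


# Hypothetical coefficients: 3 taps, polynomial degree 1 in each parameter.
a = [
    [[0.25, 0.10], [0.05, -0.02]],
    [[0.50, 0.00], [0.10, 0.03]],
    [[0.25, -0.10], [0.05, 0.01]],
]
x = [1.0, 2.0, -1.0, 0.5, 3.0]
psi, phi = 0.3, -0.2

direct = fir(x, variable_taps(a, psi, phi))
parallel = parallel_form(x, a, psi, phi)
```

Because the branch filters have fixed coefficients, they can run concurrently, and only the scalar weights change when the spectral parameters are tuned.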


Block-Based Predictive Watershed Transform for Parallel Video Segmentation

  • Jang, Jung-Whan;Lee, Hyuk-Jae
    • JSTS: Journal of Semiconductor Technology and Science / v.12 no.2 / pp.175-185 / 2012
  • The predictive watershed transform is a popular object segmentation algorithm that achieves a speed-up by identifying image regions that differ from the previous frame and performing object segmentation only for those regions. However, the predictive watershed transform often produces incorrect segmentation because it uses only local information in merge-split decisions on boundary regions. This paper improves the predictive watershed transform to increase segmentation accuracy by using additional information about the root of boundary regions. Furthermore, the proposed algorithm is processed in a block-based manner: an image frame is decomposed into blocks, and each block is processed independently of the other blocks. The block-based approach makes it easy to implement the algorithm in hardware and also permits an extension to parallel execution. Experimental results show that the proposed watershed transform produces more accurate segmentation results than the predictive watershed transform.
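
The block decomposition and the "process only regions that changed since the previous frame" idea can be sketched as follows. This is not a watershed implementation: the per-block work is a stand-in change test, and the function names, block size, and threshold are assumptions made for illustration. The thread pool is there only to show that blocks have no dependencies on one another.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(height, width, block):
    """Yield (top, left, bottom, right) tiles that cover the frame."""
    for top in range(0, height, block):
        for left in range(0, width, block):
            yield top, left, min(top + block, height), min(left + block, width)


def block_changed(prev, curr, bounds, threshold=0):
    """Return True if any pixel in the block differs by more than threshold."""
    top, left, bottom, right = bounds
    return any(
        abs(curr[r][c] - prev[r][c]) > threshold
        for r in range(top, bottom) for c in range(left, right)
    )


def changed_blocks(prev, curr, block=2):
    """Test every block independently and return the bounds of changed ones."""
    bounds = list(split_into_blocks(len(curr), len(curr[0]), block))
    with ThreadPoolExecutor() as pool:
        flags = list(pool.map(lambda b: block_changed(prev, curr, b), bounds))
    return [b for b, flag in zip(bounds, flags) if flag]


# Assumed toy frames: 4x4 images that differ in a single pixel.
prev_frame = [[0] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[3][1] = 9

to_segment = changed_blocks(prev_frame, curr_frame)
```

Only the blocks in `to_segment` would then be handed to the segmentation step.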

The design and implementation of an universal interface with serial and parallel formats for DTV transport stream (DTV 트랜스포트 스트림용 만능 직.병렬 인터페이스 설계 및 구현)

  • 유종언;장용석;고영욱;김대진;김은도
    • Proceedings of the IEEK Conference / 2001.09a / pp.323-326 / 2001
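  • Most equipment that receives or transmits DTV broadcast signals communicates using only one or two interface formats. Consequently, existing equipment often cannot be used when streams are transmitted in a different interface format. In this paper, we design and implement an interface that allows the streams exchanged between such devices to be connected freely across formats. The implemented interface does not alter the stream content itself; it adapts the stream to the interface standard used for transmission and reception so that it can be sent and received freely. Four interface standards are supported: SMPTE 310M, ASI (Asynchronous Serial Interface), SPI (Synchronous Parallel Interface), and the TS (Transport Stream) format used in set-top boxes. They are arranged in a matrix so that any of them can be sent to or received from any other. The main blocks were designed in VHDL and implemented on an FPGA (EPF10K10T144).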
