• Title/Summary/Keyword: hardware optimization

Search Result 210, Processing Time 0.03 seconds

An Implementation of Turbo -Code Decoder using Posteriori Probability Optimization (사후확률 최적화를 이용한 터보코드 복호기 구현)

  • Noh Jin-Soo;Rhee Kang-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.43 no.4 s.310
    • /
    • pp.73-79
    • /
    • 2006
  • Due to the powerful correcting performance, turbo codes have been adopted in many communication standards such as W-CDMA(Wideband Code Division Multiple Access), CDMA2000, etc., and implemented by hardware in many kind of fields. Although several hardware structures and improved algorithm have been proposed, these problems such as hardware area, operating speed and power consumption are still a major issue to be solved in practical implementations. In this paper, we designed the turbo-code decoder using MAX -SCALE operation derived from the posterior probability optimization. The proposed circuit has been measured their performance on Matlab and MaxPlusII and implemented on the FPGA As a result, when implementing the proposed algorithm on the FPGA, this circuit only occupies 616 logic elements. And comparing the performance with the MAP(Maxirnum a Posteriori) decoding algorithm, the operating speed was increased by about 40%(56.48MHz) and BER(Bit Error Rate) was increased by 6.12.

Bus and Registor Optimization in Datapath Synthesis (데이터패스 합성에서의 버스와 레지스터의 최적화 기법)

  • Sin, Gwan-Ho;Lee, Geun-Man
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.8
    • /
    • pp.2196-2203
    • /
    • 1999
  • This paper describes the bus scheduling problem and register optimization method in datapath synthesis. Scheduling is process of operation allocation to control steps in order to minimize the cost function under the given circumstances. For that purpose, we propose some formulations to minimize the cost function for bus assignment to get an optimal and minimal cost function in hardware allocations. Especially, bus and register minimization technique are fully considered which are the essential topics in hardware allocation. Register scheduling is done after the operation and bus scheduling. Experiments are done with the DFG model of fifth-order digital ware filter to show its effectiveness. Structural integer programming formulations are used to solve the scheduling problems in order to get the optimal scheduling results in the integer linear programming environment.

  • PDF

Prediction of Protein Tertiary Structure Based on Optimization Design (최적설계 기법을 이용한 단백질 3차원 구조 예측)

  • Jeong Min-Joong;Lee Joon-Seong
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.30 no.7 s.250
    • /
    • pp.841-848
    • /
    • 2006
  • Many researchers are developing computational prediction methods for protein tertiary structures to get much more information of protein. These methods are very attractive on the aspects of breaking technologies of computer hardware and simulation software. One of the computational methods for the prediction is a fragment assembly method which shows good ab initio predictions at several cases. There are many barriers, however, in conventional fragment assembly methods. Argues on protein energy functions and global optimization to predict the structures are in progress fer example. In this study, a new prediction method for protein structures is proposed. The proposed method mainly consists of two parts. The first one is a fragment assembly which uses very shot fragments of representative proteins and produces a prototype of a given sequence query of amino acids. The second one is a global optimization which folds the prototype and makes the only protein structure. The goodness of the proposed method is shown through numerical experiments.

Automatic Optimization Methods for Image Processing Programs Using OpenCL (OpenCL을 이용한 이미지 처리 프로그램의 자동 최적화 방법)

  • Shin, Jaeho;Jo, Gangwon;Lee, Ilkoo;Lee, Jaejin
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.3
    • /
    • pp.188-193
    • /
    • 2017
  • In this paper, we propose automatic OpenCL optimization techniques that offer the best performance for image processing programs on any hardware system. Developers should seek a proper way of parallelization and an appropriate work-group size for the architecture of target compute devices to achieve the best performance. However, testing potential devices to find them is both time-consuming and costly. Our techniques automatically set up hardware-optimized parallelization and find a suitable work-group size for the target device. Furthermore, using OpenCL does not always provide better performance in image processing. Hence, we also propose a way to automatically search for a threshold image size to allow image processing programs to decide whether or not to use OpenCL. Our findings demonstrate that out techniques improve the image processing performance significantly.

Toward Optimal FPGA Implementation of Deep Convolutional Neural Networks for Handwritten Hangul Character Recognition

  • Park, Hanwool;Yoo, Yechan;Park, Yoonjin;Lee, Changdae;Lee, Hakkyung;Kim, Injung;Yi, Kang
    • Journal of Computing Science and Engineering
    • /
    • v.12 no.1
    • /
    • pp.24-35
    • /
    • 2018
  • Deep convolutional neural network (DCNN) is an advanced technology in image recognition. Because of extreme computing resource requirements, DCNN implementation with software alone cannot achieve real-time requirement. Therefore, the need to implement DCNN accelerator hardware is increasing. In this paper, we present a field programmable gate array (FPGA)-based hardware accelerator design of DCNN targeting handwritten Hangul character recognition application. Also, we present design optimization techniques in SDAccel environments for searching the optimal FPGA design space. The techniques we used include memory access optimization and computing unit parallelism, and data conversion. We achieved about 11.19 ms recognition time per character with Xilinx FPGA accelerator. Our design optimization was performed with Xilinx HLS and SDAccel environment targeting Kintex XCKU115 FPGA from Xilinx. Our design outperforms CPU in terms of energy efficiency (the number of samples per unit energy) by 5.88 times, and GPGPU in terms of energy efficiency by 5 times. We expect the research results will be an alternative to GPGPU solution for real-time applications, especially in data centers or server farms where energy consumption is a critical problem.

Optimizing Boot Stage of Linux for Low-power ARM Embedded Devices (리눅스기반 저전력 ARM 임베디드 장비의 부팅과정 최적화)

  • Kim, Jongseok;Yang, Jinyoung;Kim, Daeyoung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2013.10a
    • /
    • pp.137-140
    • /
    • 2013
  • Conventionally embedded devices used simple operating system (OS); however, the number of embedded devices using Linux as OS is increasing to keep up with hardware's performance improvement and customer's various needs. While embedded devices using Linux can take advantage of expandability, generality, portability, Linux's flexibility nature may cause undesirable overheads because of its increased complexity. One such overhead makes boot stage optimization essential in most embedded systems, where many features are redundant and possible to be removed or reconfigured. This paper applies well-known software optimization technique for Linux's boot stage to an CLM9722 DTK, measures the results, and studies about limitation of such techniques from hardware dependancy on the standard framework of Linux. The booting time from power-on until completion were decreased by 33% approximately.

  • PDF

The Architecture of Intra-prediction & DCTQ Hardware for H.264 Encoder (H.264 부호화기를 위한 Intra-prediction & DCTQ Hardware 구조)

  • Suh, Ki-Bum
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.5
    • /
    • pp.1-9
    • /
    • 2010
  • In this paper, the novel architecture of Intra-prediction & DCTQ hardware, which can process for the Full HD image($1980{\times}1088$@30fps) in realtime, is proposed. The cycle optimization method for the overall cycle of prediction, transform, scaling, descaling, and reconstruction is proposed. To reduce the cycle in the $4{\times}4$ prediction, the quantization process is performed during the prediction cycle and pre-selection of 2 modes among the 9 modes is performed to reduce the hardware area. To reduce the hardware of $16{\times}16$ and $8{\times}8$ prediction, the sharing logic between 2 prediction is utilized. The proposed architecture can process the 30frame/sec of full HD image in 108 MHz clock and operate 425 cycle for one macroblock.

Design of Evolvable Hardware based on Genetic Algorithm Processor(GAP)

  • Sim Kwee-Bo;Harashiam Fumio
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.5 no.3
    • /
    • pp.206-215
    • /
    • 2005
  • In this paper, we propose a new design method of Genetic Algorithm Processor(GAP) and Evolvable Hardware(EHW). All sorts of creature evolve its structure or shape in order to adapt itself to environments. Evolutionary Computation based on the process of natural selection not only searches the quasi-optimal solution through the evolution process, but also changes the structure to get best results. On the other hand, Genetic Algorithm(GA) is good fur finding solutions of complex optimization problems. However, it has a major drawback, which is its slow execution speed when is implemented in software of a conventional computer. Parallel processing has been one approach to overcome the speed problem of GA. In a point of view of GA, long bit string length caused the system of GA to spend much time that clear up the problem. Evolvable Hardware refers to the automation of electronic circuit design through artificial evolution, and is currently increased with the interested topic in a research domain and an engineering methodology. The studies of EHW generally use the XC6200 of Xilinx. The structure of XC6200 can configure with gate unit. Each unit has connected up, down, right and left cell. But the products can't use because had sterilized. So this paper uses Vertex-E (XCV2000E). The cell of FPGA is made up of Configuration Logic Block (CLB) and can't reconfigure with gate unit. This paper uses Vertex-E is composed of the component as cell of XC6200 cell in VertexE

Performance Optimization Considering I/O Data Coherency in Stream Processing (Stream Processing에서 I/O데이터 일관성을 고려한 성능 최적화)

  • Na, Hana;Yi, Joonwhan
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.8
    • /
    • pp.59-65
    • /
    • 2016
  • Performance optimization of applications with massive stream data processing has been performed by considering I/O data coherency problem where a memory is shared between processors and hardware accelerators. A formula for performance analyses is derived based on profiling results of system-level simulations. Our experimental results show that overall performance was improved by 1.40 times on average for various image sizes. Also, further optimization has been performed based on the parameters appeared in the derived formula. The final performance gain was 3.88 times comparing to the original design and we can find that the performance of the design with cacheable shared memory is not always.

A Rule-Based System for VLSI Gate-Level Logic Optimization (VLSI 게이트 레벨 논리설계 최적화를 위한 Rule-Based 시스템)

  • Lee, Seong-Bong;Chong, Jong-Wha
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.26 no.1
    • /
    • pp.98-103
    • /
    • 1989
  • A new system for logic optimization at gate-level is proposed in this paper. Ths system is rule-based, i which the rules represent the local trnsformation replacing a portion of circuits with the simplified equivalent circuits. In this system, 'rule generalization' and 'local optimization' are proposed for effective pattern matching. Rule generalization is used to reduce the circuit-search for pattern matching, and local optimization, to exclude unnecessary circuit-search. In additionk, in order to reduce unnecessary trial of pattern matching, the matching order of circuit patern is included in the rule descriptions. The effectiveness of this system is shown by its application ot the circuits which are generated by a hardware compiler.

  • PDF