• Title/Summary/Keyword: implementation algorithm

Optimized Implementation of Scalable Multi-Precision Multiplication Method on RISC-V Processor for High-Speed Computation of Post-Quantum Cryptography (차세대 공개키 암호 고속 연산을 위한 RISC-V 프로세서 상에서의 확장 가능한 최적 곱셈 구현 기법)

  • Seo, Hwa-jeong;Kwon, Hyeok-dong;Jang, Kyoung-bae;Kim, Hyunjun
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.3 / pp.473-480 / 2021
  • To achieve high-speed implementations of post-quantum cryptography, primitive operations should be tailored to the architecture of the target processor. In this paper, we present an optimized implementation of multi-precision multiplication on the RISC-V processor for post-quantum cryptography. In particular, the column-wise multiplication algorithm is optimized with the primitive instructions of the RISC-V processor, improving the performance of 256-bit and 512-bit multiplication by 19% and 8%, respectively, over previous works. Lastly, we suggest an instruction extension for high-speed multiplication on the RISC-V processor.
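
The column-wise (product-scanning) method named in this abstract can be sketched as follows. This is a minimal illustration in Python, not the paper's RISC-V assembly: the function names, the 32-bit word size, and the little-endian word layout are assumptions made for the example.

```python
def to_words(x, count, word_bits=32):
    """Split a non-negative integer into `count` little-endian words."""
    mask = (1 << word_bits) - 1
    return [(x >> (word_bits * k)) & mask for k in range(count)]


def from_words(words, word_bits=32):
    """Reassemble little-endian words into one integer."""
    return sum(w << (word_bits * k) for k, w in enumerate(words))


def product_scanning_mul(a, b, word_bits=32):
    """Multiply two little-endian word arrays one result column at a time."""
    mask = (1 << word_bits) - 1
    n, m = len(a), len(b)
    result = [0] * (n + m)
    acc = 0  # column accumulator; its upper bits carry into the next column
    for col in range(n + m - 1):
        # Every partial product a[i] * b[j] with i + j == col lands in this column.
        lo = max(0, col - (m - 1))
        hi = min(col, n - 1)
        for i in range(lo, hi + 1):
            acc += a[i] * b[col - i]
        result[col] = acc & mask  # each result word is written exactly once
        acc >>= word_bits
    result[n + m - 1] = acc
    return result


# 256-bit operands as eight 32-bit words (operand values chosen arbitrarily).
x = 2**256 - 189
y = 2**255 - 19
product = from_words(product_scanning_mul(to_words(x, 8), to_words(y, 8)))
```

Each result word is stored once, after its whole column has been accumulated. This is the usual reason the column-wise order is considered for register-constrained targets, although the abstract does not say which property the authors exploit.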

An Adaptive Bit-reduced Mean Absolute Difference Criterion for Block-Matching Algorithm and Its VLSI Implementation (블럭 정합 알고리즘을 위한 적응적 비트 축소 MAD 정합 기준과 VLSI 구현)

  • Oh, Hwang-Seok;Baek, Yun-Ju;Lee, Heung-Kyu
    • Journal of KIISE: Software and Applications / v.27 no.5 / pp.543-550 / 2000
  • An adaptive bit-reduced mean absolute difference (ABRMAD) is presented as a criterion for the block-matching algorithm (BMA) to reduce the complexity of the VLSI implementation and to improve the processing time. The ABRMAD uses a reduced pixel resolution consisting of the significant bits, instead of full-resolution pixel values, to estimate the motion vector (MV) by examining the pixels in a block. Simulation results show that the 4-bit ABRMAD has competitive mean square error (MSE) results and half the hardware complexity of the MAD criterion. It also has better characteristics in terms of both MSE performance and hardware complexity than the Minimax criterion, and better MSE performance than difference pixel counting (DPC), binary block-matching with edge-map (BBME), and bit-plane matching (BPM) with the same number of bits.
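
A bit-reduced matching criterion of this kind can be sketched as below. The abstract does not state how the ABRMAD adaptively selects its significant bits, so this example makes a simplifying assumption and always keeps the top `keep_bits` bits of each 8-bit pixel. The function names and the full-search window are likewise hypothetical.

```python
def reduced_mad(block_a, block_b, keep_bits=4, pixel_bits=8):
    """Mean absolute difference computed on the top `keep_bits` bits of each pixel."""
    shift = pixel_bits - keep_bits
    total = 0
    count = 0
    for row_a, row_b in zip(block_a, block_b):
        for pa, pb in zip(row_a, row_b):
            total += abs((pa >> shift) - (pb >> shift))
            count += 1
    return total / count


def best_motion_vector(cur, ref, top, left, size, search, keep_bits=4):
    """Full search over a +/-`search` window; returns the (dy, dx) with the lowest cost."""
    cur_block = [row[left:left + size] for row in cur[top:top + size]]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue
            candidate = [row[x:x + size] for row in ref[y:y + size]]
            cost = reduced_mad(cur_block, candidate, keep_bits)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1]
```

Narrowing each difference from 8 bits to 4 bits shrinks the subtractors and the accumulator in a hardware datapath, which is the kind of saving the abstract attributes to the criterion.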

Distributed Arithmetic Adaptive Digital Filter Using FPGA

  • Chivapreecha, Sorawat;Piyamahachot, Satianpon;Namcharoenwattanakul, Anekchai;Chaimanee, Deow;Dejhan, Kobchai
    • 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 2004.08a / pp.1577-1580 / 2004
  • This paper proposes the design and implementation of a transversal adaptive digital filter using the LMS (Least Mean Squares) adaptive algorithm. The filter structure is based on Distributed Arithmetic (DA), which calculates the inner product by shifting and accumulating partial products stored in a look-up table, so the resulting adaptive digital filter is multiplierless. The hardware implementation is described in VHDL (Very high speed integrated circuit Hardware Description Language) and synthesized for an Altera FLEX10K FPGA (Field Programmable Gate Array) as the target technology, using Leonardo Spectrum and MAX+plusII for overall development. The design results are reported in terms of speed performance and FPGA area usage. Experimental results are presented to demonstrate the feasibility of the adaptive digital filter.
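
The shift-and-accumulate inner product that distributed arithmetic relies on can be illustrated in software. The sketch below assumes fixed integer coefficients and unsigned 8-bit samples, and its function names are hypothetical. It leaves out the LMS coefficient update, which in an adaptive filter would also require refreshing the look-up table.

```python
def build_da_lut(coeffs):
    """Entry k holds the sum of coeffs[i] over every bit i that is set in k."""
    lut = [0] * (1 << len(coeffs))
    for k in range(1, len(lut)):
        low = k & -k  # lowest set bit of k
        lut[k] = lut[k ^ low] + coeffs[low.bit_length() - 1]
    return lut


def da_inner_product(lut, samples, sample_bits=8):
    """Compute sum(coeffs[i] * samples[i]) for unsigned samples without multiplying."""
    acc = 0
    for bit in range(sample_bits):
        # Gather bit number `bit` of every sample into one LUT address.
        address = 0
        for i, x in enumerate(samples):
            address |= ((x >> bit) & 1) << i
        # Weight the looked-up partial sum by 2**bit and accumulate.
        acc += lut[address] << bit
    return acc


coeffs = [3, -1, 4, 2]
samples = [10, 200, 7, 255]
result = da_inner_product(build_da_lut(coeffs), samples)
```

The table has 2^N entries for N taps. Hardware designs therefore often split long filters into several smaller tables, but the abstract does not say whether this design does so.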

A Mechanism for Configurable Network Service Chaining and Its Implementation

  • Xiong, Gang;Hu, Yuxiang;Lan, Julong;Cheng, Guozhen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.8 / pp.3701-3727 / 2016
  • Service Function Chaining (SFC) has recently shown promise for innovating the network service mode in modern networks. However, a feasible implementation of SFC remains difficult because it must achieve functional equivalence with traditional modes without sacrificing performance or increasing network complexity. In this paper, we present a configurable network service chaining (CNSC) mechanism to provide services for network traffic in a flexible and optimal way. First, we formulate the problem of network service chaining and design an effective service chain construction framework that integrates software-defined networking (SDN) with network functions virtualization (NFV). Then, we model the service path computation problem as an integer linear optimization problem and propose an algorithm named SPCM that cooperatively combines service function instances under a network utility maximization policy. In SPCM, we achieve service node mapping by defining a service capacity matrix for substrate nodes, and work out the optimal link mapping policies with segment routing. Finally, simulation results indicate that with SPCM the average request acceptance ratio and resource utilization ratio can exceed 85% and 75%, respectively. On the prototype system, CNSC is shown to outperform other approaches and to provide flexible and scalable network services.

Accelerating Self-Similarity-Based Image Super-Resolution Using OpenCL

  • Jun, Jae-Hee;Choi, Ji-Hoon;Lee, Dae-Yeol;Jeong, Seyoon;Cho, Suk-Hee;Kim, Hui-Yong;Kim, Jong-Ok
    • IEIE Transactions on Smart Processing and Computing / v.4 no.1 / pp.10-15 / 2015
  • This paper proposes a parallel implementation of a self-similarity-based image SR (super-resolution) algorithm using OpenCL. The SR algorithm requires a tremendous amount of computation to search for a similar patch, which becomes a bottleneck for real-time conversion from an FHD image to UHD. Therefore, it is imperative to accelerate SR algorithms. For parallelization, the SR process is divided into several kernels, and memory optimization is performed. In addition, two GPUs are used for further acceleration. The experimental results show that a GPGPU implementation achieves a speedup of over 140 times compared to a single-core CPU. Furthermore, it was confirmed experimentally that utilizing two GPUs speeds up execution proportionally, by up to 277 times.

Comments on the Computation of Sun Position for Sun Tracking System (태양추적장치를 위한 태양위치계산에서의 제언)

  • Park, Young Chil
    • Journal of the Korean Solar Energy Society / v.36 no.6 / pp.47-59 / 2016
  • As the use of sun tracking systems in solar energy facilities increases, so does the need for more accurate computation of the sun position. Accordingly, various algorithms to compute the sun position have been proposed in the literature, and some claim to guarantee a computational error of less than 0.01 degree. However, in most cases the meaning of the claimed accuracy is not clearly explained, nor are the conditions under which it can be guaranteed clearly stated. Such ambiguity may cause misunderstanding about the accuracy of the computed sun position and, ultimately, a mistaken notion of the tracking accuracy of the actual sun tracking system. This work presents some comments on the implementation of sun position computation algorithms for sun tracking systems. We first introduce the algorithms proposed in the literature. Then, from the point of view of a sun tracking system user, we explain the true meaning of the accuracy of the computed sun position. We also discuss how to select a proper algorithm for an actual implementation. Finally, we discuss how the input factors used in computing the sun position, such as time and position, affect the accuracy of the computed sun position.

Optimized and Portable FPGA-Based Systolic Cell Architecture for Smith-Waterman-Based DNA Sequence Alignment

  • Shah, Hurmat Ali;Hasan, Laiq;Koo, Insoo
    • Journal of information and communication convergence engineering / v.14 no.1 / pp.26-34 / 2016
  • The alignment of DNA sequences is one of the important processes in the field of bioinformatics. The Smith-Waterman algorithm (SWA) performs optimally for aligning sequences but is computationally expensive. Field programmable gate arrays (FPGAs) perform best on parameters such as cost, speed-up, and ease of re-configurability for implementing SWA. The performance of FPGA-based SWA depends on an efficient design of the cell, the basic implementation unit. In this paper, we present an optimized systolic cell design that avoids the oversimplification, very large-scale integration (VLSI)-level design, and direct mapping of iterative equations found in previous cell designs. The proposed design makes efficient use of hardware resources and provides portability, as it is not based on gate-level details. Our cell design implementing a linear gap penalty resulted in a performance improvement of 32× over a GPP platform and surpassed the hardware utilization of another implementation by a factor of 4.23.
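
The recurrence evaluated by each cell can be sketched in software as below. This is a plain sequential version of Smith-Waterman scoring with a linear gap penalty, not the paper's systolic hardware. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function name are assumptions for the example.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score under a linear gap penalty."""
    prev = [0] * (len(seq_b) + 1)  # previous row of the scoring matrix
    best = 0
    for ca in seq_a:
        cur = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in seq_b
            left = cur[j - 1] + gap   # gap in seq_a
            cell = max(0, diag, up, left)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


score = smith_waterman_score("ACGT", "ACT")
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal are independent of each other. A systolic array can exploit this by assigning one processing cell per column.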

Implementation of TFDR system with PXI type instruments for detection and estimation of the fault on the coaxial cable (동축 케이블의 결함 측정에 있어서 PXI 타입의 계측기를 이용한 개선된 TFDR 시스템의 구현)

  • Choe, Deok-Seon;Park, Jin-Bae;Yun, Tae-Seong
    • Proceedings of the KIEE Conference / 2003.11b / pp.91-94 / 2003
  • In this paper, we implement a Time-Frequency Domain Reflectometry (TFDR) system using comparatively low-performance (100 MS/s) PCI eXtensions for Instrumentation (PXI) instruments. TFDR is a generalization of Time Domain Reflectometry (TDR) and Frequency Domain Reflectometry (FDR). The methodology is robust to Gaussian noise because a fixed frequency bandwidth is used. Moreover, it can obtain more information about the fault by using the normalized time-frequency cross-correlation function. The Arbitrary Waveform Generator (AWG) module generates the input signal, and the digital oscilloscope module acquires the input and reflected signals, while the PXI controller module controls all PXI modules and executes the main algorithm. The maximum measurement range and the blind spot are calculated according to variations in time duration and frequency bandwidth. On the basis of these calculations, the algorithm and the design of the input signals used in the TFDR system are verified by real experiments. The correlation function is added to the TDR methodology to reduce the blind spot in the TFDR system.

A Study on Design and Implementation of Hangul-NAVTEX Simulator (한글 NAVTEX시뮬레이터 설계 및 구현에 관한 연구)

  • 이헌택;김기문
    • Journal of the Korea Institute of Information and Communication Engineering / v.3 no.4 / pp.819-830 / 1999
  • The NAVTEX system is an international automated direct-printing service, broadcast on 518 kHz and 490 kHz, for the promulgation of navigational and meteorological warnings and urgent information to ships. Since our government adopted the international convention on SAR (Search and Rescue) in 1993, various trials for the installation of a NAVTEX system have been carried out by the government committee, related laboratories, and experts. An important consideration in installing a NAVTEX system is the ability to broadcast messages written in Korean letters. In addition, a receiver that can process the signals demodulated from the two frequencies, 518 kHz and 490 kHz, should be developed and supplied domestically. In this paper, the code table and algorithm for conversion between NAVTEX characters and Korean letters are studied, and signal processing techniques for code conversion are developed. The circuit design and implementation of a NAVTEX simulator using a Direct Digital Synthesizer are discussed, and the code conversion algorithm and the signal processing technique for NAVTEX transmission are programmed into its circuits. To evaluate its functional characteristics, a receiving module with an I-Q channel structure is designed. Measurements of the simulator show a frequency stability of $\pm 2$ Hz and a spurious-free dynamic range of -63 dBc. The simulator can also simultaneously generate the wanted signal and several interfering signals. Its capability is therefore valuable to designers of transmitting systems and NAVTEX receivers, and to providers as a testing facility for type approval.

Real-time Multiple People Tracking using Competitive Condensation (경쟁적 조건부 밀도 전파를 이용한 실시간 다중 인물 추적)

  • 강희구;김대진;방승양
    • Journal of KIISE: Software and Applications / v.30 no.7_8 / pp.713-718 / 2003
  • The CONDENSATION (Conditional Density Propagation) algorithm offers robust tracking performance and is suitable for real-time implementation. However, the CONDENSATION tracker is difficult to run in real time for multiple people tracking, since it requires very complicated shape modeling and a large number of samples for precise tracking. Furthermore, it tracks poorly when people are close together or partially occluded. To overcome these difficulties, we present three improvements. First, we construct effective templates of people's shapes using the SOM (Self-Organizing Map). Second, we use a discrete HMM (Hidden Markov Model) as an accurate dynamical model of the transitions of people's shapes. Third, we use a competition rule to separate close or partially occluded people effectively. Simulation results show that the proposed CONDENSATION algorithm can achieve robust, real-time tracking in image sequences of a crowd of people.