• Title/Summary/Keyword: implementation algorithm

A Parallel Implementation of JPEG2000 4K Ultra High Definition Image using OpenCL (OpenCL을 이용한 JPEG2000 4K 초고화질 영상처리의 병렬고속화 구현)

  • Park, Daeseung;Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications / v.10 no.1 / pp.1-5 / 2015
  • With the rapid growth of multimedia technology and users' strong preference for large screens, the newest video coding standard, HEVC (High Efficiency Video Coding), has been introduced for high-quality video compression. As a result, ultra-high-definition (UHD) image services, which are four times as sharp as conventional HD video, are becoming popular. JPEG 2000 has also been stated to support 4K and 8K UHD, which calls for fast processing technology to read and write UHD images. This paper presents a study on fast parallel processing of UHD images. It first reviews JPEG 2000 and then proposes a GPU-based parallel implementation of the color conversion preprocessing stage. The parallelized algorithm is implemented in OpenCL (Open Computing Language). Simulation results show that the proposed method processes 4K UHD images five times faster than a thread-based method.
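
The abstract does not say which JPEG 2000 color transform was parallelized, nor does it show the kernel. As a rough illustration of why the color conversion stage suits a GPU, the sketch below applies the standard's reversible color transform (RCT) to each pixel independently. The function names are hypothetical, and plain Python stands in for OpenCL; in a kernel, each loop iteration would map to one work-item.

```python
def rct_forward(r, g, b):
    """JPEG 2000 reversible color transform for one pixel (integer samples)."""
    y = (r + 2 * g + b) // 4  # floor division, as the RCT requires
    cb = b - g
    cr = r - g
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward."""
    g = y - (cb + cr) // 4
    r = cr + g
    b = cb + g
    return r, g, b


def convert_image(pixels):
    """Apply the RCT to a flat list of (r, g, b) tuples.

    No pixel depends on any other, so every iteration could run in parallel.
    """
    return [rct_forward(r, g, b) for (r, g, b) in pixels]


if __name__ == "__main__":
    sample = [(255, 0, 0), (12, 200, 34), (0, 0, 0)]
    transformed = convert_image(sample)
    restored = [rct_inverse(*p) for p in transformed]
    print(transformed, restored == sample)
```

Because the transform is integer-only and exactly invertible, a round trip through `rct_forward` and `rct_inverse` is a convenient correctness check for any parallel port.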

Multi-Port Register File Design and Implementation for the SIMD Programmable Shader (SIMD 프로그래머블 셰이더를 위한 멀티포트 레지스터 파일 설계 및 구현)

  • Yoon, Wan-Oh;Kim, Kyeong-Seob;Cheong, Jin-Ha;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.9 / pp.85-95 / 2008
  • 3D graphics algorithms characteristically perform complex calculations on massive amounts of stream data. Vertex and pixel shaders have enabled efficient hardware execution of graphics algorithms, and these graphics processors may seem to have achieved the aim of "hardwarization of software shaders." However, hardware shaders have so far evolved within the limits of Z-buffer based algorithms. We predict that the ultimate model for future graphics processors will be an algorithm-independent integrated shader that combines the functions of both vertex and pixel shaders. We design a register file model that supports 3D computer graphics on a programmable unified shader processor. We verified that the design computes correct values on a Xilinx Virtex-4 (xcvlx200) FPGA by running binary files generated by the implementation process from the synthesis results.

Design of A Reed-Solomon Code Decoder for Compact Disc Player using Microprogramming Method (마이크로프로그래밍 방식을 이용한 CDP용 Reed-Solomon 부호의 복호기 설계)

  • 김태용;김재균
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.10 / pp.1495-1507 / 1993
  • This paper presents an implementation of an RS (Reed-Solomon) code decoder for a CDP (Compact Disc Player) using a microprogramming method. In this decoding strategy, equations composed of Newton's identities are used to compute the coefficients of the error locator polynomial and to check the number of erasures in C2 (outer code). In C2 decoding, the erasure values are computed from the syndromes and the results of C1 (inner code) decoding. Error-correcting capability is improved by correcting up to 4 erasures. The decoder contains an arithmetic logic unit over GF($2^8$) for error correction and a decoding controller with a programming ROM and microinstructions. The microinstructions implement the decoding algorithm for the RS code, so the decoder can easily be modified for upgrades or other applications by changing only the programming ROM. The decoder is implemented by logic-level modeling in Verilog HDL. Each microinstruction has 14 bits (= 1 word), and the size of the programming ROM is 360 words. The maximum number of clock cycles for decoding both C1 and C2 is 424.
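
The abstract names two building blocks, arithmetic over GF($2^8$) and locator coefficients derived from syndromes via Newton's identities, without giving details. The sketch below is a minimal software model of the simplest case only: one error, where the identity reduces to locator = S1 / S0. It assumes the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) with α = 2, code roots α^0 to α^3, and symbol index equal to the power of α. None of these choices is stated in the abstract, and all function names are hypothetical.

```python
PRIM_POLY = 0x11D  # assumed field polynomial: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-XOR with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # degree 8 reached: reduce by the field polynomial
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254, since a^255 = 1 for nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)


def syndromes(received, count=4):
    """S_i = sum over j of r_j * alpha^(i*j), with alpha = 2."""
    out = []
    for i in range(count):
        s = 0
        for j, symbol in enumerate(received):
            s ^= gf_mul(symbol, gf_pow(2, i * j))
        out.append(s)
    return out


if __name__ == "__main__":
    word = [0] * 32        # the all-zero word is a valid codeword
    word[5] = 0x37         # inject one error of value 0x37 at position 5
    s = syndromes(word)
    locator = gf_mul(s[1], gf_inv(s[0]))  # single-error case: S1 / S0
    print(s[0] == 0x37, locator == gf_pow(2, 5))
```

A hardware ALU would use table- or logic-based multipliers rather than these loops; the model is only meant to make the arithmetic concrete.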

NTGST-Based Parallel Computer Vision Inspection for High Resolution BLU (NTGST 병렬화를 이용한 고해상도 BLU 검사의 고속화)

  • 김복만;서경석;최흥문
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.6 / pp.19-24 / 2004
  • A novel fast parallel NTGST is proposed for high-resolution computer vision inspection of BLUs in an LCD production line. The conventional, computation-intensive NTGST algorithm is modified, and its C code is optimized into a fast NTGST adapted to the SIMD parallel architecture. The input inspection image is then partitioned and allocated to each of the P processors in a multi-threaded implementation, and within each thread the NTGST is executed on N data items simultaneously on the SIMD architecture. The proposed inspection system can thus achieve a speedup of O(NP). Experiments on a dual Pentium III processor with MMX and extended MMX SIMD technology show that the proposed parallel NTGST is about 8 times faster (Sp = 8) than the conventional NTGST, which demonstrates the scalability of the proposed implementation for fast, high-resolution computer vision inspection of BLUs of various sizes in LCD production lines.

Implementation of Multiview Stereoscopic 3D Display System using Volume Holographic Lenticular Sheet (VHLS 광학판 기반의 다시점 스테레오스코픽 3D 디스플레이 시스템의 구현)

  • 이상우;이맹호;김은수
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.5C / pp.716-725 / 2004
  • This paper proposes a new multiview stereoscopic 3D display system using a VHLS (volume holographic lenticular sheet). The VHLS, which acts as an optical direction modulator, can be implemented by recording the diffraction gratings corresponding to each directional vector of the multiview stereoscopic images in a holographic recording material, using the angularly multiplexed recording property of a conventional volume hologram. The fabricated VHLS is then attached to the panel of an LCD spatial light modulator (SLM) and used to diffract each of the multiview images loaded in the SLM to its corresponding spatial direction, forming a 3D stereo view-zone. The operational principle and characteristics of the VHLS are analyzed, an optimized 4-view VHLS is fabricated using a commercial photopolymer, and a new VHLS-based 4-view stereoscopic 3D display system is implemented. Experimental results using a 4-view image synthesized with an adaptive disparity estimation algorithm suggest that a VHLS-based multiview stereoscopic 3D display system is feasible.

Implementation of Driver Fatigue Monitoring System (운전자 졸음 인식 시스템 구현)

  • Choi, Jin-Mo;Song, Hyok;Park, Sang-Hyun;Lee, Chul-Dong
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.8C / pp.711-720 / 2012
  • In this paper, we introduce the implementation of a driver fatigue monitoring system and its results. A commercially available webcam is used as the video input device. The Haar transform is used for face detection, and illumination normalization is adopted to handle arbitrary illumination conditions. After illumination normalization, the facial image is easily extracted using Haar face features. The eye candidate area in the normalized image is reduced by anthropometric measurement, and eye detection is performed by a mixture model of PCA and a circle mask. These methods achieve robust eye detection under arbitrarily changing illumination. The drowsiness state is determined from the illumination-normalized eye images by a simple calculation. When drowsiness is detected, our system raises an alarm and vibrates the seatbelt through the controller area network (CAN). Our algorithm has low computational complexity and a high recognition rate. We achieve a 97% correct detection rate in in-car experiments.

A Cryptographic Processor Supporting ARIA/AES-based GCM Authenticated Encryption (ARIA/AES 기반 GCM 인증암호를 지원하는 암호 프로세서)

  • Sung, Byung-Yoon;Kim, Ki-Bbeum;Shin, Kyung-Wook
    • Journal of IKEEE / v.22 no.2 / pp.233-241 / 2018
  • This paper describes a lightweight implementation of a cryptographic processor that supports GCM (Galois/Counter Mode) authenticated encryption (AE) based on two block ciphers, ARIA and AES. It also provides five modes of operation (ECB, CBC, OFB, CFB, CTR) for confidentiality and supports key lengths of 128 and 256 bits. ARIA and AES are integrated into a single hardware structure that exploits their algorithmic characteristics, and a $128{\times}12$-bit partially parallel GF (Galois field) multiplier is adopted so that CTR encryption and the GHASH operation can be processed concurrently, optimizing overall performance. The hardware operation of the ARIA/AES-GCM AE processor was verified by FPGA implementation, and it occupies 60,800 gate equivalents (GEs) with a 180 nm CMOS cell library. At the maximum clock frequency of 95 MHz, the estimated throughput is 1,105 Mbps and 810 Mbps in AES mode, 935 Mbps and 715 Mbps in ARIA mode, and 138 to 184 Mbps in GCM AE mode, depending on the key length.
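
For readers unfamiliar with the GHASH operation mentioned above, the sketch below shows the GF($2^{128}$) multiplication at its core in bit-serial reference form, following the multiplication algorithm in the GCM specification. It is not the paper's $128{\times}12$-bit partially parallel multiplier, whose structure the abstract does not describe. Blocks are modeled as Python integers with the first bit of the block as the most significant bit, and the function names are hypothetical.

```python
R = 0xE1 << 120  # reduction constant used by GCM's GF(2^128)


def gf128_mul(x, y):
    """Multiply two 128-bit blocks (given as ints) in GCM's GF(2^128)."""
    z = 0
    v = y
    for i in range(128):
        if (x >> (127 - i)) & 1:  # scan the bits of x, leftmost first
            z ^= v
        if v & 1:                 # multiply v by the field's "x", then reduce
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z


def ghash(h, blocks):
    """Chain Y_i = (Y_{i-1} XOR X_i) * H over a list of 128-bit blocks."""
    y = 0
    for block in blocks:
        y = gf128_mul(y ^ block, h)
    return y


if __name__ == "__main__":
    one = 1 << 127  # multiplicative identity in GCM's bit ordering
    a = 0x0123456789ABCDEF0123456789ABCDEF
    print(gf128_mul(a, one) == a)
```

The loop runs 128 iterations per block, one bit of `x` per iteration. A multiplier that consumes several bits per clock, as the paper's does, trades area for fewer cycles per block.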

Adaptive Learning Based on Bit-Significance Optimization with Hebbian Learning Rule and Its Electro-Optic Implementation (Hebb의 학습 법칙과 화소당 가중치 최소화 기법에 의한 적응학습 및 그의 전기광학적 구현)

  • Lee, Soo-Young;Shim, Chang-Sup;Koh, Sang-Ho;Jang, Ju-Seog;Shin, Sang-Yung
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.6 / pp.108-114 / 1989
  • By introducing bit significance into the Hopfield model and optimizing it, ten highly correlated binary images, the digits "0" to "9", are successfully stored and retrieved in a $6{\times}8$ node system. Unlike many other neural network models, this model has stronger error correction capability for correlated images such as "6", "8", "3", and "9". The bit-significance optimization is regarded as an adaptive learning process based on the least-mean-square error algorithm, and may be implemented with a Widrow-Hoff neural network optimizer. A design for electro-optic implementation, including the adaptive optimization networks, is also introduced.

Study on Chip Design & Implementation of 32 Bit Floating Point Compatible DSP (32비트 부동소수점 호환 DSP의 설계 및 칩 구현에 관한 연구)

  • Woo, Jong-Sik;Seo, Jin-Keun;Lim, Jae-Young;Park, Ju-Sung
    • Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.11 / pp.74-84 / 2000
  • This paper describes the procedures for the design and implementation of a DSP that is compatible with the TMS320C30 DSP. A CBS (Cycle Based Simulator) is developed to study the architecture of the target DSP. The simulator provides detailed information such as function block operation, control signal values, register state, and bus and memory values while an instruction is being executed. The RTL design is written in VHDL. Logic simulation and hardware emulation are used to verify correct operation of the design. The DSP is fabricated in 0.6 ${\mu}m$ CMOS technology. The chip has a complexity of 450,000 gates, an area of $9{\times}9$ $mm^2$, and an operating speed of 20 MIPS. Compatibility with the TMS320C30 is confirmed by running 109 of its 114 instructions and 13 kinds of algorithms on the developed DSP.

A Real-Time Implementation of Isolated Word Recognition System Based on a Hardware-Efficient Viterbi Scorer (효율적인 하드웨어 구조의 Viterbi Scorer를 이용한 실시간 격리단어 인식 시스템의 구현)

  • Cho, Yun-Seok;Kim, Jin-Yul;Oh, Kwang-Sok;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea / v.13 no.2E / pp.58-67 / 1994
  • Hidden Markov Model (HMM)-based algorithms have been used successfully in many speech recognition systems, especially large-vocabulary systems. Although general-purpose processors can be employed for such systems, they inevitably suffer from the computational complexity and the enormous amount of data. It is therefore essential for real-time speech recognition to develop specialized hardware that accelerates the recognition steps. This paper is concerned with a real-time implementation of an isolated word recognition system based on HMMs. The speech recognition system consists of a host computer (PC), a DSP board, and a prototype Viterbi scoring board. The DSP board extracts feature vectors from the speech signal. The Viterbi scoring board is implemented using three field-programmable gate array chips. It employs a hardware-efficient Viterbi scoring architecture and performs the Viterbi algorithm for HMM-based speech recognition. At a clock rate of 10 MHz, the system can update about 100,000 states within a single 10 ms frame.
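
The abstract does not describe the scorer's internals. As a point of reference, the sketch below shows a generic log-domain Viterbi scoring recursion together with the arithmetic behind the quoted throughput. The function names are hypothetical, and the default of one state update per clock cycle is an inference from the figures (10 MHz × 10 ms = 100,000 cycles), not something the abstract states.

```python
import math


def viterbi_score(log_init, log_trans, log_emit):
    """Best-path log-likelihood of an observation sequence under one HMM.

    log_init[s]      log probability of starting in state s
    log_trans[p][s]  log probability of moving from state p to state s
    log_emit[t][s]   log likelihood of frame t in state s
    """
    n = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n)]
    for frame in log_emit[1:]:
        # One "state update": best predecessor plus this frame's emission.
        score = [
            max(score[p] + log_trans[p][s] for p in range(n)) + frame[s]
            for s in range(n)
        ]
    return max(score)


def state_updates_per_frame(clock_hz, frame_ms, cycles_per_state=1):
    """State updates that fit in one frame (assumes a fixed cycle cost)."""
    return clock_hz * frame_ms // (1000 * cycles_per_state)


if __name__ == "__main__":
    log_init = [math.log(0.6), math.log(0.4)]
    log_trans = [[math.log(0.7), math.log(0.3)],
                 [math.log(0.4), math.log(0.6)]]
    log_emit = [[math.log(0.5), math.log(0.1)],
                [math.log(0.4), math.log(0.3)],
                [math.log(0.1), math.log(0.6)]]
    print(viterbi_score(log_init, log_trans, log_emit))
    print(state_updates_per_frame(10_000_000, 10))
```

In isolated word recognition, a score like this would be computed for each word model and the highest-scoring word selected. Working in the log domain turns the products of probabilities into additions, which is also what makes the recursion cheap in hardware.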
