• Title/Summary/Keyword: memory compression


Implementation of FPGA-based Accelerator for GRU Inference with Structured Compression (구조적 압축을 통한 FPGA 기반 GRU 추론 가속기 설계)

  • Chae, Byeong-Cheol
    • Journal of the Korea Institute of Information and Communication Engineering, v.26 no.6, pp.850-858, 2022
  • To deploy Gated Recurrent Units (GRU) on resource-constrained embedded devices, this paper presents a reconfigurable FPGA-based GRU accelerator that enables structured compression. First, a dense GRU model is significantly reduced in size by hybrid quantization and structured top-k pruning. Second, energy consumption for external memory access is greatly reduced by the proposed reuse computing pattern. Finally, the accelerator can handle a structured sparse model that benefits from an algorithm-hardware co-design workflow. Moreover, inference tasks can be performed flexibly across all functional dimensions, sequence lengths, and numbers of layers. Implemented on the Intel DE1-SoC FPGA, the proposed accelerator achieves 45.01 GOPs on a structured sparse GRU network without batching. Compared to CPU and GPU implementations, the low-cost FPGA accelerator achieves 57x and 30x improvements in latency and 300x and 23.44x improvements in energy efficiency, respectively. The proposed accelerator thus serves as an early study for real-time embedded applications and demonstrates potential for further development.
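
The structured top-k pruning and quantization that the abstract describes can be illustrated with a minimal NumPy sketch; the block size, keep ratio, and 8-bit quantization below are illustrative assumptions rather than the paper's actual parameters.

```python
import numpy as np

def structured_topk_prune(weights, block_size=16, keep_ratio=0.25):
    """Keep only the top-k weight blocks (by L2 norm) in each row; zero the rest."""
    rows, cols = weights.shape
    pruned = np.zeros_like(weights)
    blocks_per_row = cols // block_size
    k = max(1, int(blocks_per_row * keep_ratio))
    for r in range(rows):
        blocks = weights[r, :blocks_per_row * block_size].reshape(blocks_per_row, block_size)
        norms = np.linalg.norm(blocks, axis=1)
        keep = np.argsort(norms)[-k:]          # indices of the k strongest blocks
        for b in keep:
            pruned[r, b * block_size:(b + 1) * block_size] = blocks[b]
    return pruned

def quantize(weights, bits=8):
    """Uniform symmetric quantization onto a fixed-point grid."""
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    return np.round(weights / scale) * scale, scale

W = np.random.randn(128, 256).astype(np.float32)
W_sparse = structured_topk_prune(W)
W_q, s = quantize(W_sparse)
print("nonzero fraction:", np.count_nonzero(W_q) / W_q.size)
```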

Design of Scalable Intra-prediction Architecture for H.264 Decoders (H.264 복호기를 위한 스케일러블 인트라 예측기 구조 설계)

  • Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD, v.45 no.11, pp.77-82, 2008
  • H.264 is a video coding standard of ITU-T and ISO/IEC whose use has spread widely owing to its compression ratio, more than twice that of MPEG-2, and its high image quality. Decoder architectures differ depending on requirements, since the standard is applied to images from small QVGA sizes up to HD. In this paper, we propose a scalable architecture for intra prediction in H.264 decoders. The proposed scheme can accommodate up to four processing elements depending on performance requirements and reduces the number of memory accesses through efficient memory management, making it energy-efficient. We design the intra-prediction unit in Verilog-HDL and verify it by prototyping on an FPGA. The performance is analyzed using the design results.
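
As background on what an intra-prediction processing element computes, the following sketch implements two of the standard H.264 4x4 luma prediction modes (vertical and DC); it is a behavioural illustration only and says nothing about the paper's hardware architecture.

```python
import numpy as np

def intra4x4_vertical(top):
    """Mode 0: copy the four reconstructed pixels above the block into every row."""
    return np.tile(np.asarray(top, dtype=np.int32), (4, 1))

def intra4x4_dc(top, left):
    """Mode 2: predict every pixel as the rounded mean of the top and left neighbours."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return np.full((4, 4), dc, dtype=np.int32)

top = [100, 102, 104, 106]   # reconstructed pixels above the 4x4 block
left = [98, 99, 101, 103]    # reconstructed pixels to the left of the block
print(intra4x4_vertical(top))
print(intra4x4_dc(top, left))
```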

A Low Cost Instruction Set for Bit Stream Process (비트열 처리를 위한 저비용 명령어 세트)

  • Ham, Dong-Hyeon;Lee, Hyoung-Pyo;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea CI, v.45 no.2, pp.41-47, 2008
  • Most media compression CODECs adopt variable-length coding. This paper proposes special registers and an instruction set for bit-stream processing in order to accelerate the decoding of variable-length codes. The instruction set shares the conventional data path to minimize additional cost, and the bit stream is read from memory instead of through a special port. The instruction set therefore requires minimal changes to the processor, can be adopted without any additional input controller or buffer, and accelerates the decoding of variable-length codes. The data path for the instruction set requires an additional 65 bits of memory and 344 equivalent gates, with 0.19 ns of delay in TSMC 0.25 μm technology. The instruction set reduced the execution time of variable-length code decoding in H.264/AVC by about 55%.
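
The bit-stream reads that such instructions accelerate can be modelled in software as follows; the Exp-Golomb decode is shown as one representative variable-length code used in H.264/AVC, and the class and method names are illustrative.

```python
class BitReader:
    """Software model of a bit-stream reader: serves arbitrary-length bit fields
    fetched from memory, the operation a hardware bit-stream unit accelerates."""
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def read_ue(self) -> int:
        """Decode one unsigned Exp-Golomb code (the H.264 'ue(v)' syntax element)."""
        zeros = 0
        while self.read_bits(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + (self.read_bits(zeros) if zeros else 0)

r = BitReader(bytes([0b01011010]))  # '010' decodes to 1, then '1' decodes to 0
print(r.read_ue(), r.read_ue())
```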

Cross Compressed Replication Scheme for Large-Volume Column Storages (대용량 컬럼 저장소를 위한 교차 압축 이중화 기법)

  • Byun, Siwoo
    • Journal of the Korea Academia-Industrial cooperation Society, v.14 no.5, pp.2449-2456, 2013
  • Column-oriented database storage is a highly advanced model for large-volume data analysis systems because of its superior I/O performance. Traditional data storage uses a row-oriented layout in which the attributes of a record are placed contiguously on disk for fast write operations. For search-mostly data warehouse systems, however, column-oriented storage has become the more appropriate model because of its superior read performance. Recently, solid-state drives using MLC flash memory have become widely recognized as the preferred storage media for high-speed data analysis systems. In this paper, we introduce a fast column-oriented data storage model and then propose a new storage management scheme that uses cross compressed replication for high-speed column-oriented data warehouse systems. Our scheme, based on two MLC SSDs, achieves superior performance and reliability by cross-replicating an uncompressed segment and a compressed segment under high CPU and I/O workloads. Based on the performance evaluation, we conclude that our storage management scheme outperforms the traditional scheme in terms of update throughput and response time for column segments.
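
A rough software model of the cross compressed replication idea, as described in this abstract, might look like the sketch below; the class, the per-segment alternation rule, and the use of zlib are assumptions made for illustration, not the paper's actual design.

```python
import zlib

class CrossReplicatedStore:
    """Illustrative model: each column segment is kept twice, uncompressed on one
    drive (fast reads) and compressed on the other (space-efficient replica).
    The roles alternate per segment so both drives carry a balanced mix."""
    def __init__(self):
        self.drives = [{}, {}]  # two MLC-SSD-like key/value stores

    def write_segment(self, seg_id: int, data: bytes):
        primary = seg_id % 2              # uncompressed copy goes to this drive
        replica = 1 - primary             # compressed copy goes to the other drive
        self.drives[primary][seg_id] = ("raw", data)
        self.drives[replica][seg_id] = ("zip", zlib.compress(data))

    def read_segment(self, seg_id: int) -> bytes:
        kind, payload = self.drives[seg_id % 2][seg_id]   # prefer the raw copy
        return payload if kind == "raw" else zlib.decompress(payload)

store = CrossReplicatedStore()
store.write_segment(0, b"column values " * 100)
assert store.read_segment(0) == b"column values " * 100
```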

3-D Wavelet Compression with Lifting Scheme for Rendering Concentric Mosaic Image (동심원 모자이크 영상 표현을 위한 Lifting을 이용한 3차원 웨이브렛 압축)

  • Jang Sun-Bong;Jee Inn-Ho
    • Journal of Broadcast Engineering, v.11 no.2 s.31, pp.164-173, 2006
  • The data structure of a concentric mosaic can be regarded as a video sequence captured by a slowly panning camera, so a concentric mosaic is constructed by matching and aligning video sequences. The concentric mosaic also requires a huge amount of memory, so compression is essential for its practical use, along with an algorithm that decodes individual scenes while the data structure stays compressed. In this paper, we use a 3-D lifting transform to compress the concentric mosaic. The lifting transform retains the merits of the wavelet transform while reducing computation and memory. Because neighboring frames are highly correlated, the complexity of extracting a scene from the 3-D transformed bitstream increases. To improve performance and reduce the complexity of scene extraction, we apply the 3-D lifting transform and then compress the transformed data sequentially, frame by frame, so that each frame has a flexible bit rate. We also propose an algorithm that decodes a scene while the compressed data structure is maintained, using the properties of the lifting structure.
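
The lifting scheme the paper builds on can be illustrated with the standard Le Gall 5/3 wavelet, whose predict and update steps are shown below for one dimension; applying the same steps along width, height, and the frame axis yields a 3-D transform. The circular boundary handling here is an illustrative choice.

```python
import numpy as np

def lifting_53_forward(x):
    """One level of the Le Gall 5/3 wavelet via lifting: predict, then update."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # predict: high-pass = odd samples minus the average of neighbouring evens
    odd -= (even + np.roll(even, -1)) // 2
    # update: low-pass = even samples plus a quarter of neighbouring high-pass values
    even += (odd + np.roll(odd, 1) + 2) // 4
    return even, odd   # (approximation, detail)

def lifting_53_inverse(lo, hi):
    """Undo the update and predict steps exactly (integer lifting is lossless)."""
    even = lo - (hi + np.roll(hi, 1) + 2) // 4
    odd = hi + (even + np.roll(even, -1)) // 2
    out = np.empty(even.size + odd.size, dtype=np.int64)
    out[0::2], out[1::2] = even, odd
    return out

sig = np.arange(16)
lo, hi = lifting_53_forward(sig)
assert np.array_equal(lifting_53_inverse(lo, hi), sig)
```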

Hardware Design for JBIG2 Huffman Coder (JBIG2 허프만 부호화기의 하드웨어 설계)

  • Park, Kyung-Jun;Ko, Hyung-Hwa
    • Journal of Korea Multimedia Society, v.12 no.2, pp.200-208, 2009
  • For a JBIG2 FAX to be implemented in embedded equipment, JBIG2, the next-generation standard for binary image compression, must be available as hardware modules. This paper proposes a high-speed hardware module for the JBIG2 Huffman coder. The Huffman coder of JBIG2 selectively uses 15 Huffman tables. Because the coder is designed to use minimal data and manage memory efficiently, high-speed processing is possible. The designed Huffman coder was ported to a Virtex-4 FPGA and operates together with software modules on an embedded development board using a MicroBlaze core. The IP was verified through functional simulation and a hardware-software co-operation test. Experimental results show that, thanks to the efficient memory usage of the hardware design, the processing time is 10 times faster than a software-only implementation on the embedded system.
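
A Huffman (prefix-code) table decode, the operation such a hardware module accelerates, can be sketched as follows; the table entries below are illustrative and are not one of the fifteen standard JBIG2 tables.

```python
class HuffmanTable:
    """Prefix-code lookup: maps a code word (bit string) to a decoded symbol."""
    def __init__(self, codes):
        self.codes = codes  # e.g. {"0": 0, "10": 1, "110": 2, "111": 3}

    def decode(self, bits):
        """Decode a whole bit string into a list of symbols."""
        out, current = [], ""
        for b in bits:
            current += b
            if current in self.codes:       # a complete code word has been seen
                out.append(self.codes[current])
                current = ""
        return out

table = HuffmanTable({"0": 0, "10": 1, "110": 2, "111": 3})
print(table.decode("0101100111"))   # -> [0, 1, 2, 0, 3]
```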


Design of Memory-Efficient Deterministic Finite Automata by Merging States With The Same Input Character (동일한 입력 문자를 가지는 상태의 병합을 통한 메모리 효율적인 결정적 유한 오토마타 구현)

  • Choi, Yoon-Ho
    • Journal of the Korea Institute of Information Security & Cryptology, v.23 no.3, pp.395-404, 2013
  • A pattern matching algorithm plays an important role in identifying and classifying traffic against predefined patterns for intrusion detection and prevention. As attacks become more prevalent and complex, current patterns are written as regular expressions (regexes), which are compiled into deterministic finite automata (DFA) because of the guaranteed worst-case performance of DFA-based matching. Because of the increasing complexity and number of regex patterns, memory-efficient DFAs obtained through state reduction have become the mainstay of the pattern matching process. However, most previous work has focused on reducing the number of states within a single automaton, so a state blow-up problem remains when the number of patterns is large. To solve this problem, we propose a new state compression algorithm that merges states across multiple automata. We show that by merging states with the same input character across multiple automata, the proposed algorithm reduces the number of states in the original DFA by as much as 40.0% on average.
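
A simplified sketch of state merging across multiple automata is given below; it merges states whose accept flag and outgoing transitions are identical, which is a stand-in for, not a reproduction of, the paper's same-input-character merging rule.

```python
def merge_states(transitions, accepting):
    """Iteratively merge states that share the same accept flag and the same
    outgoing transition table, remapping edges to a single representative."""
    while True:
        # signature = (is_accepting, sorted outgoing edges)
        sig = {s: (s in accepting, tuple(sorted(t.items())))
               for s, t in transitions.items()}
        rep, remap = {}, {}
        for s, g in sig.items():
            remap[s] = rep.setdefault(g, s)   # first state seen becomes representative
        if all(remap[s] == s for s in transitions):
            return transitions, accepting
        transitions = {s: {c: remap[d] for c, d in t.items()}
                       for s, t in transitions.items() if remap[s] == s}
        accepting = {remap[s] for s in accepting}

# two chain DFAs for the literal patterns "abc" and "xbc"
transitions = {0: {'a': 1}, 1: {'b': 2}, 2: {'c': 3}, 3: {},
               4: {'x': 5}, 5: {'b': 6}, 6: {'c': 7}, 7: {}}
accepting = {3, 7}
t, a = merge_states(transitions, accepting)
print(len(t), "states after merging")   # 5 states instead of 8
```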

Reducing Method of Energy Consumption of Phase Change Memory using Narrow-Value Data (내로우 값을 이용한 상변화 메모리상에서의 에너지 소모 절감 기법)

  • Kim, Young-Ung
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.15 no.2, pp.137-143, 2015
  • For the past 30 years, DRAM has been used mainly for reasons of production economics. Recently, PRAM has emerged to overcome the shortcomings of DRAM. In this paper, we propose a technique that reduces energy consumption by applying narrow values to PRAM write operations. We describe a data compression method using narrow values and the corresponding PRAM architecture, and we experiment with the SimpleScalar 3.0e simulator and the SPEC CPU2000 benchmarks. According to the experiments, the data hit rate of PRAM increased by 39.4% to 67.7% and energy consumption was reduced by 9.2%. The proposed technique requires 3.12% space overhead per word and some additional hardware modules.
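
The narrow-value idea can be sketched as follows: a word whose upper bits are pure sign extension is stored in compressed form so fewer PRAM cells are programmed. The 16-bit threshold and the one-bit tag are illustrative assumptions, not the paper's parameters.

```python
NARROW_BITS = 16   # illustrative threshold: a 32-bit word is "narrow" if it
                   # fits in its low 16 bits after sign extension

def is_narrow(word: int) -> bool:
    """Treat word as a signed 32-bit value and test whether the upper half
    is pure sign extension of the lower half."""
    signed = word - (1 << 32) if word & (1 << 31) else word
    return -(1 << (NARROW_BITS - 1)) <= signed < (1 << (NARROW_BITS - 1))

def write_cost_in_bits(words):
    """Count payload bits actually written: narrow words store only their low
    half plus a one-bit tag, so fewer PRAM cells need to be programmed."""
    total = 0
    for w in words:
        total += (NARROW_BITS + 1) if is_narrow(w) else (32 + 1)
    return total

data = [3, -1 & 0xFFFFFFFF, 70000, 12, 0x80000000]
print(write_cost_in_bits(data), "bits instead of", 32 * len(data))
```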

Design of Prediction Unit for H.264 decoder (H.264 복호기를 위한 효율적인 예측 연산기 설계)

  • Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD, v.46 no.7, pp.47-52, 2009
  • The H.264 video coding standard is widely used owing to its high compression rate and quality. Motion compensation is the most time-consuming and complex unit in an H.264 decoder, and its performance is determined by the pixel interpolation computation and the management of reference pixels. Efficient memory management for reusing reference pixels read from external memory is necessary, along with high-performance interpolators. We propose a motion compensation unit architecture for H.264 decoders composed of two-dimensional circular register files, a motion vector predictor, and low-complexity, high-performance interpolators. The two-dimensional circular register files reuse reference pixel data as much as possible and feed them to the interpolators without latency or complex logic circuits. We design a motion compensation unit and an intra-prediction unit, integrate them into a prediction unit, and verify its operation and performance.
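
For reference, the half-sample luma interpolation that such interpolators implement uses the standard H.264 6-tap filter (1, -5, 20, 20, -5, 1); a scalar sketch is shown below, with edge clamping as an illustrative simplification.

```python
def halfpel_interpolate(row, x):
    """H.264 half-sample luma interpolation: 6-tap filter applied to the integer
    samples around position x, then rounded, right-shifted by 5, and clipped."""
    taps = (1, -5, 20, 20, -5, 1)
    samples = [row[min(max(x - 2 + i, 0), len(row) - 1)] for i in range(6)]  # clamp at edges
    acc = sum(t * s for t, s in zip(taps, samples))
    return min(max((acc + 16) >> 5, 0), 255)

row = [10, 20, 40, 80, 120, 160, 200, 220]
# half-pel sample between row[3] and row[4]
print(halfpel_interpolate(row, 3))
```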

Processing Method of Mass Small File Using Hadoop Platform (하둡 플랫폼을 이용한 대량의 스몰파일 처리방법)

  • Kim, Chang-Bok;Chung, Jae-Pil
    • Journal of Advanced Navigation Technology, v.18 no.4, pp.401-408, 2014
  • Hadoop consists of the MapReduce programming model for distributed processing and the HDFS distributed file system. Hadoop is a suitable framework for big data processing, but processing a large number of small files causes several problems: one mapper is created per file, and a large amount of namenode memory is needed to store file metadata. This paper compares and evaluates several methods for processing large numbers of small files on the Hadoop platform. Processing with general compression formats is inadequate because each file is handled by a single mapper regardless of data size. Processing with sequence files or Hadoop archive files removes the namenode memory problem by compressing and combining small files, and Hadoop archive files are faster than sequence files in terms of small-file combining time. Processing with the CombineFileInputFormat class does not require combining small files in advance and achieves speeds similar to big-data processing of large files.
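
The namenode memory problem mentioned above can be made concrete with a back-of-the-envelope sketch; the 150-bytes-per-object figure is a commonly cited rough estimate, and the packing model is deliberately simplified.

```python
BYTES_PER_OBJECT = 150            # rough, commonly cited namenode heap cost per file/block object
BLOCK_SIZE = 128 * 1024 * 1024    # default HDFS block size

def namenode_heap_bytes(num_files, avg_file_size):
    """Approximate namenode heap needed for file and block metadata."""
    blocks_per_file = max(1, -(-avg_file_size // BLOCK_SIZE))  # ceiling division
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

small = namenode_heap_bytes(10_000_000, 100 * 1024)              # ten million raw 100 KB files
total_data = 10_000_000 * 100 * 1024
packed = namenode_heap_bytes(-(-total_data // BLOCK_SIZE), BLOCK_SIZE)  # same data packed into block-sized containers
print(f"raw small files: {small / 1e9:.1f} GB of namenode heap")
print(f"packed (sequence/HAR-style): {packed / 1e6:.1f} MB of namenode heap")
```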