• Title/Summary/Keyword: low-latency processing


A Study on Implementation of the High Speed Feature Extraction System Based on Block Type Classification (블록 유형 분류 알고리즘 기반 고속 특징추출 시스템 구현에 관한 연구)

  • Lee, Juseong; An, Ho-Myoung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.12 no.3, pp.186-191, 2019
  • In this paper, we propose an implementation approach for a high-speed feature extraction algorithm. The proposed method is based on a block type classification algorithm that reduces computation time when a target macroblock is classified as a smooth block type containing no image features. Using 200 standard test images and a 64×64 macroblock size, we quantitatively identify that smooth blocks account for 29.5% of the total image area. This means that, within standard test images containing a variety of image content, the computational complexity can be reduced for 29.5% of the blocks. When the proposed approach is applied to Canny edge detection, the latency of every edge-detection stage (2D derivative filtering, gradient magnitude/direction computation, non-maximum suppression, adaptive threshold calculation, and hysteresis thresholding) can be eliminated for such blocks. We also expect that the operation time of feature detection can be reduced by applying the block type classification algorithm to various other feature extraction algorithms in the same way.
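
A minimal Python sketch of the block-classification idea in the abstract above, assuming a simple variance test as the smooth-block criterion (the abstract does not specify the classifier); the 64×64 block size matches the paper, while the threshold is illustrative:

```python
# Macroblocks whose pixel variance falls below a threshold are treated
# as "smooth" and skipped by later edge-detection stages.
import numpy as np

BLOCK = 64            # macroblock size used in the paper
SMOOTH_VAR = 4.0      # assumed variance threshold for a featureless block

def classify_blocks(gray: np.ndarray) -> np.ndarray:
    """Return a boolean map: True where a macroblock is smooth."""
    h, w = gray.shape
    rows, cols = h // BLOCK, w // BLOCK
    smooth = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            blk = gray[r*BLOCK:(r+1)*BLOCK, c*BLOCK:(c+1)*BLOCK]
            smooth[r, c] = blk.var() < SMOOTH_VAR
    return smooth

if __name__ == "__main__":
    img = np.zeros((256, 256), dtype=np.uint8)
    img[64:192, 64:192] = np.random.randint(0, 255, (128, 128))  # textured center
    smooth = classify_blocks(img.astype(np.float64))
    print(f"skippable blocks: {smooth.mean():.1%}")
```

Blocks flagged smooth would skip every Canny stage listed above; only the remaining blocks pay the full edge-detection cost.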

Parallelized Architecture of Serial Finite Field Multipliers for Fast Computation (유한체 상에서 고속 연산을 위한 직렬 곱셈기의 병렬화 구조)

  • Cho, Yong-Suk
    • Journal of the Korea Institute of Information Security & Cryptology, v.17 no.1, pp.33-39, 2007
  • Finite field multipliers are basic building blocks in many applications such as error-control coding, cryptography, and digital signal processing. Hence, the design of efficient dedicated finite field multiplier architectures can lead to dramatic improvements in overall system performance. In this paper, a new bit-serial structure for a low-latency multiplier over a Galois field is presented. To speed up multiplication, we divide the product polynomial into several parts and process them in parallel. The proposed multiplier operates over the standard basis of GF(2^m) and is faster than bit-serial designs while having lower area complexity than bit-parallel ones. The most significant feature of the proposed architecture is that a trade-off between hardware complexity and delay time can be achieved.
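
The splitting idea above can be modeled in software: a digit-serial GF(2^m) multiplier processes D independent partial products per step instead of one. The sketch below is an assumed model, not the paper's hardware; GF(2^8) with the AES polynomial is chosen purely so the result can be checked against a known test vector:

```python
# Digit-serial multiplication over the standard basis of GF(2^m): the
# partial products are split into D-bit digits that a hardware design
# could process in parallel, cutting the serial step count from m to
# ceil(m/D).
M = 8
POLY = 0x11B          # x^8 + x^4 + x^3 + x + 1 (example irreducible polynomial)
D = 4                 # partial products processed per "cycle" (parallelism factor)

def gf_mul_digit_serial(a: int, b: int) -> int:
    acc = 0
    for digit in range(0, M, D):             # ceil(M/D) serial steps
        # the D partial products below are independent -> parallel in hardware
        for i in range(digit, min(digit + D, M)):
            if (b >> i) & 1:
                acc ^= a << i
    # reduce the degree-(2m-2) product modulo the field polynomial
    for bit in range(2 * M - 2, M - 1, -1):
        if (acc >> bit) & 1:
            acc ^= POLY << (bit - M)
    return acc

assert gf_mul_digit_serial(0x57, 0x83) == 0xC1   # known GF(2^8) test vector
```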

APBT-JPEG Image Coding Based on GPU

  • Wang, Chengyou; Shan, Rongyang; Zhou, Xiao
    • KSII Transactions on Internet and Information Systems (TIIS), v.9 no.4, pp.1457-1470, 2015
  • In wireless multimedia sensor networks (WMSN), transmission latency is an increasingly serious problem. As resolutions improve, the time spent on image and video compression keeps growing, which seriously degrades the real-time performance of WMSN. The core of the JPEG system is the DCT, but DCT-based JPEG is not the best choice: block-based DCT coding suffers from severe blocking artifacts when images are highly compressed at low bit rates. This paper uses the all phase biorthogonal transform (APBT) to address that problem, but APBT lacks a fast algorithm. We therefore analyze the structure of JPEG and propose a parallel framework that accelerates the JPEG algorithm on a GPU, replacing the discrete cosine transform (DCT) with the APBT for better reconstructed image quality. The resulting parallel APBT-JPEG addresses both the real-time requirements of WMSN and the blocking artifacts of DCT-JPEG. We design the parallel APBT-JPEG algorithm with NVIDIA's CUDA toolkit. Experimental results show that the parallel APBT-JPEG achieves a speedup of more than 100 times over conventional serial APBT-JPEG even on a low-end GPU, and the reconstructed images outperform DCT-JPEG in both objective quality and subjective effect. The proposed GPU-parallel APBT can also be applied to image compression, video compression, edge detection, and other fields of image processing.
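
The batched, block-parallel structure a CUDA kernel would exploit can be previewed in NumPy: every 8×8 block is transformed by one pair of matrix products. The DCT-II basis below is a stand-in, since the APBT matrix is not reproduced in the abstract; the paper's kernel would substitute the APBT basis:

```python
# Block-parallel transform stage: all 8x8 blocks of the image are
# transformed in one batched matrix product (the same data layout a
# GPU kernel uses, one thread block per image block).
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix (placeholder for the APBT matrix).
k = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos((2*k[None, :] + 1) * k[:, None] * np.pi / (2*N))
C[0, :] = 1.0 / np.sqrt(N)

def blockwise_transform(img: np.ndarray) -> np.ndarray:
    h, w = img.shape
    blocks = img.reshape(h//N, N, w//N, N).transpose(0, 2, 1, 3)  # (bh,bw,8,8)
    coeff = C @ blocks @ C.T        # batched: all blocks transformed at once
    return coeff.transpose(0, 2, 1, 3).reshape(h, w)

img = np.random.rand(512, 512)
coeffs = blockwise_transform(img)
print(coeffs.shape)
```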

Analysis of Open Source Edge Computing Platforms: Architecture, Features, and Comparison (오픈 소스 엣지 컴퓨팅 플랫폼 분석: 구조, 특징, 비교)

  • Lim, Huhnkuk; Lee, Heejin
    • Journal of the Korea Institute of Information and Communication Engineering, v.24 no.8, pp.985-992, 2020
  • Edge computing is a technology that prepares for a new era beyond cloud computing. Instead of processing and computing data in a remote data center, edge computing achieves low-latency/high-speed computing by adding computing and data processing power at the edge, close to access points such as terminal devices and gateways. Types of edge computing include mobile edge computing, fog computing, and cloudlet computing. In this article, we describe existing open source platforms for implementing edge computing nodes. By presenting and comparing the structure and features of open source edge platforms, we provide the knowledge that industrial engineers need to select the best edge platform when building an edge node with an actual open source edge computing platform.

A Low-Power Texture Mapping Technique for Mobile 3D Graphics (모바일 3D 그래픽스를 위한 저전력 텍스쳐 맵핑 기법)

  • Kim, Hyun-Hee; Kim, Ji-Hong
    • Journal of the Korea Society of Computer and Information, v.14 no.2, pp.45-57, 2009
  • Texture mapping is a technique used to add realism to an image in 3D graphics. However, this technique becomes a bottleneck in the 3D graphics pipeline because it requires large processing power and high memory bandwidth. A texture cache is used to reduce memory latency in texture mapping. As portable devices become smaller and power-constrained, it is important to reduce the area and power consumption of the texture cache. In this paper, we propose using a small texture cache to reduce both, and we further propose prefetch techniques and a victim cache to keep performance comparable to that of large texture caches. Simulation results show that the proposed small texture cache of 1-2 KB can reduce area and power consumption by up to 70% and 60%, respectively, compared to a conventional 16 KB cache while maintaining performance.
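
A small trace-driven model makes the victim-cache argument above concrete: conflict misses in a tiny direct-mapped cache are absorbed by a few fully associative victim entries. This is an illustrative sketch, not the paper's hardware, and the sizes are assumptions:

```python
# Blocks evicted from the small direct-mapped main cache get a second
# chance in a small fully associative victim buffer, recovering much of
# the hit rate a larger cache would provide.
from collections import OrderedDict

class TextureCache:
    def __init__(self, n_sets=32, victim_entries=4):
        self.n_sets = n_sets
        self.main = [None] * n_sets                    # direct-mapped: one tag per set
        self.victim = OrderedDict()                    # small fully associative LRU
        self.victim_entries = victim_entries
        self.hits = self.misses = 0

    def access(self, block_addr: int) -> None:
        s = block_addr % self.n_sets
        if self.main[s] == block_addr:
            self.hits += 1
            return
        if block_addr in self.victim:                  # victim hit: swap back
            self.victim.pop(block_addr)
            self.hits += 1
        else:
            self.misses += 1                           # fetch from memory
        evicted, self.main[s] = self.main[s], block_addr
        if evicted is not None:
            self.victim[evicted] = True
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)        # drop LRU victim

cache = TextureCache()
for addr in [0, 32, 0, 32, 64, 0]:                     # all map to set 0 (conflicts)
    cache.access(addr)
print(cache.hits, cache.misses)                        # victim cache absorbs the conflicts
```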

Fast Join Mechanism that considers the switching of the tree in Overlay Multicast (오버레이 멀티캐스팅에서 트리의 스위칭을 고려한 빠른 멤버 가입 방안에 관한 연구)

  • Cho, Sung-Yean; Rho, Kyung-Taeg; Park, Myong-Soon
    • The KIPS Transactions: Part C, v.10C no.5, pp.625-634, 2003
  • More than a decade after its initial proposal, deployment of IP multicast has been limited by problems such as traffic control in multicast routing, multicast address allocation on the global Internet, and reliable multicast transport. Lately, with the growth of multicast application services such as Internet broadcasting and real-time security information services, overlay multicast has been developed as a new Internet multicast technology. In this paper, we describe an overlay multicast protocol and propose a fast join mechanism that considers tree switching. To find a potential parent, the existing search algorithm descends the tree from the root one level at a time, which causes long join latency. It also tries to select the nearest node as a potential parent, but the node's degree limit may prevent this, so the generated tree has low efficiency. To reduce join latency and improve tree efficiency, we propose searching two levels of the tree at a time. This method forwards the join request to a node's own children, so there is no overhead to maintain the tree in the steady state; when a join request arrives, the larger number of search messages reduces the join latency, and searching more nodes helps construct more efficient trees. To evaluate the performance of our fast join mechanism, we measure metrics such as the search latency, the number of searched nodes, and the number of switches as functions of the number of members and the degree limit. The simulation results show that our mechanism outperforms the existing one.
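
The two-level descent can be sketched as follows, assuming each probed node replies with both its children and grandchildren and that a position difference stands in for a real latency probe; the structures and names are hypothetical, not the paper's message formats:

```python
# Instead of descending the overlay tree one level per probe, each
# probed node also reports its grandchildren, so the joiner examines
# two levels per round and roughly halves the number of join rounds.
class Node:
    def __init__(self, nid, pos, degree_limit=2):
        self.nid, self.pos, self.degree_limit = nid, pos, degree_limit
        self.children = []

    def has_room(self):
        return len(self.children) < self.degree_limit

def rtt(a, b):                         # stand-in for a real latency probe
    return abs(a.pos - b.pos)

def two_level_join(root, joiner):
    """Return (chosen_parent, probe_rounds) for the joining node."""
    current, rounds = root, 0
    while True:
        rounds += 1
        # a single reply from `current` lists children and grandchildren
        two_levels = current.children + \
                     [g for c in current.children for g in c.children]
        options = [n for n in two_levels if n.has_room()]
        if current.has_room():
            options.append(current)
        best = min(options, key=lambda n: rtt(n, joiner))
        if best is current:            # nothing below is closer: attach here
            current.children.append(joiner)
            return current, rounds
        current = best                 # skip ahead up to two levels

# toy overlay: root at 0 with children at 10 and 50, grandchild at 12
root = Node(0, 0); a = Node(1, 10); b = Node(2, 50); c = Node(3, 12)
root.children = [a, b]; a.children = [c]
parent, rounds = two_level_join(root, Node(4, 13))
print(parent.nid, rounds)              # attaches near position 13 in 2 rounds
```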

Efficient DRAM Buffer Access Scheduling Techniques for SSD Storage System (SSD 스토리지 시스템을 위한 효율적인 DRAM 버퍼 액세스 스케줄링 기법)

  • Park, Jun-Su; Hwang, Yong-Joong; Han, Tae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD, v.48 no.7, pp.48-56, 2011
  • Recently, SSDs (Solid State Disks) based on NAND flash memory have been gradually replacing HDDs (Hard Disk Drives) in mobile devices, and a variety of research efforts seek cost-effective ways to improve their performance. As the number of NAND flash channels is increased to raise bandwidth through parallel processing, the DRAM buffer, which acts as a buffer cache between the host (PC) and the NAND flash, becomes the bottleneck. To resolve this problem, this paper proposes an efficient low-cost scheme that increases SSD performance by improving DRAM buffer bandwidth through scheduling techniques that exploit DRAM multi-banks. When the host and the NAND flash multi-channels request access to the DRAM buffer concurrently, the proposed technique checks their destinations and schedules them appropriately, considering the properties of DRAM. It significantly reduces the overheads of bank active time and row latency and thus optimizes DRAM buffer bandwidth utilization. The results reveal that the proposed technique improves SSD performance by 47.4% in read and 47.7% in write operations compared to conventional methods, with negligible changes and increases in the hardware.
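
The benefit of bank-aware reordering can be illustrated with a toy timing model: requests that hit an already-open row skip the activate penalty. The timing numbers and the sort-based scheduler below are assumptions for illustration, not the paper's controller:

```python
# Pending host and NAND-channel requests are reordered so that accesses
# to an already-open row in each bank are served back to back, avoiding
# the repeated row activations a naive FIFO order keeps paying.
ACTIVATE = 5   # cycles to open a DRAM row (assumed)
CAS = 1        # cycles per column access once the row is open (assumed)

def cost(schedule):
    open_row, cycles = {}, 0
    for bank, row in schedule:
        if open_row.get(bank) != row:      # row miss: pay the activate penalty
            cycles += ACTIVATE
            open_row[bank] = row
        cycles += CAS
    return cycles

def bank_aware(requests):
    """Serve same-bank, same-row requests back to back."""
    return sorted(requests, key=lambda r: (r[0], r[1]))

# interleaved host/NAND-channel traffic as (bank, row) pairs
reqs = [(0, 1), (0, 2), (0, 1), (0, 2), (1, 7), (0, 1)]
print("FIFO order :", cost(reqs))             # 36 cycles: every access re-activates
print("bank-aware :", cost(bank_aware(reqs))) # 21 cycles: row hits grouped
```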

A Disjoint Multi-path Routing Protocol for Efficient Transmission of Collecting Data in Wireless Sensor Network (무선 센서 네트워크에서 수집 데이터의 효과적인 전송을 위한 비겹침 다중경로 라우팅 프로토콜)

  • Han, Dae-Man; Lim, Jae-Hyun
    • The KIPS Transactions: Part C, v.17C no.5, pp.433-440, 2010
  • Energy efficiency, low latency, and scalability are important requirements for wireless sensor networks. In particular, a wireless sensor network consisting of a large number of sensor nodes must minimize the energy consumption of each node to extend the network lifetime under limited battery power. An efficient algorithm and energy management techniques that minimize the energy consumption at each sensor node are also required to improve the transfer rate. Thus, this paper proposes a disjoint multi-path routing protocol for sensor data transmission in wireless sensor network environments. The proposed scheme minimizes network overhead by transmitting sensor data over discovered multi-paths that are added to the routing table. The proposed routing protocol minimizes the energy consumption at each node and thus prolongs the lifetime of the sensor network, regardless of whether the sink node is located inside or outside the received signal strength range. To verify the validity of the proposed scheme, we construct sensor networks that adapt the current model using real data and evaluate the total energy consumption.
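
A generic greedy routine conveys the disjoint-path idea, though it is not the paper's protocol: after each path is found, its intermediate nodes are blocked so the next path cannot overlap it. Being greedy, it does not guarantee the maximum number of disjoint paths:

```python
# Find node-disjoint paths from a source sensor to the sink by repeated
# BFS, removing the intermediate nodes of each found path.
from collections import deque

def bfs_path(adj, src, dst, blocked):
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:                 # reconstruct the path back to src
            path = []
            while u is not None:
                path.append(u); u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and v not in blocked:
                prev[v] = u
                q.append(v)
    return None

def disjoint_paths(adj, src, dst, k=2):
    blocked, paths = set(), []
    for _ in range(k):
        p = bfs_path(adj, src, dst, blocked)
        if p is None:
            break
        paths.append(p)
        blocked.update(p[1:-1])      # block intermediate nodes only
    return paths

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(disjoint_paths(adj, 0, 3))     # [[0, 1, 3], [0, 2, 3]]
```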

Design and Performance Evaluation of Expansion Buffer Cache (확장 버퍼 캐쉬의 설계 및 성능 평가)

  • Hong, Won-Kee
    • The KIPS Transactions: Part A, v.11A no.7 s.91, pp.489-498, 2004
  • The VLIW processor is considered an appropriate processor for embedded systems, providing high performance and low power consumption thanks to its simple hardware structure. Unfortunately, the VLIW processor often suffers from high memory access latency due to the variable length of I-packets, which consist of independent instructions to be issued in parallel. Because of this variable length, some I-packets, called straddle I-packets, must be placed across two cache blocks, so two cache accesses are required to fetch them. In this paper, an expansion buffer cache is proposed to improve both the instruction fetch bandwidth and the power consumption of the I-cache at moderate hardware cost. The expansion buffer cache adds a small expansion buffer, containing a fraction of a straddle packet, alongside the main cache to reduce the additional cache accesses caused by straddle I-packets. With a great reduction in such cache accesses, the expansion buffer cache achieves a 5-9% improvement over conventional I-caches in the Delay·Power·Area metric.
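
A small counting model shows why straddle I-packets double the fetch cost and how the expansion buffer removes the second access; the block and buffer sizes below are assumptions, not the paper's parameters:

```python
# Whenever a packet spills over a cache-block boundary, the baseline
# I-cache needs two accesses; if the spillover fits in the expansion
# buffer (which holds the first bytes of the next block), one suffices.
BLOCK = 32            # I-cache block size in bytes (assumed)
EXP = 8               # spillover bytes held by the expansion buffer (assumed)

def fetch_accesses(pc: int, packet_len: int, use_expansion: bool) -> int:
    end = pc + packet_len - 1
    if pc // BLOCK == end // BLOCK:
        return 1                           # packet fits in one block
    spill = end % BLOCK + 1                # bytes landing in the second block
    if use_expansion and spill <= EXP:
        return 1                           # spillover served by the buffer
    return 2                               # straddle: two cache accesses

packets = [(0, 16), (24, 12), (60, 8), (90, 20)]   # (address, length) pairs
for use_exp in (False, True):
    total = sum(fetch_accesses(pc, n, use_exp) for pc, n in packets)
    print("expansion" if use_exp else "baseline ", total)   # 7 -> 5 accesses
```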

Home Screen Adaptive Next Generation Broadcasting Service using MSA-ABR (MSA-ABR을 이용한 홈 스크린 적응형 차세대 방송 서비스 연구)

  • Mariappan, Vinayagam; Lee, Minwoo; Lee, Seungyoun; Lee, Junghoon; Lee, Juyoung; Lim, Yunsik; Cha, Jaesang
    • Journal of Satellite, Information and Communications, v.11 no.3, pp.37-42, 2016
  • In today's highly complex video and broadcast operations, broadcasters are constantly challenged to reliably deliver low-latency, high-quality video to multiscreen audiences on-air and online. Adaptive Bit Rate (ABR) protocols deliver Internet video to a wide range of multiscreen devices, but video quality is often marginal and would prove unacceptable for valued linear broadcast content delivered to the big screen today. Advances in media processing for ABR enable service providers to take control and offer quality-managed linear video services to all screens in the home, including the big screen, over a single unified IP video infrastructure. This paper proposes a new Multiscreen-Assisted ABR (MSA-ABR) delivery management system that uses cloud-based multicast-assisted ABR for a broadcast facility, performing the routing of contribution content and online publishing services within a virtual, centralized cloud infrastructure.