• Title/Summary/Keyword: low-latency processing

Energy-Efficient Signal Processing Using FPGAs (FPGA 상에서 에너지 효율이 높은 병렬 신호처리 기법)

  • Jang Ju-wook;Hwang Yunil;Scrofano Ronald;Prasanna Viktor K.
    • The KIPS Transactions:PartA / v.12A no.4 s.94 / pp.305-312 / 2005
  • In this paper, we present algorithm-level techniques for energy-efficient design on FPGAs. We then use these techniques to create energy-efficient designs for two signal processing kernels: the fast Fourier transform (FFT) and matrix multiplication. We evaluate the performance of FPGAs on these tasks in terms of both latency and energy efficiency. Using a Xilinx Virtex-II as the target FPGA, we compare our designs with those from the Xilinx library and with conventional algorithms run on the PowerPC core embedded in the Virtex-II Pro and on the Texas Instruments TMS320C6415. Our evaluations are done both through high-level estimation based on energy and latency equations and through low-level simulation. For FFT, our designs dissipated an average of 50% less energy than the design from the Xilinx library and 56% less than the DSP. Our designs showed a 10-fold improvement in the EAT factor over the embedded processor. These results provide concrete evidence that FPGAs can outperform DSPs and embedded processors in signal processing. Further, they show that FPGAs can achieve this performance while still dissipating less energy than the other two types of devices.

Directory Cache Coherence Scheme using the Number-Balanced Binary Tree (수 평형 이진트리를 이용한 디렉토리 캐쉬 일관성 유지 기법)

  • Seo, Dae-Wha
    • The Transactions of the Korea Information Processing Society / v.4 no.3 / pp.821-830 / 1997
  • The directory-based cache coherence scheme is an attractive approach to solving the cache coherence problem in a large-scale shared-memory multiprocessor. However, the existing directory-based schemes have problems such as the enormous storage overhead of the directory, long invalidation latency, heavy network congestion, and low scalability. To resolve these problems, we propose a new directory-based cache coherence scheme that is suitable for building scalable shared-memory multiprocessors. In this scheme, each directory entry for a given memory block is a number-balanced binary tree (NBBT) structure. The NBBT has several properties that help maintain the directory efficiently for cache consistency: its shape is unique, its maximum depth is [log$_2$ n], and it has the minimum number of leaf nodes among the binary trees with n nodes. Therefore, this scheme can reduce the storage overhead, the network traffic, and the invalidation latency, and can ensure high scalability for large-scale shared-memory multiprocessors.
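
The depth bound quoted in this abstract can be illustrated with a small sketch. It assumes, without confirmation from the abstract, that "number-balanced" means the node counts of the two subtrees of every node differ by at most one, and it reads [log$_2$ n] as the floor of log2(n). The helper names `build_balanced` and `depth` are hypothetical, and the paper's actual NBBT construction may differ.

```python
def build_balanced(n):
    """Build a binary tree with n nodes as nested (left, right) tuples.

    Assumption: the subtree sizes of every node differ by at most one.
    An empty tree is represented by None.
    """
    if n == 0:
        return None
    rest = n - 1                # nodes remaining after placing the root
    left_size = (rest + 1) // 2 # the larger half goes to the left subtree
    right_size = rest // 2
    return (build_balanced(left_size), build_balanced(right_size))


def depth(tree):
    """Depth counted in edges; an empty tree has depth -1."""
    if tree is None:
        return -1
    return 1 + max(depth(tree[0]), depth(tree[1]))


def floor_log2(n):
    """floor(log2(n)) for a positive integer, computed without floats."""
    return n.bit_length() - 1


# Check the depth bound for a few directory sizes.
for n in (1, 2, 7, 8, 100, 1000):
    assert depth(build_balanced(n)) == floor_log2(n)
```

Under this reading, a directory entry shared by 1000 caches has depth 9, which is the kind of logarithmic bound the abstract relies on to shorten invalidation latency.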

Performance Analysis and Improvement of WANProxy (WANProxy의 성능 분석 및 개선)

  • Kim, Haneul;Ji, Seungkyu;Chung, Kyusik
    • KIPS Transactions on Computer and Communication Systems / v.9 no.3 / pp.45-58 / 2020
  • In the current trend of increasing network traffic due to the popularization of cloud services and mobile devices, WAN bandwidth is very low compared to LAN bandwidth. In a WAN environment, a WAN optimizer is needed to overcome performance problems caused by the transmission protocol, packet loss, and network bandwidth limitations. In this paper, we analyze the data deduplication algorithm of WANProxy, an open source WAN optimizer, and evaluate its performance in terms of network latency and WAN bandwidth. We also evaluate the performance of a two-stage compression method that combines WANProxy and Zstandard. We propose a new method that improves the performance of WANProxy by revising its data deduplication algorithm, and we evaluate the improvement. We perform experiments using the 12 data files of Silesia with a data segment size of 2048 bytes. Experimental results show that the average compression rate of WANProxy is 150.6, and that WANProxy reduces the average network latency by 95.2% in a 10 Mbps WAN environment and by 60.7% in a 100 Mbps WAN environment. Compared with WANProxy alone, the two-stage compression of WANProxy and Zstandard increases the average compression rate by 33%. However, it increases the average network latency by 2.1% in a 10 Mbps WAN environment and by 5.27% in a 100 Mbps WAN environment. Compared with WANProxy, our proposed method increases the average compression rate by 34.8% and reduces the average network latency by 13.8% in a 10 Mbps WAN and by 12.9% in a 100 Mbps WAN. The performance analysis of WANProxy shows that its improvement in network latency and WAN bandwidth is excellent in a WAN environment of 10 Mbps or less and superior in a 100 Mbps WAN environment.
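
As a rough illustration of segment-based deduplication, the sketch below splits a payload into 2048-byte segments (the segment size used in the experiments) and replaces segments already seen with short references. This is a generic fixed-size scheme, not WANProxy's actual algorithm. The reference size `REF_SIZE`, the function names, and the reading of "compression rate" as original size divided by transmitted size are all assumptions made for illustration.

```python
import hashlib

SEGMENT_SIZE = 2048  # bytes, matching the segment size in the experiments
REF_SIZE = 8         # hypothetical wire size of a reference to a cached segment


def dedup_encode(data, cache):
    """Encode data as a list of ("raw", bytes) or ("ref", digest) items.

    cache is a set of digests of segments the peer is assumed to hold already.
    """
    items = []
    for start in range(0, len(data), SEGMENT_SIZE):
        segment = data[start:start + SEGMENT_SIZE]
        digest = hashlib.sha256(segment).digest()
        if digest in cache:
            items.append(("ref", digest))   # peer already has this segment
        else:
            cache.add(digest)
            items.append(("raw", segment))  # first occurrence: send in full
    return items


def encoded_size(items):
    """Bytes on the wire under the hypothetical REF_SIZE accounting."""
    return sum(REF_SIZE if kind == "ref" else len(body) for kind, body in items)


# Four identical 2048-byte segments: one is sent raw, three become references.
payload = bytes(range(256)) * 8 * 4
items = dedup_encode(payload, set())
ratio = len(payload) / encoded_size(items)  # original size / transmitted size
```

Highly repetitive traffic drives this ratio up quickly, which is consistent with deduplication helping most when the WAN link is the bottleneck.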

Approximate Top-k Subgraph Matching Scheme Considering Data Reuse in Large Graph Stream Environments (대용량 그래프 스트림 환경에서 데이터 재사용을 고려한 근사 Top-k 서브 그래프 매칭 기법)

  • Choi, Do-Jin;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.20 no.8 / pp.42-53 / 2020
  • With the development of social network services, graph structures have been utilized to represent relationships among objects in various applications. Recently, the demand for subgraph matching in real-time graph streams has increased. Therefore, an efficient approximate Top-k subgraph matching scheme with low latency in real-time graph streams is required. In this paper, we propose an approximate Top-k subgraph matching scheme considering data reuse in graph stream environments. The proposed scheme utilizes Storm, a distributed stream processing platform, to handle a large amount of stream data. We also utilize an existing data reuse scheme to decrease stream processing costs. We propose a distance-based summary indexing technique to generate Top-k subgraph matching results. The proposed summary indexing technique has a very low cost since it stores only the distances among vertices that are selected in advance. Finally, we provide k subgraph matching results to users by performing approximate Top-k matching on the summary index. To show the superiority of the proposed scheme, we conduct various performance evaluations on diverse real-world datasets.

Design of Multicast Cut-through Switch using Shared Bus (공유 버스를 사용한 멀티캐스트 Cut-through 스위치의 설계)

  • Baek, Jung-Min;Kim, Sung-Chun
    • Journal of KIISE:Computer Systems and Theory / v.27 no.3 / pp.277-286 / 2000
  • Switch-based networks are suitable for environments that demand high network performance. Traditional shared-medium local area networks (LANs) do not provide sufficient throughput or sufficiently low latency. Communication performance is especially important for multimedia applications, and in these environments a switch-based network delivers high performance. Switch-based networks provide higher bandwidth and lower latency, so a high-speed switch is essential for building switch-based LANs. An effective switch design is the most important factor in the performance of a switch-based network, and it is also required for multicast message processing. In the previous cut-through switching technique, switch element reconfiguration provides multicast capability and deadlock freedom, but it suffers from low throughput and a large switch size. Therefore, effective multicasting can be implemented by separating the unicast and multicast hardware. The objective of this thesis is to propose a switch configuration with these features.

The Implementation of a Lift Emergency Video Call System based on WebRTC using OpenAPI

  • Woon-Yong Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.155-161 / 2023
  • In this paper, we present the structure of a WebRTC-based emergency video call system that builds a service system in a constant-monitoring environment to increase the usability and stability of elevator emergency call devices. The proposed system provides a smooth call environment between the emergency call system in the elevator and maintenance managers in case of an emergency, responds rapidly to elevator emergency calls by monitoring the target elevator, and handles any emergency calls that may occur in the physical space of the elevator. The purpose is to build an environment that can deliver low-latency, real-time voice and video call services by overcoming the physical constraints of video calls. To this end, we have established a service environment based on OpenAPI, which is used in various fields and whose performance has been proven; it provides video calls with low-latency call quality and disseminates emergency situations through rapid messaging. The presented system structure can serve as a basis for expanding functions and for constructing a reliable service environment and an intelligent model for the elevator system through combination with the elevator control panel and various devices.

New Two-Level L1 Data Cache Bypassing Technique for High Performance GPUs

  • Kim, Gwang Bok;Kim, Cheol Hong
    • Journal of Information Processing Systems / v.17 no.1 / pp.51-62 / 2021
  • On-chip caches of graphics processing units (GPUs) have contributed to improved GPU performance by reducing long memory access latency. However, cache efficiency remains low even though recent GPUs have considerably mitigated the bottleneck problem of the L1 data cache. Although the cache miss rate is a reasonable metric for cache efficiency, it is not necessarily proportional to GPU performance. In this study, we introduce a second key determinant, on the assumption that the miss rate alone does not accurately predict the performance gain from the L1 data cache. The proposed technique estimates the benefit of the cache by measuring the balance between cache efficiency and throughput. The throughput of the cache is predicted from the warp occupancy information in the warp pool. The warp occupancy is then used for a second bypass phase when workloads show an ambiguous miss rate. In our proposed architecture, the L1 data cache is turned off for a long period when the warp occupancy is not high. Our two-level bypassing technique can be applied to recent GPU models and improves performance by 6% on average compared to the architecture without bypassing. Moreover, it outperforms conventional bottleneck-based bypassing techniques.
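
One possible reading of the two-level decision is sketched below: the miss rate decides clear cases, and warp occupancy breaks the tie when the miss rate is ambiguous. The function name and every threshold value are hypothetical; the abstract does not give the paper's actual thresholds or sampling periods.

```python
def l1_enabled(miss_rate, warp_occupancy,
               low_miss=0.3, high_miss=0.7, occupancy_threshold=0.5):
    """Decide whether the L1 data cache stays on for the next period.

    All thresholds are hypothetical placeholders, not values from the paper.
    miss_rate and warp_occupancy are fractions in [0, 1].
    """
    # Level 1: a clearly low miss rate means the cache is paying off.
    if miss_rate <= low_miss:
        return True
    # Level 1: a clearly high miss rate means the cache is bypassed.
    if miss_rate >= high_miss:
        return False
    # Level 2: ambiguous miss rate, so fall back to warp occupancy.
    # The cache is turned off when occupancy is not high.
    return warp_occupancy >= occupancy_threshold
```

For example, `l1_enabled(0.5, 0.8)` keeps the cache on while `l1_enabled(0.5, 0.2)` bypasses it, since both calls fall in the ambiguous miss-rate band and differ only in occupancy.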

An Efficient MCTF Architecture using Processing Frame Re-configuration (처리 프레임의 재구성을 통한 효율적인 MCTF 구조)

  • Seo, Young-Ho;Choi, Hyun-Jun;Kim, Young-Hyun;Kim, Dong-Wook
    • Proceedings of the IEEK Conference / 2005.11a / pp.335-338 / 2005
  • In this paper, we propose a new MCTF (Motion Compensated Temporal Filtering) technique and its hardware (H/W) architecture for SVC (Scalable Video Coding). Since the proposed MCTF kernel has an extensible architecture, it can execute temporal filtering using both (5,3) and (3,1) lifting operations. Its output data rate is the same as its input data rate, and it can produce filtered frames continuously after an initial latency. Because the proposed architecture is simpler than previous ones, it is easily mapped to H/W and offers optimized memory usage and low cost.
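
The (5,3) lifting operation mentioned here can be sketched with the standard 5/3 predict and update steps. The sketch treats each frame as a single scalar sample and omits motion compensation entirely, so it shows only the temporal lifting structure, not the paper's kernel. The function names and the mirrored boundary handling are assumptions.

```python
def lift_53_forward(x):
    """One level of 5/3 lifting on an even-length list of samples.

    Returns (low, high): the low-pass and high-pass subbands.
    Boundaries are handled by mirroring the nearest available sample.
    """
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: odd sample minus the mean of its two even neighbours.
    high = [odd[k] - 0.5 * (even[k] + even[min(k + 1, n - 1)]) for k in range(n)]
    # Update step: even sample plus a quarter of its two high-pass neighbours.
    low = [even[k] + 0.25 * (high[max(k - 1, 0)] + high[k]) for k in range(n)]
    return low, high


def lift_53_inverse(low, high):
    """Undo lift_53_forward by reversing the update and predict steps."""
    n = len(low)
    even = [low[k] - 0.25 * (high[max(k - 1, 0)] + high[k]) for k in range(n)]
    odd = [high[k] + 0.5 * (even[k] + even[min(k + 1, n - 1)]) for k in range(n)]
    x = [0.0] * (2 * n)
    x[0::2], x[1::2] = even, odd  # interleave even and odd samples again
    return x
```

Because each lifting step is undone by subtracting exactly what was added, the inverse reconstructs the input for any boundary rule, which is one reason lifting maps well to simple hardware.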

Implementation of Efficient Channel Decoder for WiBro System (WiBro 시스템을 위한 효율적인 구조의 채널 복호화기 구현)

  • Kim, Jang-Hun;Han, Chul-Hee
    • Proceedings of the IEEK Conference / 2007.07a / pp.177-178 / 2007
  • The WiBro system provides reliable broadband communication services for mobile and portable subscribers. It allows interference-free reception under conditions of multipath propagation and transmission errors, so powerful channel-error correction is required. The CC/CTC Decoder, which is mandatory for the WiBro system, requires a large amount of computation for real-time operation. It is therefore desirable to design a CC/CTC Decoder with a highly optimized hardware scheme for low-latency operation at high data rates. This paper proposes an efficient CC/CTC Decoder structure for a high-data-rate WiBro system. In particular, the proposed CTC Decoder architecture reduces decoding delay by applying pipelining and multiple decoding blocks. Simulation results show that the proposed CC/CTC Decoder reduces processing time by about 80%, at the cost of an increase in area.

Task Distribution Scheme based on Service Requirements Considering Opportunistic Fog Computing Nodes in Fog Computing Environments (포그 컴퓨팅 환경에서 기회적 포그 컴퓨팅 노드들을 고려한 서비스 요구사항 기반 테스크 분배 방법)

  • Kyung, Yeunwoong
    • Journal of Korea Multimedia Society / v.24 no.1 / pp.51-57 / 2021
  • In this paper, we propose a task distribution scheme for fog computing environments that considers opportunistic fog computing nodes. Because latency is one of the important performance metrics for IoT (Internet of Things) applications, there has been much research on fog computing systems. However, since the load can be concentrated on specific fog computing nodes due to the spatial and temporal characteristics of IoT, load distribution should be considered to prevent performance degradation. Therefore, this paper proposes a task distribution scheme that considers static as well as opportunistic fog computing nodes according to their mobility. In particular, based on the task requirements, the proposed scheme processes delay-sensitive tasks at static fog nodes and delay-insensitive tasks at opportunistic fog nodes. The performance evaluation shows that the proposed scheme achieves a lower service response time than conventional schemes.
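
The routing rule described in this abstract can be sketched as a small dispatcher: delay-sensitive tasks go to static fog nodes and delay-insensitive tasks go to opportunistic ones. The `FogNode` type, the least-loaded selection, and the fallback to any node when no preferred node exists are assumptions added for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class FogNode:
    name: str
    static: bool   # False marks an opportunistic (mobile) node
    load: int = 0  # number of tasks currently assigned


def assign(delay_sensitive, nodes):
    """Pick a node for one task and record the assignment.

    Delay-sensitive tasks prefer static nodes; others prefer opportunistic
    nodes. Least-loaded selection and the fallback are assumptions.
    """
    preferred = [n for n in nodes if n.static == delay_sensitive]
    candidates = preferred or nodes  # fall back if no preferred node exists
    chosen = min(candidates, key=lambda n: n.load)
    chosen.load += 1
    return chosen


nodes = [FogNode("static-1", True), FogNode("static-2", True),
         FogNode("bus-1", False)]
first = assign(True, nodes)    # delay-sensitive task
second = assign(True, nodes)   # goes to the other, less loaded static node
third = assign(False, nodes)   # delay-insensitive task
```

Keeping delay-insensitive work off the static nodes leaves them with spare capacity for tasks that cannot tolerate a node moving out of range.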