• Title/Summary/Keyword: memory efficiency (메모리 효율)


A comparative study of the performance of machine learning algorithms to detect malicious traffic in IoT networks (IoT 네트워크에서 악성 트래픽을 탐지하기 위한 머신러닝 알고리즘의 성능 비교연구)

  • Hyun, Mi-Jin
    • Journal of Digital Convergence / v.19 no.9 / pp.463-468 / 2021
  • Although the IoT is growing explosively thanks to advances in technology, the spread of IoT devices, and the activation of services, the activities of various botnets are causing serious security risks and financial damage. It is therefore important to detect botnet activity accurately and quickly. Because security in the IoT environment must operate with minimal processing performance and memory, this paper selects a minimal set of features for detection and compares the performance of machine learning algorithms, namely KNN (K-Nearest Neighbor), Naïve Bayes, Decision Tree, and Random Forest, in detecting botnet activity. Experimental results on the Bot-IoT dataset showed that, among the applied algorithms, KNN detected DDoS, DoS, and Reconnaissance attacks most effectively and efficiently.
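As an illustration of the KNN classifier named in this abstract, a minimal sketch in plain Python follows. The `knn_predict` helper, the two features, and the sample flows are hypothetical; they are not taken from the paper or from the Bot-IoT dataset.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distances are Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature flows, made up for illustration only.
train = [
    ((0.9, 0.1), "DDoS"),
    ((0.8, 0.2), "DDoS"),
    ((0.1, 0.9), "Normal"),
    ((0.2, 0.8), "Normal"),
    ((0.85, 0.15), "DDoS"),
]
print(knn_predict(train, (0.7, 0.3)))
```

KNN stores the training set and does no model fitting, so its memory and time costs scale with the number of stored samples and features. This is one reason a small feature set matters on constrained devices.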

Design of a High-Performance Mobile GPGPU with SIMT Architecture based on a Small-size Warp Scheduler (작은 크기의 Warp 스케쥴러 기반 SIMT구조 고성능 모바일 GPGPU 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE / v.25 no.3 / pp.479-484 / 2021
  • This paper proposes and designs a structure that achieves high performance with a small number of cores in a GPGPU with a SIMT architecture. A GPGPU for mobile devices requires a structure that increases performance relative to power consumption. To reduce power consumption, the number of cores was decreased; to improve performance, the size of the warp scheduler that manages threads was set to 4, far smaller than the 32 of a general GPGPU. Reducing the warp size can reduce the number of idle cycles in the pipelines and allows memory latency to be handled efficiently, reducing the miss penalty when accessing cache memory. The computational performance of the designed GPGPU was measured with a test program that includes floating-point operations, and its power consumption was measured for a 28 nm CMOS process, yielding a performance per power of 104.5 GFLOPS/W. This is about four times better performance per power than Nvidia's Tegra K1.

Efficient Flash Memory Access Power Reduction Techniques for IoT-Driven Rare-Event Logging Application (IoT 기반 간헐적 이벤트 로깅 응용에 최적화된 효율적 플래시 메모리 전력 소모 감소기법)

  • Kwon, Jisu;Cho, Jeonghun;Park, Daejin
    • IEMEK Journal of Embedded Systems and Applications / v.14 no.2 / pp.87-96 / 2019
  • Low power consumption is one of the most critical issues for battery-powered Internet of Things (IoT) devices, and various approaches to it have been presented so far. In this paper, we propose a method that reduces power consumption by reducing the number of accesses to flash memory, which consumes a large amount of power during on-chip software execution. Our approach uses a cooperative logging structure that distributes the sampling overhead of a single sensor node to adjacent nodes in rare-event applications. We introduce a new algorithm that identifies event occurrence with a negative feedback method by observing the difference between past data and recent data from the sensor. The proposed approach allows sampled data to be written to flash memory only when an event that requires flash access is detected. The proposed event detection algorithm (EDA) reduces power consumption by 30% compared to the conventional flash write scheme for all cases of events. The sampled data from the sensor is first traced into random access memory (RAM), and the write access to flash memory is delayed until the page buffer of the on-chip flash memory controller in the microcontroller unit (MCU) is filled with traced data, thereby reducing the frequency of flash memory accesses. This technique additionally reduces power consumption by 40% compared to writing all data to flash. By sharing sampling information over a LoRa channel, the sampling overhead is distributed and the sampling load on each node is reduced, so that a 66% reduction in total power consumption is achieved across several IoT edge nodes by removing the sampling of duplicated data.
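The two ideas in this abstract, logging only on events and deferring flash writes until a page is full, can be sketched roughly as follows. This is a simplified illustration, not the authors' EDA: a fixed difference threshold stands in for the negative feedback method, and the `BufferedFlashLogger` class, page size, and readings are all hypothetical.

```python
class BufferedFlashLogger:
    """Buffer event samples in RAM and write to flash one full page at a time."""

    def __init__(self, page_size=4, threshold=5.0):
        self.page_size = page_size   # samples per flash page (assumed value)
        self.threshold = threshold   # minimum change that counts as an event (assumed)
        self.ram_buffer = []         # samples waiting in RAM
        self.flash_pages = []        # stand-in for the flash memory
        self.previous = None         # last sample seen

    def sample(self, value):
        """Record `value` only if it differs enough from the previous sample."""
        is_event = (
            self.previous is not None
            and abs(value - self.previous) >= self.threshold
        )
        self.previous = value
        if not is_event:
            return
        self.ram_buffer.append(value)
        if len(self.ram_buffer) == self.page_size:
            # One flash write for a whole page instead of one write per sample.
            self.flash_pages.append(list(self.ram_buffer))
            self.ram_buffer.clear()

logger = BufferedFlashLogger(page_size=2, threshold=5.0)
for reading in [20, 21, 30, 30, 18, 19, 40]:
    logger.sample(reading)
print(logger.flash_pages, logger.ram_buffer)
```

In this run, seven readings produce three events and a single flash write; the third event remains in RAM until the next page fills.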

A Study on the Hardware Design of High-Throughput HEVC CABAC Binary Arithmetic Encoder (높은 처리량을 갖는 HEVC CABAC 이진 산술 부호화기의 하드웨어 설계에 관한 연구)

  • Jo, Hyun-gu;Ryoo, Kwang-ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.401-404 / 2016
  • This paper proposes an entropy coding method for the HEVC CABAC encoder suited to an efficient hardware architecture. The Binary Arithmetic Encoder has data dependencies at each step, which makes fast operation difficult. The proposed Binary Arithmetic Encoder is designed as a 4-stage pipeline to process input bins quickly. Depending on the bin, either the MPS or the LPS is selected and binary arithmetic encoding is performed. The critical path caused by repeated operations is shortened by using a LUT, and the design uses shift operations, which decreases hardware size and avoids the use of memory. The proposed Binary Arithmetic Encoder of CABAC was designed in Verilog HDL and implemented in 65 nm technology. Its gate count is 3.17k and its operating speed is 1.53 GHz.


Design and Implementation of Feature Detector for Object Tracking (객체 추적을 위한 특징점 검출기의 설계 및 구현)

  • Lee, Du-hyeon;Kim, Hyeon;Cho, Jae-chan;Jung, Yun-ho
    • Journal of IKEEE / v.23 no.1 / pp.207-213 / 2019
  • In this paper, we propose a low-complexity feature detection algorithm for object tracking and present a hardware architecture design and implementation results for real-time processing. The existing Shi-Tomasi algorithm performs well in object tracking applications but has high computational complexity. We therefore propose an efficient feature detection algorithm that reduces the operational complexity while delivering performance similar to the Shi-Tomasi algorithm, and present its real-time implementation results. The proposed feature detector was implemented on an FPGA with 1,307 logic slices, 5 DSP48s, and 86.91 Kbits of memory. In addition, it supports real-time processing at 54 fps at an operating frequency of 114 MHz for $1920{\times}1080$ FHD images.
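A short back-of-the-envelope check puts the reported frame rate and clock frequency in context. The `pixels_per_clock` helper is hypothetical, and the calculation ignores blanking intervals and any pipeline fill time, so it is only an average rate.

```python
def pixels_per_clock(width, height, fps, clock_hz):
    """Average number of pixels that must be processed per clock cycle."""
    return width * height * fps / clock_hz

# Figures quoted in the abstract: 1920x1080 frames, 54 fps, 114 MHz clock.
rate = pixels_per_clock(1920, 1080, 54, 114e6)
print(f"{1920 * 1080 * 54:,} pixels/s, {rate:.3f} pixels per clock")
```

Under these assumptions the detector would need to sustain roughly one pixel per clock cycle on average.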

Efficient 3D Acoustic Wave Propagation Modeling using a Cell-based Finite Difference Method (셀 기반 유한 차분법을 이용한 효율적인 3차원 음향파 파동 전파 모델링)

  • Park, Byeonggyeong;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.22 no.2 / pp.56-61 / 2019
  • In this paper, we study efficient modeling strategies for simulating 3D time-domain acoustic wave propagation with a cell-based finite difference method, which can handle variations in both P-wave velocity and density. The standard finite difference method assigns physical properties such as elastic wave velocities and density to grid points, whereas the cell-based finite difference method assigns them to the cells between grid points. The cell-based method uses the average physical properties of adjacent cells to evaluate the finite difference equation centered at a grid point. This feature makes the cell-based finite difference method computationally more expensive than the standard finite difference method. In this study, we used additional memory to mitigate the computational overhead and thereby reduced the calculation time by more than 30%. Furthermore, we were able to enhance modeling performance on several media with limited density variations by using the cell-based and standard finite difference methods together.
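One plausible reading of the memory-for-time trade-off described here is to average the cell properties once and store the result, rather than re-averaging at every time step. The 1D sketch below illustrates that idea only; the abstract does not say which quantities the authors actually stored, and the helper name and density values are hypothetical.

```python
def average_adjacent_cells(cell_values):
    """Average each pair of neighbouring cells to get the value at the grid point between them."""
    return [(left + right) / 2.0 for left, right in zip(cell_values, cell_values[1:])]

# Hypothetical densities for four cells in 1D; interior grid points sit between cells.
cell_density = [1000.0, 1200.0, 1400.0, 2000.0]

# Extra memory: compute the averaged array once, before the time loop,
# and reuse it at every time step instead of re-averaging.
node_density = average_adjacent_cells(cell_density)
print(node_density)
```

The stored array costs one extra value per interior grid point and removes the averaging from the inner loop of the time stepping.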

Fast Content-preserving Seam Estimation for Real-time High-resolution Video Stitching (실시간 고해상도 동영상 스티칭을 위한 고속 콘텐츠 보존 시접선 추정 방법)

  • Kim, Taeha;Yang, Seongyeop;Kang, Byeongkeun;Lee, Hee Kyung;Seo, Jeongil;Lee, Yeejin
    • Journal of Broadcast Engineering / v.25 no.6 / pp.1004-1012 / 2020
  • We present a novel content-preserving seam estimation algorithm for real-time high-resolution video stitching. Seam estimation is one of the fundamental steps in image and video stitching; its purpose is to minimize visual artifacts in the transition areas between images. Typical seam estimation algorithms are based on optimization methods that demand intensive computation and large amounts of memory. These algorithms, however, often fail to avoid objects, which results in cropped or duplicated objects. They also lack temporal consistency and induce flickering between frames. Hence, we propose an efficient and temporally consistent seam estimation algorithm that uses a straight line. The proposed method also uses convolutional neural network-based instance segmentation to place the seam outside objects. Experimental results demonstrate that the proposed method produces visually plausible stitched videos with minimal visual artifacts in real time.

Truncated Differential Cryptanalysis on PP-1/64-128 (블록 암호 PP-1/64-128에 대한 부정 차분 공격)

  • Hong, Yong-Pyo;Lee, Yus-Sop;Jeong, Ki-Tae;Sung, Jae-Chul;Hong, Seok-Hie
    • Journal of the Korea Institute of Information Security & Cryptology / v.21 no.6 / pp.35-44 / 2011
  • The PP-1/64-128 block cipher supports a variety of data block and secret key sizes. It is also suitable for hardware implementation, and because its encryption and decryption processes are the same, Concurrent Error Detection (CED) is much easier to apply to cryptographic chips than with other block ciphers. In this paper, we propose a truncated differential cryptanalysis of PP-1/64-128. The attack requires $2^{50.16}$ chosen plaintexts, $2^{46.16}$ bytes of memory, and $2^{50.45}$ PP-1/64-128 encryptions to retrieve the secret key. This is the best known result among differential cryptanalyses of PP-1/64-128.

Degree-of-Freedom-Based Reduction Method for Modal Analysis of Repeated Structure (반복 구조물의 모드 해석을 위한 효과적인 자유도 기반 축소 기법)

  • Choi, Geomji;Chang, Seongmin
    • Journal of the Computational Structural Engineering Institute of Korea / v.34 no.2 / pp.71-75 / 2021
  • Despite advances in computational resources, the demand for model analysis keeps increasing. Model sizes have grown in order to analyze entire structures more accurately and precisely. As the analysis model becomes larger and more complex, the computation time increases exponentially. Various industries use many structures that have repeated patterns. We focus on these structures and propose a dynamic analysis method that handles them efficiently. To devise an efficient method for repeated structures, this study uses the substructuring scheme and the degree-of-freedom-based reduction method. We modify the existing reduction method to take the characteristics of the repeating structure into account. In the proposed method, the entire structure is expressed as a combination of substructures, where each substructure is represented as a unit cell of the repeated structure. The substructures are condensed and assembled using the substructuring scheme and the modified condensation method. Finally, numerical examples are presented to verify the efficiency and accuracy of the proposed method.

Implementation of FPGA-based Accelerator for GRU Inference with Structured Compression (구조적 압축을 통한 FPGA 기반 GRU 추론 가속기 설계)

  • Chae, Byeong-Cheol
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.6 / pp.850-858 / 2022
  • To deploy Gated Recurrent Units (GRU) on resource-constrained embedded devices, this paper presents a reconfigurable FPGA-based GRU accelerator that supports structured compression. First, a dense GRU model is significantly reduced in size by hybrid quantization and structured top-k pruning. Second, the energy consumed by external memory access is greatly reduced by the proposed reuse computing pattern. Finally, the accelerator can handle a structured sparse model that benefits from the algorithm-hardware co-design workflow. Moreover, inference tasks can be flexibly performed using all functional dimensions, sequence lengths, and numbers of layers. Implemented on the Intel DE1-SoC FPGA, the proposed accelerator achieves 45.01 GOPS on a structured sparse GRU network without batching. Compared to CPU and GPU implementations, the low-cost FPGA accelerator achieves 57x and 30x improvements in latency and 300x and 23.44x improvements in energy efficiency, respectively. The proposed accelerator thus serves as an early study for real-time embedded applications and shows potential for further development.
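To illustrate what structured top-k pruning can look like, the sketch below keeps the k largest-magnitude weights in each row of a matrix and zeroes the rest, so every row ends up with the same number of nonzeros. Pruning per row is an assumption made for this example; the abstract does not specify the grouping the authors use, and the `topk_prune_rows` helper and weight values are hypothetical.

```python
def topk_prune_rows(weights, k):
    """Keep the k largest-magnitude entries in each row and zero the rest."""
    pruned = []
    for row in weights:
        # Indices of the k entries with the largest absolute value.
        order = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
        keep = set(order[:k])
        pruned.append([w if i in keep else 0.0 for i, w in enumerate(row)])
    return pruned

# Made-up weight matrix for illustration.
weights = [
    [0.9, -0.1, 0.05, -0.7],
    [0.2, 0.3, -0.8, 0.01],
]
print(topk_prune_rows(weights, k=2))
```

A fixed nonzero count per row gives hardware a regular workload, which is the usual motivation for structured rather than unstructured sparsity.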