• Title/Summary/Keyword: Kernel Memory

Search Result 179

A Study on the Optimization of Convolution Operation Speed through FFT Algorithm (FFT 적용을 통한 Convolution 연산속도 향상에 관한 연구)

  • Lim, Su-Chang;Kim, Jong-Chan
    • Journal of Korea Multimedia Society / v.24 no.11 / pp.1552-1559 / 2021
  • Convolutional neural networks (CNNs) show notable performance in image processing and are used as representative core models. CNNs extract and learn features from large amounts of training data. In general, a CNN has a structure in which convolution layers and fully connected layers are stacked. The core of a CNN is the convolution layer. The size of the kernels used for feature extraction and the number of kernels, which determines the depth of the feature map, determine the number of learnable weight parameters of the CNN. These parameters are the main cause of the increased computational complexity and memory usage of the entire neural network. The most computationally expensive components in CNNs are the fully connected and spatial convolution computations. In this paper, we propose a Fourier Convolution Neural Network that performs the operation of the convolution layer in the Fourier domain. We reduce the amount of computation by applying the fast Fourier transform. On the MNIST dataset, accuracy was similar to that of a general CNN, and operation speed was 7.2% faster. An average speedup of 19% was achieved in experiments using 1024x1024 images and kernels of various sizes.
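The idea of computing a convolution in the Fourier domain can be illustrated with a small sketch. It is not the authors' network: it only demonstrates the convolution theorem on a single 2D image and kernel using NumPy, and the function names `conv2d_direct` and `conv2d_fft` are hypothetical names chosen for this illustration.

```python
import numpy as np


def conv2d_direct(image, kernel):
    """Full linear 2D convolution by explicit summation (reference version)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    # Each kernel tap adds a shifted, scaled copy of the image to the output.
    for i in range(kh):
        for j in range(kw):
            out[i:i + ih, j:j + iw] += kernel[i, j] * image
    return out


def conv2d_fft(image, kernel):
    """Same full convolution, computed as a pointwise product of spectra."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    # Zero-pad both inputs to the full output size so that the circular
    # convolution implied by the FFT equals the linear convolution.
    shape = (ih + kh - 1, iw + kw - 1)
    spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(kernel, s=shape)
    return np.fft.irfft2(spectrum, s=shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((64, 64))
    kernel = rng.standard_normal((7, 7))
    same = np.allclose(conv2d_direct(image, kernel), conv2d_fft(image, kernel))
    print("direct and FFT results agree:", same)
```

The direct version costs on the order of the image size times the kernel size, while the FFT version costs on the order of N log N in the padded size regardless of kernel size, which is why the benefit tends to grow with larger images and kernels.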

Change Detection for High-resolution Satellite Images Using Transfer Learning and Deep Learning Network (전이학습과 딥러닝 네트워크를 활용한 고해상도 위성영상의 변화탐지)

  • Song, Ah Ram;Choi, Jae Wan;Kim, Yong Il
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.3 / pp.199-208 / 2019
  • As the number of available satellites increases and technology advances, image information outputs are becoming increasingly diverse and a large amount of data is accumulating. In this study, we propose a change detection method for high-resolution satellite images that uses transfer learning and a deep learning network to overcome the limitation caused by insufficient training data through the use of pre-trained information. The deep learning network used in this study comprises convolutional layers to extract the spatial and spectral information and convolutional long short-term memory layers to analyze the time series information. To use the learned information, the two initial convolutional layers of the change detection network are designed to use learned values from 40,000 patches of the ISPRS (International Society for Photogrammetry and Remote Sensing) dataset as initial values. In addition, 2D (2-dimensional) and 3D (3-dimensional) kernels were used to find the optimized structure for the high-resolution satellite images. The experimental results for the KOMPSAT-3A (KOrean Multi-Purpose SATellite-3A) satellite images show that this change detection method can effectively extract changed/unchanged pixels but is less sensitive to changes due to shadow and relief displacements. In addition, the change detection accuracy of two sites was improved by using 3D kernels. This is because a 3D kernel can consider not only the spatial information but also the spectral information. This study indicates that we can effectively detect changes in high-resolution satellite images using the constructed image information and deep learning network. In future work, a pre-trained change detection network will be applied to newly obtained images to extend the scope of the application.

Operating System level Dynamic Power Management for Robot (로봇을 위한 운영체제 수준의 동적 전력 관리)

  • Choi Seungmin;Chae Sooik
    • Journal of the Institute of Electronics Engineers of Korea SD / v.42 no.5 s.335 / pp.63-72 / 2005
  • This paper describes a new approach to operating-system-level power management that reduces the energy consumed by the I/O devices in a robot platform, which provides various functions such as navigation, multimedia applications, and wireless communication. The policy proposed in the paper, named the Energy-Aware Job Schedule (EAJS), rearranges scattered jobs so that the idle periods of the devices are clustered into one time period and the devices are shut down during their idle period. The EAJS selects the schedule that consumes the minimum energy among the schedules that satisfy the buffer and time constraints. Note that burst job execution needs a larger memory buffer and causes a longer delay between the generation of a job request and its completion. A prototype of the EAJS is implemented on the Linux kernel that manages the robot system. The experimental results show that a maximum $44\%$ power saving on a DSP and a wireless LAN card can be obtained with the EAJS.

Efficient VLSI Architecture of Full-Image Guided Filter Based on Two-Pass Model (양방향 모델을 적용한 Full-image Guided Filter의 효율적인 VLSI 구조)

  • Lee, Gyeore;Park, Taegeun
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.11 / pp.1507-1514 / 2016
  • The full-image guided filter reflects all pixels of the image in filtering by using weight propagation and a two-pass model, whereas the existing guided filter is processed based on a kernel window. Therefore, the computational complexity can be improved while maintaining the characteristics of the guided filter, such as edge preservation and smoothing. In this paper, we propose an efficient VLSI architecture for the full-image guided filter by analyzing the data dependency, the data frequency, and the PSNR of the image in order to achieve enough speed for various applications such as stereo vision and real-time systems. In addition, the proposed efficient scheduling enables real-time processing by minimizing the idle period in weight computation. The proposed VLSI architecture achieves a maximum operating frequency of 214 MHz (image size: 384*288, 965 fps) with 76K gates (internal memory excluded).

Estimation of Large Amplitude Motions and Wave Loads of a Ship Advancing in Transient Waves by Using a Three Dimensional Time-domain Approximate Body-exact Nonlinear 2nd-order BEM (3 차원 시간영역 근사비선형 2 차경계요소법에 의한 선체의 대진폭 운동 및 파랑하중 계산)

  • Hong, Do-Chun;Hong, Sa-Young;Sung, Hong-Gun
    • Journal of the Society of Naval Architects of Korea / v.47 no.3 / pp.291-305 / 2010
  • A three-dimensional time-domain calculation method is of crucial importance in predicting the motions and wave loads of a ship advancing in a severe irregular sea. The exact solution of the free-surface wave-ship interaction problem is very complicated because of the essentially nonlinear boundary conditions. In this paper, an approximate body nonlinear approach based on the three-dimensional time-domain forward-speed free-surface Green function is presented. The Froude-Krylov force and the hydrostatic restoring force are calculated over the instantaneous wetted surface of the ship, while the forces due to the radiation and scattering potentials are calculated over the mean wetted surface. The time-domain radiation and scattering potentials are obtained from a time-invariant kernel of integral equations for the potentials, which are discretized according to the second-order boundary element method (Hong and Hong 2008). The diffraction impulse-response functions of the Wigley seakeeping model advancing in transient head waves at various Froude numbers are presented. A simulation of the coupled heave-pitch motion of a long rectangular barge advancing in regular head waves of large amplitude has been carried out. The linear and the approximate body nonlinear numerical results for the motions and wave loads of the barge at a nonzero Froude number are compared.

Deterministic Real-Time Task Scheduling (시간 결정성을 보장하는 실시간 태스크 스케줄링)

  • Cho, Moon-Haeng;Lee, Soong-Yeol;Lee, Won-Yong;Jeong, Geun-Jae;Kim, Yong-Hee;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.7 no.1 / pp.73-82 / 2007
  • In recent years, embedded systems have been expanding their application domains from traditional applications (such as defense, robots, and artificial satellites) to portable devices that execute more complicated applications, such as cellular phones, digital camcorders, PMPs, and MP3 players. To manage restricted hardware resources efficiently and to guarantee both temporal and logical correctness, every embedded system uses a real-time operating system (RTOS). Only when the RTOS makes kernel services deterministic in time, by specifying how long each service call will take to execute, can application programmers write predictable applications. Moreover, for an RTOS to be deterministic, its scheduling and context switch overhead should also be predictable. In this paper, we present the complete generalized algorithm to determine the highest priority in the ready list with $2^{2r}$ levels of priorities in constant time without additional memory overhead.
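One common way to find the highest ready priority in a bounded number of steps is a two-level bitmap, sketched below. This is an assumption made for illustration and is not claimed to be the algorithm of the paper: the class `ReadyList`, its method names, and the convention that priority 0 is the highest are all hypothetical. The sketch assumes $2^{2r}$ levels, arranged as $2^r$ groups of $2^r$ levels each.

```python
def lowest_set_bit(x):
    """Index of the least significant set bit of a positive integer."""
    return (x & -x).bit_length() - 1


class ReadyList:
    """Two-level bitmap over 2**(2*r) priority levels (0 = highest priority)."""

    def __init__(self, r):
        self.r = r
        self.width = 1 << r                   # levels per group, and group count
        self.group_bits = 0                   # bit g set => group g is non-empty
        self.level_bits = [0] * self.width    # one bitmap per group

    def _split(self, prio):
        return prio >> self.r, prio & (self.width - 1)

    def insert(self, prio):
        group, bit = self._split(prio)
        self.group_bits |= 1 << group
        self.level_bits[group] |= 1 << bit

    def remove(self, prio):
        group, bit = self._split(prio)
        self.level_bits[group] &= ~(1 << bit)
        if self.level_bits[group] == 0:
            self.group_bits &= ~(1 << group)

    def highest(self):
        """Two bit scans, independent of how many priorities are ready."""
        if self.group_bits == 0:
            return None
        group = lowest_set_bit(self.group_bits)
        bit = lowest_set_bit(self.level_bits[group])
        return (group << self.r) | bit


if __name__ == "__main__":
    ready = ReadyList(r=3)                    # 2**6 = 64 priority levels
    for prio in (37, 12, 50):
        ready.insert(prio)
    print(ready.highest())                    # 12
    ready.remove(12)
    print(ready.highest())                    # 37
```

On real hardware the bit scan would typically map to a single find-first-set or count-leading-zeros instruction, so the lookup takes the same number of steps no matter which priorities are ready.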

Task Management and Garbage Collection Execution Control Method for Providing Real-time Performance to Android (안드로이드에 실시간 성능 제공을 위한 태스크 관리 및 가비지컬렉션 실행 제어 방법)

  • Cho, Kyung-Yeon;Jo, Han-Moo;Lee, Jeong-Guk;Seo, Min-Won;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.18 no.3 / pp.101-113 / 2018
  • Systems such as military inspection equipment, for which acquiring and evaluating data in real time is important, must support real-time processing at the operating system level. As technology advances, there is a demand for replacing existing equipment with mobile devices, but mobile devices running Android are not suitable for systems requiring real-time performance. On Android, garbage collection secures free memory, but other tasks are interrupted while it runs, so the periodicity of particular tasks cannot be guaranteed. In this paper, we designed and implemented a structure that controls the execution of garbage collection on Android to solve this problem. Real-time performance is ensured by controlling garbage collection during the time required for real-time operation, and RTiK (Real-Time implanted Kernel) is applied to ensure real-time performance on Android. To evaluate the performance, we measured the call period of a task with a 5 ms period: only 34.31% of the task periods were satisfied before the control, whereas 98.18% were satisfied with the control, providing real-time performance to Android.

Design & Implementation of the RMMC and Global Time based on the RT-eCos 3.0 (RT-eCos 3.0 기반의 RMMC 및 글로벌 타임 설계 및 구현)

  • Han, Seoung-Yeon;Kim, Jung-Guk
    • Journal of KIISE:Computing Practices and Letters / v.16 no.7 / pp.759-767 / 2010
  • RT-eCos 3.0 is a micro-sized embedded real-time kernel that has been developed based on the open source eCos 3.0 to support the basic task model of the well-known distributed real-time object model, TMO (Time-Triggered Message-triggered Object). In this paper, the design and implementation techniques of the RMMC (Real-time Multicast & Memory replication Channel), which is a standard distributed IPC model of the TMO, are described based on RT-eCos 3.0. The technique for supporting a global time, so that the same time can be used across a distributed environment using the RMMC, is also described. The developed global-time-based RMMC supports a highly abstracted distributed IPC environment in a wide-area distributed computing environment with RT-eCos 3.0.

Dynamic Bandwidth Distribution Method for High Performance Non-volatile Memory in Cloud Computing Environment (클라우드 환경에서 고성능 저장장치를 위한 동적 대역폭 분배 기법)

  • Kwon, Piljin;Ahn, Sungyong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.3 / pp.97-103 / 2020
  • Linux Cgroups plays a fundamental role in sharing system resources among multiple containers in container-based cloud computing environments. For I/O resources in particular, Linux Cgroups supports a mechanism for sharing I/O bandwidth in proportion to I/O weight. However, the current mechanism of Linux Cgroups using the BFQ I/O scheduler seriously degrades I/O performance on high-bandwidth storage devices such as NVMe SSDs. In this paper, we propose a new feedback-based I/O bandwidth sharing scheme for Linux Cgroups that allocates I/O credits to containers according to their I/O weights and adjusts the amount of credits according to the performance fluctuation of NVMe SSDs. The proposed scheme is implemented on Linux kernel 5.3 and evaluated. The evaluation results show that it can share the I/O bandwidth among multiple containers in proportion to their I/O weights while improving I/O performance to more than twice that of the existing scheme.
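The general shape of weight-proportional credit allocation with a feedback step can be sketched as follows. This is an illustrative assumption and not the scheme implemented in the paper: the functions `allocate_credits` and `update_budget`, the `gain` parameter, and the smoothing rule are all hypothetical, and the sketch runs in user space rather than in the kernel.

```python
def allocate_credits(weights, budget):
    """Split a credit budget among containers in proportion to their I/O weights."""
    total_weight = sum(weights.values())
    return {name: budget * weight / total_weight
            for name, weight in weights.items()}


def update_budget(budget, measured_throughput, gain=0.5):
    """Move the budget part of the way toward the throughput the device delivered.

    Hypothetical feedback rule: a gain of 1.0 would track the measurement
    exactly, while smaller gains damp short-term fluctuations.
    """
    return budget + gain * (measured_throughput - budget)


if __name__ == "__main__":
    weights = {"container_a": 100, "container_b": 200, "container_c": 700}
    budget = 1000.0                              # credits per period (arbitrary unit)
    print(allocate_credits(weights, budget))
    # The device delivered more than budgeted, so the next budget grows.
    budget = update_budget(budget, measured_throughput=1400.0)
    print(budget)                                # 1200.0
    print(allocate_credits(weights, budget))
```

Keeping the split proportional to the weights while rescaling only the total budget means that the sharing ratio is preserved when device performance fluctuates.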

Implementation of High Speed Image Data Transfer using XDMA

  • Gwon, Hyeok-Jin;Choi, Doo-Hyun
    • Journal of the Korea Society of Computer and Information / v.25 no.7 / pp.1-8 / 2020
  • In this paper, we present an implementation of high-speed image data transfer using XDMA for a video signal generation/acquisition device developed as military test equipment. The technology proposed in this study gains efficiency by replacing the method of copying data through the system buffer in the kernel area with transmission and reception through the DMA engine in the FPGA. For this study, the device was developed on a PXIe platform in consideration of life cycle, and performance was maximized using a low-cost FPGA in consideration of mass productivity. The video I/O board implemented in this paper was first tested with the existing memory copy method while changing the AXI interface clock frequency and link speed. The board was then constructed using the DMA engine of the FPGA, and as a result, it was confirmed that the transfer speed increased from 5~8Hz to 140Hz. The proposed method will contribute to strengthening defense capability by reducing the cost of device development using the PXIe platform and raising the technology level.