Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol;Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.10 / pp.935-943 / 2017
  • Convolutional neural networks (CNNs), which are used for image classification and speech recognition among data-driven neural networks, have been continuously developed into high-performance structures. However, they are difficult to deploy on embedded systems with limited resources. Using pre-trained weights together with GPGPU (General-Purpose Computing on Graphics Processing Units) addresses part of the problem, but limitations remain. Since a CNN performs simple, repetitive operations, computation speed on a Single Instruction Multiple Thread (SIMT) based GPGPU varies greatly depending on how threads are allocated and utilized. When convolution and pooling operations are performed with threads, some threads sit idle. The proposed method increases operation speed by using these remaining threads for the subsequent feature-map and kernel calculations.

Analysis on Lightweight Methods of On-Device AI Vision Model for Intelligent Edge Computing Devices (지능형 엣지 컴퓨팅 기기를 위한 온디바이스 AI 비전 모델의 경량화 방식 분석)

  • Hye-Hyeon Ju;Namhi Kang
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.1 / pp.1-8 / 2024
  • On-device AI technology, which runs AI models on edge devices to support real-time processing and enhance privacy, is attracting attention. As intelligent IoT is applied to various industries, services that use on-device AI are increasing significantly. However, general deep learning models require substantial computational resources for inference and training. Therefore, various lightweighting methods such as quantization and pruning have been proposed to run deep learning models on embedded edge devices. In this paper, we analyze how to make deep learning models lightweight and apply them to edge computing devices, focusing on pruning. In particular, we use dynamic and static pruning techniques to evaluate the inference speed, accuracy, and memory usage of a lightweight AI vision model. The analysis can be applied to intelligent video control systems or video security systems in autonomous vehicles, where real-time processing is essential. It is also expected to be useful in various IoT services and industries.
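To make the idea of pruning concrete, the following is a minimal sketch of static, magnitude-based weight pruning. The abstract does not say which pruning criterion the authors used, so the criterion, the function name `magnitude_prune`, and the sample weights are all assumptions made for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    Hypothetical helper: `sparsity` is the fraction of weights to remove.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Illustrative weights only; not taken from the paper.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.12]
pruned = magnitude_prune(weights, 0.5)
```

Zeroed weights reduce memory and compute only when the runtime stores or skips them sparsely, which is one reason pruned models are usually evaluated on inference speed, accuracy, and memory usage together.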

A File System for User Special Functions using Speed-based Prefetch in Embedded Multimedia Systems (임베디드 멀티미디어 재생기에서 속도기반 미리읽기를 이용한 사용자기능 지원 파일시스템)

  • Choe, Tae-Young;Yoon, Hyeon-Ju
    • Journal of KIISE:Computing Practices and Letters / v.14 no.7 / pp.625-635 / 2008
  • Portable multimedia players differ from general multimedia file servers in several ways: single-user ownership, relatively low hardware performance, I/O bursts caused by user special functions, and short software development cycles. Although suitable for processing multiple user requests at a time, general multimedia file systems are not efficient for special user functions such as fast forward and fast backward. Some methods have been proposed to improve performance and functionality by having application programs give prediction hints to the file system. Unfortunately, they require all applications to be modified and recompiled. In this paper, we present a file system that efficiently supports user special functions in embedded multimedia systems using file block allocation, a buffer cache, and prefetching. The prefetch algorithm, SPRA (SPeed-based PRefetch Algorithm), predicts the next block from I/O patterns instead of hints from applications, and it resides in the file system, so it does not affect the application development process. In an experimental file system implementation compared with Linux readahead-based algorithms, the proposed system shows 4.29% to 52.63% turnaround time and 1.01 to 3.09 times throughput on average.
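Predicting the next block from observed I/O patterns can be sketched as a simple stride detector. This is not SPRA itself, whose details the abstract does not give; the function name `predict_next_block`, the `window` parameter, and the constant-stride rule are assumptions for illustration.

```python
def predict_next_block(history, window=3):
    """Guess the next block number from recent accesses.

    Returns None when the last `window` strides are not all equal,
    i.e. when no steady playback "speed" is visible.
    """
    if len(history) < window + 1:
        return None
    recent = history[-(window + 1):]
    strides = [b - a for a, b in zip(recent, recent[1:])]
    # A constant non-zero stride is treated as the current speed;
    # negative strides correspond to backward seeking.
    if strides[0] != 0 and all(s == strides[0] for s in strides):
        return recent[-1] + strides[0]
    return None
```

For example, the access sequence 100, 96, 92, 88 (a fast-backward pattern) yields a prediction of block 84, while an irregular sequence yields no prediction and would leave prefetching off.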

Incremental and Retargetable Linker for Embedded System (내장형 시스템을 위한 점진적이고 목표 재설정 가능한 링커)

  • Wu, Deok-Kyun;Han, Kyung-Sook;Pyo, Chang-Woo;Kim, Heung-Nam
    • Journal of KIISE:Computing Practices and Letters / v.7 no.2 / pp.183-192 / 2001
  • In a development environment for embedded systems in which a host is connected to a target system, the linker on the host links the cross-compiled object file with the modules of the target system and downloads the linked object file to the target. In this research, we separate this linker into a module that depends on the object file format and a module that is independent of it. The dependent module extracts format-independent linking information from the object file, and the independent module performs the actual linking with this information. This separation improves the portability of the development environment across target systems. Our linker also performs incremental remote linking, which applies relocation not only to the object file being loaded but also to the target's modules that have already been loaded. Incremental remote linking reduces linking time compared with linking all modules together, because linking is done per module. Measurements of linking time for the SPEC95 integer benchmarks show an average reduction of 64.90%. Incremental remote linking also makes development more convenient, because users need not consider the order in which object files are downloaded. We have developed this linker in the embedded application development environment ESTO [1], in preparation for a commercial product.

A Component-Based Framework for Structural Embedding of Mobile Agent System (모바일 에이전트 시스템의 구성적 임베딩을 위한 컴포넌트 기반의 프레임워크)

  • Chung, Wonho;Kang, Namhi
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.6 / pp.33-42 / 2012
  • The rapid evolution of wired and wireless technologies has produced many types of embedded systems, and the software embedded in those devices now needs flexibility rather than the fixedness that characterized embedded software in the past. Mobile agents are a useful distributed technology for reducing network load and latency because of their disconnected operation and high asynchrony. In this paper, a component-based mobile agent framework called EmHUMAN is designed and implemented for structural embedding into devices with different functions and resource constraints. It consists of three layers of components. Based on those components, structural embedding can be performed that takes into account resource constraints such as required functions, storage space, computing power, network bandwidth, etc. The components in each layer can be extended by adding, removing, or modifying components. EmHUMAN serves as a framework for developing mobile-agent-based distributed systems, and it is also a mobile agent system in itself. EmHUMAN provides several utilities as built-in APIs, which makes programming mobile agents highly effective.

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory / v.35 no.7 / pp.305-317 / 2008
  • As the mobile computing environment changes rapidly, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and size. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which exploits the fact that color components are perceptually less significant, supports parallel operations on two packed, compressed 16-bit YCbCr values (6 bits for Y and 5 bits each for Cb and Cr) in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the smaller data format reduces system cost, and the reduction in data bandwidth simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor. In contrast, MMX (a representative Intel multimedia extension) achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves performance and efficiency with a mere 3% increase in system area and a 5% increase in system power, while MMX requires a 14% increase in system area and a 16% increase in system power.
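The packed 16-bit YCbCr format (6 bits Y, 5 bits Cb, 5 bits Cr, two pixels per 32-bit word) can be illustrated with a short sketch. The abstract does not specify the bit ordering or the quantization rule, so the layout below (Y in the top bits, truncation of 8-bit inputs) and all function names are assumptions.

```python
def pack_pixel(y, cb, cr):
    """Quantize 8-bit Y, Cb, Cr to 6/5/5 bits and pack them into 16 bits.

    Assumed layout: Y in bits 15..10, Cb in bits 9..5, Cr in bits 4..0.
    """
    for component in (y, cb, cr):
        if not 0 <= component <= 255:
            raise ValueError("components must be 8-bit values")
    return ((y >> 2) << 10) | ((cb >> 3) << 5) | (cr >> 3)


def unpack_pixel(packed):
    """Return the quantized (Y6, Cb5, Cr5) fields of a 16-bit pixel."""
    return ((packed >> 10) & 0x3F, (packed >> 5) & 0x1F, packed & 0x1F)


def pack_pair(p0, p1):
    """Place two 16-bit pixels in one 32-bit word (p0 in the low half)."""
    return ((p1 & 0xFFFF) << 16) | (p0 & 0xFFFF)
```

With this layout a 32-bit datapath carries two whole pixels at once, which is the source of the extra concurrency the abstract describes; luminance keeps one more bit than each chroma component because chroma is perceptually less significant.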

The Study on Development of a Digital Internet Radio Receiver (디지털 인터넷 라디오 수신기 구현에 대한 연구)

  • Park, In-Gyu
    • Journal of KIISE:Computing Practices and Letters / v.12 no.2 / pp.102-110 / 2006
  • This paper describes the design and development of a stand-alone, high-sound-quality Internet radio system aimed at small embedded audio devices rather than general-purpose PCs. The device is designed to work with an Internet connection. Such systems have not yet been standardized, and the related algorithms are not publicly available. It is therefore necessary to analyze the receiving algorithms of current radio receivers and to develop our own hardware to overcome these obstacles and achieve high-quality sound. The main electronic components of this Internet radio are TCP/IP interfaces, an MP3 audio decoder, an I/O interface, and a Flash memory card with advanced audio multicasting for next-generation Internet radio. The basic structures and implementation issues of next-generation, highly versatile digital music players and Internet radio receivers are discussed.

Refinement for Loops in Buffer-Overrun Abstract Interpretation (요약해석을 이용한 버퍼오버런 분석에서 루프 분석결과의 정교화)

  • Oh, Hak-Joo;Yi, Kwang-Keun
    • Journal of KIISE:Computing Practices and Letters / v.14 no.1 / pp.111-115 / 2008
  • We present a simple and effective method to reduce loop-related false alarms raised by a buffer-overrun static program analyzer. Interval-domain buffer-overrun analyzers raise many false alarms when analyzing programs that frequently use loops and arrays. First, we classified the patterns of loop-related false alarms in loop-intensive programs, such as embedded programs or mathematical libraries. We then designed a simple and effective false alarm refiner specialized for the loop-related false alarms we classified. The refiner runs after the normal analysis of a program and re-examines the alarms that may be false. We implemented this method on our buffer-overrun analyzer, and the refinement reduced the number of false alarms by 32% of the total reported by the analyzer.
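The following sketch shows where a loop-related false alarm can come from in an interval domain. It does not reproduce the authors' refiner, which the abstract does not describe; it only models a generic interval analysis with widening for the loop `i = 0; while i < 10: a[i]; i += 1`, and every name in it is hypothetical.

```python
import math

INF = math.inf


def join(a, b):
    """Least upper bound of two intervals (lo, hi)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(old, new):
    """Standard interval widening: any bound that grows jumps to infinity."""
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)


def meet_upper(interval, bound):
    """Refine an interval with the guard `value <= bound`."""
    return (interval[0], min(interval[1], bound))


def is_safe(index, size):
    """True when every index in the interval lies inside [0, size - 1]."""
    return index[0] >= 0 and index[1] <= size - 1


i0 = (0, 0)                         # i = 0 before the loop
after_step = (i0[0] + 1, i0[1] + 1) # i += 1 after one iteration
widened = widen(i0, join(i0, after_step))  # loop-head value after widening
refined = meet_upper(widened, 9)           # apply the loop guard i < 10
```

Checking `a[i]` against `widened` reports an alarm for an array of size 10 because the upper bound has been widened to infinity, while checking against `refined` does not. Alarms of the first kind, on accesses that are actually safe, are the loop-related false alarms that a refinement pass aims to remove.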

Knowledge Distributed Robot Control Framework

  • Chong, Nak-Young;Hongu, Hiroshi;Ohba, Kohtaro;Hirai, Shigeoki;Tanie, Kazuo
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2003.10a / pp.1071-1076 / 2003
  • In this work, we propose a new robot control framework for a variety of applications in unstructured everyday environments. Programming robots can be a very time-consuming process and seems almost impossible for ordinary end users. To cope with this, this work provides a software framework for building robot application programs automatically, in which robots learn how to accomplish a commanded task from the object itself. An integrated sensing and computing tag is embedded in every object in the environment. In the robot controller, only the basic software libraries for low-level robot motion control are provided by the robot manufacturer. The main contribution of this work is a server platform, which we call the Omniscient Server, that generates the application programs and sends them to the robot controller over the network. Object-related information from the object server is merged into the robot control software to generate a detailed application program based on task commands from the human. We have built a test bed and demonstrated that a robot can perform a common household task within the proposed framework.

Implementation of the low power platform for sensor network based IEEE 802.15.4 (IEEE 802.15.4 기반 센서 네트워크를 위한 저전력 실시간 플랫폼의 설계 및 구현)

  • Hwang, Tae-Ho;Song, Byung-Chul;Kim, Seong-Dong
    • Proceedings of the IEEK Conference / 2005.11a / pp.1145-1148 / 2005
  • Sensor networks, which fall within the field of ubiquitous computing, perform the basic function of transmitting sensing data through autonomous sensing and ad hoc networking. To collect and process various sensing data at application time and to manage extremely limited system resources, a sensor network requires an embedded operating system with low power consumption, a small code size, and minimal hardware resource usage. In this paper, we propose an operating system with a new structure for building IEEE 802.15.4 MAC and Zigbee sensor networks. It is designed by reviewing the characteristics and core structural requirements of sensor-network operating systems, drawing on operating systems built under similar conditions, and applying those characteristics and requirements to the design.
