• Title/Summary/Keyword: 프로세서 코어

Search Result 312, Processing Time 0.028 seconds

Analysis of Programming Techniques for Creating Optimized CUDA Software (최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석)

  • Kim, Sung-Soo;Kim, Dong-Heon;Woo, Sang-Kyu;Ihm, In-Sung
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.7
    • /
    • pp.775-787
    • /
    • 2010
  • Unlike general-purpose CPUs, the GPUs have been specialized as many-core streaming processors, and are frequently replacing the CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. In order to respond to such trend, NVIDIA has recently issued a new parallel computing architecture called CUDA(Compute Unified Device Architecture), offering a flexible GPU programming environment for GPGPU(General Purpose GPU) computing. In general, when programmers use the CUDA API, they should clearly understand many aspects of GPU's computing architecture to produce efficient parallel software. In this article, we explain several optimization techniques for CUDA programming that we have verified through a lot of experiment and trial and error, and review how those techniques affect the performance of code execution. In particular, we use a specific problem as an example to analyze several elements that affect performances, such as effective accesses to hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several directions that may be utilized effectively in CUDA-based parallel programming.

Design and Inplementation of S/W for a Davinci-based Smart Camera (다빈치 기반 스마트 카메라 S/W 설계 및 구현)

  • Yu, Hui-Jse;Chung, Sun-Tae;Jung, Souhwan
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2008.05a
    • /
    • pp.116-120
    • /
    • 2008
  • Smart Camera provides intelligent vision functionalities which can interpret captured video, extract context-aware information and execute a necessary action in real-timeliness in addition to the functionality of network cameras which transmit the compressed acquired videos through networks. Intelligent vision algorithms demand tremendous computations so that real-time processing of computation of intelligent vision algorithms as well as compression and transmission of videos simultaneously is too much burden for a single CPU. Davinci processor of Texas Instruments is a popular ASSP(Application Specific Standard Product) which has dual core architecture of ARM core and DSP core and provides various I/O interfaces as well as networking interface and video acquiring interface necessary for developing digital video embedded applications. In this paper, we report the results of designing and implementing S/W for Davinci-based smart camera. We implement a face detection as an example of vision application and verify the implementation works well. In the future, for the development of a smart camera with more broad and real-time vision functionalities, it is necessary to study about more efficient vision application S/W architecture and optimization of vision algorithms on DSP core of Davichi processor.

  • PDF

Virtualized System Development Based on ERC32 Processor for Satellite Simulator (위성 시뮬레이터 개발을 위한 ERC32 프로세서 기반의 가상화 시스템 개발)

  • Choi, Jong-Wook;Shin, Hyun-Kyu;Lee, Jae-Seung;Cheon, Yee-Jin
    • Journal of Satellite, Information and Communications
    • /
    • v.6 no.1
    • /
    • pp.50-56
    • /
    • 2011
  • During the development of flight software, the processor emulator and satellite simulator are essential tools for software development and verification. SWT/KARI developed the software-based spacecraft simulator based on TSIM-ERC32 processor emulator from Aeroflex Gaisler. But when developing flight software using TSIM-ERC32, there are much limitation for understanding of exact behavior of ERC32 processor, and it is impossible to change or modify the emulator core to develop the satellite simulator. To reslove this problem, this paper presents the development of new cycle-true ERC32 emulator as laysim-erc32 and describes the software development and debugging method on VxWorks/RTEMS RTOS.

Programming Model for SODA-II: a Baseband Processor for Software Defined Radio Systems (SDR용 기저대역 프로세서를 위한 프로그래밍 모델)

  • Lee, Hyun-Seok;Yi, Joon-Hwan;Oh, Hyuk-Jun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.7
    • /
    • pp.78-86
    • /
    • 2010
  • This paper discusses the programming model of SODA-II that is a baseband processor for software defined radio (SDR) systems. Signal processing On-Demand Architecture Ⅱ (SODA-II) is an on-chip multiprocessor architecture consisting of four processor cores and each core has both an wide SIMD datapath and a scalar datapath. This architecture is appropriate for baseband processing that is a mixture of vector computations and scalar computations. The programming model of the SODA-II is based on C library routines. Because the library routines hide the details of complex SIMD datapath control procedures, end users can easily program the SODA-II without deep understanding on its architecture. In this paper, we discuss the details of library routines and how these routines are exploited in the implementation of baseband signal processing algorithms. As application examples, we show the implementation result of W-CDMA multipath searcher and OFDM demodulator on the SODA-II.

An Implementation of CAN Communication Interface using the Embedded Processor System based on FPGA (FPGA 기반의 임베디드 프로세서 시스템을 이용한 CAN 통신 인터페이스 구현)

  • Koo, Tae-Mook;Park, Young-Seak
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.11 no.1
    • /
    • pp.53-62
    • /
    • 2010
  • Recently, various industrial embedded systems including vehicles controlled electronically are evolving to distributed multi-micro controller system. Accordingly, there is a need for standard CAN(Controller Area Network) protocol that ensures high stability and reliability of communication and is simple to construct object-oriented system with high control efficiency. CAN communication interface used general-purpose processor doesn't have many limitations in various application development because of fixed hardware architecture. This paper design and implement a CAN communication interface system based on FPGA. It is verified function and performance of system through monitoring communication with existing AT90CAN128 controller. Implemented CAN communication interface can be reused in development of application systems based on FPGA. And it provides low-cost, small-size and low-power design advantages.

Hardware Implementation of Motor Controller Based on Zynq EPP(Extensible Processing Platform) (Zynq EPP를 이용한 모터 제어기의 하드웨어 구현)

  • Moon, Yong-Seon;Lim, Seung-Woo;Lee, Young-Pil;Bae, Young-Chul
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.11
    • /
    • pp.1707-1712
    • /
    • 2013
  • In this paper, we implement a hardware for motor control based on FPGA + embedded processor using Zynq EPP which is All Programmable SoC in order to improve a structural problem of motion control based on such as DSP, MCU and FPGA previously. The implemented motor controller that is fused controller with advantage of FPGA and embedded processor. The signal processing part of high velocity motor control is performed by motor controller based on FPGA. A motion profile and kinematic calculation that are required algorithm process such as operation of a complicate decimal point has processed in an embedded processor based on dual core. As a result of a hardware implementation, it has an advantage that has can be realized an effect of distribution process in one chip. It has also an advantage that is able to organize as a multi-axis motor controller through adding the IP core of motor control implemented on FPGA.

A Design of Authentication/Security Processor IP for Wireless USB (무선 USB 인증/보안용 프로세서 IP 설계)

  • Yang, Hyun-Chang;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.11
    • /
    • pp.2031-2038
    • /
    • 2008
  • A small-area and high-speed authentication/security processor (WUSB_Sec) IP is designed, which performs the 4-way handshake protocol for authentication between host and device, and data encryption/decryption of wireless USB system. The PRF-256 and PRF-64 are implemented by CCM (Counter mode with CBC-MAC) operation, and the CCM is designed with two AES (Advanced Encryption Standard) encryption coles working concurrently for parallel processing of CBC mode and CTR mode operations. The AES core that is an essential block of the WUSB_Sec processor is designed by applying composite field arithmetic on AF$(((2^2)^2)^2)$. Also, S-Box sharing between SubByte block and key scheduler block reduces the gate count by 10%. The designed WUSB_Sec processor has 25,000 gates and the estimated throughput rate is about 480Mbps at 120MHz clock frequency.

A Design of a Shader Processor based on a dual-phase pipeline architecture (듀얼 페이즈 명령어 파이프라인구조의 쉐이더 프로세서 설계)

  • Jeong, Hyung-Ki;Nam, Ki-Hun;Lee, Gwang-Yeob
    • Journal of IKEEE
    • /
    • v.12 no.4
    • /
    • pp.246-254
    • /
    • 2008
  • This paper represents a design of a 4 way SIMD processor with multi-thread and dual phase instruction pipeline. 8 threads can be performing in round-robin order, so any hazards can’t occur. The dual phase pipeline makes a pipeline operate as two pipelines, and it can fetch maximum 4 unit instructions at once. This variable length instruction set divide into first phase and second phase instructions, and with this function, complex branch and addressing can be executed at one clock cycle. This processor reduces the code size to quarter, pull out the doubled performance improvement than normal SIMD architecture.

  • PDF

Implementation of a Branch Predictor and Its Cost Per Performance Analysis for a High Performance Embedded Microprocessor (고성능 내장형 마이크로프로세서의 분기 예측기 구현 및 성능 대비 비용 분석)

  • Shin, Sang-Hoon;Choi, Lynn
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.202-204
    • /
    • 2003
  • EISC ISA를 기반으로 한 64 비트 고성능 내장형 마이크로프로세서 AE64000의 효과적인 성능 향상을 위해서 비용 대비 성능 향상이 우수한 분기 예측 기법을 도입하여 AE64000 파이프라인에 적합한 분기 예측기를 추가로 설계하고 SPEClnt 벤치마크 및 타 내장형 벤치마크의 성능 분석 시뮬레이션을 통해 최적의 분기 예측기의 구조를 결정하였다. AE64000에서 LERI 명령 처리를 위해 AE64000 파이프라인에 추가된 독특한 IFU에 의하여 복잡성을 갖지만, IF 단계의 PC 대신에 IFU 단계의 PrePC를 이용하여 분기 명령을 명령어 prefetch 단계에서 예측함으로써, 올바른 분기 예측시 분기로 인한 손실을 제거할 수 있다. 결과적으로 최종 선정된 최적의 분기 예측기는 Verilog로 구현하여 AE64000 프로세서 코어 모델과 통합 합성하였고 아울러 추가되는 면적과 최종 목표 클럭에 동작하기 위한 타이밍 분석을 통해 최종 생산에 적합하도록 설계된 분기 예측기의 기능 및 타이밍 검증을 수행하였다. 최종 구현된 분기 예측기는 프로세서 칩 전체의 1% 미만의 비용으로 최고 12%의 성능 향상을 달성하여 성능 대비 면적의 효율성에서 높은 결과를 보였다.

  • PDF

An Implementation of Linux Device Drivers of Nios II Embedded Processor System for Image Surveillance System (영상 감시 시스템을 위한 Nios II 임베디드 프로세서 시스템의 Linux 디바이스 드라이버 구현)

  • Kim, Dong-Jin;Jung, Young-Bee;Kim, Tae-Hyo;Park, Young-Seak
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.20 no.3
    • /
    • pp.362-367
    • /
    • 2010
  • In this paper, we describe implementation of FPGA-based Nios II embedded processor system and linux device driver for image monitoring system which is supplement weakness for fixed surveillance area of existing CCTV system and by manual operation of the camera's moving. Altera Nios II processor 8.0 is supported MMU which is stable and efficient managed memory. We designed the image monitoring and control system by using Altera Nios II soft-core processor system which is flexible in various application and excellent adaptability. By implementation of camera device driver and VGA decvice driver for Linux-based Nios II system, we implemented image serveillance system for Nios II embedded processor system.