• Title/Summary/Keyword: 다중 코어 프로세서

Search Result 39, Processing Time 0.024 seconds

Design Space Exploration of Many-Core Processor for High-Speed Cluster Estimation (고속의 클러스터 추정을 위한 매니코어 프로세서의 디자인 공간 탐색)

  • Seo, Jun-Sang;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.10
    • /
    • pp.1-12
    • /
    • 2014
  • This paper implements and improves the performance of high computational subtractive clustering algorithm using a single instruction, multiple data (SIMD) based many-core processor. In addition, this paper implements five different processing element (PE) architectures (PEs=16, 64, 256, 1,024, 4,096) to select an optimal PE architecture for the subtractive clustering algorithm by estimating execution time and energy efficiency. Experimental results using two different medical images and three different resolutions ($128{\times}128$, $256{\times}256$, $512{\times}512$) show that PEs=4,096 achieves the highest performance and energy efficiency for all the cases.

A Real-Time Scheduling Technique on Multi-Core Systems for Multimedia Multi-Streaming (다중 멀티미디어 스트리밍을 위한 멀티코어 시스템 기반의 실시간 스케줄링 기법)

  • Park, Sang-Soo
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.11
    • /
    • pp.1478-1490
    • /
    • 2011
  • Recently, multi-core processors have been drawing significant interest from the embedded systems research and industry communities due mainly to their potential for achieving high performance and fault-tolerance at low cost in such products as automobiles and cell phones. To process multimedia data, a scheduling algorithm is required to meet timing constraints of periodic tasks in the system. Though Pfair scheduling algorithm can meet all the timing constraints while achieving 100% utilization on multi-core based system theoretically, however, the algorithm incurs high scheduling overheads including frequent core migrations and system-wide synchronizations. To mitigate the problems, we propose a real-time scheduling algorithm for multi-core based system so that system-wide scheduling is performed only when it is absolutely necessary. Otherwise the proposed algorithm performs scheduling within each core independently. The experimental results by extensive simulations show that the proposed algorithm dramatically reduces the scheduling overheads up to as negligible one when the utilization is under 80%.

Design and Implementation of Multi-View 3D Video Player (다시점 3차원 비디오 재생 시스템 설계 및 구현)

  • Heo, Young-Su;Park, Gwang-Hoon
    • Journal of Broadcast Engineering
    • /
    • v.16 no.2
    • /
    • pp.258-273
    • /
    • 2011
  • This paper designs and implements a multi-view 3D video player system which is operated faster than existing video player systems. The structure for obtaining the near optimum speed in a multi-processor environment by parallelizing the component modules is proposed to process large volumes of multi-view image data at high speed. In order to use the concurrency of bottleneck, we designed image decoding, synthesis and rendering modules in a pipeline structure. For load balancing, the decoder module is divided into the unit of viewpoint, and the image synthesis module is geometrically divided based on synthesized images. As a result of this experiment, multi-view images were correctly synthesized and the 3D sense could be felt when watching the images on the multi-view autostereoscopic display. The proposed application processing structure could be used to process large volumes of multi-view image data at high speed, using the multi-processors to their maximum capacity.

Programming Model for SODA-II: a Baseband Processor for Software Defined Radio Systems (SDR용 기저대역 프로세서를 위한 프로그래밍 모델)

  • Lee, Hyun-Seok;Yi, Joon-Hwan;Oh, Hyuk-Jun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.7
    • /
    • pp.78-86
    • /
    • 2010
  • This paper discusses the programming model of SODA-II that is a baseband processor for software defined radio (SDR) systems. Signal processing On-Demand Architecture Ⅱ (SODA-II) is an on-chip multiprocessor architecture consisting of four processor cores and each core has both an wide SIMD datapath and a scalar datapath. This architecture is appropriate for baseband processing that is a mixture of vector computations and scalar computations. The programming model of the SODA-II is based on C library routines. Because the library routines hide the details of complex SIMD datapath control procedures, end users can easily program the SODA-II without deep understanding on its architecture. In this paper, we discuss the details of library routines and how these routines are exploited in the implementation of baseband signal processing algorithms. As application examples, we show the implementation result of W-CDMA multipath searcher and OFDM demodulator on the SODA-II.

An Implementation of CAN Communication Interface using the Embedded Processor System based on FPGA (FPGA 기반의 임베디드 프로세서 시스템을 이용한 CAN 통신 인터페이스 구현)

  • Koo, Tae-Mook;Park, Young-Seak
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.11 no.1
    • /
    • pp.53-62
    • /
    • 2010
  • Recently, various industrial embedded systems including vehicles controlled electronically are evolving to distributed multi-micro controller system. Accordingly, there is a need for standard CAN(Controller Area Network) protocol that ensures high stability and reliability of communication and is simple to construct object-oriented system with high control efficiency. CAN communication interface used general-purpose processor doesn't have many limitations in various application development because of fixed hardware architecture. This paper design and implement a CAN communication interface system based on FPGA. It is verified function and performance of system through monitoring communication with existing AT90CAN128 controller. Implemented CAN communication interface can be reused in development of application systems based on FPGA. And it provides low-cost, small-size and low-power design advantages.

OpenGL ES 2.0 based Shader Compilation Method for the Instruction-Level Parallelism (OpenGL ES 2.0 기반 셰이더 명령어 병렬 처리를 위한 컴파일 기법)

  • Kim, Jong-Ho;Kim, Tae-Young
    • Journal of Korea Game Society
    • /
    • v.8 no.2
    • /
    • pp.69-76
    • /
    • 2008
  • In this paper, we present the architecture of graphics processor and its instruction format for the mobile device. In addition, we introduce tile shader data structure for the on/off-line compilation based on the OpenGL ES 2.0 and a new optimization method based on the ILP(Instruction-Level Parallelism). This paper shows where a processor with the sane core clock is being used, the shader instruction resulted from the compile structure and method in this paper is approximately 1.5 to 2 times faster than a code based on the single instruction.

  • PDF

Implementation of Multicore-Aware Load Balancing on Clusters through Data Distribution in Chapel (클러스터 상에서 다중 코어 인지 부하 균등화를 위한 Chapel 데이터 분산 구현)

  • Gu, Bon-Gen;Carpenter, Patrick;Yu, Weikuan
    • The KIPS Transactions:PartA
    • /
    • v.19A no.3
    • /
    • pp.129-138
    • /
    • 2012
  • In distributed memory architectures like clusters, each node stores a portion of data. How data is distributed across nodes influences the performance of such systems. The data distribution scheme is the strategy to distribute data across nodes and realize parallel data processing. Due to various reasons such as maintenance, scale up, upgrade, etc., the performance of nodes in a cluster can often become non-identical. In such clusters, data distribution without considering performance cannot efficiently distribute data on nodes. In this paper, we propose a new data distribution scheme based on the number of cores in nodes. We use the number of cores as the performance factor. In our data distribution scheme, each node is allocated an amount of data proportional to the number of cores in it. We implement our data distribution scheme using the Chapel language. To show our data distribution is effective in reducing the execution time of parallel applications, we implement Mandelbrot Set and ${\pi}$-Calculation programs with our data distribution scheme, and compare the execution times on a cluster. Based on experimental results on clusters of 8-core and 16-core nodes, we demonstrate that data distribution based on the number of cores can contribute to a reduction in the execution times of parallel programs on clusters.

Acceleration of Intrusion Detection for Multi-core Video Surveillance Systems (멀티 코어 프로세서 기반의 영상 감시 시스템을 위한 침입 탐지 처리의 가속화)

  • Lee, Gil-Beom;Jung, Sang-Jin;Kim, Tae-Hwan;Lee, Myeong-Jin
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.12
    • /
    • pp.141-149
    • /
    • 2013
  • This paper presents a high-speed intrusion detection process for multi-core video surveillance systems. The high-speed intrusion detection was designed to a parallel process. Based on the analysis of the conventional process, a parallel intrusion detection process was proposed so as to be accelerated by utilizing multiple processing cores in contemporary computing systems. The proposed process performs the intrusion detection in a per-frame parallel manner, considering the data dependency between frames. The proposed process was validated by implementing a multi-threaded intrusion detection program. For the system having eight processing cores, the detection speed of the proposed program is higher than that of the conventional one by up to 353.76% in terms of the frame rate.

A 8192-point pipelined FFT/IFFT processor using two-step convergent block floating-point scaling technique (2단계 수렴 블록 부동점 스케일링 기법을 이용한 8192점 파이프라인 FFT/IFFT 프로세서)

  • 이승기;양대성;신경욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.10C
    • /
    • pp.963-972
    • /
    • 2002
  • An 8192-point pipelined FFT/IFFT processor core is designed, which can be used in multi-carrier modulation systems such as DUf-based VDSL modem and OFDM-based DVB system. In order to improve the signal-to-quantization-noise ratio (SQNR) of FFT/IFFT results, two-step convergent block floating-point (TS_CBFP) scaling is employed. Since the proposed TS_CBFP scaling does not require additional buffer memory, it reduces memory as much as about 80% when compared with conventional CBFP methods, resulting in area-and power-efficient implementation. The SQNR of about 60-㏈ is achieved with 10-bit input, 14-bit internal data and twiddle factors, and 16-bit output. The core synthesized using 0.25-$\mu\textrm{m}$ CMOS library has about 76,300 gates, 390K bits RAM, and twiddle factor ROM of 39K bits. Simulation results show that it can safely operate up to 50-㎒ clock frequency at 2.5-V supply, resulting that a 8192-point FFT/IFFT can be computed every 164-${\mu}\textrm{s}$. It was verified by Xilinx FPGA implementation.

Optimal Design Space Exploration of Multi-core Architecture for Real-time Lane Detection Algorithm (실시간 차선인식 알고리즘을 위한 최적의 멀티코어 아키텍처 디자인 공간 탐색)

  • Jeong, Inkyu;Kim, Jongmyon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.3
    • /
    • pp.339-349
    • /
    • 2017
  • This paper proposes a four-stage algorithm for detecting lanes on a driving car. In the first stage, it extracts region of interests in an image. In the second stage, it employs a median filter to remove noise. In the third stage, a binary algorithm is used to classify two classes of backgrond and foreground of an input image. Finally, an image erosion algorithm is utilized to obtain clear lanes by removing noises and edges remained after the binary process. However, the proposed lane detection algorithm requires high computational time. To address this issue, this paper presents a parallel implementation of a real-time line detection algorithm on a multi-core architecture. In addition, we implement and simulate 8 different processing element (PE) architectures to select an optimal PE architecture for the target application. Experimental results indicate that 40×40 PE architecture show the best performance, energy efficiency and area efficiency.