• Title/Summary/Keyword: SIMD Architecture


Optimal Economic Load Dispatch using Parallel Genetic Algorithms in Large Scale Power Systems (병렬유전알고리즘을 응용한 대규모 전력계통의 최적 부하배분)

  • Kim, Tae-Kyun;Kim, Kyu-Ho;Yu, Seok-Ku
    • The Transactions of the Korean Institute of Electrical Engineers A / v.48 no.4 / pp.388-394 / 1999
  • This paper applies parallel genetic algorithms (PGA) to optimal economic load dispatch (ELD) in power systems. The ELD problem is to minimize the total generation fuel cost over the power outputs of all generating units while satisfying load-balance constraints. Genetic algorithms (GA) are good candidates for effective parallelization because they inherently evolve a population of individuals in parallel: each individual evaluates the fitness function without exchanging data with other individuals. When parallel processing is applied to GA, a Single Instruction stream, Multiple Data stream (SIMD) system, one kind of parallel system, can be used. The SIMD architecture requires no data communication between the assigned processors. The proposed ELD formulation, with C code, is implemented for parallel processing in SIMSCRIPT, a powerful, free-form, and versatile computer simulation programming language. The proposed algorithm has been tested on a 38-unit system and compared with sequential quadratic programming (SQP).
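
As a rough illustration of why fitness evaluation parallelizes so easily, the sketch below scores each individual independently of every other one. The quadratic fuel-cost model, the penalty weight, and all function names are assumptions made for this example; the abstract does not give the paper's actual cost function or constraint handling.

```python
# Hypothetical sketch: per-individual ELD fitness with no data exchange.
# Assumes a quadratic fuel cost a + b*P + c*P^2 per unit (not from the abstract).

def fuel_cost(outputs, coeffs):
    """Total fuel cost for one candidate dispatch (one individual)."""
    return sum(a + b * p + c * p * p for p, (a, b, c) in zip(outputs, coeffs))

def fitness(outputs, coeffs, demand, penalty=1000.0):
    """Cost plus an assumed linear penalty on load-balance violation."""
    imbalance = abs(sum(outputs) - demand)
    return fuel_cost(outputs, coeffs) + penalty * imbalance

def evaluate_population(population, coeffs, demand):
    """Each call to fitness() depends on one individual only, so on a SIMD
    machine every processor could evaluate its own individual in lockstep."""
    return [fitness(ind, coeffs, demand) for ind in population]

coeffs = [(10.0, 2.0, 0.01), (20.0, 1.5, 0.02)]   # illustrative (a, b, c) per unit
population = [[100.0, 50.0], [80.0, 60.0]]        # two candidate dispatches
scores = evaluate_population(population, coeffs, demand=150.0)
```

Lower scores are better here; the second candidate is penalized because its outputs do not sum to the demand.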


Design of a scalable general-purpose parallel associative processor using content-addressable memory (Content-Addressable Memory를 이용한 확장 가능한 범용 병렬 Associative Processor 설계)

  • Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD / v.43 no.2 s.344 / pp.51-59 / 2006
  • The Von Neumann architecture suffers from the interface between the central processing unit and the memory, known as the 'Von Neumann bottleneck'. In this paper, we propose a scalable general-purpose associative processor (AP) based on content-addressable memory (CAM) that addresses this problem and is suitable for search-oriented applications. We propose an efficient instruction set and structural scalability for extending to larger applications. We define twelve instructions and provide some reduced instructions that speed up execution by performing two instructions in a single instruction cycle. The proposed AP operates in a bit-serial, word-parallel fashion and can be regarded as a 32-bit general-purpose parallel processor with a massively parallel SIMD structure. We design and simulate maximum/minimum search, greater-than/less-than search, and parallel addition to verify the proposed architecture. The algorithms execute in constant time O(k) regardless of the number of input data items.
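
A minimal software sketch of a bit-serial, word-parallel maximum search follows, assuming k denotes the word length in bits (the abstract does not define k). It is a generic textbook-style associative search, not the paper's instruction sequence; the function name is hypothetical. In hardware, each pass over a bit position would touch all words at once, which is why the step count depends on k and not on the number of words.

```python
# Hypothetical sketch of bit-serial, word-parallel maximum search.
# The list comprehension stands in for what a CAM does on all words in parallel.

def cam_max_search(words, k):
    """Return the indices of the maximum value among k-bit unsigned words."""
    candidates = [True] * len(words)          # every word starts as a candidate
    for bit in range(k - 1, -1, -1):          # k steps, most significant bit first
        hits = [c and bool((w >> bit) & 1) for c, w in zip(candidates, words)]
        if any(hits):                         # some candidate has a 1 at this bit,
            candidates = hits                 # so candidates with a 0 are dropped
    return [i for i, c in enumerate(candidates) if c]

winners = cam_max_search([5, 12, 7, 12, 3], k=4)
```

Minimum search works the same way with the test inverted (keep candidates whose current bit is 0).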

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.10 / pp.11-21 / 2011
  • This paper introduces a SIMD (Single Instruction Multiple Data) based parallel processor that efficiently processes the massive data inherent in multimedia. In addition, this paper implements MMX (MultiMedia eXtension)-type instructions on the data-parallel processor and evaluates and analyzes their performance. The reference data-parallel processor consists of 16 processors, each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieve a 50% performance improvement over the baseline instructions on the same data-parallel architecture. In addition, MMX-type instructions achieve 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia-specific instructions, including MMX-type instructions, have potential for widely used many-core GPUs (Graphics Processing Units) and other types of parallel processors.
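
To make "MMX-type instruction" concrete, the sketch below emulates one packed operation on a 32-bit word: four unsigned 8-bit lanes added with saturation. The lane width, the choice of a saturating add, and the function name are assumptions for illustration; the abstract does not list which instructions the processor implements.

```python
# Hypothetical sketch: a packed unsigned saturating add on four 8-bit lanes
# held in one 32-bit word, in the spirit of MMX-type subword parallelism.

def packed_add_saturate_u8(a, b):
    """Add corresponding 8-bit lanes of a and b, clamping each sum at 255."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF               # extract lane from each operand
        y = (b >> shift) & 0xFF
        s = min(x + y, 0xFF)                  # saturate instead of wrapping
        result |= s << shift
    return result

packed = packed_add_saturate_u8(0x10FF8001, 0x20017F02)
```

In hardware the four lane additions happen in a single instruction, which is where the speedup for pixel data comes from; saturation avoids the wrap-around artifacts that plain modular addition would produce on image samples.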

A Study of Printed Score Recognition and its Parallel Algorithm (인쇄 악보의 인식과 병렬 알고리즘에 관한 연구)

  • 황영길;김성천
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.5 / pp.959-970 / 1994
  • In this thesis, a printed score is read with a handheld scanner, and the recognition process is ultimately executed in parallel on a Mesh-Connected Computer. The scanned input is classified into certain patterns and recognized using a knowledge-based approach. The proposed algorithm minimizes the preprocessing steps and uses simple operations. Score symbols on a printed score can be recognized irrespective of their size, but their diversity makes it difficult to recognize them all, so the program recognizes only those symbols that are essential and frequently used. The recognition result is converted into the MIDI standard file format. Because high-speed image processing is required, a parallel processing system with multiple processors is needed. A digitized two-dimensional image is well suited to processing on a SIMD Mesh-Connected Computer (MCC). We therefore explain this architecture and present a parallel algorithm that uses a SIMD MCC with n processors and achieves O(n) time complexity.


Design of a Graphic Processor for Multimedia Data Processing (멀티미디어 데이타 처리를 위한 그래픽 프로세서 설계)

  • 고익상;한우종;선우명동
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.10 / pp.56-65 / 1999
  • This paper presents an architecture and instruction set for a graphic coprocessor (GCP) that can be used in a multimedia server. The proposed instruction set employs parallel architecture concepts such as SIMD and superscalar execution. The GCP consists of a scheduler and four functional units. The scheduler resolves the instruction bottleneck caused by sharing the GCP among four general processors (GPs). The GCP can execute up to 4 instructions in parallel. It consists of about 56,000 gates and operates at a 30 MHz clock frequency because of the speed limitation of SOG technology. The GCP meets the real-time DCT requirement for the CIF image format and can process up to 63 frames/sec for the DCT algorithm and 21 frames/sec for the full block matching algorithm on CIF images.


Novel Parallel Approach for SIFT Algorithm Implementation

  • Le, Tran Su;Lee, Jong-Soo
    • Journal of information and communication convergence engineering / v.11 no.4 / pp.298-306 / 2013
  • The scale invariant feature transform (SIFT) is an effective algorithm used in object recognition, panorama stitching, and image matching. However, due to its complexity, real-time processing is difficult to achieve with current software approaches. The increasing availability of parallel computers makes parallelizing these tasks an attractive approach. This paper proposes a novel parallel approach for SIFT algorithm implementation using a block filtering technique in a Gaussian convolution process on the SIMD Pixel Processor. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and input/output capabilities of the processor, which results in a system that can perform real-time image and video compression. We apply this implementation to images and measure the effectiveness of such an approach. Experimental simulation results indicate that the proposed method is capable of real-time applications, and the result of our parallel approach is outstanding in terms of the processing performance.

Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder (JPEG 인코더를 위한 고성능 병렬 프로세서 하드웨어 설계 및 검증)

  • Kim, Yong-Min;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications / v.6 no.2 / pp.100-107 / 2011
  • As the use of mobile multimedia devices has increased in recent years, the need for high-performance multimedia processors is growing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and operates with a 3-stage pipeline. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance and energy efficiency. In addition, the proposed parallel processor architecture was developed and verified with Verilog HDL and an FPGA prototype system.

A reconfigurable modular approach for digital neural network (디지털 신경회로망의 하드웨어 구현을 위한 재구성형 모듈러 디자인의 적용)

  • Yun, Seok-Bae;Kim, Young-Joo;Dong, Sung-Soo;Lee, Chong-Ho
    • Proceedings of the KIEE Conference / 2002.07d / pp.2755-2757 / 2002
  • In this paper, we propose a new architecture for the hardware implementation of digital neural networks. By adding a flexible ladder-style bus and an internal connection network to the traditional SIMD-type digital neural network architecture, the proposed architecture enables fast, parallelism-based processing without abandoning the flexibility and extensibility of the traditional approach. In the proposed architecture, users can change the network topology by setting configuration registers. This hardware reconfigurability offers usability comparable to software simulation. We implement the proposed design on a real FPGA and configure the chip as a multi-layer perceptron with back propagation for an alphabet recognition problem. A comparison with its software counterpart shows its value in terms of performance and flexibility.


Performance Analysis of Implementation on Image Processing Algorithm for Multi-Access Memory System Including 16 Processing Elements (16개의 처리기를 가진 다중접근기억장치를 위한 영상처리 알고리즘의 구현에 대한 성능평가)

  • Lee, You-Jin;Kim, Jea-Hee;Park, Jong-Won
    • Journal of the Institute of Electronics Engineers of Korea CI / v.49 no.3 / pp.8-14 / 2012
  • Faster image processing is in great demand with the spread of high-quality visual media and massive image applications such as 3D TV, movies, and AR (augmented reality). A SIMD computer attached to a host computer can accelerate various image processing and massive data operations. MAMS is a multi-access memory system that, together with multiple processing elements (PEs), is well suited to building a high-performance pipelined SIMD machine. MAMS supports simultaneous access to pq data elements within a horizontal, vertical, or block subarray with a constant interval at an arbitrary position in an $M \times N$ array of data elements, where the number of memory modules (MMs), m, is a prime number greater than pq. MAMS-PP4 is the first realization of the MAMS architecture and consists of four PEs in a single chip and five MMs. This paper presents the implementation of image processing algorithms and a performance analysis for MAMS-PP16, which consists of 16 PEs with 17 MMs, as an extension of the prior work, MAMS-PP4. The newly designed MAMS-PP16 has a 64-bit instruction format and an application-specific instruction set. The author develops a simulator of the MAMS-PP16 system on which the implemented algorithms can be executed. The performance analysis was carried out by running the implemented image processing algorithms on this simulator. Using the pyramid operation, the analysis verifies that MAMS-PP16 responds consistently compared with a Pentium-based serial processor: executing the pyramid operation on MAMS-PP16 yields consistent processing times, whereas the response time on the serial processor varies randomly.

Hardware Design and Implementation of a Parallel Processor for High-Performance Multimedia Processing (고성능 멀티미디어 처리용 병렬프로세서 하드웨어 설계 및 구현)

  • Kim, Yong-Min;Hwang, Chul-Hee;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.5 / pp.1-11 / 2011
  • As the use of mobile multimedia devices has increased in recent years, the need for high-performance multimedia processors is growing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and operates with a 3-stage pipeline. Experimental results indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance. In addition, it outperforms the commercial high-performance TI C6416 DSP in terms of performance (1.4-31.4x better) and energy efficiency (5.9-8.1x better) with the same 130nm technology and 720 MHz clock frequency. The proposed parallel processor was developed with Verilog HDL and verified with an FPGA prototype system.