• Title/Summary/Keyword: Multiprocessor

Search Result 315, Processing Time 0.021 seconds

A Study on the Interframe Image Coding Using Motion Compensated and Classified Vector Quantizer (Ⅱ : Hardware Implementation) (이동 보상과 분류 벡터 양자화기를 이용한 영상 부호화에 관한 연구 (Ⅱ: 하드웨어 실현))

  • Jeon, Joong-Nam;Shin, Tae-Min;Choi, Sung-Nam;Park, Kyu-Tae
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.3
    • /
    • pp.21-30
    • /
    • 1990
  • This paper describes a hardware implementation of the interframe monochrome video CODEC using a MC-CVQ(Motion Compensated and Classified Vector Quantization) algorithm. The specifications of this CODEC are (1) the resolution of image is $128{\times}128$ pixels, and (2) the transmission rates are about 10frames/sec at the 64Kbps channel. In order to design the CODEC under these conditions, it is implemented by a multiprocessor system composed of MC unit, CVQ nuit and decoder unit, which are controlled by microprogramming technique. And the 3~stage pipelined ALU(Arithmetic and Logic Unit) is adopted to calculate the minimum error distance in the MC unit and CVQ nuit. The realized system shows that the transmission rates are 6-15 frames/sec according to the relative motion of the video signal.

  • PDF

Separated Address/Data Network Design for Bus Protocol compatible Network-on-Chip (버스 프로토콜 호환 가능한 네트워크-온-칩에서의 분리된 주소/데이터 네트워크 설계)

  • Chung, Seungh Ah;Lee, Jae Hoon;Kim, Sang Heon;Lee, Jae Sung;Han, Tae Hee
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.4
    • /
    • pp.68-75
    • /
    • 2016
  • As the number of cores and IPs increase in multiprocessor system-on-chip (MPSoC), network-on-chip (NoC) has emerged as a promising novel interconnection architecture for its parallelism and scalability. However, minimization of the latency in NoC with legacy bus IPs must be addressed. In this paper, we focus on the latency minimization problem in NoC which accommodates legacy bus protocol based IPs considering the trade-offs between hop counts and path collisions. To resolve this problem, we propose separated address/data network for independent address and data phases of bus protocol. Compared to Mesh and irregular topologies generated by TopGen, experimental results show that average latency and execution time are reduced by 19.46% and 10.55%, respectively.

Programming Model for SODA-II: a Baseband Processor for Software Defined Radio Systems (SDR용 기저대역 프로세서를 위한 프로그래밍 모델)

  • Lee, Hyun-Seok;Yi, Joon-Hwan;Oh, Hyuk-Jun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.7
    • /
    • pp.78-86
    • /
    • 2010
  • This paper discusses the programming model of SODA-II that is a baseband processor for software defined radio (SDR) systems. Signal processing On-Demand Architecture Ⅱ (SODA-II) is an on-chip multiprocessor architecture consisting of four processor cores and each core has both an wide SIMD datapath and a scalar datapath. This architecture is appropriate for baseband processing that is a mixture of vector computations and scalar computations. The programming model of the SODA-II is based on C library routines. Because the library routines hide the details of complex SIMD datapath control procedures, end users can easily program the SODA-II without deep understanding on its architecture. In this paper, we discuss the details of library routines and how these routines are exploited in the implementation of baseband signal processing algorithms. As application examples, we show the implementation result of W-CDMA multipath searcher and OFDM demodulator on the SODA-II.

Embedded Multithreading Processor Architecture for Personal Information Devices (개인용 정보 단말장치를 위한 내장형 멀티스레딩 프로세서 구조)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.9
    • /
    • pp.7-13
    • /
    • 2010
  • In this paper, we proposed a processor architecture that is suitable for next generation embedded applications, especially for personal information devices such as smart phones, tablet PC. Latest high performance embedded processors are developed to achieve high clock speed. Because increasing performance makes design more difficult and induces large overhead, architectural evolution in embedded processor field is necessary. Among more enhanced processor types, out-of-order superscalar cannot be a candidate for embedded applications due to its excessive complexity and relatively low performance gain compared to its overhead. Therefore, new architecture with moderate complexity must be designed. In this paper, we developed a low-cost SMT architecture model and compared its performance to other architectures including scalar, superscalar and multiprocessor. Because current personal information devices have a tendency to execute multiple tasks simultaneously, SMT or CMP can be a good choice. And our simulation result shows that the efficiency of SMT is the best among the architectures considered.

Application Behavior-oriented Adaptive Remote Access Cache in Ring based NUMA System (링 구조 NUMA 시스템에서 적응형 다중 그레인 원격 캐쉬 설계)

  • 곽종욱;장성태;전주식
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.9
    • /
    • pp.461-476
    • /
    • 2003
  • Due to the implementation ease and alleviation of memory bottleneck effect, NUMA architecture has dominated in the multiprocessor systems for the past several years. However, because the NUMA system distributes memory in each node, frequent remote memory access is a key factor of performance degradation. Therefore, efficient design of RAC(Remote Access Cache) in NUMA system is critical for performance improvement. In this paper, we suggest Multi-Grain RAC which can adaptively control the RAC line size, with respect to each application behavior Then we simulate NUMA system with multi-grain RAC using MINT, event-driven memory hierarchy simulator. and analyze the performance results. At first, with profile-based determination method, we verify the optimal RAC line size for each application and, then, we compare and analyze the performance differences among NUMA systems with normal RAC, with optimal line size RAC, and with multi-grain RAC. The simulation shows that the worst case can be always avoided and results are very close to optimal case with any combination of application and RAC format.

Scalable CC-NUMA System using Repeater Node (리피터 노드를 이용한 Scalable CC-NUMA 시스템)

  • Kyoung, Jin-Mi;Jhang, Seong-Tae
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.9
    • /
    • pp.503-513
    • /
    • 2002
  • Since CC-NUMA architecture has to access remote memory, the interconnection network determines the performance of the CC-NUMA system. Bus which has been used as a popular interconnection network has many limits in a large-scale system because of the limited physical scalability and bandwidth. The dual ring interconnection network, composed of high-speed point-to-point links, is made to resolve the defects of the bus for the large-scale system. However, it also has a problem, in that the response latency is rapidly increased when many nodes are attached to the snooping based CC-NUMA system with the dual ring. In this paper, we propose a ring architecture with repeater nodes in order to overcome the problem of the dual ring on a snooping based CC-NUMA system, and design a repeater node adapted to this architecture. We will also analyze the effects of proposed architecture on the system performance and the response latency by using a probability-driven simulator.

A VLSI Architecture for the Real-Time 2-D Digital Signal Processing (실시간 2차원 디지털 신호처리를 위한 VLSI 구조)

  • 권희훈
    • Information and Communications Magazine
    • /
    • v.9 no.9
    • /
    • pp.72-85
    • /
    • 1992
  • The throughput requirement for many digital signal processing is such that multiple processing units are essential for real-time implementation. Advances in VLSI technology make it feasible to design and implement computer systems consisting of a large number of function units. The research on a very high throughput VLSI architecture for digital signal processing applications requires the development of an algorithm, decomposition scheme which can minimize data communication requirements as well as minimize computational complexity. The objectives of the research are to investigate computationally efficient algorithms for solution of the class of problems which can be modeled as DLSI systems or adaptive system, and develop VLSI architectures and associated multiprocessor systems which can be used to implement these algorithms in real-time. A new VLSI architecture for real-time 2-D digital signal processing applications is proposed in this research. This VLSI architecture extends the concept of having a single processing units in a chip. Because this VLSI architecture has the advantage that the complexity and the number of computations per input does not increase as the size of the input data in increased, it can process very large 2-D date in near real-time.

  • PDF

The Design of Hardware MPI Units for MPSoC (MPSoC를 위한 저비용 하드웨어 MPI 유닛 설계)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.86-92
    • /
    • 2011
  • In this paper, we propose a novel hardware MPI(Message Passing Interface) unit which supports message passing in multiprocessor system which use distributed memory architecture. MPI Hardware unit processes data synchronization, transmission and completion, and it supports processor non-blocking operation so it reduces overhead according to synchronization. Additionally, MPI hardware unit combines ready entry, request entry, reserve entry which save and manage the synchronized messages and performs the multiple outstanding issue and out of order completion. According to BFM(Bus Functional Model) simulation result, the performance is increased by 25% on many to many communication. After we designed MPI unit using HDL, with synopsys design compiler we synthesized, and for synthesis library we used MagnaChip $0.18{\mu}m$. And then we making prototype chip. The proposed message transmission interface hardware shows high performance for its increase in size. Thus, as we consider low-cost design and scalability, MPI hardware unit is useful in increasing overall performance of embedded MPSoC(Multi-Processor System-on-Chip).

A study on the parallel processing of the avionic system computer using multi RISC processors (다중 RISC 프로세서를 이용한 항공전자시스템컴퓨터 병렬처리기법 연구)

  • Lee, Jae-Uk;Lee, Sung-Soo;Kim, Young-Taek;Yang, Seung-Yul;Kim, Bong-Gyu;Hwang, Sang-Hyun;Park, Deok-Bae
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.30 no.7
    • /
    • pp.144-149
    • /
    • 2002
  • This paper presents a technique for real time multiprocessor parallel processing to develop an avionic system computer(ASC) which integrates the avionics control, navigation and fire control, cursive and raster graphic symbol generation into one line replaceable unit. The proposed method has optimal performance by adopting a logically asymmetric structure between four 32bit RISC processors based on the master-slave multiprocessing, a tightly coupled interaction level with the time shared common bus and global memory, and an efficient bus arbitration algorithm. The ASC has been verified through a series of flight tests. The relevant tests also have been rigorously conducted on the prototype ASC such as electrical test, environmental test, and electromagnetic interference test.

A Method for Distributed Database Processing with Optimized Communication Cost in Dataflow model (데이터플로우 모델에서 통신비용 최적화를 이용한 분산 데이터베이스 처리 방법)

  • Jun, Byung-Uk
    • Journal of Internet Computing and Services
    • /
    • v.8 no.1
    • /
    • pp.133-142
    • /
    • 2007
  • Large database processing is one of the most important technique in the information society, Since most large database is regionally distributed, the distributed database processing has been brought into relief. Communications and data compressions are the basic technologies for large database processing. In order to maximize those technologies, the execution time for the task, the size of data, and communication time between processors should be considered. In this paper, the dataflow scheme and vertically layered allocation algorithm have been used to optimize the distributed large database processing. The basic concept of this method is rearrangement of processes considering the communication time between processors. The paper also introduces measurement model of the execution time, the size of output data, and the communication time in order to implement the proposed scheme.

  • PDF