• Title/Summary/Keyword: Processor Architecture

Search Result 735, Processing Time 0.022 seconds

A Study on the Parallel Routing in Hybrid Optical Networks-on-Chip (하이브리드 광학 네트워크-온-칩에서 병렬 라우팅에 관한 연구)

  • Seo, Jung-Tack;Hwang, Yong-Joong;Han, Tae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.8
    • /
    • pp.25-32
    • /
    • 2011
  • Networks-on-chip (NoC) is emerging as a key technology to overcome severe bus traffics in ever-increasing complexity of the Multiprocessor systems-on-chip (MPSoC); however traditional electrical interconnection based NoC architecture would be faced with technical limits of bandwidth and power consumptions in the near future. In order to cope with these problems, a hybrid optical NoC architecture which use both electrical interconnects and optical interconnects together, has been widely investigated. In the hybrid optical NoCs, wormhole switching and simple deterministic X-Y routing are used for the electrical interconnections which is responsible for the setup of routing path and optical router to transmit optical data through optical interconnects. Optical NoC uses circuit switching method to send payload data by preset paths and routers. However, conventional hybrid optical NoC has a drawback that concurrent transmissions are not allowed. Therefore, performance improvement is limited. In this paper, we propose a new routing algorithm that uses circuit switching and adaptive algorithm for the electrical interconnections to transmit data using multiple paths simultaneously. We also propose an efficient method to prevent livelock problems. Experimental results show up to 60% throughput improvement compared to a hybrid optical NoC and 65% power reduction compared to an electrical NoC.

Design of MAHA Supercomputing System for Human Genome Analysis (대용량 유전체 분석을 위한 고성능 컴퓨팅 시스템 MAHA)

  • Kim, Young Woo;Kim, Hong-Yeon;Bae, Seungjo;Kim, Hag-Young;Woo, Young-Choon;Park, Soo-Jun;Choi, Wan
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.2
    • /
    • pp.81-90
    • /
    • 2013
  • During the past decade, many changes and attempts have been tried and are continued developing new technologies in the computing area. The brick wall in computing area, especially power wall, changes computing paradigm from computing hardwares including processor and system architecture to programming environment and application usage. The high performance computing (HPC) area, especially, has been experienced catastrophic changes, and it is now considered as a key to the national competitiveness. In the late 2000's, many leading countries rushed to develop Exascale supercomputing systems, and as a results tens of PetaFLOPS system are prevalent now. In Korea, ICT is well developed and Korea is considered as a one of leading countries in the world, but not for supercomputing area. In this paper, we describe architecture design of MAHA supercomputing system which is aimed to develop 300 TeraFLOPS system for bio-informatics applications like human genome analysis and protein-protein docking. MAHA supercomputing system is consists of four major parts - computing hardware, file system, system software and bio-applications. MAHA supercomputing system is designed to utilize heterogeneous computing accelerators (co-processors like GPGPUs and MICs) to get more performance/$, performance/area, and performance/power. To provide high speed data movement and large capacity, MAHA file system is designed to have asymmetric cluster architecture, and consists of metadata server, data server, and client file system on top of SSD and MAID storage servers. MAHA system softwares are designed to provide user-friendliness and easy-to-use based on integrated system management component - like Bio Workflow management, Integrated Cluster management and Heterogeneous Resource management. MAHA supercomputing system was first installed in Dec., 2011. The theoretical performance of MAHA system was 50 TeraFLOPS and measured performance of 30.3 TeraFLOPS with 32 computing nodes. MAHA system will be upgraded to have 100 TeraFLOPS performance at Jan., 2013.

Design of Real-Time PreProcessor for Image Enhancement of CMOS Image Sensor (CMOS 이미지 센서의 영상 개선을 위한 실시간 전처리 프로세서의 설계)

  • Jung, Yun-Ho;Lee, Joon-Hwan;Kim, Jae-Seok;Lim, Won-Bae;Hur, Bong-Soo;Kang, Moon-Gi
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.38 no.8
    • /
    • pp.62-71
    • /
    • 2001
  • This paper presents a design of the real-time digital image enhancement preprocessor for CMOS image sensor. CMOS image sensor offers various advantages while it provides lower-quality images than CCD does. In order to compensate for the physical limitation of CMOS sensor, the spatially adaptive contrast enhancement algorithm was incorporated into the preprocessor with color interpolation, gamma correction, and automatic exposure control. The efficient hardware architecture for the preprocessor is proposed and was simulated in VHDL. It is composed of about 19K logic gates, which is suitable for low-cost one-chip PC camera. The test system was implemented on Altera Flex EPF10KGC503-3 FPGA chip in real-time mode, and performed successfully.

  • PDF

Timing Verification of AUTOSAR-compliant Diesel Engine Management System Using Measurement-based Worst-case Execution Time Analysis (측정기반 최악실행시간 분석 기법을 이용한 AUTOSAR 호환 승용디젤엔진제어기의 실시간 성능 검증에 관한 연구)

  • Park, Inseok;Kang, Eunhwan;Chung, Jaesung;Sohn, Jeongwon;Sunwoo, Myoungho;Lee, Kangseok;Lee, Wootaik;Youn, Jeamyoung;Won, Donghoon
    • Transactions of the Korean Society of Automotive Engineers
    • /
    • v.22 no.5
    • /
    • pp.91-101
    • /
    • 2014
  • In this study, we presented a timing verification method for a passenger car diesel engine management system (EMS) using measurement-based worst-case execution time (WCET) analysis. In order to cope with AUTOSAR-compliant software architecture, a development process model is proposed. In the process model, a runnable is regarded as a test unit and its temporal behavior (i.e. maximum observed execution time, MOET) is obtained along with on-target functionality evaluation results during online unit test. Furthermore, a cost-effective framework for online unit test is proposed. Because the runtime environment layer and the standard calibration environment are utilized to implement test interface, additional resource consumption of the target processor is minimized. Using the proposed development process model and unit test framework, the MOETs of 86 runnables for diesel EMS are obtained with 213 unit test cases. Using the obtained MOETs of runnables, the WCETs of tasks are estimated and the schedulability is evaluated. From the schedulability analysis results, the problems of the initially designed schedule table is recognized and it is fixed by redesigning of the runnable mapping and task offset. Through the various test scenarios, the proposed method is validated.

An Active Prefetch Filtering Schemes using Exclusive Prefetch Cache (선인출 전용 캐시를 이용한 적극적 선인출 필터링 기법)

  • Chon Young-Suk;Kim Suk-il;Jeon Joong-nam
    • The KIPS Transactions:PartA
    • /
    • v.12A no.1 s.91
    • /
    • pp.41-52
    • /
    • 2005
  • Memory reference instruction caused by cache miss is the critical factor that limits the processing power of processor. Cache prefetching technique is an effective way to reduce the latency due to memory access. However, excessively aggressive prefetch leads to cache pollution and finally to cancel out the advantage of prefetch. In this study, an active prefetch filtering scheme is introduced which dynamically decides whether to commence prefetching after referring a filtering table to reduce the cache pollution due to unnecessary prefetches. For the precision filtering, an evicted address referencing scheme has been proposed where the filter directly compares the current prefetch address with previous unnecessary prefetch addresses stored in filtering table. Moreover, a small sized exclusive prefetch cache has been introduced to increase the amount of eviction of unnecessarily prefetched addresses to enhance the accuracy of dynamic filtering. The exclusive prefetch cache also prevents useful demand data from being pushed out by prefetched data, while the evicted address direct referencing scheme enables the prefetch cache to keep most of useful prefetch data within its small size. Experimental results from commonly used general and multimedia benchmarks show that the average cache miss ratio has been decreased by $13.3{\%}$ by virtue of enhanced filtering accuracy compared with conventional schemes.

A Study of FC-NIC Design Using zynq SoC for Host Load Reduction (호스트 부하 경감 달성을 위한 zynq SoC를 적용한 FC-NIC 설계에 관한 연구)

  • Hwang, Byeung-Chang;Seo, Jung-hoon;Kim, Young-Su;Ha, Sung-woo;Kim, Jae-Young;Jang, Sun-geun
    • Journal of Advanced Navigation Technology
    • /
    • v.19 no.5
    • /
    • pp.423-432
    • /
    • 2015
  • This paper shows that design, manufacture and the performance of FC-NIC (fibre channel network interface card) for network unit configuration which is based on one of the 5 main configuration items of the common functional module for IMA (integrated modular Avionics) architecture. Especially, FC-NIC uses zynq SoC (system on chip) for host load reductions. The host merely transmit FC destination address, source memory location and size information to the FC-NIC. After then the FC-NIC read the host memory via DMA (direct memory access). FC upper layer protocol and sequence process at local processor and programmable logic of FC-NIC zynq SoC. It enables to free from host load for external communication. The performance of FC-NIC shows average 5.47 us low end-to-end latency at 2.125 Gbps line speed. It represent that FC-NIC is one of good candidate network for IMA.

A Study On the Design of a Floating Point Unit for MPEG-2 AAC Decoder (MPEG-2 AAC 복호기를 위한 부동소수점유닛 설계에 관한 연구)

  • 구대성;김필중;김종빈
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.39 no.4
    • /
    • pp.355-355
    • /
    • 2002
  • In this paper, we designed a FPU(floating point unit) that it is very important and requires of high density when digital audio is designed. Almost audio system must support the multi-channel and required for high quality. A floating point arithmetic function in MPEG-2 AAC that implemented by hardware is able to realtime decoding when DSP realization. The reason is that MPEG-2 AAC is compatible to the Audio field of MPEG-4 and afterwards. We designed a FPU by hardware to increase the speed of a floating point unit with much calculation part in the MPEG-2 AAC Decoder. A FPU is composed of a multiplier and an adder. A multiplier used the Radix-4 Booth algorithm and an adder adopted 1's complement method for speed up. A form of a floating point unit has 8bit of exponent part and 24bit of mantissa. It's compatible with the IEEE single precision format and adopted a pipeline architecture to increase the speed of a processor. All of sub blocks are based on ISO/IEC 13818-7 standard. The algorithm is tested by C language and the design does by use of VHDL(VHSIC Hardware Description Language). The maximum operation speed is 23.2MHz and the stable operation speed is 19MHz.

A New Hardware Design for Generating Digital Holographic Video based on Natural Scene (실사기반 디지털 홀로그래픽 비디오의 실시간 생성을 위한 하드웨어의 설계)

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.11
    • /
    • pp.86-94
    • /
    • 2012
  • In this paper we propose a hardware architecture of high-speed CGH (computer generated hologram) generation processor, which particularly reduces the number of memory access times to avoid the bottle-neck in the memory access operation. For this, we use three main schemes. The first is pixel-by-pixel calculation rather than light source-by-source calculation. The second is parallel calculation scheme extracted by modifying the previous recursive calculation scheme. The last one is a fully pipelined calculation scheme and exactly structured timing scheduling by adjusting the hardware. The proposed hardware is structured to calculate a row of a CGH in parallel and each hologram pixel in a row is calculated independently. It consists of input interface, initial parameter calculator, hologram pixel calculators, line buffer, and memory controller. The implemented hardware to calculate a row of a $1,920{\times}1,080$ CGH in parallel uses 168,960 LUTs, 153,944 registers, and 19,212 DSP blocks in an Altera FPGA environment. It can stably operate at 198MHz. Because of the three schemes, the time to access the external memory is reduced to about 1/20,000 of the previous ones at the same calculation speed.

Design of a Low Power Digital Filter Using Variable Canonic Signed Digit Coefficients (가변 CSD 계수를 이용한 저전력 디지털 필터의 설계)

  • Kim, Yeong-U;Yu, Jae-Taek;Kim, Su-Won
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.38 no.7
    • /
    • pp.455-463
    • /
    • 2001
  • In this Paper, an approximate processing method is proposed and tested. The proposed method uses variable CSD (VCSD) coefficients which approximate filter stopband attenuation by controlling the precision of the CSD coefficient sets. A decimation filter for Audio Codec '97 specifications has been designed having processor architecture that consists of program/data memory, arithmetic unit, energy/level decision, and sinc filter blocks, and fabricated with 0.6${\mu}{\textrm}{m}$ CMOS sea-of-gate technology. For the combined two halfband FIR filters in decimation filter, the number of addition operations were reduced to 63.5%, 35.7%, and 13.9%, compared to worst-case which is not an adaptive one. Experimental results show that the total power reduction rate of the filter is varying from 3.8 % to 9.0 % with respect to worst-case. The proposed approximate processing method using variable CSD coefficients is readily applicable to various kinds of filters and suitable, especially, for the speech and audio applications, like oversampling ADCs and DACs, filter banks, voice/audio codecs, etc.

  • PDF

New Sidelobe Canceller for 3-D Phased Array Radar in Strong Interference (강한 간섭 신호를 제거하기 위한 3차원 위상배열 레이다용 새로운 부엽제거기)

  • Cho, Myeong-Je;Han, Dogn-Seog;Jung, Jin-Won;Kim, Soo-Joong
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.10
    • /
    • pp.144-155
    • /
    • 1998
  • The array weights that will maximize the SNR for any type of noise environment are determined by the function of the antenna design configuration and the directions of receiving target and interference signals. The conventional SLCs(sidelobe cancellers) using the SNR maximization perform worst from the saturation of the receiving system of main channel when the main antenna has pattern with high gain at the arrival angle of strong interference. In this paper, the new SLC is accomplished by using two independent antenna architecture. Main antenna is implemented with adaptive nulling, which is used for rejecting high-power interference primarily. Auxiliary antenna is realized with adaptive array for receiving interference signal to be suppressed completely, which has a characteristics of sufficient gain for every direction. The new SLC is implemented with above both antennas. We show that the new SLC, which consists of the adaptive nulling main antenna and the adaptive array auxiliary antenna, is useful in reducing the effect of strong interference like jammer, because the adaptive nulling at main antenna prevents its receiver and signal processor for saturation by strong interference. The proposed SLC has improved SNR over the conventional SLCs. The improved SNR at sidelobe region is typically more than 7 dB for a given test signal. Moreover, it improves the SNR of about 20 dB under strong interference at mainlobe.

  • PDF