• Title/Summary/Keyword: Processor Architecture

Study on Chip Design & Implementation of 32 Bit Floating Point Compatible DSP (32비트 부동소수점 호환 DSP의 설계 및 칩 구현에 관한 연구)

  • Woo, Jong-Sik;Seo, Jin-Keun;Lim, Jae-Young;Park, Ju-Sung
    • Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.11 / pp.74-84 / 2000
  • This paper describes the procedures for designing and implementing a DSP that is compatible with the TMS320C30 DSP. A CBS (Cycle Based Simulator) is developed to study the architecture of the target DSP. While an instruction is being executed, the simulator reports detailed information such as function block operation, control signal values, register states, and bus and memory values. The RTL design is written in VHDL, and logic simulation and hardware emulation are used to verify that the design operates correctly. The DSP is fabricated in a 0.6 $\mu$m CMOS technology. The chip has a complexity of 450,000 gates, an area of $9 \times 9\,\mathrm{mm}^2$, and an operating speed of 20 MIPS. Its compatibility with the TMS320C30 was confirmed by running 109 of the 114 instructions and 13 kinds of algorithms.

A Study on High Speed Image Rotation Algorithm using CUDA (CUDA를 이용한 고속 영상 회전 알고리즘에 관한 연구)

  • Kwon, Hee-Choul;Cho, Hyung-Jin;Kwon, Hee-Yong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.5 / pp.1-6 / 2016
  • Image rotation is one of the main pre-processing steps in image processing or image pattern recognition. It is implemented by multiplication with a rotation matrix. However, this requires many floating-point arithmetic operations and trigonometric function evaluations, so it takes a long time to execute. We propose a new high-speed image rotation algorithm that avoids these two major time-consuming operations. It uses just two shear translation operations, so it is very fast. In addition, we apply a parallel computing technique using CUDA, a massively parallel computing architecture built on the GPUs that have recently become widespread. Because the GPU is a dedicated graphics processor, it is excellent for processing pixels in parallel. We compare the proposed algorithm with the conventional rotation algorithm on images of various sizes. Experimental results show that the proposed algorithm outperforms the conventional rotation algorithm.
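
The abstract does not give its two-shear formulation. The sketch below therefore uses a different, well-known shear-based technique: the classic three-shear decomposition of a rotation (Paeth's method). It shows why shears sidestep per-pixel rotation-matrix arithmetic. The trigonometric factors are computed once per image, and each shear shifts a whole row or column by a rounded integer amount. As a result, pixels land exactly on the integer grid and no two pixels collide. The function names and the dictionary-of-pixels image representation are hypothetical. Every pixel is mapped independently, which is the property that makes a one-thread-per-pixel CUDA kernel a natural fit.

```python
import math

Pixel = tuple[int, int]


def shear_factors(theta: float) -> tuple[float, float]:
    """Shear factors for R(theta) = Sx(-tan(theta/2)) * Sy(sin(theta)) * Sx(-tan(theta/2))."""
    return math.tan(theta / 2), math.sin(theta)


def shear_rotate(x: int, y: int, t: float, s: float) -> Pixel:
    """Rotate integer pixel (x, y) counter-clockwise (y-up frame) using three rounded shears."""
    x = x - round(t * y)  # shear 1: shift each row horizontally
    y = y + round(s * x)  # shear 2: shift each column vertically
    x = x - round(t * y)  # shear 3: shift each row horizontally again
    return x, y


def rotate_image(pixels: dict[Pixel, int], theta: float) -> dict[Pixel, int]:
    """Each rounded shear is a bijection on the integer grid, so no two pixels collide."""
    t, s = shear_factors(theta)  # trigonometry once per image, not once per pixel
    return {shear_rotate(x, y, t, s): value for (x, y), value in pixels.items()}
```

For example, `shear_rotate(10, 0, *shear_factors(math.pi / 2))` returns `(0, 10)`, which matches an exact 90° counter-clockwise rotation.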

Separated Address/Data Network Design for Bus Protocol compatible Network-on-Chip (버스 프로토콜 호환 가능한 네트워크-온-칩에서의 분리된 주소/데이터 네트워크 설계)

  • Chung, Seungh Ah;Lee, Jae Hoon;Kim, Sang Heon;Lee, Jae Sung;Han, Tae Hee
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.4 / pp.68-75 / 2016
  • As the number of cores and IPs in multiprocessor systems-on-chip (MPSoCs) increases, network-on-chip (NoC) has emerged as a promising new interconnection architecture owing to its parallelism and scalability. However, the latency of an NoC that hosts legacy bus IPs must still be minimized. In this paper, we address latency minimization in an NoC that accommodates IPs based on legacy bus protocols, considering the trade-off between hop counts and path collisions. To resolve this problem, we propose a separated address/data network that serves the independent address and data phases of the bus protocol. Experimental results show that, compared with mesh topologies and irregular topologies generated by TopGen, the proposed network reduces average latency and execution time by 19.46% and 10.55%, respectively.
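
The sketch below is a toy model, not the paper's design or its TopGen comparison. It computes hop counts and shared directed links under XY (dimension-ordered) routing on a 2D mesh. This gives one simple way to see how carrying the address phase and the data phase of a bus transaction on separate networks can remove path collisions. The coordinates, the flows and the `shared_links` collision metric are hypothetical.

```python
from collections import Counter

Node = tuple[int, int]
Link = tuple[Node, Node]


def xy_route(src: Node, dst: Node) -> list[Link]:
    """Directed links traversed by XY routing: move along X first, then along Y."""
    (x, y), (tx, ty) = src, dst
    links = []
    while x != tx:
        nx = x + (1 if tx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != ty:
        ny = y + (1 if ty > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def shared_links(flows: list[tuple[Node, Node]]) -> int:
    """Sum over directed links of (flows using the link - 1): a crude proxy for path collisions."""
    usage = Counter(link for src, dst in flows for link in xy_route(src, dst))
    return sum(n - 1 for n in usage.values() if n > 1)
```

Consider a single master at `(0, 0)` writing to a slave at `(2, 0)`, with address flows `addr = [((0, 0), (2, 0))]` and data flows `data = [((0, 0), (2, 0))]`. When both phases share one network, `shared_links(addr + data)` is 2. When each phase has its own network, `shared_links(addr) + shared_links(data)` is 0. The hop count, `len(xy_route((0, 0), (2, 0)))`, is 2 either way.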

Development of Received Acoustic Pressure Analysis Program of CHA using Beam Tracing Method (Beam Tracing 기법을 이용한 수동 소나 센서의 수신 음압해석 프로그램 개발)

  • Kwon, Hyun-Wung;Hong, Suk-Yoon;Song, Jee-Hun;Jeon, Jae Jin;Seo, Young-Soo
    • Journal of the Society of Naval Architects of Korea / v.50 no.3 / pp.190-198 / 2013
  • To predict the acoustic pressure distribution produced by an exterior incident wave at the sensor positions of a Cylindrical Hydrophone Array (CHA), an acoustic pressure analysis is performed using the beam tracing method, which is well known as a reliable pressure analysis method in the high-frequency range. For an acoustic noise source located at the center of a rectangular room, the acoustic pressure is analyzed with both the beam tracing method and the Power Flow Boundary Element Method (PFBEM), and comparing the two sets of results verifies the accuracy of the beam tracing method. Using the verified beam tracing method, we develop a CHA pressure analysis program. The software consists of a model input module, a sensor array creator, analysis options, a solver and a post-processor. The model can be either 2D or 3D. The sensor array creator defines the sonar array by its center position, bottom, top and the angle between sensors. The analysis options include the analysis frequency, the number of beams and the number of reflections. The solver module calculates the ray paths, the acoustic pressure and the generated beams. We apply the program to 2D and 3D CHA models and obtain reliable results.
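
However the propagation paths are generated, a ray or beam tracing pressure calculation typically ends with a coherent sum over every path that reaches the receiver. The sketch below illustrates only that summation step, not the paper's beam tracing solver. It rests on several simplifying assumptions:

- spherical spreading;
- a single frequency-independent reflection coefficient;
- a nominal sound speed of 1500 m/s;
- one plane wall at y = 0, handled with an image source.

The function names are hypothetical.

```python
import cmath
import math

SOUND_SPEED = 1500.0  # m/s, nominal value for seawater (assumption)


def path_pressure(paths, freq, reflection_coeff=1.0, amplitude=1.0):
    """Coherent sum p = sum(A * R**n * exp(-1j*k*r) / r) over (length r, reflections n) pairs."""
    k = 2 * math.pi * freq / SOUND_SPEED  # acoustic wavenumber
    return sum(
        amplitude * reflection_coeff**n * cmath.exp(-1j * k * r) / r
        for r, n in paths
    )


def wall_paths(src, rcv):
    """Direct path plus one specular reflection from a plane wall at y = 0 (image source)."""
    image = (src[0], -src[1])  # mirror the source across the wall
    return [(math.dist(src, rcv), 0), (math.dist(image, rcv), 1)]
```

Setting `reflection_coeff=0.0` leaves only the direct path, so the pressure magnitude reduces to `amplitude / r`. This is a quick sanity check before adding more reflections.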

A Dual Integer Register File Structure for Temperature-Aware Microprocessors (온도 인지 마이크로프로세서를 위한 듀얼 레지스터 파일 구조)

  • Choi, Jin-Hang;Kong, Joon-Ho;Chung, Eui-Young;Chung, Sung-Woo
    • Journal of KIISE:Computer Systems and Theory / v.35 no.12 / pp.540-551 / 2008
  • Today's microprocessor designs are constrained by temperature as well as by power consumption. As process technology scales down, on-chip circuitry increases power density, which causes excessive temperatures (hotspots). Dynamic Thermal Management (DTM) has been proposed to tackle thermal problems cost-effectively: DTM techniques improve thermal reliability and reduce cooling cost. However, they involve a trade-off between thermal control and performance loss. This paper proposes a dual integer register file structure that minimizes the performance degradation caused by DTM invocations. For on-chip thermal control, the most critical functional unit is the integer register file, which becomes a hotspot because of its frequent read and write accesses. The proposed structure adds an extra register file and migrates read accesses to it, thereby reducing the dynamic power dissipated per unit. As a result, it completely eliminates localized hotspots in the integer register file and reduces performance degradation, yielding an average improvement of 13.35% (up to 18%) over the conventional DTM architecture.
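
To see roughly why an extra register file lowers per-unit activity, the sketch below counts accesses per register-file copy for a read/write trace. It assumes, hypothetically, that writes update both copies so either one can serve any read, and that reads alternate between the copies. The abstract does not state the actual steering policy, and the access count is only a rough proxy for dynamic power.

```python
def access_counts(trace: str, dual: bool = False) -> list[int]:
    """Count accesses per register-file copy for a trace of 'R' (read) and 'W' (write) events.

    Hypothetical policy for the dual case: every write updates both copies so they stay
    identical, and reads alternate between the two copies.
    """
    if not dual:
        return [len(trace)]  # a single register file absorbs every access
    counts = [0, 0]
    next_reader = 0
    for op in trace:
        if op == "W":
            counts[0] += 1  # write to the original copy
            counts[1] += 1  # mirror the write into the extra copy
        elif op == "R":
            counts[next_reader] += 1  # serve the read from one copy
            next_reader ^= 1          # the next read goes to the other copy
        else:
            raise ValueError(f"unknown access type {op!r}")
    return counts
```

For a read-heavy trace such as `"RRRRRRWW"`, the busiest file drops from 8 accesses to 5. Total accesses rise from 8 to 10 because writes are duplicated, which is the cost this structure trades for spreading heat.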

ASIC Design of OpenRISC-based Multimedia SoC Platform (OpenRISC 기반 멀티미디어 SoC 플랫폼의 ASIC 설계)

  • Kim, Sun-Chul;Ryoo, Kwang-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.281-284 / 2008
  • This paper describes the ASIC design of a multimedia SoC platform. The implemented platform consists of a 32-bit OpenRISC1200 microprocessor, a WISHBONE on-chip bus, a VGA controller, a debug interface, an SRAM interface and a UART. The 32-bit OpenRISC1200 processor has a 5-stage pipeline and a Harvard architecture with separate instruction and data buses. The VGA controller can display RGB data on a CRT or LCD monitor. The debug interface provides debugging functions for the platform. The SRAM interface supports an 18-bit address bus and a 32-bit data bus. The UART implements the RS232 protocol for serial communication. The platform is designed and verified on a Xilinx Virtex-4 XC4VLX80 FPGA board. Test code is generated by a cross compiler, and JTAG utility software and gdb are used to download it to the FPGA board through a parallel cable. Finally, the platform is implemented as a single ASIC chip in the Chartered 0.18 µm process and operates at a 100 MHz clock frequency.
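
The abstract says only that the UART provides the RS232 protocol. The sketch below assumes the common 8N1 frame: one start bit, eight data bits sent least significant bit first, no parity and one stop bit. It shows logic-level bits before an RS-232 line driver inverts them and shifts their voltage levels. The function names are hypothetical.

```python
def uart_frame_8n1(byte: int) -> list[int]:
    """Logic-level bits of one UART character: start bit, 8 data bits LSB first, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    data = [(byte >> i) & 1 for i in range(8)]  # least significant bit goes out first
    return [0] + data + [1]                      # start bit is 0 (space), stop bit is 1 (mark)


def bytes_per_second(baud: int, bits_per_frame: int = 10) -> float:
    """Payload throughput when every byte costs a full frame (10 bits for 8N1)."""
    return baud / bits_per_frame
```

For example, `uart_frame_8n1(0x41)` (ASCII `A`) yields `[0, 1, 0, 0, 0, 0, 0, 1, 0, 1]`. At an example rate of 115200 baud, not one taken from the paper, `bytes_per_second(115200)` gives 11520.0.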

A Study On The Wearable Embedded System Platform (입을 수 있는 내장형 시스템 플랫품에 관한 연구)

  • Yoo, Jin-Ho;Jeong, Hyun-Tae;Cho, Il-Yeon;Lee, Sang-Ho;Han, Dong-Won
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.12B / pp.831-837 / 2005
  • The personal general-purpose computer (PC) has evolved from the desktop to portable mobile devices such as the tablet PC and the PDA. Innovations in semiconductor technology have made it possible to package a reasonably powerful processor and memory subsystem with advanced input/output devices, and these subsystems have finally been miniaturized into wearable systems. Wearable computers have recently gained attention as the post-PC in the ubiquitous environment, and wearable computing is becoming increasingly feasible and receiving growing attention throughout industry and the consumer marketplace. This paper proposes and develops WPS, a wearable embedded platform with multimedia and networking features. We explain the form, overall architecture, functions and user applications of WPS. The paper also discusses the form of future next-generation computer platforms with intuitive user interfaces and well-designed applications.

Implementation of Optimizing Compiler for Bus-based VLIW Processors (버스기반의 VLIW형 프로세서를 위한 최적화 컴파일러 구현)

  • Hong, Seung-Pyo;Moon, Soo-Mook
    • Journal of KIISE:Computer Systems and Theory / v.27 no.4 / pp.401-407 / 2000
  • Modern microprocessors exploit instruction-level parallelism to increase performance. In particular, VLIW processors supported by parallelizing compilers are increasingly used in specific applications such as high-end DSP and graphics processing. The bus-based VLIW architecture was proposed for these applications and was designed to reduce the overhead of the forwarding unit and the instruction width. In this paper, we introduce an optimizing scheduling compiler developed for the proposed bus-based VLIW processor. First, we describe a method for modeling the interconnections between buses and the resource usage patterns. Then, on the basis of this model, we implement machine-dependent optimization techniques such as bus-to-register promotion, copy coalescing and operand substitution. We also implement optimization techniques for general-purpose VLIW microprocessors, such as selective scheduling and enhanced pipelining scheduling (EPS). Experimental results show a performance gain of about 20% on multimedia application benchmarks.
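
Copy coalescing and operand substitution both aim to remove register-to-register moves, which would otherwise occupy issue slots and, presumably on a bus-based VLIW, bus cycles. The sketch below is a generic copy-propagation pass, not the paper's compiler. It works on a hypothetical three-address IR in SSA form (every name is assigned exactly once), and it keeps moves whose destinations are live on exit.

```python
Instr = tuple[str, str, tuple[str, ...]]  # (destination, opcode, source operands)


def propagate_copies(code: list[Instr], live_out=frozenset()) -> list[Instr]:
    """Substitute each `mov` source into later uses and drop the move (SSA input assumed)."""
    alias: dict[str, str] = {}  # copy destination -> original value it stands for
    out: list[Instr] = []
    for dst, op, srcs in code:
        srcs = tuple(alias.get(s, s) for s in srcs)  # operand substitution
        if op == "mov" and dst not in live_out:
            alias[dst] = srcs[0]  # record the copy instead of emitting it
            continue
        out.append((dst, op, srcs))
    return out
```

Chains of copies collapse too: because each move's source is resolved before it is recorded, `x = mov a; y = mov x; z = add y, y` becomes a single `z = add a, a`.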

Construction of a Compiled-code Simulator Generation System for Efficient Design Exploration in Embedded Core Design (임베디드 코어 설계시 효율적인 설계 공간 탐색을 위한 컴파일드 코드 방식 시뮬레이터 생성 시스템 구축)

  • Kim, Sang-Woo;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.1B / pp.71-79 / 2011
  • This paper proposes a compiled-code simulator generation system, based on a machine description language, for efficient design space exploration when designing an embedded system optimized for a specific application. The proposed system generates a compiled-code simulator that maintains the functional accuracy of an event-driven simulator by determining the instruction fetch and decoding processes statically. The generated simulator performs instruction-level and cycle-level simulation to estimate the performance of the embedded core. To show the efficiency of the compiled-code simulator generator, architecture exploration was performed for a JPEG encoder application. Starting from the MIPS R3000 processor as the initial embedded core, the proposed system can produce a core with optimized execution time for the application program, a process that consumes a large amount of simulation time. The cycle-level compiled-code simulator retains functional accuracy and improves simulation speed by 21.7% on average compared with an event-driven simulator.
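
The key idea of a compiled-code simulator is to decode each instruction once, ahead of time, instead of every time it executes. The sketch below illustrates this with a hypothetical three-opcode toy ISA. It compares against a simple interpretive simulator rather than the paper's event-driven simulator, and it covers neither cycle accuracy nor machine description languages. It only shows that static decoding preserves functional results while cutting decode work.

```python
from typing import Callable

Exec = Callable[[list[int], int], int]  # (registers, pc) -> next pc


def decode(text: str) -> Exec:
    """Decode one toy-ISA instruction into a closure that executes it."""
    op, *args = text.split()
    vals = [int(a.lstrip("r")) for a in args]
    if op == "addi":  # addi rd rs imm
        rd, rs, imm = vals
        def run(regs, pc):
            regs[rd] = regs[rs] + imm
            return pc + 1
    elif op == "add":  # add rd rs rt
        rd, rs, rt = vals
        def run(regs, pc):
            regs[rd] = regs[rs] + regs[rt]
            return pc + 1
    elif op == "bnez":  # bnez rs target
        rs, target = vals
        def run(regs, pc):
            return target if regs[rs] != 0 else pc + 1
    else:
        raise ValueError(f"unknown opcode {op!r}")
    return run


def interpret(program: list[str], nregs: int = 4) -> tuple[list[int], int]:
    """Interpretive simulation: decode every instruction each time it executes."""
    regs, pc, decodes = [0] * nregs, 0, 0
    while pc < len(program):
        pc = decode(program[pc])(regs, pc)
        decodes += 1
    return regs, decodes


def compile_program(program: list[str]) -> list[Exec]:
    """Compiled-code simulation: decode the whole program once, statically."""
    return [decode(text) for text in program]


def run_compiled(compiled: list[Exec], nregs: int = 4) -> list[int]:
    regs, pc = [0] * nregs, 0
    while pc < len(compiled):
        pc = compiled[pc](regs, pc)
    return regs


PROGRAM = [
    "addi r1 r0 5",   # r1 = 5 (loop counter)
    "add r2 r2 r1",   # r2 += r1
    "addi r1 r1 -1",  # r1 -= 1
    "bnez r1 1",      # loop back to instruction 1 while r1 != 0
]
```

Both simulators leave `r2 = 15` (the sum 5 + 4 + 3 + 2 + 1). The interpretive version performs 16 decodes, while the compiled version decodes the four instructions once.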

A Parallel Bulk Loading Method for $B^+$-Tree Using CUDA (CUDA를 활용한 병렬 $B^+$-트리 벌크로드 기법)

  • Sung, Joo-Ho;Lee, Yoon-Woo;Han, A;Choi, Won-Ik;Kwon, Dong-Seop
    • Journal of KIISE:Computing Practices and Letters / v.16 no.6 / pp.707-711 / 2010
  • Most relational database systems provide $B^+$-trees as their main index structures and use bulk-loading techniques to create new $B^+$-trees from scratch on existing data. Although bulk loading is more efficient than inserting keys one by one, it is still time-consuming because all the keys of a large data set must be sorted. To improve bulk-loading performance, this paper proposes an efficient parallel bulk-loading method for $B^+$-trees based on CUDA, a parallel computing architecture developed by NVIDIA that uses the computing power of graphics processing units (GPUs) for general-purpose computing. Experimental results show that the proposed method improves performance by more than 70% compared with existing bulk-loading methods.
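
The sort-then-build structure of bulk loading can be sketched sequentially:

1. Sort the keys once.
2. Pack them into full leaves.
3. Build each internal level from the minimum keys of its children.

The paper parallelizes this work with CUDA. In this sketch, the sort is simply Python's `sorted`, and leaf capacity and fanout share one hypothetical parameter. Underfull rightmost nodes are also not rebalanced, as a production $B^+$-tree would require. Every node in a level depends only on its own slice of the level below, which is what makes per-node parallel construction possible.

```python
def bulk_load(keys, fanout=4):
    """Build B+-tree levels bottom-up; returns the levels from the leaves up to the root.

    Leaves hold up to `fanout` keys. An internal node stores the minimum key of each
    child except the first; these act as separator keys during search.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    keys = sorted(keys)  # the dominant cost, and the step a GPU sort would take over
    level = [keys[i:i + fanout] for i in range(0, len(keys), fanout)]
    mins = [node[0] for node in level]  # smallest key in each node's subtree
    levels = [level]
    while len(level) > 1:
        parents, parent_mins = [], []
        for i in range(0, len(level), fanout):  # each parent depends only on its children
            child_mins = mins[i:i + fanout]
            parents.append(child_mins[1:])
            parent_mins.append(child_mins[0])
        level, mins = parents, parent_mins
        levels.append(level)
    return levels
```

For example, `bulk_load([8, 3, 5, 1, 7, 2, 6, 4], 4)` packs two full leaves, `[1, 2, 3, 4]` and `[5, 6, 7, 8]`, under a root holding the single separator `5`.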