• Title/Summary/Keyword: Reconfigurable Processor

Search Result 56, Processing Time 0.023 seconds

Performance exploration on the number of register for Coarse grained reconfigurable array processor (재구성형 프로세서 성능과 레지스터와의 상관 관계 탐구)

  • Kim, Yongjoo;Heo, Ingoo;Yang, Seungjun;Lee, Jongwon;Choi, Youngkyu;Paek, Yunheung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.04a
    • /
    • pp.22-25
    • /
    • 2010
  • 재구성형 프로세서는 파워를 적게 사용하면서도 높은 성능을 낼 수 있는 프로세서이다. 재구성형 프로세서는 하드웨어에 최대한 많은 계산 자원을 넣으면서도 구조를 최대한 간단하게 하여 저전력 소모와 고성능을 동시에 추구하였다. 하지만 구조를 최대한 간단히 하는 과정에서 프로그램의 수행을 관리하는 많은 하드웨어 로직이 빠지게 되었는데, 이 부분은 컴파일러에서 코드를 생성할 때 스케쥴링과 수행 순서까지 정해지도록 소프트웨어적 관점에서 처리하기로 하였다. 이를 사용하기 위해 컴파일러는 입력된 프로그램을 분석하고 재구성형 프로세서에서 수행될 수 있는 형태로 코드를 각 계산자원에 매핑하는 작업을 수행해 주어야 한다. 재구성형 프로세서의 레지스터는 이 매핑 과정에서 데이터의 전달을 위해서 주로 사용되게 된다. 이 논문에서는 다양한 멀티미디어 응용 프로그램을 사용하여 멀티미디어 환경에서 재구성형 프로세서가 사용될 때 레지스터 개수가 성능에 미치는 영향을 제시한다.

Efficient RMESH Algorithms for Computing the Intersection and the Union of Two Visibility Polygons (두 가시성 다각형의 교집합과 합집합을 구하는 효율적인 RMESH 알고리즘)

  • Kim, Soo-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.2
    • /
    • pp.401-407
    • /
    • 2016
  • We can consider the following problems for two given points p and q in a simple polygon P. (1) Compute the set of points of P which are visible from both p and q. (2) Compute the set of points of P which are visible from either p or q. They are corresponding to the problems which are to compute the intersection and the union of two visibility polygons. In this paper, we consider algorithms for solving these problems on a reconfigurable mesh(in short, RMESH). The algorithm in [1] can compute the intersection of two general polygons in constant time on an RMESH with size O($n^3$), where n is the total number of vertices of two polygons. In this paper, we construct the planar subdivision graph in constant time on an RMESH with size O($n^2$) using the properties of the visibility polygon for preprocessing. Then we present O($log^2n$) time algorithms for computing the union as well as the intersection of two visibility polygons, which improve the processor-time product from O($n^3$) to O($n^2log^2n$).

Hardware Approach to Fuzzy Inference―ASIC and RISC―

  • Watanabe, Hiroyuki
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1993.06a
    • /
    • pp.975-976
    • /
    • 1993
  • This talk presents the overview of the author's research and development activities on fuzzy inference hardware. We involved it with two distinct approaches. The first approach is to use application specific integrated circuits (ASIC) technology. The fuzzy inference method is directly implemented in silicon. The second approach, which is in its preliminary stage, is to use more conventional microprocessor architecture. Here, we use a quantitative technique used by designer of reduced instruction set computer (RISC) to modify an architecture of a microprocessor. In the ASIC approach, we implemented the most widely used fuzzy inference mechanism directly on silicon. The mechanism is beaded on a max-min compositional rule of inference, and Mandami's method of fuzzy implication. The two VLSI fuzzy inference chips are designed, fabricated, and fully tested. Both used a full-custom CMOS technology. The second and more claborate chip was designed at the University of North Carolina(U C) in cooperation with MCNC. Both VLSI chips had muliple datapaths for rule digital fuzzy inference chips had multiple datapaths for rule evaluation, and they executed multiple fuzzy if-then rules in parallel. The AT & T chip is the first digital fuzzy inference chip in the world. It ran with a 20 MHz clock cycle and achieved an approximately 80.000 Fuzzy Logical inferences Per Second (FLIPS). It stored and executed 16 fuzzy if-then rules. Since it was designed as a proof of concept prototype chip, it had minimal amount of peripheral logic for system integration. UNC/MCNC chip consists of 688,131 transistors of which 476,160 are used for RAM memory. It ran with a 10 MHz clock cycle. The chip has a 3-staged pipeline and initiates a computation of new inference every 64 cycle. This chip achieved an approximately 160,000 FLIPS. The new architecture have the following important improvements from the AT & T chip: Programmable rule set memory (RAM). On-chip fuzzification operation by a table lookup method. On-chip defuzzification operation by a centroid method. Reconfigurable architecture for processing two rule formats. RAM/datapath redundancy for higher yield It can store and execute 51 if-then rule of the following format: IF A and B and C and D Then Do E, and Then Do F. With this format, the chip takes four inputs and produces two outputs. By software reconfiguration, it can store and execute 102 if-then rules of the following simpler format using the same datapath: IF A and B Then Do E. With this format the chip takes two inputs and produces one outputs. We have built two VME-bus board systems based on this chip for Oak Ridge National Laboratory (ORNL). The board is now installed in a robot at ORNL. Researchers uses this board for experiment in autonomous robot navigation. The Fuzzy Logic system board places the Fuzzy chip into a VMEbus environment. High level C language functions hide the operational details of the board from the applications programme . The programmer treats rule memories and fuzzification function memories as local structures passed as parameters to the C functions. ASIC fuzzy inference hardware is extremely fast, but they are limited in generality. Many aspects of the design are limited or fixed. We have proposed to designing a are limited or fixed. We have proposed to designing a fuzzy information processor as an application specific processor using a quantitative approach. The quantitative approach was developed by RISC designers. In effect, we are interested in evaluating the effectiveness of a specialized RISC processor for fuzzy information processing. As the first step, we measured the possible speed-up of a fuzzy inference program based on if-then rules by an introduction of specialized instructions, i.e., min and max instructions. The minimum and maximum operations are heavily used in fuzzy logic applications as fuzzy intersection and union. We performed measurements using a MIPS R3000 as a base micropro essor. The initial result is encouraging. We can achieve as high as a 2.5 increase in inference speed if the R3000 had min and max instructions. Also, they are useful for speeding up other fuzzy operations such as bounded product and bounded sum. The embedded processor's main task is to control some device or process. It usually runs a single or a embedded processer to create an embedded processor for fuzzy control is very effective. Table I shows the measured speed of the inference by a MIPS R3000 microprocessor, a fictitious MIPS R3000 microprocessor with min and max instructions, and a UNC/MCNC ASIC fuzzy inference chip. The software that used on microprocessors is a simulator of the ASIC chip. The first row is the computation time in seconds of 6000 inferences using 51 rules where each fuzzy set is represented by an array of 64 elements. The second row is the time required to perform a single inference. The last row is the fuzzy logical inferences per second (FLIPS) measured for ach device. There is a large gap in run time between the ASIC and software approaches even if we resort to a specialized fuzzy microprocessor. As for design time and cost, these two approaches represent two extremes. An ASIC approach is extremely expensive. It is, therefore, an important research topic to design a specialized computing architecture for fuzzy applications that falls between these two extremes both in run time and design time/cost. TABLEI INFERENCE TIME BY 51 RULES {{{{Time }}{{MIPS R3000 }}{{ASIC }}{{Regular }}{{With min/mix }}{{6000 inference 1 inference FLIPS }}{{125s 20.8ms 48 }}{{49s 8.2ms 122 }}{{0.0038s 6.4㎲ 156,250 }} }}

  • PDF

Cloudification of On-Chip Flash Memory for Reconfigurable IoTs using Connected-Instruction Execution (연결기반 명령어 실행을 이용한 재구성 가능한 IoT를 위한 온칩 플래쉬 메모리의 클라우드화)

  • Lee, Dongkyu;Cho, Jeonghun;Park, Daejin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.14 no.2
    • /
    • pp.103-111
    • /
    • 2019
  • The IoT-driven large-scaled systems consist of connected things with on-chip executable embedded software. These light-weighted embedded things have limited hardware space, especially small size of on-chip flash memory. In addition, on-chip embedded software in flash memory is not easy to update in runtime to equip with latest services in IoT-driven applications. It is becoming important to develop light-weighted IoT devices with various software in the limited on-chip flash memory. The remote instruction execution in cloud via IoT connectivity enables to provide high performance software execution with unlimited software instruction in cloud and low-power streaming of instruction execution in IoT edge devices. In this paper, we propose a Cloud-IoT asymmetric structure for providing high performance instruction execution in cloud, still low power code executable thing in light-weighted IoT edge environment using remote instruction execution. We propose a simulated approach to determine efficient partitioning of software runtime in cloud and IoT edge. We evaluated the instruction cloudification using remote instruction by determining the execution time by the proposed structure. The cloud-connected instruction set simulator is newly introduced to emulate the behavior of the processor. Experimental results of the cloud-IoT connected software execution using remote instruction showed the feasibility of cloudification of on-chip code flash memory. The simulation environment for cloud-connected code execution successfully emulates architectural operations of on-chip flash memory in cloud so that the various software services in IoT can be accelerated and performed in low-power by cloudification of remote instruction execution. The execution time of the program is reduced by 50% and the memory space is reduced by 24% when the cloud-connected code execution is used.

Design of Reconfigurable Processor for Information Security System (정보보호 시스템을 위한 재구성형 프로세서 설계)

  • Cha, Jeong-Woo;Kim, Il-Hyu;Kim, Chang-Hoon;Kim, Dong-Hwi
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.113-116
    • /
    • 2011
  • 최근 IT 기술의 급격한 발전으로 개인정보, 환경 등 다양한 정보를 수시로 수집 및 관리하면서 사용자가 원할시 즉각적인 정보서비스를 제공하고 있다. 그러나 유 무선상의 데이터 전송은 정보의 도청, 메시지의 위 변조 및 재사용, DoS(Denial of Service)등 외부의 공격으로부터 쉽게 노출된다. 이러한 외부 공격은 개인 프라이버시를 포함한 정보서비스 시스템 전반에 치명적인 손실을 야기 시킬 수 있기 때문에 정보보호 시스템의 필요성은 갈수록 그 중요성이 부각되고 있다. 현재까지 정보보호 시스템은 소프트웨어(S/W), 하드웨어(ASIC), FPGA(Field Progr- ammable Array) 디바이스를 이용하여 구현되었으며, 각각의 구현방법은 여러 가지 문제점이 있으며 그에 따른 해결방법이 제시되고 있다. 본 논문에서는 다양한 환경에서의 정보보호 서비스를 제공하기 위한 재구성형 SoC 구조를 제안한다. 제안된 SoC는 비밀키 암호알고리즘(AES), 암호학적 해쉬(SHA-256), 공개키 암호알고리즘(ECC)을 수행 할 수 있으며, 마스터 콘트롤러에 의해 제어된다. 또한 정보보호 시스템이 요구하는 다양한 제약조건(속도, 면적, 안전성, 유연성)을 만족하기 위해 S/W, ASIC, FPGA 디바이스의 모든 장점을 최대한 활용하였으며, MCU와의 효율적인 통신을 위한 I/O 인터페이스를 제안한다. 따라서 제안된 정보보호 시스템은 기존의 시스템보다 다양한 정보보호 알고리즘을 지원할 뿐만 아니라 속도 및 면적에 있어 상충 관계를 개선하였기 때문에 저비용 응용뿐만 아니라 고속 통신 장비 시스템에도 적용이 가능하다.

Core-aware Cache Replacement Policy for Reconfigurable Last Level Cache (재구성 가능한 라스트 레벨 캐쉬 구조를 위한 코어 인지 캐쉬 교체 기법)

  • Son, Dong-Oh;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.11
    • /
    • pp.1-12
    • /
    • 2013
  • In multi-core processors, Last Level Cache(LLC) can reduce the speed gap between the memory and the core. For this reason, LLC has big impact on the performance of processors. LLC is composed of shared cache and private cache. In computer architecture community, most researchers have mainly focused on the management techniques for shared cache, while management techniques for private cache have not been widely researched. In conventional private LLC, memory is statically assigned to each core, resulting in serious performance degradation when the workloads are not fairly distributed. To overcome this problem, this paper proposes the replacement policy for managing private cache of LLC efficiently. As proposed core-aware cache replacement policy can reconfigure LLC dynamically, hit rate of LLC is increases drastically. Moreover, proposed policy uses 2-bit saturating counters to improve the performance. According to our simulation results, the proposed method can improve hit rates by 9.23% and reduce the access time by 12.85% compared to the conventional method.