• Title/Summary/Keyword: memory based instruction

Search Result 80, Processing Time 0.024 seconds

A Study on the Design of Format Converter for Pixel-Parallel Image Processing (픽셀-병렬 영상처리에 있어서 포맷 컨버터 설계에 관한 연구)

  • 김현기;김현호;하기종;최영규;류기환;이천희
    • Proceedings of the IEEK Conference
    • /
    • 2001.06b
    • /
    • pp.269-272
    • /
    • 2001
  • In this paper we proposed the format converter design and implementation for real time image processing. This design method is based on realized the large processor-per-pixel array by integrated circuit technology in which this two types of integrated structure is can be classify associative parallel processor and parallel process with DRAM cell. Layout pitch of one-bit-wide logic is identical memory cell pitch to array high density PEs in integrate structure. This format converter design has control path implementation efficiently, and can be utilized the high technology without complicated controller hardware. Sequence of array instruction are generated by host computer before process start, and instructions are saved on unit controller. Host computer is executed the pixel-parallel operation starting at saved instructions after processing start

  • PDF

Designing Education Contents for Chinese Character Utilizing Internet of Things (IoT)

  • Jung, Sugkyu
    • Smart Media Journal
    • /
    • v.5 no.2
    • /
    • pp.24-32
    • /
    • 2016
  • Recently, the development of electronic teaching materials and the demand of digital learners have led the needs on the education contents that replace learning from character information and the change of an information design method for this. Chinese character education in the traditional schooling mainly focuses on writing and memorization (semantic memory). This way that the stories do not exist has brought the learners' recognition that Chinese character is difficult to learn. Meanwhile, for a language study such as English, cross-media development between printed materials and audio-visual materials has been actively introduced. The method that extends episode memories along with memorization through a story is widely used. Therefore, this content suggests a prototype, which is broken away from an existing way of learning Chinese character that mainly focuses on writing, one sided instruction and information cramming. This makes learners learn through a story from printed materials and animation. Furthermore, it suggests a method that extends episode memories through Chinese education contents based on IoT explaining the principle of Chinese character by combining IT technology (information and communications, IoT) and education contents on block toys.

A Fast SIFT Implementation Based on Integer Gaussian and Reconfigurable Processor

  • Su, Le Tran;Lee, Jong Soo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.2 no.3
    • /
    • pp.39-52
    • /
    • 2009
  • Scale Invariant Feature Transform (SIFT) is an effective algorithm in object recognition, panorama stitching, and image matching, however, due to its complexity, real time processing is difficult to achieve with software approaches. This paper proposes using a reconfigurable hardware processor with integer half kernel. The integer half kernel Gaussian reduces the Gaussian pyramid complexity in about half [] and the reconfigurable processor carries out a parallel implementation of a full search Fast SIFT algorithm. We use a low memory, fine grain single instruction stream multiple data stream (SIMD) pixel processor that is currently being developed. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and I/O capabilities of the processor which results in a system that can perform real time image and video compression. We apply this novel implementation to images and measure the effectiveness. Experimental simulation results indicate that the proposed implementation is capable of real time applications.

  • PDF

Parallel algorithm of global routing for general purpose associative processign system (법용 연합 처리 시스템에서의 전역배선 병렬화 기법)

  • Park, Taegeun
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.32A no.4
    • /
    • pp.93-102
    • /
    • 1995
  • This paper introduces a general purpose Associative Processor(AP) which is very efficient for search-oriented applications. The proposed architecture consists of three main functional blocks: Content-Addressable Memory(CAM) arry, row logic, and control section. The proposed AP is a Single-Instruction, Multiple-Data(SIMD) device based on a CAM core and an array of high speed processors. As an application for the proposed hardware, we present a parallel algorithm to solve a global routing problem in the layout process utilizing the processing capabilities of a rudimentary logic and the selective matching and writing capability of CAMs, along with basic algorithms such a minimum(maximum) search, less(greater) than search and parallel arithmetic. We have focused on the simultaneous minimization of the desity of the channels and the wire length by sedking a less crowded channel with shorter wire distance. We present an efficient mapping technique of the problem into the CAM structure. Experimental results on difficult examples, on randomly generated data, and on benchmark problems from MCNC are included.

  • PDF

Low-Complexity Deeply Embedded CPU and SoC Implementation (낮은 복잡도의 Deeply Embedded 중앙처리장치 및 시스템온칩 구현)

  • Park, Chester Sungchung;Park, Sungkyung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.3
    • /
    • pp.699-707
    • /
    • 2016
  • This paper proposes a low-complexity central processing unit (CPU) that is suitable for deeply embedded systems, including Internet of things (IoT) applications. The core features a 16-bit instruction set architecture (ISA) that leads to high code density, as well as a multicycle architecture with a counter-based control unit and adder sharing that lead to a small hardware area. A co-processor, instruction cache, AMBA bus, internal SRAM, external memory, on-chip debugger (OCD), and peripheral I/Os are placed around the core to make a system-on-a-chip (SoC) platform. This platform is based on a modified Harvard architecture to facilitate memory access by reducing the number of access clock cycles. The SoC platform and CPU were simulated and verified at the C and the assembly levels, and FPGA prototyping with integrated logic analysis was carried out. The CPU was synthesized at the ASIC front-end gate netlist level using a $0.18{\mu}m$ digital CMOS technology with 1.8V supply, resulting in a gate count of merely 7700 at a 50MHz clock speed. The SoC platform was embedded in an FPGA on a miniature board and applied to deeply embedded IoT applications.

Dynamic Monitoring Framework and Debugging System for Embedded Virtualization System (가상화 환경에서 임베디드 시스템을 위한 모니터링 프레임워크와 디버깅 시스템)

  • Han, Inkyu;Lim, Sungsoo
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.12
    • /
    • pp.792-797
    • /
    • 2015
  • Effective profiling diagnoses the failure of the system and informs risk. If a failure in the target system occurs, it is impossible to diagnose more than one of the exiting tools. In this respect, monitoring of the system based on virtualization is useful. We present in this paper a monitoring framework that uses the characteristics of hardware virtualization to prevent side-effects from a target guest, and uses dynamic binary instrumentation with instruction-level trapping based on hardware virtualization to achieve efficiency and flexibility. We also present examples of some applications that use this framework. The framework provides tracing of guest kernel function, memory dump, and debugging that uses GDB stub with GDB remote protocol. The experimental evaluation of our prototype shows that the monitoring framework incurs at most 2% write memory performance overhead for end users.

Computing and Reducing Transient Error Propagation in Registers

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.2
    • /
    • pp.121-130
    • /
    • 2011
  • Recent research indicates that transient errors will increasingly become a critical concern in microprocessor design. As embedded processors are widely used in reliability-critical or noisy environments, it is necessary to develop cost-effective fault-tolerant techniques to protect processors against transient errors. The register file is one of the critical components that can significantly affect microprocessor system reliability, since registers are typically accessed very frequently, and transient errors in registers can be easily propagated to functional units or the memory system, leading to silent data error (SDC) or system crash. This paper focuses on investigating the impact of register file soft errors on system reliability and developing cost-effective techniques to improve the register file immunity to soft errors. This paper proposes the register vulnerability factor (RVF) concept to characterize the probability that register transient errors can escape the register file and thus potentially affect system reliability. We propose an approach to compute the RVF based on register access patterns. In this paper, we also propose two compiler-directed techniques and a hybrid approach to improve register file reliability cost-effectively by lowering the RVF value. Our experiments indicate that on average, RVF can be reduced to 9.1% and 9.5% by the hyperblock-based instruction re-scheduling and the reliability-oriented register assignment respectively, which can potentially lower the reliability cost significantly, without sacrificing the register value integrity.

Retargetable Instruction-Set Simulator for Energy Consumption Monitoring (에너지 소비 모니터링을 위한 재목적 인스트럭션-셋 시뮬레이터)

  • Ko, Kwang-Man
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.3
    • /
    • pp.462-470
    • /
    • 2011
  • Retargetability is typically achieved by providing target machine information, ADL, as input. The ADL are used to specify processor and memory architectures and generate software toolkit including compiler, simulator, etc. Simulator are critical components of the exploration and software design toolkit for the system designer. They can be used to perform diverse tasks such as verifying the functionality and/or timing behavior of the system, and generating quantitative measurements(e.g., power energy consumption) which can be used to aid the design process. In this paper, we generate the energy consumption estimation simulator through ADL. For this goal, firstly, we describes the energy consumption estimation and monitoring informations on the ADL based on EXPRESSION. Secondly, we generate the energy estimation and monitoring simulation library and then constructs the simulator, RenergySim. Lastly, we represent the energy estimations results for MIPS R4000 ADL description. From this subjects, we contribute to the efficient architecture developments and prompt SDK generation through programmable experiments in the field of mobile software development.

AE32000B: a Fully Synthesizable 32-Bit Embedded Microprocessor Core

  • Kim, Hyun-Gyu;Jung, Dae-Young;Jung, Hyun-Sup;Choi, Young-Min;Han, Jung-Su;Min, Byung-Gueon;Oh, Hyeong-Cheol
    • ETRI Journal
    • /
    • v.25 no.5
    • /
    • pp.337-344
    • /
    • 2003
  • In this paper, we introduce a fully synthesizable 32-bit embedded microprocessor core called the AE32000B. The AE32000B core is based on the extendable instruction set computer architecture, so it has high code density and a low memory access rate. In order to improve the performance of the core, we developed and adopted various design options, including the load extension register instruction (LERI) folding unit, a high performance multiply and accumulate (MAC) unit, various DSP units, and an efficient coprocessor interface. The instructions per cycle count of the Dhrystone 2.1 benchmark for the designed core is about 0.86. We verified the synthesizability and the area and time performances of our design using two CMOS standard cell libraries: a 0.35-${\mu}m$ library and a 0.18-${\mu}m$ library. With the 0.35-${\mu}m$ library, the core can be synthesized with about 47,000 gates and operate at 70 MHz or higher, while it can be synthesized with about 53,000 gates and operate at 120 MHz or higher with the 0.18-${\mu}m$ library.

  • PDF

Random Partial Haar Wavelet Transformation for Single Instruction Multiple Threads (단일 명령 다중 스레드 병렬 플랫폼을 위한 무작위 부분적 Haar 웨이블릿 변환)

  • Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.16 no.5
    • /
    • pp.805-813
    • /
    • 2015
  • Many researchers expect the compressive sensing and sparse recovery problem can overcome the limitation of conventional digital techniques. However, these new approaches require to solve the l1 norm optimization problems when it comes to signal reconstruction. In the signal reconstruction process, the transform computation by multiplication of a random matrix and a vector consumes considerable computing power. To address this issue, parallel processing is applied to the optimization problems. In particular, due to huge size of original signal, it is hard to store the random matrix directly in memory, which makes one need to design a procedural approach in handling the random matrix. This paper presents a new parallel algorithm to calculate random partial Haar wavelet transform based on Single Instruction Multiple Threads (SIMT) platform.