• Title/Summary/Keyword: memory based instruction

An Implementation of a Memory Operation System Architecture for Memory Latency Penalty Reduction in SIMT Based Stream Processor (Memory Latency Penalty를 개선한 SIMT 기반 Stream Processor의 Memory Operation System Architecture 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE / v.18 no.3 / pp.392-397 / 2014
  • In this paper, we propose a memory operation system architecture that reduces the memory latency penalty in a stream processor based on the SIMT architecture. The proposed architecture applies a non-blocking cache architecture to reduce the cache miss penalty incurred by a blocking cache architecture. By comparing the processing performance of various algorithms, we verified that the proposed memory operation architecture improves the performance of the stream processor. We also measured how the performance improvement varies with the proportion of memory instructions in each algorithm. As a result, we confirmed that the performance of the stream processor improves by a minimum of 8.2% and a maximum of 46.5%.
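To illustrate why the gain from a non-blocking cache grows with the share of memory instructions, here is a toy cycle-count model of a single in-order thread. It is a sketch under stated assumptions, not the paper's SIMT architecture: the `Instr` trace format, the one-cycle-per-instruction issue rule, and the MSHR (miss-status holding register) limit are all hypothetical. A blocking cache stalls on every miss; a non-blocking cache stalls only when an instruction reads a pending load result or when every MSHR is busy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction in a toy trace (hypothetical format, not the paper's ISA)."""
    dst: str                  # register written by this instruction
    srcs: tuple = ()          # registers it reads
    load_miss: bool = False   # True if it is a load that misses in the cache


def blocking_cycles(prog, penalty):
    """Blocking cache: every miss stalls the whole pipeline for `penalty` cycles."""
    t = 0
    for ins in prog:
        t += 1
        if ins.load_miss:
            t += penalty
    return t


def nonblocking_cycles(prog, penalty, mshrs):
    """Non-blocking cache: a miss only stalls an instruction that reads its result,
    or a new miss when all `mshrs` miss-status holding registers are busy."""
    t = 0          # earliest cycle at which the next instruction may issue (in order)
    ready = {}     # register -> cycle its value becomes available
    inflight = []  # completion cycles of outstanding misses
    for ins in prog:
        # wait for source operands
        issue = max([t] + [ready.get(r, 0) for r in ins.srcs])
        if ins.load_miss:
            inflight = [c for c in inflight if c > issue]
            while len(inflight) >= mshrs:   # all MSHRs busy: wait for the oldest miss
                issue = min(inflight)
                inflight = [c for c in inflight if c > issue]
            done = issue + 1 + penalty
            inflight.append(done)
            ready[ins.dst] = done
        else:
            ready[ins.dst] = issue + 1
        t = issue + 1
    # the program finishes when the last instruction and all pending misses are done
    return max([t] + list(ready.values()))


prog = [Instr("r1", load_miss=True), Instr("r2"), Instr("r3"), Instr("r4", ("r1",))]
print(blocking_cycles(prog, 10), nonblocking_cycles(prog, 10, mshrs=4))  # 14 12
```

Independent work after a miss hides part of the penalty; with only one MSHR, back-to-back independent misses serialize and the model falls back to blocking-cache timing.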

Design of a Variable-Length Instruction based on a OpenGL ES 2.0 API (OpenGL ES 2.0 API 기반 가변길이 명령어 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE / v.12 no.2 / pp.118-123 / 2008
  • The Khronos Group released the OpenGL ES 2.0 API specification, bringing streamlined shader programming to the graphics processors of embedded systems. Consequently, mobile devices need graphics processors that support the OpenGL ES 2.0 API. Supporting OpenGL ES 2.0 requires longer instructions, which in turn require more memory. In this paper, we propose a new instruction format that improves the usability of the instructions. The proposed instruction adopts a variable-length method and a unit-instruction architecture. The proposed instruction architecture, which supports the OpenGL ES 2.0 API, consists of up to four 32-bit unit instructions that can be combined to extend one another. It therefore allows flexible instruction combinations and reduces wasted instruction fields.
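A variable-length format built from one to four 32-bit units needs some way for the decoder to find instruction boundaries. The abstract does not give the field layout, so the sketch below assumes a hypothetical continuation bit (bit 31 set means another unit follows) and compares the storage used against a fixed 128-bit format, which is where the savings in wasted instruction fields come from.

```python
CONT_BIT = 1 << 31   # hypothetical: bit 31 set = "another 32-bit unit follows"
MAX_UNITS = 4        # the abstract states up to four 32-bit unit instructions


def split_instructions(words):
    """Group a stream of 32-bit words into instructions of 1 to MAX_UNITS units."""
    instrs, current = [], []
    for w in words:
        if not 0 <= w <= 0xFFFFFFFF:
            raise ValueError(f"not a 32-bit word: {w:#x}")
        current.append(w)
        if not w & CONT_BIT:              # last unit of this instruction
            instrs.append(current)
            current = []
        elif len(current) == MAX_UNITS:
            raise ValueError("instruction longer than four units")
    if current:
        raise ValueError("truncated instruction at end of stream")
    return instrs


def storage_bits(instrs, fixed_width=None):
    """Bits needed for `instrs`; with fixed_width, every instruction occupies that many bits."""
    if fixed_width is None:
        return 32 * sum(len(i) for i in instrs)
    return fixed_width * len(instrs)


stream = [0x00000001,
          0x80000002, 0x00000003,
          0x80000004, 0x80000005, 0x80000006, 0x00000007]
prog = split_instructions(stream)
print([len(i) for i in prog])                        # [1, 2, 4]
print(storage_bits(prog), storage_bits(prog, 128))   # 224 384
```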

Fully Programmable Memory BIST for Commodity DRAMs

  • Kim, Ilwoong;Jeong, Woosik;Kang, Dongho;Kang, Sungho
    • ETRI Journal / v.37 no.4 / pp.787-792 / 2015
  • To accomplish high-speed testing on low-speed automatic test equipment (ATE), a new instruction-based, fully programmable memory built-in self-test (BIST) is proposed. The proposed memory BIST generates a high-speed internal clock signal by multiplying an external low-speed clock signal from the ATE with a clock multiplier embedded in the DRAM. For maximum programmability and small area overhead, the proposed memory BIST stores the unique sets of instructions and corresponding test sequences that are implicit in the test algorithms it receives from the external ATE. The proposed memory BIST is controlled on the fly by the external ATE to perform complicated and hard-to-implement functions, such as loop operations and refresh interrupts. Therefore, the proposed memory BIST has a simpler hardware structure than conventional memory BIST schemes. The proposed memory BIST is a practical test solution for reducing the overall test cost of mass-producing commodity DDRx SDRAMs.
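Memory test algorithms of the kind a programmable BIST executes are usually written as March elements: an address order plus a short sequence of read/write operations applied to every cell. The sketch below runs the standard March C- algorithm against a simulated memory with an injected stuck-at fault. The tuple encoding and the `FaultyMemory` class are hypothetical illustrations, not the instruction set of the proposed BIST.

```python
# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
MARCH_C_MINUS = [
    ("any", ("w0",)),
    ("up", ("r0", "w1")),
    ("up", ("r1", "w0")),
    ("down", ("r0", "w1")),
    ("down", ("r1", "w0")),
    ("any", ("r0",)),
]


class FaultyMemory:
    """1-bit-per-cell memory model; `stuck_at` maps address -> value the cell is stuck at."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck = stuck_at or {}

    def write(self, addr, value):
        self.cells[addr] = self.stuck.get(addr, value)

    def read(self, addr):
        return self.stuck.get(addr, self.cells[addr])


def run_march(mem, size, algorithm):
    """Apply each March element to every address; return (element, address, op) for failed reads."""
    failures = []
    for step, (direction, ops) in enumerate(algorithm):
        addrs = range(size - 1, -1, -1) if direction == "down" else range(size)
        for a in addrs:
            for op in ops:
                kind, value = op[0], int(op[1])
                if kind == "w":
                    mem.write(a, value)
                elif mem.read(a) != value:   # read and compare with the expected value
                    failures.append((step, a, op))
    return failures


print(run_march(FaultyMemory(8), 8, MARCH_C_MINUS))               # []
print(run_march(FaultyMemory(8, {3: 1}), 8, MARCH_C_MINUS))       # [(1, 3, 'r0'), (3, 3, 'r0'), (5, 3, 'r0')]
```

Only six distinct March elements describe the whole test, which hints at why storing the unique instruction sets and letting the ATE drive the loops can keep the on-chip hardware small.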

Design and Implementation of a Host Interface for a Regular Expression Processor (정규표현식 프로세서를 위한 호스트 인터페이스 설계 및 구현)

  • Kim, JongHyun;Yun, SangKyun
    • KIISE Transactions on Computing Practices / v.23 no.2 / pp.97-103 / 2017
  • Many hardware-based regular expression matching architectures have been proposed for high-performance matching. In particular, regular expression processors have been proposed that, like general-purpose processors, perform pattern matching by treating regular expressions as instruction sequences. Once the instruction sequence and the data have been loaded into the instruction memory and the data memory, respectively, a regular expression processor can perform pattern matching. To use a regular expression processor as a coprocessor, a host interface is needed to transfer the instructions and data into the memories of the regular expression processor. In this paper, we design and implement, on a DE1-SoC board, the host interface between a host and a regular expression processor, together with its application program interface (API). We verify the operation of the host interface and the regular expression processor by executing application programs that perform pattern matching through the API.
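Treating a regular expression as an instruction sequence is a well-known technique; the sketch below uses Pike/Thompson-style `char`/`split`/`jmp`/`match` instructions and a software model of a coprocessor whose instruction and data memories the host fills before starting a run. The instruction set, `RegexCoprocessorModel`, and its `write_instructions`/`write_data`/`run` methods are hypothetical stand-ins, not the paper's processor or API.

```python
# Pike/Thompson-style regex VM instructions (assumed here for illustration):
#   ("char", c)      consume one input character equal to c
#   ("split", x, y)  continue at both x and y
#   ("jmp", x)       continue at x
#   ("match",)       pattern matched


def _add_thread(prog, threads, pc):
    """Add pc and everything reachable from it without consuming input."""
    if pc in threads:
        return
    threads.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        _add_thread(prog, threads, op[1])
    elif op[0] == "split":
        _add_thread(prog, threads, op[1])
        _add_thread(prog, threads, op[2])


def _matched(prog, threads):
    return any(prog[pc][0] == "match" for pc in threads)


class RegexCoprocessorModel:
    """Software stand-in for a regex processor behind a host interface (hypothetical API)."""

    def __init__(self, imem_size=256, dmem_size=4096):
        self.imem_size, self.dmem_size = imem_size, dmem_size
        self.imem, self.dmem = [], ""

    def write_instructions(self, prog):
        if len(prog) > self.imem_size:
            raise ValueError("program does not fit in instruction memory")
        self.imem = list(prog)

    def write_data(self, text):
        if len(text) > self.dmem_size:
            raise ValueError("data does not fit in data memory")
        self.dmem = text

    def run(self):
        """Unanchored search: True if the pattern occurs anywhere in data memory."""
        prog, threads = self.imem, set()
        _add_thread(prog, threads, 0)
        for ch in self.dmem:
            if _matched(prog, threads):
                return True
            nxt = set()
            for pc in threads:
                op = prog[pc]
                if op[0] == "char" and op[1] == ch:
                    _add_thread(prog, nxt, pc + 1)
            _add_thread(prog, nxt, 0)   # start a new match attempt at the next position
            threads = nxt
        return _matched(prog, threads)


# a(b|c)*d compiled by hand
PROG = [("char", "a"), ("split", 2, 7), ("split", 3, 5), ("char", "b"),
        ("jmp", 6), ("char", "c"), ("jmp", 1), ("char", "d"), ("match",)]

cop = RegexCoprocessorModel()
cop.write_instructions(PROG)
cop.write_data("xxabcbd")
print(cop.run())   # True
```

All active threads advance in lockstep over the input, so each data character is read once, a property that suits a hardware pipeline fed from data memory.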

The effects of a vocabulary instructional method on vocabulary learning strategy use and the affective domain: Focus on an analysis of students' survey responses (어휘 지도 방법이 어휘 학습전략 사용과 정의적 측면에 미치는 효과: 학생 설문 조사 분석을 중심으로)

  • Kim, Nahk-Bohk
    • English Language & Literature Teaching / v.11 no.3 / pp.89-112 / 2005
  • This study investigated the effects of collocation-based vocabulary instruction for the experimental group (G2), compared with traditional wordlist-based vocabulary instruction for the control group (G1). The results reflect the development of low-level high school EFL learners' vocabulary learning strategy use and a positive change in the affective domain. In the analysis of the survey responses, G1 and G2 did not differ significantly on the first questionnaire, but they did differ significantly on the second. G2 used more strategies to discover and consolidate word meanings by combining words. In terms of the affective domain, G2 participated more actively in the learning activities, which had a significant effect on vocabulary growth, memory, self-confidence, motivation, and cooperative learning. This is attributable to G2 being more inquisitive, interested, challenged, participatory, cooperative, and attentive than G1 while performing the vocabulary task activities. Moreover, the questionnaire data showed that G2 engaged in more interactive and dynamic activities when solving the given tasks.

TP-Sim: A Trace-driven Processing-in-Memory Simulator (TP-Sim: 트레이스 기반의 프로세싱 인 메모리 시뮬레이터)

  • Jeonggeun Kim
    • Journal of the Semiconductor & Display Technology / v.22 no.3 / pp.78-83 / 2023
  • This paper proposes TP-Sim, a lightweight trace-driven Processing-In-Memory (PIM) simulator. TP-Sim is a General Purpose PIM (GP-PIM) simulator that evaluates various performance-related metrics of PIM systems. Using instruction and memory traces extracted with the Intel Pin tool, TP-Sim can replay trace files on multiple models of PIM architectures to compare their performance. To verify the usability of TP-Sim, we evaluated three different system configurations on the STREAM benchmark. Compared with a traditional host-CPU-only system with a conventional memory hierarchy, a simple GP-PIM architecture achieved better performance, even when the host CPU has the same number of in-order cores. As further work, we also extend TP-Sim as part of a heterogeneous system simulator that contains a CPU, a GPGPU, and PIM as its primary and co-processors.
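The core idea of a trace-driven simulator is to replay one recorded address trace through several timing models and compare the totals. The sketch below does this for a STREAM Copy trace on two deliberately crude models: a host CPU behind one LRU cache and a PIM system whose near-memory units share the accesses. The trace format, latencies, unit count, and function names are illustrative assumptions; they are not TP-Sim's models and say nothing about the paper's measured results.

```python
from collections import OrderedDict


def stream_copy_trace(n, a_base=0, c_base=1 << 20, elem=8):
    """Memory trace for STREAM Copy (c[i] = a[i]): one load and one store per element.
    A real TP-Sim input would come from a Pin tool; this list of byte addresses is a stand-in."""
    trace = []
    for i in range(n):
        trace.append(a_base + i * elem)   # load a[i]
        trace.append(c_base + i * elem)   # store c[i]
    return trace


def host_cycles(trace, line=64, cache_lines=512, hit=4, miss=200):
    """Host CPU behind one LRU cache; every miss pays a full DRAM access (illustrative latencies)."""
    cache = OrderedDict()
    total = 0
    for addr in trace:
        tag = addr // line
        if tag in cache:
            cache.move_to_end(tag)
            total += hit
        else:
            total += miss
            cache[tag] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)   # evict the least recently used line
    return total


def pim_cycles(trace, units=4, local=30):
    """PIM: accesses spread evenly over `units` near-memory cores working in parallel,
    each access costing a fixed local latency (illustrative numbers)."""
    per_unit = -(-len(trace) // units)     # ceiling division
    return per_unit * local


trace = stream_copy_trace(8)
print(host_cycles(trace), pim_cycles(trace))   # 456 120
```

Because the trace is recorded once, adding another architecture only means adding another replay function over the same list.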

Mathematical thinking, its neural systems and implication for education (수학적 사고에 동원되는 두뇌 영역들과 이의 교육학적 의미)

  • Kim, Yeon Mi
    • The Mathematical Education / v.52 no.1 / pp.19-41 / 2013
  • What is the foundation of mathematical thinking? Is it a logic-based symbolic language system, or does it rely more on mental imagery and visuospatial abilities? What neural changes occur when someone's mathematical abilities improve through practice? To answer these questions, basic cognitive processes, including long-term memory, working memory, visuospatial perception, and number processing, are considered through neuropsychological findings. Neural changes that follow development and practice are examined, showing that there are neural networks critical for mathematical thinking and its development: the prefrontal-anterior cingulate-parietal network. Through this inquiry, we can infer the answer to our question.

JMP+RAND: Mitigating Memory Sharing-Based Side-Channel Attack by Embedding Random Values in Binaries (JMP+RAND: 바이너리 난수 삽입을 통한 메모리 공유 기반 부채널 공격 방어 기법)

  • Kim, Taehun;Shin, Youngjoo
    • KIPS Transactions on Computer and Communication Systems / v.9 no.5 / pp.101-106 / 2020
  • Since computers became available, much effort has been devoted to information security. Although memory protection defense mechanisms have been studied most extensively, problems with existing mechanisms have emerged as computer performance has improved, and the advent of side-channel attacks has created a need for new defense mechanisms. In this paper, we propose JMP+RAND, which embeds random values of 5 to 8 bytes per page to defend against memory sharing-based side-channel attacks and to bridge the gap left by existing memory protection defense mechanisms. Unlike existing defenses against side-channel attacks, JMP+RAND uses static binary rewriting, jmp instructions, and random values to defend against side-channel attacks in advance. We numerically estimated the time a memory sharing-based side-channel attack would take against a binary hardened with JMP+RAND and verified that such an attack cannot succeed in a realistic time. On modern architectures, JMP+RAND has very low overhead because branch prediction makes jmp instructions very fast and accurate. Since JMP+RAND can embed random values in selected programs only, it is expected to be highly efficient when used together with memory deduplication, especially in cloud computing environments.
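The mechanism and the attack-time estimate can both be sketched in a few lines. A two-byte x86 short jump (`EB cb`) skips the `cb` bytes that follow it, so random bytes placed there are never executed but make the page content unpredictable to an attacker who must reproduce it byte for byte. The helper names, the placement, and the 1 ms per deduplication probe are assumptions for illustration; a real static rewriter must also relocate code and fix branch targets, which this sketch omits.

```python
import secrets

JMP_REL8 = 0xEB   # x86 short jump "EB cb": skips cb bytes after the 2-byte instruction


def jmp_rand_stub(n, rng=secrets.token_bytes):
    """Return a 2-byte short jump followed by n random bytes that the jump skips over.
    Execution never reaches the random bytes, but they make the page content unique."""
    if not 5 <= n <= 8:
        raise ValueError("the abstract describes 5 to 8 random bytes per page")
    return bytes([JMP_REL8, n]) + rng(n)


def expected_attack_years(n, seconds_per_probe):
    """Expected brute-force time: on average half of the 2**(8n) candidates are tried."""
    probes = 2 ** (8 * n) / 2
    return probes * seconds_per_probe / (365 * 24 * 3600)


stub = jmp_rand_stub(5)
print(stub[:2].hex(), len(stub))                  # eb05 7
print(round(expected_attack_years(5, 1e-3), 1))   # 17.4 (years, at an assumed 1 ms per probe)
```

Each extra random byte multiplies the expected search time by 256, so moving from 5 to 8 bytes raises the estimate by a factor of about 16.8 million.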

Design of a scalable general-purpose parallel associative processor using content-addressable memory (Content-Addressable Memory를 이용한 확장 가능한 범용 병렬 Associative Processor 설계)

  • Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD / v.43 no.2 s.344 / pp.51-59 / 2006
  • The von Neumann architecture suffers from the interface between the central processing unit and the memory, which is called the 'von Neumann bottleneck'. In this paper, we propose a scalable general-purpose associative processor (AP) based on content-addressable memory (CAM) that solves this problem and is suitable for search-oriented applications. We propose an efficient instruction set and structural scalability for extending the design to larger applications. We define twelve instructions and provide several reduced instructions that, for speed, execute two instructions in a single instruction cycle. The proposed AP operates in a bit-serial, word-parallel fashion and can be considered a 32-bit general-purpose parallel processor with a massively parallel SIMD structure. To verify the proposed architecture, we design and simulate maximum/minimum search, greater-than/less-than search, and parallel addition. The algorithms execute in constant time O(k), regardless of the number of input data items.
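The bit-serial, word-parallel maximum/minimum search can be sketched with the textbook tag-register algorithm: scan bit columns from MSB to LSB, and whenever some tagged word has the wanted bit value, untag the tagged words that do not. Each loop iteration below stands for one CAM cycle that examines one column of every word at once, which is why the cycle count equals the word width rather than the number of words. This is a generic illustration, not the paper's twelve-instruction set; the function name is hypothetical.

```python
def cam_extreme_search(words, k, find_max=True):
    """Bit-serial, word-parallel max/min search over k-bit words.
    Returns (indices of the extreme words, number of column cycles used)."""
    target = 1 if find_max else 0
    tags = [True] * len(words)          # tag (responder) bit per word
    cycles = 0
    for bit in range(k - 1, -1, -1):    # MSB first
        # one CAM cycle: compare column `bit` of every tagged word with `target`
        hits = [t and ((w >> bit) & 1) == target for w, t in zip(words, tags)]
        if any(hits):                   # at least one tagged word responds
            tags = hits                 # untag the tagged words that did not respond
        cycles += 1
    return [i for i, t in enumerate(tags) if t], cycles


data = [5, 12, 7, 12, 3]
print(cam_extreme_search(data, 4))                   # ([1, 3], 4)
print(cam_extreme_search(data, 4, find_max=False))   # ([4], 4)
```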

A Parallel Speech Recognition Model on Distributed Memory Multiprocessors (분산 메모리 다중프로세서 환경에서의 병렬 음성인식 모델)

  • 정상화;김형순;박민욱;황병한
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.44-51 / 1999
  • This paper presents a massively parallel computational model for the efficient integration of speech and natural language understanding. The phoneme model is based on continuous hidden Markov models with context-dependent phonemes, and the language model is based on a knowledge-base approach. To construct the knowledge base, we adopt a hierarchically structured semantic network and a memory-based parsing technique that employs parallel marker passing as its inference mechanism. Our parallel speech recognition algorithm is implemented on a multi-Transputer system of distributed-memory MIMD multiprocessors. Experimental results show that the parallel speech recognition system achieves higher recognition accuracy than a word network-based speech recognition system. Recognition accuracy is further improved by applying code-phoneme statistics. In addition, speedup experiments demonstrate the feasibility of building a real-time parallel speech recognition system.
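Marker passing spreads markers from activated nodes along the links of a semantic network; a node reached by markers from different sources signals a point where interpretations meet. The sketch below runs the propagation in synchronous rounds, the unit of work that a distributed-memory machine would process in parallel across nodes. The link structure, node names, and function names are hypothetical toy examples, not the paper's knowledge base or parsing rules.

```python
def propagate_markers(links, sources, max_rounds=10):
    """Synchronous marker passing over a semantic network.
    links: node -> list of nodes a marker may move to (hypothetical link structure).
    sources: marker name -> node where that marker starts.
    Returns node -> set of markers that reached it."""
    marks = {}
    frontier = set()
    for m, node in sources.items():
        marks.setdefault(node, set()).add(m)
        frontier.add((node, m))
    for _ in range(max_rounds):
        new = set()
        for node, m in frontier:          # each round: forward newly received markers
            for nb in links.get(node, []):
                if m not in marks.setdefault(nb, set()):
                    marks[nb].add(m)
                    new.add((nb, m))
        if not new:                       # no marker moved: propagation has converged
            break
        frontier = new
    return marks


def collisions(marks):
    """Nodes reached by more than one marker: candidate points where interpretations meet."""
    return sorted(n for n, ms in marks.items() if len(ms) > 1)


links = {"john": ["person"], "person": ["animate"],
         "ate": ["eat-event"], "eat-event": ["agent-slot"], "agent-slot": ["animate"]}
marks = propagate_markers(links, {"w1": "john", "w2": "ate"})
print(collisions(marks))   # ['animate']
```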
