• Title/Summary/Keyword: memory unit

Search Result 555, Processing Time 0.022 seconds

The Characteristics of p-channel SONOS Transistor for the NAND Charge-trap Flash Memory (NAND 전하트랩 플래시메모리를 위한 p채널 SONOS 트랜지스터의 특성)

  • Kim, Byung-Cheul;Kim, Joo-Yeon
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers
    • /
    • v.22 no.1
    • /
    • pp.7-11
    • /
    • 2009
  • In this study, p-channel silicon-oxide-nitride-oxide-silicon(SONOS) transistors are fabricated and characterized as an unit cell for NAND flash memory. The SONOS transistors are fabricated by $0.13{\mu}m$ low power standard logic process technology. The thicknesses of gate insulators are 2.0 nm for the tunnel oxide, 1.4 nm for the nitride layer, and 4.9 nm for the blocking oxide. The fabricated SONOS transistors show low programming voltage and fast erase speed. However, the retention and endurance of the devices show poor characteristics.

Large-scale 3D fast Fourier transform computation on a GPU

  • Jaehong Lee;Duksu Kim
    • ETRI Journal
    • /
    • v.45 no.6
    • /
    • pp.1035-1045
    • /
    • 2023
  • We propose a novel graphics processing unit (GPU) algorithm that can handle a large-scale 3D fast Fourier transform (i.e., 3D-FFT) problem whose data size is larger than the GPU's memory. A 1D FFT-based 3D-FFT computational approach is used to solve the limited device memory issue. Moreover, to reduce the communication overhead between the CPU and GPU, we propose a 3D data-transposition method that converts the target 1D vector into a contiguous memory layout and improves data transfer efficiency. The transposed data are communicated between the host and device memories efficiently through the pinned buffer and multiple streams. We apply our method to various large-scale benchmarks and compare its performance with the state-of-the-art multicore CPU FFT library (i.e., fastest Fourier transform in the West [FFTW]) and a prior GPU-based 3D-FFT algorithm. Our method achieves a higher performance (up to 2.89 times) than FFTW; it yields more performance gaps as the data size increases. The performance of the prior GPU algorithm decreases considerably in massive-scale problems, whereas our method's performance is stable.

The Novel OverDriving Technology with Optimum Look-Up-Table

  • Huang, Ming-Wei;Huang, Juin-Ying;Tseng, Wen-Tse;Yu, Hong-Tien
    • 한국정보디스플레이학회:학술대회논문집
    • /
    • 2005.07a
    • /
    • pp.276-279
    • /
    • 2005
  • The proposed Novel OverDriving Technology with Optimum Look-up-Table(LUT)is suggested to be a better solution to reducing the occurrence of overshooting caused by the traditional overdriving method implemented to normally liquid crystal (VA mode). Chunghwa Picture Tubes, LTD. (CPT) has successfully implement this tech into 20 inch TFTLCD TV module at the present day. The proposed technology can speed up gray-to-gray response time of LCD less than one half of frame time. On the side, Optimum LUT construction apply the compression scheme to record total actual grayscale transfer characteristics instead of applying the normal spread method such as linear / non-linear interpolation. The memory space is been reduced and the distortion of the image quality is lesser.

  • PDF

Design of FPGA in Power Control Unit for Control Rod Control System (원자로 제어봉 구동장치 제어시스템용 전력제어기 FPGA 설계)

  • Lee, Jong-Moo;Shin, Jong-Ryeol;Kim, Choon-Kyung;Park, Min-Kook;Kwon, Soon-Man
    • Proceedings of the KIEE Conference
    • /
    • 2003.11c
    • /
    • pp.563-566
    • /
    • 2003
  • We have designed the power control unit which belongs to the power cabinet and controls the power supplied to Control Rod Drive Mechanism(CRDM) as a digital system based on Digital Signal Processor(DSP). The power control unit dualized as the form of Master/Slave has had its increased reality. The Central Process Unit(CPU) board of a power control unit possesses two Digital Signal Processors(DSPs) of the control DSP for performing the tasks of power control and system monitoring and the communication of the Control DSP and the Communication DSP. To accomplish the functions requested in the power control unit effectively, we have installed Field Programmable Gate Arrays(FPGAS) on the CPU board and have FPGAs perform the memory mapping, the generation of each chip selection signal, the giving and receiving of the signals between the power controllers dualized, the fault detection and the generation of the firing signals.

  • PDF

Implementation of 3-D Collision Avoidance Algorithm and Comparison of Micro Controller Unit's Performance using Real-Time Operating System (항공기 3차원 충돌회피 알고리즘 구현과 실시간 운영체계를 이용한 Micro Controller Unit의 성능 비교)

  • Lim, Ji-Sung;Kim, Dong-Sin;Park, In-Hyeok;Lee, Sangchul
    • Journal of Aerospace System Engineering
    • /
    • v.12 no.5
    • /
    • pp.48-53
    • /
    • 2018
  • In this study, Real-Time Operating System(RTOS) and 3-D collision avoidance algorithm are implemented to three different Miciro Controller Unit(MCU)s and their performances compared. We selected Microchip Technology's ATmega2560, STM's ARM Cortex-M3 and ARM Cortex-M4, because they are widely used. FreeRTOS, an open-source operating system, was also used. The 3D collision avoidance algorithm consists of the vertical and the horizontal avoidance algorithm, which is implemented using C++. The performances of the MCUs were compared with respect to used memory and calculation time. As a result, Cortex-M4's calculation time was the fastest and ATmega2560 used least memory.

Analysis of Tensor Processing Unit and Simulation Using Python (텐서 처리부의 분석 및 파이썬을 이용한 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.3
    • /
    • pp.165-171
    • /
    • 2019
  • The study of the computer architecture has shown that major improvements in price-to-energy performance stems from domain-specific hardware development. This paper analyzes the tensor processing unit (TPU) ASIC which can accelerate the reasoning of the artificial neural network (NN). The core device of the TPU is a MAC matrix multiplier capable of high-speed operation and software-managed on-chip memory. The execution model of the TPU can meet the reaction time requirements of the artificial neural network better than the existing CPU and the GPU execution models, with the small area and the low power consumption even though it has many MAC and large memory. Utilizing the TPU for the tensor flow benchmark framework, it can achieve higher performance and better power efficiency than the CPU or CPU. In this paper, we analyze TPU, simulate the Python modeled OpenTPU, and synthesize the matrix multiplication unit, which is the key hardware.

HARP의 캐쉬 메모리 및 메모리 관리 유니트 구조 설계

  • Lee, Gyu-Ho;Gang, Ik-Tae
    • ETRI Journal
    • /
    • v.10 no.3
    • /
    • pp.49-61
    • /
    • 1988
  • HARP(High-performance Architecture for Risc-type Processor)는 한국전자통신연구소에서 정의한 고유모델의 RISC형 32비트 CPU이다. HACAM(HArp CAche and Mmu)은 HARP의 캐쉬 메모리 및 MMU(Memory Management Unit)를 custom IC로 구현한 VLSI 칩이다. 본 논문에서는 HACAM의 구조 설계에 대해 메모리 구조 및 메모리 관리 방식, 캐쉬 메모리 및 HACAM의 구성 등으로 나누어 설명하고 그 타당성을 논하였다.

  • PDF

Efficient FTL Mapping Management for Multiple Sector Size-based Storage Systems with NAND Flash Memory (다중 섹터 사이즈를 지원하는 낸드 플래시 메모리 기반의 저장장치를 위한 효율적인 FTL 매핑 관리 기법)

  • Lim, Seung-Ho;Choi, Min
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.12
    • /
    • pp.1199-1203
    • /
    • 2010
  • Data transfer between host system and storage device is based on the data unit called sector, which can be varied depending on computer systems. If NAND flash memory is used as a storage device, the variant sector size can affect storage system performance since its operation is much related to sector size and page size. In this paper, we propose an efficient FTL mapping management scheme to support multiple sector size within one NAND flash memory based storage device, and analyze the performance effect and management overhead. According to the proposed scheme, the management overhead of proposed FTL management is lower than conventional scheme when various sector sizes are configured in computer systems, while performance is less degraded in comparison with single sector size support system.

Spark Framework Based on a Heterogenous Pipeline Computing with OpenCL (OpenCL을 활용한 이기종 파이프라인 컴퓨팅 기반 Spark 프레임워크)

  • Kim, Daehee;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.67 no.2
    • /
    • pp.270-276
    • /
    • 2018
  • Apache Spark is one of the high performance in-memory computing frameworks for big-data processing. Recently, to improve the performance, general-purpose computing on graphics processing unit(GPGPU) is adapted to Apache Spark framework. Previous Spark-GPGPU frameworks focus on overcoming the difficulty of an implementation resulting from the difference between the computation environment of GPGPU and Spark framework. In this paper, we propose a Spark framework based on a heterogenous pipeline computing with OpenCL to further improve the performance. The proposed framework overlaps the Java-to-Native memory copies of CPU with CPU-GPU communications(DMA) and GPU kernel computations to hide the CPU idle time. Also, CPU-GPU communication buffers are implemented with switching dual buffers, which reduce the mapped memory region resulting in decreasing memory mapping overhead. Experimental results showed that the proposed Spark framework based on a heterogenous pipeline computing with OpenCL had up to 2.13 times faster than the previous Spark framework using OpenCL.