• Title/Summary/Keyword: Embedded memory

Search Result 723, Processing Time 0.027 seconds

High Speed and Robust Processor based on Parallelized Error Correcting Code Module (병렬화된 에러 보정 코드 모듈 기반 프로세서 속도 및 신뢰도 향상)

  • Kang, Myeong-jin;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.9
    • /
    • pp.1180-1186
    • /
    • 2020
  • One of the Embedded systems Tiny Processing Unit (TPU) usually acts in harsh environments like external shock or insufficient power. In these cases, data could be polluted, and cause critical problems. As a solution to data pollution, many embedded systems are using Error Correcting Code (ECC) to protect and restore data. However, ECC processing in TPU increases the overall processing time by increasing the time of instruction fetch which is the bottleneck. In this paper, we propose an architecture of parallelized ECC block to the reduce bottleneck of TPU. The proposed architecture results in the reduction of time 10% compared to the original model, although memory usage increased slightly. The test is evaluated with a matrix product that has various instructions. TPU with proposed parallelized ECC block shows 7% faster than the original TPU with ECC and was able to perform the proposed test accurately.

Fine-scalable SPIHT Hardware Design for Frame Memory Compression in Video Codec

  • Kim, Sunwoong;Jang, Ji Hun;Lee, Hyuk-Jae;Rhee, Chae Eun
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.17 no.3
    • /
    • pp.446-457
    • /
    • 2017
  • In order to reduce the size of frame memory or bus bandwidth, frame memory compression (FMC) recompresses reconstructed or reference frames of video codecs. This paper proposes a novel FMC design based on discrete wavelet transform (DWT) - set partitioning in hierarchical trees (SPIHT), which supports fine-scalable throughput and is area-efficient. In the proposed design, multi-cores with small block sizes are used in parallel instead of a single core with a large block size. In addition, an appropriate pipelining schedule is proposed. Compared to the previous design, the proposed design achieves the processing speed which is closer to the target system speed, and therefore it is more efficient in hardware utilization. In addition, a scheme in which two passes of SPIHT are merged into one pass called merged refinement pass (MRP) is proposed. As the number of shifters decreases and the bit-width of remained shifters is reduced, the size of SPIHT hardware significantly decreases. The proposed FMC encoder and decoder designs achieve the throughputs of 4,448 and 4,000 Mpixels/s, respectively, and their gate counts are 76.5K and 107.8K. When the proposed design is applied to high efficiency video codec (HEVC), it achieves 1.96% lower average BDBR and 0.05 dB higher average BDPSNR than the previous FMC design.

Design of an Asynchronous Instruction Cache based on a Mixed Delay Model (혼합 지연 모델에 기반한 비동기 명령어 캐시 설계)

  • Jeon, Kwang-Bae;Kim, Seok-Man;Lee, Je-Hoon;Oh, Myeong-Hoon;Cho, Kyoung-Rok
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.3
    • /
    • pp.64-71
    • /
    • 2010
  • Recently, to achieve high performance of the processor, the cache is splits physically into two parts, one for instruction and one for data. This paper proposes an architecture of asynchronous instruction cache based on mixed-delay model that are DI(delay-insensitive) model for cache hit and Bundled delay model for cache miss. We synthesized the instruction cache at gate-level and constructed a test platform with 32-bit embedded processor EISC to evaluate performance. The cache communicates with the main memory and CPU using 4-phase hand-shake protocol. It has a 8-KB, 4-way set associative memory that employs Pseudo-LRU replacement algorithm. As the results, the designed cache shows 99% cache hit ratio and reduced latency to 68% tested on the platform with MI bench mark programs.

FPGA Implementation of SURF-based Feature extraction and Descriptor generation (SURF 기반 특징점 추출 및 서술자 생성의 FPGA 구현)

  • Na, Eun-Soo;Jeong, Yong-Jin
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.4
    • /
    • pp.483-492
    • /
    • 2013
  • SURF is an algorithm which extracts feature points and generates their descriptors from input images, and it is being used for many applications such as object recognition, tracking, and constructing panorama pictures. Although SURF is known to be robust to changes of scale, rotation, and view points, it is hard to implement it in real time due to its complex and repetitive computations. Using 3.3 GHz Pentium, in our experiment, it takes 240ms to extract feature points and create descriptors in a VGA image containing about 1,000 feature points, which means that software implementation cannot meet the real time requirement, especially in embedded systems. In this paper, we present a hardware architecture that can compute the SURF algorithm very fast while consuming minimum hardware resources. Two key concepts of our architecture are parallelism (for repetitive computations) and efficient line memory usage (obtained by analyzing memory access patterns). As a result of FPGA synthesis using Xilinx Virtex5LX330, it occupies 101,348 LUTs and 1,367 KB on-chip memory, giving performance of 30 frames per second at 100 MHz clock.

Resistive Switching Properties of Cr-Doped SrZrO3 Thin Film on Si Substrate (실리콘 기판위에서의 Cr-Doped SrZrO3 박막의 저항변화 특성)

  • Yang, Min-Kyu;Ko, Tae-Kuk;Park, Jae-Wan;Lee, Jeon-Kook
    • Korean Journal of Materials Research
    • /
    • v.20 no.5
    • /
    • pp.241-245
    • /
    • 2010
  • One of the weak points of the Cr-doped SZO is that until now, it has only been fabricated on perovskite substrates, whereas NiO-ReRAM devices have already been deposited on Si substrates. The fabrication of RAM devices on Si substrates is important for commercialization because conventional electronics are based mainly on silicon materials. Cr-doped ReRAM will find a wide range of applications in embedded systems or conventional memory device manufacturing processes if it can be fabricated on Si substrates. For application of the commercial memory device, Cr-doped $SrZrO_3$ perovskite thin films were deposited on a $SrRuO_3$ bottom electrode/Si(100)substrate using pulsed laser deposition. XRD peaks corresponding to the (112), (004) and (132) planes of both the SZO and SRO were observed with the highest intensity along the (112) direction. The positions of the SZO grains matched those of the SRO grains. A well-controlled interface between the $SrZrO_3$:Cr perovskite and the $SrRuO_3$ bottom electrode were fabricated, so that good resistive switching behavior was observed with an on/off ratio higher than $10^2$. A pulse test showed the switching behavior of the Pt/$SrZrO_3:Cr/SrRuO^3$ device under a pulse of 10 kHz for $10^4$ cycles. The resistive switching memory devices made of the Cr-doped $SrZrO_3$ thin films deposited on Si substrates are expected to be more compatible with conventional Si-based electronics.

Block Associativity Limit Scheme for Efficient Flash Translation Layer (효율적인 플래시 변환 계층을 위한 블록 연관성 제한 기법)

  • Ok, Dong-Seok;Lee, Tae-Hoon;Chung, Ki-Dong
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.6
    • /
    • pp.673-677
    • /
    • 2010
  • Recently, NAND flash memory has been widely used in embedded systems, personal computers, and server systems because of its attractive features, such as non-volatility, fast access speed, shock resistance, and low power consumption. Due to its hardware characteristics, specifically its 'erase-before-write' feature, Flash Translation Layer is required for using flash memory like hard disk drive. Many FTL schemes have been proposed, but conventional FTL schemes have problems such as block thrashing and block associativity problem. The KAST scheme tried to solve these problems by limiting the number of associations between data block and log block to K. But it has also block thrashing problem in random access I/O pattern. In this paper, we proposed a new FTL scheme, UDA-LBAST. Like KAST, the proposed scheme also limits the log block association, but does not limit data block association. So we could minimize the cost of merge operations, and reduce merge costs by using a new block reclaim scheme, log block garbage collection.

Photo-induced Electrical Properties of Metal-oxide Nanocrystal Memory Devices

  • Lee, Dong-Uk;Cho, Seong-Gook;Kim, Eun-Kyu;Kim, Young-Ho
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2011.08a
    • /
    • pp.254-254
    • /
    • 2011
  • The memories with nano-particles are very attractive because they are promising candidates for low operating voltage, long retention time and fast program/erase speed. In recent, various nano-floating gate memories with metal-oxide nanocrystals embedded in organic and inorganic layers have been reported. Because of the carrier generation in semiconductor, induced photon pulse enhanced the program/erase speed of memory device. We studied photo-induced electrical properties of these metal-oxide nanocrystal memory devices. At first, 2~10-nm-thick Sn and In metals were deposited by using thermal evaporation onto Si wafer including a channel with $n^+$ poly-Si source/drain in which the length and width are 10 ${\mu}m$ each. Then, a poly-amic-acid (PAA) was spin coated on the deposited Sn film. The PAA precursor used in this study was prepared by dissolving biphenyl-tetracarboxylic dianhydride-phenylene diamine (BPDA-PDA) commercial polyamic acid in N-methyl-2-pyrrolidon (NMP). Then the samples were cured at 400$^{\circ}C$ for 1 hour in N atmosphere after drying at 135$^{\circ}C$ for 30 min through rapid thermal annealing. The deposition of aluminum layer with thickness of 200 nm was followed by using a thermal evaporator, and then the gate electrode was defined by photolithography and etching. The electrical properties were measured at room temperature using an HP4156a precision semiconductor parameter analyzer and an Agilent 81101A pulse generator. Also, the optical pulse for the study on photo-induced electrical properties was applied by Xeon lamp light source and a monochromator system.

  • PDF

Development of Simulator using RAM Disk for FTL Performance Analysis (RAM 디스크를 이용한 FTL 성능 분석 시뮬레이터 개발)

  • Ihm, Dong-Hyuk;Park, Seong-Mo
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.5
    • /
    • pp.35-40
    • /
    • 2010
  • NAND flash memory has been widely used than traditional HDD in PDA and other mobile devices, embedded systems, PC because of faster access speed, low power consumption, vibration resistance and other benefits. DiskSim and other HDD simulators has been developed that for find improvements for the software or hardware. But there is a few Linux-based simulators for NAND flash memory and SSD. There is necessary for Windows-based NAND flash simulator because storage devices and PC using Windows. This paper describe for development of simulator-NFSim for FTL performance analysis in NAND flash. NFSim is used to measure performance of various FTL algorithms and FTL wear-level. NAND flash memory model and FTL algorithm developed using Windows Driver Model and class for scalability. There is no need for another tools because NFSim using graph tool for data measure of FTL performance.

Improving the Reliability and Performance of the YAFFS Flash File System (YAFFS 플래시 파일시스템의 성능과 안정성 향상)

  • Son, Ik-Joon;Kim, Yu-Mi;Baek, Seung-Jae;Choi, Jong-Moo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.9
    • /
    • pp.898-903
    • /
    • 2010
  • Popularity of smartphones such as Google Android phones and Apple iphones, is increasing the demand on more reliable high performance file system for flash memory. In this paper, we propose two techniques to improve the performance of YAFFS (Yet Another Flash File System), while enhancing the reliability of the system. Specifically, we first propose to manage metadata and user data separately on segregated blocks and indexing information piggy-back technique for reducing mount time and also enhancing performance. Second, we tailor the wear-leveling to the segregated metadata and user data blocks. Performance evaluation results based on real hardware system with 1GB NAND flash memory show that the YAFFS with our proposed techniques realized outperforms the original YAFFS by six times in terms of mount speed and five times in terms of benchmark performance, while reducing the average erase count of blocks by 14%.

Adaptive Garbage Collection Technique for Hybrid Flash Memory (하이브리드 플래시 메모리를 위한 적응적 가비지 컬렉션 기법)

  • Im, Soo-Jun;Shin, Dong-Kun
    • The KIPS Transactions:PartA
    • /
    • v.15A no.6
    • /
    • pp.335-344
    • /
    • 2008
  • We propose an adaptive garbage collection technique for hybrid flash memory which has both SLC and MLC. Since SLC area is fast and MLC area has low cost, the proposed scheme utilizes the SLC area as log buffer and the MLC area as data block. Considering the high write cost of MLC flash, the garbage collection for the SLC log buffer moves a page into the MLC data block only when the page is cold or the page migration invokes a small cost. The other pages are moved within the SLC log buffer. Also it adjusts the parameter values which determine the operation of garbage collection adaptively considering I/O pattern. From the experiments, we can know that the proposed scheme provides better performance compared with the previous flash management schemes for the hybrid flash and finds the parameter values of garbage collection close to the optimal values.