• Title/Summary/Keyword: Large-memory data processing

Implementation of the FAT32 File System using PLC and CF Memory (PLC와 CF 메모리를 이용한 FAT32 파일시스템 구현)

  • Kim, Myeong Kyun;Yang, Oh;Chung, Won Sup
    • Journal of the Semiconductor & Display Technology, v.11 no.2, pp.85-91, 2012
  • In this paper, a FAT32 file system suitable for large data processing in industrial systems was implemented using a PLC and CF memory. Most PLCs cannot store large data sets in their user data memory, so an external device such as CF memory or NAND flash memory is required; here, CF memory is used to store the large data of the PLC system. A CF card can be formatted with various file systems such as NTFS, FAT, and FAT32, and the file system most widely used for industrial data storage is a modified FAT32. The conventional FAT32 file system does not support writing to multiple files concurrently or high-speed data access. The proposed file system adds a large-data-processing module that can log 40-byte records every 1 ms while creating and writing 8 files at the same time. High reliability against sudden power failure was obtained by using a power-fail monitor and non-volatile SRAM (NVSRAM). The implemented system, running the modified FAT32 file system, showed good performance and high reliability.
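
The module's internals are not spelled out in the abstract; below is a minimal sketch of the multi-file logging idea in C, assuming a 4 KiB cluster size and hypothetical platform hooks (`fat_alloc_cluster`, `cf_write_cluster`, `nvsram_journal`). Records are staged in RAM per file, whole clusters are flushed to CF, and each FAT update is journaled to NVSRAM first so a sudden power loss can be recovered on the next boot.

```c
/* Minimal sketch of multi-file, fixed-rate logging over FAT32 (hypothetical
 * structures; the paper's actual module layout is not published here). */
#include <stdint.h>
#include <string.h>

#define NUM_FILES    8      /* files written concurrently        */
#define RECORD_SIZE  40     /* bytes logged per file per 1 ms    */
#define CLUSTER_SIZE 4096   /* FAT32 cluster size (assumed)      */

typedef struct {
    uint32_t current_cluster;       /* cluster being filled  */
    uint32_t used;                  /* bytes buffered so far */
    uint8_t  buf[CLUSTER_SIZE];     /* RAM staging buffer    */
} log_file_t;

static log_file_t files[NUM_FILES];

/* Platform hooks (assumed): FAT allocation, sector I/O to CF, NVSRAM. */
extern uint32_t fat_alloc_cluster(uint32_t prev);        /* extend chain */
extern void     cf_write_cluster(uint32_t cluster, const void *data);
extern void     nvsram_journal(uint32_t file_id, uint32_t cluster);

/* Called every 1 ms with one 40-byte record per file. */
void log_tick(const uint8_t records[NUM_FILES][RECORD_SIZE])
{
    for (int i = 0; i < NUM_FILES; i++) {
        log_file_t *f = &files[i];
        memcpy(f->buf + f->used, records[i], RECORD_SIZE);
        f->used += RECORD_SIZE;

        if (f->used + RECORD_SIZE > CLUSTER_SIZE) {
            /* Journal intent to NVSRAM before touching the FAT so a
             * sudden power loss leaves a recoverable record. */
            uint32_t next = fat_alloc_cluster(f->current_cluster);
            nvsram_journal(i, next);
            cf_write_cluster(f->current_cluster, f->buf);
            f->current_cluster = next;
            f->used = 0;
        }
    }
}
```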

Trends in Compute Express Link(CXL) Technology (CXL 인터커넥트 기술 연구개발 동향)

  • S.Y. Kim;H.Y. Ahn;Y.M. Park;W.J. Han
    • Electronics and Telecommunications Trends, v.38 no.5, pp.23-33, 2023
  • With the widespread demand from data-intensive tasks such as machine learning and large-scale databases, the amount of data processed in modern computing systems is increasing exponentially. Such data-intensive tasks require large amounts of memory to rapidly process and analyze massive data. However, existing computing system architectures face challenges when building large-scale memory owing to various structural issues such as CPU specifications. Moreover, large-scale memory may cause problems including memory overprovisioning. The Compute Express Link (CXL) allows computing nodes to use large amounts of memory while mitigating related problems. Hence, CXL is attracting great attention in industry and academia. We describe the overarching concepts underlying CXL and explore recent research trends in this technology.
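
A concrete consequence of the survey's point: on Linux, a CXL.mem expander is typically exposed as a CPU-less NUMA node, so ordinary NUMA APIs can place data on it with no application changes. A minimal sketch using libnuma (node number 1 is an assumption; the actual node depends on the platform), built with `-lnuma`:

```c
/* Allocate a large buffer on a CXL-attached memory node via libnuma. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }

    int cxl_node = 1;                 /* assumed CXL memory node */
    size_t size = 1UL << 30;          /* 1 GiB                   */

    void *buf = numa_alloc_onnode(size, cxl_node);
    if (!buf) { perror("numa_alloc_onnode"); return 1; }

    memset(buf, 0, size);             /* fault pages in on that node */
    printf("1 GiB placed on node %d\n", cxl_node);

    numa_free(buf, size);
    return 0;
}
```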

Implementation of External Memory Expansion Device for Large Image Processing (대규모 영상처리를 위한 외장 메모리 확장장치의 구현)

  • Choi, Yongseok;Lee, Hyejin
    • Journal of Broadcast Engineering, v.23 no.5, pp.606-613, 2018
  • This study implements an external memory expansion device for large-scale image processing. It consists of an external memory adapter card with a PCI (Peripheral Component Interconnect) Express Gen3 x8 interface, mounted on a graphics workstation, and an external memory board populated with DDR (Double Data Rate) memory. The adapter card and the external memory board are connected through an optical interface. To access the external memory, both programmed I/O and DMA (Direct Memory Access) methods can be used to transmit and receive image data efficiently. The design was implemented on boards equipped with an Altera Stratix V FPGA (Field Programmable Gate Array) and a 40G optical transceiver, and the test results show 1.6 GB/s of bandwidth, enough to handle one channel of 4K UHD (Ultra High Definition) video. Future work will target a bandwidth of 3 GB/s or more.
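
Of the two access methods the abstract mentions, programmed I/O is the simpler: the host CPU reads and writes the card's memory window directly, whereas DMA would program the card's engine to master the bus itself. A minimal user-space sketch of the PIO path on Linux, with a hypothetical PCI address and an assumed 64 MiB BAR:

```c
/* Programmed I/O to a PCIe device's memory BAR via sysfs on Linux. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR_SIZE (64 * 1024 * 1024)   /* assumed 64 MiB window */

int main(void)
{
    /* resource0 exposes BAR0 of the adapter card (bus address assumed). */
    int fd = open("/sys/bus/pci/devices/0000:03:00.0/resource0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    volatile uint32_t *bar =
        mmap(NULL, BAR_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }

    bar[0] = 0xCAFEBABE;              /* PIO write into external memory */
    uint32_t v = bar[0];              /* PIO read back                  */
    printf("read back 0x%08X\n", v);

    munmap((void *)bar, BAR_SIZE);
    close(fd);
    return 0;
}
```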

Large-Memory Data Processing on a Remote Memory System using Commodity Hardware (대용량 메모리 데이타 처리를 위한 범용 하드웨어 기반의 원격 메모리 시스템)

  • Jung, Hyung-Soo;Han, Hyuck;Yeom, Heon-Y.
    • Journal of KIISE: Computer Systems and Theory, v.34 no.9, pp.445-458, 2007
  • This article presents a novel infrastructure for large-memory database processing using commodity hardware with operating system support. We exploit inexpensive PCs and a high-speed network capable of Remote Direct Memory Access (RDMA) operations to build a new memory hierarchy between fast volatile memory and slow disk storage. The new memory hierarchy guarantees a reasonable response time, and its storage size enables us to run large-memory database systems with little performance degradation. The proposed architecture has two main components: (1) a remote memory system inside the Linux kernel that manages other computers' memory pages efficiently, and (2) a remote memory pager responsible for read/write operations on remote memory pages. We argue that the proposed architecture is practical enough to support the rigorous demands of commercial in-memory database systems, demonstrating the performance of publicly available main-memory databases (e.g., MySQL) on our prototype. Experiments with the TPC-C benchmark show promising results.
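
The pager in the paper lives inside the Linux kernel; for brevity, the sketch below shows the equivalent page-in step with user-space libibverbs, assuming the queue pair, memory registration, and rkey exchange were done at connection setup. A one-sided RDMA read fetches a 4 KiB page from the remote node without involving its CPU:

```c
/* Sketch of a remote-memory page-in using a one-sided RDMA read. */
#include <infiniband/verbs.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* All of these would be established during connection setup (assumed). */
extern struct ibv_qp *qp;          /* connected RC queue pair            */
extern struct ibv_mr *local_mr;    /* registered local page buffer       */
extern uint64_t       remote_base; /* base address of remote page store  */
extern uint32_t       remote_rkey; /* rkey advertised by the remote node */

int rdma_page_in(void *local_page, uint64_t remote_pfn)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_page,
        .length = PAGE_SIZE,
        .lkey   = local_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .opcode     = IBV_WR_RDMA_READ,   /* one-sided read */
        .sg_list    = &sge,
        .num_sge    = 1,
        .send_flags = IBV_SEND_SIGNALED,
        .wr.rdma = {
            .remote_addr = remote_base + remote_pfn * PAGE_SIZE,
            .rkey        = remote_rkey,
        },
    };
    struct ibv_send_wr *bad = NULL;
    return ibv_post_send(qp, &wr, &bad);  /* 0 on success */
}
```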

Two-Tier Storage DBMS for High-Performance Query Processing

  • Eo, Sang-Hun;Li, Yan;Kim, Ho-Seok;Bae, Hae-Young
    • Journal of Information Processing Systems, v.4 no.1, pp.9-16, 2008
  • This paper describes the design and implementation of a two-tier DBMS for handling massive data with fast response times. Modern DBMSs face two main requirements: handling large amounts of data and providing fast response times, and traditional systems cannot satisfy both. Disk-oriented DBMSs can handle massive data, but their response time is slower than that of memory-resident DBMSs; memory-resident DBMSs provide fast response times but are inherently limited in database size. To meet both requirements, a two-tier DBMS is proposed: cold data, which does not require fast response times, is managed by a disk storage manager, while hot data, which does, is handled by a memory storage manager as snapshots. As a result, the proposed system performs significantly better than a disk-oriented DBMS while still managing massive data.
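
A minimal sketch of the two-tier lookup path this implies, with hypothetical tier interfaces (`memtier_lookup`, `disktier_read`, `memtier_promote`); the promotion heuristic is an assumption, not a detail from the paper:

```c
/* Two-tier lookup: hot records served from an in-memory snapshot,
 * cold records fall back to the disk tier. */
#include <stdint.h>

typedef struct record record_t;

/* Tier interfaces (assumed stand-ins): */
extern record_t *memtier_lookup(uint64_t key);               /* snapshot, O(1) */
extern record_t *disktier_read(uint64_t key);                /* index + I/O    */
extern void      memtier_promote(uint64_t key, record_t *r); /* make it hot    */

record_t *db_get(uint64_t key)
{
    record_t *r = memtier_lookup(key);     /* fast path: hot data      */
    if (r)
        return r;

    r = disktier_read(key);                /* slow path: cold data     */
    if (r)
        memtier_promote(key, r);           /* assumed: treat it as hot */
    return r;
}
```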

An implementation of the high speed image processing board for contact image sensor (Contact image sensor를 위한 고속 영상 처리 보드 구현)

  • Kang, Hyun-Inn;Ju, Yong-Wan;Baek, Kwang-Ryul
    • Journal of Institute of Control, Robotics and Systems, v.5 no.6, pp.691-697, 1999
  • This paper describes the implementation of a high-speed image processing board consisting of an image acquisition part and an image processing part. The acquisition part digitizes the image data from the CIS (contact image sensor) and stores it in dual-port RAM. Because the dual-port memory sits between the two parts, the processing part can work on large-volume image data while acquisition is still in progress. Most of the image preprocessing logic is integrated in a large-scale FPGA. An Analog Devices ADSP-2181 is used for the image processing part, with all of the DSP's available memory devoted to the large-volume image data. In particular, the IDMA port exchanges data with an external microprocessor or PC, so the acquired image and the processing results can be monitored. Finally, the implemented board is demonstrated on image retrieval, one of the typical applications.
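
The dual-port RAM handoff the abstract describes is essentially a ping-pong buffer: the acquisition side fills one line buffer while the processing side drains the other. A minimal C sketch with assumed buffer addresses and line width (the real board's memory map is not given):

```c
/* Ping-pong (double) buffering over dual-port RAM between an FPGA
 * acquisition side and a DSP processing side. */
#include <stdint.h>

#define LINE_PIXELS 2048                   /* assumed CIS line width */

/* Two line buffers in dual-port RAM (addresses are hypothetical). */
static volatile uint8_t *const line_buf[2] = {
    (volatile uint8_t *)0x4000,
    (volatile uint8_t *)0x4800,
};
/* Flag set by the acquisition side when buffer i holds a full line. */
extern volatile uint8_t line_ready[2];

extern void process_line(const volatile uint8_t *px, int n);

void processing_loop(void)
{
    int i = 0;
    for (;;) {
        while (!line_ready[i])
            ;                              /* wait for acquisition side */
        process_line(line_buf[i], LINE_PIXELS);
        line_ready[i] = 0;                 /* hand the buffer back      */
        i ^= 1;                            /* switch to the other half  */
    }
}
```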

A Study on Improvement of Low-power Memory Architecture in IoT/edge Computing (IoT/에지 컴퓨팅에서 저전력 메모리 아키텍처의 개선 연구)

  • Cho, Doosan
    • Journal of the Korean Society of Industry Convergence, v.24 no.1, pp.69-77, 2021
  • Low-cost design methodologies are widely used for IoT devices. In such networked devices, memory consists of flash memory, SRAM, DRAM, and the like, and because these devices process large amounts of data, memory design is an important factor in system performance. Each device therefore selects optimized design factors such as functionality, performance, and cost according to market demand. The memory architectures available for low-cost IoT devices are largely limited to configurations of SRAM, flash memory, and DRAM. To process as much data as possible in the same space, an architecture that supports parallel processing units is usually provided. Such a parallel architecture provides high performance at low cost, but it requires precise software techniques for mapping instructions and data onto the parallel hardware. This paper proposes an instruction/data mapping method that delivers optimized parallel processing performance by actively exploiting hardware and software parallelism.
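
The paper's mapping algorithm itself is not given in the abstract, but the underlying idea can be illustrated: placing operand arrays in different on-chip memory banks lets a parallel unit fetch both operands in the same cycle instead of serializing on one bank's port. A sketch using GCC-style section attributes (the bank sections and the dual-issue model are assumptions):

```c
/* Bank-aware data mapping: each operand array lives in its own SRAM
 * bank (sections are bound to banks in the linker script, assumed). */
#include <stdint.h>

#define N 256

__attribute__((section(".bank0"))) static int16_t a[N];
__attribute__((section(".bank1"))) static int16_t b[N];

/* With a[] and b[] in different banks, each iteration can load both
 * operands concurrently rather than contending for one bank's port. */
int32_t dot(void)
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}
```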

Locally weighted linear regression prefetching method for hybrid memory system (하이브리드 메모리 시스템의 지역 가중 선형회귀 프리페치 방법)

  • Tang, Qian;Kim, Jeong-Geun;Kim, Shin-Dug
    • Proceedings of the Korea Information Processing Society Conference, 2020.11a, pp.12-15, 2020
  • Data access characteristics directly affect the efficiency of system execution. This research designs an accurate predictor using historical memory access information, so that frequently accessed data can be migrated in advance from low-speed storage (SSD/HDD) to high-speed memory (memory/CPU cache), reducing data access latency and improving overall performance. To this end, we design a locally weighted linear regression prefetch scheme that copes with the irregular access patterns of large graph processing applications on a DRAM-PCM hybrid memory structure. Analysis of the test results allows appropriate structural parameters to be selected, which greatly improves cache prefetching performance and hence overall performance.
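
Locally weighted linear regression is standard enough to sketch directly: fit address ≈ w1·t + w0 over a recent access history, with a kernel that weights recent accesses more heavily, then extrapolate one step ahead as the prefetch target. The history length and kernel bandwidth below are assumed tuning parameters, not the paper's values; link with `-lm`:

```c
/* Locally weighted linear regression over a recent address history. */
#include <math.h>
#include <stdint.h>

#define HIST 16          /* recent accesses kept       */
#define TAU  4.0         /* kernel bandwidth (assumed) */

/* Predict the next address from history h[0..HIST-1], oldest first. */
uint64_t lwlr_predict(const uint64_t h[HIST])
{
    double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;

    for (int t = 0; t < HIST; t++) {
        /* Gaussian kernel: accesses closer to "now" (t = HIST-1)
         * get larger weights. */
        double d = (double)(HIST - 1 - t);
        double w = exp(-(d * d) / (2.0 * TAU * TAU));
        double x = (double)t, y = (double)h[t];
        sw += w; swx += w * x; swy += w * y;
        swxx += w * x * x; swxy += w * x * y;
    }

    double denom = sw * swxx - swx * swx;
    if (fabs(denom) < 1e-9)
        return h[HIST - 1];               /* degenerate: repeat last */

    double w1 = (sw * swxy - swx * swy) / denom;   /* slope     */
    double w0 = (swy - w1 * swx) / sw;             /* intercept */

    return (uint64_t)(w1 * HIST + w0);    /* extrapolate to t = HIST */
}
```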

Design of the new parallel processing architecture for commercial applications (상용 응용을 위한 병렬처리 구조 설계)

  • 한우종;윤석한;임기욱
    • Journal of the Korean Institute of Telematics and Electronics B, v.33B no.5, pp.41-51, 1996
  • In this paper, a new parallel processing system based on a cluster architecture is proposed that provides the scalability of a parallel processing system while maintaining shared-memory multiprocessor characteristics. Low-cost, high-performance microprocessors have recently led to the construction of large-scale parallel processing systems. Such systems scale well but are mainly used for scientific applications with large data parallelism. A shared-memory multiprocessor such as TICOM is currently used as a server for commercial applications; however, shared-memory multiprocessors are known to have very limited scalability. The proposed architecture supports both scalability and performance while remaining well suited to commercial applications, thereby overcoming the limitations of the shared-memory multiprocessor. The architecture and characteristics of the proposed system are described. A proprietary hierarchical crossbar network is designed for the system, with its protocol, routing and switching techniques, and signal transfer technique optimized for the proposed architecture. The design trade-offs of the network are described, and simulation using SES/workbench shows that the network fits the proposed architecture.
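
The network protocol is proprietary, but the basic routing decision in a two-level hierarchical crossbar can be sketched: split the node id into (cluster, local) fields, keep intra-cluster traffic on the local crossbar, and send inter-cluster traffic up to the top-level switch. Field widths and port numbering below are assumptions, not the paper's design:

```c
/* Routing decision for a two-level hierarchical crossbar. */
#include <stdint.h>

#define LOCAL_BITS 3                     /* 8 nodes per cluster (assumed) */
#define LOCAL_MASK ((1u << LOCAL_BITS) - 1)
#define UPLINK_PORT 8                    /* port toward top-level switch  */

static inline uint32_t cluster_of(uint32_t node) { return node >> LOCAL_BITS; }
static inline uint32_t local_of(uint32_t node)   { return node & LOCAL_MASK; }

/* Output port chosen by the local crossbar of `here`'s cluster. */
uint32_t route(uint32_t here, uint32_t dest)
{
    if (cluster_of(here) == cluster_of(dest))
        return local_of(dest);           /* one hop on the local crossbar */
    return UPLINK_PORT;                  /* forward to the top-level tier */
}
```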

Cycle-accurate NPU Simulator and Performance Evaluation According to Data Access Strategies (Cycle-accurate NPU 시뮬레이터 및 데이터 접근 방식에 따른 NPU 성능평가)

  • Kwon, Guyun;Park, Sangwoo;Suh, Taeweon
    • IEMEK Journal of Embedded Systems and Applications, v.17 no.4, pp.217-228, 2022
  • Currently, there are increasing demands for applying deep neural networks (DNNs) to embedded tasks such as classification and object detection. DNN processing in the embedded domain often requires custom hardware such as an NPU for acceleration because of constraints on power, performance, and area. Processing DNN models requires a large amount of data, and its seamless transfer to the NPU is crucial for performance. In this paper, we developed a cycle-accurate NPU simulator to evaluate diverse NPU microarchitectures. In addition, we propose a novel technique for reducing the number of memory accesses when processing convolutional layers of convolutional neural networks (CNNs) on the NPU. The main idea is to reuse data with memory interleaving, recycling the data that overlaps between the previous and current input windows. Data memory interleaving makes it possible to quickly read consecutive data at unaligned locations. We implemented the proposed technique in the cycle-accurate NPU simulator and measured performance with LeNet-5, VGGNet-16, and ResNet-50. The experiments show up to a 2.08x speedup in processing one convolutional layer compared to the baseline.
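
The interleaving idea is concrete enough to sketch: if word addresses are striped across N single-port banks (bank = addr mod N), then any N consecutive words, aligned or not, fall in distinct banks and can be fetched in a single cycle. The parameters below are illustrative, not the paper's configuration:

```c
/* Interleaved memory model: consecutive word addresses stripe across
 * NBANKS single-port banks, so unaligned runs still read in parallel. */
#include <stdint.h>

#define NBANKS     4
#define BANK_WORDS 1024

static uint32_t bank[NBANKS][BANK_WORDS];   /* modeled single-port banks */

/* Read NBANKS consecutive words starting at any (unaligned) address.
 * Each word comes from a different bank, so a hardware implementation
 * services all NBANKS ports in the same cycle. */
void interleaved_read(uint32_t addr, uint32_t out[NBANKS])
{
    for (int i = 0; i < NBANKS; i++) {
        uint32_t a = addr + i;
        out[i] = bank[a % NBANKS][a / NBANKS];  /* banks are distinct */
    }
}
```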