Search | Korea Science

WARP: Memory Subsystem Effective for Wrapping Bursts of a Cache

Jang, Wooyoung
- ETRI Journal
- /
- v.39 no.3
- /
- pp.428-436
- /
- 2017
State-of-the-art processors require increasingly complicated memory services for high performance and low power consumption. In particular, they request transfers within a burst in a wrap-around order to minimize the miss penalty of a cache. However, synchronous dynamic random access memories (SDRAMs) do not always generate transfers in the wrap-round order required by the processors. Thus, a memory subsystem rearranges the SDRAM transfers in the wrap-around order, but the rearrangement process may increase memory latency and waste the bandwidth of on-chip interconnects. In this paper, we present a memory subsystem that is effective for the wrapping bursts of a cache. The proposed memory subsystem makes SDRAMs generate transfers in an intermediate order, where the transfers are rearranged in the wrap-around order with minimal penalties. Then, the transfers are delivered with priority, depending on the program locality in space. Experimental results showed that the proposed memory subsystem minimizes the memory performance loss resulting from wrapping bursts and, thus, improves program execution time.
https://doi.org/10.4218/etrij.17.0116.0710 인용 PDF KSCI

TCP/IP Using Minimal Resources in IoT Systems

Lee, Seung-Chul;Shin, Dongha
- Journal of the Korea Society of Computer and Information
- /
- v.25 no.10
- /
- pp.125-133
- /
- 2020
In this paper, we design 4-layer TCP/IP that utilizes minimal memory and processor resources in Internet of Things(IoT) systems. The TCP/IP designed in this paper has the following characteristics. First, memory resource is minimized by using minimal memory allocation. Second, processor resource is minimized by using minimal memory copy. Third, the execution time of the TCP/IP can be completed in a deterministic time. Fourth, there is no memory leak problem. The standard in minimal resources for memory and processor derived in this paper can be used to check whether the network subsystems of the already implemented IoT systems are efficiently implemented. As the result of measuring the amount of memory allocation and copy of the network subsystem of Zephyr, an open source IoT kernel recently released by the Linux Foundation, we found that it was bigger than the standard in minimal resources derived in this paper. The network subsystem of Zephyr was improved according to the design proposed in this paper, confirming that the amount of memory allocation and copy were decreased by about 39% and 67%, respectively, and the execution time was also reduced by about 28%.
https://doi.org/10.9708/jksci.2020.25.10.125 인용 PDF KSCI

Automatic Detection of Memory Subsystem Parameters for Embedded Systems (임베디드 시스템을 위한 메모리 서브시스템 파라미터의 자동 검출)

Ha, Tae-Jun;Seo, Sang-Min;Chun, Po-Sung;Lee, Jae-Jin
- Journal of KIISE:Computing Practices and Letters
- /
- v.15 no.5
- /
- pp.350-354
- /
- 2009
To optimize the performance of software programs, it is important to know certain hardware parameters such as the CPU speed, the cache size, the number of TLB entries, and the parameters of the memory subsystem. There exist several ways to obtain the values of various hardware parameters. Firstly. the values can be taken from the hardware manual. Secondly, the parameters can be obtained by calling functions provided by the operating systems. Finally, hardware detection programs can find the desired values. Such programs are usually executed on PC or server systems and report the CPU speed, the cache size, the number of TLB entries, and so on. However, they do not sufficiently detect the parameters of one of the most important parts of the computer concerning performance, namely the memory bank layout in the memory subsystem. In this paper, we present an algorithm to detect the memory bank parameters. We run an implementation of our algorithm on various embedded systems and compare the detected values with the real hardware parameters. The results show that the presented algorithm detects the cache size, the number of TLB entries, and the memory bank layout with high accuracy.
PDF KSCI

The Biological Base of Learing and Memory(I):A Neuropsychological Review (학습과 기억의 생물학적 기초(I):신경심리학적 개관)

MunsooKim
- Korean Journal of Cognitive Science
- /
- v.7 no.3
- /
- pp.7-36
- /
- 1996
Recebt neuropsychological studies on neurobiological bases of learning and memory in humans are reviewed. At present, cognitive psychologists belive that memory is not a unitary system. But copmosed of several independent subsystems. Adoption this perspective,this paper summarized findings regarding what kinds of memory discorders result from lesions of which brain areas and which brain areas are activated by what kind of learning/memory tasks. Short-term memory seems to involve widespread areas around the boundaries among the parietal,occipital,and temporal lobes,depending on the type of the type of the tasks and the way of presentation of the stimuli. Implicit memory,a subsystem of long-term memory,is not a unitary system itself. Thus,brain areas involved in implicit memory tasks used. It is well-known that medial temporal lobe is necessary for formation(i,e.,consolidation)of explicit memory,another subsystem of long-term memory. Storage and/or retrieval of episodic and semantic memory involve temporal neocortex. Perfromtal cortex seemas to be involved in several aspects of memory such as short term memory and retrieval of espisodic and semantic memory. Finally, a popular view on the locus of long-term memory storage is described.
PDF

Distributed Simulator for General Control System in CEMTool

Lee, Tai-Ri;Lee, Young-Sam;Lee, Kwan-Ho;Kwon, Wook-Hyun
- 제어로봇시스템학회:학술대회논문집
- /
- 2003.10a
- /
- pp.2230-2234
- /
- 2003
This paper proposes a distributed simulator for general control system in CEMTool. Systems can be described by SIMTool likes the simulink in Matlab. For distributed simulation, we can seperate any system into several parallel subsystems in SIMTool. The number of parallel subsystem can be determined by the system's property. After seperation, parallel simulator will do initialization, one-step-ahead simulation, block-distribution and ordering and so on. Finally, simulator will create independent C codes and executive files for each subsystem. The whole system is fulfilled by several PCs, and each PC executes one subsystem. There are communications among these subsystem using reflective memory or ethernet. We have made several experiments, and the 5-stand cold rolling mill control system is our main target. The result of parallel simulation has shown effective speedup in comparison with one pc simulation.
PDF

OSEK PS Kernel Mechanisms for Reducing Dynamic Memory Usage (동적 메모리 사용 감소를 위한 OSEK OS 커널 구현 메커니즘)

Lim, Jin-Tack;Keum, Han-Hong;Park, Ji-Yong;Hong, Seong-Soo
- Transactions of the Korean Society of Automotive Engineers
- /
- v.17 no.3
- /
- pp.127-141
- /
- 2009
While the ever-increasing complexity of automotive software systems can be effectively managed through the adoption of a reliable real-time operating system (RTOS), it may incur additional resource usage to a resultant system. Due to the mass production nature of the automotive industry, reducing physical resources used by automotive software is of the utmost importance for cost reduction. OSEK OS is an automotive real-time kernel standard specifically defined to address this issue. Thus, it is very important to develop and exploit kernel mechanisms such that they can achieve minimal resource usage in the OSEK OS implementation. In this paper, we analyze the task subsystem, resource subsystem, application mode and conformance classes of OSEK OS as well as the OSEK Implementation Language (OIL). Based on our analysis, we in turn devise and implement kernel mechanisms to minimize the dynamic memory usage of the OSEK OS implementation. Finally, we show that our mechanisms effectively reduce the memory usage of OSEK OS and applications.
PDF KSCI

An Optimized File System for SSD (SSD를 위한 최적화 파일시스템)

Park, Je-Ho
- Journal of the Semiconductor & Display Technology
- /
- v.9 no.2
- /
- pp.67-72
- /
- 2010
Recently increasing application of flash memory in mobile and ubiquitous related devices is due to its non-volatility, fast response time, shock resistance and low power consumption. Following this trend, SSD(Solid State Disk) using multiple flash chips, instead of hard-drive based storage system, started to widely used for its advantageous features. However, flash memory based storage subsystem should resolve the performance bottleneck for writing in perspective of speed and lifetime according to its disadvantageous physical property. In order to provide tangible performance, solutions are studied in aspect of reclaiming of invalid regions by decreasing the number of erasures and distributing the erasures uniformly over the whole memory space as much as possible. In this paper, we study flash memory recycling algorithms with multiple management units and demonstrate that the proposed algorithm provides feasible performance. The proposed method utilizes the partitions of the memory space by utilizing threshold values and reconfigures the management units if necessary. The performance of the proposed policies is evaluated through a number of simulation based experiments.
PDF KSCI

File System for Performance Improvement in Multiple Flash Memory Chips (다중 플래시 메모리 기반 파일시스템의 성능개선을 위한 파일시스템)

Park, Je-Ho
- Journal of the Semiconductor & Display Technology
- /
- v.7 no.3
- /
- pp.17-21
- /
- 2008
Application of flash memory in mobile and ubiquitous related devices is rapidly being increased due to its low price and high performance. In addition, some notebook computers currently come out into market with a SSD(Solid State Disk) instead of hard-drive based storage system. Regarding this trend, applications need to increase the storage capacity using multiple flash memory chips for larger capacity sooner or later. Flash memory based storage subsystem should resolve the performance bottleneck for writing in perspective of speed and lifetime according to its physical property. In order to make flash memory storage work with tangible performance, reclaiming of invalid regions needs to be controlled in a particular manner to decrease the number of erasures and to distribute the erasures uniformly over the whole memory space as much as possible. In this paper, we study the performance of flash memory recycling algorithms and demonstrate that the proposed algorithm shows acceptable performance for flash memory storage with multiple chips. The proposed cleaning method partitions the memory space into candidate memory regions, to be reclaimed as free, by utilizing threshold values. The proposed algorithm handles the storage system in multi-layered style. The impact of the proposed policies is evaluated through a number of experiments.
PDF

Improving the Read Performance of OneNAND Flash Memory using Virtual I/O Segment (가상 I/O 세그먼트를 이용한 OneNAND 플래시 메모리의 읽기 성능 향상 기법)

Hyun, Seung-Hwan;Koh, Kern
- Journal of KIISE:Computing Practices and Letters
- /
- v.14 no.7
- /
- pp.636-645
- /
- 2008
OneNAND flash is a high-performance hybrid flash memory that combines the advantages of both NAND flash and NOR flash. OneNAND flash has not only all virtues of NAND flash but also greatly enhanced read performance which is considered as a downside of NAND flash. As a result, it is widely used in mobile applications such as mobile phones, digital cameras, PMP, and portable game players. However, most of the general purpose operating systems, such as Linux, can not exploit the read performance of OneNAND flash because of the restrictions imposed by their virtual memory system and block I/O architecture. In order to solve that problem, we suggest a new approach called virtual I/O segment. By using virtual I/O segment, the superior read performance of OneNAND flash can be exploited without modifying the existing block I/O architecture and MTD subsystem. Experiments by implementations show that this approach can reduce read latency of OneNAND flash as much as 54%.
PDF KSCI

Code Transformation Techniques for Scratch-Pad Memory (Scratch-Pad Memory를 위한 코드 변환 기법)

문대경;이재진
- Proceedings of the Korean Information Science Society Conference
- /
- 2004.10a
- /
- pp.577-579
- /
- 2004
전원을 전적으로 배터리에 의존하는 모바일 임베디드 시스템은 배터리 용량의 한계 때문에 효율적인 에너지의 사용이 매우 중요하다. 특히 memory subsystem은 전체 system에서 소모되는 에너지에서 큰 비중을 차지한다. 이 논문은 성능 면에서 cache의 대안이 되고, cache보다 간단한 구조 때문에 전력소모가 훨씬 적은 on-chip scratch-pad memory(SPM)를 효율적으로 이용할 수 있는 소스 코드 변환 방법 및 SPM 관리방법을 제안한다. 각 함수 단위로 코드 변환을 하며, 어떤 변수를 SPM에 할당하기 위한 소스코드 변환을 했을 때, 소스코드 분석만으로 알 수 있는 변수의 정적인 참조 횟수를 가중치로 고려하여, 코드 변환 후 메모리 참조에 의한 실행 시간과 에너지 소모를 계산하고 이를 바탕으로 SPM에 할당한 변수를 결정한 다음 실제 그 코드 변환을 적용한다. 제안된 코드 변환은 컴파일러에 의해 자동화 될 수 있다. 10개의 임베디드 벤치마크 프로그램을 이용하여 본 논문에서 제안하는 방법의 성능 평가를 한 결과, 실행 시간은 평균 23% 향상되고 에너지 소모는 평균 49% 감소함을 알 수 있다.
PDF

Search Result 35, Processing Time 0.026 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)