Search | Korea Science

Exploiting Multi Data Memory Banks in Embedded Systems (임베디드 시스템에서 다중 데이터 메모리 뱅크의 활용)

Cho, Doosan;Yang, Seungjun;Kwon, Yongin;Yi, Hayoon;Kwon, Donghyun;Paek, Yunheung
- Proceedings of the Korea Information Processing Society Conference
- 2013.11a
- pp.46-47
- 2013
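Over the past several decades, embedded system technology has advanced to meet the diverse demands of the mobile device market. Today's embedded systems center on technologies that adopt small, specialized hardware while still delivering high performance efficiently and at low cost. One of these key technologies is the multiple memory bank. For example, a dual memory bank can provide twice the memory bandwidth in the same area. It therefore offers the same bandwidth as dual-port memory at lower cost. However, software techniques that support the efficient use of multiple memory banks are still lacking. In this work, we effectively solve the problem of utilizing multiple memory banks by using an interference graph.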
https://doi.org/10.3745/PKIPS.y2013m11a.46

Run-time Memory Optimization Algorithm for the DDMB Architecture (DDMB 구조에서의 런타임 메모리 최적화 알고리즘)

Cho, Jeong-Hun;Paek, Yun-Heung;Kwon, Soo-Hyun
- The KIPS Transactions:PartA
- v.13A no.5 s.102
- pp.413-420
- 2006
Most vendors of digital signal processors (DSPs) support a Harvard architecture, which has two or more memory buses, one for program and one or more for data, and allows the processor to access multiple words of data from memory in a single instruction cycle. Our previous work addressed how to assign data efficiently to multiple memory banks. This paper reports on our recent attempt to optimize run-time memory. The run-time environment for dual data memory banks (DDMBs) requires two run-time stacks to manage the activation records that each called procedure places in the two memory banks. However, a procedure's activation records in the two banks can differ in size. As a consequence, the dual run-time stacks can become unbalanced whenever a procedure is called. This imbalance can cause the usage of one memory bank to exceed the on-chip memory area even though the other bank still has free space. In this paper, we attempt to balance the dual run-time stacks to use on-chip memory more efficiently. The experimental results show that although our algorithm is quite simple, it utilizes run-time memory efficiently, enabling our compiler to run extremely fast while minimizing the usage of run-time memory in the target code.
https://doi.org/10.3745/KIPSTA.2006.13A.5.413

Design and Performance Analysis of High Performance Processor-Memory Integrated Architectures (고성능 프로세서-메모리 혼합 구조의 설계 및 성능 분석)

Kim, Young-Sik;Kim, Shin-Dug;Han, Tack-Don
- The Transactions of the Korea Information Processing Society
- v.5 no.10
- pp.2686-2703
- 1998
The widening performance gap between processor and memory has led to the emergence of a promising architecture, processor-memory (P-M) integration. In this paper, various design issues for P-M integration are studied. First, an analytical model of the DRAM access time is constructed, considering both the bank conflict ratio and the DRAM page hit ratio. The proposed model is then used to identify where performance improves and where bottlenecks arise in designing on-chip DRAM architectures. This paper proposes a new architecture, called the delayed precharge bank architecture, which improves memory system performance by increasing the DRAM page hit ratio. This paper also adapts an efficient bank interleaving mechanism to the proposed architecture. Execution-driven simulation verifies that this architecture performs better than both the hierarchical multi-bank architecture and the conventional bank architecture. Eight SPEC95 benchmarks are used for simulation while varying the parameters for the cache architecture, the number of DRAM banks, and the delayed time quantum.

Bank Level Simulator to Analysis Memory System (메모리 시스템 구조 분석을 위한 시뮬레이터)

Kang, Dongwoo;Choi, Jongmoo
- Proceedings of the Korea Information Processing Society Conference
- 2014.04a
- pp.40-42
- 2014
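Recent computer systems pursue greater parallelism through multi-core designs, but as the number of cores grows, memory is being identified as a new bottleneck. The memory system can be divided into three layers, virtual memory, physical memory, and bank memory, and because these layers are interrelated, the system is difficult to analyze. To address this, this paper proposes a simulator that supports the layered structure. The proposed simulator consists of five components and supports a variety of settings, such as the number of CPUs, the cache policy, and the number of banks. Using the simulator, we also show that the operating-system-level physical memory manager affects memory access latency.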
https://doi.org/10.3745/PKIPS.y2014m04a.40

A method for improving wear-leveling of flash file systems in workload of access locality (접근 지역성을 가지는 작업부하에서 플래시 파일시스템의 wear-leveling 향상 기법)

Jang, Si-Woong
- Journal of the Korea Institute of Information and Communication Engineering
- v.12 no.1
- pp.108-114
- 2008
Since flash memory cannot be overwritten in place, updated data are written to a new area. If data are frequently updated, garbage collection, which is achieved by erasing blocks, must be performed to reclaim space. Because the number of erase operations is limited by the characteristics of flash memory, every block should be written and erased evenly. However, if data with access locality are processed by the cost-benefit algorithm with separation of hot and cold blocks, processing performance is high but wear-leveling is uneven. In this paper, we propose the CB-MB (Cost Benefit between Multi Bank) algorithm, in which hot data are allocated to one bank and cold data to another, and the roles of the hot and cold banks are exchanged every period. CB-MB performs 30% better than the cost-benefit algorithm with separation of cold and hot blocks, and the standard deviation of its wear-leveling is about a third of that algorithm's.
https://doi.org/10.6109/JKIICE.2008.12.1.108

Music retrieval system implementation based on multi-level quantization scheme (다중 레벨 양자화 기법 기반의 음악 검색기 구현)

Song, Won-Sik;Park, Man-Soo;Kim, Hoi-Rin
- HCI Society of Korea: Conference Proceedings
- 2006.02a
- pp.1182-1187
- 2006
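Building on Philips's audio fingerprint extraction method, this paper points out that the existing method analyzes the frequency domain too finely and, as an improvement, proposes using quantization to reflect the rate of change of the filter bank energies during audio fingerprint extraction. It also aims to implement a real application on a PDA using the proposed algorithm. To keep the same memory size as the Philips method, the proposed method reduces the number of filter banks from 33 to 17 and allocates 2 bits to the variation of each filter bank. To map the variation to bits, thresholds that maximize the distinctiveness of the music are found from the pmf of each band extracted from a music database, and the filter bank variation is assigned 2 bits on that basis. Based on the audio fingerprints extracted in this way, we implemented a system in which a PDA communicates with a music retrieval server to provide information related to the query music requested by the user. Evaluated under various ambient noise conditions, the proposed method improved both retrieval performance and search speed over the existing Philips method.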

A High Speed FFT Processor for OFDM Systems (OFDM 시스템을 위한 고속 FFT 프로세서)

조병각;손병수;선우명훈
- Proceedings of the IEEK Conference
- 2001.09a
- pp.873-876
- 2001
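This paper proposes a high-speed FFT processor for OFDM (Orthogonal Frequency Division Multiplexing) systems used for high-speed data transmission. A single-memory architecture is adopted to keep the hardware small, and a Radix-4 algorithm and a memory bank structure are used for high-speed computation. The processor also has an in-place memory structure in which butterfly outputs are stored at the locations of the input data. The designed processor was modeled in VHDL and synthesized with a Samsung 0.5 µm SOG process; the total gate count excluding memory is 98,326. It operates at 42 MHz and can process a 256-point FFT in 6 µs.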

A new direct-mapped cache with fully associative buffer for low power consumption by using bank-selection mechanism (저 전력을 위한 뱅크 선택 메커니즘과 새로운 동작 메커니즘을 이용한 직접사상 캐쉬 및 버퍼 시스템)

이종성;이정훈;김신덕
- Proceedings of the Korean Information Science Society Conference
- 2003.10a
- pp.223-225
- 2003
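This paper describes a new cache architecture that applies a more efficient bank management mechanism, using two caches of different structures and a new bank selector. Despite its small size, it achieves a low miss ratio and substantial power savings, clearly outperforming a conventional direct-mapped cache. These results stem from the optimized combination of a direct-mapped cache and a fully associative buffer, and from a new mechanism that uses an effective bank selector to deliver high performance at low power. The proposed architecture was compared against direct-mapped caches of various sizes using four metrics: miss ratio, average memory access time, power consumption, and energy-delay product.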

Automatic Detection of Memory Subsystem Parameters for Embedded Systems (임베디드 시스템을 위한 메모리 서브시스템 파라미터의 자동 검출)

Ha, Tae-Jun;Seo, Sang-Min;Chun, Po-Sung;Lee, Jae-Jin
- Journal of KIISE:Computing Practices and Letters
- v.15 no.5
- pp.350-354
- 2009
To optimize the performance of software programs, it is important to know certain hardware parameters such as the CPU speed, the cache size, the number of TLB entries, and the parameters of the memory subsystem. There are several ways to obtain the values of these hardware parameters. First, the values can be taken from the hardware manual. Second, the parameters can be obtained by calling functions provided by the operating system. Finally, hardware detection programs can find the desired values. Such programs are usually executed on PC or server systems and report the CPU speed, the cache size, the number of TLB entries, and so on. However, they do not sufficiently detect the parameters of one of the parts of the computer that matters most for performance, namely the memory bank layout in the memory subsystem. In this paper, we present an algorithm to detect the memory bank parameters. We run an implementation of our algorithm on various embedded systems and compare the detected values with the real hardware parameters. The results show that the presented algorithm detects the cache size, the number of TLB entries, and the memory bank layout with high accuracy.

A Block Allocation Policy to Enhance Wear-leveling in a Flash File System (플래시 파일시스템에서 wear-leveling 개선을 위한 블록 할당 정책)

Jang, Si-Woong
- Proceedings of the Korean Institute of Information and Communication Sciences Conference
- 2007.10a
- pp.574-577
- 2007
Whereas a disk can overwrite data in place on update, flash memory cannot, so updated data are written to a new area. If data are frequently updated, garbage collection, which is achieved by erasing blocks, must be performed to reclaim space. Because the number of erase operations is limited by the characteristics of flash memory, every block should be written and erased evenly. However, if data with access locality are processed by the cost-benefit algorithm with separation of hot and cold blocks, processing performance is high but wear-leveling is uneven. In this paper, we propose the CB-MB (Cost Benefit between Multi Bank) algorithm, in which hot data are allocated to one bank and cold data to another, and the roles of the hot and cold banks are exchanged every period. CB-MB performed similarly to other methods under a uniform workload and much better than them under a workload with access locality.

