• Title/Summary/Keyword: 공유 메모리 구조

Search Result 143, Processing Time 0.025 seconds

Construction Methods of Switching Network for a Small and a Large Capacity AMT Switching System (소용량 및 대용량의 ATM시스템에 적합한 스위칭 망의 구성 방안)

  • Yang, Chung-Ryeol;Kim, Jin-Tae
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.4
    • /
    • pp.947-960
    • /
    • 1996
  • The primary goal for developing high performance ATM switching systems is to minimized the probability of cell loss, cell delay and deterioration of throughput. ATM switching element that is the most suitable for this purpose is the shared buffer memory switch executed by common random access memory and control logic. Since it is difficult to manufacture VLIS(Very Large Scale Integrated circuit) as the number of input ports increased, the used of switching module method the realizes 32$\times$32, 150 Mb/s switch utilizing 8$\times$8, 600Mb/s os 16$\times$16, 150Mb/s unit switch is latest ATM switching technology for small and large scale. In this paper, buffer capacity satisfying total-memory-reduction effect by buffer sharing in a shared buffer memory switch are analytically evalu ated and simulated by computer with cell loss level at traffic conditions, and also features of switching network utilizing the switching module methods in small and large-capacity ATM switching system is analized. Based on this results, the structure in outline of 32$\times$32(4.9Gb/s throughput), 150Mb/s switches under research in many countries is proposed, and eventually, switching-network structure for ATM switching system of small and large and capacity satisfying with above primary goals is suggested.

  • PDF

Page replication mechanism using adjustable DELAY counter in NUMA multiprocessors (NUMA 다중처리기에서 조정가능한 지연 카운터를 이용한 페이집 복사 기법)

  • 이종우;조유곤
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.6
    • /
    • pp.23-33
    • /
    • 1996
  • The exploitation of locality of reference in shared memory NUMA multiprocessors is one of the improtant problems in parallel processing today. In this paper, we propose a revised hardeare reference counter to help operating system to manage locality. In contrast to the previous one, the value of counter can abe adjusted dynamically and periodically to adapt the page replication policy to the various memory reference patterns of processors. We use execution-driven simulation of real applications to evaluate the effectiveness of our adjustable DELAY counter. Our main conclusijon is that by using the adjustable DELAY counter the t normalized average memory access costs and the variance of them become smaller for most applications than the previous one and more robust memory management policies can be provided for the operating systems.

  • PDF

Sharing Pattern Analysis of an OLTP Application

  • Lee, Kangwoo;Kim, Hiecheol
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.7 no.5
    • /
    • pp.121-128
    • /
    • 2002
  • Although multiprocessor systems are widely used in recent years to run commercial workloads, data sharing patterns are rarely explored due to several difficulties. In this paper, we made in-depth sharing pattern analysis for a representative OLTP application, the TPC-B benchmark, running on a cache-coherent shared-memory multiprocessor system. In addition, to illustrate their effects on the performance, the number of cache misses were measured for various numbers of processors, cache sizes and cache block sizes. From these measurements, we found out the shared data in TPC-B largely bear quite different sharing characteristics from those in scientific applications.

  • PDF

Remote Cache Replacement Policy using Processor Locality in Multi-Processor System (다중 프로세서 시스템에서 프로세서 지역성을 이용한 원격 캐쉬 교체 정책)

  • Han Sang Yoon;Kwak Jong Wook;Jhang Seong Tae;Jhon Chu Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.541-556
    • /
    • 2005
  • The memory access latency of the system has been a primary factor of performance degradation in single-processor system and multi-processor system. The remote memory access latency takes a lot of overhead over the local memory access latency especially in the distributed shared-memory system. To resolve this problem, the multi-level cache architecture that contains a remote cache in the multi-processor system has been proposed. In this paper, we propose a new cache replacement policy that improves the performance of the multi-processor system with the remote cache. If the multi-level cache keeps the multi-level inclusion(MLI) property and uses the LRU(Least Recently Used) cache replacement policy, the LRU information of the higher-level cache(a processor cache) would be different with that of the lower-level cache(a remote cache). In this situation, the replacement of a remote cache line can induce the exchange of a processor cache line that is used by the processor. It is a main factor of performance degradation in a whole system. To alleviate this disadvantage of the LRU replacement polity, the new policy analyses tht processor's remote memory access pattern of each node and uses this information to reduce the number of invalidations of the useful cache line in the higher-level cache. The new replacement policy of the remote cache can improve the performance by $3.5\%$ in maximum and $2.5\%$ in average on SPLASH-2 benchmarks, compared to the general LRU cache replacement policy.

A High Speed 2D-DWT Parallel Hardware Architecture Using the Lifting Scheme (Lifting scheme을 이용한 고속 병렬 2D-DWT 하드웨어 구조)

  • 김종욱;정정화
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.7
    • /
    • pp.518-525
    • /
    • 2003
  • In this paper, we present a fast hardware architecture to implement a parallel 2-dimensional discrete wavelet transform(DWT)based on the lifting scheme DWT framework. The conventional 2-D DWT had a long initial and total latencies to get the final 2D transformed coefficients because the DWT used an entire input data set for the transformation and transformed sequentially The proposed architecture increased the parallel performance at computing the row directional transform using new data splitting method. And, we used the hardware resource sharing architecture for improving the total throughput of 2D DWT. Finally, we proposed a scheduling of hardware resource which is optimized to the proposed hardware architecture and splitting method. Due to the use of the proposed architecture, the parallel computing efficiency is increased. This architecture shows the initial and total latencies are improved by 50% and 66%.

2-D DCT/IDCT Processor Design Reducing Adders in DA Architecture (DA구조 이용 가산기 수를 감소한 2-D DCT/IDCT 프로세서 설계)

  • Jeong Dong-Yun;Seo Hae-Jun;Bae Hyeon-Deok;Cho Tae-Won
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.3 s.345
    • /
    • pp.48-58
    • /
    • 2006
  • This paper presents 8x8 two dimensional DCT/IDCT processor of adder-based distributed arithmetic architecture without applying ROM units in conventional memories. To reduce hardware cost in the coefficient matrix of DCT and IDCT, an odd part of the coefficient matrix was shared. The proposed architecture uses only 29 adders to compute coefficient operation in the 2-D DCT/IDCT processor, while 1-D DCT processor consists of 18 adders to compute coefficient operation. This architecture reduced 48.6% more than the number of adders in 8x8 1-D DCT NEDA architecture. Also, this paper proposed a form of new transpose network which is different from the conventional transpose memory block. The proposed transpose network block uses 64 registers with reduction of 18% more than the number of transistors in conventional memory architecture. Also, to improve throughput, eight input data receive eight pixels in every clock cycle and accordingly eight pixels are produced at the outputs.

Implementation of Parallel Task Creation Mechanism in POSOX Thread (POSIX Thread를 이용한 병렬 태스크 생성 메커니즘의 구현)

  • 윤봉식;황선태
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1999.10c
    • /
    • pp.759-761
    • /
    • 1999
  • 본 논문에서는 동적 병렬 태스크를 추출하는 메커니즘을 공유 메모리 구조상에서 구현하는데 있어서 최적화 기법에 대해서 소개한다. 최대의 호환성을 고려하여 POSIX thread와 C언어를 사용하였는데 이에 따른 오버헤드를 여러 가지 구현 기법을 사용하여 줄이고자 노력하였다. 그리고 병렬 처리시 대부분의 계산은 각 처리기에서 순차 계산에 의해 이루어지므로 이 순차 계산시에 발생하는 오버헤드를 줄이는 것을 최우선으로 하였다.

  • PDF

Design and Implementation of KDSM(KAIST Distributed Shared Memory) System (KDSM(KAIST Distributed Shared Memory) 시스템의 설계 및 구현)

  • Lee, Sang-Kwon;Yun, Hee-Chul;Lee, Joon-Won;Maeng, Seung-Ryoul
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.5
    • /
    • pp.257-264
    • /
    • 2002
  • In this paper, we give a detailed description of KDSM(KAIST Distributed Shared Memory) system. KDSM is implemented as a user-level library running on Linux 2.2.13, and TCP/IP is used for communication. KDSM uses page-based invalidation protocol, multiple-writer protocol, and supports HLRC(Home-based Lazy Release Consistency) memory consistency model. To evaluate performance of KDSM, we executed 4 scientific applications and compared the result to JLAJLA. The results showed that performance of KDSM almost equal to JIAJIA for 2 applications and performance of KDSM is better than JIAJIA for 2 applications.

Analysis of Performance Requirement for Large-Scale InfiniBand-based DVSM System (대용량의 InfiniBand 기반 DVSM 시스템 구현을 위한 성능 요구 분석)

  • Cho, Myeong-Jin;Kim, Seon-Wook
    • The KIPS Transactions:PartA
    • /
    • v.14A no.4
    • /
    • pp.215-226
    • /
    • 2007
  • For past years, many distributed virtual shared-memory(DVSM) systems have been studied in order to develop a low-cost shared memory system with a fast interconnection network. But the DVSM needs a lot of data and control communication between distributed processing nodes in order to provide memory consistency in software, and this communication overhead significantly dominates the overall performance. In general, the communication overhead also increases as the number of processing nodes increase, so communication overhead is a very important performance factor for developing a large-scale DVSM system. In this paper, we study the performance scalability quantitatively and qualitatively for developing a large-scale DVSM system based on the next generation interconnection network, called the InfiniBand. Based on the study, we analyze a performance requirement of the next-coming interconnection network to be used for developing a performance-scalable DVSM system in the future.

Design of ABR Service Control Functions for ATM Virtual End Systems (ATM 가상종단시스템의 ABR 서비스 제어 기능 설계)

  • 이숭희
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.2 no.1
    • /
    • pp.71-79
    • /
    • 2001
  • ATM networks with long-distance links, causing long propagation delays, may require virtual end system functions to provide ABR services. We survey several issues for implementing the control functions for ABR service in multiple VCs, required in the virtual ends, and propose a structure for implementation. In the proposed design, shared memory structure is used to control the cell emission of the multiple VCs for the ABR services in the virtual end systems.

  • PDF