• Title/Summary/Keyword: NUMA system

Page replication mechanism using adjustable DELAY counter in NUMA multiprocessors

  • 이종우;조유곤
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.6
    • /
    • pp.23-33
    • /
    • 1996
  • The exploitation of locality of reference in shared-memory NUMA multiprocessors is one of the important problems in parallel processing today. In this paper, we propose a revised hardware reference counter to help the operating system manage locality. In contrast to the previous counter, its value can be adjusted dynamically and periodically to adapt the page replication policy to the various memory reference patterns of processors. We use execution-driven simulation of real applications to evaluate the effectiveness of our adjustable DELAY counter. Our main conclusion is that, with the adjustable DELAY counter, the normalized average memory access costs and their variance become smaller for most applications than with the previous counter, and more robust memory management policies can be provided for operating systems.

Performance Evaluation for a Multiprocessor Computer System Using a Commercial Workload

  • 박진원
    • Journal of the Korea Society for Simulation
    • /
    • v.8 no.1
    • /
    • pp.35-49
    • /
    • 1999
  • CC-NUMA-based distributed shared memory is an emerging architecture for multiprocessor computer systems because of its scalability and ease of programming. In this paper, we analyze the performance of a ring-based CC-NUMA multiprocessor computer system using a commercial workload targeted at popular OLTP applications. The characteristics of the commercial workload were obtained from traces collected on real machines. The simulation results show that the bottleneck on the ring can be effectively removed by using a dual-ring structure. We believe our simulation methodology and results will help in designing better multiprocessor computer systems for commercial application domains.

Efficient Processing of Grouped Aggregation on Non-Uniformed Memory Access Architecture

  • Choe, Seongjun;Min, Jun-Ki
    • Database Research
    • /
    • v.34 no.3
    • /
    • pp.14-27
    • /
    • 2018
  • Recently, to alleviate the memory bottleneck problem that occurs in the Symmetric Multiprocessing (SMP) architecture, the Non-Uniform Memory Access (NUMA) architecture was proposed. In addition, since the aggregation operator is an important operator that provides properties and summaries of data, its efficiency is crucial to the overall performance of a system. Thus, in this paper, we propose an efficient aggregation processing technique for the NUMA architecture. Our proposed technique consists of a partition phase and a merge phase. In the partition phase, the target relation is partitioned into several partial relations according to the grouping attribute. Since each thread can then process the aggregation operator on its partial relation independently, we prevent remote memory access during the merge phase. Furthermore, in the merge phase, we improve the performance of aggregation processing by letting each thread compute the aggregation with a local hash table and by avoiding lock contention when merging the aggregation results generated by all threads into one.
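
The partition-then-merge idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`partition_by_group`, `aggregate_partition`, `grouped_sum`), the choice of SUM as the aggregate, and hash partitioning are all assumptions made for illustration. Python threads also do not place memory on specific NUMA nodes, so the sketch shows only the lock-free structure, not the memory-locality benefit.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_group(rows, num_partitions):
    """Partition phase: route each (key, value) row by the hash of its grouping key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def aggregate_partition(partition):
    """Sum values per key into a private hash table; no locks are needed."""
    local = defaultdict(int)
    for key, value in partition:
        local[key] += value
    return dict(local)


def grouped_sum(rows, num_workers=4):
    """Hypothetical driver: partition, aggregate each partition in its own thread, merge."""
    partitions = partition_by_group(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(aggregate_partition, partitions))
    # Merge phase: every key lives in exactly one partition, so the partial
    # results are disjoint and merging is a plain dictionary union.
    result = {}
    for partial in partials:
        result.update(partial)
    return result
```

Because all rows with the same grouping key land in the same partition, no two workers ever update the same group, which is why the final merge needs no locking.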

A Study on DRACO Architecture Applied to the Cluster Systems

  • Suh, Hyo-Joong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2004.05a
    • /
    • pp.929-932
    • /
    • 2004
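  • The DRACO architecture shortens inter-node access paths compared with a dual-ring topology; its applicability and protocol have been tested on CC-NUMA systems, and its efficiency has been demonstrated. Building on the fact that the same path shortening and bandwidth extension can be applied to cluster systems, which can accommodate more processors, this paper studies the applicability of DRACO to point-to-point cluster systems with dual links. Applying the DRACO architecture using only the in-node network interface and software suits systems that require a large number of nodes, whereas using hardware-based connection paths suits systems that employ relatively few nodes and require high-speed link performance.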

An efficient interconnection network topology in dual-link CC-NUMA systems

  • Suh, Hyo-Joong
    • The KIPS Transactions:PartA
    • /
    • v.11A no.1
    • /
    • pp.49-56
    • /
    • 2004
  • The performance of multiprocessor systems is limited by several factors: processor speed, memory delay, and interconnection network bandwidth and latency. With the evolution of semiconductor technology, off-the-shelf microprocessor speeds have passed the GHz mark, and processors can be scaled up to multiprocessor systems by connecting them through interconnection networks. In this situation, system performance is bound by the latency and bandwidth of the interconnection network. SCI, Myrinet, and Gigabit Ethernet are widely adopted as high-speed interconnection network links for high-performance cluster systems. The performance of the interconnection network can be improved by extending bandwidth and minimizing latency. Raising the operating clock speed is a simple way to improve both, but the physical distance of the links makes a high-frequency clock difficult to attain. Hence system performance and scalability suffer from the limitations of the interconnection network. Duplicating the links of the interconnection network is one way to resolve this bottleneck in scalable systems; the dual-ring SCI link structure is an example. In this paper, I propose a network topology and a transaction path algorithm that optimize latency and efficiency under duplicated links. In simulation, the proposed structure shows 1.05 to 1.11 times better latency and 1.42 to 2.1 times faster execution than dual-ring systems.
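
To give a feel for why duplicated ring links reduce latency, the following Python sketch compares average hop counts on a single unidirectional ring and on a dual ring. It models only the dual-ring baseline, not the topology proposed in the paper. Treating the second ring as counter-rotating, measuring latency purely in hops, and the names `ring_hops` and `average_hops` are all assumptions for illustration.

```python
def ring_hops(src, dst, n, counter_rotating=False):
    """Hops from node src to node dst on an n-node ring.

    With a single unidirectional ring, traffic can only travel forward.
    With an (assumed) counter-rotating second ring, the shorter direction is used.
    """
    forward = (dst - src) % n
    if not counter_rotating:
        return forward
    return min(forward, n - forward)


def average_hops(n, counter_rotating=False):
    """Average hop count over all ordered pairs of distinct nodes."""
    total = sum(
        ring_hops(s, d, n, counter_rotating)
        for s in range(n)
        for d in range(n)
        if s != d
    )
    return total / (n * (n - 1))
```

For an 8-node ring, `average_hops(8)` gives 4.0 hops on the single ring, while `average_hops(8, counter_rotating=True)` gives 16/7, roughly 2.29 hops, under these assumptions.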

A Packet Processing of Handling Large-capacity Traffic over 20Gbps Method Using Multi Core and Huge Page Memory Approaches

  • Kwon, Young-Sun;Park, Byeong-Chan;Chang, Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.6
    • /
    • pp.73-80
    • /
    • 2021
  • In this paper, we propose a packet processing method capable of handling large-capacity traffic over 20Gbps using multi-core and huge page memory approaches. As ICT advances, global average monthly traffic is expected to reach 396 exabytes by 2022. As network traffic grows, cyber threats are also increasing, which raises the importance of traffic analysis. Existing high-cost foreign products for traffic analysis simply store statistical data and display it visually. Network administrators deploy many traffic analysis systems to analyze traffic in various network sections, but they cannot check the aggregated traffic of the entire network. In addition, since most existing equipment is of the 10Gbps class, it cannot keep up with traffic that increases every year. As a method of processing large-capacity traffic over 20Gbps, we propose moving from single-core and basic SMA memory approaches to multi-core and NUMA memory approaches that process raw packets without copying for high-performance packet reception, packet detection, and statistics. When using the proposed method, it was confirmed that more than 50% of the traffic was processed compared to the existing equipment.

Formal Verification of I-Link Bus arbiter Protocol Using VIS

  • Um, Hyun-Sun;Choi, Jin-Young;Han, Woo-Jong;Ki, An-Do
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2000.04a
    • /
    • pp.149-154
    • /
    • 2000
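  • As systems grow more complex, the random testing and simulation in current use are not sufficient to confirm the correctness of a protocol, so a more efficient and reliable verification method is needed. This paper verifies the arbitration protocol of the I-Link (Inside Link) bus, the internal bus of the CCA (Cache Coherent Agent) board in the directory-based CC-NUMA system developed at ETRI, using VIS (Verification Interacting with Synthesis), one of the tools used for formal verification. Because VIS accepts Verilog input, we verified the source code produced during development as-is, which enabled more accurate specification and verification.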

Impact of Process Scheduling on Network Performance over Multi-Core Systems

  • Jang, Hye-Churn;Jin, Hyun-Wook
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.04a
    • /
    • pp.827-829
    • /
    • 2009
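  • Multi-core processors are now used in many servers, and the number of cores in a single processor package will continue to increase. However, current operating systems treat multi-core systems almost the same as multiprocessor environments, and attempts at performance optimization that take multi-core characteristics into account are still insufficient. This paper quantitatively analyzes network throughput and the amount of idle core resources while varying the processor affinity of communication processes and network interrupts in multi-core processor environments with SMP and NUMA architectures. The measurements show that processor affinity does not greatly change communication throughput but strongly affects the amount of processor resources required. We also show that this effect on processor resources is closely related to the cache-sharing structure and the distributed-memory structure of multi-core processors.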
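
As a rough illustration of what varying the processor affinity of a communication process involves, the sketch below pins the current process to one CPU using Python's `os.sched_setaffinity`, which is available on Linux only. It is not the measurement setup used in the paper: the helper name `pin_current_process` is hypothetical, and interrupt affinity, which the paper also varies, is not covered here.

```python
import os


def pin_current_process(cpu):
    """Restrict the calling process to a single CPU and return the resulting CPU set.

    Assumes a Linux system; os.sched_setaffinity does not exist on other platforms.
    A pid of 0 means the calling process.
    """
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)   # CPUs the process may currently use
    target = min(allowed)               # pick any CPU we are allowed to run on
    print("before:", sorted(allowed))
    print("after: ", sorted(pin_current_process(target)))
    os.sched_setaffinity(0, allowed)    # restore the original affinity mask
```

Running a sender or receiver under different pinned CPUs, for example on the core that shares a cache with the interrupt-handling core versus a core on another NUMA node, is the kind of variation the abstract describes.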

Cost-effective multistage interconnection network for NUMA (non-uniform memory access) model system

  • 최창훈;김성천
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.5
    • /
    • pp.19-32
    • /
    • 1997
  • So far, multiple-path MINs that provide redundant paths in the traditional UPP MINs have been realized by adding hardware such as extra stages, duplicated data links, or multiple copies of the MIN. Traditional MINs also do not exploit locality: communication between all processor-memory pairs takes the same amount of time, and so far there has been little progress in exploiting locality of reference in MINs. In this paper, we present a new MIN topology, the hybrid MIN, which is constructed with 2N-3 SEs, far fewer than traditional MINs. Although it is constructed with only 2N-3 SEs, the hybrid MIN satisfies full access capability (FAC) and has redundant paths (while providing a single path for 2 memory modules of each processor). Moreover, the hybrid MIN provides shortcut paths between pairs that communicate data frequently (locality of reference). Its performance under varying degrees of localized communication is analyzed.
