Search | Korea Science

Parallelization of Multifrontal Solution Method for Shared Memory Architecture (다중프론트 해법의 공유메모리 병렬화)

Kim, Min Ki;Kim, Jeong Ho;Park, Chan Yik;Kim, Seung Jo
- Journal of the Korean Society for Aeronautical & Space Sciences / v.40 no.11 / pp.972-978 / 2012
This paper discusses the parallelization of the multifrontal solution method, which is widely used for finite element structural analyses, on a shared memory architecture. The multifrontal method is easier to parallelize than other linear solution methods because its solution procedure allows unknowns to be eliminated simultaneously. Two innovative ideas are introduced to achieve optimal solver performance on a shared memory computer: pairing two frontal matrices and splitting the frontal matrix, in order to reduce the temporary memory space required by independent computing tasks. Performance comparisons between the original algorithm and the proposed one show that the proposed method is more computationally efficient on current multicore machines.
https://doi.org/10.5139/JKSAS.2012.40.11.972
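
The simultaneous elimination mentioned in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. In a frontal method, the fully summed unknowns of a dense frontal matrix are eliminated and the remaining Schur complement (update matrix) is passed to the parent front. The function name `eliminate_front` and the tiny example matrices are assumptions for illustration; the pairing and splitting techniques of the paper are not shown.

```python
def eliminate_front(front, k):
    """Eliminate the first k (fully summed) unknowns of a dense frontal matrix.

    Returns the Schur complement (update matrix) for the remaining unknowns.
    No pivoting is done, so nonzero pivots are assumed
    (e.g. a symmetric positive definite stiffness matrix).
    """
    n = len(front)
    a = [row[:] for row in front]          # work on a copy
    for p in range(k):                     # Gaussian elimination on pivot p
        pivot = a[p][p]
        for i in range(p + 1, n):
            factor = a[i][p] / pivot
            for j in range(p + 1, n):
                a[i][j] -= factor * a[p][j]
    # The trailing (n-k) x (n-k) block is the Schur complement.
    return [row[k:] for row in a[k:]]


# Two independent fronts can be reduced in either order (or in parallel)
# before their update matrices are assembled into the parent front.
front_a = [[4.0, 2.0], [2.0, 3.0]]
front_b = [[2.0, 1.0], [1.0, 5.0]]
update_a = eliminate_front(front_a, 1)    # 3 - 2*2/4 = 2.0
update_b = eliminate_front(front_b, 1)    # 5 - 1*1/2 = 4.5
parent = [[update_a[0][0] + update_b[0][0]]]
print(parent)                             # [[6.5]]
```

Because `eliminate_front` only reads its own front, the two calls share no data, which is the property that makes this kind of solver a natural fit for task-level parallelism.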

Cache Memory for Multiprocessors (다중 프로세서의 캐쉬 메모리)

Won, Cheol-Ho;Han, U-Jong;Ham, Jong-Sik;Jeon, Geum-Suk;Yun, Yong-Ho
- ETRI Journal / v.10 no.3 / pp.92-100 / 1988
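The most important aspect of implementing cache memory in a multiprocessor is the method used to maintain data coherence. Coherence maintenance, however, takes various forms depending on the processor-memory interconnection network and the number of processors. Because it also strongly affects system performance and implementation difficulty, its form is chosen by weighing many aspects of the system. This paper concerns the implementation of cache memory for a tightly coupled multiprocessor with a shared bus. Cache memories have already been developed for several multiprocessors, but the cache memory for the main computer of the national administrative computer network, under development at the Electronics and Telecommunications Research Institute of Korea, is characterized by its use of the Illinois cache coherence protocol and the copy-back method. The design goal, for a multiprocessor that can connect up to 20 MC68030 processors to a shared bus, was to prevent the sharp rise in bus usage as the number of processors increases and to serve each processor's memory requests at high speed, thereby making the most of the processors' processing capability.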
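
As a rough illustration of the coherence scheme named in the abstract, and not the ETRI design itself, the following sketch models the commonly described four-state form of the Illinois protocol (often called MESI) for a single memory block on a snooping bus. The class `SnoopyBus`, its counters, and the simplification that a dirty copy is always written back when another cache requests the block are assumptions made for this example.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    EXCLUSIVE = "E"   # only cached copy, clean
    SHARED = "S"      # possibly several clean copies
    MODIFIED = "M"    # only cached copy, dirty (copy-back pending)


class SnoopyBus:
    """Tracks the state of ONE memory block in each processor's cache."""

    def __init__(self, n_caches):
        self.states = [State.INVALID] * n_caches
        self.bus_transactions = 0
        self.write_backs = 0

    def read(self, cpu):
        if self.states[cpu] is not State.INVALID:
            return                                   # read hit: no bus traffic
        self.bus_transactions += 1                   # read miss goes on the bus
        others = [i for i, s in enumerate(self.states)
                  if i != cpu and s is not State.INVALID]
        for i in others:
            if self.states[i] is State.MODIFIED:
                self.write_backs += 1                # dirty copy is written back
            self.states[i] = State.SHARED
        # Exclusive if no other cache holds the block, otherwise Shared.
        self.states[cpu] = State.SHARED if others else State.EXCLUSIVE

    def write(self, cpu):
        state = self.states[cpu]
        if state is State.MODIFIED:
            return                                   # write hit on a dirty line
        if state is State.EXCLUSIVE:
            self.states[cpu] = State.MODIFIED        # silent upgrade, no bus use
            return
        self.bus_transactions += 1                   # invalidate other copies
        for i, other in enumerate(self.states):
            if i != cpu and other is not State.INVALID:
                if other is State.MODIFIED:
                    self.write_backs += 1
                self.states[i] = State.INVALID
        self.states[cpu] = State.MODIFIED


bus = SnoopyBus(3)
bus.read(0)    # CPU 0 loads the block: Exclusive
bus.write(0)   # Exclusive -> Modified without a bus transaction
bus.read(1)    # CPU 0 writes back, both copies become Shared
bus.write(1)   # CPU 1 invalidates CPU 0 and becomes Modified
print([s.value for s in bus.states], bus.bus_transactions, bus.write_backs)
```

In this trace four accesses cause only three bus transactions, because the write to an exclusive line stays local. That is the kind of bus traffic saving a copy-back design aims for.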

Filtering Accesses for Detecting Races in Parallel Programs with Locking (임계구역을 가진 공유메모리 병렬프로그램에서 효율적인 경합 탐지를 위한 사건 선택기법)

김영주;이승렬;전용기
- Proceedings of the Korean Information Science Society Conference / 2000.04a / pp.630-632 / 2000
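Because races cause nondeterministic execution results in shared-memory parallel programs, race detection is important for debugging. On-the-fly race detection techniques for parallel programs with critical sections use a shared data structure and therefore create a bottleneck at every access event. In this study, for parallel programs with synchronization, access events are examined in such a way that the number of accesses to the shared data structure in each iteration is at most proportional to the number of critical sections. The technique therefore makes on-the-fly race detection scalable and efficient.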

High-Speed Pipelined Memory Architecture for Gigabit ATM Packet Switching (Gigabit ATM Packet 교환을 위한 파이프라인 방식의 고속 메모리 구조)

Gab Joong Jeong;Mon Key Lee
- Journal of the Korean Institute of Telematics and Electronics C / v.35C no.11 / pp.39-47 / 1998
This paper describes a high-speed pipelined memory architecture for a shared buffer ATM switch. The memory architecture provides high speed and scalability, and it eliminates the memory cycle time restriction of a shared buffer ATM switch. Its scalability allows versatile performance in a shared buffer ATM switch. The memory consists of a 2-D array of small memory banks, and enlarging the array increases the total memory capacity. The maximum cycle time of the designed pipelined memory is 4 ns at $V_{dd}$ = 5 V and 25 $^{\circ}$C. It is embedded in the prototype chip of a shared scalable buffer ATM switch with a 4 x 4 configuration of 4160-bit SRAM memory banks. The chip is integrated in 0.6 $\mu$m 2-metal 1-poly CMOS technology.

An Implementation of Fault Tolerant Software Distributed Shared Memory with Remote Logging (원격 로깅 기법을 이용하는 고장 허용 소프트웨어 분산공유메모리 시스템의 구현)

박소연;김영재;맹승렬
- Journal of KIISE:Computer Systems and Theory / v.31 no.5_6 / pp.328-334 / 2004
Recently, software DSMs have continued to improve in performance and scalability. As software DSMs become attractive on larger clusters, the focus of attention is likely to move toward improving system reliability. A popular approach to tolerating failures is message logging with checkpointing, and many log-based rollback recovery schemes have been proposed. In this work, we propose a remote logging scheme that uses the volatile memory of a remote node assigned to each node. Because remote logging does not incur frequent disk accesses during failure-free execution, its logging overhead is not significant, especially over a high-speed communication network. Remote logging tolerates multiple failures as long as the backup nodes of the failed nodes are alive, which makes the DSM considerably more reliable. We have designed and implemented FT-KDSM (Fault Tolerant KAIST DSM) with remote logging and report its logging overhead and recovery time.

Design and Implementation of Efficient Memory Allocator using Contiguous Allocation Scheme (연속할당 기법을 이용한 효과적인 lock-free 메모리 할당자 설계 및 구현)

Kim, In-Hyuk;Kim, Tae-Hyoung;Eom, Young-Ik
- Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.559-561 / 2011
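In multicore environments, the bottleneck caused by synchronization on shared data is one of the major problems. Dynamic memory allocators in particular degrade performance through synchronization on shared data in programs that allocate and free large amounts of memory. Various lock-free memory allocation techniques have been introduced to address this, but they suffer from problems such as false sharing and heap blow-up. In this paper, we propose a new contiguous allocation scheme and use it to resolve the synchronization problem that arises when objects within the same block are allocated and freed, resulting in an effective lock-free memory allocation technique. We implemented the proposed technique and evaluated it experimentally against existing memory allocation techniques, confirming that it performs especially well in multithreaded environments that use large amounts of memory.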

A Dedicated Bus System for Cache Coherence (캐시 일관성 유지를 위한 전용 버스 시스템)

천희식;김우완
- Proceedings of the Korean Information Science Society Conference / 1998.10a / pp.30-32 / 1998
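Multiprocessor systems are designed around one of two paradigms: the shared-memory architecture and message passing. A shared-memory multiprocessor simplifies data partitioning and dynamic load balancing and readily supports scalability, but when each processor has its own private cache, a coherence problem arises between main memory and the copies of data held in those private caches. Among the coherence schemes that have been proposed, this paper targets the DASH protocol; DASH is a shared-memory multiprocessor prototype composed of processing nodes and a high-bandwidth, low-latency interconnection network. We design a dedicated bus system that supports the DASH protocol in conformance with the fully open IEEE Futurebus+ standard, and then show by simulation that this system can carry out the actions needed to maintain cache coherence under the DASH protocol in parallel with the actions performed by a conventional general-purpose bus system.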

A Study of Performance Enhancement for the Shared Memory in the Linux O.S (Linux 운영체제에서 Shared Memory 성능 개선 방안 연구)

Jang, Seung-Ju;Choi, Eun-Seok;Kang, Dong-Uk;Lee, Gwang-Yong;Kim, Dong-Han;Kim, Jae-Myeong
- Proceedings of the Korean Information Science Society Conference / 2007.10b / pp.324-329 / 2007
This paper studies how to improve the performance of Shared Memory, one of the System V IPC mechanisms supported by most Linux operating systems. Shared Memory in Linux is a technique that lets multiple processes access the same memory region. Of the two main kinds of Shared Memory, this paper deals with the SVR-style Shared Memory that is handled at the kernel level. The proposed improvement uses a dual-core processor to achieve better performance than a conventional uniprocessor system, by having the processes that use shared memory run on separate CPUs.

Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs (GP-GPU의 캐시메모리를 활용하기 위한 병렬 블록 LU 분해 프로그램의 구현)

Kim, Youngtae;Kim, Doo-Han;Yu, Myoung-Han
- Journal of Internet Computing and Services / v.14 no.6 / pp.41-47 / 2013
GP-GPUs are general-purpose GPUs that perform multithreaded numerical computation on hardware originally designed for graphics processing. Unlike a typical cache, the cache memory of a GP-GPU is provided in the form of shared memory that user programs can access directly. In this research, we implemented a parallel blocked LU decomposition program that utilizes this cache memory. The parallel blocked LU decomposition program, written in Nvidia CUDA C, runs 7 to 8 times faster than a non-blocked LU decomposition program in the same GP-GPU computation environment.
https://doi.org/10.7472/jksii.2013.14.6.41
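
The difference between blocked and non-blocked LU decomposition can be sketched in plain Python. This is only an illustration of the block structure and not the paper's CUDA C program: on a GPU, each block would be staged in shared memory so that it is reused many times per load. The function names, the omission of pivoting, and the diagonally dominant test matrix are assumptions made for this example.

```python
def lu_unblocked(a):
    """In-place LU without pivoting; unit-lower L and U overwrite a."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a


def lu_blocked(a, bs):
    """Right-looking blocked LU without pivoting, using block size bs."""
    n = len(a)
    for k0 in range(0, n, bs):
        k1 = min(k0 + bs, n)
        # 1. Factor the panel: columns k0..k1-1, all rows from k0 down.
        for k in range(k0, k1):
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
                for j in range(k + 1, k1):
                    a[i][j] -= a[i][k] * a[k][j]
        # 2. Block row of U: forward substitution with the unit-lower
        #    diagonal block applied to the columns right of the panel.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    a[i][j] -= a[i][k] * a[k][j]
        # 3. Trailing update: A22 -= L21 * U12 (the matrix-multiply part
        #    that benefits most from keeping blocks in fast memory).
        for i in range(k1, n):
            for j in range(k1, n):
                s = 0.0
                for k in range(k0, k1):
                    s += a[i][k] * a[k][j]
                a[i][j] -= s
    return a


def multiply_lu(lu):
    """Rebuild L * U from the packed in-place factors."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                out[i][j] += l_ik * lu[k][j]
    return out


n = 6
# Diagonally dominant test matrix, so skipping pivoting is safe here.
original = [[(n if i == j else 0.0) + 1.0 / (1 + abs(i - j))
             for j in range(n)] for i in range(n)]
plain = lu_unblocked([row[:] for row in original])
blocked = lu_blocked([row[:] for row in original], bs=4)
max_diff = max(abs(plain[i][j] - blocked[i][j])
               for i in range(n) for j in range(n))
print(max_diff < 1e-12)
```

Both routines compute the same factors up to rounding. The blocked version only reorders the work so that step 3 operates on small tiles, which is what makes on-chip memory reuse possible.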

Size Reduction and Performance Analysis of the Bit-map Table Used in the Bus-based Shared Memory System (버스기반의 공유메모리 시스템에서 사용된 비트맵 테이블의 크기 축소와 성능 분석)

Woo, Jong-Jung;Lee, Ka-Young
- The Transactions of the Korea Information Processing Society / v.5 no.1 / pp.24-32 / 1998
Bus contention limits the performance of bus-based shared-memory multiprocessors. In addition, in a split bus transaction environment, some memory requests may stand by unnecessarily in the memory access buffer, which further degrades system performance. This unnecessary stand-by can be eliminated by maintaining a bitmap table that holds a status bit for each memory block. However, this mechanism requires a large amount of SRAM for the status information, because the table is fully mapped to all memory blocks. To solve this problem, we propose a bitmap cache that exploits partial mapping and locality of reference. Simulation results show that the proposed system can greatly reduce the SRAM capacity needed for the status information with little performance degradation.

