Search | Korea Science

A Message Transfer Scheme for Efficient Message Passing in the Highly Parallel Computer SPAX (고속병렬컴퓨터(SPAX)에서의 효율적인 메시지 전달을 위한 메시지 전송 기법)

모상만;신상석;윤석한;임기욱
- Journal of the Korean Institute of Telematics and Electronics B / v.32B no.9 / pp.1162-1170 / 1995
In this paper, we present a message transfer scheme for efficient message passing in the hierarchically structured multiprocessor computer SPAX (Scalable Parallel Architecture computer based on X-bar network). The scheme provides an interface not only to the operating system but also to end users. To transfer the two message types, control messages and data messages, efficiently, it supports both memory-mapped and DMA-based transfer. Dual-port RAMs serve as message buffers, and control and status registers provide an efficient programming interface. An interlaced parity scheme is adopted for error control. If an error is detected at the receiving node, the sender resends the erroneous packet according to a retry mechanism. In conjunction with the retry mechanism, watchdog timers guard against infinite waiting and repeated retries. The proposed scheme can be applied to input/output nodes and communication connection nodes as well as processing nodes in SPAX.
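The error-control loop described in this abstract can be illustrated with a small software sketch. It is not the SPAX hardware design: the abstract does not define the interlaced parity layout, so the code assumes a simple column parity (one parity bit per bit position, taken across all payload bytes), and a bounded retry count stands in for the watchdog timer. The names `interleaved_parity`, `make_packet`, `check_packet`, and `send_with_retry` are hypothetical.

```python
from functools import reduce
from typing import Callable


def interleaved_parity(data: bytes) -> int:
    """XOR of all bytes: parity bit i covers bit i of every byte (assumed layout)."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def make_packet(payload: bytes) -> bytes:
    """Append one parity byte to the payload."""
    return payload + bytes([interleaved_parity(payload)])


def check_packet(packet: bytes) -> bool:
    """The XOR over payload plus parity byte is zero when no error is detected."""
    return len(packet) > 0 and interleaved_parity(packet) == 0


def send_with_retry(payload: bytes,
                    channel: Callable[[bytes], bytes],
                    max_retries: int = 3) -> int:
    """Send a packet and resend it while the receiver detects a parity error.

    Returns the number of attempts used. The retry limit plays the role of
    the watchdog: it raises TimeoutError instead of retrying forever.
    """
    packet = make_packet(payload)
    for attempt in range(1, max_retries + 2):  # first send plus max_retries resends
        received = channel(packet)
        if check_packet(received):
            return attempt
    raise TimeoutError("retry limit reached without an error-free transfer")
```

Here `channel` is any function that takes the sent bytes and returns the bytes seen by the receiver, which makes it easy to inject bit errors when experimenting.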

A Dynamic Co-scheduling Scheme for MPI-based Parallel Programs on Linux Clusters (리눅스 클러스터에서 MPI 기반 병렬 프로그램의 동적 동시 스케줄링 기법)

Kim, Hyuk;Rhee, Yun-Seok
- Journal of the Korea Society of Computer and Information / v.13 no.1 / pp.29-35 / 2008
For efficient message passing in parallel programs, the two communicating processes, which run on different nodes, must be scheduled at the same time; this is called 'co-scheduling'. However, each node of a cluster system is built on a general-purpose multitasking OS, which manages its local processes autonomously. Thus it is not easy to co-schedule two (or more) processes in such a computing environment. Our work proposes a co-scheduling scheme for MPI-based parallel programs that exploits message exchange information between the two parties. We implement the scheme on a Linux cluster, which requires slight kernel hacking and MPI library modification. Experiments with the NPB parallel suite show that our scheme reduces execution time by 33-56% compared to the typical scheduling case, with especially better performance in more communication-bound applications.

A High Speed 2D-DWT Parallel Hardware Architecture Using the Lifting Scheme (Lifting scheme을 이용한 고속 병렬 2D-DWT 하드웨어 구조)

김종욱;정정화
- Journal of the Institute of Electronics Engineers of Korea SD / v.40 no.7 / pp.518-525 / 2003
In this paper, we present a fast hardware architecture that implements a parallel 2-dimensional discrete wavelet transform (DWT) based on the lifting scheme DWT framework. The conventional 2-D DWT has long initial and total latencies before the final 2-D transformed coefficients are available, because it uses the entire input data set for the transformation and transforms it sequentially. The proposed architecture increases parallel performance in computing the row-directional transform by using a new data splitting method. We also use a hardware resource sharing architecture to improve the total throughput of the 2-D DWT. Finally, we propose a hardware resource schedule optimized for the proposed architecture and splitting method. With the proposed architecture, parallel computing efficiency increases, and the initial and total latencies improve by 50% and 66%, respectively.
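For readers unfamiliar with the lifting scheme, the following is a minimal software sketch of one decomposition level. It is not the proposed hardware architecture: the abstract does not name the wavelet filter, so the code assumes the simplest case, an integer Haar-type lifting step (split, predict, update), and the function names are hypothetical. It does show why the row-directional pass parallelizes well, since each row is transformed independently.

```python
from typing import List, Tuple


def lifting_forward(x: List[int]) -> Tuple[List[int], List[int]]:
    """One 1-D lifting level: split into even/odd samples, predict, update."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]            # predict step
    approx = [e + (d >> 1) for e, d in zip(even, detail)]  # update step
    return approx, detail


def lifting_inverse(approx: List[int], detail: List[int]) -> List[int]:
    """Undo the update and predict steps in reverse order, then merge."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    x = [0] * (2 * len(even))
    x[0::2] = even
    x[1::2] = odd
    return x


def dwt2d_level(image: List[List[int]]) -> List[List[int]]:
    """One 2-D level: transform every row, then every column.

    Each output row or column is stored as approximation coefficients
    followed by detail coefficients.
    """
    row_out = []
    for row in image:  # rows are independent of one another
        approx, detail = lifting_forward(row)
        row_out.append(approx + detail)
    col_out = []
    for col in zip(*row_out):  # columns of the row-transformed image
        approx, detail = lifting_forward(list(col))
        col_out.append(approx + detail)
    return [list(r) for r in zip(*col_out)]  # transpose back to row-major
```

Because every lifting step is undone exactly by its mirror step, `lifting_inverse(*lifting_forward(x))` returns `x` for any even-length integer list.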

Fault-tolerant Scheduling of Real-time Parallel Tasks with Energy Efficiency on Multicore Processors (멀티코어 프로세서 상에서 에너지 효율을 고려한 실시간 병렬 작업들의 결함 포용 스케쥴링)

Lee, Kwanwoo
- KIPS Transactions on Computer and Communication Systems / v.3 no.6 / pp.173-178 / 2014
By exploiting parallel processing, the proposed scheduling scheme enhances the energy-saving capability of multicore processors for real-time tasks while satisfying deadline and fault tolerance constraints. Because finding the minimum-energy schedule on multicore processors is an NP-hard problem, the scheme searches for a near-minimum-energy schedule in polynomial time. The scheme consumes manifestly less energy than the state-of-the-art method, with low as well as high parallel processing speedup, and reduces energy consumption by up to 86%.
https://doi.org/10.3745/KTCCS.2014.3.6.173

A PARALLEL PRECONDITIONER FOR GENERALIZED EIGENVALUE PROBLEMS BY CG-TYPE METHOD

MA, SANGBACK;JANG, HO-JONG
- Journal of the Korean Society for Industrial and Applied Mathematics / v.5 no.2 / pp.63-69 / 2001
In this study, we are concerned with computing in parallel a few of the smallest eigenvalues and their corresponding eigenvectors of the eigenvalue problem $Ax={\lambda}Bx$, where A is symmetric and B is symmetric positive definite. Both A and B are large and sparse. Recently, iterative algorithms based on the optimization of the Rayleigh quotient have been developed, and the CG scheme for optimizing the Rayleigh quotient has proven a very attractive and promising technique for computing small extreme eigenvalues of large sparse eigenproblems. As in the case of a system of linear equations, successful application of the CG scheme to eigenproblems also depends on the preconditioning technique. A proper choice of preconditioner significantly improves the convergence of the CG scheme. The idea underlying the present work is a parallel computation of the Multi-Color Block SSOR preconditioning for the CG optimization of the Rayleigh quotient, together with deflation techniques. Multi-Coloring is a simple technique to obtain parallelism of order n, where n is the dimension of the matrix. Block SSOR is a symmetric preconditioner that is expected to minimize interprocessor communication due to the blocking. We implemented the method on the CRAY-T3E with 128 nodes. The MPI (Message Passing Interface) library was adopted for interprocessor communication. The test problems were drawn from discretizations of partial differential equations by finite difference methods.
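For reference, the quantity being optimized can be written out explicitly. This is the standard textbook definition for a symmetric A and a symmetric positive definite B, not a formula quoted from the paper:

$$\rho(x) = \frac{x^{T}Ax}{x^{T}Bx}, \qquad \nabla\rho(x) = \frac{2}{x^{T}Bx}\bigl(Ax - \rho(x)\,Bx\bigr).$$

The gradient vanishes exactly when $Ax = \rho(x)Bx$, so the stationary points of $\rho$ are the eigenvectors of $Ax={\lambda}Bx$, and the minimum value of $\rho$ is the smallest eigenvalue. This is why a CG-type minimization of $\rho$ yields the smallest eigenpairs.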

A fault current analysis and parallel FCL scheme on superconducting new power system (초전도(신)전력계통 고장전류 분석 및 병렬한류시스템)

Yoon, Jae-Young;Lee, Seung-Ryul;Kim, Jong-Yul
- Progress in Superconductivity and Cryogenics / v.8 no.1 / pp.49-53 / 2006
This paper specifies a new power supply paradigm that converts the 154kV voltage level into the 22.9kV class with equivalent capacity using superconducting power facilities, and analyzes the fault current characteristics with and without an HTS-FCL (High Temperature Superconducting-Fault Current Limiter). The superconducting new power system is a power system that applies a 22.9kV HTS cable in parallel with an HTS transformer and HTS-FCL, with low-voltage and mass-capacity characteristics, in place of the conventional 154kV cable and transformer. The fault current of the superconducting new power system will increase greatly because of the mass capacity and low impedance of the HTS transformer and cable. This means that the HTS-FCL is necessary to reduce the fault current below the breaking current of the circuit breaker. This paper analyzes the fault current and suggests a parallel HTS-FCL scheme that compensates for the inherent problem of the HTS-FCL, namely that recovery after quenching is impossible within a few seconds.
https://doi.org/10.9714/psac.2006.8.1.049

Control and Design of Input Series-Output Parallel Connected Converter for High Speed Train Power System (고속전철 보조전원 장치용 입력직렬-출력병렬 컨버터의 제어 및 설계)

Kim, Jeong-Won;Yu, Jeong-Sik;Jo, Bo-Hyeong
- The Transactions of the Korean Institute of Electrical Engineers B / v.49 no.4 / pp.282-290 / 2000
In this paper, charge control with input voltage feedback is proposed for the input series-output parallel connected converter configuration for high speed train power system applications. This control scheme achieves output current sharing among the output-parallel connected modules as well as input voltage sharing among the input-series connected modules under all operating conditions, including transients. It also makes the input voltage sharing control robust to component value mismatches among the modules. Moreover, this configuration enables the use of MOSFETs in a high voltage system, allowing a higher switching frequency for lighter system weight and smaller size. The performance of the proposed scheme is verified through experimental results.

A Data Prefetching Scheme Exploiting the Grain Size in Parallel Programs using Data Arrays (데이타 배열을 사용하는 병렬 프로그램에서 그레인 크기를 이용한 데이타 선인출 기법)

Jung, In-Bum;Lee, Joon-Won
- Journal of KIISE:Computer Systems and Theory / v.27 no.1 / pp.101-108 / 2000
Data prefetching is an effective technique for reducing main memory access latency by overlapping processor computation with data accesses. However, if the prefetched data replace useful data already in the cache and are not then used in computation, program performance degrades. This phenomenon results from the lack of correct predictions of the data that will be used in the future. When parallel programs use data arrays for computation, the grain size is useful information for a data prefetching scheme because it implies the range of data used in computation. Based on this information, we suggest a new data prefetching scheme that exploits the grain size of the parallel program. Simulation results show that the suggested scheme improves the performance of the simulated parallel programs through a reduction in bus transactions as well as useful prefetching operations.

Parallel processing in structural reliability

Pellissetti, M.F.
- Structural Engineering and Mechanics / v.32 no.1 / pp.95-126 / 2009
The present contribution addresses the parallelization of advanced simulation methods for structural reliability analysis, which have recently been developed for large-scale structures with a high number of uncertain parameters. In particular, the Line Sampling method and the Subset Simulation method are considered. The proposed parallel algorithms exploit the parallelism associated with the possibility to simultaneously perform independent FE analyses. For the Line Sampling method a parallelization scheme is proposed both for the actual sampling process, and for the statistical gradient estimation method used to identify the so-called important direction of the Line Sampling scheme. Two parallelization strategies are investigated for the Subset Simulation method: the first one consists in the embarrassingly parallel advancement of distinct Markov chains; in this case the speedup is bounded by the number of chains advanced simultaneously. The second parallel Subset Simulation algorithm utilizes the concept of speculative computing. Speedup measurements in context with the FE model of a multistory building (24,000 DOFs) show the reduction of the wall-clock time to a very viable amount (<10 minutes for Line Sampling and ${\approx}$ 1 hour for Subset Simulation). The measurements, conducted on clusters of multi-core nodes, also indicate a strong sensitivity of the parallel performance to the load level of the nodes, in terms of the number of simultaneously used cores. This performance degradation is related to memory bottlenecks during the modal analysis required during each FE analysis.
https://doi.org/10.12989/sem.2009.32.1.095

Efficient Mapping Scheme for Parallel Processing (병렬처리를 위한 효율적인 사상 기법)

Kim, Seok-Su;Jeon, Mun-Seok
- The Transactions of the Korea Information Processing Society / v.3 no.4 / pp.766-780 / 1996
This paper presents a mapping scheme for parallel processing using an accurate characterization of the communication overhead. A set of objective functions is formulated to evaluate the optimality of mapping a problem graph onto a system graph. One of them is especially suitable for real-time applications of parallel processing. These objective functions differ from conventional ones in that the edges in the problem graph are weighted and the actual distance, rather than the nominal distance, is used for the edges in the system graph. This facilitates a more accurate quantification of the communication overhead. An efficient mapping scheme has been developed for the objective functions, in which two levels of assignment optimization are employed: initial assignment and pairwise exchange. The mapping scheme has been tested using the hypercube as a system graph.
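The two ingredients named in this abstract, a weighted communication cost and a pairwise-exchange refinement, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' algorithm: the paper's objective functions and initial-assignment procedure are not given here, so the code assumes a single cost (edge weight times hop count), takes the initial placement as an input, and measures hypercube distance as the Hamming distance between node labels. All names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (task_u, task_v, communication weight)


def hypercube_distance(a: int, b: int) -> int:
    """Hop count between two hypercube nodes: Hamming distance of their labels."""
    return bin(a ^ b).count("1")


def mapping_cost(edges: List[Edge], placement: List[int]) -> float:
    """Sum over problem-graph edges of weight times distance in the system graph.

    placement[t] is the hypercube node that task t is assigned to.
    """
    return sum(w * hypercube_distance(placement[u], placement[v])
               for u, v, w in edges)


def pairwise_exchange(edges: List[Edge],
                      placement: List[int]) -> Tuple[List[int], float]:
    """Swap the nodes of two tasks whenever the swap lowers the cost.

    Repeats full passes over all task pairs until no swap improves the cost,
    so the result is a local optimum, not necessarily the global one.
    """
    best = list(placement)
    best_cost = mapping_cost(edges, best)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(best)), 2):
            best[i], best[j] = best[j], best[i]      # try the exchange
            cost = mapping_cost(edges, best)
            if cost < best_cost:
                best_cost = cost
                improved = True
            else:
                best[i], best[j] = best[j], best[i]  # undo the exchange
    return best, best_cost
```

A real-time variant would likely replace the sum with a maximum over edges or paths, but that choice is an assumption and is not taken from the abstract.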
