Search | Korea Science

A Study on the Performance Analysis of Cache Coherence Protocols in a Multiprocessor System Using HiPi Bus (HiPi 버스를 사용한 멀티프로세서 시스템에서 캐쉬 코히어런스 프로토콜의 성능 평가에 관한 연구)

김영천;강인곤;황승욱;최진규
- The Journal of Korean Institute of Communications and Information Sciences, v.18 no.1, pp.57-68, 1993
In this paper, we describe a multiprocessor system that combines the HiPi bus, which uses a pended protocol, with multiple cache memories, and we evaluate the system's performance in terms of processor utilization for various cache coherence protocols. The HiPi bus was developed at ETRI as the shared bus of TICOM II, a main computer system for establishing a nationwide computing network. The HiPi bus has a high data transfer rate, but it does not allow cache-to-cache transfers. To evaluate the effect of cache-to-cache transfer on system performance and to choose the best-performing protocol for the HiPi bus, we carry out two simulation studies. First, we analyze the processor utilization of a multiprocessor system with the HiPi bus through simulation. Each cache coherence protocol is described by a state transition diagram, and the probability of each state is calculated from the steady state of the corresponding Markov chain. These state probabilities are used as input parameters of the simulation, and the models are built and simulated with SLAM II graphic symbols and language. Second, we propose a HiPi bus that supports cache-to-cache transfers and analyze, again through simulation, the processor utilization of a multiprocessor system using the proposed bus. The cache coherence protocols considered in the simulations are Write-through, Write-once, Berkeley, Synapse, Illinois, Firefly, and Dragon.
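The abstract obtains each protocol's state probabilities from the Markov steady state. As a minimal sketch, assuming a hypothetical three-state protocol (Invalid, Shared, Dirty) with made-up transition probabilities that are not taken from the paper, the stationary vector π satisfying π = πP with Σπ = 1 can be found by power iteration:

```python
"""Steady-state probabilities of a cache-block state machine (hypothetical data)."""


def steady_state(P, tol=1e-12, max_iter=10_000):
    """Return pi with pi = pi @ P and sum(pi) == 1, found by power iteration.

    P is a row-stochastic matrix given as a list of rows: P[i][j] is the
    probability of moving from state i to state j on the next reference.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical protocol: these probabilities are illustrative only.
STATES = ["Invalid", "Shared", "Dirty"]
P = [
    [0.70, 0.20, 0.10],  # from Invalid: stays invalid, filled as Shared, or as Dirty
    [0.10, 0.80, 0.10],  # from Shared: invalidated, stays shared, or written
    [0.05, 0.15, 0.80],  # from Dirty: invalidated, downgraded, or stays dirty
]

if __name__ == "__main__":
    for name, p in zip(STATES, steady_state(P)):
        print(f"{name:8s} {p:.4f}")
```

For this particular matrix the exact solution is π = (5/24, 11/24, 1/3); in a study like the one above, such per-state probabilities would then feed the simulation as input parameters.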

Fast GPU Implementation for the Solution of Tridiagonal Matrix Systems (삼중대각행렬 시스템 풀이의 빠른 GPU 구현)

Kim, Yong-Hee;Lee, Sung-Kee
- Journal of KIISE:Computer Systems and Theory, v.32 no.11_12, pp.692-704, 2005
With improvements in computer hardware, GPUs (Graphics Processing Units) now offer tremendous memory bandwidth and computational power, which has led to their use for general-purpose computation. In particular, GPU implementations of compute-intensive, physics-based simulations are being actively studied. When the differential equations underlying physics simulations are discretized by finite-difference approximation, tridiagonal matrix systems arise repeatedly, so solving them quickly is an important research topic for physics-based simulation. We propose a fast GPU implementation for solving tridiagonal matrix systems. In this paper, we implement the cyclic reduction (also known as odd-even reduction) algorithm, a popular choice for vector processors. We obtained a considerable performance improvement over the Thomas method, which is well known for solving tridiagonal systems on the CPU, and over the conjugate gradient method, which has shown good results on the GPU. We evaluated the proposed method by applying it to heat conduction, advection-diffusion, and shallow water simulations, which ran at over 35 frames per second on a 1024x1024 grid.
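As a rough illustration of the two solvers named above, here is a minimal serial Python sketch of cyclic (odd-even) reduction checked against the Thomas method. It assumes the system is written as a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] with a[0] = c[n-1] = 0 and is well conditioned (for example, diagonally dominant), since neither routine pivots. It is not the paper's GPU code.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive odd-even (cyclic) reduction."""
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    na, nb, nc, nd = [], [], [], []
    # Eliminate the even-indexed unknowns from each odd-indexed equation.
    # Iterations are independent of each other, so a GPU could run them in parallel.
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        has_right = i + 1 < n
        gamma = -c[i] / b[i + 1] if has_right else 0.0
        na.append(alpha * a[i - 1])
        nb.append(b[i] + alpha * c[i - 1] + (gamma * a[i + 1] if has_right else 0.0))
        nc.append(gamma * c[i + 1] if has_right else 0.0)
        nd.append(d[i] + alpha * d[i - 1] + (gamma * d[i + 1] if has_right else 0.0))
    y = cyclic_reduction(na, nb, nc, nd)  # half-size system for the odd unknowns
    x = [0.0] * n
    for k, i in enumerate(range(1, n, 2)):
        x[i] = y[k]
    # Back-substitute the even unknowns from their now-known odd neighbours.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


def thomas(a, b, c, d):
    """Serial Thomas algorithm: forward elimination, then back substitution."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The Thomas method has a loop-carried dependency in both sweeps, whereas each reduction level of cyclic reduction depends only on the previous level; that structure is why cyclic reduction maps well to vector processors and GPUs.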

Remote Measuring System for Automobile's ECU Self Diagnostic Signal (자동차 ECU 자기진단 신호의 원격계측 시스템)

Jeong, Jin-Ho;Yun, Yeo-Heung;Lee, Young-Choon;Kwon, Tae-Kyu;Lee, Seong-Cheol
- Journal of the Korean Society for Precision Engineering, v.19 no.5, pp.159-167, 2002
In this paper, we present a new method for wirelessly monitoring the self-diagnostic signals of a vehicle's ECU. To measure these signals, an interface circuit is designed so that the ECU and the developed terminal can communicate according to the ISO and SAE communication protocol standards. An 80C196KC microprocessor handles communication of the ECU's self-diagnostic signals, and the results are sent to the wireless terminal and to a PC monitoring system. The wireless terminal is also built from an 80C196KC, an LCD, an RF module, and a keypad. Commands entered on the keypad are sent to the ECU through the RF module, and the results are shown on the graphic LCD in real time. The PC software for monitoring the ECU's self-diagnostic signals is developed with the Visual C++ compiler, with the RS-232 port programmed in half-duplex mode. The algorithms for measuring the ECU's self-diagnostic signals are verified by monitoring the states of both the ECU and the portable terminal. The developed software can also display information for fixing the vehicle's problems. The feasibility of remotely measuring ECU self-diagnostic signals is verified with the developed systems and algorithms.

Oil Cooler Design Automation on the Cooling of Machine Tool Cutting Oil (공작기계 절삭유 냉각용 오일쿨러 설계 자동화)

권혁홍
- Journal of the Korean Society of Manufacturing Technology Engineers, v.8 no.1, pp.89-99, 1999
The automatic design system for shell-and-tube oil coolers presented here is intended for use in real industrial environments. Because it is aimed at small companies, it is designed to run within a CAD package on a personal computer, and its graphical user interface is built with the DCL language. The design parameters considered for a shell-and-tube oil cooler are the oil cooler type; the tube outer diameter, thickness, and length; the tube arrangement and pitch; the flow rates; the inlet and outlet temperatures; the physical properties; the permissible pressure loss on both sides; the baffle plate type; the baffle plate cut ratio; the clearance between the baffle plate outer diameter and the shell inner diameter; and the clearance of the baffle plate holes. The resulting automatic design system was built within the CAD software environment using LISP, together with a database of design data for various kinds of shell-and-tube oil coolers. The system was assessed against existing design specifications, and its results agree well with the heat exchanger handbook and with design data from real industrial environments.
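The abstract lists flow rates and inlet and outlet temperatures among the design inputs but does not say which thermal rating method the system uses. As a hedged sketch of one common textbook sizing step for shell-and-tube exchangers, the log-mean temperature difference (LMTD) method, assuming counterflow terminal temperatures and a correction factor F supplied by the user (for multi-pass units it is normally read from handbook charts):

```python
import math


def heat_duty(mass_flow, cp, t_in, t_out):
    """Heat duty Q = m_dot * cp * |t_in - t_out| in W (kg/s, J/(kg*K), deg C)."""
    return mass_flow * cp * abs(t_in - t_out)


def lmtd_counterflow(t_hot_in, t_hot_out, t_cold_in, t_cold_out):
    """Log-mean temperature difference for counterflow terminal temperatures."""
    dt1 = t_hot_in - t_cold_out  # terminal difference at the hot-inlet end
    dt2 = t_hot_out - t_cold_in  # terminal difference at the hot-outlet end
    if dt1 <= 0 or dt2 <= 0:
        raise ValueError("temperature cross: counterflow LMTD is undefined")
    if math.isclose(dt1, dt2):
        return dt1  # limit of the formula as dt1 -> dt2
    return (dt1 - dt2) / math.log(dt1 / dt2)


def required_area(q, u, lmtd, f=1.0):
    """Heat transfer area A = Q / (U * F * LMTD); F <= 1 for multi-pass flow."""
    return q / (u * f * lmtd)


# Hypothetical operating point: oil cooled from 60 to 45 deg C by water
# heated from 30 to 38 deg C, with an assumed overall U of 300 W/(m^2*K).
q = heat_duty(0.5, 2000.0, 60.0, 45.0)
dtm = lmtd_counterflow(60.0, 45.0, 30.0, 38.0)
area = required_area(q, 300.0, dtm)
```

All numbers above are illustrative; a real design would still need the tube-side and shell-side heat transfer coefficients and the pressure-loss checks listed in the abstract.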

Design of Open Vector Graphics Accelerator for Mobile Vector Graphics (모바일 벡터 그래픽을 위한 OpenVG 가속기 설계)

Kim, Young-Ouk;Roh, Young-Sup
- Journal of Korea Multimedia Society, v.11 no.10, pp.1460-1470, 2008
As the performance of mobile systems has increased, vector graphics have been used to render various kinds of dynamic menus, mail, and two-dimensional maps. This paper proposes a hardware accelerator for Open Vector Graphics (OpenVG), which is widely used for two-dimensional vector graphics. We analyze the OpenVG specification and partition it into several functions suitable for hardware implementation. The proposed accelerator is implemented in a hardware description language (HDL) on a field-programmable gate array (FPGA) board and is about four times faster than an Alex processor.

Parallel Computation of FDTD algorithm using CUDA (CUDA를 이용한 FDTD 알고리즘의 병렬처리)

Lee, Ho-Young;Park, Jong-Hyun;Kim, Jun-Seong
- Journal of the Institute of Electronics Engineers of Korea CI, v.47 no.4, pp.82-87, 2010
Modern GPUs (Graphics Processing Units) provide higher computing capability than general-purpose CPUs (Central Processing Units). With support for a programmable graphics pipeline, GPGPU (General-Purpose computation on GPUs) has attracted much attention and its application areas have expanded. This paper compares a sequential implementation of the FDTD (Finite-Difference Time-Domain) algorithm with a massively parallel implementation using CUDA (Compute Unified Device Architecture). Experimental results show up to a 45x speedup over conventional CPU execution.
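The abstract does not describe its FDTD setup, so the following is only a minimal sequential sketch of the standard 1-D Yee leapfrog scheme in normalized units, assuming a perfect electric conductor at the left end, a zero magnetic field at the right end, and a hypothetical soft Gaussian source. It shows where the parallelism comes from: within each half-step, every cell's update reads only values from the previous half-step.

```python
import math


def fdtd_1d(n_cells=200, n_steps=120, src=100, courant=0.5):
    """1-D FDTD (Yee) leapfrog in normalized units; returns the final E field.

    ez[k] sits at integer grid points and hy[k] at k + 1/2. ez[0] and
    hy[n_cells - 1] are never updated, so both stay at zero (simple walls).
    """
    if not 0 < courant <= 1:
        raise ValueError("1-D FDTD is stable only for 0 < Courant number <= 1")
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    for t in range(n_steps):
        # H half-step: each k depends only on ez, so cells are independent
        # (a CUDA kernel would typically assign one thread per cell).
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # E half-step: each k depends only on the just-updated hy.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Hypothetical soft source: add a Gaussian pulse at one cell.
        ez[src] += math.exp(-0.5 * ((t - 40) / 12) ** 2)
    return ez


if __name__ == "__main__":
    field = fdtd_1d()
    print(max(abs(v) for v in field))
```

A CUDA version would replace each inner loop with a kernel launch over all cells and keep the two half-steps as separate launches, so the H update finishes before the E update reads it.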

Molecular Interaction Interface Computing Based on Voxel Map (복셀맵을 기반으로 한 분자 간 상호작용 인터페이스의 계산)

Choi, Jihoon;Kim, Byungjoo;Kim, Ku-jin
- Journal of the Korea Computer Graphics Society, v.18 no.3, pp.1-7, 2012
In this paper, we propose a method for computing the interface between protein molecules. When a molecule is represented as a set of spheres with van der Waals radii, the distance from a spatial point p to the molecule is the distance from p to the closest sphere. The molecular interface consists of the points equidistant from the two molecules. Our algorithm decomposes space into voxels and then constructs a voxel map by storing, for each voxel, the spheres that intersect it. Using the voxel map, we compute the distance between a point and a molecule. We also use the GPU for parallel processing to efficiently approximate the interface between a pair of molecules.
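Here is a minimal CPU-only sketch of the voxel-map idea described above. It assumes spheres are given as hypothetical (x, y, z, r) tuples, a uniform voxel size h, and a brute-force fallback when the 3x3x3 neighbourhood of the query voxel cannot certify the nearest sphere. The interface is approximated by sampling a grid and keeping points whose two distances agree within a tolerance, which is only one simple way to do it and not necessarily the authors' method.

```python
import math
from collections import defaultdict
from itertools import product


class VoxelMap:
    """Uniform voxel grid mapping each voxel to the spheres that intersect it."""

    def __init__(self, spheres, h):
        self.spheres, self.h = spheres, h
        self.cells = defaultdict(list)
        for idx, (x, y, z, r) in enumerate(spheres):
            ranges = [range(math.floor((v - r) / h), math.floor((v + r) / h) + 1)
                      for v in (x, y, z)]
            for key in product(*ranges):
                if self._hits(idx, key):
                    self.cells[key].append(idx)

    def _hits(self, idx, key):
        # Squared distance from the sphere centre to the voxel's box.
        x, y, z, r = self.spheres[idx]
        d2 = 0.0
        for v, k in zip((x, y, z), key):
            q = min(max(v, k * self.h), (k + 1) * self.h)
            d2 += (v - q) ** 2
        return d2 <= r * r

    def _dist(self, p, idx):
        # Distance from p to the sphere surface (negative inside the sphere).
        x, y, z, r = self.spheres[idx]
        return math.dist(p, (x, y, z)) - r

    def distance(self, p):
        """Distance from point p to the molecule (its nearest sphere)."""
        key = [math.floor(v / self.h) for v in p]
        cand = set()
        for off in product((-1, 0, 1), repeat=3):
            cand.update(self.cells.get(tuple(k + o for k, o in zip(key, off)), ()))
        best = min((self._dist(p, i) for i in cand), default=math.inf)
        # The 3x3x3 block contains every point within h of p, so a result
        # <= h cannot be beaten by a sphere registered only outside the block.
        if best <= self.h:
            return best
        return min(self._dist(p, i) for i in range(len(self.spheres)))


def interface_points(map_a, map_b, lo, hi, step, tol):
    """Grid points in [lo, hi]^3 that are (nearly) equidistant from both molecules."""
    n = int(round((hi - lo) / step)) + 1
    coords = [lo + i * step for i in range(n)]
    return [p for p in product(coords, repeat=3)
            if abs(map_a.distance(p) - map_b.distance(p)) <= tol]
```

Every grid point in `interface_points` is evaluated independently, which is the kind of work that can be spread across GPU threads as the abstract describes.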

Adaptive Finite Element Method by Selective p-Distribution (선택적 p-분배에 의한 적응적 유한 요소법)

조준형;우광성;박진환;안재석
- Proceedings of the Computational Structural Engineering Institute Conference, 2003.04a, pp.288-295, 2003
An adaptive finite element procedure is presented that combines p-refinement of meshes with an a posteriori error estimator based on a recovery technique. For the recovery technique, the SPR (superconvergent patch recovery) approach is modified for p-adaptive mesh refinement. We discuss a strategy for finding a nearly optimal distribution of polynomial degrees on a fixed finite element mesh, in which individual elements are refined automatically by raising their p-levels non-uniformly until an acceptable level of accuracy is reached. To verify the proposed algorithm, we propose a limit value approach that uses the exact strain energy computed from an extrapolation equation. A new pre-processor is developed for the p-version finite element program, in which a vector graphic editor automatically generates node connectivity and coordinates using a half-edge solid data structure, according to a uniform or non-uniform p-distribution. A general 2-D algorithm is also developed to generate face modes and internal modes for different mesh types. The quality of the error estimator is investigated with two numerical examples. The results show that the sequences of p-distributions obtained by the proposed error indicator closely follow the optimal trajectory.

The study on the Efficient methodology to apply the GPU for military information system improvement (국방정보시스템 성능향상을 위한 효율적인 GPU적용방안 연구)

Kauh, Janghyuk;Lee, Dongho
- Journal of Korea Society of Digital Industry and Information Management, v.11 no.1, pp.27-35, 2015
As the number of GPU (Graphics Processing Unit) cores has increased, high-performance computing platforms based on GPUs have been actively studied in recent years. This trend has led to the development of GPGPU (General-Purpose GPU) computing and the CUDA (Compute Unified Device Architecture) framework. In this paper, we explain the benefits of GPU-based systems and propose the ICIDF (Identify Compute-Intensive Data set and Function) methodology for applying GPU technology to legacy military information systems to improve their performance. To demonstrate the effectiveness of this methodology, we applied it to a CPU-based AES program obtained from an Internet web site. Simply changing the data structure improved the performance of the AES program, and with successive changes the GPU-based AES program became up to 10 times faster. Further improvement can be expected depending on the developer's skill. Heat remains a problem to be solved, but it has been greatly mitigated by advances in cooling technology.
https://doi.org/10.17662/ksdim.2015.11.1.027

Architecture Exploration of Optimal Many-Core Processors for a Vector-based Rasterization Algorithm (래스터화 알고리즘을 위한 최적의 매니코어 프로세서 구조 탐색)

Son, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
- IEMEK Journal of Embedded Systems and Applications, v.9 no.1, pp.17-24, 2014
In this paper, we implement a vector-based rasterization algorithm for 3D graphics on a SIMD (single instruction, multiple data) many-core processor architecture and evaluate its performance. We also evaluate how the data-per-processing-element (DPE) ratio, defined as the amount of data mapped directly to each processing element (PE) in the many-core processor, affects performance, energy efficiency, and area efficiency. The experiments use seven PE configurations obtained by varying the DPE ratio (equivalently, the number of PEs), all implemented in the same 130 nm CMOS technology with a 500 MHz clock. Experimental results indicate that the optimal PE configuration is obtained when the DPE ratio is in the range from 16,384 to 256 (that is, when the number of PEs is between 16 and 1,024), which meets the performance and efficiency requirements of mobile devices.
https://doi.org/10.14372/IEMEK.2014.9.1.17
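The two endpoints quoted in this abstract are consistent with a fixed total amount of data: 16 PEs × 16,384 = 1,024 PEs × 256 = 262,144 elements, for example a 512×512 pixel image. The short sketch below tabulates the DPE ratio under two assumptions not stated in the abstract: that total is fixed, and that the seven configurations are the powers of two from 16 to 1,024.

```python
# Assumption: total data inferred from the abstract's endpoints
# (16 x 16,384 = 1,024 x 256 = 262,144, e.g. a 512x512 pixel image).
TOTAL_DATA = 512 * 512

# Assumption: the seven PE configurations are the powers of two 16 ... 1,024.
PE_COUNTS = [2 ** k for k in range(4, 11)]


def dpe_ratio(num_pes, total=TOTAL_DATA):
    """Data elements mapped directly to each PE (total divided by PE count)."""
    if total % num_pes:
        raise ValueError("data does not divide evenly across the PEs")
    return total // num_pes


if __name__ == "__main__":
    for pes in PE_COUNTS:
        print(f"{pes:5d} PEs -> DPE ratio {dpe_ratio(pes):6d}")
```

Under these assumptions the DPE ratio halves each time the PE count doubles, so the seven configurations span exactly the 16,384-to-256 range reported above.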
