Search | Korea Science

Performance Improvement of Prediction-Based Parallel Gate-Level Timing Simulation Using Prediction Accuracy Enhancement Strategy (예측정확도 향상 전략을 통한 예측기반 병렬 게이트수준 타이밍 시뮬레이션의 성능 개선)

Yang, Seiyang
- KIPS Transactions on Computer and Communication Systems, v.5 no.12, pp.439-446, 2016
In this paper, an efficient prediction accuracy enhancement strategy is proposed for improving the performance of prediction-based parallel event-driven gate-level timing simulation. The proposed strategy adopts static double prediction and dynamic prediction for the input and output values of local simulations. Double prediction falls back on a second set of static prediction data once the first prediction fails, and dynamic prediction uses the ongoing simulation results accumulated during the actual parallel simulation run as prediction data. As a result, the communication and synchronization overheads, which are the main bottlenecks of parallel simulation, are reduced as far as possible. With the two proposed prediction enhancement techniques, we observed about a 5x simulation performance improvement over commercial parallel multi-core simulation for six test designs.
https://doi.org/10.3745/KTCCS.2016.5.12.439

Prediction-Based Parallel Gate-Level Timing Simulation Using Spatially Partial Simulation Strategy (공간적 부분시뮬레이션 전략이 적용된 예측기반 병렬 게이트수준 타이밍 시뮬레이션)

Han, Jaehoon;Yang, Seiyang
- KIPS Transactions on Computer and Communication Systems, v.8 no.3, pp.57-64, 2019
In this paper, an efficient prediction-based parallel simulation method using a spatially partial simulation strategy is proposed for improving both the performance of event-driven gate-level timing simulation and debugging efficiency. By applying the strategy to simulation at a higher abstraction level, the proposed method generates prediction data on the fly, quickly yet accurately, for the input and output values of the parallel event-driven local simulations. For the six designs used to evaluate the proposed strategy, our method showed, on average, about a 3.7x improvement over conventional sequential event-driven gate-level timing simulation, a 9.7x improvement over commercial multi-core parallel event-driven gate-level timing simulation, and a 2.7x improvement over the best previous prediction-based parallel simulation results.
https://doi.org/10.3745/KTCCS.2019.8.3.57

A Study On Improving the Performance of One Dimensional Systolic Array Processor for Matrix.Vector Operation using Sub-Matrix (부분행렬을 사용한 행렬.벡터 연산용 1차원 시스톨릭 어레이 프로세서 설계에 관한 연구)

Kim, Yong-Sung
- The Journal of Information Technology, v.10 no.3, pp.33-45, 2007
Systolic array processors are used to design special-purpose processors for digital signal processing, computer graphics, neural network applications, and so on, because they offer parallelism, pipelined processing, and a regular architecture. With the general design method, however, the array has an initial waiting period as long as the number of PEs minus one. In addition, if the connected system needs parallel, simultaneous outputs, performance suffers because the array produces only one output per clock in the output state. This paper therefore proposes a one-dimensional systolic array processor, designed according to the data and operation dependences using partitioned sub-matrices, to improve performance. A 1-D systolic array using four partitioned sub-matrices addresses both problems efficiently.

Design and Implementation of High-Performance Parallel Fuzzy Architecture (고성능 병렬 퍼지 아키텍처의 설계 및 구현)

Lee, Sang-Gu
- The Transactions of the Korea Information Processing Society, v.5 no.7, pp.1791-1800, 1998
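This paper proposes parallel fuzzy inference methods, suited to parallel machines, for the Mamdani and Koczy fuzzy inference algorithms, and designs an efficient parallel fuzzy architecture. The proposed architecture offers relatively high performance and is easy to scale; it consists of several FPEs (Fuzzy Processing Elements), a CP (Control Processor), memory modules, an interconnection network, and a Min circuit. In this structure, the i-th FPE handles only the i-th antecedent and the i-th consequent, so the antecedent variables are processed in parallel, and the consequents are likewise processed in parallel. Processor utilization is therefore high, and the architecture can be configured easily regardless of the number of antecedent and consequent variables or the number of fuzzy rules. It can be used in systems that require high-speed inference in real time or in large-scale expert systems with many antecedent and consequent variables, and it is particularly well suited to MIMO (multiple-input, multiple-output) systems rather than MISO (multiple-input, single-output) systems.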
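For readers unfamiliar with Mamdani inference, the following is a minimal sequential sketch of the min-max scheme, not the architecture from the paper. The helper names `tri` and `mamdani`, the triangular membership functions, and the centroid defuzzification are all assumptions chosen for illustration. Each pass of the loop over rules is independent of the others, which is the kind of work a parallel design could hand to separate processing elements.

```python
def tri(a, b, c):
    """Return a triangular membership function that peaks at b (hypothetical helper)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def mamdani(rules, inputs, universe):
    """Mamdani min-max inference with centroid defuzzification.

    rules    -- list of (antecedent membership functions, consequent membership function)
    inputs   -- one crisp input value per antecedent
    universe -- sampled output values used for aggregation and defuzzification
    """
    aggregated = [0.0] * len(universe)
    for antecedents, consequent in rules:
        # Firing strength: minimum over all antecedent memberships of this rule.
        strength = min(mu(x) for mu, x in zip(antecedents, inputs))
        # Clip the consequent at the firing strength and merge with max.
        for j, y in enumerate(universe):
            aggregated[j] = max(aggregated[j], min(strength, consequent(y)))
    total = sum(aggregated)
    if total == 0.0:
        return None  # no rule fired
    return sum(y * m for y, m in zip(universe, aggregated)) / total


# One rule with one antecedent; the input sits at the peak of the antecedent.
rules = [([tri(0, 5, 10)], tri(0, 5, 10))]
result = mamdani(rules, [5.0], list(range(11)))
print(result)
```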

The Construction of Universal Multiple Processing Unit based on De Bruijn Graph

Park, Chun-Myoung;Song, Hong-Bok
- Proceedings of the IEEK Conference, 2002.07b, pp.959-962, 2002
This paper presents a method of constructing the universal multiple processing element unit (UMPEU) based on the De Bruijn graph. The proposed method is as follows. First, we propose transformation operators for constructing the De Bruijn graph using properties of the graph. Second, we construct the transformation table of the De Bruijn graph using these transformation operators. Finally, we construct the De Bruijn graph using the transformation table. The proposed UMPEU is capable of constructing the De Bruijn graph for any prime number and integer value of finite fields. The UMPEU can also be applied to fault-tolerant computing systems, pipeline class, parallel processing networks, and switching functions and their circuits.
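As background, a De Bruijn graph over k symbols with word length n can be built with a simple shift rule. The sketch below uses the textbook definition and is not the transformation-operator method of the paper; the function name `de_bruijn_graph` and the tuple representation of nodes are assumptions made for illustration.

```python
from itertools import product


def de_bruijn_graph(k, n):
    """Return the De Bruijn graph B(k, n) as a dict mapping node -> successors.

    Nodes are all length-n words over the symbols 0..k-1. A node has an edge
    to every word obtained by dropping its first symbol and appending a new one.
    """
    symbols = range(k)
    graph = {}
    for node in product(symbols, repeat=n):
        graph[node] = [node[1:] + (s,) for s in symbols]
    return graph


graph = de_bruijn_graph(2, 3)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

The graph has k^n nodes, and every node has exactly k outgoing edges, which is the regularity that makes it attractive as an interconnection topology.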

Highly Efficient and Precise DOA Estimation Algorithm

Yang, Xiaobo
- Journal of Information Processing Systems, v.18 no.3, pp.293-301, 2022
Direction of arrival (DOA) estimation of space signals is a basic problem in array signal processing. DOA estimation based on the multiple signal classification (MUSIC) algorithm can theoretically overcome the Rayleigh limit and achieve super resolution. However, owing to its inadequate real-time performance and accuracy in practical engineering applications, its applications are limited. To address this problem, in this study, a DOA estimation algorithm with high parallelism and precision based on an analysis of the characteristics of complex matrix eigenvalue decomposition and the coordinate rotation digital computer (CORDIC) algorithm is proposed. For parallel and single precision, floating-point numbers are used to construct an orthogonal identity matrix. Thus, the efficiency and accuracy of the algorithm are guaranteed. Furthermore, the accuracy and computation of the fixed-point algorithm, double-precision floating-point algorithm, and proposed algorithm are compared. Without increasing complexity, the proposed algorithm can achieve remarkably higher accuracy and efficiency than the fixed-point algorithm and double-precision floating-point calculations, respectively.
https://doi.org/10.3745/JIPS.03.0177
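The abstract builds on CORDIC, which replaces multiplications by shift-and-add rotation steps. The sketch below shows only the basic rotation mode in floating point for computing a cosine and sine; it does not reproduce the eigenvalue decomposition or the proposed DOA algorithm, and the function name `cordic_cos_sin` and the default iteration count are assumptions.

```python
import math


def cordic_cos_sin(theta, iterations=32):
    """Approximate (cos(theta), sin(theta)) for theta in [-pi/2, pi/2].

    Uses CORDIC rotation mode: the vector is rotated through the fixed
    angles atan(2**-i), choosing each direction so the residual angle z
    is driven toward zero.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-compensate the constant gain introduced by the micro-rotations.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0
        # In fixed-point hardware the factor 2**-i is a right shift.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y


c, s = cordic_cos_sin(0.5)
print(c, s)
```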

Privacy-preserving Parallel Range Query Processing Algorithm based on Garbled Circuit in Cloud Computing (클라우드 상에서 정보 보호를 지원하는 garbled circuit 기반 병렬 영역 질의처리 알고리즘)

Kim, Hyeong-Jin;Chang, Jae-Woo
- Annual Conference of KIPS, 2021.05a, pp.159-162, 2021
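As cloud computing has advanced, interest in database outsourcing has grown. However, outsourcing a database exposes the data owner's information to internal and external attackers. This paper therefore proposes a privacy-preserving parallel range query processing algorithm. The proposed algorithm improves the efficiency of the encrypted computation protocols through garbled circuits and a thread pool, and it delivers high query processing performance by parallelizing the algorithm's processing steps. A performance evaluation shows that the proposed algorithm supports a high level of privacy protection while achieving about 20 times better query processing performance than the existing algorithm.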
https://doi.org/10.3745/PKIPS.y2021m05a.159

Join Operation of Parallel Database System with Large Main Memory (대용량 메모리를 가진 병렬 데이터베이스 시스템의 조인 연산)

Park, Young-Kyu
- Journal of the Korea Society of Computer and Information, v.12 no.3, pp.51-58, 2007
Because the shared-nothing multiprocessor architecture scales well, it has been adopted in many multiprocessor database systems. However, if the data are not uniformly distributed across the processors, the load becomes unbalanced and overall system performance deteriorates. This is the data skew problem, which commonly occurs in parallel hash joins. Balancing the load before performing the join resolves this problem efficiently and improves overall system performance. In this paper, we present an algorithm that exploits very large main memory to reduce disk access overhead during load balancing and to solve the data skew problem efficiently. We also present an analytical model of the new algorithm and the results of a performance study comparing it with other algorithms for handling data skew.
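The data skew problem described above can be made concrete with a small sketch. It only measures the imbalance that hash partitioning produces on skewed join keys and does not implement the load-balancing algorithm of the paper. The function names, the use of `key % num_procs` as a stand-in hash function, and the synthetic key distribution are all assumptions.

```python
from collections import Counter


def hash_partition_loads(keys, num_procs):
    """Count how many tuples each processor receives under hash partitioning."""
    loads = Counter()
    for key in keys:
        # Stand-in hash function: integer key modulo the processor count.
        loads[key % num_procs] += 1
    return [loads[p] for p in range(num_procs)]


def skew_ratio(loads):
    """Ratio of the heaviest load to the average load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# 300 distinct keys plus 700 tuples that all share the join key 7.
keys = list(range(300)) + [7] * 700
loads = hash_partition_loads(keys, 4)
print(loads, skew_ratio(loads))
```

Because every tuple with the same key must go to the same processor, the repeated key lands entirely on one node, and that node determines the completion time of the join.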

Improved Tracking System and Realistic Drawing for Real-Time Water-Based Sign Pen (향상된 트래킹 시스템과 실시간 수성 사인펜을 위한 사실적 드로잉)

Hur, Hyejung;Lee, Ju-Young
- Journal of the Korea Society of Computer and Information, v.19 no.2, pp.125-132, 2014
In this paper, we present a marker-less fingertip and brush tracking system that uses an inexpensive web camera. Parallel computation using CUDA is applied to the tracking system, which runs on inexpensive hardware such as a laptop or desktop and supports real-time applications. We also present a realistic water-based sign pen drawing model and its implementation. The realistic drawing application, together with our inexpensive real-time fingertip and brush tracking system, offers a glimpse of the art class of the future and could serve as a test bed for future high-technology education environments.
https://doi.org/10.9708/jksci.2014.19.2.125

Building a Dynamic Analyzer for CUDA based System.

SALAH T. ALSHAMMARI
- International Journal of Computer Science & Network Security, v.23 no.8, pp.77-84, 2023
The use of GPUs in general-purpose computing is on the rise owing to their increasing programmability and to growing performance requirements. Tools such as NVIDIA's CUDA allow programmers to code algorithms in a C-like language for execution on graphics processing units (GPUs). Unfortunately, parallel programs are prone to many performance and correctness bugs, and CUDA tool support for such programs has not yet matured. A dynamic analyzer that finds performance and correctness bugs in CUDA programs facilitates the execution of sophisticated processes, especially given modern computing requirements. Race conditions affect program correctness, and shared-memory bank conflicts affect overall performance. The technique instruments programs so that the memory locations accessed by different threads can be tracked and the code can be checked for bugs. The instrumented source code is run directly in CUDA's device emulation code and reports all errors to the user. This degree of automation helps programmers resolve subtle bugs in highly complex programs or in programs that cannot be analyzed manually.
https://doi.org/10.22937/IJCSNS.2023.23.8.9

Search Result 652, Processing Time 0.031 seconds
