• 제목/요약/키워드: parallel computer processing

검색결과 651건 처리시간 0.035초

질량스프링 시뮬레이션을 위한 병렬 구조 설계 방법 (Parallel Structure Design Method for Mass Spring Simulation)

  • 성낙준;최유주;홍민
    • 한국컴퓨터그래픽스학회논문지
    • /
    • 제25권3호
    • /
    • pp.55-63
    • /
    • 2019
  • 최근 물리 시뮬레이션 분야의 성능 개선을 위해 GPU 컴퓨팅 방식이 활용되고 있다. 특히 많은 연산의 양을 요구하는 변형물체 시뮬레이션의 경우 실시간성 보장을 위해 GPU 기반 병렬처리 알고리즘을 필요로 한다. 본 연구진은 변형물체 시뮬레이션을 구현하는 방법 중 하나인 질량스프링 시뮬레이션 기법의 성능을 향상시키기 위한 병렬 구조 설계 방법에 대한 연구를 수행하였다. 이를 위해 GPU에 직접 접근이 가능한 그래픽 라이브러리인 OpenGL의 GLSL을 사용하였으며, 독립적인 파이프라인인 컴퓨트 쉐이더를 활용해 GPGPU 환경을 구현하였다. 병렬 구조 설계 방법의 효과를 검증하기 위해 스프링 기반 질량스프링 시스템을 CPU기반과 GPU기반으로 구현하였으며, 실험의 결과 본 설계 방법을 적용하였을 때 CPU 환경에 비해 연산 속도가 약 6,000% 개선됨을 보였다. 추후 본 연구에서 제안한 설계 방법을 활용한다면 경량화 시뮬레이션 기술이 필요한 증강현실 및 가상현실 분야에 효과적으로 적용이 가능할 것으로 기대한다.

Direct Methods for Linear System on Distributed Memory Parallel Computers

  • Nishimura, S.;Shigehara, T.;Mizoguchi, H.;Mishima, T.;Kobayashi, H.
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2000년도 ITC-CSCC -1
    • /
    • pp.333-336
    • /
    • 2000
  • We discuss the direct methods (Gauss-Jordan and Gaussian eliminations) to solve linear systems on distributed memory parallel computers. It will be shown that the so-called row-cyclic storage gives rise to the best performance among the standard three (row-cyclic, column-cyclic and cyclic-cyclic) data storages. We also show that Gauss-Jordan elimination, rather than Gaussian elimination, is highly efficient for the direct solution of linear systems in parallel processing, though Gauss-Jordan elimination requires a larger number of arithmetic operations than Gaussian elimination. Numerical experiment is performed on HITACHI SR12201 with the standard libraries MPI and BLAS.

  • PDF

Parallel Multithreaded Processing for Data Set Summarization on Multicore CPUs

  • Ordonez, Carlos;Navas, Mario;Garcia-Alvarado, Carlos
    • Journal of Computing Science and Engineering
    • /
    • 제5권2호
    • /
    • pp.111-120
    • /
    • 2011
  • Data mining algorithms should exploit new hardware technologies to accelerate computations. Such goal is difficult to achieve in database management system (DBMS) due to its complex internal subsystems and because data mining numeric computations of large data sets are difficult to optimize. This paper explores taking advantage of existing multithreaded capabilities of multicore CPUs as well as caching in RAM memory to efficiently compute summaries of a large data set, a fundamental data mining problem. We introduce parallel algorithms working on multiple threads, which overcome the row aggregation processing bottleneck of accessing secondary storage, while maintaining linear time complexity with respect to data set size. Our proposal is based on a combination of table scans and parallel multithreaded processing among multiple cores in the CPU. We introduce several database-style and hardware-level optimizations: caching row blocks of the input table, managing available RAM memory, interleaving I/O and CPU processing, as well as tuning the number of working threads. We experimentally benchmark our algorithms with large data sets on a DBMS running on a computer with a multicore CPU. We show that our algorithms outperform existing DBMS mechanisms in computing aggregations of multidimensional data summaries, especially as dimensionality grows. Furthermore, we show that local memory allocation (RAM block size) does not have a significant impact when the thread management algorithm distributes the workload among a fixed number of threads. Our proposal is unique in the sense that we do not modify or require access to the DBMS source code, but instead, we extend the DBMS with analytic functionality by developing User-Defined Functions.

Parallel Fuzzy Information Processing System - KAFA : KAist Fuzzy Accelerator -

  • Kim, Young-Dal;Lee, Hyung-Kwang;Park, Kyu-Ho
    • 한국지능시스템학회:학술대회논문집
    • /
    • 한국퍼지및지능시스템학회 1993년도 Fifth International Fuzzy Systems Association World Congress 93
    • /
    • pp.981-984
    • /
    • 1993
  • During the past decade, several specific hardwares for fast fuzzy inference have been developed. Most of them are dedicated to a specific inference method and thus cannot support other inference methods. In this paper, we present a hardware architecture called KAFA(KAist Fuzzy Accelerator) which provides various fuzzy inference methods and fuzzy set operators. The architecture has SIMD structure, which consists of two parts; system control/interface unit(Main Controller) and arithmetic units(FPEs). Using the parallel processing technology, the KAFA has the high performance for fuzzy information processing. The speed of the KAFA holds promise for the development of the new fuzzy application systems.

  • PDF

Parallel Processing of the Fuzzy Fingerprint Vault based on Geometric Hashing

  • Chae, Seung-Hoon;Lim, Sung-Jin;Bae, Sang-Hyun;Chung, Yong-Wha;Pan, Sung-Bum
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제4권6호
    • /
    • pp.1294-1310
    • /
    • 2010
  • User authentication using fingerprint information provides convenience as well as strong security. However, serious problems may occur if fingerprint information stored for user authentication is used illegally by a different person since it cannot be changed freely as a password due to a limited number of fingers. Recently, research in fuzzy fingerprint vault system has been carried out actively to safely protect fingerprint information in a fingerprint authentication system. In addition, research to solve the fingerprint alignment problem by applying a geometric hashing technique has also been carried out. In this paper, we propose the hardware architecture for a geometric hashing based fuzzy fingerprint vault system that consists of the software module and hardware module. The hardware module performs the matching for the transformed minutiae in the enrollment hash table and verification hash table. On the other hand, the software module is responsible for hardware feature extraction. We also propose the hardware architecture which parallel processing technique is applied for high speed processing. Based on the experimental results, we confirmed that execution time for the proposed hardware architecture was 0.24 second when number of real minutiae was 36 and number of chaff minutiae was 200, whereas that of the software solution was 1.13 second. For the same condition, execution time of the hardware architecture which parallel processing technique was applied was 0.01 second. Note that the proposed hardware architecture can achieve a speed-up of close to 100 times compared to a software based solution.

고정 그리드를 이용한 병렬 공간 조인을 위한 비용 모델 (Cost Model for Parallel Spatial Joins using Fixed Grids)

  • 김진덕;홍봉희
    • 한국정보과학회논문지:데이타베이스
    • /
    • 제28권4호
    • /
    • pp.665-676
    • /
    • 2001
  • 공간 데이타베이스에서 가장 비용이 큰 공간 연산자는 공간 조인이다. 공간 조인은 두개의 데이타 집합으로부터 공간적인 조건을 만족하는 두 객체 쌍의 집함을 구하는 것이다. 지난 수년동안 공간 조인의 순차 수행 시간은 많이 향상되었지만, 그 웅답시간은 사용자의 요구를 만족시키지 못하고 있다. 그래서 공간조 인의 병렬 수행에 대한 연구가 자연스럽게 대두되고 있다. 공간 데이타베이스 관리 시스템에서 공간 데이타 의 관리의 용이성 및 부분 지역 검색의 효율성 등을 위해 고정 크기의 격자 구조를 갖는 고정 그리드를 이용 할 수 있다. 그러나 지금가지 고정 그리드를 이용한 공간조인의 병렬 처리에 관한 연구는 거의 없다. 이 논문에서는 고정 그리드를 이용한 병렬 공간 조인 알고리즘의 성능을 예측하는 비용 모델을 제시하 였는데, 이는 최소 경계 사각형(Minimum Bounding Rectangle : MBR)의 비교 횟수. 디스크 접근 횟수,메시지 전송 횟수 등을 근거로 하였다. 실제 데이타 및 인위 데이타 집합을 이용한 실험은 제안한 비용 모델이 정확함을 보여주었다. 이 비용 모델은 복합 공간 질의의 비용을 예측할 필요가 있는 공간 질의 최 적화를 위한 유용한 도구가 될 것으로 기대된다.

  • PDF

얼굴 추적을 위한 병렬처리 시스템의 설계 (Design of Parallel Processing System for Face Tracking)

  • 김상호;서영진;김경남;고종국
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 1998년도 가을 학술발표논문집 Vol.25 No.2 (3)
    • /
    • pp.765-767
    • /
    • 1998
  • Many application in human computer interaction(HCI) require tacking a human face and facial features. In this paper we propose efficient parallel processing system for face tracking under heterogeneous networked. To track a face in the video image we use the skin color information and connected components. In terms of parallelism we choose the master-slave model which has thread for each processes, master and slaves, The threads are responsible for real computation in each process. By placing queues between the threads we give flexibility of data flowing

  • PDF

병렬형 스트림 암호 시스템 설계에 관한 연구 (A Study on Designs for a Parallel Stream Cipher System)

  • 이훈재
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2000년도 추계학술발표논문집 (상)
    • /
    • pp.805-808
    • /
    • 2000
  • 통신망의 급격한 발전과 통신 속도의 향상에 따라 암호 알고리듬의 고속화 필요성이 절실하다. 본 논문에서는 LFSR을 고속화하기 위하여 한 클럭에 m번의 이동이 이루어지는 고속형 HS-LFSR을 제안하였고, 이를 기본으로 다수의 키 수열 발생기를 병렬 연결하여 속도를 개선시킨 병렬형 스트림암호를 제안하였다. 그리고 병렬형 스트림 암호 예로서 m-병렬 합산 수열 발생기(m-parallel SUM-BSG)를 제안하여 m = 8인 병렬 발생기를 세부 설계 예시하였으며, 제안된 발생기는 기존의 비도 수준을 유지하면서 처리 속도를 m배 높일 수 있음을 확인하였다.

  • PDF

개선된 세포 외곽선 추출 알고리즘의 병렬화 (Improved Parallelization of Cell Contour Extraction Algorithm)

  • 유숙현;조우현;권희용
    • 한국멀티미디어학회논문지
    • /
    • 제20권5호
    • /
    • pp.740-747
    • /
    • 2017
  • A fast cell contour extraction method using CUDA parallel processing technique is presented. The cell contour extraction is one of important processes to analyze cell information in pathology. However, conventional sequential contour extraction methods are slow for a huge high-resolution medical image, so they are not adequate to use in the field. We developed a parallel morphology operation algorithm to extract cell contour more quickly. The algorithm can create an inner contour and fail to extract the contour from the concave part of the cell. We solved these problems by subdividing the contour extraction process into four steps: morphology operation, labeling, positioning and contour extraction. Experimental results show that the proposed method is four times faster than the conventional one.

Development of Real time Air Quality Prediction System

  • Oh, Jai-Ho;Kim, Tae-Kook;Park, Hung-Mok;Kim, Young-Tae
    • 한국환경과학회:학술대회논문집
    • /
    • 한국환경과학회 2003년도 International Symposium on Clean Environment
    • /
    • pp.73-78
    • /
    • 2003
  • In this research, we implement Realtime Air Diffusion Prediction System which is a parallel Fortran model running on distributed-memory parallel computers. The system is designed for air diffusion simulations with four-dimensional data assimilation. For regional air quality forecasting a series of dynamic downscaling technique is adopted using the NCAR/Penn. State MM5 model which is an atmospheric model. The realtime initial data have been provided daily from the KMA (Korean Meteorological Administration) global spectral model output. It takes huge resources of computation to get 24 hour air quality forecast with this four step dynamic downscaling (27km, 9km, 3km, and lkm). Parallel implementation of the realtime system is imperative to achieve increased throughput since the realtime system have to be performed which correct timing behavior and the sequential code requires a large amount of CPU time for typical simulations. The parallel system uses MPI (Message Passing Interface), a standard library to support high-level routines for message passing. We validate the parallel model by comparing it with the sequential model. For realtime running, we implement a cluster computer which is a distributed-memory parallel computer that links high-performance PCs with high-speed interconnection networks. We use 32 2-CPU nodes and a Myrinet network for the cluster. Since cluster computers more cost effective than conventional distributed parallel computers, we can build a dedicated realtime computer. The system also includes web based Gill (Graphic User Interface) for convenient system management and performance monitoring so that end-users can restart the system easily when the system faults. Performance of the parallel model is analyzed by comparing its execution time with the sequential model, and by calculating communication overhead and load imbalance, which are common problems in parallel processing. Performance analysis is carried out on our cluster which has 32 2-CPU nodes.

  • PDF