• Title/Summary/Keyword: 병렬 연산 처리

Search Result 552, Processing Time 0.03 seconds

Hardware Implementation of Genetic Algorithm for Evolvable Hardware (진화하드웨어 구현을 위한 유전알고리즘 설계)

  • Dong, Sung-Soo;Lee, Chong-Ho
    • 전자공학회논문지 IE
    • /
    • v.45 no.4
    • /
    • pp.27-32
    • /
    • 2008
  • This paper presents the implementation of simple genetic algorithm using hardware description language for evolvable hardware embedded system. Evolvable hardware refers to hardware that can change its architecture and behavior dynamically and autonomously by interacting with its environment. So, it is especially suited to applications where no hardware specifications can be given in advance. Evolvable hardware is based on the idea of combining reconfigurable hardware device with evolutionary computation, such as genetic algorithm. Because of parallel, no function call overhead and pipelining, a hardware genetic algorithm give speedup over a software genetic algorithm. This paper suggests the hardware genetic algorithm for evolvable embedded system chip. That includes simulation results for several fitness functions.

Parallel Accessed Mirroring based on Stripping (스트라이핑 기반의 병렬 접근 미러링 기법)

  • Kang, Dong-Jae;Kim, Chang-Soo;Shin, Bum-Joo;Kim, Hag-Young
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.11a
    • /
    • pp.539-542
    • /
    • 2002
  • 멀티미디어와 인터넷의 대중화가 야기한 급격한 데이터의 증가는 테라(Tera)바이트 이상의 대용량 저장공간과 대용량 정보의 효율적인 공유를 지원하는 스토리지 시스템을 요구하고 있으며 이를 위하여 SAN 기반의 스토리지 클러스터링 시스템들이 많이 사용되고 있다. 이러한 환경에서 하드웨어 또는 소프트웨어 RAID(Redundant Array of Independent Disks)는 대용량 정보의 고성능의 입출력과 신뢰성을 위해서 필수적이 되었다. 범용적인 RAID로는 RAE-0, RAID-1, RAID-5가 주로 사용되고 있으며 각각의 레벨은 장단점을 갖는다. 본 논문에서는 RAID-0와 RAID-1이 갖는 문제점들의 보완을 위하여 변형된 RAID 레벨인 RAID-SM을 제안한다. RAID-SM은 기존의 RAID-1이 가지는 데이터의 가용성을 유지하면서 추가적인 비용 없이 RAID-0의 우수한 입출력 성능을 얻기 위한 RAID-1의 변형된 방식이다. RAID-SM의 구현을 위하여 디스크상의 데이터의 배치 및 데이터 맵핑 탕식을 정의하고 RAID-SM에서의 I/O방법을 기술한다. 제안하는 RAID-SM은 멀티미디어나 GIS 데이터와 같은 읽기 연산 집약적인 시스템을 대상으로 하는 안정적인 레이드 방식이며 RAID-SM의 장점 및 성능은 본 논문에서의 실험을 통한 결과로서 제시한다.

  • PDF

Evaluation of Alignment Methods for Genomic Analysis in HPC Environment (HPC 환경의 대용량 유전체 분석을 위한 염기서열정렬 성능평가)

  • Lim, Myungeun;Jung, Ho-Youl;Kim, Minho;Choi, Jae-Hun;Park, Soojun;Choi, Wan;Lee, Kyu-Chul
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.2
    • /
    • pp.107-112
    • /
    • 2013
  • With the progress of NGS technologies, large genome data have been exploded recently. To analyze such data effectively, the assistance of HPC technique is necessary. In this paper, we organized a genome analysis pipeline to call SNP from NGS data. To organize the pipeline efficiently under HPC environment, we analyzed the CPU utilization pattern of each pipeline steps. We found that sequence alignment is computing centric and suitable for parallelization. We also analyzed the performance of parallel open source alignment tools and found that alignment method utilizing many-core processor can improve the performance of genome analysis pipeline.

Request Two-Phase Locking Method for Series Sequence Re-adjustment of Concurrency Control in Multi-Level Secure DBMS (다단계 보안 데이터베이스 시스템에서 병행수행 제어의 직렬화 순서를 재조정하기 위한 요청 2단계 로킹기법)

  • Lee, Seungsoo;Cho, Jinsung;Jeong, Byungsoo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2004.05a
    • /
    • pp.105-108
    • /
    • 2004
  • 다단계 보안데이터베이스 시스템에서 기본적인 병행수행 제어 기법들은 비밀채널과 교착상태등과 같은 문제들이 발생하였다. 이에 직렬화 순서를 동적으로 재조정함으로서 해결하려는 방안이 있었지만, 알고리즘의 복잡성으로 인해 오버 헤드와 많은 수행시간이 필요하게 되었고, 이에 따라 많은 양의 시스템 자원과 높은 사양의 시스템을 요구하게 되었다. 또한 이러한 방법은 다중 버전을 사용함으로서 추가적인 관리비용이 높게 되었고, 각각의 트랜잭션이 지연 및 재수행이란 불필요한 과정을 반복하게 되었다. 따라서 본 논문에서는 제안한 알고리즘은 데이터베이스의 용도에 맞게 직렬화 순서를 보장하여 스케줄을 관리하는 요청 2단계 로킹기법(Request Two-phase Locking)으로서 이는 2단계 로킹기법의 기본원리에 요청로크를 사용함으로 보다 효율적으로 병행제어를 할 수 있다. 여기서 요청로크는 각각의 트랜잭션 스케줄에 로크획득 및 해제를 병행수행제어의 필요에 따라 유동적으로 할 수 있으며, 읽기로크, 쓰기로크, 요청로크라는 3가지 로킹모드를 통해 대처방안을 마련함으로서, 충돌을 방지하며, 충돌연산의 특성에 따라 직렬화 순서를 동적으로 조정함으로 블록킹을 막는 병행제어를 응용하여 병렬성을 유지한다.

  • PDF

Parallel Processing of Multiple Queries in a Declustered Spatial Database (디클러스터된 공간 데이터베이스에서 다중 질의의 병렬 처리)

  • Seo, Yeong-Deok;Park, Yeong-Min;Jeon, Bong-Gi;Hong, Bong-Hui
    • Journal of KIISE:Databases
    • /
    • v.29 no.1
    • /
    • pp.44-57
    • /
    • 2002
  • Multiple spatial queries are defined as two or more spatial range queries to be executed at the same time. The primary processing of internet-based map services is to simultaneously execute multiple spatial queries. To improve the throughput of multiple queries, the time of disk I/O in processing spatial queries significantly should be reduced. The declustering scheme of a spatial dataset of the MIMD architecture cannot decrease the disk I/O time because of random seeks for processing multiple queries. This thesis presents query scheduling strategies to ease the problem of inter-query random seeks. Query scheduling is achieved by dynamically re-ordering the priority of the queued spatial queries. The re-ordering of multiple queries is based on the inter-query spatial relationship and the latency of query processing. The performance test shows that the time of multiple query processing with query scheduling can be significantly reduced by easing inter-query random seeks as a consequence of enhanced hit ratio of disk cache.

A Hardware Implementation of the Underlying Field Arithmetic Processor based on Optimized Unit Operation Components for Elliptic Curve Cryptosystems (타원곡선을 암호시스템에 사용되는 최적단위 연산항을 기반으로 한 기저체 연산기의 하드웨어 구현)

  • Jo, Seong-Je;Kwon, Yong-Jin
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.1
    • /
    • pp.88-95
    • /
    • 2002
  • In recent years, the security of hardware and software systems is one of the most essential factor of our safe network community. As elliptic Curve Cryptosystems proposed by N. Koblitz and V. Miller independently in 1985, require fewer bits for the same security as the existing cryptosystems, for example RSA, there is a net reduction in cost size, and time. In this thesis, we propose an efficient hardware architecture of underlying field arithmetic processor for Elliptic Curve Cryptosystems, and a very useful method for implementing the architecture, especially multiplicative inverse operator over GF$GF (2^m)$ onto FPGA and futhermore VLSI, where the method is based on optimized unit operation components. We optimize the arithmetic processor for speed so that it has a resonable number of gates to implement. The proposed architecture could be applied to any finite field $F_{2m}$. According to the simulation result, though the number of gates are increased by a factor of 8.8, the multiplication speed We optimize the arithmetic processor for speed so that it has a resonable number of gates to implement. The proposed architecture could be applied to any finite field $F_{2m}$. According to the simulation result, though the number of gates are increased by a factor of 8.8, the multiplication speed and inversion speed has been improved 150 times, 480 times respectively compared with the thesis presented by Sarwono Sutikno et al. [7]. The designed underlying arithmetic processor can be also applied for implementing other crypto-processor and various finite field applications.

Design and Implementation of Accelerator Architecture for Binary Weight Network on FPGA with Limited Resources (한정된 자원을 갖는 FPGA에서의 이진가중치 신경망 가속처리 구조 설계 및 구현)

  • Kim, Jong-Hyun;Yun, SangKyun
    • Journal of IKEEE
    • /
    • v.24 no.1
    • /
    • pp.225-231
    • /
    • 2020
  • In this paper, we propose a method to accelerate BWN based on FPGA with limited resources for embedded system. Because of the limited number of logic elements available, a single computing unit capable of handling Conv-layer, FC-layer of various sizes must be designed and reused. Also, if the input feature map can not be parallel processed at one time, the output must be calculated by reading the inputs several times. Since the number of available BRAM modules is limited, the number of data bits in the BWN accelerator must be minimized. The image classification processing time of the BWN accelerator is superior when compared with a embedded CPU and is faster than a desktop PC and 50% slower than a GPU system. Since the BWN accelerator uses a slow clock of 50MHz, it can be seen that the BWN accelerator is advantageous in performance versus power.

A Design of Parallel Processing System for Management of Moving Objects (이동체 관리를 위한 다중 처리 시스템의 설계)

  • 김진덕;강구안;육정수;박연식
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2004.05b
    • /
    • pp.345-349
    • /
    • 2004
  • In order to index exactly moving objects(vehicle, mobile phone, PDA, etc.) in the mobile database, continuous updates of their locations are inevitable as well as time-consuming. The studies of pure spatial indices have focused on the efficient retrievals. However, the acquisition and management of the terminal Location of moving objects are more important than the efficiency of the query processing in the moving object databases. Therefore, it will be need to adopt parallel processing system for the moving object databases which should maintain the object's current location as precise as possible. This paper proposes a architecture of spatial indexing mobile objects using multiple processors. More precisely, we newly propose a method of splitting buckets using the properties of moving objects in order to minimize the number of database updates. We also propose a acquisition method for gathering the location information of moving objects and passing the information of the bucket extents in order to reduce the amount of passed messages between processors.

  • PDF

Optimized Hardware Design of Deblocking Filter for H.264/AVC (H.264/AVC를 위한 디블록킹 필터의 최적화된 하드웨어 설계)

  • Jung, Youn-Jin;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.1
    • /
    • pp.20-27
    • /
    • 2010
  • This paper describes a design of 5-stage pipelined de-blocking filter with power reduction scheme and proposes a efficient memory architecture and filter order for high performance H.264/AVC Decoder. Generally the de-blocking filter removes block boundary artifacts and enhances image quality. Nevertheless filter has a few disadvantage that it requires a number of memory access and iterated operations because of filter operation for 4 time to one edge. So this paper proposes a optimized filter ordering and efficient hardware architecture for the reduction of memory access and total filter cycles. In proposed filter parallel processing is available because of structured 5-stage pipeline consisted of memory read, threshold decider, pre-calculation, filter operation and write back. Also it can reduce power consumption because it uses a clock gating scheme which disable unnecessary clock switching. Besides total number of filtering cycle is decreased by new filter order. The proposed filter is designed with Verilog-HDL and functionally verified with the whole H.264/AVC decoder using the Modelsim 6.2g simulator. Input vectors are QCIF images generated by JM9.4 standard encoder software. As a result of experiment, it shows that the filter can make about 20% total filter cycles reduction and it requires small transposition buffer size.

A Study on the Underwater Channel Model based on a High-Order Finite Difference Method using GPUs (그래픽 프로세서를 이용한 고차 유한 차분식 기반 수중채널모델 연구)

  • Bae, Ho Seuk;Kim, Won-Ki;Son, Su-Uk;Ha, Wansoo
    • Journal of the Korea Society for Simulation
    • /
    • v.30 no.1
    • /
    • pp.11-20
    • /
    • 2021
  • As unmanned underwater systems have recently emerged, a high-speed underwater channel modeling technique, which is one of the most important techniques in the system, has received a lot of attention. In this paper, we proposed a high-speed sound propagation model and verified the applicability through quantitative performance analyses. We used a high-order finite difference method (FDM) for wave propagation modeling in the water, and a domain decomposition method was adopted using multiple general-purpose graphics processing units (GPUs) to increase the calculation efficiency. We compared the results of the model we proposed with the analytic solution in the half-infinite media and results of the Virtual Timeseries Experiment (VirTEX) model, which is based on the ray method. Finally, we analyzed the performance of the model quantitatively using numerical examples. Through quantitative analyses of the improvement in computational performance, we confirmed that the computational speed increases linearly as the number of GPUs increases. The computation times are increased by 2 times and 8 times, respectively, when the domain size of computation and the maximum frequency are doubled. We expect that the proposed high-speed underwater channel modeling technique is able to contribute to the enhancement of national defense as an underwater communication channel model and analysis tool to develop the underwater communication technique for the unmanned underwater system.