• Title/Summary/Keyword: 병렬 구현 (parallel implementation)


Improving the Performance of Document Similarity by using GPU Parallelism (GPU 병렬성을 이용한 문서 유사도 계산 성능 개선)

  • Park, Il-Nam;Bae, Byung-Gurl;Im, Eun-Jin;Kang, Seung-Shik
    • The KIPS Transactions:PartB / v.19B no.4 / pp.243-248 / 2012
  • In information retrieval systems such as vector-model implementations and document clustering, document similarity calculation accounts for a large part of overall system performance. In this paper, GPU parallelism is explored to speed up document similarity calculation in the CUDA framework. The proposed method computes similarities almost 15 times faster than a typical CPU-based implementation, and is 5.2 and 3.4 times faster than methods using CUBLAS and Thrust, respectively.

Unequal Dualband Wilkinson Divider Using CPW and Shunt Connected Open Stub Transmission Lines (CPW와 개방 스터브가 병렬 연결된 전송선로를 이용한 비대칭 이중대역 Wilkinson 분배기)

  • Kwon, Sang-Keun;Kim, Young;Yoon, Young-Chul
    • Journal of Advanced Navigation Technology / v.20 no.1 / pp.59-64 / 2016
  • This paper presents an unequal dualband divider with a high dividing ratio, using coplanar waveguide (CPW) and shunt-connected open-stub transmission lines. To implement the transmission lines of the dualband divider from the design equations, the low-impedance lines of the divider are realized as shunt-connected open-stub transmission lines, and the high-impedance lines are realized as CPW transmission lines. To validate the approach, a 10:1 unequal dualband divider is implemented at operating frequencies of 1 and 2 GHz. Good experimental performance is obtained at each frequency, in good agreement with the simulated results.

Parallel Computation of FDTD algorithm using CUDA (CUDA를 이용한 FDTD 알고리즘의 병렬처리)

  • Lee, Ho-Young;Park, Jong-Hyun;Kim, Jun-Seong
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.4 / pp.82-87 / 2010
  • Modern GPUs (Graphics Processing Units) provide higher computing capability than general-purpose CPUs (Central Processing Units). With support for a programmable graphics pipeline, GP-GPU (General-Purpose computation on GPU) has gained much attention and is expanding its application area. This paper compares sequential and massively parallel implementations of the FDTD (Finite-Difference Time-Domain) algorithm using CUDA (Compute Unified Device Architecture). Experimental results show up to 45X speedup over conventional CPU execution.

A Study on the Performance of Multicode CDMA Scheme for Wireless LAN Modem System (무선 LAN 모뎀시스템을 위한 다중부호 CDMA 방식의 성능에 관한 연구)

  • 김관옥;박화세
    • Journal of the Institute of Electronics Engineers of Korea TE / v.37 no.5 / pp.85-92 / 2000
  • In this paper, a multicode CDMA scheme with serial and parallel structures is modeled as the transmission scheme for a wireless LAN that can transmit high-speed data in an indoor channel environment, and optimal values of several parameters needed to implement a wireless LAN modem system are derived through computer simulation. It is verified that, for a given transmission bandwidth and maximum data rate, system performance improves when the spreading gain and the number of channels are increased or the data rate of each channel is decreased. In particular, under the same conditions the parallel structure not only improves system performance much more but also makes hardware implementation easier than the serial structure, because the effective chip rate is decreased.


A Virtual Microscope System for Educational Applications (교육 분야 응용을 위한 가상 현미경 시스템)

  • Cho, Seung-Ho;Beynon, Mike;Saltz, Joel
    • The KIPS Transactions:PartD / v.10D no.1 / pp.117-124 / 2003
  • The system implemented in this paper partitions specimen data captured by a light microscope and stores it on distributed or parallel systems. Users can observe images on computers as if using a physical microscope. Based on the client-server computing model, the system consists of a client, a coordinator, and a data manager, which communicate by exchanging messages. For retrieving images, we implemented the client program with the functions needed for educational applications, such as image marking and text annotation, and defined the communication protocol. We performed an experiment introducing tape storage, which stores a large volume of data. The experimental results showed performance improvement from the data partitioning and indexing techniques.

Implementation of Parallel Processing Interpolation Algorithm for Multicore GPU (다중코어 GPU를 위한 병렬처리 보간 알고리즘 구현)

  • Lee, Kwang-Yeob;Kim, Chi-Yong
    • Journal of IKEEE / v.16 no.4 / pp.304-309 / 2012
  • As display resolutions keep increasing, the amount of data and calculation that graphics hardware must process is also increasing. In particular, the amount of data processed by the rasterizer is growing rapidly. To make parallel processing in the rasterizer easier, this paper uses an algorithm based on barycentric (center-of-gravity) coordinates and triangle areas instead of the bilinear algorithm[1] used in conventional interpolation. The designed rasterizer was implemented in an FPGA environment and verified by comparison with a conventional rasterizer. It is shown to have approximately 50% higher performance than the conventional one.

Implementation of IQ/IDCT in H.264/AVC Decoder Using Mobile Multi-Core GPGPU (모바일 멀티 코어 GP-GPU를 이용한 H.264/AVC 디코더 구현)

  • Kim, Dong-Han;Lee, Kwang-Yeob;Jeong, Jun-Mo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.321-324 / 2010
  • There has been much research on multi-core processors, with performance enhanced through parallelization. Multi-core architectures have also emerged in the mobile environment, but a mobile CPU's performance is limited. GP-GPU (General-Purpose computing on Graphics Processing Units) can improve performance without adding other dedicated hardware. This paper presents the implementation of Inverse Quantization, Inverse DCT, and Color Space Conversion modules in an H.264/AVC decoder using a multi-core GP-GPU for mobile environments. The proposed architecture improves performance by approximately 50% when all of its features are used.


Design and Implementation of Parallel MPEG Encoder with MPI on Cluster System (클러스터환경에서 MPI를 이용한 병렬 MPEG인코더의 설계 및 구현)

  • Lee, Joa-Hyoung;Jung, In-Bum
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.10 / pp.1744-1750 / 2008
  • As computing and network technologies advance and spread widely, multimedia applications have become common while the use of text-based applications has declined. In particular, applications that handle streaming media such as video, one kind of multimedia data, account for the majority of computing usage. MPEG, one of the typical compression standards for streaming media, provides a very high compression ratio, making streaming media easily accessible to general users. However, MPEG encoding requires a lot of computing power and time. In this paper, we design and implement a parallel MPEG encoder with MPI in a cluster environment to reduce MPEG encoding time.

Parallel Distributed Implementation of GHT on Ethernet Multicluster (이더넷 다중 클러스터에서 GHT의 병렬 분산 구현)

  • Kim, Yeong-Soo;Kim, Myung-Ho;Choi, Heung-Moon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.3 / pp.96-106 / 2009
  • Extending the scale of distributed processing in a single Ethernet cluster is physically restricted by the maximum number of ports per switch. This paper presents an implementation of an MPI-based multicluster consisting of multiple Ethernet switches to extend the scale of distributed processing, and an asymptotic analysis of communication overhead through an execution-time analysis model. To determine an optimal task partitioning, we analyzed the processing time of various partitioning schemes, and the AAP (accumulator array partitioning) scheme was finally chosen to minimize the overall communication overhead. The scope of the data partitioned in AAP was modified to fit the increased number of nodes, and a suitable load-balancing algorithm was implemented. We tried to alleviate the communication overhead by exploiting pipelined broadcast and flat-tree-based result gathering, and by overlapping communication and computation time. We used linear pipelined broadcast to reduce the communication overhead between clusters, which are interconnected by a single link. Experimental results show nearly linear speedup for the proposed parallel distributed GHT implemented on an MPI-based Ethernet multicluster with four 100 Mbps Ethernet switches and up to 128 Pentium PC nodes.

The Parallel Processing of Permutation and Substitution for the High-Speed DES (DES의 고속화 실현을 위한 치환연산과 대치 연산의 병렬처리 방법)

  • 손기욱;박응기
    • Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 1997.11a / pp.214-220 / 1997
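  • The DES encryption algorithm is widely used to provide confidentiality and integrity services for information. In areas where implementing DES in hardware is difficult, it is implemented in software, but in some cases it cannot be used because of its processing speed. To address the speed problem of software implementations, this paper presents a method of processing the permutation and substitution operations of the DES algorithm in parallel, so that it can be applied in areas that need to protect information at high speed in real time.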
