• Title/Summary/Keyword: parallel processing algorithm (병렬처리 알고리즘)

Implementation of computer-generated hologram using TCP network communication (TCP 네트워크 통신을 이용한 디지털 홀로그램 생성 시스템의 구현)

  • Kim, Changseob;Song, Joongseok;Park, Jong-Il
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2015.07a / pp.444-446 / 2015
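  • Computer-generated hologram (CGH) techniques compensate for the drawbacks of conventional optical holography equipment by allowing holograms to be generated on general-purpose computers. The computational load of CGH is determined by the 3D information of the input object and the resolution of the output digital hologram. CGH generates a digital hologram through simple, repetitive mathematical calculations, and existing studies process these algorithms in parallel on a GPU (graphics processing unit). This paper improves on the GPU-based CGH used in previous work and proposes a method for developing a program that lets a commodity computer without a GPU perform CGH by using the computing resources of other GPU-equipped computers. The system consists of one server computer that does not require a GPU and multiple GPU-equipped clients. The server receives the 3D information of the object and distributes an appropriate share of the computation to each client. Each client performs its computation using a known GPU-based CGH method and sends the result back to the server. The server accumulates the received results to produce a single complete hologram of the input object.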
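
The distribute-and-accumulate scheme in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: TCP transport and GPU kernels are replaced by plain function calls, and the point-source field model, `WAVELENGTH`, `PIXEL_PITCH`, and all function names are assumptions introduced here.

```python
import cmath
import math

WAVELENGTH = 532e-9   # assumed wavelength in metres (not from the paper)
PIXEL_PITCH = 8e-6    # assumed hologram pixel pitch in metres (not from the paper)


def partial_hologram(points, width, height):
    """Client-side work: complex field contributed by a subset of object points.

    Each point is (x, y, z, amplitude). A simple point-source model is assumed.
    """
    k = 2 * math.pi / WAVELENGTH
    field = [0j] * (width * height)
    for px, py, pz, amp in points:
        for row in range(height):
            y = row * PIXEL_PITCH
            for col in range(width):
                x = col * PIXEL_PITCH
                r = math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
                field[row * width + col] += amp * cmath.exp(1j * k * r)
    return field


def split_points(points, n_clients):
    """Server-side work: deal object points out to clients round-robin."""
    return [points[i::n_clients] for i in range(n_clients)]


def accumulate(partials, size):
    """Server-side work: sum the partial holograms returned by the clients."""
    total = [0j] * size
    for part in partials:
        for i, value in enumerate(part):
            total[i] += value
    return total


def distributed_cgh(points, width, height, n_clients):
    """Stand-in for the server loop; each call below would be a network round trip."""
    chunks = split_points(points, n_clients)
    partials = [partial_hologram(chunk, width, height) for chunk in chunks if chunk]
    return accumulate(partials, width * height)
```

Because the hologram is a sum over object points, splitting the points among clients and adding the partial fields gives the same result as a single machine, up to floating-point rounding.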

Design of a Sentiment Analysis System to Prevent School Violence and Student's Suicide (학교폭력과 자살사고를 예방하기 위한 감성분석 시스템의 설계)

  • Kim, YoungTaek
    • The Journal of Korean Association of Computer Education / v.17 no.6 / pp.115-122 / 2014
  • One problem facing today's youth is the rising rate of violence and suicide in school life. This study designs a sentiment analysis system that uses big data processing to help prevent suicide. The main design goals are economical implementation and easy, fast processing for users, so the experiments use the open-source Hadoop system with the MapReduce algorithm on HDFS (Hadoop Distributed File System). The study performs sentiment analysis with a word-count method on informal data from SNS communications, counting several kinds of violent words, as a text mining approach that avoids expensive and complex statistical analysis methods.
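
The word-count approach described in this abstract follows the usual map, shuffle, reduce pattern. The sketch below imitates that pattern in plain Python rather than on Hadoop. The `VIOLENT_WORDS` lexicon and all function names are hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict

# Hypothetical lexicon; the paper's actual word list is not given in the abstract.
VIOLENT_WORDS = {"hit", "kill", "bully"}


def map_phase(message):
    """Emit (word, 1) for every lexicon word found in one message."""
    for token in re.findall(r"[a-z']+", message.lower()):
        if token in VIOLENT_WORDS:
            yield token, 1


def shuffle(pairs):
    """Group emitted values by key, as the MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one word."""
    return key, sum(values)


def count_violent_words(messages):
    """Run map, shuffle, and reduce in sequence on a list of messages."""
    pairs = [pair for message in messages for pair in map_phase(message)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

On a cluster, the map calls would run in parallel over file splits stored in HDFS and the reduce calls would run in parallel over keys. The per-message and per-word logic stays the same.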

Implementation of 2,048-bit RSA Based on RNS(Residue Number Systems) (RNS(Residue Number Systems) 기반의 2,048 비트 RSA 설계)

  • 권택원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.4 / pp.57-66 / 2004
  • This paper proposes the design of a 2,048-bit RSA based on an RNS (residue number system) Montgomery modular multiplier. Because RNS performs fast parallel modular multiplication on a large word partitioned into small words, we introduce a Montgomery reduction method (MRM) [1] based on a Wallace tree modular multiplier and 33 RNS bases of 64-bit size for RNS Montgomery modular multiplication. Also, for fast RNS modular multiplication, a modified method based on the Chinese remainder theorem (CRT) [2] is presented. We have verified the 2,048-bit RSA based on RNS using Samsung 0.35 µm technology, and the 2,048-bit RSA operation is performed in 2.54 ms at 100 MHz.
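
The property this abstract relies on is that RNS multiplication works on each small residue independently, with the CRT used to recover the large result. The sketch below shows that property only. It uses four small moduli chosen here for illustration instead of the paper's 33 bases of 64 bits, and it does not implement Montgomery reduction.

```python
from math import prod

# Illustrative pairwise-coprime moduli (an assumption; the paper uses 33 64-bit bases).
MODULI = (251, 253, 255, 256)


def to_rns(x):
    """Split a large integer into independent small residues."""
    return tuple(x % m for m in MODULI)


def rns_mul(a, b):
    """Multiply channel by channel; each channel could run in parallel hardware."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Recombine residues with the Chinese remainder theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return total % big_m
```

The result is exact as long as the true product is smaller than the product of the moduli, which is why a 2,048-bit design needs many wide bases.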

Analysis of Large-Scale Network using a new Network Tearing Method (새로운 분할법에 의한 회로망해석)

  • 김준현;송현선
    • The Journal of Korean Institute of Communications and Information Sciences / v.12 no.3 / pp.267-275 / 1987
  • This paper studies the theory of tearing, which analyzes a large-scale network by partitioning it into a number of small subnetworks by cutting through some of the existing nodes and branches. By considering the relationship between the voltages and currents at a cut node before and after cutting, the constitutive equations of the tearing method are shown to be equivalent to renumbering the nodes of the untorn network equations. The network can therefore be analyzed with the same algorithm used for the untorn network. In addition, the proposed nodal admittance matrix is put in block diagonal form, so the method permits parallel processing of the subnetworks. A 30-node network was tested, and the effectiveness of the proposed algorithm was demonstrated.

The Implementation of Fast Object Recognition Using Parallel Processing on CPU and GPU (CPU와 GPU의 병렬 처리를 이용한 고속 물체 인식 알고리즘 구현)

  • Kim, Jun-Chul;Jung, Young-Han;Park, Eun-Soo;Cui, Xue-Nan;Kim, Hak-Il;Huh, Uk-Youl
    • Journal of Institute of Control, Robotics and Systems / v.15 no.5 / pp.488-495 / 2009
  • This paper presents a fast feature extraction method for autonomous mobile robots that uses parallel processing based on OpenMP, SSE (Streaming SIMD Extensions), and CUDA programming. In the first step, for the CPU version, the algorithms and code are optimized and then parallelized. The parallel algorithms are debugged to maintain the same level of performance, and the process for extracting key points and obtaining the dominant orientation of each key point is parallelized. After extraction, a parallel descriptor is constructed using SSE instructions. A GPU version based on SIFT is also implemented with parallel processing using CUDA. The GPU-parallel descriptor achieves a speedup of up to five times over the CPU-parallel descriptor, but it shows lower performance than the CPU version. The CPU version also achieves a speedup of four and a half times over the original SIFT while maintaining robust performance.

Low-area Bit-parallel Systolic Array for Multiplication and Square over Finite Fields

  • Kim, Keewon
    • Journal of the Korea Society of Computer and Information / v.25 no.2 / pp.41-48 / 2020
  • In this paper, we derive the common computational part of an algorithm that can simultaneously perform multiplication and squaring over finite fields, and propose a low-area bit-parallel systolic array that reduces hardware through sequential processing. The proposed systolic array has lower space and area-time (AT) complexity than existing related arrays. Specifically, compared with the systolic arrays of Choi-Lee and Kim-Kim, it saves about 48% and 44% in area complexity, and about 74% and 44% in AT complexity, respectively. The proposed systolic array is therefore suitable for VLSI implementation and can serve as a basic component in hardware-constrained environments such as IoT.

A Study on High Speed Face Tracking using the GPGPU-based Depth Information (GPGPU 기반의 깊이 정보를 이용한 고속 얼굴 추적에 대한 연구)

  • Kim, Woo-Youl;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.5 / pp.1119-1128 / 2013
  • In this paper, we propose a GPU-based high-speed algorithm to detect and track human faces. The detection algorithm uses the existing AdaBoost algorithm, but the search area is dramatically reduced by detecting movement and skin-color regions. Unlike detection, the tracking algorithm uses only depth information. It uses a template matching method that searches for the block that best matches the template. To track the face quickly, the template matching is computed in parallel on a GPU. Experimental results show that the GPU implementation is up to 49 times faster than the CPU implementation.
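
The template matching step in this abstract can be illustrated with a small exhaustive block search over a depth map. The abstract does not name the matching cost, so the sum of absolute differences (SAD) used below is an assumption, as are the function names. The sketch runs sequentially on the CPU.

```python
def sad(depth, template, top, left):
    """Sum of absolute differences between the template and one depth-map block."""
    return sum(
        abs(depth[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_template(depth, template):
    """Return the (row, column) of the best-matching block.

    Every candidate position is scored independently, which is what makes
    the search easy to spread across GPU threads.
    """
    h, w = len(template), len(template[0])
    positions = [
        (top, left)
        for top in range(len(depth) - h + 1)
        for left in range(len(depth[0]) - w + 1)
    ]
    return min(positions, key=lambda pos: sad(depth, template, *pos))
```

In a GPU version, each thread would typically evaluate one candidate position and a reduction would pick the minimum.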

Multi-Core Processor for Real-Time Sound Synthesis of Gayageum (가야금의 실시간 음 합성을 위한 멀티코어 프로세서 구현)

  • Choi, Ji-Won;Cho, Sang-Jin;Kim, Cheol-Hong;Kim, Jong-Myon;Chong, Ui-Pil
    • The KIPS Transactions:PartA / v.18A no.1 / pp.1-10 / 2011
  • Physical modeling has been widely used for sound synthesis because it produces high-quality sound similar to the real sound of musical instruments. However, physical modeling requires many parameters to synthesize a large number of sounds simultaneously for an instrument, which prevents real-time processing. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) based multi-core processor that supports real-time sound synthesis of the gayageum, a representative Korean traditional musical instrument. The proposed SIMD-based multi-core processor consists of 12 processing elements (PEs) that control the 12 strings of the gayageum, with each PE modeling its corresponding string. The processor can generate the synthesized sounds of all 12 strings simultaneously after receiving the excitation signals and parameters of each string as input. Experimental results using a 44.1 kHz sampling rate and 16-bit quantization show that the sound synthesized by the proposed multi-core processor was very similar to the original sound. In addition, the proposed multi-core processor outperforms commercial processors (TI's TMS320C6416, ARM926EJ-S, ARM1020E) in terms of execution time ($5.6\sim11.4\times$ better) and energy efficiency (about $553\sim1{,}424\times$ better).

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology / v.10 no.4 / pp.81-93 / 2000
  • To improve the performance of modular exponentiation, the primary arithmetic operation in the RSA cryptographic algorithm, we present a new RSA hardware architecture based on high-radix modular multiplication and the CRT (Chinese Remainder Theorem). By implementing the modular multiplier using radix-16 arithmetic, we reduced the number of PEs (processing elements) by a quarter compared with the binary arithmetic scheme. This in turn reduces the number of clock cycles and the delay of the pipelining flip-flops by a quarter each. Because the receiver knows p and q, the factors of N, the CRT can be applied to the decryption process. To use the CRT, we made two s/2-bit multipliers operate in parallel during decryption, which achieved performance 4 times faster than without the CRT. In the encryption phase, the two s/2-bit multipliers can be connected to form an s-bit linear multiplier for s-bit arithmetic. We limited the encryption exponent to at most 17 bits to maintain high speed. We implemented a linear array modular multiplier by horizontally projecting the DG of the Montgomery algorithm. The proposed hardware performs encryption at 15 Mbps and decryption at 1.22 Mbps, as estimated with reference to the Samsung 0.5 µm CMOS standard cell library, which is the fastest among the results published to date.
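
The CRT decryption this abstract refers to replaces one full-width exponentiation with two half-width ones that can run in parallel. The sketch below shows the standard textbook form of RSA-CRT in software. The toy primes, the dictionary layout, and the function names are illustrative assumptions and say nothing about the paper's hardware design.

```python
def make_key(p, q, e):
    """Build a toy RSA key with the extra values needed for CRT decryption."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, Python 3.8+
    return {
        "n": n, "e": e, "d": d, "p": p, "q": q,
        "dp": d % (p - 1),       # exponent for the mod-p half
        "dq": d % (q - 1),       # exponent for the mod-q half
        "q_inv": pow(q, -1, p),  # q^-1 mod p, used to recombine
    }


def encrypt(m, key):
    """Full-width exponentiation with the short public exponent."""
    return pow(m, key["e"], key["n"])


def decrypt_crt(c, key):
    """Two half-width exponentiations, then recombination (Garner's formula)."""
    m1 = pow(c, key["dp"], key["p"])  # independent of m2, so the two
    m2 = pow(c, key["dq"], key["q"])  # can be computed in parallel
    h = (key["q_inv"] * (m1 - m2)) % key["p"]
    return m2 + h * key["q"]
```

With the toy key `make_key(61, 53, 17)`, `decrypt_crt(encrypt(65, key), key)` recovers 65. Real keys use primes of 1,024 bits or more.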

Prototype based Classification by Generating Multidimensional Spheres per Class Area (클래스 영역의 다차원 구 생성에 의한 프로토타입 기반 분류)

  • Shim, Seyong;Hwang, Doosung
    • Journal of the Korea Society of Computer and Information / v.20 no.2 / pp.21-28 / 2015
  • In this paper, we propose prototype-based classification learning using the nearest-neighbor rule. The nearest-neighbor rule is applied to segment the class area of all the training data into spheres, each of which contains data from a single class. The prototypes are the centers of the spheres, and each radius is computed as the midpoint of two distances: the distance to the farthest point of the same class and the distance to the nearest point of another class. We transform the prototype selection problem into a set covering problem in order to determine the smallest set of prototypes that covers all the training data. The proposed prototype selection method is based on a greedy algorithm that is applied to the training data of each class. The proposed method has low complexity and is well suited to parallel implementation. The prototype-based classifier takes the set of prototypes and predicts the class of test data by the nearest-neighbor rule. In experiments, the generalization performance of our prototype classifier is superior to that of the nearest neighbor classifier, the Bayes classifier, and another prototype classifier.
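
The sphere construction and greedy set-cover selection in this abstract can be sketched as follows. The abstract leaves some details open, so this is one reading of it, with two assumptions. First, the "farthest same-class point" is taken to be the farthest same-class point that lies closer than the nearest point of another class. Second, prediction uses the nearest prototype center. The sketch also assumes at least two classes and no identical points with different labels. All function names are hypothetical.

```python
import math


def build_spheres(X, y):
    """For each training point, build a sphere that holds only same-class points."""
    spheres = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        d_enemy = min(math.dist(xi, xj) for xj, yj in zip(X, y) if yj != yi)
        covered = {
            j for j, (xj, yj) in enumerate(zip(X, y))
            if yj == yi and math.dist(xi, xj) < d_enemy
        }
        d_friend = max(math.dist(xi, X[j]) for j in covered)
        radius = (d_friend + d_enemy) / 2  # midpoint of the two distances
        spheres.append((i, radius, covered))
    return spheres


def select_prototypes(X, y):
    """Greedy set cover, run separately for each class."""
    spheres = build_spheres(X, y)
    prototypes = []
    for label in sorted(set(y)):
        uncovered = {i for i, yi in enumerate(y) if yi == label}
        candidates = [s for s in spheres if y[s[0]] == label]
        while uncovered:
            # Pick the sphere that covers the most still-uncovered points.
            best = max(candidates, key=lambda s: len(s[2] & uncovered))
            prototypes.append((X[best[0]], best[1], label))
            uncovered -= best[2]
    return prototypes


def predict(prototypes, x):
    """Assign the label of the nearest prototype center."""
    center, radius, label = min(prototypes, key=lambda p: math.dist(p[0], x))
    return label
```

Since each class is processed independently of the others, the per-class loop is a natural unit for parallel execution, which is consistent with the abstract's remark about parallel implementation.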