• Title/Summary/Keyword: 병렬처리 알고리즘

Search Result 697, Processing Time 0.029 seconds

A Predicate-Sensitive Scheduling Algorithm in Instruction-Level Parallelism Processors (ILP 프로세서를 위한 조건실행 지원 스케쥴링 알고리즘)

  • Yoo, Byung-Kang;Lee, Sang-Jeong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.1
    • /
    • pp.202-214
    • /
    • 1998
  • Exploitation of instruction-level parallelism(ILP) is an effective mechanism for improving the performance of modern super-scalar and VLIW processors. Various software techniques can be applied to increase ILP. Among these techniques, predicated execution is the one that increases the degree of ILP by allowing instructions from different basic blocks to be converted to a single basic block by removing branch instructions. In this paper, a global predicate-sensitive scheduling algorithm is proposed to improve the performance for ILP processors that support predicated execution. In order to examine the performance of proposed algorithm, a C compiler and a simulator are developed. By simulating various benchmark programs with the compiler and the simulator, the performance results of this algorithm are measured and the effectiveness of the algorithm is verified. As a result of measure performance with I, 2, 4 issue execution, this study was confirmed average performance by 20% or more.

  • PDF

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

  • Choi, Hyun-Hwa;Lee, Kyu-Chul
    • The KIPS Transactions:PartD
    • /
    • v.19D no.2
    • /
    • pp.151-160
    • /
    • 2012
  • Most of the distributed high-dimensional indexing structures provide a reasonable search performance especially when the dataset is uniformly distributed. However, in case when the dataset is clustered or skewed, the search performances gradually degrade as compared with the uniformly distributed dataset. We propose a method of improving the k-nearest neighbor search performance for the distributed vector approximation-tree based on the strongly clustered or skewed dataset. The basic idea is to compute volumes of the leaf nodes on the top-tree of a distributed vector approximation-tree and to assign different number of bits to them in order to assure an identification performance of vector approximation. In other words, it can be done by assigning more bits to the high-density clusters. We conducted experiments to compare the search performance with the distributed hybrid spill-tree and distributed vector approximation-tree by using the synthetic and real data sets. The experimental results show that our proposed scheme provides consistent results with significant performance improvements of the distributed vector approximation-tree for strongly clustered or skewed datasets.

SHA-1 Pipeline Configuration According to the Maximum Critical Path Delay (최대 임계 지연 크기에 따른 SHA-1 파이프라인 구성)

  • Lee, Je-Hoon;Choi, Gyu-Man
    • Convergence Security Journal
    • /
    • v.16 no.7
    • /
    • pp.113-120
    • /
    • 2016
  • This paper presents a new high-speed SHA-1 pipeline architecture having a computation delay close to the maximum critical path delay of the original SHA-1. The typical SHA-1 pipelines are based on either a hash operation or unfolded hash operations. Their throughputs are greatly enhanced by the parallel processing in the pipeline, but the maximum critical path delay will be increased in comparison with the unfolding of all hash operations in each round. The pipeline stage logics in the proposed SHA-1 has the latency is similar with the result of dividing the maximum threshold delay of a round by the number of iterations. Experimental results show that the proposed SHA-1 pipeline structure is 0.99 and 1.62 at the operating speed ratio according to circuit size, which is superior to the conventional structure. The proposed pipeline architecture is expected to be applicable to various cryptographic and signal processing circuits with iterative operations.

A Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.7
    • /
    • pp.53-61
    • /
    • 2004
  • In this paper, we proposed a new architecture of MAC(Multiplier-Accumulator) to operate high-speed multiplication-accumulation. We used the MBA(Modified radix-4 Booth Algorithm) which is based on the 1's complement number system, and CSA(Carry Save Adder) for addition of the partial products. During the addition of the partial product, the signed numbers with the 1's complement type after Booth encoding are converted in the 2's complement signed number in the CSA tree. Since 2-bit CLA(Carry Look-ahead Adder) was used in adding the lower bits of the partial product, the input bit width of the final adder and whole delay of the critical path were reduced. The proposed MAC was applied into the DWT(Discrete Wavelet Transform) filtering operation for JPEG2000, and it showed the possibility for the practical application. Finally we identified the improved performance according to the comparison with the previous architecture in the aspect of hardware resource and delay.

Generating a Korean Sentiment Lexicon Through Sentiment Score Propagation (감정점수의 전파를 통한 한국어 감정사전 생성)

  • Park, Ho-Min;Kim, Chang-Hyun;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.2
    • /
    • pp.53-60
    • /
    • 2020
  • Sentiment analysis is the automated process of understanding attitudes and opinions about a given topic from written or spoken text. One of the sentiment analysis approaches is a dictionary-based approach, in which a sentiment dictionary plays an much important role. In this paper, we propose a method to automatically generate Korean sentiment lexicon from the well-known English sentiment lexicon called VADER (Valence Aware Dictionary and sEntiment Reasoner). The proposed method consists of three steps. The first step is to build a Korean-English bilingual lexicon using a Korean-English parallel corpus. The bilingual lexicon is a set of pairs between VADER sentiment words and Korean morphemes as candidates of Korean sentiment words. The second step is to construct a bilingual words graph using the bilingual lexicon. The third step is to run the label propagation algorithm throughout the bilingual graph. Finally a new Korean sentiment lexicon is generated by repeatedly applying the propagation algorithm until the values of all vertices converge. Empirically, the dictionary-based sentiment classifier using the Korean sentiment lexicon outperforms machine learning-based approaches on the KMU sentiment corpus and the Naver sentiment corpus. In the future, we will apply the proposed approach to generate multilingual sentiment lexica.

Development of a SAD Correlater for Real-time Stereo Vision (실시간 스테레오 비젼 시스템을 위한 SAD 정합연산기 설계)

  • Yi, Jong-Su;Yang, Seung-Gu;Kim, Jun-Seong
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.1
    • /
    • pp.55-61
    • /
    • 2008
  • A real-time three-dimensional vision is a passive system, which would support various applications including collision avoidance, home networks. It is a good alternative of active systems, which are subject to interference in noisy environments. In this paper, we designed a SAD correlator with respect to resource usage for a real-time three-dimensional vision system. Regular structures, linear data flow and abundant parallelism make the correlation algorithm a good candidate for a reconfigurable hardware. We implemented two versions of SAD correlator in HDL and synthesized them to determine resource requirements and performance. From the experiment we show that the SAD correlator fits into reconfigurable hardware in marginal cost and can handle about 30 frames/sec with $640{\times}480$ images.

A Study on Adaptive Parallel Computability in Many-Task Computing on Hadoop Framework (하둡 기반 대규모 작업처리 프레임워크에서의 Adaptive Parallel Computability 기술 연구)

  • Jik-Soo, Kim
    • Journal of Broadcast Engineering
    • /
    • v.24 no.6
    • /
    • pp.1122-1133
    • /
    • 2019
  • We have designed and implemented a new data processing framework called MOHA(Mtc On HAdoop) which can effectively support Many-Task Computing(MTC) applications in a YARN-based Hadoop platform. MTC applications can be composed of a very large number of computational tasks ranging from hundreds of thousands to millions of tasks, and each MTC application may have different resource usage patterns. Therefore, we have implemented MOHA-TaskExecutor(a pilot-job that executes real MTC application tasks)'s Adaptive Parallel Computability which can adaptively execute multiple tasks simultaneously, in order to improve the parallel computability of a YARN container and the overall system throughput. We have implemented multi-threaded version of TaskExecutor which can "independently and dynamically" adjust the number of concurrently running tasks, and in order to find the optimal number of concurrent tasks, we have employed Hill-Climbing algorithm.

A Development of JPEG-LS Platform for Mirco Display Environment in AR/VR Device. (AR/VR 마이크로 디스플레이 환경을 고려한 JPEG-LS 플랫폼 개발)

  • Park, Hyun-Moon;Jang, Young-Jong;Kim, Byung-Soo;Hwang, Tae-Ho
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.14 no.2
    • /
    • pp.417-424
    • /
    • 2019
  • This paper presents the design of a JPEG-LS codec for lossless image compression from AR/VR device. The proposed JPEG-LS(: LosSless) codec is mainly composed of a context modeling block, a context update block, a pixel prediction block, a prediction error coding block, a data packetizer block, and a memory block. All operations are organized in a fully pipelined architecture for real time image processing and the LOCO-I compression algorithm using improved 2D approach to compliant with the SBT coding. Compared with a similar study in JPEG-LS, the Block-RAM size of proposed STB-FLC architecture is reduced to 1/3 compact and the parallel design of the predication block could improved the processing speed.

Implementation of RSA modular exponentiator using Division Chain (나눗셈 체인을 이용한 RSA 모듈로 멱승기의 구현)

  • 김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.21-34
    • /
    • 2002
  • In this paper we propos a new hardware architecture of modular exponentiation using a division chain method which has been proposed in (2). Modular exponentiation using the division chain is performed by receding an exponent E as a mixed form of multiplication and addition with divisors d=2 or $d=2^I +1$ and respective remainders r. This calculates the modular exponentiation in about $1.4log_2$E multiplications on average which is much less iterations than $2log_2$E of conventional Binary Method. We designed a linear systolic array multiplier with pipelining and used a horizontal projection on its data dependence graph. So, for k-bit key, two k-bit data frames can be inputted simultaneously and two modular multipliers, each consisting of k/2+3 PE(Processing Element)s, can operate in parallel to accomplish 100% throughput. We propose a new encoding scheme to represent divisors and remainders of the division chain to keep regularity of the data path. When it is synthesized to ASIC using Samsung 0.5 um CMOS standard cell library, the critical path delay is 4.24ns, and resulting performance is estimated to be abort 140 Kbps for a 1024-bit data frame at 200Mhz clock In decryption process, the speed can be enhanced to 560kbps by using CRT(Chinese Remainder Theorem). Futhermore, to satisfy real time requirements we can choose small public exponent E, such as 3,17 or $2^{16} +1$, in encryption and verification process. in which case the performance can reach 7.3Mbps.

A Dynamical Load Balancing Method for Data Streaming and User Request in WebRTC Environment (WebRTC 환경에 데이터 스트리밍 및 사용자 요청에 따른 동적로드 밸런싱 방법)

  • Ma, Linh Van;Park, Sanghyun;Jang, Jong-hyun;Park, Jaehyung;Kim, Jinsul
    • Journal of Digital Contents Society
    • /
    • v.17 no.6
    • /
    • pp.581-592
    • /
    • 2016
  • WebRTC has quickly grown to be the world's advanced real-time communication in several platforms such as web and mobile. In spite of the advantage, the current technology in WebRTC does not handle a big-streaming efficiently between peers and a large amount request of users on the Signaling server. Therefore, in this paper, we put our work to handle the problem by delivering the flow of data with dynamical load balancing algorithms. We analyze the request source users and direct those streaming requests to a load balancing component. More specifically, the component determines an amount of the requested resource and available resource on the response server, then it delivers streaming data to the requesting user parallel or alternately. To show how the method works, we firstly demonstrate the load-balancing algorithm by using a network simulation tool OPNET, then, we seek to implement the method into an Ubuntu server. In addition, we compare the result of our work and the original implementation of WebRTC, it shows that the method performs efficiently and dynamically than the origin.