• Title/Summary/Keyword: and Parallel Processing

Search Result 2,013, Processing Time 0.035 seconds

Parallelizing Feature Point Extraction in the Multi-Core Environment for Reducing Panorama Image Generation Time (파노라마 이미지 생성시간을 단축하기 위한 멀티코어 환경에서 특징점 추출 병렬화)

  • Kim, Geon-Ho;Choi, Tai-Ho;Chung, Hee-Jin;Kwon, Bom-Jun
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.3
    • /
    • pp.331-335
    • /
    • 2008
  • In this paper, we parallelized a feature point extraction algorithm to reduce panorama image generation time in multi-core environment. While we compose a panorama image with several images, the step to extract feature points of each picture is needed to find overlapped region of pictures. To perform rapidly feature extraction stage which requires much calculation, we developed a parallel algorithm to extract feature points and examined the performance using CBE(Cell Broadband Engine) which is asymmetric multi-core architecture. As a result of the exam, the algorithm we proposed has a property of linear scalability-the performance is increased in proportion the number of processors utilized. In this paper, we will suggest how Image processing operation can make high performance result in multi-core environment.

Speech Recognition Performance Improvement using a convergence of GMM Phoneme Unit Parameter and Vocabulary Clustering (GMM 음소 단위 파라미터와 어휘 클러스터링을 융합한 음성 인식 성능 향상)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology
    • /
    • v.10 no.8
    • /
    • pp.35-39
    • /
    • 2020
  • DNN error is small compared to the conventional speech recognition system, DNN is difficult to parallel training, often the amount of calculations, and requires a large amount of data obtained. In this paper, we generate a phoneme unit to estimate the GMM parameters with each phoneme model parameters from the GMM to solve the problem efficiently. And it suggests ways to improve performance through clustering for a specific vocabulary to effectively apply them. To this end, using three types of word speech database was to have a DB build vocabulary model, the noise processing to extract feature with Warner filters were used in the speech recognition experiments. Results using the proposed method showed a 97.9% recognition rate in speech recognition. In this paper, additional studies are needed to improve the problems of improved over fitting.

A High-speed Pattern Matching Acceleration System for Network Intrusion Prevention Systems (네트워크 침입방지 시스템을 위한 고속 패턴 매칭 가속 시스템)

  • Kim Sunil
    • The KIPS Transactions:PartA
    • /
    • v.12A no.2 s.92
    • /
    • pp.87-94
    • /
    • 2005
  • Pattern matching is one of critical parts of Network Intrusion Prevention Systems (NIPS) and computationally intensive. To handle a large number of attack signature fattens increasing everyday, a network intrusion prevention system requires a multi pattern matching method that can meet the line speed of packet transfer. In this paper, we analyze Snort, a widely used open source network intrusion prevention/detection system, and its pattern matching characteristics. A multi pattern matching method for NIPS should efficiently handle a large number of patterns with a wide range of pattern lengths and case insensitive patterns matches. It should also be able to process multiple input characters in parallel. We propose a multi pattern matching hardware accelerator based on Shift-OR pattern matching algorithm. We evaluate the performance of the pattern matching accelerator under various assumptions. The performance evaluation shows that the pattern matching accelerator can be more than 80 times faster than the fastest software multi-pattern matching method used in Snort.

Design and Implementation of a Large-Scale Spatial Reasoner Using MapReduce Framework (맵리듀스 프레임워크를 이용한 대용량 공간 추론기의 설계 및 구현)

  • Nam, Sang Ha;Kim, In Cheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.10
    • /
    • pp.397-406
    • /
    • 2014
  • In order to answer the questions successfully on behalf of the human in DeepQA environments such as Jeopardy! of the American quiz show, the computer is required to have the capability of fast temporal and spatial reasoning on a large-scale commonsense knowledge base. In this paper, we present a scalable spatial reasoning algorithm for deriving efficiently new directional and topological relations using the MapReduce framework, one of well-known parallel distributed computing environments. The proposed reasoning algorithm assumes as input a large-scale spatial knowledge base including CSD-9 directional relations and RCC-8 topological relations. To infer new directional and topological relations from the given spatial knowledge base, it performs the cross-consistency checks as well as the path-consistency checks on the knowledge base. To maximize the parallelism of reasoning computations according to the principle of the MapReduce framework, we design the algorithm to partition effectively the large knowledge base into smaller ones and distribute them over multiple computing nodes at the map phase. And then, at the reduce phase, the algorithm infers the new knowledge from distributed spatial knowledge bases. Through experiments performed on the sample knowledge base with the MapReduce-based implementation of our algorithm, we proved the high performance of our large-scale spatial reasoner.

Implementation of RSA modular exponentiator using Division Chain (나눗셈 체인을 이용한 RSA 모듈로 멱승기의 구현)

  • 김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.21-34
    • /
    • 2002
  • In this paper we propos a new hardware architecture of modular exponentiation using a division chain method which has been proposed in (2). Modular exponentiation using the division chain is performed by receding an exponent E as a mixed form of multiplication and addition with divisors d=2 or $d=2^I +1$ and respective remainders r. This calculates the modular exponentiation in about $1.4log_2$E multiplications on average which is much less iterations than $2log_2$E of conventional Binary Method. We designed a linear systolic array multiplier with pipelining and used a horizontal projection on its data dependence graph. So, for k-bit key, two k-bit data frames can be inputted simultaneously and two modular multipliers, each consisting of k/2+3 PE(Processing Element)s, can operate in parallel to accomplish 100% throughput. We propose a new encoding scheme to represent divisors and remainders of the division chain to keep regularity of the data path. When it is synthesized to ASIC using Samsung 0.5 um CMOS standard cell library, the critical path delay is 4.24ns, and resulting performance is estimated to be abort 140 Kbps for a 1024-bit data frame at 200Mhz clock In decryption process, the speed can be enhanced to 560kbps by using CRT(Chinese Remainder Theorem). Futhermore, to satisfy real time requirements we can choose small public exponent E, such as 3,17 or $2^{16} +1$, in encryption and verification process. in which case the performance can reach 7.3Mbps.

A Scalable Montgomery Modular Multiplier (확장 가능형 몽고메리 모듈러 곱셈기)

  • Choi, Jun-Baek;Shin, Kyung-Wook
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.625-633
    • /
    • 2021
  • This paper describes a scalable architecture for flexible hardware implementation of Montgomery modular multiplication. Our scalable modular multiplier architecture, which is based on a one-dimensional array of processing elements (PEs), performs word parallel operation and allows us to adjust computational performance and hardware complexity depending on the number of PEs used, NPE. Based on the proposed architecture, we designed a scalable Montgomery modular multiplier (sMM) core supporting eight field sizes defined in SEC2. Synthesized with 180-nm CMOS cell library, our sMM core was implemented with 38,317 gate equivalents (GEs) and 139,390 GEs for NPE=1 and NPE=8, respectively. When operating with a 100 MHz clock, it was evaluated that 256-bit modular multiplications of 0.57 million times/sec for NPE=1 and 3.5 million times/sec for NPE=8 can be computed. Our sMM core has the advantage of enabling an optimized implementation by determining the number of PEs to be used in consideration of computational performance and hardware resources required in application fields, and it can be used as an IP (intellectual property) in scalable hardware design of elliptic curve cryptography (ECC).

Design of a MapReduce-Based Mobility Pattern Mining System for Next Place Prediction (다음 장소 예측을 위한 맵리듀스 기반의 이동 패턴 마이닝 시스템 설계)

  • Kim, Jongwhan;Lee, Seokjun;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.8
    • /
    • pp.321-328
    • /
    • 2014
  • In this paper, we present a MapReduce-based mobility pattern mining system which can predict efficiently the next place of mobile users. It learns the mobility pattern model of each user, represented by Hidden Markov Models(HMM), from a large-scale trajectory dataset, and then predicts the next place for the user to visit by applying the learned models to the current trajectory. Our system consists of two parts: the back-end part, in which the mobility pattern models are learned for individual users, and the front-end part, where the next place for a certain user to visit is predicted based on the mobility pattern models. While the back-end part comprises of three distinct MapReduce modules for POI extraction, trajectory transformation, and mobility pattern model learning, the front-end part has two different modules for candidate route generation and next place prediction. Map and reduce functions of each module in our system were designed to utilize the underlying Hadoop infrastructure enough to maximize the parallel processing. We performed experiments to evaluate the performance of the proposed system by using a large-scale open benchmark dataset, GeoLife, and then could make sure of high performance of our system as results of the experiments.

Efficient polynomial exponentiation in $GF(2^m)$with a trinomial using weakly dual basis ($GF(2^m)$에서 삼항 기약 다항식을 이용한 약한 쌍대 기저 기반의 효율적인 지수승기)

  • Kim, Hee-Seok;Chang, Nam-Su;Lim, Jong-In;Kim, Chang-Han
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.8
    • /
    • pp.30-37
    • /
    • 2007
  • An exponentiation in $GF(2^m)$ is a basic operation for several algorithms used in cryptography, digital signal processing, error-correction code and so on. Existing hardware implementations for the exponentiation operation organize by Right-to-Left method since a merit of parallel circuit. Our paper proposes a polynomial exponentiation structure with a trinomial that is organized by Left-to-Right method and that utilizes a weakly dual basis. The basic idea of our method is to decrease time delay using precomputation tables because one of two inputs in the Left-to-Right method is fixed. Since $T_{sqr}$ (squarer time delay) + $T_{mul}$(multiplier time delay) of ow method is smaller than $T_{mul}$ of existing methods, our method reduces time delays of existing Left-to-Right and Right-to-Left methods by each 17%, 10% for $x^m+x+1$ (irreducible polynomial), by each 21%, 9% $x^m+x^k+1(1, by each 15%, 1% for $x^m+x^{m/2}+1$.

Hardware Design of High Performance HEVC Deblocking Filter for UHD Videos (UHD 영상을 위한 고성능 HEVC 디블록킹 필터 설계)

  • Park, Jaeha;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.1
    • /
    • pp.178-184
    • /
    • 2015
  • This paper proposes a hardware architecture for high performance Deblocking filter(DBF) in High Efficiency Video Coding for UHD(Ultra High Definition) videos. This proposed hardware architecture which has less processing time has a 4-stage pipelined architecture with two filters and parallel boundary strength module. Also, the proposed filter can be used in low-voltage design by using clock gating architecture in 4-stage pipeline. The segmented memory architecture solves the hazard issue that arises when single port SRAM is accessed. The proposed order of filtering shortens the delay time that arises when storing data into the single port SRAM at the pre-processing stage. The DBF hardware proposed in this paper was designed with Verilog HDL, and was implemented with 22k logic gates as a result of synthesis using TSMC 0.18um CMOS standard cell library. Furthermore, the dynamic frequency can process UHD 8k($7680{\times}4320$) samples@60fps using a frequency of 150MHz with an 8K resolution and maximum dynamic frequency is 285MHz. Result from analysis shows that the proposed DBF hardware architecture operation cycle for one process coding unit has improved by 32% over the previous one.

Efficient Algorithms for Motion Parameter Estimation in Object-Oriented Analysis-Synthesis Coding (객체지향 분석-함성 부호화를 위한 효율적 움직임 파라미터 추정 알고리듬)

  • Lee Chang Bum;Park Rae-Hong
    • The KIPS Transactions:PartB
    • /
    • v.11B no.6
    • /
    • pp.653-660
    • /
    • 2004
  • Object-oriented analysis-synthesis coding (OOASC) subdivides each image of a sequence into a number of moving objects and estimates and compensates the motion of each object. It employs a motion parameter technique for estimating motion information of each object. The motion parameter technique employing gradient operators requires a high computational load. The main objective of this paper is to present efficient motion parameter estimation techniques using the hierarchical structure in object-oriented analysis-synthesis coding. In order to achieve this goal, this paper proposes two algorithms : hybrid motion parameter estimation method (HMPEM) and adaptive motion parameter estimation method (AMPEM) using the hierarchical structure. HMPEM uses the proposed hierarchical structure, in which six or eight motion parameters are estimated by a parameter verification process in a low-resolution image, whose size is equal to one fourth of that of an original image. AMPEM uses the same hierarchical structure with the motion detection criterion that measures the amount of motion based on the temporal co-occurrence matrices for adaptive estimation of the motion parameters. This method is fast and easily implemented using parallel processing techniques. Theoretical analysis and computer simulation show that the peak signal to noise ratio (PSNR) of the image reconstructed by the proposed method lies between those of images reconstructed by the conventional 6- and 8-parameter estimation methods with a greatly reduced computational load by a factor of about four.