• Title/Summary/Keyword: parallel computer processing

Search Result 652, Processing Time 0.024 seconds

Design of an Efficient VLSI Architecture and Verification using FPGA-implementation for HMM(Hidden Markov Model)-based Robust and Real-time Lip Reading (HMM(Hidden Markov Model) 기반의 견고한 실시간 립리딩을 위한 효율적인 VLSI 구조 설계 및 FPGA 구현을 이용한 검증)

  • Lee Chi-Geun;Kim Myung-Hun;Lee Sang-Seol;Jung Sung-Tae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.11 no.2 s.40
    • /
    • pp.159-167
    • /
    • 2006
  • Lipreading has been suggested as one of the methods to improve the performance of speech recognition in noisy environment. However, existing methods are developed and implemented only in software. This paper suggests a hardware design for real-time lipreading. For real-time processing and feasible implementation, we decompose the lipreading system into three parts; image acquisition module, feature vector extraction module, and recognition module. Image acquisition module capture input image by using CMOS image sensor. The feature vector extraction module extracts feature vector from the input image by using parallel block matching algorithm. The parallel block matching algorithm is coded and simulated for FPGA circuit. Recognition module uses HMM based recognition algorithm. The recognition algorithm is coded and simulated by using DSP chip. The simulation results show that a real-time lipreading system can be implemented in hardware.

  • PDF

Inference of Context-Free Grammars using Binary Third-order Recurrent Neural Networks with Genetic Algorithm (이진 삼차 재귀 신경망과 유전자 알고리즘을 이용한 문맥-자유 문법의 추론)

  • Jung, Soon-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.3
    • /
    • pp.11-25
    • /
    • 2012
  • We present the method to infer Context-Free Grammars by applying genetic algorithm to the Binary Third-order Recurrent Neural Networks(BTRNN). BTRNN is a multiple-layered architecture of recurrent neural networks, each of which is corresponding to an input symbol, and is combined with external stack. All parameters of BTRNN are represented as binary numbers and each state transition is performed with any stack operation simultaneously. We apply Genetic Algorithm to BTRNN chromosomes and obtain the optimal BTRNN inferring context-free grammar of positive and negative input patterns. This proposed method infers BTRNN, which includes the number of its states equal to or less than those of existing methods of Discrete Recurrent Neural Networks, with less examples and less learning trials. Also BTRNN is superior to the recent method of chromosomes representing grammars at recognition time complexity because of performing deterministic state transitions and stack operations at parsing process. If the number of non-terminals is p, the number of terminals q, the length of an input string k, and the max number of BTRNN states m, the parallel processing time is O(k) and the sequential processing time is O(km).

Disk Load Balancing Scheme for High Speed Playback of Continuous Media in VOD Server (VOD서버에서 연속 매체의 고속 재생을 위한 디스크 부하 균형 정책)

  • Lee, Seung-Yong;Lee, Ho-Seok;Hong, Seong-Su
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.5
    • /
    • pp.1172-1181
    • /
    • 1997
  • A militimedia data is a data mixed of formatted data like an audio and video. Multimedia data has characteristics that it need large amount of storage,wide network bandwith andreal time responsibolity. Because of these characteristocs, the VOD server and continous media storage server have a disk stripe structure or disk stripe sructure or disk array structure(RAID).In the parallel disk access system,high-speed play-back of continous media using segment interleavung may not ensure Qos pf other cioents because of the concentrated load within some disks. The load concentration of disks is related to both the number of disks in the system and playback rate of contimous media.In this paper. we describe that high-speed playback scheme,which is independent of the number of disk and plyback rate can be achieved by technique of changing the in-teval of access to segnent location.We show the experimental result of this technique in this pater.

  • PDF

Optimized implementation of block cipher PIPO in parallel-way on 64-bit ARM Processors (64-bit ARM 프로세서 상에서의 블록암호 PIPO 병렬 최적 구현)

  • Eum, Si-Woo;Kwon, Hyeok-Dong;Kim, Hyun-Jun;Jang, Kyung-Bae;Kim, Hyun-Ji;Park, Jae-Hoon;Sim, Min-Joo;Song, Gyeong-Ju;Seo, Hwa-Jeong
    • Annual Conference of KIPS
    • /
    • 2021.05a
    • /
    • pp.163-166
    • /
    • 2021
  • ICISC'20에서 발표된 경량 블록암호 PIPO는 비트 슬라이스 기법 적용으로 효율적인 구현이 되었으며, 부채널 내성을 지니기에 안전하지 않은 환경에서도 안정적으로 사용 가능한 경량 블록암호이다. 본 논문에서는 ARM 프로세서를 대상으로 PIPO의 병렬 최적 구현을 제안한다. 제안하는 구현물은 8평문, 16평문의 병렬 암호화가 가능하다. 구현에는 최적의 명령어 활용, 레지스터 내부 정렬, 로테이션 연산 최적화 기법을 사용하였다. 구현은 A10x fusion 프로세서를 대상으로 한다. 대상 프로세서상에서, 기존 레퍼런스 PIPO 코드는 64/128, 64/256 규격에서 각각 34.6 cpb, 44.7 cpb의 성능을 가지나, 제안하는 기법은 8평문 64/128, 64/256 규격에서 각각 12.0 cpb, 15.6 cpb, 16평문 64/128, 64/256 규격에서 각각 6.3 cpb, 8.1 cpb의 성능을 보여준다. 이는 기존 대비 각 규격별로 8평문 병렬 구현물은 약 65.3%, 66.4%, 16평문 병렬 구현물은 약 81.8%, 82.1% 더 좋은 성능을 보인다.

Overlapping Effects of Circular Shift Communication and Computation (원형 쉬프트 통신의 중첩 효과 분석)

  • Kim, Jung-Hwan;Rho, Jung-Kyu;Song, Ha-Yoon
    • The KIPS Transactions:PartA
    • /
    • v.9A no.2
    • /
    • pp.197-206
    • /
    • 2002
  • Many researchers have been interested in the optimization of parallel programs through the latency hiding by overlapping the communication with the computation. We ana1yzed overlapping effects in the circular shift communication which is one of the collective communications being frequently used In many data parallel programs. We measured the time which can be possibly overlapped and the time which cannot be overlapped in over all circular shift communication period on an Ethernet switch-based clustered system. The result from each platform nay be used for the input of optimizing compilers. The previous performance models usually have two kinds of drawbacks one is only based on point-to-point communication, so it is not appropriate for analyzing the overall effects of collective communications. The other provides the performance of collective communication, but no overlapping effect. In this paper we extended the previous models and analyzed the experimental results of the extended model.

A Study on the Parallel Escape Maze through Cooperative Activities of Humanoid Robots (인간형 로봇들의 협력 작업을 통한 미로 동시 탈출에 관한 연구)

  • Jun, Bong-Gi
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.6
    • /
    • pp.1441-1446
    • /
    • 2014
  • For the escape from a maze, the cooperative method by robot swarm was proposed in this paper. The robots can freely move by collecting essential data and making a decision in the use of sensors; however, a central control system is required to organize all robots for the escape from the maze. The robots explore new mazes and then send the information to the system for analyzing and mapping the escaping route. Three issues were considered as follows for the effective escape by multiple robots from the mazes in this paper. In the first, the mazes began to divide and secondly, dead-ends should be blocked. Finally, after the first arrivals at the destination, a shortcut should be provided for rapid escaping from the maze. The parallel-escape algorithms were applied to the different size of mazes, so that robot swarm can effectively get away the mazes.

Adaptive Pipeline Architecture for an Asynchronous Embedded Processor (비동기식 임베디드 프로세서를 위한 적응형 파이프라인 구조)

  • Lee, Seung-Sook;Lee, Je-Hoon;Lim, Young-Il;Cho, Kyoung-Rok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.1
    • /
    • pp.51-58
    • /
    • 2007
  • This paper presented an adaptive pipeline architecture for a high-performance and low-power asynchronous processor. The proposed pipeline architecture employed a stage-skipping and a stage-combining scheme. The stage-skipping scheme can skip the operation of a bubble stage that is not used pipeline stage in an instruction execution. In the stage-combining scheme, two consecutive stages can be joined to form one stage if the latter stage is empty. The proposed pipeline architecture could reduce the processing time and power consumption. The proposed architecture supports multi-processing in the EX stage that executes parallel 4 instructions. We designed an asynchronous microprocessor to estimate the efficiency of the proposed pipeline architecture that was synthesized to a gate level design using a $0.35-{\mu}m$ CMOS standard cell library. We evaluated the performance of the target processor using SPEC2000 benchmark programs. The proposed architecture showed about 2.3 times higher speed than the asynchronous counterpart, AMULET3i. As a result, the proposed pipeline schemes and architecture can be used for asynchronous high-speed processor design

Improving Haskell GC-Tuning Time Using Divide-and-Conquer (분할 정복법을 이용한 Haskell GC 조정 시간 개선)

  • An, Hyungjun;Kim, Hwamok;Liu, Xiao;Kim, Yeoneo;Byun, Sugwoo;Woo, Gyun
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.9
    • /
    • pp.377-384
    • /
    • 2017
  • The performance improvement of a single core processor has reached its limit since the circuit density cannot be increased any longer due to overheating. Therefore, the multicore and manycore architectures have emerged as viable approaches and parallel programming becomes more important. Haskell, a purely functional language, is getting popular in this situation since it naturally supports parallel programming owing to its beneficial features including the implicit parallelism in evaluating expressions and the monadic tools supporting parallel constructs. However, the performance of Haskell parallel programs is strongly influenced by the performance of the run-time system including the garbage collector. Though a memory profiling tool namely GC-tune has been suggested, we need a more systematic way to use this tool. Since GC-tune finds the optimal memory size by executing the target program with all the different possible GC options, the GC-tuning time takes too long. This paper suggests a basic divide-and-conquer method to reduce the number of GC-tune executions by reducing the search area by one-quarter for every searching step. Applying this method to two parallel programs, a maximally independent set and a K-means programs, the memory tuning time is reduced by 7.78 times with accuracy 98% on average.

GPGPU Task Management Technique to Mitigate Performance Degradation of Virtual Machines due to GPU Operation in Cloud Environments (클라우드 환경에서 GPU 연산으로 인한 가상머신의 성능 저하를 완화하는 GPGPU 작업 관리 기법)

  • Kang, Jihun;Gil, Joon-Min
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.9
    • /
    • pp.189-196
    • /
    • 2020
  • Recently, GPU cloud computing technology applying GPU(Graphics Processing Unit) devices to virtual machines is widely used in the cloud environment. In a cloud environment, GPU devices assigned to virtual machines can perform operations faster than CPUs through massively parallel processing, which can provide many benefits when operating high-performance computing services in a variety of fields in a cloud environment. In a cloud environment, a GPU device can help improve the performance of a virtual machine, but the virtual machine scheduler, which is based on the CPU usage time of a virtual machine, does not take into account GPU device usage time, affecting the performance of other virtual machines. In this paper, we test and analyze the performance degradation of other virtual machines due to the virtual machine that performs GPGPU(General-Purpose computing on Graphics Processing Units) task in the direct path based GPU virtualization environment, which is often used when assigning GPUs to virtual machines in cloud environments. Then to solve this problem, we propose a GPGPU task management method for a virtual machine.

A Computer Architecture Education Framework in IT Convergence Services Era (IT융합 서비스 환경을 위한 컴퓨터 아키텍쳐 교육 프레임워크)

  • Choi, Chang Yeol;Choi, Hwang Kyu
    • Journal of Information Technology and Architecture
    • /
    • v.10 no.1
    • /
    • pp.23-31
    • /
    • 2013
  • A rapid growth of IT convergence into different application areas draws a lot of interest in high performance platform and embedded system. Industry needs well educated computer professionals with the practical understanding on the emerging technologies and core issues of contemporary popular services. In this paper, we present an education framework for computer system architecture based on rigorous analyses of the characteristics of IT convergence services and information technology trends. The proposed framework puts emphasis on real-world and hands-on subjects related to multicore architecture, embedded system and parallel processing. We believe effective use in the development and management of computer system architecture courses encouraging both industries and students.