• 제목/요약/키워드: Parallel Processing Architecture

검색결과 394건 처리시간 0.026초

아날로그 PRML 디코더를 위한 아날로그 병렬처리 회로의 전향 차동 구조 (Feed forward Differential Architecture of Analog Parallel Processing Circuits for Analog PRML Decoder)

  • 마헤스워 샤퍄라;양창주;김형석
    • 전기학회논문지
    • /
    • 제59권8호
    • /
    • pp.1489-1496
    • /
    • 2010
  • A feed forward differential architecture of analog PRML decoder is investigated to implement on analog parallel processing circuits. The conventional PRML decoder performs the trellis processing with the implementation of single stage in digital and its repeated use. The analog parallel processing-based PRML comes from the idea that the decoding of PRML is done mainly with the information of the first several number of stages. Shortening the trellis processing stages but implementing it with analog parallel circuits, several benefits including higher speed, no memory requirement and no A/D converter requirement are obtained. Most of the conventional analog parallel processing-based PRML decoders are differential architecture with the feedback of the previous decoded data. The architecture used in this paper is without feedback, where error metric accumulation is allowed to start from all the states of the decoding stage, which enables to be decoded without feedback. The circuit of the proposed architecture is simpler than that of the conventional analog parallel processing structure with the similar decoding performance. Characteristics of the feed forward differential architecture are investigated through various simulation studies.

상용 응용을 위한 병렬처리 구조 설계 (Design of the new parallel processing architecture for commercial applications)

  • 한우종;윤석한;임기욱
    • 전자공학회논문지B
    • /
    • 제33B권5호
    • /
    • pp.41-51
    • /
    • 1996
  • In this paper, anew parallel processing system based on a cluster architecture which provides scalability of a parallel processing system while maintains shared memory multiprocessor characteristics is proposed. In recent days low cost, high performnce microprocessors have led to construction of large scale parallel processing systems. Such parallel processing systems provides large scalability but are mainly used for scientific applications which have large data parallelism. A shared memory multiprocessor system like TICOM is currently used as aserver for the commercial application, however, the shared memory multiprocessor system is known to have very limited scalability. The proposed architecture can support scalability and performance of the parallel processing system while it provides adaptability for the commerical application, hence it can overcome the limitation of the shared memory multiprocessor. The architecture and characteristics of the proposed system shall be described. A proprietary hierarchical crsossbar network is designed for this system, of which the protocol, routing and switching technique and the signal transfer technique are optimized for the proposed architecture. The design trade-offs for the network are described in this paper and with simulation usihng the SES/workbench, it is explored that the network fits to the proposed architecture.

  • PDF

ITU-T J.83 ANNEX B의 Parity Checksum Generator를 위한 병렬 처리 구조 (Parallel Processing Architecture for Parity Checksum Generator Complying with ITU-T J.83 ANNEX B)

  • 이종엽;홍언표;하동수;임회정
    • 한국통신학회논문지
    • /
    • 제34권6C호
    • /
    • pp.619-625
    • /
    • 2009
  • 이 논문은 ITU-T Recommendation J.83 Annex B에서 패킷 동기화와 에러 검출을 위해 사용된 패리티 체크섬 생성기의 병렬 구조를 제안한다. 제안된 병렬 처리 구조는 기존의 직렬 처리 구조에서 일어나는 병목현상을 제거하여 패리티 체크섬을 생성하는데 필요한 처리 시간을 상당히 줄여준다. 실험 결과는 제안된 병렬 처리 구조가 16%의 면적증가로 처리 속도를 83.1%나 줄일 수 있다는 것을 보여준다.

IoT/에지 컴퓨팅에서 저전력 메모리 아키텍처의 개선 연구 (A Study on Improvement of Low-power Memory Architecture in IoT/edge Computing)

  • 조두산
    • 한국산업융합학회 논문집
    • /
    • 제24권1호
    • /
    • pp.69-77
    • /
    • 2021
  • The widely used low-cost design methodology for IoT devices is very popular. In such a networked device, memory is composed of flash memory, SRAM, DRAM, etc., and because it processes a large amount of data, memory design is an important factor for system performance. Therefore, each device selects optimized design factors such as function, performance and cost according to market demand. The design of a memory architecture available for low-cost IoT devices is very limited with the configuration of SRAM, flash memory, and DRAM. In order to process as much data as possible in the same space, an architecture that supports parallel processing units is usually provided. Such parallel architecture is a design method that provides high performance at low cost. However, it needs precise software techniques for instruction and data mapping on the parallel architecture. This paper proposes an instruction/data mapping method to support optimized parallel processing performance. The proposed method optimizes system performance by actively using hardware and software parallelism.

개선된 수정 유클리드 알고리듬을 이용한 고속의 Reed-Solomon 복호기의 설계 (Implementation of High-Speed Reed-Solomon Decoder Using the Modified Euclid's Algorithm)

  • 김동선;최종찬;정덕진
    • 대한전기학회논문지:전력기술부문A
    • /
    • 제48권7호
    • /
    • pp.909-915
    • /
    • 1999
  • In this paper, we propose an efficient VLSI architecture of Reed-Solomon(RS) decoder. To improve the speed. we develope an architecture featuring parallel and pipelined processing. To implement the parallel and pipelined processing architecture, we analyze the RS decoding algorithm and the honor's algorithm for parallel processing and we also modified the Euclid's algorithm to apply the efficient parallel structure in RS decoder. To show the proposed architecture, the performance of the proposed RS decoder is compared to Shao's and we obtain the 10 % efficiency in area and three times faster in speed when it's compared to Shao's time domain decoder. In addition, we implemented the proposed RS decoder with Altera FPGA Flex10K-50.

  • PDF

트랜스퓨터를 사용한 피라미드형 병렬 어레이 컴퓨터 (TPPAC) 구조 (Transputer-based Pyramidal Parallel Array Computer(TPPAC) architecture (Prelimineary Version))

  • 정창성;정철환
    • 대한전기학회:학술대회논문집
    • /
    • 대한전기학회 1988년도 전기.전자공학 학술대회 논문집
    • /
    • pp.647-650
    • /
    • 1988
  • This paper proposes and sketches out a new parallel architecture of transputer-based pyramidal parallel array computer (TPPAC) used to process computationally intensive problems for geometric processing applications such as computer vision, image processing etc. It explores how efficiently the pyramid computer architecture is designed using transputer chips, and poses a new interconnection scheme for TPPAC without using additional transputers.

  • PDF

영상처리를 위한 Pipelined 병렬처리 시스템 (Pipelined Parallel Processing System for Image Processing)

  • 이형;김종배;최성혁;박종원
    • 전기전자학회논문지
    • /
    • 제4권2호
    • /
    • pp.212-224
    • /
    • 2000
  • 본 논문에서는 영상 응용프로그램의 처리 속도를 향상하기 위한 병렬처리 시스템을 제안한다. 병렬처리 시스템은 Pipelined SIMD 구조를 갖고 있으며, 다수개의 처리기와 다중접근 기억장치로 구성된다. 다중접근 기억장치는 메모리 모듈들과 메모리 제어부로 구성되며, 메모리 제어부는 메모리 모듈 선택 모듈, 데이터 라우팅 모듈, 그리고 주소 계산 및 라우팅 모듈로 구성되어 있으며, 블록, 행, 그리고 열 내의 데이터를 동시에 접근할 수 있는 기능을 제공한다. 제안한 병렬처리 시스템을 검증하기 위해서 형태학적 필터를 적용하여 기능 검증 및 처리속도를 확인하였다.

  • PDF

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri;Al-Laila, Marwah Haitham;Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems
    • /
    • 제12권4호
    • /
    • pp.724-740
    • /
    • 2016
  • The performance issues of screening large database compounds and multiple query compounds in virtual screening highlight a common concern in Chemoinformatics applications. This study investigates these problems by choosing group fusion as a pilot model and presents efficient parallel solutions in parallel platforms, specifically, the multi-core architecture of CPU and many-core architecture of graphical processing unit (GPU). A study of sequential group fusion and a proposed design of parallel CUDA group fusion are presented in this paper. The design involves solving two important stages of group fusion, namely, similarity search and fusion (MAX rule), while addressing embarrassingly parallel and parallel reduction models. The sequential, optimized sequential and parallel OpenMP of group fusion were implemented and evaluated. The outcome of the analysis from these three different design approaches influenced the design of parallel CUDA version in order to optimize and achieve high computation intensity. The proposed parallel CUDA performed better than sequential and parallel OpenMP in terms of both execution time and speedup. The parallel CUDA was 5-10x faster than sequential and parallel OpenMP as both similarity search and fusion MAX stages had been CUDA-optimized.

David II: 효과적인 메모리 시스템을 가지는 병렬 렌더링 프로세서 (David II: A new architecture for parallel rendering processors with effective memory system)

  • 이길환;박우찬;김일산;한탁돈
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2004년도 춘계학술발표대회
    • /
    • pp.1655-1658
    • /
    • 2004
  • Current rendering processors are organized mainly to process a triangle as fast as possible and recently parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, begin to appear. For high performance in processing triangles, it is desirable for each rasterizer have its own local pixel cache. However, the consistency problem may occur in accessing the data at the same address simultaneously by more than one rasterizer. In this paper, we propose a parallel rendering processor architecture, called DAVID II, resolving such consistency problem effectively. Moreover, the proposed architecture reduces the latency due to a pixel cache miss significantly. The experimental results show that DAVID II achieves almost linear speedup at best case even in sixteen rasterizers.

  • PDF

An FPGA-based Parallel Hardware Architecture for Real-time Eye Detection

  • Kim, Dong-Kyun;Jung, Jun-Hee;Nguyen, Thuy Tuong;Kim, Dai-Jin;Kim, Mun-Sang;Kwon, Key-Ho;Jeon, Jae-Wook
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • 제12권2호
    • /
    • pp.150-161
    • /
    • 2012
  • Eye detection is widely used in applications, such as face recognition, driver behavior analysis, and human-computer interaction. However, it is difficult to achieve real-time performance with software-based eye detection in an embedded environment. In this paper, we propose a parallel hardware architecture for real-time eye detection. We use the AdaBoost algorithm with modified census transform(MCT) to detect eyes on a face image. We parallelize part of the algorithm to speed up processing. Several downscaled pyramid images of the eye candidate region are generated in parallel using the input face image. We can detect the left and the right eye simultaneously using these downscaled images. The sequential data processing bottleneck caused by repetitive operation is removed by employing a pipelined parallel architecture. The proposed architecture is designed using Verilog HDL and implemented on a Virtex-5 FPGA for prototyping and evaluation. The proposed system can detect eyes within 0.15 ms in a VGA image.