• Title/Summary/Keyword: 병렬프로세서

Search Result 579, Processing Time 0.031 seconds

Design and Verification of Pipelined Face Detection Hardware (파이프라인 구조의 얼굴 검출 하드웨어 설계 및 검증)

  • Kim, Shin-Ho;Jeong, Yong-Jin
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.10
    • /
    • pp.1247-1256
    • /
    • 2012
  • There are many filter based image processing algorithms and they usually require a huge amount of computations and memory accesses making it hard to attain a real-time performance, expecially in embedded applications. In this paper, we propose a pipelined hardware structure of the filter based face detection algorithm to show that the real time performance can be achieved by hardware design. In our design, the whole computation is divided into three pipeline stages: resizing the image (Resize), Transforming the image (ICT), and finding candidate area (Find Candidate). Each stage is optimized by considering the parallelism of the computation to reduce the number of cycles and utilizing the line memory to minimize the memory accesses. The resulting hardware uses 507 KB internal SRAM and occupies 9,039 LUTs when synthesized and configured on Xilinx Virtex5LX330 FPGA. It can operate at maximum 165MHz clock, giving the performance of 108 frame/sec, while detecting up to 20 faces.

Reconfigurable Architecture Design for H.264 Motion Estimation and 3D Graphics Rendering of Mobile Applications (이동통신 단말기를 위한 재구성 가능한 구조의 H.264 인코더의 움직임 추정기와 3차원 그래픽 렌더링 가속기 설계)

  • Park, Jung-Ae;Yoon, Mi-Sun;Shin, Hyun-Chul
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.1
    • /
    • pp.10-18
    • /
    • 2007
  • Mobile communication devices such as PDAs, cellular phones, etc., need to perform several kinds of computation-intensive functions including H.264 encoding/decoding and 3D graphics processing. In this paper, new reconfigurable architecture is described, which can perform either motion estimation for H.264 or rendering for 3D graphics. The proposed motion estimation techniques use new efficient SAD computation ordering, DAU, and FDVS algorithms. The new approach can reduce the computation by 70% on the average than that of JM 8.2, without affecting the quality. In 3D rendering, midline traversal algorithm is used for parallel processing to increase throughput. Memories are partitioned into 8 blocks so that 2.4Mbits (47%) of memory is shared and selective power shutdown is possible during motion estimation and 3D graphics rendering. Processing elements are also shared to further reduce the chip area by 7%.

DSP Implementation and Open Sea Test of Underwater Image Transmission System Using QPSK Scheme (QPSK 방식을 이용한 수중영상 정보전송 시스템의 DSP구현 및 실해역 실험 연구)

  • 박종원;고학림;이덕환;최영철;김시문;김승근;임용곤
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.2
    • /
    • pp.117-124
    • /
    • 2004
  • In this paper, we have been implemented the QPSK-based underwater transmission systems using DSP in order to transmit the underwater image data. We have adopted a BDPA (Block Data Parallel Architecture) to control multiple DSPs used in the transmitter and receiver in order to transmit the image data in real-time. We also have developed GUI software in order to drive and to debug the implemanted system in real-time. We have executed open sea tests in order to analyze the performance of the implemented system at East Sea near Kosung in Kangwon-Do. As a result of these experiments, it has been demonstrated that 10 kbps image data can be received without errors at 30m and 80m depth points, while the distance between the transmitter and the receiver is up to 20m.

A Self-healing Space-Division Switch for a 2-Fiber Bidirectional Line Switched Ring (2-선 양방향 선로 스위칭 링을 위한 자기치유 공간분할 스위치 소자)

  • 이상훈;김성진
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.12C
    • /
    • pp.240-248
    • /
    • 2001
  • This paper describes the design of a space-division switch which can support a self-healing operation of 2-fiber bidirectional line switched ring in 2.5Gb/s SDH-based transmission system. The switch having a 1.25Gb/s throughput has been designed and implemented with COMPASS tool and 0.87$\mu\textrm{m}$ CMOS gate-array. The proposed switch is suitable for the quickly self-healing operations when a failure occurs in a 2-fiber bidirectional switched ring composed of ADM transmission systems. The switch is composed of an add/drop control part, a cross-point switch, a frame-phase aligner, processor interface and an unequipped data framer. The test results of the switch adapted to 2.SGb/s SDH-based transmission system, show immediate restoration when a failure occurs.

  • PDF

Accurate Prediction of Polymorphic Indirect Branch Target (간접 분기의 타형태 타겟 주소의 정확한 예측)

  • 백경호;김은성
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.41 no.6
    • /
    • pp.1-11
    • /
    • 2004
  • Modern processors achieve high performance exploiting avaliable Instruction Level Parallelism(ILP) by using speculative technique such as branch prediction. Traditionally, branch direction can be predicted at very high accuracy by 2-level predictor, and branch target address is predicted by Branch Target Buffer(BTB). Except for indirect branch, each of the branch has the unique target, so its prediction is very accurate via BTB. But because indirect branch has dynamically polymorphic target, indirect branch target prediction is very difficult. In general, the technique of branch direction prediction is applied to indirect branch target prediction, and much better accuracy than traditional BTB is obtained for indirect branch. We present a new indirect branch target prediction scheme which combines a indirect branch instruction with its data dependent register of the instruction executed earlier than the branch. The result of SPEC benchmark simulation which are obtained on SimpleScalar simulator shows that the proposed predictor obtains the most perfect prediction accuracy than any other existing scheme.

The Implementation of Fault Tolerance Service for QoS in Grid Computing (그리드 컴퓨팅에서 서비스 품질을 위한 결함 포용 서비스의 구현)

  • Lee, Hwa- Min
    • The Journal of Korean Association of Computer Education
    • /
    • v.11 no.3
    • /
    • pp.81-89
    • /
    • 2008
  • The failure occurrence of resources in the grid computing is higher than in a tradition parallel computing. Since the failure of resources affects job execution fatally, fault tolerance service is essential in computational grids. And grid services are often expected to meet some minimum levels of quality of service (QoS) for desirable operation. However Globus toolkit does not provide fault tolerance service that supports fault detection service and management service and satisfies QoS requirement. Thus this paper proposes fault tolerance service to satisfy QoS requirement in computational grids. In order to provide fault tolerance service and satisfy QoS requirements, we expand the definition of failure, such as process failure, processor failure, and network failure. And we propose resource scheduling service, fault detection service and fault management service and show implement and experiment results.

  • PDF

Separated Address/Data Network Design for Bus Protocol compatible Network-on-Chip (버스 프로토콜 호환 가능한 네트워크-온-칩에서의 분리된 주소/데이터 네트워크 설계)

  • Chung, Seungh Ah;Lee, Jae Hoon;Kim, Sang Heon;Lee, Jae Sung;Han, Tae Hee
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.4
    • /
    • pp.68-75
    • /
    • 2016
  • As the number of cores and IPs increase in multiprocessor system-on-chip (MPSoC), network-on-chip (NoC) has emerged as a promising novel interconnection architecture for its parallelism and scalability. However, minimization of the latency in NoC with legacy bus IPs must be addressed. In this paper, we focus on the latency minimization problem in NoC which accommodates legacy bus protocol based IPs considering the trade-offs between hop counts and path collisions. To resolve this problem, we propose separated address/data network for independent address and data phases of bus protocol. Compared to Mesh and irregular topologies generated by TopGen, experimental results show that average latency and execution time are reduced by 19.46% and 10.55%, respectively.

Enhancing the performance of taxi application based on in-memory data grid technology (In-memory data grid 기술을 활용한 택시 애플리케이션 성능 향상 기법 연구)

  • Choi, Chi-Hwan;Kim, Jin-Hyuk;Park, Min-Kyu;Kwon, Kaaen;Jung, Seung-Hyun;Nazareno, Franco;Cho, Wan-Sup
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.5
    • /
    • pp.1035-1045
    • /
    • 2015
  • Recent studies in Big Data Analysis are showing promising results, utilizing the main memory for rapid data processing. In-memory computing technology can be highly advantageous when used with high-performing servers having tens of gigabytes of RAM with multi-core processors. The constraint in network in these infrastructure can be lessen by combining in-memory technology with distributed parallel processing. This paper discusses the research in the aforementioned concept applying to a test taxi hailing application without disregard to its underlying RDBMS structure. The application of IMDG technology in the application's backend API without restructuring the database schema yields 6 to 9 times increase in performance in data processing and throughput. Specifically, the change in throughput is very small even with increase in data load processing.

DSSS MODEM Design and Implementation for a Medium Speed Wireless Link (대중저속 무선 통신을 위한 DSSS 모뎀 설계 및 구현)

  • Won Hee-Seok;Kim Young-Sik
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.43 no.1 s.343
    • /
    • pp.121-126
    • /
    • 2006
  • This paper report on the design and implementation of a 9.6kbps DSSS CDMA modem for a medium speed wireless link. The proposed modem provides a general purpose I/O interface with a microprocessor. The I/O interface consists of 8-bit data bus, chip enable, read/write, and interrupt pins. In transmit block, the 8-bit data delivered from the I/O interface buffer is converted to 9.6kbps serial data, which are spreaded into 76.8kcps with 8-bit PN code generated inside the modem by direct sequence method. An 8-bit training sequence is preceded in the data frame for data synchronization in receiver. In receiver block the PN code is synchronized from the received data spreaded to 76.8kcps and find the data timing from the 8-bit training sequence. We have used the Early-and-Late integration method. The modem has been implemented and verified using a Xilix FPGA board and has been fabricated as an ASIC CHIP through Hynir $0.25{\mu}m$ CMOS. The multiple accessing method is DSSS CDMA.

The Design of Hardware MPI Units for MPSoC (MPSoC를 위한 저비용 하드웨어 MPI 유닛 설계)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.86-92
    • /
    • 2011
  • In this paper, we propose a novel hardware MPI(Message Passing Interface) unit which supports message passing in multiprocessor system which use distributed memory architecture. MPI Hardware unit processes data synchronization, transmission and completion, and it supports processor non-blocking operation so it reduces overhead according to synchronization. Additionally, MPI hardware unit combines ready entry, request entry, reserve entry which save and manage the synchronized messages and performs the multiple outstanding issue and out of order completion. According to BFM(Bus Functional Model) simulation result, the performance is increased by 25% on many to many communication. After we designed MPI unit using HDL, with synopsys design compiler we synthesized, and for synthesis library we used MagnaChip $0.18{\mu}m$. And then we making prototype chip. The proposed message transmission interface hardware shows high performance for its increase in size. Thus, as we consider low-cost design and scalability, MPI hardware unit is useful in increasing overall performance of embedded MPSoC(Multi-Processor System-on-Chip).