• Title/Summary/Keyword: quad-core

Search Result 22, Processing Time 0.022 seconds

A 3 V 12b 100 MS/s CMOS DAC for High-Speed Communication System Applications (고속통신 시스템 응용을 위한 3 V 12b 100 MS/s CMOS D/A 변환기)

  • 배현희;이명진;신은석;이승훈;김영록
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.9
    • /
    • pp.685-691
    • /
    • 2003
  • This work describes a 3 V 12b 100 MS/s CMOS digital-to-analog converter (DAC) for high-speed communication system applications. The proposed DAC is composed of a unit current-cell matrix for 8 MSBs and a binary-weighted array for 4 LSBs, considering linearity, power consumption, chip area, and glitch energy. The low-glitch switch driving circuit is employed to improve the linearity and the dynamic performance. Current sources of the DAC are laid out separately from the current-cell switch matrix core. The prototype DAC is implemented in a 0.35 urn n-well single-poly quad-metal CMOS technology. The measured DNL and INL of the prototype DAC are within $\pm$0.75 LSB and $\pm$1.73 LSB, respectively, and the spurious-free dynamic range (SFDR) is 64 dB at 100 MS/s with a 10 MHz input sinewave. The DAC dissipates 91 mW at 3 V and occupies the active die area of 2.2 mm ${\times}$ 2.0 mm.

The Study on a Semi-automated Mapping System (반자동 지도입력 시스템기술 개발 연구)

  • 윤재경;이기혁;우창헌;이경자;김수용
    • Spatial Information Research
    • /
    • v.3 no.1
    • /
    • pp.19-27
    • /
    • 1995
  • In this paper, a semi-automated mapping system, which can produ¬ce digital maps by using information acquired from pre-processing procedure, was introduced. To get a binary edge image, which is very important in vectori¬zation process, we applied adaptive smoothing and connection preserving thresho¬Iding algorithm. In mapper program, binary images are converted to vectors and for in-core data structure, extended PR quad tree was used. These procedures are dispatched to personal computers and workstations and through network resource sharing, the whole process was unified and simplified.

  • PDF

Performance Evaluation of an Embedded EtherCAT Master with SOEM on PREEMPT_RT Linux (PREEMPT_RT Linux에서 SOEM을 이용하는 임베디드 EtherCAT 마스터 성능 평가)

  • Kang, Sung Jin;Kim, Oe Cheol
    • Journal of the Semiconductor & Display Technology
    • /
    • v.21 no.3
    • /
    • pp.26-32
    • /
    • 2022
  • EtherCAT is an Ethernet-based fieldbus system standardized in IEC 61158 and SEMI, and widely used in the fields of factory automation, semiconductor equipment and robotics. In this paper, an EtherCAT master is implemented on an embedded board with Arm based 64-bit quad-core processor and its jitter performance is evaluated at the output of the network interface to include all the effects of the entire system in the results. For the EtherCAT master system, an open source EtherCAT master stack, Simple Open EtherCAT Master (SOEM), is installed on PREEMPT_RT patched Linux operating system for real-time operation. The results show that the jitter performance is comparable to that of Xenomai-based master and the EtherCAT master with two master instances has similar jitter performance to the EtherCAT master with one master instance.

Integrated Parallelization of Video Decoding on Multi-core Systems (멀티코어 시스템에서의 통합된 비디오 디코딩 병렬화)

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.49 no.7
    • /
    • pp.39-49
    • /
    • 2012
  • Demand for high resolution video services leads to active studies on high speed video processing. Especially, widespread deployment of multi-core systems accelerates researches on high resolution video processing based on parallelization of multimedia software. Previously proposed parallelization approach could improve the decoding performance. However, some parallelization methods did not consider the entropy decoding and others considered only a partial decoding parallelization. Therefore, we consider parallel entropy decoding integrated with other parallel video decoding process on a multi-core system. We propose a novel parallel decoding method called Integrated Parallelization. We propose a method on how to optimize the parallelization of video decoding when we have a multi-core system with many cores. We parallelized the KTA 2.7 decoder with the proposed technique on an Intel i7 Quad-Core platform with Intel Hyper-Threading technology and multi-threads scheduling. We achieved up to 70% performance improvement using IP method.

A linear array SliM-II image processor chip (선형 어레이 SliM-II 이미지 프로세서 칩)

  • 장현만;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.2
    • /
    • pp.29-35
    • /
    • 1998
  • This paper describes architectures and design of a SIMD type parallel image processing chip called SliM-II. The chiphas a linear array of 64 processing elements (PEs), operates at 30 MHz in the worst case simulation and gives at least 1.92 GIPS. In contrast to existing array processors, such as IMAP, MGAP-2, VIP, etc., each PE has a multiplier that is quite effective for convolution, template matching, etc. The instruction set can execute an ALU operation, data I/O, and inter-PE communication simulataneously in a single instruction cycle. In addition, during the ALU/multiplier operation, SliM-II provides parallel move between the register file and on-chip memory as in DSP chips, SliM-II can greatly reduce the inter-PE communication overhead, due to the idea a sliding, which is a technique of overlapping inter-PE communication with computation. Moreover, the bandwidth of data I/O and inter-PE communication increases due to bit-parallel data paths. We used the COMPASS$^{TM}$ 3.3 V 0.6.$\mu$m standrd cell library (v8r4.10). The total number of transistors is about 1.5 muillions, the core size is 13.2 * 13.0 mm$^{2}$ and the package type is 208 pin PQ2 (Power Quad 2). The performance evaluation shows that, compared to a existing array processors, a proposed architeture gives a significant improvement for algorithms requiring multiplications.s.

  • PDF

A 3 V 12b 100 MS/s CMOS D/A Converter for High-Speed Communication Systems

  • Kim, Min-Jung;Bae, Hyuen-Hee;Yoon, Jin-Sik;Lee, Seung-Hoon
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.3 no.4
    • /
    • pp.211-216
    • /
    • 2003
  • This work describes a 3 V 12b 100 MS/s CMOS digital-to-analog converter (DAC) for high-speed communication system applications. The proposed DAC is composed of a unit current-cell matrix for 8 MSBs and a binary-weighted array for 4 LSBs, trading-off linearity, power consumption, chip area, and glitch energy with this process. The low-glitch switch driving circuits are employed to improve linearity and dynamic performance. Current sources of the DAC are laid out separately from the current-cell switch matrix core block to reduce transient noise coupling. The prototype DAC is implemented in a 0.35 um n-well single-poly quad-metal CMOS technology and the measured DNL and INL are within ${\pm}0.75$ LSB and ${\pm}1.73$ LSB at 12b, respectively. The spurious-free dynamic range (SFDR) is 64 dB at 100 MS/s with a 10 MHz input sinewave. The DAC dissipates 91 mW at 3 V and occupies the active die area of $2.2{\;}mm{\;}{\times}{\;}2.0{\;}mm$

Two-Level Tries: A General Acceleration Structure for Parallel Routing Table Accesses

  • Mingche, Lai;Lei, Gao
    • Journal of Communications and Networks
    • /
    • v.13 no.4
    • /
    • pp.408-417
    • /
    • 2011
  • The stringent performance requirement for the high efficiency of routing protocols on the Internet can be satisfied by exploiting the threaded border gateway protocol (TBGP) on multi-cores, but the state-of-the-art TBGP performance is restricted by a mass of contentions when racing to access the routing table. To this end, the highly-efficient parallel access approach appears to be a promising solution to achieve ultra-high route processing speed. This study proposes a general routing table structure consisting of two-level tries for fast parallel access, and it presents a heuristic-based divide-and-recombine algorithm to solve a mass of contentions, thereby accelerating the parallel route updates of multi-threading and boosting the TBGP performance. As a projected TBGP, this study also modifies the table operations such as insert and lookup, and validates their correctness according to the behaviors of the traditional routing table. Our evaluations on a dual quad-core Xeon server show that the parallel access contentions decrease sharply by 92.5% versus the traditional routing table, and the maximal update time of a thread is reduced by 56.8 % on average with little overhead. The convergence time of update messages are improved by 49.7%.

AB9: A neural processor for inference acceleration

  • Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
    • ETRI Journal
    • /
    • v.42 no.4
    • /
    • pp.491-504
    • /
    • 2020
  • We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism characteristics inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow that provides a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with other various industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 has superior performance and power efficiency to that of a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 ㎟. Delivery is expected later this year.

Multi-Threaded Parallel H.264/AVC Decoder for Multi-Core Systems (멀티코어 시스템을 위한 멀티스레드 H.264/AVC 병렬 디코더)

  • Kim, Won-Jin;Cho, Keol;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.11
    • /
    • pp.43-53
    • /
    • 2010
  • Wide deployment of high resolution video services leads to active studies on high speed video processing. Especially, prevalent employment of multi-core systems accelerates researches on high resolution video processing based on parallelization of multimedia software. In this paper, we propose a novel parallel H.264/AVC decoding scheme on a multi-core platform. Parallel H.264/AVC decoding is challenging not only because parallelization may incur significant synchronization overhead but also because software may have complicated dependencies. To overcome such issues, we propose a novel approach called Multi-Threaded Parallelization(MTP). In MTP, to reduce synchronization overhead, a separate thread is allocated to each stage in the pipeline. In addition, an efficient memory reuse technique is used to reduce the memory requirement. To verify the effectiveness of the proposed approach, we parallelized FFmpeg H.264/AVC decoder with the proposed technique using OpenMP, and carried out experiments on an Intel Quad-Core platform. The proposed design performs better than FFmpeg H.264/AVC decoder before the parallelization by 53%. We also reduced the amount of memory usage by 65% and 81% for a high-definition(HD) and a full high-definition(FHD) video, respectively compared with that of popular existing method called 2Dwave.

Task Parallelism System of Application for Multicore-Based Mobile Platform (멀티코어 기반 모바일 플랫폼을 위한 애플리케이션의 태스크 병렬화 시스템)

  • Lim, Geunsik;Lee, Seho;Eom, Young Ik
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38C no.6
    • /
    • pp.521-530
    • /
    • 2013
  • This paper proposes a task parallelism system (BioMP) to improve applications' execution time of multicore based mobile device. When application developers append the functions of parallel specification into the existing software, our proposed system supports the parallel processing of threads as well as a compatibility. BioMP improves the software in order that an existing large-scale source can recognize the multicore architecture. From our experiment, our idea improved the execution time of application until about 64% against the existing system in multicore environment based on quad core. In addition, BioMP does not require any additional modification of a mobile platform because BioMP is independent component. Consequently, when application developers release multicore-aware applications into the application store, users can immediately run without any modification of the mobile device.