Search | Korea Science

Parallel Implementation and Performance Evaluation of the SIFT Algorithm Using a Many-Core Processor (매니코어 프로세서를 이용한 SIFT 알고리즘 병렬구현 및 성능분석)

Kim, Jae-Young;Son, Dong-Koo;Kim, Jong-Myon;Jun, Heesung
- Journal of the Korea Society of Computer and Information / v.18 no.9 / pp.1-10 / 2013
In this paper, we implement the SIFT (Scale-Invariant Feature Transform) algorithm for feature point extraction on a many-core processor and analyze the processor's performance, area efficiency, and system area efficiency. We also demonstrate the potential of the proposed many-core processor by comparing its performance with that of a high-performance CPU and GPU (Graphics Processing Unit). Experimental results indicate that the SIFT algorithm on the many-core processor produces the same results as OpenCV. In addition, the many-core processor outperforms the CPU and GPU in execution time. Finally, we propose an optimal model of the SIFT algorithm on the many-core processor by analyzing energy efficiency and area efficiency for different octave sizes.
https://doi.org/10.9708/jksci.2013.18.9.001

Efficient pipelined FFT processor for the MIMO-OFDM systems (MIMO-OFDM 시스템을 위한 효율적인 파이프라인 FFT 프로세서의 설계)

Lee, Sang-Min;Jung, Yun-Ho;Kim, Jae-Seok
- The Journal of Korean Institute of Communications and Information Sciences / v.32 no.10C / pp.1025-1031 / 2007
This paper proposes an area-efficient pipelined FFT processor for MIMO-OFDM systems with four transmitting and four receiving antennas. Because a MIMO-OFDM system transmits multiple data streams, the complexity of a system built from single-channel FFT processors increases linearly with the number of transmit channels. The proposed FFT processor is based on a multi-channel structure and can therefore support multiple data streams efficiently. With the mixed-radix algorithm, the number of non-trivial multiplications in the proposed FFT processor is reduced. The proposed FFT processor is synthesized in a 0.18 μm CMOS process and reduces the logic gate count by 25% compared with a 4-channel radix-4 multi-path delay commutator (R4MDC) FFT processor. Because the FFT processor is one of the largest modules in a MIMO-OFDM system, the proposed design contributes substantially to low-complexity MIMO-OFDM system design.

Implementation of a 3D Graphics Hardwired T&L Accelerator based on a SoC Platform for a Mobile System (SoC 플랫폼 기반 모바일용 3차원 그래픽 Hardwired T&L Accelerator 구현)

Lee, Kwang-Yeob;Koo, Yong-Seo
- Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.9 / pp.59-70 / 2007
In this paper, we propose an effective T&L (Transform & Lighting) processor architecture for a real-time 3D graphics acceleration SoC (System on a Chip) in a mobile system. We designed floating-point arithmetic IPs for the T&L processor and verified them on a SoC platform. The T&L processor uses a 24-bit floating-point data format and a 16-bit fixed-point data format, and supports a pipeline that keeps the Transform and Lighting processes balanced using parallel computation of 3D graphics. The pipeline delay when processing only the Transform operation is almost the same as the delay when processing both the Transform and Lighting operations. The T&L processor was implemented and verified on the SoC platform. It operates at 80 MHz on a Xilinx Virtex-4 FPGA, and the measured processing speed is 20M vertices/sec.
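
As a rough reading of the two figures reported in the abstract above, the clock rate and the vertex rate together imply an average cycle budget per vertex. The sketch below is only arithmetic on those reported numbers. The function name `cycles_per_vertex` is illustrative, and it assumes the 20M vertices/sec rate was measured at the 80 MHz clock.

```python
def cycles_per_vertex(clock_hz: float, vertices_per_sec: float) -> float:
    """Average number of clock cycles available per vertex at a given throughput."""
    return clock_hz / vertices_per_sec


# Figures reported in the abstract: 80 MHz clock, 20M vertices/sec.
# Assumption: both figures refer to the same FPGA measurement.
budget = cycles_per_vertex(80e6, 20e6)
print(f"{budget:.1f} cycles per vertex")
```

Under that assumption the pipeline completes, on average, one vertex every four clock cycles.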

40-TFLOPS artificial intelligence processor with function-safe programmable many-cores for ISO26262 ASIL-D

Han, Jinho;Choi, Minseok;Kwon, Youngsu
- ETRI Journal / v.42 no.4 / pp.468-479 / 2020
The proposed AI processor architecture achieves high throughput for neural network acceleration and reduces the external memory bandwidth required for neural network processing. To achieve high throughput, the proposed super thread core (STC) includes 128 × 128 nano cores operating at a clock frequency of 1.2 GHz. A function-safe architecture is proposed for fault-tolerant systems such as the electronic systems of autonomous cars. A general-purpose processor (GPP) core is integrated with the STC to control it and to process the AI algorithm. The GPP core has a self-recovering cache and a dynamic lockstep function. The function-safe design has been shown to meet ASIL D of the ISO26262 standard's fault tolerance levels. The entire AI processor is fabricated in a 28-nm CMOS process as a prototype chip. Its peak computing performance is 40 TFLOPS at 1.2 GHz with a supply voltage of 1.1 V. The measured energy efficiency is 1.3 TOPS/W. A GPP for control with a function-safe design can achieve ISO26262 ASIL-D with a single-point fault-tolerance rate of 99.64%.
https://doi.org/10.4218/etrij.2020-0128
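
The peak figure in the abstract above can be sanity-checked from the core count and the clock frequency. The sketch below assumes each nano core completes one multiply-accumulate per cycle, counted as 2 floating-point operations. That per-core rate is an assumption for illustration and is not stated in the abstract, and the function name `peak_tflops` is hypothetical.

```python
def peak_tflops(rows: int, cols: int, clock_hz: float,
                flops_per_core_per_cycle: int = 2) -> float:
    """Peak throughput in TFLOPS for a rows x cols array of cores.

    flops_per_core_per_cycle defaults to 2 (one multiply-accumulate per
    cycle), which is an assumption, not a figure from the abstract.
    """
    cores = rows * cols
    return cores * clock_hz * flops_per_core_per_cycle / 1e12


# Figures reported in the abstract: 128 x 128 nano cores at 1.2 GHz.
peak = peak_tflops(128, 128, 1.2e9)
print(f"{peak:.1f} TFLOPS")
```

Under this assumption the result is about 39.3 TFLOPS, which is consistent with the rounded 40 TFLOPS peak quoted in the abstract.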

A Performance Study of Embedded Multicore Processor Architectures (임베디드 멀티코어 프로세서의 성능 연구)

Lee, Jongbok
- The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.1 / pp.163-169 / 2013
Recently, the importance of embedded systems has been growing rapidly. In order to satisfy the real-time constraints of such systems, a high-performance embedded processor is required. Therefore, as in general-purpose computer systems, embedded processors should also be designed as multicore architectures. Using the MiBench benchmarks as input, trace-driven simulation was performed and analyzed extensively for 2-core to 16-core embedded processor architectures with different types of cores, from simple RISC to in-order and out-of-order superscalar processors. As a result, the achievable performance is as high as 23 times that of the single-core embedded RISC processor.
https://doi.org/10.7236/JIIBC.2013.13.1.163

A VLSI DESIGN OF CD SIGNAL PROCESSOR for High-Speed CD-ROM

Kim, Jae-Won;Kim, Jae-Seok;Lee, Jaeshin
- Proceedings of the IEEK Conference / 2002.07b / pp.1296-1299 / 2002
We implemented a CD signal processor for a CAV 48-speed CD-ROM drive as a VLSI chip. The CD signal processor is a mixed-mode monolithic IC including a servo processor, data recovery, a data processor, and a 1-bit DAC. For servo signal processing, we included a DSP core, while for CAV mode playback we adopted a PLL with a wide recovery range. The data processor (DP) was designed to meet the Yellow Book specification [2]. The DP block therefore consists of an EFM demodulator, a C1/C2 ECC block, an audio processor, and a block that transfers data to an ATAPI chip. A modified Euclid's algorithm was used as the key equation solver for the ECC block. To achieve high-speed decoding, the RS decoder is pipelined. Audio playability is increased by playing a CD-DA disc at a speed of 12X or 16X. For this, subcode sync and data are processed in the same way as the main data. The overall performance of the IC is verified by measuring the transfer rate from the innermost area of the disc to the outermost area. At 48-speed, the operating frequency is 210 MHz, and the chip is fabricated with the 0.35 μm STD90 cell library of Samsung Electronics.

A Design of Beam Steering n-phase OPTO-ULSI Processor for IIPS (IIPS를 위한 광선 제어용 n-위상 OPTO-ULSI 프로세서의 디자인)

Lee, Chang-Ki;Im, Hyung-Kyu
- Journal of the Korea Computer Industry Society / v.5 no.2 / pp.261-268 / 2004
This study aims to design an optimum phase for implementing a 256-phase Opto-ULSI processor for multi-function capable optical networks. An 8-phase processor is already under construction and will provide the initial base for experimentation and characterisation. The challenge is to compensate for the non-linearity of the liquid crystal, find an optimum phase, and implement a larger-scale Opto-ULSI processor. This research is oriented around the initial development of an 8-phase Opto-ULSI processor that implements a Beam Steering (BS) Opto-ULSI processor (OUP) for an integrated intelligent photonic system (IIPS), while investigating the optimal phase characteristics and developing compensation for the non-linearity of the liquid crystal.

Performance Study of Multicore Digital Signal Processor Architectures (멀티코어 디지털 신호처리 프로세서의 성능 연구)

Lee, Jongbok
- The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.4 / pp.171-177 / 2013
Due to the demand for high-speed 3D graphics rendering, video file format conversion, compression, encryption, and decryption technologies, the importance of digital signal processor systems is growing rapidly. In order to satisfy real-time constraints, a high-performance digital signal processor is required. Therefore, as in general-purpose computer systems, digital signal processors should also be designed as multicore architectures. Using the UTDSP benchmarks as input, trace-driven simulation was performed and analyzed extensively for 2-core to 16-core digital signal processor architectures, with cores ranging from simple RISC to in-order and out-of-order superscalar processors, for various window sizes.
https://doi.org/10.7236/JIIBC.2013.13.4.171

Design of Beam Steering n-phase OPTO-ULSI Processor for IIPS (IIPS를 위한 빔 조향 n위상 광 ULSI 프로세서 디자인)

Lee, Chang-Ki;Lim, Hyung-Kyu
- The Journal of the Korea institute of electronic communication sciences / v.3 no.3 / pp.158-164 / 2008
This project investigates an optimum phase design for implementing a 256-phase Opto-ULSI processor for multi-function capable optical networks. An 8-phase processor is already under construction and will provide the initial base for experimentation and characterization. The challenge is to compensate for the non-linearity of the liquid crystal, find an optimum phase, and implement a larger-scale Opto-ULSI processor. This research is oriented around the initial development of an 8-phase Opto-ULSI processor that implements a Beam Steering (BS) Opto-ULSI processor (OUP) for an integrated intelligent photonic system (IIPS), while investigating the optimal phase characteristics and developing compensation for the non-linearity of the liquid crystal.

Compiler Processor Trade-offs for Dynamic Scheduling of VLIW Instructions (VLIW명령어의 동적 스케줄링을 위한 컴파일러와 프로세서간 상호보완)

Sunghyun Jee
- Journal of KIISE:Computer Systems and Theory / v.31 no.5_6 / pp.279-287 / 2004
This paper describes a processor architecture named Dynamically Instruction Scheduled VLIW (DISVLIW). The DISVLIW processor architecture is designed to dynamically schedule VLIW instructions using dependency information. The DISVLIW instruction format is augmented to allow dependency bit vectors to be placed in the same VLIW word. The DISVLIW processor dynamically schedules each instruction within a long instruction using pairs of functional units and dynamic schedulers. Features such as explicit parallelism, balanced scheduling effort, and dynamic scheduling of VLIW instructions can provide a sound infrastructure for supercomputing. We simulate the DISVLIW processor architecture and show that it performs significantly better than the VLIW processor across a wide range of cache sizes and numerical benchmark applications.
