Search | Korea Science

A Performance Study of Multi-core Out-of-Order Superscalar Processor Architecture (멀티코어 비순차 수퍼스칼라 프로세서의 성능 연구)

Lee, Jong-Bok
- The Transactions of The Korean Institute of Electrical Engineers
- /
- v.61 no.10
- /
- pp.1502-1507
- /
- 2012
In order to overcome the hardware complexity and power consumption problems, recently the multi-core architecture has been prevalent. For hardware simplicity, usually RISC processor is adopted as the unit core processor. However, if the performance of unit core processor is enhanced, the overall performance of the multi-core processor architecture can be further increased. In this paper, out-of-order superscalar processor is utilized for the multi-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the out-of-order superscalar cores between 2 and 16 extensively. As a result, the 16-core out-of-order superscalar processor for the window size of 16 resulted in 17.4 times speed up over the single-core out-of-order superscalar processor, and 50 times speed up over the single core RISC processor. When compared for the same number of cores on the average, the multi-core out-of-order superscalar processor performance achieved 3.2 times speed up over the multi-core RISC processor and 1.6 times speed up over the multi-core in-order superscalar processor.
https://doi.org/10.5370/KIEE.2012.61.10.1502 인용 PDF KSCI

Design of Embedded Processor Architecture Applicable to Mobile Multimedia (Mobile Multimedia 지원을 위한 Embedded Processor 구조 설계)

이호석;한진호;배영환;조한진
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.41 no.5
- /
- pp.71-80
- /
- 2004
This paper describes embedded processor architecture design which is applicable to multimedia in mobile platform The main description is based on basic processor architecture and consideration about energy efficiency when used in mobile platform To design processor data path architecture (pipeline, branch prediction, multiple issue superscalar, function unit number) which is optimal to multimedia application and cache hierarchy and its structure, we have nut the simulation with variant architecture using MPEG4 test bench as multimedia application. We analyzed energy efficiency of architecture to check if it is applicable to mobile platform and decide basic processor architecture based on analysis result. The suggested basic processor architecture not only can be applied to mobile platform but also can be applied to basic processor architecture of configurable processor which is designed through automatic design environment.
PDF KSCI

Implementation of a Flexible Intelligent Electronic Device(IED) platform based on The Network processor (Network processor 기반 유연 Intelligent Electronic Device(IED) 플랫폼 구현)

Jeon, Hyeon-Jin;Lee, Wan-Gyu;Chang, Tae-Gyu
- Proceedings of the KIEE Conference
- /
- 2006.04a
- /
- pp.255-257
- /
- 2006
This paper proposed a platform which includes both Network processor and DSP for flexible IED. The Network processor is one of the Intel's IXP4XX Product Line family and the DSP is one of the TI's C6000 family. An embedded Linux is ported in Network processor so that a DSP program can be downloaded to Network processor through ethernet and then downloaded to DSP. Using this method, various algorithms according to IED can be applied to the Network processor board. Maximum ten ADCs can be connected because there is a CPLD between DSP and ADC. That is, the network processor board which can measure maximum 40 channels is implemented. In DSP program, thread and double buffering methods are used not to miss voltage samples. The Network processor board is verified using a method that eight channel voltage signals converted to digital are transmitted to server through both DSP and IXP425.
PDF

3D graphics processor architecture based on multistreaming (다중스트리밍을 이용한 3차원 그래픽 프로세서 구조)

박용진;이동호
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.34C no.9
- /
- pp.10-21
- /
- 1997
In this paper, we propose multiple instruction issuable multi-streaming as a processor architecture for 3D graphics processor. Multistreaming can eliminate inteferences within concurrently executing instructions inthe pipelined processor to allow enough parallelism for parallel processing. Through cycle level simulation study, we show that the proposed architecture outperforms a conventional RISC processor, MIPS R3000 by three times with reasonable resource overheads. Multiple instruction issuable multistreaming processor will be a bood architecture for instruction processor when a large number of threads are guaranteed.
PDF

Application Specific Processor Design for H.264 Decoder with a Configurable Embedded Processor

Han, Jin-Ho;Lee, Mi-Young;Bae, Young-Hwan;Cho, Han-Jin
- ETRI Journal
- /
- v.27 no.5
- /
- pp.491-496
- /
- 2005
An application specific processor for an H.264 decoder with a configurable embedded processor is designed in this research. The motion compensation, inverse integer transform, inverse quantization, and entropy decoding algorithm of H.264 decoder software are optimized. We improved the performance of the processor with instruction-level hardware optimization, which is tailored to configurable embedded processor architecture. The optimized instructions for video processing can be used in other video compression standards such as MPEG 1, 2, and 4. A significant performance improvement is achieved with high flexibility. Experimental results show that we could achieve 300% performance for the H.264 baseline profile level 2 decoder.
PDF

A Study on Processor Monitoring for Integration Test of Flight Control Computer equipped with A Modern Processor (최신 프로세서 탑재 비행제어 컴퓨터의 통합시험을 위한 프로세서 모니터링 연구)

Lee, Cheol;Kim, Jae-Cheol;Cho, In-Jae
- Journal of Institute of Control, Robotics and Systems
- /
- v.14 no.10
- /
- pp.1081-1087
- /
- 2008
This paper describes limitations and solutions of the existing processor-monitoring concept for a military supersonics aircraft Flight Control Computer (FLCC) equipped with modern architecture processor to perform the system integration test. Safecritical FLCC integration test, which requires automatic test for thousands of test cases and real-time input/output test condition generation, depends on the processor-monitoring device called Processor Interface (PI). The PI, which relies upon on the FLCC processor's external address and data-bus data, has some limitations due to multi-fetching capability of the modern sophisticated military processors, like C6000's VLIW (Very-Long Instruction Word) architecture and PowerPC's Superscalar architecture. Several techniques for limitations were developed and proper monitoring approach was presented for modem processor-adopted FLCC system integration test.
https://doi.org/10.5302/J.ICROS.2008.14.10.1081 인용 PDF KSCI

AB9: A neural processor for inference acceleration

Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
- ETRI Journal
- /
- v.42 no.4
- /
- pp.491-504
- /
- 2020
We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism characteristics inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow that provides a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with other various industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 has superior performance and power efficiency to that of a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 ㎟. Delivery is expected later this year.
https://doi.org/10.4218/etrij.2020-0134 인용 PDF KSCI

Performance Analysis of Caching Instructions on SVLIW Processor and VLIW Processor (SVLIW 프로세서와 VLIW 프로세서의 명령어 캐싱에 따른 성능 분석)

Ji, Sung-Hyun;Park, No-Kwang;Kim, Suk-Il
- Journal of IKEEE
- /
- v.1 no.1 s.1
- /
- pp.101-110
- /
- 1997
SVLIW processor architectures can resolve resource collisions and data dependencies between the instructions while scheduling VLIW instructions at run-time. As a result, long NOP word instructions can be removed from the object code produced for the processor. Thus, the occurrence of cache misses on the SVLIW processor would be lesser than that on the same cache size VLIW processor. Less frequent cache misses on the SVLIW processor would incur less frequent memory access, and thus, the total execution cycles to complete an application would be shortened compared with cases on the VLIW processor. Such a feature eventually compromises effects of longer instruction pipeline stages than those of the VLIW processor. In this paper, we formulate and compare two execution cycle models of the two architectures. A simulation results show that the longer memory access cycles when cache miss occurs, the total execution cycles of SVLIW processor would be shorter than those of VLIW processor.
PDF

Performance Study of Multi-core In-Order Superscalar Processor Architecture (멀티코어 순차 수퍼스칼라 프로세서의 성능 연구)

Lee, Jongbok
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.12 no.5
- /
- pp.123-128
- /
- 2012
In order to overcome the hardware complexity and performance limit problems, recently the multi-core architecture has been prevalent. For hardware simplicity, usually RISC processor is adopted as the unit core processor. However, if the performance of unit core processor is enhanced, the overall performance of the multi-core processor architecture can be further enhanced. In this paper, in-order superscalar processor is utilized as the core for the multi-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the number of superscalar cores between 2 and 16 and the window size of 4 to 16 extensively. As a result, the 16-core superscalar processor for the window size of 16 results in 8.4 times speed up over the single core superscalar processor. When compared with the same number of cores, the multi-core superscalar processor performance doubles that of the multi-core RISC processor.
https://doi.org/10.7236/JIWIT.2012.12.5.123 인용 PDF KSCI

A Design of Vector Processing Based 3D Graphics Geometry Processor (벡터 프로세싱 기반의 3차원 그래픽 지오메트리 프로세서 설계)

Lee, Jung-Woo;Kim, Ki-Chul
- Proceedings of the IEEK Conference
- /
- 2006.06a
- /
- pp.989-990
- /
- 2006
This paper presents a design of 3D Graphics Geometry processor. A geometry processor needs to cope with a large amount of computation and consists of transformation processor and lighting processor. To deal with the huge computation, a vector processing structure based on pipeline chaining is proposed. The proposed geometry processor performs 4.3M vertices/sec at 100MHz using 11 floating-point units.
PDF

Search Result 4,811, Processing Time 0.034 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)