Search | Korea Science

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
- Journal of the Korea Society of Computer and Information
- /
- v.16 no.10
- /
- pp.11-21
- /
- 2011
This paper introduces an SIMD(Single Instruction Multiple Data) based parallel processor that efficiently processes massive data inherent in multimedia. In addition, this paper implements MMX(MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes the performance of the MMX-type instructions. The reference data parallel processor consists of 16 processors each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieves a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieves 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia specific instructions including MMX-type have potentials for widely used many-core GPU(Graphics Processing Unit) and any types of parallel processors.
https://doi.org/10.9708/jksci.2011.16.10.011 인용 PDF KSCI

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

Kim, Cheol-Hong;Kim, Jong-Myon
- Journal of KIISE:Computer Systems and Theory
- /
- v.35 no.7
- /
- pp.305-317
- /
- 2008
As a mobile computing environment is rapidly changing, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and sire. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which considers that color components are perceptually less significant, supports parallel operations on two-packed compressed 16-bit YCbCr (6 bit Y and 5 bits Cb, Cr) data in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor performance. This is in contrast to MMX (a representative Intel's multimedia extensions), which achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves the performance and efficiency with a mere 3% increase in the system area and a 5% increase in the system power, while MMX requires a 14% increase in the system area and a 16% increase in the system power.
PDF KSCI

Implementation of Pixel Subword Parallel Processing Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 픽셀 서브워드 병렬처리 명령어 구현)

Jung, Yong-Bum;Kim, Jong-Myon
- The KIPS Transactions:PartA
- /
- v.18A no.3
- /
- pp.99-108
- /
- 2011
Processor technology is currently continued to parallel processing techniques, not by only increasing clock frequency of a single processor due to the high technology cost and power consumption. In this paper, a SIMD (Single Instruction Multiple Data) based parallel processor is introduced that efficiently processes massive data inherent in multimedia. In addition, this paper proposes pixel subword parallel processing instructions for the SIMD parallel processor architecture that efficiently operate on the image and video pixels. The proposed pixel subword parallel processing instructions store and process four 8-bit pixels on the partitioned four 12-bit registers in a 48-bit datapath architecture. This solves the overflow problem inherent in existing multimedia extensions and reduces the use of many packing/unpacking instructions. Experimental results using the same SIMD-based parallel processor architecture indicate that the proposed pixel subword parallel processing instructions achieve a speedup of $2.3{\times}$ over the baseline SIMD array performance. This is in contrast to MMX-type instructions (a representative Intel multimedia extension), which achieve a speedup of only $1.4{\times}$ over the same baseline SIMD array performance. In addition, the proposed instructions achieve $2.5{\times}$ better energy efficiency than the baseline program, while MMX-type instructions achieve only $1.8{\times}$ better energy efficiency than the baseline program.
https://doi.org/10.3745/KIPSTA.2011.18A.3.099 인용 PDF KSCI

Simulation of YUV-Aware Instructions for High-Performance, Low-Power Embedded Video Processors (고성능, 저전력 임베디드 비디오 프로세서를 위한 YUV 인식 명령어의 시뮬레이션)

Kim, Cheol-Hong;Kim, Jong-Myon
- Journal of KIISE:Computing Practices and Letters
- /
- v.13 no.5
- /
- pp.252-259
- /
- 2007
With the rapid development of multimedia applications and wireless communication networks, consumer demand for video-over-wireless capability on mobile computing systems is growing rapidly. In this regard, this paper introduces YUV-aware instructions that enhance the performance and efficiency in the processing of color image and video. Traditional multimedia extensions (e.g., MMX, SSE, VIS, and AltiVec) depend solely on generic subword parallelism whereas the proposed YUV-aware instructions support parallel operations on two-packed 16-bit YUV (6-bit Y, 5-bits U, V) values in a 32-bit datapath architecture, providing greater concurrency and efficiency for color image and video processing. Moreover, the ability to reduce data format size reduces system cost. Experiment results on a representative dynamically scheduled embedded superscalar processor show that YUV-aware instructions achieve an average speedup of 3.9x over the baseline superscalar performance. This is in contrast to MMX (a representative Intel#s multimedia extension), which achieves a speedup of only 2.1x over the same baseline superscalar processor. In addition, YUV-aware instructions outperform MMX instructions in energy reduction (75.8% reduction with YUV-aware instructions, but only 54.8% reduction with MMX instructions over the baseline).
PDF KSCI

Design of a Graphic Processor for Multimedia Data Processing (멀티미디어 데이타 처리를 위한 그래픽 프로세서 설계)

고익상;한우종;선우명동
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.36C no.10
- /
- pp.56-65
- /
- 1999
This paper presents an architecture and its instruction set for a graphic coprocessor(GCP) which can be used for a multimedia server. The proposed instruction set employs parallel architecture concepts, such as SIMD and Superscalar. GCP consists of a scheduler and four functional units. The scheduler solves an instruction bottleneck problem causing by sharing with four general processors(GPs). GCP can execute up to 4 instructions in parallel. It consists of about 56,000 gates and operates at 30 MHz clock frequency due to speed limitation of SOG technology. GCP meets the real-time DCT algorithm requirement of the CIF image format and can process up to 63 frames/sec for the DCT Algorithm and 21 frames/sec for the Full Block matching Algorithm of the CIF image format.
PDF

Multimedia Extension Instructions and Optimal Many-core Processor Architecture Exploration for Portable Ultrasonic Image Processing (휴대용 초음파 영상처리를 위한 멀티미디어 확장 명령어 및 최적의 매니코어 프로세서 구조 탐색)

Kang, Sung-Mo;Kim, Jong-Myon
- Journal of the Korea Society of Computer and Information
- /
- v.17 no.8
- /
- pp.1-10
- /
- 2012
This paper proposes design space exploration methodology of many-core processors including multimedia specific instructions to support high-performance and low power ultrasound imaging for portable devices. To explore the impact of multimedia instructions, we compare programs using multimedia instructions and baseline programs with a same many-core processor in terms of execution time, energy efficiency, and area efficiency. Experimental results using a $256{\times}256$ ultrasound image indicate that programs using multimedia instructions achieve 3.16 times of execution time, 8.13 times of energy efficiency, and 3.16 times of area efficiency over the baseline programs, respectively. Likewise, programs using multimedia instructions outperform the baseline programs using a $240{\times}320$ image (2.16 times of execution time, 4.04 times of energy efficiency, 2.16 times of area efficiency) as well as using a $240{\times}400$ image (2.25 times of execution time, 4.34 times of energy efficiency, 2.25 times of area efficiency). In addition, we explore optimal PE architecture of many-core processors including multimedia instructions by varying the number of PEs and memory size.
https://doi.org/10.9708/jksci.2012.17.8.001 인용 PDF KSCI

A Design of an Embedded Microprocessor with Variable Length Instruction Mode (가변길이 명령어 모드를 갖는 Embedded Microprocessor의 설계)

박기현;오민석;이광엽;한진호;김영수;배영환;조한진
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.41 no.4
- /
- pp.83-90
- /
- 2004
In this paper, we proposed a new instruction set(X32Y ISA) with 3 different types of instruction mode. The proposed instruction set organizes 32-bit, 24-bit, 16-bit instruction in order to solves a problem of memory size limitation in an embedded microprocessor. We designed a 32-bit 5 stage pipeline RISC microprocessor based on the X32V ISA. To verify the proposed the X32V ISA and a microprocessor, we estimated a program code size of multimedia application programs using a X32V simulator. In result, we verified that the Light mode and the Ultra Light mode obtains 8%, 27% reduction of a program code size through comparison with the Default mode. The proposed microprocessor was verified all X32V instructions execution at Xilinx FPGA with 33MHz operating frequency,
PDF KSCI

Design of High-speed H.264/AVC Parallel Decoder Using ASIP Approach (ASIP 기술을 활용한 H.264/AVC 고속 병렬 복호화기 설계)

Ji, Bong-Il;Sim, Dong-Gyu;Kim, Kyung-Su;Park, Seong-Mo
- Proceedings of the Korean Society of Broadcast Engineers Conference
- /
- 2009.11a
- /
- pp.251-254
- /
- 2009
본 논문에서는 고해상도 동영상의 실시간 복호화를 위하여 Application Specific Instruction-set Processor (ASIP)기술을 이용하여 H.264/AVC 고속 병렬 복호화기를 설계하였다. 우선, 하드웨어에 최적화된 구조로 복호화기를 설계하고 LISA로 기술한 멀티미디어 전용 명령어를 명령어 집합에 추가하였다. 이렇게 설계한 고속 H.264/AVC 복호화기는 사이클 기반 시뮬레이터에서 성능을 측정한 결과 기존 대비 약 35%의 복호화 사이클 감소를 보였다. 추가적인 성능 향상을 위해, 앞서 설계한 고속복호화기를 여러 개 사용하여 병렬 H.264/AVC 복호화기를 설계하였다. 병렬 복호화기는 여러 매크로블록을 동시에 복호화 처리함으로써 복호화기의 성능을 대폭 향상시켰다. 병렬 복호화기는 고속 복호화기 대비 약 75%의 복호화 사이클이 감소하였다. 이에 고해상도 동영상의 실시간 복호화를 위한 H.264/AVC 고속 병렬 복호화기의 설계 방법을 제시하고자 한다.
PDF

Implementation and Verification of a Multi-Core Processor including Multimedia Specific Instructions (멀티미디어 전용 명령어를 내장한 멀티코어 프로세서 구현 및 검증)

Seo, Jun-Sang;Kim, Jong-Myon
- IEMEK Journal of Embedded Systems and Applications
- /
- v.8 no.1
- /
- pp.17-24
- /
- 2013
In this paper, we present a multi-core processor including multimedia specific instructions to process multimedia data efficiently in the mobile environment. Multimedia specific instructions exploit subword level parallelism (SLP), while the multi-core processor exploits data level parallelism (DLP). These combined parallelisms improve the performance of multimedia processing applications. The proposed multi-core processor including multimedia specific instructions is implemented and tested using a Xilinx ISE 10.1 tool and SoCMaster3 testbed system including Vertex 4 FPGA. Experimental results using a fire detection algorithm show that multimedia specific instructions outperform baseline instructions in the same multi-core architecture in terms of performance (1.2x better), energy efficiency (1.37x better), and area efficiency (1.23x better).
https://doi.org/10.14372/IEMEK.2013.8.1.017 인용 PDF KSCI

An Active Prefetch Filtering Schemes using Exclusive Prefetch Cache (선인출 전용 캐시를 이용한 적극적 선인출 필터링 기법)

Chon Young-Suk;Kim Suk-il;Jeon Joong-nam
- The KIPS Transactions:PartA
- /
- v.12A no.1 s.91
- /
- pp.41-52
- /
- 2005
Memory reference instruction caused by cache miss is the critical factor that limits the processing power of processor. Cache prefetching technique is an effective way to reduce the latency due to memory access. However, excessively aggressive prefetch leads to cache pollution and finally to cancel out the advantage of prefetch. In this study, an active prefetch filtering scheme is introduced which dynamically decides whether to commence prefetching after referring a filtering table to reduce the cache pollution due to unnecessary prefetches. For the precision filtering, an evicted address referencing scheme has been proposed where the filter directly compares the current prefetch address with previous unnecessary prefetch addresses stored in filtering table. Moreover, a small sized exclusive prefetch cache has been introduced to increase the amount of eviction of unnecessarily prefetched addresses to enhance the accuracy of dynamic filtering. The exclusive prefetch cache also prevents useful demand data from being pushed out by prefetched data, while the evicted address direct referencing scheme enables the prefetch cache to keep most of useful prefetch data within its small size. Experimental results from commonly used general and multimedia benchmarks show that the average cache miss ratio has been decreased by $13.3{\%}$ by virtue of enhanced filtering accuracy compared with conventional schemes.
https://doi.org/10.3745/KIPSTA.2005.12A.1.041 인용 PDF KSCI

Search Result 12, Processing Time 0.02 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)