• Title/Summary/Keyword: Intel (인텔)

Implementation of Pixel Subword Parallel Processing Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 픽셀 서브워드 병렬처리 명령어 구현)

  • Jung, Yong-Bum;Kim, Jong-Myon
    • The KIPS Transactions:PartA / v.18A no.3 / pp.99-108 / 2011
  • Processor design has shifted toward parallel processing rather than relying only on higher single-processor clock frequencies, because of high technology cost and power consumption. This paper introduces a SIMD (Single Instruction Multiple Data) based parallel processor that efficiently processes the massive data inherent in multimedia. It also proposes pixel subword parallel processing instructions for the SIMD parallel processor architecture that operate efficiently on image and video pixels. The proposed instructions store and process four 8-bit pixels in four partitioned 12-bit registers in a 48-bit datapath architecture. This solves the overflow problem inherent in existing multimedia extensions and reduces the need for many packing/unpacking instructions. Experimental results on the same SIMD-based parallel processor architecture indicate that the proposed instructions achieve a speedup of $2.3{\times}$ over the baseline SIMD array performance, whereas MMX-type instructions (a representative Intel multimedia extension) achieve a speedup of only $1.4{\times}$ over the same baseline. In addition, the proposed instructions achieve $2.5{\times}$ better energy efficiency than the baseline program, while MMX-type instructions achieve only $1.8{\times}$ better energy efficiency.
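The headroom idea in the abstract above (four 8-bit pixels held in four 12-bit lanes of a 48-bit word) can be illustrated with a small software model. This is only a sketch under assumptions: the function names are hypothetical, and the abstract does not specify the actual instruction semantics, so the code shows only why the extra 4 bits per lane keep a lane sum from spilling into its neighbour.

```python
LANE_BITS = 12                      # lane width stated in the abstract
LANES = 4                           # four pixels per 48-bit word
LANE_MASK = (1 << LANE_BITS) - 1    # 0xFFF
WORD_MASK = (1 << (LANE_BITS * LANES)) - 1


def pack_pixels(pixels):
    """Place four 8-bit pixels into the four 12-bit lanes of one word."""
    assert len(pixels) == LANES and all(0 <= p <= 255 for p in pixels)
    word = 0
    for i, p in enumerate(pixels):
        word |= p << (i * LANE_BITS)
    return word


def unpack_lanes(word):
    """Return the raw 12-bit value held in each lane."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def add_words(a, b):
    """Add two packed words with one integer add.

    Each lane holds at most 255, so a lane sum is at most 510, which
    fits in 12 bits; no carry crosses into the neighbouring lane.
    """
    return (a + b) & WORD_MASK


def saturate_to_8bit(word):
    """Clamp each lane back to the 8-bit pixel range (hypothetical step)."""
    return [min(v, 255) for v in unpack_lanes(word)]


a = pack_pixels([200, 255, 10, 128])
b = pack_pixels([100, 255, 20, 128])
print(unpack_lanes(add_words(a, b)))      # intermediate sums keep full precision
print(saturate_to_8bit(add_words(a, b)))  # clamped only at the end
```

With 8-bit lanes the same addition would wrap or need unpacking to 16 bits first; the 12-bit lanes in this model defer clamping until the result is written back.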

Multi-Threaded Parallel H.264/AVC Decoder for Multi-Core Systems (멀티코어 시스템을 위한 멀티스레드 H.264/AVC 병렬 디코더)

  • Kim, Won-Jin;Cho, Keol;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.11 / pp.43-53 / 2010
  • Wide deployment of high-resolution video services has led to active study of high-speed video processing. In particular, the prevalence of multi-core systems has accelerated research on high-resolution video processing based on parallelization of multimedia software. In this paper, we propose a novel parallel H.264/AVC decoding scheme for a multi-core platform. Parallel H.264/AVC decoding is challenging not only because parallelization may incur significant synchronization overhead but also because the software may have complicated dependencies. To overcome these issues, we propose an approach called Multi-Threaded Parallelization (MTP). In MTP, a separate thread is allocated to each stage in the pipeline to reduce synchronization overhead. In addition, an efficient memory reuse technique is used to reduce the memory requirement. To verify the effectiveness of the proposed approach, we parallelized the FFmpeg H.264/AVC decoder with the proposed technique using OpenMP and carried out experiments on an Intel Quad-Core platform. The proposed design performs 53% better than the FFmpeg H.264/AVC decoder before parallelization. We also reduced memory usage by 65% and 81% for a high-definition (HD) and a full high-definition (FHD) video, respectively, compared with a popular existing method called 2Dwave.
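The thread-per-stage arrangement described above can be sketched generically as follows. This is a minimal illustration, not the MTP implementation: the abstract says OpenMP was used inside FFmpeg, whereas this sketch uses Python threads and queues, and `run_stage`, `run_pipeline`, and the stage functions are hypothetical names.

```python
import queue
import threading

_DONE = object()  # sentinel that tells a stage to shut down


def run_stage(func, inbox, outbox):
    """Worker loop for one pipeline stage: read, process, forward."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)   # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stage_funcs, items):
    """Run items through the stages, one dedicated thread per stage."""
    queues = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stage_funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Two placeholder stages standing in for decoder stages.
print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))
```

Because each stage has exactly one thread and the queues are FIFO, output order matches input order, and the only synchronization points are the queue hand-offs between adjacent stages.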

Lightweight Loop Invariant Code Motion for Java Just-In-Time Compiler on Itanium (Itanium상의 자바 적시 컴파일러를 위한 가벼운 루프 불변 코드 이동)

  • Yu Jun-Min;Choi Hyung-Kyu;Moon Soo-Mook
    • Journal of KIISE:Software and Applications / v.32 no.3 / pp.215-226 / 2005
  • Loop invariant code motion (LICM) optimization involves relatively heavy code analysis, so it is not readily applicable to Java Just-In-Time (JIT) compilation, where the JIT compilation time is part of the total running time. 'Classical' LICM optimization first analyzes the code and constructs both the def-use chains and the use-def chains, which are then used to perform code motion. This paper proposes a lightweight LICM algorithm that requires only the def-use chains of loop invariant code (without use-def chains) by exploiting the fact that the Java virtual machine is based on a stack machine and hence generates code with simpler patterns. We also propose two techniques that allow more code motion than classical LICM techniques. First, unlike previous JIT techniques that use LICM only in single-path loops for simplicity, we safely apply LICM to multi-path loops (natural loops) for partially redundant code. Second, we move loop-invariant, partially redundant null pointer check code using the predication support in Itanium. The proposed techniques were implemented in a JIT compiler for the Itanium processor on Intel's ORP (Open Runtime Platform) Java virtual machine. On the SPECjvm98 benchmarks, the proposed technique increases the JIT compilation overhead by a geometric mean of 1.3%, yet it improves the total running time by a geometric mean of 2.2%.
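For readers unfamiliar with LICM, the basic idea can be shown with a toy pass over three-address code. This is a simplified textbook-style sketch, not the lightweight algorithm proposed in the paper: the instruction format and `hoist_invariants` are hypothetical, and the pass assumes side-effect-free operations, a single-path loop body that executes at least once, and that every destination is defined before it is used inside the body.

```python
def hoist_invariants(loop_body):
    """Split a loop body into (hoisted, remaining) instruction lists.

    Each instruction is a tuple (dest, op, operands). An instruction is
    treated as loop-invariant when every operand is either not defined
    inside the loop or is defined by an already-hoisted instruction,
    and its destination is defined exactly once in the loop.
    """
    def_count = {}
    for dest, _, _ in loop_body:
        def_count[dest] = def_count.get(dest, 0) + 1
    defined_in_loop = set(def_count)

    invariant_dests = set()
    hoisted = []
    remaining = list(loop_body)

    changed = True
    while changed:                      # iterate until no more code moves
        changed = False
        for instr in list(remaining):
            dest, _, operands = instr
            operands_ok = all(
                v not in defined_in_loop or v in invariant_dests
                for v in operands
            )
            if operands_ok and def_count[dest] == 1:
                hoisted.append(instr)
                remaining.remove(instr)
                invariant_dests.add(dest)
                changed = True
    return hoisted, remaining


body = [
    ("t1", "mul", ("a", "b")),   # a, b are defined outside the loop
    ("t2", "add", ("t1", "c")),  # depends only on invariant t1 and c
    ("i", "add", ("i", 1)),      # induction variable, must stay
    ("s", "add", ("s", "t2")),   # accumulator, must stay
]
hoisted, remaining = hoist_invariants(body)
print(hoisted)
print(remaining)
```

A production pass also has to prove that hoisting is safe with respect to exceptions and multiple loop paths, which is where the analysis cost discussed in the abstract comes from.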

A Study on Improved Image Matching Method using the CUDA Computing (CUDA 연산을 이용한 개선된 영상 매칭 방법에 관한 연구)

  • Cho, Kyeongrae;Park, Byungjoon;Yoon, Taebok
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.4 / pp.2749-2756 / 2015
  • As data quality increases, image processing becomes more time-consuming, and image processing algorithms need to be accelerated. This paper compares the computing speed and performance gains of a CUDA (Compute Unified Device Architecture) based recognition system against a traditional CPU implementation and OpenMP. For character recognition, the system matches input character data against learned character data; an image matching method was implemented in an environment where the character regions are well recognized and the learned English alphabet character images have a constant font and a standardized size. Using OpenMP on the four cores of an Intel i5 2500, the speedup over the existing CPU implementation did not reach four times because of delays in the data partition and merge operations, and was about 3.2 times. With CUDA, a GPGPU (General Purpose GPU) programming platform, parallel processing on the video card was confirmed to give a performance gain of about 21 times over sequential CPU-based processing.

Integrated Parallelization of Video Decoding on Multi-core Systems (멀티코어 시스템에서의 통합된 비디오 디코딩 병렬화)

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.49 no.7 / pp.39-49 / 2012
  • Demand for high-resolution video services has led to active study of high-speed video processing. In particular, the widespread deployment of multi-core systems has accelerated research on high-resolution video processing based on parallelization of multimedia software. Previously proposed parallelization approaches could improve decoding performance. However, some parallelization methods did not consider entropy decoding, and others parallelized only part of the decoding process. Therefore, we consider parallel entropy decoding integrated with the other parallel video decoding processes on a multi-core system. We propose a novel parallel decoding method called Integrated Parallelization (IP), together with a method for optimizing the parallelization of video decoding on a multi-core system with many cores. We parallelized the KTA 2.7 decoder with the proposed technique on an Intel i7 Quad-Core platform with Intel Hyper-Threading technology and multi-thread scheduling. We achieved up to 70% performance improvement using the IP method.

A Study on the Implementation of PC Interface for Packet Terminal of ISDN (ISDN 패킷 단말기용 PC 접속기 구현에 관한 연구)

  • 조병록;박병철
    • The Journal of Korean Institute of Communications and Information Sciences / v.16 no.12 / pp.1336-1347 / 1991
  • In this paper, a PC interface for an ISDN packet terminal is designed and implemented in order to build packet communication networks that share computer resources and exchange information between computers in the ISDN environment. The PC interface consists of an S interface handler part, which controls the functions of ISDN layer 1 and layer 2, and a packet handler part, which controls the services of the X.25 protocol at the packet level. ISDN layer 1 provides the rules for electrical and mechanical characteristics and services for ISDN layer 2. ISDN layer 2 provides the LAPD procedure and services for X.25. X.25 specifies the interface between DCE and DTE for terminals operating in packet mode. The S interface handler part is built from Am 79C30 ICs manufactured by Advanced Micro Devices. The ISDN packet handler part is built around an AmZ8038 FIFO for the D channel. The common signal procedure for the D channel is controlled by Intel's 8086 microprocessor. The S interface handler part, which is based on ISDN layers 1 and 2, is controlled through a mailbox for communication between layers. The ISDN packet handler part is based on modules at the X.25 level. Communication between the S interface handler part and the ISDN packet handler part is handled by an interface controller.

An Efficient WLAN Device Power Control Technique for Streaming Multimedia Contents over Mobile IP Storage (모바일 IP 스토리지 상에서 멀티미디어 컨텐츠 실행을 위한 효율적인 무선랜 장치 전력제어 기법)

  • Nam, Young-Jin;Choi, Min-Seok
    • The KIPS Transactions:PartA / v.16A no.5 / pp.357-368 / 2009
  • Mobile IP storage has been proposed to overcome the storage limitations of flash memory and hard disks. It provides storage with almost no capacity limit for mobile devices over wireless IP networks. However, the battery lifetime of mobile devices drops rapidly because of the power consumed by continuous use of the WLAN device while multimedia contents are streamed through the mobile IP storage. This paper proposes an energy-efficient WLAN device power control technique for streaming multimedia contents with mobile IP storage. The proposed technique consists of a prefetch buffer input/output module, a WLAN device power control module, and a reconfigurable prefetch buffer module. It adaptively determines the size of the prefetch buffer according to the quality of the multimedia contents, and it dynamically controls the power mode of the WLAN device through power on-off operations while streaming the multimedia contents. We evaluate the performance of the proposed technique on a PXA270-based mobile device that employs embedded Linux 2.6.11, Intel iSCSI reference code, and a WLAN device. Extensive experiments reveal that the proposed technique can reduce the energy consumption of the WLAN device by up to 8.5 times with QVGA multimedia contents, compared with no power control.

Early Null Pointer Check using Predication in Java Just-In-Time Compilation (자바 적시 컴파일에서의 조건 수행을 이용한 비어 있는 포인터의 조기검사)

  • Lee Sanggyu;Choi Hyug-Kyu;Moon Soo-Mook
    • Journal of KIISE:Software and Applications / v.32 no.7 / pp.683-692 / 2005
  • The Java specification states that every access to an object must be checked at runtime to determine whether the object reference is null. Since Java is an object-oriented language, object accesses are frequent enough that null pointer checks affect performance significantly. To reduce this performance degradation, there have been attempts to remove redundant null pointer checks. For example, in a Java environment where a just-in-time (JIT) compiler is used, the JIT compiler removes redundant null pointer check code via code analysis. This paper proposes a technique that removes additional null pointer check code that previous JIT compilation techniques could not remove, via early null pointer checks using an architectural feature called predication. Generally, null pointer check code consists of two instructions: a compare and a branch. Our idea is to move the compare instruction, which is usually located just before a use of an object, to the point right after the object is defined, so that the total number of compare instructions is reduced. This reduces dynamic and static compare instructions by 3.21$\%$ and 1.98$\%$, respectively, in the SPECjvm98 benchmarks, compared to code that has already been optimized by previous null pointer check elimination techniques. The performance impact on an Itanium machine is an improvement of 0.32$\%$.

Low Power EccEDF Algorithm for Real-Time Operating Systems (실시간 운영체제를 위한 저전력 EccEDF 알고리듬)

  • Lee, Min-Seok;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.15 no.1 / pp.31-43 / 2015
  • For battery-based real-time embedded systems, high performance to meet real-time constraints and energy efficiency to extend battery life are both essential. Real-Time Dynamic Voltage Scaling (RT-DVS) has been a key technique for satisfying both requirements. In this paper, we present an efficient RT-DVS algorithm called EccEDF, which is based on ccEDF. ccEDF overlooks the time needed to run the task when calculating the available slack; the proposed algorithm precisely calculates the maximum unused utilization by taking the elapsed time into account, while keeping the structural simplicity of ccEDF. The maximum unused utilization is calculated by dividing the remaining execution time ($C_i-cc_i$) by the remaining time ($P_i-E_i$) on completion of the task, and this is proved using the fluid scheduling model. We also show that the algorithm outperforms ccEDF in practical applications modelled using a PXA250 and a 0.28V-to-1.2V wide-operating-range IA-32 processor model.
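The ratio stated in the abstract above can be worked through numerically. The sketch below is an assumption-laden illustration: the abstract labels only "remaining execution time" ($C_i-cc_i$) and "remaining time" ($P_i-E_i$), so reading $C_i$ as the worst-case execution time, $cc_i$ as the time actually used, $P_i$ as the period, and $E_i$ as the elapsed time at completion is an interpretation, and the comparison function that divides by the full period is a hypothetical contrast rather than a quotation of ccEDF.

```python
def unused_utilization(c, cc, p, e):
    """Ratio from the abstract: (C_i - cc_i) / (P_i - E_i).

    c  : assumed worst-case execution time C_i
    cc : assumed execution time actually consumed cc_i
    p  : assumed task period P_i
    e  : assumed elapsed time E_i when the task completes
    """
    if not (0 <= cc <= c and 0 <= e < p):
        raise ValueError("expected 0 <= cc <= c and 0 <= e < p")
    return (c - cc) / (p - e)


def unused_utilization_full_period(c, cc, p):
    """Hypothetical contrast: same slack spread over the whole period,
    i.e. ignoring the time already elapsed."""
    return (c - cc) / p


# Example: budget 10, used 4, period 20, completed 8 time units in.
print(unused_utilization(10, 4, 20, 8))           # slack over the remaining 12 units
print(unused_utilization_full_period(10, 4, 20))  # slack over all 20 units
```

In this example the elapsed-time-aware ratio is larger, which matches the abstract's point that accounting for elapsed time exposes more reclaimable utilization.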

A Security Nonce Generation Algorithm Scheme Research for Improving Data Reliability and Anomaly Pattern Detection of Smart City Platform Data Management (스마트시티 플랫폼 데이터 운영의 이상패턴 탐지 및 데이터 신뢰성 향상을 위한 보안 난수 생성 알고리즘 방안 연구)

  • Lee, Jaekwan;Shin, Jinho;Joo, Yongjae;Noh, Jaekoo;Kim, Jae Do;Kim, Yongjoon;Jung, Namjoon
    • KEPCO Journal on Electric Power and Energy / v.4 no.2 / pp.75-80 / 2018
  • The smart city develops an efficient energy system through common management of city resources, for growth and a low-carbon society. However, the smart city cannot effectively verify anomaly pattern detection when existing security technology (authentication, integrity, confidentiality) is used with a fixed security key and key deodorization according to the generated big data. This paper proposes the "security nonce generation based on security nonce generation" for detecting the anomaly patterns of an adversary and for improving key safety through key generation by the KDC (Key Distribution Center). In the proposed scheme, the KDC distributes the generated security nonce and authentication keys to each facility system. The scheme can enhance security by performing external pattern detection and changing to a new security key through the distributed security nonce and keys. Therefore, this paper can improve the security and reliability of smart city platform management data through anomaly pattern detection and key safety.