Search | Korea Science

A Performance Evaluation of Parallel Color Conversion based on the Thread Number on Multi-core Systems (멀티코어 시스템에서 쓰레드 수에 따른 병렬 색변환 성능 검증)

Kim, Cheong Ghil
Journal of Satellite, Information and Communications, v.9 no.4, pp.73-76, 2014
With the increasing popularity of multi-core processors, they have been adopted even in embedded systems. Under these circumstances, many multimedia applications can be parallelized on multi-core platforms, because they usually require heavy computation and extensive memory access. This paper proposes an efficient thread-level parallel implementation of color space conversion on multi-core CPUs. Thread-level parallelism has become a very useful parallel processing paradigm, especially on shared-memory computing systems. In this work, it is exploited by allocating different input pixels to each thread for concurrent loop execution. For the performance evaluation, this paper measures the performance improvement of color conversion on multi-core processors by comparing the processing speed of the serial implementation with that of the parallel ones. The results show that the thread-level parallel implementations achieve similar overall performance-improvement ratios across different multi-core processors.
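
The abstract describes assigning different input pixels to each thread for concurrent loop execution. A minimal Python sketch of that idea follows. The conversion matrix (full-range BT.601 RGB to YCbCr, as used by JPEG/JFIF), the contiguous block partition, and the function names are assumptions, since the abstract specifies neither the target color space nor how pixels are divided. CPython's global interpreter lock keeps pure-Python loops like this from running on several cores at once, so the sketch shows the work partitioning rather than the speedup; a C implementation using OpenMP or pthreads would be the natural way to measure scaling.

```python
from concurrent.futures import ThreadPoolExecutor

def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (assumed BT.601/JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each component back into the 8-bit range.
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))

def convert_serial(pixels):
    """Reference implementation: one loop over all pixels."""
    return [rgb_to_ycbcr(*p) for p in pixels]

def _convert_range(src, dst, start, end):
    # Each worker writes only its own [start, end) slice, so no lock is needed.
    for i in range(start, end):
        dst[i] = rgb_to_ycbcr(*src[i])

def convert_parallel(pixels, num_threads=4):
    """Split the pixel array into contiguous blocks, one block per task."""
    n = len(pixels)
    out = [None] * n
    chunk = max(1, -(-n // num_threads))  # ceiling division, at least 1
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(_convert_range, pixels, out, s, min(s + chunk, n))
                   for s in range(0, n, chunk)]
        for f in futures:
            f.result()  # propagate any exception raised in a worker
    return out
```

Because every pixel is converted independently, the parallel result must match the serial one exactly, which makes `convert_serial` a convenient correctness reference when experimenting with the thread count.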

Enhanced Memory Allocator for Scalability Improvement On Multicore (멀티코어 환경에서의 확장성 향상을 위한 메모리 할당자)

Cho, Youngjoong;Kim, Inhyuk;Eom, Young Ik
Proceedings of the Korea Information Processing Society Conference, 2013.05a, pp.164-165, 2013
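Multithreaded programs are used to improve system parallelism on multiprocessors. Such programs divide the work among threads according to their roles, and one common organization is the producer-consumer structure. Existing memory allocators have not been studied for the producer-consumer structure, and they use locks with long critical sections, which causes performance problems. We solved this problem with a distinctive memory-release method and verified experimentally that it improves the speed of the memory allocator.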
https://doi.org/10.3745/PKIPS.y2013m05a.164
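
The abstract does not describe the paper's memory-release method. As one illustration of how an allocator can avoid long critical sections in a producer-consumer pattern, the hypothetical sketch below gives each heap a lock-free fast path for its owner thread and a separate remote-free list for blocks released by other threads; the owner reclaims the whole remote list with a single list swap. This is a general technique, not necessarily the authors' design, and `ThreadHeap`, `run_demo`, the block size, and the queue depth are illustrative names and values.

```python
import queue
import threading

class ThreadHeap:
    """Hypothetical per-thread heap with an owner fast path and a remote-free list."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.owner = threading.get_ident()   # the creating thread owns this heap
        self.free_list = []                  # touched only by the owner: no lock
        self.remote_frees = []               # blocks released by other threads
        self.remote_lock = threading.Lock()  # protects remote_frees only
        self.fresh_blocks = 0                # blocks obtained from the "system"

    def allocate(self):
        """Called by the owner thread only."""
        if not self.free_list:
            # Reclaim every remotely freed block at once; the critical
            # section is a single list swap, not a walk over the heap.
            with self.remote_lock:
                self.free_list, self.remote_frees = self.remote_frees, []
        if self.free_list:
            return self.free_list.pop()
        self.fresh_blocks += 1               # stands in for requesting memory from the OS
        return bytearray(self.block_size)

    def free(self, block):
        if threading.get_ident() == self.owner:
            self.free_list.append(block)     # owner fast path, no lock
        else:
            with self.remote_lock:           # other threads: one short append
                self.remote_frees.append(block)

def run_demo(num_items=1000, queue_depth=8):
    """Producer allocates and fills blocks; consumer reads and frees them."""
    q = queue.Queue(maxsize=queue_depth)
    result = {}

    def producer():
        heap = ThreadHeap(block_size=64)
        result["heap"] = heap
        for i in range(num_items):
            block = heap.allocate()
            block[0] = i % 256
            q.put((heap, block))
        q.put(None)                          # end-of-stream marker

    def consumer():
        total = 0
        while (item := q.get()) is not None:
            heap, block = item
            total += block[0]
            heap.free(block)                 # remote free: not the owner thread
        result["total"] = total

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"], result["heap"].fresh_blocks
```

Since at most `queue_depth` blocks wait in the queue and the consumer holds at most one more, the producer never needs more than `queue_depth + 2` fresh blocks; every other allocation is served by recycled memory.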

The Pixel Shading on Multi Core GP-GPU with Dual Phase Architecture (듀얼 페이즈 구조의 멀티 코어 GP-GPU를 이용한 픽셀 셰이딩)

Kim, Jun-Seo;Park, Tae-Ryong;Lee, Kwang-Yeob
Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2010.10a, pp.339-342, 2010
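As processors have recently reached the limits of clock-speed scaling, methods based on multi-core parallel processing have been proposed to improve processor performance. This paper designs an efficient pixel-shading method for multi-core, multi-thread environments using a GP-GPU (General Purpose Graphics Processing Unit) that has a MIMD (Multiple Instruction, Multiple Data) structure, in which several functional units can be used simultaneously in one instruction cycle, and that allocates multi-core and multi-thread work with a Scratch Counter. For linear-fog pixel shading, a single core achieved 18.3 FPS, while the 4-core GP-GPU achieved 73.2 FPS, a fourfold increase.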
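
Linear fog, the shading workload used in the evaluation, blends each pixel's color with a fog color using a factor that falls linearly with depth, f = (end - z) / (end - start), clamped to [0, 1]. The sketch below pairs that formula with a shared counter from which each core fetches the next tile of pixels. `WorkCounter` is only a hypothetical stand-in for the paper's Scratch Counter, whose actual mechanism the abstract does not describe, and the tile size, core count, and function names are assumptions. For reference, the reported figures correspond to a speedup of 73.2 / 18.3 = 4.0, i.e. linear scaling across the four cores.

```python
import threading

def linear_fog(color, depth, fog_color, fog_start, fog_end):
    """Blend one RGB color with the fog color using the linear fog factor."""
    f = (fog_end - depth) / (fog_end - fog_start)   # 1 at fog_start, 0 at fog_end
    f = max(0.0, min(1.0, f))
    return tuple(f * c + (1.0 - f) * fc for c, fc in zip(color, fog_color))

class WorkCounter:
    """Hypothetical stand-in for a Scratch Counter: hands out tile indices in order."""

    def __init__(self, total):
        self._next = 0
        self._total = total
        self._lock = threading.Lock()

    def fetch(self):
        with self._lock:
            if self._next >= self._total:
                return None                  # no work left
            idx = self._next
            self._next += 1
            return idx

def shade_frame(colors, depths, fog_color, fog_start, fog_end, num_cores=4, tile=64):
    """Apply linear fog to every pixel; each core repeatedly fetches the next tile."""
    n = len(colors)
    out = [None] * n
    counter = WorkCounter((n + tile - 1) // tile)

    def core():
        while (t := counter.fetch()) is not None:
            for i in range(t * tile, min((t + 1) * tile, n)):
                out[i] = linear_fog(colors[i], depths[i], fog_color, fog_start, fog_end)

    workers = [threading.Thread(target=core) for _ in range(num_cores)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return out
```

Handing out work through a counter, rather than fixing each core's pixels in advance, lets a core that finishes early simply take the next tile, which is one way to keep all cores busy.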

A Simulator for Performance Evaluation of Multithreaded Memory Allocation Operation in Multi-Core Environment (멀티코어 환경에서의 멀티스레드 기법을 이용한 메모리 할당 연산의 성능 평가를 위한 시뮬레이터)

Kim, Ho-Young;Huang, Dada;Han, Sang-Hyuck;Kim, Young-Kuk
Proceedings of the Korean Information Science Society Conference, 2012.06a, pp.245-247, 2012
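The use of multi-core processors has recently become widespread. On a multi-core system, software gains performance when it uses several cores at the same time; that is, performance improves when multithreaded programming lets a single program use multiple cores simultaneously. In this environment, efficient memory allocation is critical for applications such as desktop, server, and scientific software. However, dynamic memory allocation incurs overhead from the allocation and deallocation operations themselves and from the synchronization needed when one thread accesses another thread's heap, and this overhead degrades performance. A tool is therefore needed to measure how much these factors actually affect performance. This paper designs and implements MAES (Memory Allocation Evaluation Simulator), a simulator that measures and evaluates how memory allocation operations affect performance when multithreading is used in a multi-core environment.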
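
MAES's internals are not described in the abstract. A minimal sketch of the kind of measurement such a simulator performs, sweeping the thread count and timing an allocate-and-release workload in every thread, might look like the following. The workload shape (a sliding window of `live_limit` live buffers), the parameter values, and all function names are hypothetical, and under CPython the numbers reflect its own allocator and global interpreter lock rather than a native `malloc`.

```python
import threading
import time
from collections import deque

def alloc_free_workload(ops, size, live_limit):
    """One thread's workload: allocate `ops` buffers, keeping at most `live_limit` alive."""
    live = deque(maxlen=live_limit)  # appending beyond maxlen drops (frees) the oldest
    for _ in range(ops):
        live.append(bytearray(size))
    return ops

def measure(num_threads, ops_per_thread=10_000, size=256, live_limit=64):
    """Run the workload in `num_threads` threads and report throughput."""
    done = [0] * num_threads

    def worker(k):
        done[k] = alloc_free_workload(ops_per_thread, size, live_limit)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    total = sum(done)
    return {"threads": num_threads, "ops": total, "seconds": elapsed,
            "ops_per_sec": total / elapsed if elapsed > 0 else float("inf")}

def sweep(thread_counts=(1, 2, 4, 8), **kwargs):
    """Repeat the measurement for several thread counts."""
    return [measure(n, **kwargs) for n in thread_counts]
```

Comparing `ops_per_sec` across the sweep shows whether allocation throughput grows, stays flat, or drops as threads are added, which is the question the abstract says such a tool should answer.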

Method of Multi Thread Management based on Shader Instruction for Mobile GPGPU (GPGPU를 위한 쉐이더 명령어기반 멀티 스레드 관리 기법)

Lee, Kwang-Yeob;Park, Tae-Ryong
Journal of IKEEE, v.16 no.4, pp.310-315, 2012
This paper designs a multithreaded mobile GPGPU optimized for the mobile environment and verifies an effective thread management method for the multithreaded mobile processor. Thread management uses no dedicated management hardware; it is implemented with software instructions. To verify the method, a lane detection algorithm was implemented to compare the thread management efficiency of nVidia's CUDA architecture and the designed GPGPU, with the number of threads normalized to 48. The implemented lane detection algorithm consists of a Gaussian filter and Sobel edge detection. As a result, the thread efficiency of the designed GPGPU is up to twice that of CUDA.
https://doi.org/10.7471/ikeee.2012.16.4.310
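
The lane detection workload consists of a Gaussian filter followed by Sobel edge detection. The single-threaded Python sketch below shows those two stages on a grayscale image stored as a list of rows; the 3x3 kernel sizes and the replicated-border handling are assumptions not stated in the abstract. Each output pixel depends only on a 3x3 neighborhood of the previous stage's output, which is why this workload maps naturally onto one GPU thread per pixel.

```python
import math

GAUSSIAN_3X3 = [[1 / 16, 2 / 16, 1 / 16],
                [2 / 16, 4 / 16, 2 / 16],
                [1 / 16, 2 / 16, 1 / 16]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply_3x3(img, kernel):
    """Slide a 3x3 kernel over the image (correlation form), replicating border pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                yy = min(max(y + ky - 1, 0), h - 1)      # clamp row index
                for kx in range(3):
                    xx = min(max(x + kx - 1, 0), w - 1)  # clamp column index
                    acc += kernel[ky][kx] * img[yy][xx]
            out[y][x] = acc
    return out

def edge_magnitude(gray):
    """Gaussian smoothing followed by the Sobel gradient magnitude."""
    smooth = apply_3x3(gray, GAUSSIAN_3X3)
    gx = apply_3x3(smooth, SOBEL_X)
    gy = apply_3x3(smooth, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]
```

On an image with a vertical step from 0 to 100, the magnitude peaks at the two columns on either side of the step and is zero in the flat regions, which is the property a lane detector thresholds on.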

Frame Partition based Parallelization of H.264/AVC decoder (프레임 분할 기반 병렬화 H.264/AVC 디코더)

Kim, Won-Jin;Park, Joo-Yul;Chung, Ki-Seok
Proceedings of the Korean Society of Broadcast Engineers Conference, 2010.07a, pp.252-255, 2010
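As high-resolution video services have become widespread, research on fast video processing has been very active. With the growing use of multi-core processors, various parallelization methods have been proposed for implementing the H.264/AVC decoder on multi-core systems. However, when the H.264/AVC decoder is parallelized, differences in the processing time of the data handled by each thread make it necessary to synchronize the threads, and this becomes an obstacle to the performance gained through parallelization. Taking these problems into account, we studied how to parallelize the H.264/AVC decoder effectively. Our proposed Frame Partition based Parallelization (FPP) method divides a frame into groups of macroblocks and processes them in parallel. We also improved performance by refining how threads are handled during parallelization. The experiments used the FFmpeg H.264/AVC decoder, implemented with multiple threads on an Intel quad-core system. Applying FPP yielded up to a 53% performance improvement over the H.264/AVC decoder before parallelization.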
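
FPP divides each frame into groups of macroblocks for parallel processing. The abstract does not say how the groups are formed, so the sketch below assumes contiguous bands of 16x16 macroblock rows, one band per thread, with a join at the end of each frame as the synchronization point. `decode_rows` is a hypothetical callback standing in for the real FFmpeg decoding routines. Real H.264 intra prediction and deblocking use data from the macroblock row above, so a practical decoder must also synchronize at band boundaries, which this sketch omits.

```python
import threading

MB_SIZE = 16  # H.264 macroblocks cover 16x16 luma samples

def mb_rows(height):
    """Number of macroblock rows needed to cover `height` pixels (rounded up)."""
    return (height + MB_SIZE - 1) // MB_SIZE

def partition_rows(height, num_threads):
    """Split macroblock rows into contiguous [start, end) bands, as evenly as possible."""
    base, extra = divmod(mb_rows(height), num_threads)
    bands, start = [], 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)  # first `extra` bands get one more row
        bands.append((start, start + size))
        start += size
    return bands

def decode_frame_parallel(height, num_threads, decode_rows):
    """Run the hypothetical decode_rows(start, end) on each band in its own thread."""
    threads = [threading.Thread(target=decode_rows, args=band)
               for band in partition_rows(height, num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # frame-level synchronization point
```

For a 1080-line frame this gives 68 macroblock rows (1088 coded lines), i.e. four bands of 17 rows on a quad-core machine.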

Embedded Multithreading Processor Architecture for Personal Information Devices (개인용 정보 단말장치를 위한 내장형 멀티스레딩 프로세서 구조)

Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
Journal of the Institute of Electronics Engineers of Korea SD, v.47 no.9, pp.7-13, 2010
In this paper, we propose a processor architecture suitable for next-generation embedded applications, especially personal information devices such as smartphones and tablet PCs. Recent high-performance embedded processors have been developed to achieve high clock speeds. Because further performance increases make design more difficult and incur large overhead, architectural evolution in the embedded processor field is necessary. Among more advanced processor types, the out-of-order superscalar processor is not a suitable candidate for embedded applications because of its excessive complexity and its relatively low performance gain compared with its overhead. Therefore, a new architecture of moderate complexity must be designed. In this paper, we developed a low-cost SMT architecture model and compared its performance with other architectures, including scalar, superscalar, and multiprocessor designs. Because current personal information devices tend to execute multiple tasks simultaneously, SMT or CMP can be a good choice. Our simulation results show that SMT is the most efficient of the architectures considered.
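
To make the scalar-versus-SMT comparison concrete, the toy model below counts cycles for a set of threads, each a chain of dependent instructions with given latencies. A scalar core runs the threads one after another and waits out every latency, while a simultaneous-multithreading core can issue ready instructions from different threads in the same cycle, filling stall cycles with other threads' work. The model, its latency values, and its function names are illustrative assumptions; it ignores caches, structural hazards, and branches, and it is not the simulator used in the paper.

```python
def sequential_cycles(threads):
    """Scalar core running threads back to back; each instruction waits for the previous."""
    return sum(sum(latencies) for latencies in threads)

def smt_cycles(threads, issue_width):
    """Toy SMT core: each cycle, issue up to `issue_width` ready instructions,
    at most one per thread (each thread is a dependency chain)."""
    n = len(threads)
    pc = [0] * n      # index of each thread's next instruction
    ready = [0] * n   # earliest cycle at which that instruction may issue
    finish = 0
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        issued = 0
        for k in range(n):
            if issued == issue_width:
                break
            t = (cycle + k) % n  # rotate priority so no thread starves
            if pc[t] < len(threads[t]) and ready[t] <= cycle:
                ready[t] = cycle + threads[t][pc[t]]
                finish = max(finish, ready[t])
                pc[t] += 1
                issued += 1
        cycle += 1
    return finish
```

With two threads of two 3-cycle instructions each, the scalar core needs 12 cycles, a single-issue multithreaded core 7, and a 2-wide SMT core 6, because the second thread's work overlaps the first thread's stalls.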

A Design of a Shader Processor based on a dual-phase pipeline architecture (듀얼 페이즈 명령어 파이프라인구조의 쉐이더 프로세서 설계)

Jeong, Hyung-Ki;Nam, Ki-Hun;Lee, Gwang-Yeob
Journal of IKEEE, v.12 no.4, pp.246-254, 2008
This paper presents the design of a 4-way SIMD processor with multithreading and a dual-phase instruction pipeline. Eight threads are executed in round-robin order, so no pipeline hazards can occur. The dual-phase pipeline lets one pipeline operate as two, and it can fetch up to four unit instructions at once. The variable-length instruction set is divided into first-phase and second-phase instructions; with this feature, complex branches and addressing can be executed in one clock cycle. The processor reduces code size to one quarter and doubles performance compared with a conventional SIMD architecture.
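
The hazard-freedom claim follows from interleaving: with 8 threads issued in strict round-robin order, consecutive instructions from the same thread are always 8 cycles apart, so if the pipeline has at most 8 stages, a thread's previous instruction has left the pipeline before its next one enters. The short check below verifies that reasoning. The pipeline depth is a hypothetical parameter, because the abstract does not state it, and the model assumes every thread is always ready to issue.

```python
def round_robin_issue(num_threads, cycles):
    """Thread that issues in each cycle under strict round-robin, one instruction per cycle."""
    return [c % num_threads for c in range(cycles)]

def min_reissue_gap(schedule):
    """Smallest number of cycles between two consecutive issues of the same thread."""
    last, gap = {}, None
    for cycle, t in enumerate(schedule):
        if t in last:
            d = cycle - last[t]
            gap = d if gap is None else min(gap, d)
        last[t] = cycle
    return gap

def hazard_free(num_threads, pipeline_depth, cycles=100):
    # An instruction issued at cycle c occupies the pipeline during cycles
    # c .. c + depth - 1; if the same thread's next instruction issues at
    # c + gap with gap >= depth, the two never overlap, so no data hazard arises.
    gap = min_reissue_gap(round_robin_issue(num_threads, cycles))
    return gap is not None and gap >= pipeline_depth
```

The same check shows why the thread count matters: with only 4 threads, an 8-stage pipeline would again need forwarding or stall logic.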

The Implementation of Real-time Performance Monitor for Multi-thread Application (멀티스레드 어플리케이션을 위한 실시간 성능모니터의 구현)

Kim, Jin-Hyuk;Shin, Kwang-Sik;Yoon, Wan-Oh;Lee, Chang-Ho;Choi, Sang-Bang
Journal of the Institute of Electronics Engineers of Korea CI, v.48 no.3, pp.82-90, 2011
Multi-core systems are becoming more common with the development of microprocessors. This change in the performance-improvement paradigm is driving a shift from conventional single-threaded applications to multithreaded ones. Because multithreaded applications are complex to develop, performance monitoring tools are used to optimize their performance. Conventional performance monitoring tools focus on performance itself rather than on user-friendliness or real-time support. A real-time performance monitor can identify problems while a multithreaded application is running and can also check the application's operating status in real time, so it can be a more effective tool for finding the cause of a problem than a non-real-time monitor that offers only simple performance indicators. In this paper, we propose RMPM (Real-time Multi-core Performance Monitor), a real-time performance monitoring tool for multi-core systems. The observation period is optimized by comparing the overhead caused by the evaluation period with the resulting accuracy. Our performance monitor shows not only the CPU, memory, and network usage of the whole system but also the distribution of overhead across the threads of an application.
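
Two calculations sit at the core of such a monitor: converting successive snapshots of cumulative per-thread CPU time into utilization over one observation period, and expressing each thread's share of the application's total. The sketch below assumes a caller-supplied `read_snapshot()` function returning a dict from thread id to cumulative CPU seconds (on Linux, for example, per-thread counters are exposed under `/proc/<pid>/task/`); the names and interface are hypothetical, not RMPM's. The `interval_s` parameter reflects the trade-off the abstract mentions: a shorter period tracks changes more closely but invokes the reader, and pays its overhead, more often.

```python
import time

def utilization(prev, curr, interval_s):
    """Per-thread CPU utilization (%) between two cumulative CPU-time snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Threads that appeared since the previous snapshot start from zero.
    return {tid: 100.0 * (cpu - prev.get(tid, 0.0)) / interval_s
            for tid, cpu in curr.items()}

def thread_shares(usage):
    """Fraction of the application's CPU time attributable to each thread."""
    total = sum(usage.values())
    return {tid: (u / total if total else 0.0) for tid, u in usage.items()}

def monitor(read_snapshot, interval_s, samples, sleep=time.sleep):
    """Collect `samples` utilization readings, one per observation period.

    Uses the nominal period as the elapsed time; a real monitor would measure
    the actual wall-clock interval between snapshots.
    """
    history = []
    prev = read_snapshot()
    for _ in range(samples):
        sleep(interval_s)
        curr = read_snapshot()
        history.append(utilization(prev, curr, interval_s))
        prev = curr
    return history
```

Injecting `sleep` and `read_snapshot` keeps the sampling logic testable without real threads, and makes it easy to compare several observation periods against the same workload.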

Multi-Threaded Parallel H.264/AVC Decoder for Multi-Core Systems (멀티코어 시스템을 위한 멀티스레드 H.264/AVC 병렬 디코더)

Kim, Won-Jin;Cho, Keol;Chung, Ki-Seok
Journal of the Institute of Electronics Engineers of Korea SD, v.47 no.11, pp.43-53, 2010
Wide deployment of high-resolution video services has led to active research on high-speed video processing. In particular, the prevalent use of multi-core systems has accelerated research on high-resolution video processing based on the parallelization of multimedia software. In this paper, we propose a novel parallel H.264/AVC decoding scheme for multi-core platforms. Parallel H.264/AVC decoding is challenging not only because parallelization may incur significant synchronization overhead but also because the software may have complicated dependencies. To overcome these issues, we propose a novel approach called Multi-Threaded Parallelization (MTP). In MTP, a separate thread is allocated to each stage of the pipeline to reduce synchronization overhead. In addition, an efficient memory-reuse technique is used to reduce the memory requirement. To verify the effectiveness of the proposed approach, we parallelized the FFmpeg H.264/AVC decoder with the proposed technique using OpenMP and carried out experiments on an Intel quad-core platform. The proposed design outperforms the FFmpeg H.264/AVC decoder before parallelization by 53%. We also reduced memory usage by 65% and 81% for high-definition (HD) and full high-definition (FHD) video, respectively, compared with a popular existing method called 2Dwave.
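
In MTP, each pipeline stage gets its own thread, and memory is reused to reduce the memory requirement. The Python sketch below mirrors that structure with one thread per stage, FIFO queues between stages, and a fixed pool of buffer objects that are recycled once the last stage has consumed them. The stage functions, buffer representation, and pool size are hypothetical placeholders (stages such as parsing, reconstruction, and deblocking are only illustrative guesses), and the paper itself used OpenMP on FFmpeg rather than Python threads.

```python
import queue
import threading

def run_pipeline(units, stages, num_buffers=3):
    """One thread per stage, linked by FIFO queues; buffers come from a fixed pool."""
    pool = queue.Queue()
    for _ in range(num_buffers):
        pool.put({})                      # reusable buffers (stand-ins for frame memory)
    links = [queue.Queue() for _ in range(len(stages) + 1)]
    STOP = object()                       # end-of-stream marker

    def feeder():
        for u in units:
            buf = pool.get()              # blocks until a buffer has been recycled
            buf.clear()
            buf["data"] = u
            links[0].put(buf)
        links[0].put(STOP)

    def stage_worker(i, fn):
        while (buf := links[i].get()) is not STOP:
            buf["data"] = fn(buf["data"])
            links[i + 1].put(buf)
        links[i + 1].put(STOP)            # pass the marker downstream

    threads = [threading.Thread(target=feeder)]
    threads += [threading.Thread(target=stage_worker, args=(i, fn))
                for i, fn in enumerate(stages)]
    for t in threads:
        t.start()

    results = []
    while (buf := links[-1].get()) is not STOP:
        results.append(buf["data"])
        pool.put(buf)                     # recycle instead of allocating a new buffer
    for t in threads:
        t.join()
    return results
```

Because each stage has a single thread and the queues are FIFO, outputs leave in input order, and at most `num_buffers` units are in flight at once, which bounds memory regardless of stream length.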
