Search | Korea Science

A Performance Evaluation of Parallel Color Conversion based on the Thread Number on Multi-core Systems (멀티코어 시스템에서 쓰레드 수에 따른 병렬 색변환 성능 검증)

Kim, Cheong Ghil
- Journal of Satellite, Information and Communications
- /
- v.9 no.4
- /
- pp.73-76
- /
- 2014
With the increasing popularity of multi-core processors, they have been adopted even in embedded systems. Under this circumstance many multimedia applications can be parallelized on multi-core platforms because they usually require heavy computations and extensive memory accesses. This paper proposes an efficient thread-level parallel implementation for color space conversion on multi-core CPU. Thread-level parallelism has been becoming very useful parallel processing paradigm especially on shared memory computing systems. In this work, it is exploited by allocating different input pixels to each thread for concurrent loop executions. For the performance evaluation, this paper evaluate the performace improvements for color conversion on multi-core processors based on the processing speed comparison between its serial implementation and parallel ones. The results shows that thread-level parallel implementations show the overall similar ratios of performance improvements regardless of different multi-cores.
PDF KSCI

A Massively Parallel Algorithm for Fuzzy Vector Quantization (퍼지 벡터 양자화를 위한 대규모 병렬 알고리즘)

Huynh, Luong Van;Kim, Cheol-Hong;Kim, Jong-Myon
- The KIPS Transactions:PartA
- /
- v.16A no.6
- /
- pp.411-418
- /
- 2009
Vector quantization algorithm based on fuzzy clustering has been widely used in the field of data compression since the use of fuzzy clustering analysis in the early stages of a vector quantization process can make this process less sensitive to its initialization. However, the process of fuzzy clustering is computationally very intensive because of its complex framework for the quantitative formulation of the uncertainty involved in the training vector space. To overcome the computational burden of the process, this paper introduces an array architecture for the implementation of fuzzy vector quantization (FVQ). The arrayarchitecture, which consists of 4,096 processing elements (PEs), provides a computationally efficient solution by employing an effective vector assignment strategy during the clustering process. Experimental results indicatethat the proposed parallel implementation providessignificantly greater performance and efficiency than appropriately scaled alternative array systems. In addition, the proposed parallel implementation provides 1000x greater performance and 100x higher energy efficiency than other implementations using today's ARMand TI DSP processors in the same 130nm technology. These results demonstrate that the proposed parallel implementation shows the potential for improved performance and energy efficiency.
https://doi.org/10.3745/KIPSTA.2009.16A.6.411 인용 PDF KSCI

Parallel Distributed Implementation of GHT on MPI-based PC Cluster (MPI 기반 PC 클러스터에서 GHT의 병렬 분산 구현)

Kim, Yeong-Soo;Kim, Jeong-Sahm;Choi, Heung-Moon
- Journal of the Institute of Electronics Engineers of Korea CI
- /
- v.44 no.3
- /
- pp.81-89
- /
- 2007
This paper presents a parallel distributed implementation of the GHT (generalized Hough transform) for the fast processing on the MPI-based PC cluster. We tried to achieve the higher speedup mainly by alleviating the communication overhead through the pipelined broadcast and accumulator array partition strategy and by time overlapping of the communication and the computation over entire process. Experimental results show that nearly linear speedup is reachable by the proposed method on the MPI-based PC clusters connected through 100Mbps Ethernet switch.
PDF KSCI

Implementation and Performance Evaluation of an Object-Oriented Parallel Programming Environment with Multithreaded Computational Model (다중스레드 계산 모델을 이용한 병렬 객체 지향 프로그래밍 환경의 구현 및 성능 평가)

Song, Jong-Hun;Kim, Heung-Hwan;Han, Sang-Yeong
- Journal of KIISE:Computing Practices and Letters
- /
- v.5 no.6
- /
- pp.708-718
- /
- 1999
본 논문에서 제안하는 시스템은 일반적인 병렬 시스템의 하드웨어 구조에서, 다중 스레드 계산 모델을 이용하여 객체 지향 프로그래밍 환경을 구현한 시스템이다. 제안하는 시스템을 효과적으로 구현하기 위하여 컴파일러와 실행 시간 시스템의 측면에서 여러 가지 기법을 제시한다. 컴파일러의 측면에서는 멤버 변수의 접근 분석, 메소드의 병렬성 분석 기법을 제시하고, 실행 시간 시스템에서는 실시간 스레드/메시지 결합, 프레임 공유 기법을 제시한다. 본 논문에서 제안된 프로그래밍 환경은, MPI 메시지 인터페이스를 이용하여 구현하였으며, 벤치마크 프로그램을 실행함으로써 성능 분석을 하였다. 분석의 결과는 실행시간 시스템의 여러 가지 기법들이 성능 향상에 많은 효과가 있음을 보여주며, 이러한 결과는 일반적인 병렬 시스템에서도 적용 가능하다.Abstract In this paper, we suggest an object-oriented programming environment with multithreaded computation model on general parallel processing systems. We developed many methods for our environment to be efficient : in compiler, the analysis of member variable and method parallelism, and in runtime system, thread/message merging and frame sharing. The programming environment is implemented with MPI message interface, and its performance is analyzed with executing benchmark programs. The results show that the developed methods have influence on performance improvement, and this improvement can be applied to general parallel processing systems.

Optimized Implementation of Block Cipher PIPO in Parallel-Way on 64-bit ARM Processors (64-bit ARM 프로세서 상에서의 블록암호 PIPO 병렬 최적 구현)

Eum, Si Woo;Kwon, Hyeok Dong;Kim, Hyun Jun;Jang, Kyoung Bae;Kim, Hyun Ji;Park, Jae Hoon;Song, Gyeung Ju;Sim, Min Joo;Seo, Hwa Jeong
- KIPS Transactions on Computer and Communication Systems
- /
- v.10 no.8
- /
- pp.223-230
- /
- 2021
The lightweight block cipher PIPO announced at ICISC'20 has been effectively implemented by applying the bit slice technique. In this paper, we propose a parallel optimal implementation of PIPO for ARM processors. The proposed implementation enables parallel encryption of 8-plaintexts and 16-plaintexts. The implementation targets the A10x fusion processor. On the target processor, the existing reference PIPO code has performance of 34.6 cpb and 44.7 cpb in 64/128 and 64/256 standards. Among the proposed methods, the general implementation has a performance of 12.0 cpb and 15.6 cpb in the 8-plaintexts 64/128 and 64/256 standards, and 6.3 cpb and 8.1 cpb in the 16-plaintexts 64/128 and 64/256 standards. Compared to the existing reference code implementation, the 8-plaintexts parallel implementation for each standard has about 65.3%, 66.4%, and the 16-plaintexts parallel implementation, about 81.8%, and 82.1% better performance. The register minimum alignment implementation shows performance of 8.2 cpb and 10.2 cpb in the 8-plaintexts 64/128 and 64/256 specifications, and 3.9 cpb and 4.8 cpb in the 16-plaintexts 64/128 and 64/256 specifications. Compared to the existing reference code implementation, the 8-plaintexts parallel implementation has improved performance by about 76.3% and 77.2%, and the 16-plaintext parallel implementation is about 88.7% and 89.3% higher for each standard.
https://doi.org/10.3745/KTCCS.2021.10.8.223 인용 PDF KSCI

Parallelization of Feature Detection and Panorama Image Generation using OpenCL and Embedded GPU (OpenCL 및 Embedded GPU를 이용한 영상 특징 추출 및 파노라마 영상 생성의 병렬화)

Kang, Seung Heon;Lee, Seung-Jae;Lee, Man Hee;Park, In Kyu
- Journal of Broadcast Engineering
- /
- v.19 no.3
- /
- pp.316-328
- /
- 2014
In this paper, we parallelize the popular feature detection algorithms, i.e. SIFT and SURF, and its application to fast panoramic image generation on the latest embedded GPU. Parallelized algorithms are implemented using recently developed OpenCL as the embedded GPGPU software platform. We compare the implementation efficiency and speed performance of conventional OpenGL Shading Language and OpenCL. Experimental result shows that implementation on OpenCL has comparable performance with GLSL. Compared with the performance on the embedded CPU in the same application processor, the embedded GPU runs 3~4 times faster. As an example of using feature extraction, panorama image synthesis is performed on embedded GPU by applying image matching using detected features.
https://doi.org/10.5909/JBE.2014.19.3.316 인용 PDF KSCI KPUBS

Parallel Processing System with combined Architecture of SIMD with MIMD (SIMD와 MIMD가 결합된 구조를 갖는 병렬처리시스템)

Lee, Hyung;Choi, Sung-Hyuk;Kim, Jung-Bae;Park, Jong-Won
- The KIPS Transactions:PartA
- /
- v.8A no.1
- /
- pp.9-15
- /
- 2001
영상에 관련된 다양한 응용 시스템들을 구현하는 많은 연구들이 진행되어 왔지만, 그러한 영상 관련 응용 시스템을 구현함에 있어서 처리속도의 저하로 인하여 많은 어려움을 겪고 있다. 이를 해결하기 위해 대두된 여러 방법들 중에서 최근 하드웨어 접근 방법에 고려한 많은 관심과 연구가 진행되고 있다. 본 논문은 영상을 실시간으로 처리하기 위하여 하드웨어 구조를 갖는 병렬처리시스템을 기술하며, 또한 병렬처리시스템을 얼굴 검색 시스템에 적용한 후 처리속도 및 실험 결과를 기술한다. 병렬처리시스템은 SIMD와 MIMD가 결합된 구조를 갖고 있기 때문에 다양한 영상 응용시스템에 대해서 융통성과 효율성을 제공하며, 144개의 처리기와 12개의 다중접근기억장치, 외부 메모리 모듈을 위한 인터페이스와 외부 프로세서 장치(i960Kx)와의 통신을 위한 인터페이스로 구성되어있다. 다중접근기억장치는 메모리 모듈선택회로, 데이터 라이팅회로, 그리고, 주소계산 및 라우팅회로로 구성되어 있다. 또한 얼굴 검색 시스템을 병렬처리 시스템에 적합한 병렬화를 제공하기 위해 메쉬방법을 이용하여 전처리, 정규화, 4개 특징값 추출, 그리고 분류화로 구성하였다. 병렬처리시스템은 하드웨어 모의실험 패키지인 CADENCE사의 Verilog-XL로 모의실험을 수행하여 기능과 성능을 검증하였다.
PDF

Comparison of Parallelized Network Coding Performance (네트워크 코딩의 병렬처리 성능비교)

Choi, Seong-Min;Park, Joon-Sang;Ahn, Sang-Hyun
- The KIPS Transactions:PartC
- /
- v.19C no.4
- /
- pp.247-252
- /
- 2012
Network coding has been shown to improve various performance metrics in network systems. However, if network coding is implemented as software a huge time delay may be incurred at encoding/decoding stage so it is imperative for network coding to be parallelized to reduce time delay when encoding/decoding. In this paper, we compare the performance of parallelized decoders for random linear network coding (RLC) and pipeline network coding (PNC), a recent development in order to alleviate problems of RLC. We also compare multi-threaded algorithms on multi-core CPUs and massively parallelized algorithms on GPGPU for PNC/RLC.
https://doi.org/10.3745/KIPSTC.2012.19C.4.247 인용 PDF KSCI

An Effective Parallel Implementation of Sound Synthesis of Guitar using GPU (GPU를 이용한 기타의 음 합성을 위한 효과적인 병렬 구현)

Kang, Sung-Mo;Kim, Jong-Myon
- Journal of the Korea Society of Computer and Information
- /
- v.18 no.8
- /
- pp.1-8
- /
- 2013
This paper proposes an effective parallel implementation of a physical modeling synthesis of guitar on the GPU environment. We used appropriate filter coefficients and adjusted the length of delay line for each open string to generate 44,100 six-polyphonic guitar sounds (E2, A2, D3, G4, B3, E4) by using physical modeling synthesis. In addition, we analyzed the physical modeling synthesis algorithm and observed that we can exploit parallelism inherent in the length of delay line. Thus, we assigned CUDA cores as many as the length of delay line and effectively implemented the physical modeling synthesis using GPU to achieve the highest performance. Experimental results indicated that synthetic guitar sounds using GPU were very similar to the original sounds when we compared their spectra. In addition, GPU achieved 68x and 3x better performance than high-performance TI DSP and CPU, respectively. Furthermore, this paper implemented and evaluated the performance of multi-GPU systems for the physical modeling algorithm.
https://doi.org/10.9708/jksci.2013.18.8.001 인용 PDF KSCI

Design and Implementation of Visual Environment for Parallel Object-Oriented Programming (병렬 객체지향 프로그래밍을 위한 시각 환경의 설계 및 구현)

Choe, Suk-Yeong
- The Transactions of the Korea Information Processing Society
- /
- v.6 no.2
- /
- pp.485-496
- /
- 1999
Comparing with sequential programming, parallel programming has additional complexity due to the consideration of parallelism, communication and synchronization of processes. A synergism between users and compliers should be established, each assisting the other to produce high quality parallel programs. On the above underlying philosophy, we developed a parallel Object-Oriented specification language, POOSL, as preliminary works. However, it is still likely to hard for users to write parallel program because users have to consider grammar of POOSL and to write text-based parallel program. It would be more desirable to provide users wit visual environment for effective parallel programming. Therefore, we propose a visual programming environment. VEPO(Visual environment for Parallel Object-Oriented Programming), based on POOSL in order that users can develop parallel programs more easily and conveniently. It aims at supporting a programming environment in which users can represent their programs more naturally and visually I parallel manner with object-oriented concept and essential steps during parallel program development such as program specification, compilation, execution and animation of execution are integrated. VEPO has useful features for parallel processing. Especially, complicated parallel codes for synchronization and communication of processes are automatically generated in the translation phase, so users can be relieved of writing error-prone parallel codes. The system is targeted to the transputer-based parallel system, MC-3. The graphic user interface of VEPO was implemented using Visual C++. Visual programs descirbed on VEPO are translated into Inmos C and executed on MC-3.
PDF

Search Result 1,474, Processing Time 0.027 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)