• Title/Summary/Keyword: multi-core systems


Multi-Threaded Parallel H.264/AVC Decoder for Multi-Core Systems (멀티코어 시스템을 위한 멀티스레드 H.264/AVC 병렬 디코더)

  • Kim, Won-Jin; Cho, Keol; Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.11 / pp.43-53 / 2010
  • The wide deployment of high-resolution video services has led to active research on high-speed video processing. In particular, the prevalence of multi-core systems has accelerated research on high-resolution video processing based on the parallelization of multimedia software. In this paper, we propose a novel parallel H.264/AVC decoding scheme for multi-core platforms. Parallel H.264/AVC decoding is challenging not only because parallelization may incur significant synchronization overhead but also because the software may have complicated dependencies. To overcome these issues, we propose an approach called Multi-Threaded Parallelization (MTP). In MTP, a separate thread is allocated to each stage of the pipeline to reduce synchronization overhead. In addition, an efficient memory reuse technique is used to reduce the memory requirement. To verify the effectiveness of the proposed approach, we parallelized the FFmpeg H.264/AVC decoder with the proposed technique using OpenMP and carried out experiments on an Intel quad-core platform. The proposed design outperforms the unparallelized FFmpeg H.264/AVC decoder by 53%. It also reduces memory usage by 65% and 81% for high-definition (HD) and full high-definition (FHD) video, respectively, compared with the popular existing method called 2Dwave.
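
The thread-per-stage idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' MTP implementation (which, per the abstract, is built on FFmpeg and OpenMP); it only shows the general pattern of giving each pipeline stage its own thread and connecting stages with bounded queues. The names `run_pipeline`, `_run_stage`, and `capacity` are hypothetical, and because of Python's global interpreter lock the sketch demonstrates structure rather than speedup.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream


def _run_stage(func, inbox, outbox):
    """Worker loop for one stage: take an item, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # tell the next stage to shut down too
            return
        outbox.put(func(item))


def run_pipeline(stages, items, capacity=4):
    """Push items through stages using one dedicated thread per stage."""
    # Stage i reads from queues[i] and writes to queues[i + 1].
    # Bounded queues limit how many items are in flight at once.
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]

    def feed():
        # Feeding runs on its own thread so the caller can drain results
        # concurrently and the bounded queues cannot deadlock.
        for item in items:
            queues[0].put(item)
        queues[0].put(_END)

    workers = [threading.Thread(target=feed)]
    workers += [
        threading.Thread(target=_run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for worker in workers:
        worker.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _END:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

With two placeholder stages, `run_pipeline([lambda f: f + 1, lambda f: f * 2], range(5))` returns `[2, 4, 6, 8, 10]`. Output order matches input order because each stage is a single thread reading from a FIFO queue.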

Joint Channel Assignment and Multi-path Routing in Multi-radio Multi-channel Wireless Mesh Network

  • Pham, Ngoc Thai; Choi, Myeong-Gil; Hwang, Won-Joo
    • Journal of Korea Multimedia Society / v.12 no.6 / pp.824-832 / 2009
  • Multi-radio multi-channel wireless mesh networks require an effective management policy to control the assignment of channels to each radio. We focused on modeling and solution methods for finding a dynamic channel assignment scheme that adapts to changes in network traffic. A multi-path routing scheme was chosen to overcome the unreliability of wireless links. For a particular traffic state, our optimization model finds a traffic distribution over multiple paths and a channel assignment scheme that maximize the overall network throughput. We developed a simple heuristic for channel assignment that gradually removes clique load to obtain higher throughput. We also present numerical examples and discuss our models in comparison with existing research.

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri; Al-Laila, Marwah Haitham; Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems / v.12 no.4 / pp.724-740 / 2016
  • The performance issues of screening large compound databases against multiple query compounds in virtual screening are a common concern in chemoinformatics applications. This study investigates these problems by choosing group fusion as a pilot model and presents efficient parallel solutions on parallel platforms, specifically the multi-core architecture of the CPU and the many-core architecture of the graphics processing unit (GPU). A study of sequential group fusion and a proposed design of parallel CUDA group fusion are presented in this paper. The design involves solving two important stages of group fusion, namely similarity search and fusion (MAX rule), while addressing the embarrassingly parallel and parallel reduction models. Sequential, optimized sequential, and parallel OpenMP versions of group fusion were implemented and evaluated. The analysis of these three design approaches informed the design of the parallel CUDA version in order to optimize it and achieve high computational intensity. The proposed parallel CUDA version performed better than the sequential and parallel OpenMP versions in terms of both execution time and speedup. It was 5-10x faster than the sequential and parallel OpenMP versions, as both the similarity search and fusion (MAX) stages had been CUDA-optimized.
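
The two stages named in this abstract, similarity search and MAX-rule fusion, can be sketched sequentially as follows. This is an illustration only, not the paper's CUDA or OpenMP code. It assumes fingerprints are represented as Python sets of feature indices and uses the Tanimoto coefficient as the similarity measure; the abstract does not state which measure the authors used. The function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints stored as sets (assumed measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion_max(queries, database):
    """Score each database compound by its best similarity to any query.

    The inner similarity computations are independent of one another
    (embarrassingly parallel); taking the maximum over the queries is a
    reduction. `queries` must be non-empty.
    """
    return [max(tanimoto(q, c) for q in queries) for c in database]


def rank(scores):
    """Return database indices ordered from highest to lowest fused score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For example, with queries `[{1, 2, 3}, {4, 5}]` and database `[{1, 2}, {4, 5}, {7}]`, the second compound scores 1.0 because it matches the second query exactly, the third scores 0.0, and `rank` returns `[1, 0, 2]`.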

Joint Scheduling and Rate Optimization in Multi-channel Multi-radio Wireless Networks with Contention-based MAC

  • Bui, Dang Quang; Choi, Myeong-Gil; Hwang, Won-Joo
    • Journal of Korea Multimedia Society / v.11 no.12 / pp.1716-1721 / 2008
  • Wireless networks currently offer useful characteristics such as multi-hop, multi-channel, and multi-radio operation, but these resources are not fully used. The main difficulty in addressing this issue is solving a mixed-integer optimization problem. This paper proposes a method for solving a class of mixed-integer optimization problems for wireless networks using the AMPL modeling language with the CPLEX solver. The method yields scheduling and congestion control for multi-channel multi-radio wireless networks.

Separate Signature Monitoring for Control Flow Error Detection (제어흐름 에러 탐지를 위한 분리형 시그니처 모니터링 기법)

  • Choi, Kiho; Park, Daejin; Cho, Jeonghun
    • IEMEK Journal of Embedded Systems and Applications / v.13 no.5 / pp.225-234 / 2018
  • Control flow errors are caused by memory vulnerabilities and result in system failure. Signature-based control flow monitoring is a representative method for alleviating the problem. The method commonly consists of two routines: signature update and signature verification. However, in existing signature-based control flow monitoring, the monitored application is tightly coupled with the monitoring code, and monitoring in a single thread is the basic model. This makes it difficult for signature-based monitoring to benefit from the performance improvements available in multi-thread and multi-core environments. In this paper, we propose a new signature-based control flow monitoring model that separates signature update and signature verification at the thread level. Signature update is combined with the application thread, and signature verification runs on a separate monitor thread. Because the application thread and the monitor thread are separated in the proposed model, a performance improvement can be expected in multi-core and multi-thread environments.
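
The separation described in this abstract, with signature update on the application thread and verification on a monitor thread, can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' scheme: the "signature" here is just a basic-block name, the table `LEGAL_SUCCESSORS` is a hypothetical control-flow graph, and a queue stands in for whatever channel the real system uses between the two threads.

```python
import queue
import threading

# Hypothetical control-flow graph: block name -> set of legal successor blocks.
LEGAL_SUCCESSORS = {
    "entry": {"loop"},
    "loop": {"loop", "exit"},
    "exit": set(),
}

_DONE = object()  # sentinel telling the monitor thread to stop


def monitor(signatures, errors):
    """Verification side: check every reported transition on its own thread."""
    previous = None
    while True:
        block = signatures.get()
        if block is _DONE:
            return
        if previous is not None and block not in LEGAL_SUCCESSORS.get(previous, set()):
            errors.append((previous, block))  # illegal control-flow transition
        previous = block


def run_application(trace):
    """Update side: the application only enqueues signatures and keeps running."""
    signatures = queue.Queue()
    errors = []
    checker = threading.Thread(target=monitor, args=(signatures, errors))
    checker.start()
    for block in trace:          # stands in for the application executing blocks
        signatures.put(block)    # signature update: cheap, no checking inline
    signatures.put(_DONE)
    checker.join()
    return errors
```

A legal trace such as `["entry", "loop", "loop", "exit"]` yields no errors, while `["entry", "exit"]` reports `[("entry", "exit")]` because that jump is not in the assumed graph. The application thread never waits on verification, which is the property the abstract attributes to the separated model.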

Integrated Fire Monitoring System Based on Wireless Multi-Hop Sensor Network and Mobile Robot (무선 멀티 홉 센서 네트워크와 이동로봇을 이용한 통합 화재 감시 시스템)

  • Kim, Tae-Hyoung; Seo, Gang-Lae; Lee, Jae-Yeon; Lee, Won-Chang
    • Journal of Institute of Control, Robotics and Systems / v.16 no.2 / pp.114-119 / 2010
  • Network technology for digital services has developed rapidly in recent years. ZigBee, one of the IEEE 802.15.4 protocols supporting local communication, has become a core technology in the wireless network area. In this paper, we design an integrated fire monitoring system using a mobile robot and ZigBee sensor nodes deployed to monitor fires. When a fire breaks out, image information from the scene is transmitted by an autonomous mobile robot, and the current position of the robot is also monitored. Furthermore, data from around the location of the fire and the positions of the sensor nodes can be transmitted to a server via multi-hop communication in real time.

Dynamic Core Affinity for High-Performance I/O Devices Supporting Multiple Queues (다중 큐를 지원하는 고속 I/O 장치를 위한 동적 코어 친화도)

  • Cho, Joong-Yeon; Uhm, Junyong; Jin, Hyun-Wook; Jung, Sungin
    • Journal of KIISE / v.43 no.7 / pp.736-743 / 2016
  • Several studies have reported the impact of core affinity on the network I/O performance of multi-core systems. As the network bandwidth increases significantly, it becomes more important to determine the effective core affinity. Although a framework for dynamic core affinity that considers both network and disk I/O has been suggested, the multiple queues provided by high-speed I/O devices are not properly supported. In this paper, we extend the existing framework of dynamic core affinity to efficiently support the multiple queues of high-speed I/O devices, such as 40 Gigabit Ethernet and NVM Express. Our experimental results show that the extended framework can improve the HDFS file upload throughput by up to 32%, and can provide improved scalability in terms of the number of cores. In addition, we analyze the impact of the assignment policy of multiple I/O queues across a number of cores.

Size-dependent magneto-electro-elastic vibration analysis of FG saturated porous annular/ circular micro sandwich plates embedded with nano-composite face sheets subjected to multi-physical pre loads

  • Amir, Saeed; Arshid, Ehsan; Arani, Mohammad Reza Ghorbanpour
    • Smart Structures and Systems / v.23 no.5 / pp.429-447 / 2019
  • The present study analyzes the free vibration of a three-layered micro annular/circular plate whose core and face sheets are made of saturated porous materials and FG-CNTRCs, respectively. The structure is subjected to magneto-electric fields and magneto-electro-mechanical pre-loads. The mechanical properties of the porous core and the FG-CNTRC face sheets vary through the thickness direction. Using dynamic Hamilton's principle, the equations of motion based on the MCS and FSD theories are derived and solved via GDQ, an efficient numerical method. The effects of different parameters, such as pore distribution, porosity coefficient, pore compressibility, CNT distribution, elastic foundation, multi-physical pre-loads, small-scale parameter, and aspect ratio of the plate, are investigated. The findings of this study can be useful for designing smart structures such as sensors and actuators.

Multi-communication layered HPL model and its application to GPU clusters

  • Kim, Young Woo; Oh, Myeong-Hoon; Park, Chan Yeol
    • ETRI Journal / v.43 no.3 / pp.524-537 / 2021
  • High-performance Linpack (HPL) is among the most popular benchmarks for evaluating the capabilities of computing systems and has been used as a standard to compare the performance of computing systems since the early 1980s. In the initial system-design stage, it is critical to estimate the capabilities of a system quickly and accurately. However, the original HPL mathematical model based on a single core and single communication layer yields varying accuracy for modern processors and accelerators comprising large numbers of cores. To reduce the performance-estimation gap between the HPL model and an actual system, we propose a mathematical model for multi-communication layered HPL. The effectiveness of the proposed model is evaluated by applying it to a GPU cluster and well-known systems. The results reveal performance differences of 1.1% on a single GPU. The GPU cluster and well-known large system show 5.5% and 4.1% differences on average, respectively. Compared to the original HPL model, the proposed multi-communication layered HPL model provides performance estimates within a few seconds and a smaller error range from the processor/accelerator level to the large system level.