• Title/Summary/Keyword: multi-core architecture

Search Result 158, Processing Time 0.023 seconds

Numerical Analysis of Welding Residual Stresses for Ultra-thick Plate of EH40 TM and API 2W Gr.50 Steel Joined by Flux Core Arc Welding (EH40과 API2W강재의 극 후판재 다층 FCAW 버트 접합부 잔류응력해석)

  • Hwang, Se-Yun;Lee, Jang-Hyun;Yang, Young-Sik;Lee, Sung-Je;Kim, Byung-Jong
    • Journal of Welding and Joining
    • /
    • v.28 no.3
    • /
    • pp.65-72
    • /
    • 2010
  • Some structural members of large-scale marine vessels such as large-scale offshore structures and very large container ships are assembled by very thick plates of which thickness exceeds 60mm. Also, high-tensile steels have been selected to meet the required structural strength and fatigue strength. Generally, multi-pass welding method such as FCA(Flux-Core Arc) welding has been used to join the thick plates. Considering the welding residual stresses, fatigue strength of the welded joints of thick plates should be assured since the residual stress influences the fatigue strength. This paper presents a numerical procedure to investigate the residual stress of structure joined by multi-pass FCA welding so that it can be incorporated into the fatigue strength assessment considering the effect of welding residual stress. The residual stress distribution is also measured by X-Ray diffraction method. The residual stress obtained by the computational model also has been compared with that of experiment. The results of FEA are in very good agreement with the experimental measurements.

Accelerating 2D DCT in Multi-core and Many-core Environments (멀티코어와 매니코어 환경에서의 2 차원 DCT 가속)

  • Hong, Jin-Gun;Jung, Sung-Wook;Kim, Cheong-Ghil;Burgstaller, Bernd
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.250-253
    • /
    • 2011
  • Chip manufacture nowadays turned their attention from accelerating uniprocessors to integrating multiple cores on a chip. Moreover desktop graphic hardware is now starting to support general purpose computation. Desktop users are able to use multi-core CPU and GPU as a high performance computing resources these days. However exploiting parallel computing resources are still challenging because of lack of higher programming abstraction for parallel programming. The 2-dimensional discrete cosine transform (2D-DCT) algorithms are most computational intensive part of JPEG encoding. There are many fast 2D-DCT algorithms already studied. We implemented several algorithms and estimated its runtime on multi-core CPU and GPU environments. Experiments show that data parallelism can be fully exploited on CPU and GPU architecture. We expect parallelized DCT bring performance benefit towards its applications such as JPEG and MPEG.

Optimal Design Space Exploration of Multi-core Architecture for Real-time Lane Detection Algorithm (실시간 차선인식 알고리즘을 위한 최적의 멀티코어 아키텍처 디자인 공간 탐색)

  • Jeong, Inkyu;Kim, Jongmyon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.3
    • /
    • pp.339-349
    • /
    • 2017
  • This paper proposes a four-stage algorithm for detecting lanes on a driving car. In the first stage, it extracts region of interests in an image. In the second stage, it employs a median filter to remove noise. In the third stage, a binary algorithm is used to classify two classes of backgrond and foreground of an input image. Finally, an image erosion algorithm is utilized to obtain clear lanes by removing noises and edges remained after the binary process. However, the proposed lane detection algorithm requires high computational time. To address this issue, this paper presents a parallel implementation of a real-time line detection algorithm on a multi-core architecture. In addition, we implement and simulate 8 different processing element (PE) architectures to select an optimal PE architecture for the target application. Experimental results indicate that 40×40 PE architecture show the best performance, energy efficiency and area efficiency.

Collaborative Streamlined On-Chip Software Architecture on Heterogenous Multi-Cores for Low-Power Reactive Control in Automotive Embedded Processors (차량용 임베디드 프로세서에서 저전력 반응적 제어를 위한 이기종 멀티코어 협력적 스트리밍 온-칩 소프트웨어 구조)

  • Jisu, Kwon;Daejin, Park
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.6
    • /
    • pp.375-382
    • /
    • 2022
  • This paper proposes a multi-core cooperative computing structure considering the heterogeneous features of automotive embedded on-chip software. The automotive embedded software has the heterogeneous execution flow properties for various hardware drives. Software developed with a homogeneous execution flow without considering these properties will incur inefficient overhead due to core latency and load. The proposed method was evaluated on an target board on which a automotive MCU (micro-controller unit) with built-in multi-cores was mounted. We demonstrate an overhead reduction when software including common embedded system tasks, such as ADC sampling, DSP operations, and communication interfaces, are implemented in a heterogeneous execution flow. When we used the proposed method, embedded software was able to take advantage of idle states that occur between heterogeneous tasks to make efficient use of the resources on the board. As a result of the experiments, the power consumption of the board decreased by 42.11% compared to the baseline. Furthermore, the time required to process the same amount of sampling data was reduced by 27.09%. Experimental results validate the efficiency of the proposed multi-core cooperative heterogeneous embedded software execution technique.

Performance Improvement of a Real-time Traffic Identification System on a Multi-core CPU Environment (멀티 코어 환경에서 실시간 트래픽 분석 시스템 처리속도 향상)

  • Yoon, Sung-Ho;Park, Jun-Sang;Kim, Myung-Sup
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37 no.5B
    • /
    • pp.348-356
    • /
    • 2012
  • The application traffic analysis is getting more and more challenging due to the huge amount of traffic from high-speed network link and variety of applications running on wired and wireless Internet devices. Multi-level combination of various analysis methods is desired to achieve high completeness and accuracy of analysis results for a real-time analysis system, while requires much of processing burden on the contrary. This paper proposes a novel architecture for a real-time traffic analysis system which improves the processing performance on multi-core CPU environment. The main contribution of the proposed architecture is an efficient parallel processing mechanism with multiple threads of various analysis methods. The feasibility of the proposed architecture was proved by implementing and deploying it on our campus network.

Implementation of low power BSPE Core for deep learning hardware accelerators (딥러닝을 하드웨어 가속기를 위한 저전력 BSPE Core 구현)

  • Jo, Cheol-Won;Lee, Kwang-Yeob;Nam, Ki-Hun
    • Journal of IKEEE
    • /
    • v.24 no.3
    • /
    • pp.895-900
    • /
    • 2020
  • In this paper, BSPE replaced the existing multiplication algorithm that consumes a lot of power. Hardware resources are reduced by using a bit-serial multiplier, and variable integer data is used to reduce memory usage. In addition, MOA resource usage and power usage were reduced by applying LOA (Lower-part OR Approximation) to MOA (Multi Operand Adder) used to add partial sums. Therefore, compared to the existing MBS (Multiplication by Barrel Shifter), hardware resource reduction of 44% and power consumption of 42% were reduced. Also, we propose a hardware architecture design for BSPE Core.

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri;Al-Laila, Marwah Haitham;Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems
    • /
    • v.12 no.4
    • /
    • pp.724-740
    • /
    • 2016
  • The performance issues of screening large database compounds and multiple query compounds in virtual screening highlight a common concern in Chemoinformatics applications. This study investigates these problems by choosing group fusion as a pilot model and presents efficient parallel solutions in parallel platforms, specifically, the multi-core architecture of CPU and many-core architecture of graphical processing unit (GPU). A study of sequential group fusion and a proposed design of parallel CUDA group fusion are presented in this paper. The design involves solving two important stages of group fusion, namely, similarity search and fusion (MAX rule), while addressing embarrassingly parallel and parallel reduction models. The sequential, optimized sequential and parallel OpenMP of group fusion were implemented and evaluated. The outcome of the analysis from these three different design approaches influenced the design of parallel CUDA version in order to optimize and achieve high computation intensity. The proposed parallel CUDA performed better than sequential and parallel OpenMP in terms of both execution time and speedup. The parallel CUDA was 5-10x faster than sequential and parallel OpenMP as both similarity search and fusion MAX stages had been CUDA-optimized.

Implementation of Ethernet-Based High-Speed Data Communication for Multi-core DSP (멀티 코어 DSP를 위한 이더넷 기반 고속 데이터 통신 구현)

  • Nguyen, Dung Huy;Choi, Joon-Young
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.3
    • /
    • pp.185-190
    • /
    • 2022
  • We propose a high speed data communication method for motor drive systems with fast control cycle in order to collect state variables of motor control without degrading control performance. Ethernet is chosen for communication device, and multi-core DSP architecture is exploited for communication processing load distribution. The communication program including network protocol stack and motor control program are assigned to two separate cores, and data between two cores are exchanged using interrupt-based inter-process communication mechanism, which enables to achieve a high-speed communication performance without degrading the motor control performance. The performance of developed communication method is demonstrated by real experiments using TCP, UDP and Raw Socket protocols in an experimental setup consisting of TI's TMS320F28388D motor control card and MS Windows PC.

Analysis on the Performance and Temperature of the 3D Quad-core Processor according to Cache Organization (캐쉬 구성에 따른 3차원 쿼드코어 프로세서의 성능 및 온도 분석)

  • Son, Dong-Oh;Ahn, Jin-Woo;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.6
    • /
    • pp.1-11
    • /
    • 2012
  • As the process technology scales down, multi-core processors cause serious problems such as increased interconnection delay, high power consumption and thermal problems. To solve the problems in 2D multi-core processors, researchers have focused on the 3D multi-core processor architecture. Compared to the 2D multi-core processor, the 3D multi-core processor decreases interconnection delay by reducing wire length significantly, since each core on different layers is connected using vertical through-silicon via(TSV). However, the power density in the 3D multi-core processor is increased dramatically compared to that in the 2D multi-core processor, because multiple cores are stacked vertically. Unfortunately, increased power density causes thermal problems, resulting in high cooling cost, negative impact on the reliability. Therefore, temperature should be considered together with performance in designing 3D multi-core processors. In this work, we analyze the temperature of the cache in quad-core processors varying cache organization. Then, we propose the low-temperature cache organization to overcome the thermal problems. Our evaluation shows that peak temperature of the instruction cache is lower than threshold. The peak temperature of the data cache is higher than threshold when the cache is composed of many ways. According to the results, our proposed cache organization not only efficiently reduces the peak temperature but also reduces the performance degradation for 3D quad-core processors.

Bi-directional Bus Architecture Suitable to Multitasking in MPEG System (MPEG 시스템용 다중 작업에 적합한 양방향 버스 구조)

  • Jun Chi-hoon;Yeon Gyu-sung;Hwang Tae-jin;Wee Jae-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.4 s.334
    • /
    • pp.9-18
    • /
    • 2005
  • This paper proposes the novel synchronous segmented bus architecture that has the pipeline bus architecture based on OCP(open core protocol) and the memory-oriented bus for MPEG system. The proposed architecture has bus architectures that support the memory interface for image data processing of MPEG system. Also it has the segmented hi-directional multiple bus architecture for multitasking processing by using multi -masters/multi - slave. In the scheme address of masters and slaves are fixed so that they are arranged for the location of IP cores according to operational characteristics of the system for efficient data processing. Also the bus architecture adopts synchronous segmented bus architecture for reuse of IP's and architecture or developed chips. This feature is suitable to the high performance and low power multimedia SoC systum by inherent characteristics of multitasking operation and segmented bus. Proposed bus architecture can have up to 3.7 times improvement in the effective bandwidth md up to 4 times reduction in the communication latency.