• Title/Summary/Keyword: Multithreaded

A Simulator for Performance Evaluation of Multithreaded Memory Allocation Operation in Multi-Core Environment (멀티코어 환경에서의 멀티스레드 기법을 이용한 메모리 할당 연산의 성능 평가를 위한 시뮬레이터)

  • Kim, Ho-Young;Huang, Dada;Han, Sang-Hyuck;Kim, Young-Kuk
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.245-247 / 2012
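  • Multicore processors have recently come into widespread use. In a multicore system, software gains performance when it uses several cores at the same time; that is, performance improves when a single program uses multithreaded programming so that it can run on multiple cores simultaneously. In this environment, efficient memory allocation is critical for desktop, server, and scientific applications. However, dynamic memory allocation incurs overhead from the allocation and deallocation operations themselves and from the synchronization needed when one thread accesses another thread's heap, and this overhead degrades performance. A tool is therefore needed to measure how much these costs actually affect performance in such an environment. To this end, we design and implement MAES (Memory Allocation Evaluation Simulator), a simulator that measures and evaluates how memory allocation operations affect performance when multithreading is used in a multicore environment.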
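
The synchronization cost described above can be illustrated with a small Python sketch. It is not MAES, whose design the abstract does not detail. It compares two hypothetical allocators: `SharedHeap`, where every `alloc`/`free` takes one global lock, and `PerThreadHeaps`, where a thread works on its own heap without locking and only a free of a block owned by another thread must lock that owner's remote-free list. Lock acquisitions serve as a rough proxy for synchronization overhead; the workload in `run` and all names are assumptions for illustration.

```python
import threading


class SharedHeap:
    """Hypothetical single global heap: every alloc() and free() takes one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._free = []
        self._next = 0
        self.lock_acquisitions = 0  # only updated while holding _lock

    def alloc(self, tid):
        with self._lock:
            self.lock_acquisitions += 1
            if self._free:
                return self._free.pop()
            self._next += 1
            return self._next - 1

    def free(self, tid, block):
        with self._lock:
            self.lock_acquisitions += 1
            self._free.append(block)


class PerThreadHeaps:
    """Hypothetical per-thread heaps: local alloc/free needs no lock; a free of a
    block owned by another thread locks that owner's remote-free list."""

    def __init__(self, n_threads):
        self._free = [[] for _ in range(n_threads)]    # touched only by the owner
        self._next = [0] * n_threads
        self._remote = [[] for _ in range(n_threads)]  # cross-thread frees land here
        self._remote_locks = [threading.Lock() for _ in range(n_threads)]
        self._remote_frees = [0] * n_threads           # updated under _remote_locks[i]

    @property
    def lock_acquisitions(self):
        return sum(self._remote_frees)

    def alloc(self, tid):
        if self._free[tid]:
            return self._free[tid].pop()
        self._next[tid] += 1
        return (tid, self._next[tid] - 1)  # a block id records its owning thread

    def free(self, tid, block):
        owner = block[0]
        if owner == tid:
            self._free[tid].append(block)  # local free: no synchronization
        else:
            with self._remote_locks[owner]:
                self._remote[owner].append(block)
                self._remote_frees[owner] += 1


def run(heap, n_threads, n_blocks, n_remote):
    """Each thread allocates n_blocks; after a barrier it frees the first n_remote
    blocks of its neighbour and the rest of its own. Returns lock acquisitions."""
    owned = [None] * n_threads
    barrier = threading.Barrier(n_threads)

    def worker(tid):
        owned[tid] = [heap.alloc(tid) for _ in range(n_blocks)]
        barrier.wait()  # every thread's blocks exist before any free
        neighbour = (tid + 1) % n_threads
        for block in owned[neighbour][:n_remote]:
            heap.free(tid, block)  # cross-thread free
        for block in owned[tid][n_remote:]:
            heap.free(tid, block)  # free of the thread's own block

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return heap.lock_acquisitions


if __name__ == "__main__":
    print("shared heap:     ", run(SharedHeap(), 4, 100, 10))
    print("per-thread heaps:", run(PerThreadHeaps(4), 4, 100, 10))
```

With 4 threads, 100 blocks each, and 10 cross-thread frees per thread, the shared heap takes 800 locks (2 x 4 x 100) while the per-thread design takes 40 (4 x 10). Production allocators add further machinery, such as reclaiming remote frees and size classes; the sketch only shows where the locks come from.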

Modern Concurrent Programming for Multicode Environment (동시성으로 작성하는 파이썬 크롤러)

  • Kim, Nam-gue;Kang, Young-Jin;Lee, HoonJae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.430-433 / 2017
  • Programming that ensures concurrency is essential for developers. Without it, a program can hardly be expected to run faster except through advances in the hardware itself. Programming languages with good support for concurrent code include Go, Elixir, and Scala. Python, which offers many useful libraries, also supports concurrent programming through asyncio and coroutines. This paper defines the concepts of concurrency and parallelism and explains what to watch for when writing concurrent programs in Python. A crawler that collects web data is written as concurrent code and compared with versions written as sequential and multithreaded code.
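
The sequential, multithreaded, and coroutine versions compared above can be sketched in a few lines of Python. This is not the authors' crawler: to stay self-contained it replaces HTTP requests with a fixed simulated latency (`LATENCY`, an assumption) and uses hypothetical `example.com` URLs; `time.sleep` stands in for a blocking request and `asyncio.sleep` for a non-blocking one.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

URLS = [f"https://example.com/page/{i}" for i in range(10)]  # hypothetical pages
LATENCY = 0.1  # simulated round-trip time per request, in seconds (assumption)


def fetch_blocking(url):
    time.sleep(LATENCY)            # stands in for a blocking HTTP request
    return url, len(url)           # stands in for parsing the page


async def fetch_async(url):
    await asyncio.sleep(LATENCY)   # stands in for a non-blocking HTTP request
    return url, len(url)


def crawl_sequential(urls):
    return [fetch_blocking(u) for u in urls]


def crawl_threaded(urls, workers=10):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_blocking, urls))


async def crawl_async(urls):
    # gather runs all coroutines concurrently on one thread, preserving order.
    return await asyncio.gather(*(fetch_async(u) for u in urls))


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    versions = [("sequential", crawl_sequential),
                ("threaded", crawl_threaded),
                ("asyncio", lambda urls: asyncio.run(crawl_async(urls)))]
    for name, fn in versions:
        pages, elapsed = timed(fn, URLS)
        print(f"{name:<10} {len(pages)} pages in {elapsed:.2f} s")
```

Because each simulated fetch waits 0.1 s, the sequential version takes about 1 s for ten pages, while the thread-pool and `asyncio.gather` versions overlap the waits and finish in roughly 0.1 s. The coroutine version achieves this on a single thread, which is why asyncio suits I/O-bound crawling; in standard CPython builds neither approach speeds up CPU-bound work because of the global interpreter lock.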

Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

  • Oh, Jaeg-Eun;Hwang, Seok-Joong;Nguyen, Huong Giang;Kim, A-Reum;Kim, Seon-Wook;Kim, Chul-Woo;Kim, Jong-Kook
    • ETRI Journal / v.30 no.4 / pp.576-586 / 2008
  • In most parallel loops of embedded applications, every iteration executes exactly the same sequence of instructions on different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between single-instruction multiple-data (SIMD) and simultaneous multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. For architecture verification, we extend a commercial 32-bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared with the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks automatically parallelized by the Intel compiler.
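
The key idea, one fetch and one decode per instruction shared by all threads, each of which keeps its own execute, memory, and write-back state, can be modelled in a short Python simulation. The mini-ISA, the `Lane` class, and the program are hypothetical and unrelated to the AE32000C instruction set; the model also assumes, as the abstract does, that every thread runs the same instruction sequence with no divergent branches.

```python
from dataclasses import dataclass, field

# Hypothetical mini-ISA: each lane computes mem[2] = mem[0] * mem[1] + 5.
PROGRAM = [
    ("load", "r1", 0),          # r1 <- mem[0]
    ("load", "r2", 1),          # r2 <- mem[1]
    ("mul", "r3", "r1", "r2"),  # r3 <- r1 * r2
    ("addi", "r3", "r3", 5),    # r3 <- r3 + 5
    ("store", "r3", 2),         # mem[2] <- r3
]


@dataclass
class Lane:
    """Private execute/memory/write-back state of one thread."""
    mem: list
    regs: dict = field(default_factory=dict)


def decode(instr):
    """Decode once into a function that any lane can execute."""
    op, *args = instr
    if op == "load":
        dst, addr = args
        def execute(lane):
            lane.regs[dst] = lane.mem[addr]
    elif op == "store":
        src, addr = args
        def execute(lane):
            lane.mem[addr] = lane.regs[src]
    elif op == "mul":
        dst, a, b = args
        def execute(lane):
            lane.regs[dst] = lane.regs[a] * lane.regs[b]
    elif op == "addi":
        dst, a, imm = args
        def execute(lane):
            lane.regs[dst] = lane.regs[a] + imm
    else:
        raise ValueError(f"unknown opcode {op!r}")
    return execute


def run_lockstep(program, lanes):
    """Shared fetch/decode: each instruction is fetched and decoded once,
    then executed by every lane on its own registers and memory."""
    fetches = decodes = 0
    for instr in program:        # shared fetch unit
        fetches += 1
        execute = decode(instr)  # shared decode unit
        decodes += 1
        for lane in lanes:       # per-lane execute / memory / write-back
            execute(lane)
    return fetches, decodes


def run_independent(program, lanes):
    """Baseline: each thread fetches and decodes its own copy of the stream."""
    fetches = decodes = 0
    for lane in lanes:
        for instr in program:
            fetches += 1
            execute = decode(instr)
            decodes += 1
            execute(lane)
    return fetches, decodes
```

For four lanes and five instructions, `run_lockstep` performs 5 fetches and 5 decodes instead of the 20 of each that `run_independent` needs, while every lane still computes its own `x * a + 5`.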

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal / v.43 no.5 / pp.825-834 / 2021
  • Owing to its simplicity and user-friendliness, OpenMP has become the standard model for programming shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that uses the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by using DICKPT to distribute parallel jobs and their data to slave machines and then collecting the results after these distributed jobs have executed. Although this model has proven effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that uses two levels of parallelism: we add a second level in the form of multithreaded processes on the slave machines, with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model significantly improves CAPE performance.
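
The two-level scheme can be pictured with a Python sketch of an OpenMP-style `parallel for` with a `+` reduction. It is only a structural analogy: CAPE distributes work through checkpoints to separate machines, whereas here both levels are `ThreadPoolExecutor`s in one process, so no real speedup should be expected under CPython's global interpreter lock. The function names and the loop body `i * i` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(lo, hi, parts):
    """Split [lo, hi) into `parts` contiguous, nearly equal chunks."""
    n = hi - lo
    bounds = [lo + n * k // parts for k in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def body(i):
    """Loop body of the original `parallel for` (illustrative)."""
    return i * i


def chunk_sum(chunk):
    lo, hi = chunk
    return sum(body(i) for i in range(lo, hi))


def node_job(lo, hi, threads):
    """Level 2: one slave node splits its chunk across its own cores."""
    with ThreadPoolExecutor(max_workers=threads) as cores:
        return sum(cores.map(chunk_sum, split(lo, hi, threads)))


def two_level_parallel_for(n, nodes, threads_per_node):
    """Level 1: the master hands one chunk to each node and merges the results."""
    with ThreadPoolExecutor(max_workers=nodes) as cluster:  # stands in for remote machines
        futures = [cluster.submit(node_job, lo, hi, threads_per_node)
                   for lo, hi in split(0, n, nodes)]
        return sum(f.result() for f in futures)
```

`two_level_parallel_for(1000, 4, 3)` returns the same value as the serial loop, 332833500. Setting `threads_per_node=1` roughly corresponds to the earlier single-level model, in which each slave node runs its share of the loop on one thread.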

An Enhancement of Services Selection in Web Services (웹 서비스에서 서비스의 선택의 개선)

  • Nasridinov, Aziz;Kim, Kyoungwook;Byun, Jeongyong
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.1307-1310 / 2009
  • Web services make it possible to dynamically integrate distributed service components scattered over the Internet to fulfill sophisticated business demands. However, because so many services are now offered for any specific task, users must be supported in eventually selecting the appropriate ones. Auto Repair Services are an example of web services in which selecting appropriate services is crucial: choosing the proper service from among many Auto Part Shops leads to higher-quality service and minimizes the Auto Repair Service customer's waiting time. To assist in selecting the proper service, this research presents a Functional-Level Mediator and illustrates its use in matching the customer's goals with a web service's goals. Five matching cases are analyzed and experimental results are reported. In addition, by implementing multithreaded web services, which reflect concurrent real-world activity more naturally, we significantly reduce the customer's waiting time at the Auto Repair Service.
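
The abstract does not list its five matching cases. A common set-based scheme in WSMO-style service discovery uses five categories (exact, plug-in, subsumption, intersection, non-match), and the Python sketch below assumes that scheme, treating a goal and a service as sets of requested and offered auto parts. It also queries the hypothetical shops in `SHOPS` concurrently, so the total wait is about the slowest shop's delay rather than the sum of all delays. The shop data, delays, and ranking order are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed ranking of the five set-based match categories, best first.
RANK = ["exact", "plug-in", "subsumption", "intersection", "non-match"]

# Hypothetical shop catalogues and simulated response delays in seconds.
SHOPS = {
    "shop_a": ({"brake pad", "oil filter"}, 0.05),
    "shop_b": ({"brake pad", "oil filter", "spark plug"}, 0.05),
    "shop_c": ({"wiper blade"}, 0.05),
}


def match(goal, offered):
    """Classify a service against a goal by comparing sets of parts."""
    if goal == offered:
        return "exact"
    if goal < offered:
        return "plug-in"       # service offers everything requested, and more
    if offered < goal:
        return "subsumption"   # service covers only part of the request
    if goal & offered:
        return "intersection"  # partial overlap in both directions
    return "non-match"


def query(shop, goal):
    offered, delay = SHOPS[shop]
    time.sleep(delay)          # stands in for calling the shop's web service
    return shop, match(goal, offered)


def select(goal):
    """Query every shop concurrently and return the best-ranked (shop, match)."""
    with ThreadPoolExecutor(max_workers=len(SHOPS)) as pool:
        results = list(pool.map(lambda shop: query(shop, goal), SHOPS))
    return min(results, key=lambda r: RANK.index(r[1]))
```

For a request for a brake pad and a spark plug, `shop_a` only intersects the goal, `shop_b` is a plug-in match, and `shop_c` does not match, so `select` returns `("shop_b", "plug-in")`.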

MECHA: Multithreaded and Efficient Cryptographic Hardware Access (MECHA: 다중 스레드 및 효율적인 암호화 하드웨어 액세스)

  • Pratama Derry;Laksmono Agus Mahardika Ari;Iqbal Muhammad;Howon Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.339-341 / 2023
  • This paper presents Multithreaded and Efficient Cryptographic Hardware Access (MECHA), a design for efficient and fast cryptographic operations that eliminates the need for context switching. Using a UNIX domain socket, MECHA handles multiple requests from multiple applications simultaneously, resulting in faster processing and improved efficiency. MECHA comprises several key components: the Server thread, the Client thread, the Transceiver thread, and a pair of Sender and Receiver queues. The MECHA design is portable and can be used with any communication protocol; experimental results show an 83% increase in the speed of concurrent cryptographic requests compared with a conventional interface design. The MECHA architecture has significant potential for secure communication applications ranging from cloud computing to the IoT, offering a faster and more efficient way to manage multiple concurrent cryptographic operation requests.
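
A minimal Python sketch of a queue-based structure in this spirit: client threads put requests on a shared sender queue, and a single transceiver thread, the only code that touches the (simulated) cryptographic device, processes them and routes each reply to the requesting client's receiver queue. This is a simplification under stated assumptions, not MECHA itself: the UNIX domain socket between applications and the server is replaced with in-process queues, each client gets its own receiver queue, and SHA-256 from `hashlib` stands in for the hardware operation.

```python
import hashlib
import queue
import threading


class CryptoServer:
    """Hypothetical MECHA-like front end: applications never touch the device;
    one transceiver thread owns it and serializes access."""

    def __init__(self):
        self.sender_q = queue.Queue()  # requests: (client_id, req_id, payload)
        self._receivers = {}           # client_id -> that client's reply queue
        self._lock = threading.Lock()  # guards _receivers
        self._thread = threading.Thread(target=self._transceiver, daemon=True)
        self._thread.start()

    def register(self, client_id):
        """Create and return the receiver queue for a new client."""
        replies = queue.Queue()
        with self._lock:
            self._receivers[client_id] = replies
        return replies

    @staticmethod
    def _device(payload):
        # Stand-in for the cryptographic accelerator (assumption: SHA-256).
        return hashlib.sha256(payload).hexdigest()

    def _transceiver(self):
        while True:
            item = self.sender_q.get()
            if item is None:           # shutdown sentinel
                break
            client_id, req_id, payload = item
            result = self._device(payload)
            with self._lock:
                replies = self._receivers[client_id]
            replies.put((req_id, result))

    def shutdown(self):
        self.sender_q.put(None)
        self._thread.join()


def client(server, client_id, messages, out):
    """One application thread: submit all requests, then collect the replies."""
    replies = server.register(client_id)
    for req_id, msg in enumerate(messages):
        server.sender_q.put((client_id, req_id, msg))
    out[client_id] = dict(replies.get() for _ in messages)


if __name__ == "__main__":
    server = CryptoServer()
    out = {}
    msgs = {c: [f"app{c}-msg{i}".encode() for i in range(3)] for c in range(4)}
    threads = [threading.Thread(target=client, args=(server, c, msgs[c], out))
               for c in msgs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    for c in sorted(out):
        print(c, out[c][0][:16])
```

Funnelling every request through one owner thread means the device never sees concurrent callers, while the applications themselves run and submit requests concurrently.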

Multithreaded and Overlapped Systolic Array for Depthwise Separable Convolution (깊이별 분리 합성곱을 위한 다중 스레드 오버랩 시스톨릭 어레이)

  • Jongho Yoon;Seunggyu Lee;Seokhyeong Kang
    • Transactions on Semiconductor Engineering / v.2 no.1 / pp.1-8 / 2024
  • When processing depthwise separable convolution, low utilization of processing elements (PEs) is one of the challenges of systolic arrays (SAs). In this study, we propose a new SA architecture that maximizes throughput in depthwise convolution. Moreover, the proposed SA performs the subsequent pointwise convolution on PEs left idle during depthwise convolution to increase utilization. After the depthwise computation, the unused PEs are used to speed up the remaining pointwise convolution. Consequently, on MobileNetV3 the proposed 128x128 SA achieves speedups of 4.05x and 1.75x and reduces energy consumption by 66.7% and 25.4% compared with the basic SA and RiSA, respectively.
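
Depthwise separable convolution factors a standard convolution into a per-channel KxK depthwise step followed by a 1x1 pointwise step. The pure-Python sketch below (valid padding, stride 1; all names are illustrative) checks that factorization against a standard convolution whose weights are `w[o][c] = pw[o][c] * dw[c]`, and counts multiply-accumulates (MACs). It also hints at the utilization problem: under a naive lowering to matrix multiplication, each depthwise channel becomes a separate product with a reduction depth of only K*K and a single output column, so most of a 128x128 array would sit idle, whereas the pointwise step is an ordinary Cin-by-Cout matrix product.

```python
def depthwise(x, dw):
    """Per-channel KxK filtering: output channel c only sees input channel c."""
    C, H, W, K = len(x), len(x[0]), len(x[0][0]), len(dw[0])
    return [[[sum(x[c][i + u][j + v] * dw[c][u][v]
                  for u in range(K) for v in range(K))
              for j in range(W - K + 1)]
             for i in range(H - K + 1)]
            for c in range(C)]


def pointwise(y, pw):
    """1x1 convolution: mixes channels at each pixel (a plain matrix product)."""
    C, H, W = len(y), len(y[0]), len(y[0][0])
    return [[[sum(pw[o][c] * y[c][i][j] for c in range(C))
              for j in range(W)] for i in range(H)]
            for o in range(len(pw))]


def standard_conv(x, w):
    """Full KxK convolution summing over every input channel."""
    Cin, H, W, K = len(x), len(x[0]), len(x[0][0]), len(w[0][0])
    return [[[sum(x[c][i + u][j + v] * w[o][c][u][v]
                  for c in range(Cin) for u in range(K) for v in range(K))
              for j in range(W - K + 1)] for i in range(H - K + 1)]
            for o in range(len(w))]


def macs(H, W, K, Cin, Cout):
    """MACs of a valid KxK layer on an HxW input: (standard, depthwise, pointwise)."""
    Ho, Wo = H - K + 1, W - K + 1
    return (Ho * Wo * K * K * Cin * Cout,
            Ho * Wo * K * K * Cin,
            Ho * Wo * Cin * Cout)
```

For a 112x112 input with K = 3, Cin = 32, and Cout = 64, `macs` returns 223,027,200 MACs for the standard layer versus 3,484,800 (depthwise) plus 24,780,800 (pointwise) for the separable one, so the pointwise step dominates the separable layer's arithmetic.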

Scheduling and Load Balancing Methods of Multithread Parallel Linear Solver of Finite Element Structural Analysis (유한요소 구조해석 다중쓰레드 병렬 선형해법의 스케쥴링 및 부하 조절 기법 연구)

  • Kim, Min Ki;Kim, Seung Jo
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.42 no.5 / pp.361-367 / 2014
  • This paper introduces task scheduling and load balancing methods for the multifrontal solver used in finite element structural analysis on modern multicore machines. Many structural analysis problems involve irregular grids and many kinds of properties and materials. These irregularities and heterogeneities create parallelization bottlenecks and cause idle time during analysis, so task scheduling and load balancing are needed to reduce the inefficiency. Several multithreaded parallelization methods are presented, and static and dynamic task scheduling are compared. To reduce the idle time caused by irregularly partitioned subdomains, two computational load balancing methods, balancing all tasks and minmax task pairing balancing, are devised. Theoretical and measured elapsed times are presented, and the reasons for the gap between them are discussed.
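
The difference between static and dynamic task scheduling, and the idea of pairing tasks to even out their costs, can be shown with a small Python model over a list of task costs. The pairing rule here (sort, then pair the largest remaining task with the smallest) is only an assumed reading of "minmax task pairing balancing"; the paper's actual method, its cost model for multifrontal tasks, and its handling of task dependencies are not reproduced.

```python
import heapq


def makespan_static(costs, workers):
    """Static scheduling: task k is fixed to worker k % workers in advance."""
    loads = [0] * workers
    for k, cost in enumerate(costs):
        loads[k % workers] += cost
    return max(loads)


def makespan_dynamic(costs, workers):
    """Dynamic scheduling: each task goes to whichever worker becomes free first."""
    finish = [0] * workers  # all zeros is already a valid min-heap
    for cost in costs:
        heapq.heappush(finish, heapq.heappop(finish) + cost)
    return max(finish)


def pair_largest_with_smallest(costs):
    """Assumed min-max pairing: sort, then pair the largest remaining task with
    the smallest so that pair costs are as even as possible."""
    s = sorted(costs)
    pairs = [(s[-1 - k], s[k]) for k in range(len(s) // 2)]
    if len(s) % 2:
        pairs.append((s[len(s) // 2],))  # odd count: the middle task stays alone
    return pairs
```

For costs `[9, 1, 8, 2, 7, 3, 6, 4]` on two workers, round-robin static assignment gives a makespan of 30 because the expensive tasks all land on one worker, dynamic assignment gives 20 (the optimum, half of the total 40), and pairing produces four pairs that each cost 10.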

Array Localization for Multithreaded Code Generation (다중스레드 코드 생성을 위한 배열 지역화)

  • Yang, Chang-Mo;Yu, Won-Hui
    • The Transactions of the Korea Information Processing Society / v.3 no.6 / pp.1407-1417 / 1996
  • Recent thread partitioning algorithms break a thread at each long-latency operation and merge threads to obtain longer threads under given constraints. Because of this limitation, even a program with little parallelism is partitioned into small threads, and context switches occur frequently. In this paper, we propose another method, array localization, which uses information about the array name, the dependence distance (the difference between the accessed element index and the loop index), and the element usage, which indicates whether an element is used or defined. Using this information, we can allocate array elements to the node where the corresponding loop activation is executed. With array localization, remote accesses to array elements can be replaced with local accesses to localized array elements. As a result, the boundaries of some threads are removed, programs can be partitioned into larger threads, and the number of context switches is reduced.
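
Under a simple counting model, the effect of allocating elements to the node that executes the corresponding iteration can be checked in Python. The sketch assumes a loop `A[i] = g(A[i - d])` with constant dependence distance `d`, iterations split into contiguous blocks over `p` nodes, and that every access to an element stored on another node is a long-latency remote access that would end a thread. `cyclic` (a placement unrelated to the loop schedule) and `localized` are hypothetical placements for comparison; the paper's compiler analysis, which also uses use/def information, is not reproduced.

```python
def block_owner(i, n, p):
    """Node that executes iteration i when n iterations are split into p blocks."""
    return i * p // n


def remote_accesses(n, p, d, element_home):
    """Count remote accesses in: for i in range(d, n): A[i] = g(A[i - d]).
    `element_home(e)` gives the node that stores A[e]."""
    remote = 0
    for i in range(d, n):
        node = block_owner(i, n, p)
        for e in (i, i - d):  # write A[i], read A[i - d]
            if element_home(e) != node:
                remote += 1
    return remote


def cyclic(p):
    """Elements dealt round-robin to nodes, unrelated to the loop schedule."""
    return lambda e: e % p


def localized(n, p):
    """Array localization: A[e] lives where the iteration that defines it runs."""
    return lambda e: block_owner(e, n, p)
```

For 16 iterations on 4 nodes with d = 1, the cyclic placement gives 23 remote accesses, whereas the localized placement leaves only the 3 reads that cross a block boundary; under the stated assumption, that is also the number of forced thread breaks.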

Verification for Multithreaded Java Code using Java Memory Model (자바 메모리 모델을 이용한 멀티 스레드 자바 코드 검증)

  • Lee, Min;Kwon, Gi-Hwon
    • The KIPS Transactions:PartD / v.15D no.1 / pp.99-106 / 2008
  • Recently developed compilers perform optimizations to speed up the execution of source programs, and these optimizations involve reordering program statements. Such reordering causes no problems in a single-threaded program but can cause serious errors in a multithreaded program. State-of-the-art model checkers such as Java PathFinder do not account for the reordering introduced during compiler optimization because they consider only a single memory model. In this paper, we develop a new tool that verifies Java source programs based on the Java Memory Model and can handle such reordering. As a result, our tool finds an error in a test program that the traditional model checker Java PathFinder does not reveal.
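
The kind of error described above can be reproduced with the classic store-buffering litmus test. The Python sketch below enumerates every interleaving of two two-statement threads, first in program order only (sequential consistency) and then also allowing each thread's two independent statements to be swapped, as a compiler may do. Only the second setting yields `r1 == r2 == 0`, an outcome the Java Memory Model permits for plain (non-`volatile`) fields. This brute-force enumeration is an illustration, not the authors' tool.

```python
# Store-buffering litmus test; shared x and y both start at 0.
# Thread 1: x = 1; r1 = y        Thread 2: y = 1; r2 = x
T1 = [("store", "x", 1), ("load", "y", "r1")]
T2 = [("store", "y", 1), ("load", "x", "r2")]


def interleavings(a, b):
    """Every merge of a and b that preserves each thread's own statement order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def execute(trace):
    """Run one interleaving on a single shared memory and return (r1, r2)."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, var, arg in trace:
        if op == "store":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r1"], regs["r2"]


def reorderings(thread, allow):
    """Program order, plus the swapped order if allowed and the two statements
    touch different variables (legal within a single thread). Two-statement
    threads only."""
    yield thread
    (_, v1, _), (_, v2, _) = thread
    if allow and v1 != v2:
        yield thread[::-1]


def outcomes(allow_reordering):
    results = set()
    for t1 in reorderings(T1, allow_reordering):
        for t2 in reorderings(T2, allow_reordering):
            for trace in interleavings(t1, t2):
                results.add(execute(trace))
    return results


if __name__ == "__main__":
    print("program order only:", sorted(outcomes(False)))
    print("with reordering:   ", sorted(outcomes(True)))
```

A checker that explores only the first setting, as a sequentially consistent model checker effectively does, never reaches `(0, 0)`, so any assertion that depends on that outcome being impossible passes there yet can fail on a real JVM.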