• Title/Summary/Keyword: Multi-core processor

Search Result 131, Processing Time 0.03 seconds

Investigation on TLB Miss Impact through TLB Lockdown in Multi-core Systems (멀티코어 시스템에서 TLB Lockdown에 의한 TLB Miss 영향 분석)

  • Song, Daeyoung;Park, Sihyeong;Kim, Hyungshin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.1
    • /
    • pp.59-65
    • /
    • 2022
  • Virtual memory is used as the method to ensure the safety of the system through memory protection in the real-time system. TLB miss caused by using virtual memory makes the real-time system WCET more pessimistically. TLB lockdown can be applied as a method to improve this problem. However, processors with limited TLB lockdown entries, a selection criterion is needed to efficiently utilize the TLB lockdown entry. In this paper, the most frequently accessed virtual pages in the process are applied to the TLB lockdown by analyzing memory profiling. The results showed that micro data TLB miss stall cycle and main data TLB miss stall cycle of the processor decreased by at least 4.7% and up to 29.7%.

A design and implementation of transmit/receive model to speed up the transmission of large string-data sets in TCP/IP socket communication (TCP/IP 소켓통신에서 대용량 스트링 데이터의 전송 속도를 높이기 위한 송수신 모델 설계 및 구현)

  • Kang, Dong-Jo;Park, Hyun-Ju
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.4
    • /
    • pp.885-892
    • /
    • 2013
  • In the model Utilizing the TCP / IP socket communication to transmit and receive data, if the size of data is small and if data-transmission aren't frequently requested, the importance of communication speed between a server and a client isn't emphasized. But nowadays, it has emerged for large amounts of data transfer requests and frequent data transfer request. This paper propose the TCP/IP communication model that can be improved the data transfer rate in multi-core environment by changing the receiving structure of the client to receive large amounts of data and the transmission structure of the server to send large amounts of data.

Exploration of Optimal Multi-Core Processor Architecture for Physical Modeling of Plucked-String Instruments (현악기의 물리적 모델링을 위한 최적의 멀티코어 프로세서 아키텍처 탐색)

  • Kang, Myeong-Su;Choi, Ji-Won;Kim, Yong-Min;Kim, Jong-Myon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.5
    • /
    • pp.281-294
    • /
    • 2011
  • Physics-based sound synthesis usually requires high computational costs and this results in a restriction of its use in real-time applications. This motivates us to implement the sound synthesis algorithm of plucked-string instruments using multi-core processor architectures and determine the optimal processing element (PE) configuration for the target instruments. To determine the optimal PE configuration, we evaluate the impacts of a sample-per-processing element (SPE) ratio that is defined as the amount of sample data directly mapped to each PE on system performance and both area and energy efficiencies using architectural and workload simulations. For the acoustic guitar, the highest area and energy efficiencies are achieved at a SPE ratio of 5,513 and 2,756, respectively, for the synthesis of musical sounds sampled at 44.1 kHz. In the case of the classical guitar, the maximum area and energy efficiencies are achieved at a SPE ratio of 22,050 and 5,513, respectively. In addition, the synthetic sounds were very similar to original sounds in their spectra. Furthermore, we conducted MUSHRA subjective listening test with ten subjects including nine graduate students and one professor from the University of Ulsan, and the evaluation of the synthetic sounds was excellent.

Hardware Implementation of Motor Controller Based on Zynq EPP(Extensible Processing Platform) (Zynq EPP를 이용한 모터 제어기의 하드웨어 구현)

  • Moon, Yong-Seon;Lim, Seung-Woo;Lee, Young-Pil;Bae, Young-Chul
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.11
    • /
    • pp.1707-1712
    • /
    • 2013
  • In this paper, we implement a hardware for motor control based on FPGA + embedded processor using Zynq EPP which is All Programmable SoC in order to improve a structural problem of motion control based on such as DSP, MCU and FPGA previously. The implemented motor controller that is fused controller with advantage of FPGA and embedded processor. The signal processing part of high velocity motor control is performed by motor controller based on FPGA. A motion profile and kinematic calculation that are required algorithm process such as operation of a complicate decimal point has processed in an embedded processor based on dual core. As a result of a hardware implementation, it has an advantage that has can be realized an effect of distribution process in one chip. It has also an advantage that is able to organize as a multi-axis motor controller through adding the IP core of motor control implemented on FPGA.

Real-Time Power-Saving Scheduling Based on Genetic Algorithms in Multi-core Hybrid Memory Environments (멀티코어 이기종메모리 환경에서의 유전 알고리즘 기반 실시간 전력 절감 스케줄링)

  • Yoo, Suhyeon;Jo, Yewon;Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.1
    • /
    • pp.135-140
    • /
    • 2020
  • Recently, due to the rapid diffusion of intelligent systems and IoT technologies, power saving techniques in real-time embedded systems has become important. In this paper, we propose P-GA (Parallel Genetic Algorithm), a scheduling algorithm aims at reducing the power consumption of real-time systems in multi-core hybrid memory environments. P-GA improves the Proportional-Fairness (PF) algorithm devised for multi-core environments by combining the dynamic voltage/frequency scaling of the processor with the nonvolatile memory technologies. Specifically, P-GA applies genetic algorithms for optimizing the voltage and frequency modes of processors and the memory types, thereby minimizing the power consumptions of the task set. Simulation experiments show that the power consumption of P-GA is reduced by 2.85 times compared to the conventional schemes.

A Study on the Scalability of Multi-core-PC Cluster for Seismic Design of Reinforced-Concrete Structures based on Genetic Algorithm (유전알고리즘 기반 콘크리트 구조물의 최적화 설계를 위한 멀티코어 퍼스널 컴퓨터 클러스터의 확장 가능성 연구)

  • Park, Keunhyoung;Choi, Se Woon;Kim, Yousok;Park, Hyo Seon
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.26 no.4
    • /
    • pp.275-281
    • /
    • 2013
  • In this paper, determination of the scalability of the cluster composed common personal computer was performed when optimization of reinforced concrete structure using genetic algorithm. The goal of this research is watching the potential of multi-core-PC cluster for optimization of seismic design of reinforced-concrete structures. By increasing the number of core-processer of cluster, decreasing of computation time per each generation of genetic algorithm was observed. After classifying the components in singular personal computer, the estimation of the expected bottle-neck phenomenon and comparison with wall-clock time and Amdahl's law equation was performed. So we could obseved the scalability of the cluster appear complex tendency. For separating the bottle-neck phenomenon of physical and algorithm, the different size of population was selected for genetic algorithm cases. When using 64 core-processor, the efficiency of cluster is low as 31.2% compared with Amdahl's law efficiency.

A Low Power Design of H.264 Codec Based on Hardware and Software Co-design

  • Park, Seong-Mo;Lee, Suk-Ho;Shin, Kyoung-Seon;Lee, Jae-Jin;Chung, Moo-Kyoung;Lee, Jun-Young;Eum, Nak-Woong
    • Information and Communications Magazine
    • /
    • v.25 no.12
    • /
    • pp.10-18
    • /
    • 2008
  • In this paper, we present a low-power design of H.264 codec based on dedicated hardware and software solution on EMP(ETRI Multi-core platform). The dedicated hardware scheme has reducing computation using motion estimation skip and reducing memory access for motion estimation. The design reduces data transfer load to 66% compared to conventional method. The gate count of H.264 encoder and the performance is about 455k and 43Mhz@30fps with D1(720x480) for H.264 encoder. The software solution is with ASIP(Application Specific Instruction Processor) that it is SIMD(Single Instruction Multiple Data), Dual Issue VLIW(Very Long Instruction Word) core, specified register file for SIMD, internal memory and data memory access for memory controller, 6 step pipeline, and 32 bits bus width. Performance and gate count is 400MHz@30fps with CIF(Common Intermediated format) and about 100k per core for H.264 decoder.

SoC including 2M-byte on-chip SRAM and analog circuits for Miniaturization and low power consumption (소형화와 저전력화를 위해 2M-byte on-chip SRAM과 아날로그 회로를 포함하는 SoC)

  • Park, Sung Hoon;Kim, Ju Eon;Baek, Joon Hyun
    • Journal of IKEEE
    • /
    • v.21 no.3
    • /
    • pp.260-263
    • /
    • 2017
  • Based on several CPU cores, an SoC including ADCs, DC-DC converter and 2M-byte SRAM is proposed in this paper. The CPU core consists of a 12-bit MENSA, a 32-bit Symmetric multi-core processor, as well as 16-bit CDSP. To eliminate the external SDRAM memory, internal 2M-byte SRAM is implemented. Because the SRAM normally occupies huge area, the parasitic components reduce the speed of SoC. In this work, the SRAM blocks are divided into small pieces to reduce the parasitic components. The proposed SoC is developed in a standard 55nm CMOS process and the speed of SoC is 200MHz.

A Study on Machine Learning Compiler and Modulo Scheduler (머신러닝 컴파일러와 모듈로 스케쥴러에 관한 연구)

  • Doosan Cho
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.27 no.1
    • /
    • pp.87-95
    • /
    • 2024
  • This study is on modulo scheduling algorithms for multicore processor in machine learning applications. Machine learning algorithms are designed to perform a large amount of operations such as vectors and matrices in order to quickly process large amounts of data stream. To support such large amounts of computations, processor architectures to support applications such as artificial intelligence, neural networks, and machine learning are designed in the form of parallel processing such as multicore. To effectively utilize these multi-core hardware resources, various compiler techniques are being used and studied. In this study, among these compiler techniques, we analyzed the modular scheduler, which is especially important in one core's computation pipeline. This paper looked at and compared the iterative modular scheduler and the swing modular scheduler, which are the most widely used and studied. As a result, both schedulers provided similar performance results, and when measuring register pressure as an indicator, it was confirmed that the swing modulo scheduler provided slightly better performance. In this study, a technique that divides recurrence edge is proposed to improve the minimum initiation interval of the modulo schedulers.

A SoC based on the Gaussian Pyramid (GP) for Embedded image Applications (임베디드 영상 응용을 위한 GP_SoC)

  • Lee, Bong-Kyu
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.59 no.3
    • /
    • pp.664-668
    • /
    • 2010
  • This paper presents a System-On-a-chip (SoC) for embedded image processing and pattern recognition applications that need Gaussian Pyramid structure. The system is fully implemented into Field-Programmable Gate Array (FPGA) based on the prototyping platform. The SoC consists of embedded processor core and a hardware accelerator for Gaussian Pyramid construction. The performance of the implementation is benchmarked against software implementations on different platforms.