Search | Korea Science

A Heuristic Load Balancing Algorithm by using Iterative Load Transfer (반복적인 부하 이동에 의한 휴리스틱 부하 평형 알고리즘)

Song Eui-Seok;Oh Ha-Ryung;Seong Yeong-Rak
- The KIPS Transactions:PartA, v.11A no.7 s.91, pp.499-510, 2004
This paper proposes a heuristic load balancing algorithm for multiprocessor systems. The algorithm minimizes the number of idle links to distribute load traffic and reduces communication cost. Each processor iteratively tries to transfer a unit load to or from each of its neighboring processors. However, the actual load transfer is performed collectively after the load traffic calculation is complete, to minimize useless traffic. The proposed algorithm can be employed in various interconnection topologies with slight modifications. In this paper, it is applied to both hypercube and mesh environments. Performance is evaluated through simulation and compared with that of two well-known algorithms. The results show that the proposed algorithm always balances the loads perfectly. Furthermore, compared to the existing algorithms, it reduces communication cost by 70% to 90% in the hypercube and by 75% in the mesh.
https://doi.org/10.3745/KIPSTA.2004.11A.7.499
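
The idea of planning unit transfers first and moving load only once can be illustrated with a small sketch. This is not the authors' algorithm: the rule "move one virtual unit whenever a neighbor holds at least two units fewer" and the names `plan_transfers` and `apply_transfers` are assumptions made for illustration, and this simple rule only guarantees that neighboring nodes end up within one unit of each other.

```python
from collections import defaultdict


def hypercube_neighbors(node, dim):
    """Neighbors of `node` in a `dim`-dimensional hypercube differ in one bit."""
    return [node ^ (1 << k) for k in range(dim)]


def plan_transfers(loads, dim):
    """Plan transfers on virtual loads, then cancel opposing flows per link.

    Returns (virtual_loads, net_flow), where net_flow[(u, v)] = n means
    that n units should really be sent from node u to node v.
    """
    virtual = list(loads)
    flow = defaultdict(int)
    moved = True
    while moved:  # terminates: each move lowers the sum of squared loads
        moved = False
        for u in range(len(virtual)):
            for v in hypercube_neighbors(u, dim):
                if virtual[u] - virtual[v] >= 2:  # assumed transfer rule
                    virtual[u] -= 1
                    virtual[v] += 1
                    flow[(u, v)] += 1
                    moved = True

    # Net out traffic that went both ways over the same link, so that
    # only the difference is physically transferred.
    net = {}
    for (u, v), forward in list(flow.items()):
        if (u, v) in net or (v, u) in net:
            continue
        n = forward - flow.get((v, u), 0)
        if n > 0:
            net[(u, v)] = n
        elif n < 0:
            net[(v, u)] = -n
    return virtual, net


def apply_transfers(loads, net):
    """Perform the real, collective transfer described by `net`."""
    result = list(loads)
    for (u, v), n in net.items():
        result[u] -= n
        result[v] += n
    return result
```

Calling `plan_transfers(loads, 3)` on an 8-node hypercube and then `apply_transfers(loads, net)` reproduces the planned virtual loads while sending only the net traffic on each link.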

An architecture for a parallel rendering processor with effective memory organization (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 구조)

Kim, Kyung-Su;Yoon, Duk-Ki;Kim, Il-San;Park, Woo-Chan
- Journal of Korea Game Society, v.5 no.3, pp.39-47, 2005
Current rendering processors are organized mainly to process a single triangle as fast as possible. Recently, parallel 3D rendering processors, which process multiple triangles in parallel using multiple rasterizers, have begun to appear. For high triangle-processing performance, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer simultaneously accesses data at the same address. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively. Moreover, the proposed architecture significantly reduces the latency caused by pixel cache misses. The experimental results show that the proposed architecture achieves almost linear speedup in the best case, even with sixteen rasterizers.

Frequency Domain Processor for ADSL G.LITE Modem (ADSL G.LITE모뎀을 위한 주파수 영역 프로세서의 설계)

고우석;기준석;고태호;윤대희
- The Journal of Korean Institute of Communications and Information Sciences, v.26 no.12C, pp.233-239, 2001
Among the frequency-domain operations that an ADSL G.LITE modem must perform, the FFT and FEQ are the most computation-intensive, and much research has focused on implementing them efficiently. Previous papers proposed hardware suited to the ADSL G.DMT system, which is not feasible for the simpler G.LITE system. This paper analyzes the frequency-domain operations and the computational efficiency obtained under different allocations of hardware resources. The proposed processor consists of one real multiplier and two real adders connected in parallel, and performs the operations efficiently through pipelined and/or parallel job scheduling. It uses fewer hardware resources than Kiss's ALU structure or the FFT/IFFT processor proposed by Wang, and is therefore better suited to the G.LITE system than previous designs.
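
To see why a datapath built from real multipliers and real adders is a natural fit, note that a frequency-domain equalizer applies one complex multiplication per tone, which decomposes into four real multiplications and two real additions. The sketch below only shows that decomposition; the function names are hypothetical and it does not model the paper's pipeline or job scheduling.

```python
def complex_multiply_real_ops(a_re, a_im, b_re, b_im):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im) using only real operations."""
    # Four real multiplications (on a single multiplier these would be
    # issued in four successive cycles).
    p0 = a_re * b_re
    p1 = a_im * b_im
    p2 = a_re * b_im
    p3 = a_im * b_re
    # Two real additions/subtractions (one per adder).
    return p0 - p1, p2 + p3


def equalize(tones, coefficients):
    """Apply a one-tap equalizer coefficient to each received tone.

    Both arguments are lists of (real, imag) pairs of equal length.
    """
    if len(tones) != len(coefficients):
        raise ValueError("tones and coefficients must have the same length")
    return [
        complex_multiply_real_ops(t_re, t_im, c_re, c_im)
        for (t_re, t_im), (c_re, c_im) in zip(tones, coefficients)
    ]
```

For example, `complex_multiply_real_ops(3, 4, 1, -2)` returns `(11, -2)`, matching (3 + 4j)(1 - 2j) = 11 - 2j.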

Design and Performance Analysis of a Shared-Memory Multiprocessor Computer System (공유 메모리를 갖는 다중 프로세서 컴퓨터 시스팀의 설계 및 성능분석)

Choe, Chang-Yeol;Park, Byeong-Gwan;Park, Seong-Gyu;O, Gil-Rok
- ETRI Journal, v.10 no.3, pp.83-91, 1988
This paper describes the architecture and performance analysis of a multiprocessor system based on shared memory and a single system bus. The system bus provides a pended protocol for the multiprocessor environment. A simulation model is used to analyze processor utilization, address/data bus utilization, and memory conflicts. The hit ratio of the private cache memory is a major factor in achieving a linear increase in the performance of a shared-memory multiprocessor system.

Analysis of Low Internal Bus Operation Frequency on the System Performance in Embedded Processor Based High-Performance Systems (내장 프로세서 기반 고성능 시스템에서의 내부 버스 병목에 의한 시스템 성능 영향 분석)

Lim, Hong-Yeol;Park, Gi-Ho
- Proceedings of the Korean Information Science Society Conference, 2011.06d, pp.24-27, 2011
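With the recent explosive growth of mobile devices such as smartphones, devices based on ARM embedded processors are being actively developed and used. As a result, embedded processors, which had previously focused on relatively low performance and low power, have adopted high clock rates and multicore designs for higher performance, and memory speeds are also advancing rapidly. In particular, low-power memories used in mobile devices, such as LPDDR2, are being developed to operate at high speeds. However, an ARM-based SoC (System on Chip) integrates various hardware accelerators on the same chip, and because of factors such as bus structures designed for low power, on-chip bus speeds have improved less than in high-performance general-purpose systems. Taking this into account, this study analyzes the performance gain obtainable from faster processor cores and memory devices and the degree of performance degradation caused by the relatively low bus operating frequency, and examines ways to overcome it.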

Design of Low-complexity FFT Processor for Multi-mode Radar Signal Processing (멀티모드 레이다 신호처리를 위한 저복잡도 FFT 프로세서 설계)

Park, Yerim;Jung, Yongchul;Jung, Yunho
- Journal of Advanced Navigation Technology, v.24 no.2, pp.85-91, 2020
Recently, a multi-mode radar system was designed for efficient operation of unmanned aerial vehicles (UAVs) in various environments; it has the advantage of integrating the methods of pulse Doppler (PD) radar and frequency modulated continuous wave (FMCW) radar. For the range detection part of the multi-mode radar signal processor (RSP), the hardware structure using the FFT processor and the IFFT processor needs to be designed for area efficiency. In addition, because radar applications require a variety of range resolutions, FFT processors need to support variable-length operation. In this paper, the FFT processor and IFFT processor used in multi-mode RSP range estimation are designed as a single FFT processor that supports variable-length operation from 16 to 1024 points. The proposed FFT processor is designed in a hardware description language (HDL) and can be implemented with 7,452 logic elements and 5,116 registers.
https://doi.org/10.12673/jant.2020.24.2.85
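
As a software analogue of "one processor, many lengths", the sketch below runs a textbook recursive radix-2 FFT for any power-of-two length from 16 to 1024 points. It is not the paper's hardware architecture; the radix-2 choice and the name `variable_length_fft` are assumptions made for illustration.

```python
import cmath

MIN_POINTS = 16    # smallest length named in the abstract
MAX_POINTS = 1024  # largest length named in the abstract


def _fft_radix2(x):
    """Recursive decimation-in-time radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = _fft_radix2(x[0::2])
    odd = _fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_n^k applied to the odd-indexed half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def variable_length_fft(x):
    """FFT for power-of-two lengths between MIN_POINTS and MAX_POINTS."""
    n = len(x)
    is_power_of_two = n > 0 and (n & (n - 1)) == 0
    if not is_power_of_two or not MIN_POINTS <= n <= MAX_POINTS:
        raise ValueError("length must be a power of two between 16 and 1024")
    return _fft_radix2(x)
```

The same routine serves every supported length because each recursion level only needs the current sub-length to pick its twiddle factors.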

The Domain Separation Mechanism of the Intel® Microprocessor (인텔® 마이크로 프로세서의 영역분리 메커니즘)

성윤기;이은경;최용준
- Proceedings of the Korea Institutes of Information Security and Cryptology Conference, 2002.11a, pp.11-15, 2002
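The domain separation function is included in the functional requirements of most protection profiles for firewalls, VPNs, and smart cards written using the Common Criteria. The domain separation function of the Common Criteria originates from the assurance measures required by TCSEC, the operating system evaluation criteria of the U.S. Department of Defense. The 8086 processor was the first to use the segmentation mechanism known as real mode, providing improved memory address management, and the 80x86 provides protected mode in addition to real mode, offering a mechanism that can separate the system domain from the application domain. A Trusted OS implemented using the architectural advances of the Intel 80x86 processor provides the domain separation function known as ring state, implementing and assuring a mechanism that protects the system domain from the application domain. This paper examines the architecture and memory management methods of the Intel 8086 and 80x86 microprocessors, studies the protected mode mechanism of the 80x86 as advanced over the 8086, and investigates a domain separation mechanism that can protect the system domain.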
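
The ring-based separation mentioned above can be made concrete with the standard protected-mode privilege check for loading a data segment: access is allowed only when the segment's descriptor privilege level (DPL) is numerically greater than or equal to both the current privilege level (CPL) and the selector's requested privilege level (RPL). The abstract does not spell this rule out; the sketch below is a simplified model of that single check, and the function name is hypothetical.

```python
RING_KERNEL = 0  # most privileged ring
RING_USER = 3    # least privileged ring


def can_access_data_segment(cpl, rpl, dpl):
    """Simplified protected-mode check for loading a data segment.

    cpl: current privilege level of the running code (0..3)
    rpl: requested privilege level in the segment selector (0..3)
    dpl: descriptor privilege level of the target segment (0..3)
    """
    for name, level in (("cpl", cpl), ("rpl", rpl), ("dpl", dpl)):
        if level not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be a ring number from 0 to 3")
    # Lower numbers mean more privilege, so the less privileged of
    # CPL and RPL must still be at least as privileged as the DPL.
    return max(cpl, rpl) <= dpl
```

Under this rule, ring 3 application code cannot load a ring 0 data segment, while ring 0 code can load a ring 3 segment.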

2K/4K/8K-Point FFT Processor Based on the CORDIC Algorithm for DVB-T (CORDIC 알고리듬에 기반한 DVB-T용 2K/4K/8K-Point FFT 프로세서)

박상윤;조남익
- Proceedings of the IEEK Conference, 2001.09a, pp.261-264, 2001
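This paper proposes a 2K/4K/8K-point FFT processor, the key component in implementing a demodulator for OFDM systems. At a 30 MHz system clock, the implemented processor processes 8,192 complex input samples in 273 µs and 2,048 complex input samples in 68.26 µs, and can therefore handle data at the symbol rate required by OFDM broadcasting. The basic structure applies the Cooley-Tukey algorithm, which converts a one-dimensional DFT into smaller two-dimensional DFTs, and uses transposition memory and shuffle memory suited to multidimensional DFT computation. The complex multiplier uses a CORDIC processor with a more efficient memory structure than existing methods, and the proposed twiddle factor generation method effectively reduces the size of the ROM needed to store the twiddle factors.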
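
The Cooley-Tukey step that turns a length-N DFT (N = N1·N2) into a two-dimensional problem can be sketched as follows: N2 short DFTs of length N1, a twiddle-factor multiplication, then N1 short DFTs of length N2 read out in transposed order. This is a plain floating-point illustration with hypothetical function names, not the paper's CORDIC-based datapath or memory organization.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, used for the short transforms and as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_via_2d(x, n1_size, n2_size):
    """Length-(n1_size * n2_size) DFT computed as a two-dimensional DFT.

    Index maps: n = n2_size * n1 + n2 (input), k = k1 + n1_size * k2 (output).
    """
    n = n1_size * n2_size
    if len(x) != n:
        raise ValueError("len(x) must equal n1_size * n2_size")

    # Step 1: for each n2, a length-n1_size DFT over n1. cols[n2][k1]
    cols = [
        dft([x[n2_size * n1 + n2] for n1 in range(n1_size)])
        for n2 in range(n2_size)
    ]

    # Step 2: multiply by the twiddle factors W_N^(n2 * k1).
    for n2 in range(n2_size):
        for k1 in range(n1_size):
            cols[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / n)

    # Step 3: for each k1, a length-n2_size DFT over n2 (transposed access).
    out = [0j] * n
    for k1 in range(n1_size):
        row = dft([cols[n2][k1] for n2 in range(n2_size)])
        for k2 in range(n2_size):
            out[k1 + n1_size * k2] = row[k2]
    return out
```

The transposed access in step 3 is the kind of reordering that transposition and shuffle memories serve in hardware. As a side check on the quoted figures, 8,192 samples at 30 MHz take 8,192 / 30 MHz ≈ 273 µs, which is consistent with a throughput of one sample per clock cycle.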

Implementation and Performance Evaluation of Task Creation/Assignment Algorithms in Parallel Spatial Join using R-tree (R-tree를 이용한 병렬공간 조인의 태스크 생성/할당 알고리즘의 구현 및 성능평가)

서영덕;김진덕;홍봉희
- Proceedings of the Korean Information Science Society Conference, 1998.10b, pp.111-113, 1998
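Spatial join is one of the main operations for spatial analysis in geographic information systems. Its computation time increases exponentially as the number of spatial objects involved grows. Processing techniques are therefore being studied to reduce the time of spatial operations on large-scale spatial data. However, there has been no concrete research on task creation and task assignment methods that alleviate the bottleneck caused by concurrent disk access from multiple processors in a shared-disk architecture and minimize message passing between processors. This paper therefore first analyzes the factors that degrade the performance of parallel spatial join and presents ways to improve it. Specifically, it presents an object caching method to reduce disk access time, and task creation and assignment methods that exploit spatio-temporal locality. Experimental evaluation shows that the proposed methods achieve up to a 7.2-fold performance improvement.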

Construction of an Automatic Generation System of Embedded Processor Cores (임베디드 프로세서 코어 자동생성 시스템의 구축)

Cho Jae-Bum;You Yong-Ho;Hwang Sun-Young
- The Journal of Korean Institute of Communications and Information Sciences, v.30 no.6A, pp.526-534, 2005
This paper presents the structure and function of a system that automatically generates embedded processor cores using the SMDL. Accepting a processor description in the SMDL, the proposed system generates a processor core consisting of the pipelined datapath and memory modules together with their control unit. The generated cores support multi-cycle instructions for proper handling of memory accesses, and resolve the pipeline hazards encountered in pipelined processors. Experimental results show the functional accuracy of the generated cores.
