Search | Korea Science

Design of a Hybrid Data Value Predictor with Dynamic Classification Capability in Superscalar Processors (슈퍼스칼라 프로세서에서 동적 분류 능력을 갖는 혼합형 데이타 값 예측기의 설계)

Park, Hee-Ryong;Lee, Sang-Jeong
- Journal of KIISE:Computer Systems and Theory, v.27 no.8, pp.741-751, 2000
To achieve high performance by aggressively exploiting instruction-level parallelism in superscalar processors, it is necessary to overcome the limits imposed by control and data dependences, which prevent instructions from executing in parallel. Value prediction is a technique that breaks data dependences by predicting the outcome of an instruction and speculatively executing its data-dependent instructions based on the predicted outcome. In this paper, a hybrid value prediction scheme with a dynamic classification mechanism is proposed. We design a hybrid predictor by combining a last-value predictor, a stride predictor, and a two-level predictor. The predictor used for each instruction is chosen by a dynamic classification mechanism, so each component predictor is utilized more efficiently than in a hybrid predictor without dynamic classification. To show the performance improvement of our scheme, we simulate the SPECint95 benchmark set using an execution-driven simulator. The results show that our scheme reduces hardware cost by 45% and improves prediction accuracy by 16% compared with the conventional hybrid prediction scheme and the two-level value prediction scheme.
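The idea of choosing a component predictor per instruction can be illustrated with a minimal sketch. It is not the paper's design: the two-level predictor is omitted for brevity, and the class names, the 2-bit confidence counters, and the tie-breaking rule are all assumptions made for illustration.

```python
class LastValuePredictor:
    """Predicts that an instruction produces the same value as last time."""
    def __init__(self):
        self.table = {}  # pc -> last value

    def predict(self, pc):
        return self.table.get(pc, 0)

    def update(self, pc, value):
        self.table[pc] = value


class StridePredictor:
    """Predicts last value plus the most recently observed stride."""
    def __init__(self):
        self.table = {}  # pc -> (last value, stride)

    def predict(self, pc):
        last, stride = self.table.get(pc, (0, 0))
        return last + stride

    def update(self, pc, value):
        last, _ = self.table.get(pc, (value, 0))
        self.table[pc] = (value, value - last)


class HybridPredictor:
    """Hypothetical dynamic classifier: one saturating confidence counter per
    (instruction, component) pair; the most confident component is used."""
    def __init__(self):
        self.components = [LastValuePredictor(), StridePredictor()]
        self.confidence = {}  # pc -> [counter per component], each in 0..3

    def predict(self, pc):
        conf = self.confidence.get(pc, [0] * len(self.components))
        best = max(range(len(conf)), key=lambda i: conf[i])  # ties: first wins
        return self.components[best].predict(pc)

    def update(self, pc, value):
        conf = self.confidence.setdefault(pc, [0] * len(self.components))
        for i, comp in enumerate(self.components):
            hit = comp.predict(pc) == value
            conf[i] = min(3, conf[i] + 1) if hit else max(0, conf[i] - 1)
            comp.update(pc, value)
```

Training the sketch on the sequence 10, 20, 30, 40, 50 for one instruction address raises only the stride component's confidence, so the next prediction comes from the stride predictor.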

A VLSI Array Processor Architecture for High-Speed Processing of Full Search Block Matching Algorithm (완전탐색 블럭정합 알고리즘의 고속 처리를 위한 VLSI 어레이 프로세서의 구조)

이수진;우종호
- The Journal of Korean Institute of Communications and Information Sciences, v.27 no.4A, pp.364-370, 2002
In this paper, we propose a VLSI array architecture for high-speed processing of the full search block matching algorithm (FBMA). First, the sequential FBMA is transformed into single-assignment code by using index space expansion, and the dependence graph is obtained from it. The two-dimensional VLSI array is derived by projecting the dependence graph along the optimal direction. Since the candidate blocks in the search range overlap along columns as well as rows, the processing elements of the VLSI array are designed to reuse the overlapped data. As a result, the number of data inputs is reduced and the processing performance is improved. The proposed VLSI array has $(N^2+1)\times(2p+1)$ processing elements and $(N+2p)$ input ports, where $N$ is the block size and $p$ is the maximum search range. The computation time of the first reference block is $(N^2+2(p+1)N+6p)$, and the block pipeline period is $(3N+4p-1)$.
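For reference, the sequential full-search block matching that such an array accelerates can be sketched as follows. This is a generic software version that uses the sum of absolute differences (SAD) as the matching cost, which is an assumption because the abstract does not name the cost function; the function names and the frame layout (lists of rows) are also illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Evaluate every displacement in [-p, p] x [-p, p] that stays inside
    the reference frame and return (best_cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Neighbouring candidate blocks share most of their pixels, which is the overlap the abstract says the processing elements are designed to reuse; the software loop above simply re-reads them.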

A Differentiated Web Service System through Kernel-Level Realtime Scheduling and Load Balancing (커널 수준 실시간 스케줄링과 부하 분산을 통한 차별화된 웹 서비스 시스템)

이명섭;박창현
- The Journal of Korean Institute of Communications and Information Sciences, v.28 no.6B, pp.533-543, 2003
With the rapid increase in Web users, various kinds of Web applications are being developed. Hence, Web QoS (Quality of Service) has become a critical issue in Web services such as e-commerce and Web hosting. Nevertheless, most Web servers currently process requests from Web users on a FIFO basis, which cannot provide differentiated QoS. This paper presents two approaches to providing differentiated Web QoS. The first is the kernel-level approach, which adds a real-time scheduling processor to the operating system kernel to maintain the priority of user requests determined by the scheduling processor of the Web server. The second is the load-balancing approach, which uses IP-level masquerading and tunneling technology to improve reliability and response speed for user requests.

Core-aware Cache Replacement Policy for Reconfigurable Last Level Cache (재구성 가능한 라스트 레벨 캐쉬 구조를 위한 코어 인지 캐쉬 교체 기법)

Son, Dong-Oh;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
- Journal of the Korea Society of Computer and Information, v.18 no.11, pp.1-12, 2013
In multi-core processors, the Last Level Cache (LLC) can reduce the speed gap between the memory and the cores. For this reason, the LLC has a big impact on processor performance. The LLC is composed of shared cache and private cache. In the computer architecture community, most researchers have focused on management techniques for shared cache, while management techniques for private cache have not been widely researched. In a conventional private LLC, memory is statically assigned to each core, resulting in serious performance degradation when workloads are not fairly distributed. To overcome this problem, this paper proposes a replacement policy for efficiently managing the private cache of the LLC. Because the proposed core-aware cache replacement policy can reconfigure the LLC dynamically, the hit rate of the LLC increases drastically. Moreover, the proposed policy uses 2-bit saturating counters to improve performance. According to our simulation results, the proposed method improves hit rates by 9.23% and reduces the access time by 12.85% compared to the conventional method.
https://doi.org/10.9708/jksci.2013.18.11.001
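The abstract above mentions 2-bit saturating counters without saying how the policy uses them. The following minimal sketch shows only the counter itself; the class name, the `is_high` threshold, and any link to per-core demand are assumptions for illustration, not the paper's mechanism.

```python
class SaturatingCounter2Bit:
    """A 2-bit counter that holds a value in 0..3 and saturates at both ends
    instead of wrapping around."""
    MAX = 3

    def __init__(self, value=0):
        if not 0 <= value <= self.MAX:
            raise ValueError("a 2-bit counter holds values 0..3")
        self.value = value

    def increment(self):
        self.value = min(self.MAX, self.value + 1)

    def decrement(self):
        self.value = max(0, self.value - 1)

    def is_high(self):
        # Hypothetical decision threshold: upper half of the range.
        return self.value >= 2
```

Saturation gives the counter hysteresis: a single contrary event moves a fully saturated counter by one step without flipping the decision.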

Block-based Adaptive Bit Allocation for Reference Memory Reduction (효율적인 참조 메모리 사용을 위한 블록기반 적응적 비트할당 알고리즘)

Park, Sea-Nae;Nam, Jung-Hak;Sim, Dong-Gy;Joo, Young-Hun;Kim, Yong-Serk;Kim, Hyun-Mun
- Journal of the Institute of Electronics Engineers of Korea SP, v.46 no.3, pp.68-74, 2009
In this paper, we propose an effective memory reduction algorithm that reduces the reference frame buffer size and the memory bandwidth in video encoders and decoders. In general video codecs, previously decoded frames must be stored and referenced to reduce temporal redundancy. Recently, reference frames have been recompressed for memory efficiency and to reduce bandwidth between the main processor and external memory. However, such algorithms can hurt coding efficiency. Several algorithms have been proposed to reduce the amount of reference memory with minimal quality degradation, but they still suffer from quality degradation because of fixed-bit allocation. In this paper, we propose an adaptive block-based min-max quantization that considers the local characteristics of the image. In the proposed algorithm, the basic processing unit is $8\times8$ for memory alignment, and adaptive quantization is applied to each $4\times4$ block to minimize quality degradation. We found that the proposed algorithm obtains around 1.7% BD-bitrate gain and 0.03 dB BD-PSNR gain compared with the conventional fixed-bit min-max algorithm, with 37.5% memory saving.
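Min-max quantization of a block can be sketched as below. The sketch shows the fixed-bit form, in which every sample of a block is coded with the same number of bits between the block minimum and maximum; the rule the paper uses to adapt the bit count per $4\times4$ block is not given in the abstract and is not modelled here. Function names and the flat-list block layout are assumptions.

```python
def minmax_quantize(block, bits):
    """Map each sample to an integer code in 0..2**bits - 1, spaced uniformly
    between the block minimum and maximum."""
    lo, hi = min(block), max(block)
    levels = (1 << bits) - 1
    if hi == lo:  # flat block: every sample equals lo
        return lo, hi, [0] * len(block)
    codes = [round((v - lo) * levels / (hi - lo)) for v in block]
    return lo, hi, codes


def minmax_dequantize(lo, hi, codes, bits):
    """Reconstruct approximate samples from the stored min, max, and codes."""
    levels = (1 << bits) - 1
    if hi == lo:
        return [lo] * len(codes)
    return [round(lo + c * (hi - lo) / levels) for c in codes]
```

The reconstruction error shrinks as the block's dynamic range `hi - lo` shrinks, which is why an allocation that gives fewer bits to smooth blocks and more to busy ones is a natural refinement.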

Differential Multicast in Switch-Based Irregular Topology Network (스위치 기반의 비정규적 네트워크에서의 차별적인 다중 전송)

Roh, Byoun-Kwon;Kim, Sung-Chun
- Journal of KIISE:Computer Systems and Theory, v.29 no.7, pp.394-400, 2002
Networks of Workstations (NOWs), which offer flexibility and scalability, have recently emerged as an inexpensive alternative to massively parallel multicomputers. However, deadlock-free multicast is harder to perform on them than on regular topologies such as mesh or hypercube. Single phase differential multicast (SPDM) is a modified multicast algorithm that places less burden on the root node. By applying quality of service (QoS), a specific node can receive differentiated service, and the message flow pattern can also be changed deliberately. Performance evaluation experiments show that SPDM has lower latency and a lower packet concentration rate at the root node than SPAM, and that it can control the distribution of network load among switch nodes by controlling the assignment rate among nodes.

Design of the Entropy Processor using the Memory Stream Allocation for the Image Processing (메모리 스트림 할당 기법을 이용한 영상처리용 엔트로피 프로세서 설계)

Lee, Seon-Keun;Jeong, Woo-Yeol
- The Journal of the Korea institute of electronic communication sciences, v.7 no.5, pp.1017-1026, 2012
With the acceleration of the IT industry and the spread of diverse media environments in modern society, real-time video such as 3D-TV has become a very important issue. Such high-quality live video is applied in various fields, such as CCTV footage, and its quality has become an important performance parameter. However, high-quality video is also vulnerable from a security standpoint, and attempts to remove this weakness by using secure channels or various security algorithms are actively under way. Because adding extra security technology tends to reduce the processing speed of image processing itself, this study presents a way to add security features while improving real-time processing and transmission.
https://doi.org/10.13067/JKIECS.2012.7.5.1017

Memory Management based Hybrid Transactional Memory Scheme for Efficiently Processing Transactions in Multi-core Environment (멀티코어 환경에서 효율적인 트랜잭션 처리를 위한 메모리 관리 기반 하이브리드 트랜잭셔널 메모리 기법)

Jang, Yeon-Woo;Kang, Moon-Hwan;Chang, Jae-Woo
- Proceedings of the Korea Information Processing Society Conference, 2017.04a, pp.795-798, 2017
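With the recent development of multi-core processors, parallel programming has become increasingly important as a technique for using multiple cores effectively. Transactional memory is classified by processing method into HTM, STM, and HyTM, and HyTM, which combines HTM and STM, has recently been actively studied. However, existing HyTM schemes use a Bloom filter for concurrency control between HTM and STM but do not resolve the Bloom filter's inherent false positives. In addition, they manage memory allocation and deallocation for transaction processing with conventional lock mechanisms. As a result, transaction processing efficiency drops as the number of threads increases in a multi-core environment. This paper proposes a memory-management-based hybrid transactional memory scheme for efficient transaction processing in multi-core environments. The proposed scheme provides a Bloom filter optimized for transaction processing and thereby supports consistent processing of transactions that run concurrently in different environments. In addition, through a memory technique optimized for CPU cache lines, it allocates transactions with small memory requirements to the local cache, supporting fast transaction processing.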
https://doi.org/10.3745/PKIPS.y2017m04a.795
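The false-positive problem mentioned in the abstract above is a property of any Bloom filter, and a minimal generic sketch makes it concrete. This is a textbook filter, not the optimized one the paper proposes; the class name, the SHA-256-based hashing, and the use of string keys standing in for memory addresses are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives but possible false positives."""

    def __init__(self, size_bits, num_hashes):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a Python int

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

A key that was never added can still map entirely onto bits set by other keys. In a conflict-detection setting, such a false positive would be treated as a conflict that did not actually occur.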

Multi-queue Hybrid Job Scheduling Mechanism in Grid Computing (그리드 컴퓨팅의 다중 큐 하이브리드 작업스케줄링 기법)

Kang, Chang-Hoon;Choi, Chang-Yeol;Park, Kie-Jin;Kim, Sung-Soo
- Journal of KIISE:Computer Systems and Theory, v.34 no.7, pp.304-318, 2007
Grid computing is a service that shares geographically distributed computing resources over a high-speed network. In this paper, we propose a hybrid scheduling scheme that considers not only meta-scheduling, which distributes jobs among the nodes of a grid computing system, but also job scheduling, which distributes jobs within the local nodes. According to the number of processors needed and the expected execution time, a job with high priority is allocated to the job queue, while low-priority jobs and remote jobs are allocated to the backfill queue. We evaluate the proposed scheme through various experiments, and the results show that the utilization of the grid computing system increases and the job slowdown decreases.

A Study on complement techniques for an efficient instruction scheduling (효과적인 명령어 스케쥴링 보완 기술 연구)

Cho, Jungseok;Cho, Doosan;Jung, Yoojin;Hyun, Heasook;Kim, Dongkyu;Jung, Insang;Choi, Changmoon;Youn, Jonghee
- Proceedings of the Korea Information Processing Society Conference, 2015.10a, pp.164-165, 2015
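Processor architectures with multiple high-performance functional units, such as VLIW (Very Long Instruction Word), require precise instruction scheduling to be handled by software rather than hardware. Typically, the compiler performs this function so that the abundant hardware resources can be fully utilized. Various instruction scheduling algorithms have been studied, and new scheduling techniques have been introduced over several decades. The performance of these scheduling techniques is significantly affected not only by the efficiency of the algorithm but also by the complexity of the dependence relations inherent in the program code. In this study, we examine register reallocation as a dependence-relaxation technique and use it to attempt to improve scheduling performance.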
https://doi.org/10.3745/PKIPS.y2015m10a.164
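How reassigning registers relaxes dependences can be shown with a small sketch. It uses plain register renaming (every result gets a fresh name), which is a standard technique and an assumption here, since the abstract above does not describe the specific reallocation method studied. Instructions are modelled as hypothetical `(dest, sources)` tuples.

```python
def dependences(instrs):
    """Return the set of (earlier, later, kind) dependences, where kind is
    RAW (true), WAR (anti), or WAW (output)."""
    last_writer = {}  # register -> index of its most recent writer
    readers = {}      # register -> readers since its most recent write
    deps = set()
    for j, (dest, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_writer:
                deps.add((last_writer[reg], j, "RAW"))
        for i in readers.get(dest, []):
            deps.add((i, j, "WAR"))
        if dest in last_writer:
            deps.add((last_writer[dest], j, "WAW"))
        for reg in srcs:
            readers.setdefault(reg, []).append(j)
        last_writer[dest] = j
        readers[dest] = []
    return deps


def rename(instrs):
    """Give every result a fresh name so only true (RAW) dependences remain."""
    mapping = {}
    renamed = []
    for n, (dest, srcs) in enumerate(instrs):
        new_srcs = tuple(mapping.get(reg, reg) for reg in srcs)
        mapping[dest] = f"v{n}"
        renamed.append((mapping[dest], new_srcs))
    return renamed
```

For the sequence `r1 = r2 + r3; r4 = r1 * r5; r1 = r6 - r7; r8 = r1 + r4`, reusing `r1` creates an anti and an output dependence that tie the third instruction to the first two. After renaming, only the true dependences remain, so a scheduler is free to issue the third instruction alongside the first.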
