• Title/Summary/Keyword: 프로세서 구조 (processor architecture)

A Processor Architecture with Effective Memory System for Sort-Last Parallel Rendering (Sort-Last 병렬 렌더링을 위한 효과적인 메모리 프로세서 구조)

  • Yoon Duk-Ki; Kim Kyoung-So; Lee Kyung-Ho; Park Wo-Chan
    • Proceedings of the Korea Information Processing Society Conference / 2006.05a / pp.1363-1366 / 2006
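  • This paper proposes a parallel rendering processor that enables a pixel cache in each graphics accelerator, increasing performance while solving the coherence problem. The proposed architecture also reduces the latency caused by pixel cache misses. To achieve these two goals, an effective memory architecture is incorporated into a new pixel cache architecture. Experimental results show that the proposed architecture achieves nearly linear speedup with 16 or more rasterizers.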
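
In sort-last rendering as the term is generally used, each rasterizer renders its own subset of primitives into a full-screen buffer, and the buffers are then merged by a per-pixel depth comparison. The sketch below shows only that merge step, assuming a one-dimensional framebuffer in which a smaller depth is closer to the viewer; it does not model the pixel cache or memory system proposed in the paper, and the name `composite` is hypothetical.

```python
from typing import List, Tuple

# (depth, color) for one pixel; a smaller depth is assumed to be closer to the viewer.
Pixel = Tuple[float, int]


def composite(buffers: List[List[Pixel]]) -> List[int]:
    """Merge per-rasterizer framebuffers with a per-pixel depth test (sort-last style).

    Each rasterizer is assumed to have rendered a full-screen buffer for its
    own subset of primitives; every buffer has the same width.
    """
    width = len(buffers[0])
    result: List[int] = []
    for x in range(width):
        # Keep the fragment nearest to the viewer among all rasterizers.
        _, color = min((buf[x] for buf in buffers), key=lambda p: p[0])
        result.append(color)
    return result


far = float("inf")  # marks "no fragment rendered at this pixel"
r0 = [(0.5, 1), (far, 0), (0.2, 3)]
r1 = [(0.7, 2), (0.4, 5), (0.1, 4)]
print(composite([r0, r1]))  # [1, 5, 4]
```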

Adaptive Beamforming and Detection Algorithms Based on the Cholesky Decomposition of the Inverse Covariance Matrix (역 공분산 행렬의 Cholesky 분할에 근거한 적응 빔 형성 및 검출 알고리즘)

  • 박영철; 차일환; 윤대희
    • The Journal of the Acoustical Society of Korea / v.12 no.2E / pp.47-62 / 1993
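  • The sample matrix inversion (SMI) method suffers from numerical instability and a heavy computational load. This paper proposes a method, more efficient than SMI, based on the Cholesky decomposition of the inverse covariance matrix. In the proposed method, adaptive beamforming and detection are realized in a single structure, and the required Cholesky factor of the inverse covariance matrix is estimated from the secondary inputs using a Gram-Schmidt (GS) processor. A key feature of the proposed structure is that neither the covariance matrix nor its Cholesky factor has to be computed explicitly; in addition, computation is efficient because the structure is a systolic array that exploits the advantages of the GS processor. The performance of the proposed method is compared with that of SMI through simulations. A method for operating in nonhomogeneous environments is also presented, and a lattice-GS structure is proposed to overcome the high computational cost of the GS structure.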
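
The sketch below assumes the textbook real-valued formulation: the SMI weight is w = R^{-1}s, and a common detection statistic (the adaptive matched filter form) is (s^T R^{-1} x)^2 / (s^T R^{-1} s). Factoring R = L L^T makes L^{-1} a triangular Cholesky factor of R^{-1}, so beamforming and detection both reduce to inner products of the whitened vectors L^{-1}s and L^{-1}x. Unlike the paper's Gram-Schmidt processor, this sketch forms R explicitly and factors it directly; complex array data would need conjugate transposes, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sample_covariance(snapshots: List[Vector]) -> Matrix:
    """R = (1/K) * sum_k x_k x_k^T over K secondary (training) snapshots."""
    n, k = len(snapshots[0]), len(snapshots)
    return [[sum(x[i] * x[j] for x in snapshots) / k for j in range(n)]
            for i in range(n)]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L: Matrix, b: Vector) -> Vector:
    """Return y = L^{-1} b by forward substitution (L lower triangular)."""
    y: Vector = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][m] * y[m] for m in range(i))) / L[i][i])
    return y


def beamform_and_detect(L: Matrix, s: Vector, x: Vector) -> Tuple[float, float]:
    """Adaptive output s^T R^{-1} x and statistic (s^T R^{-1} x)^2 / (s^T R^{-1} s).

    Since R^{-1} = (L^{-1})^T L^{-1}, both follow from whitened vectors,
    and R^{-1} itself is never formed.
    """
    v = forward_solve(L, s)  # whitened steering vector L^{-1} s
    z = forward_solve(L, x)  # whitened primary snapshot L^{-1} x
    out = sum(vi * zi for vi, zi in zip(v, z))
    stat = out ** 2 / sum(vi * vi for vi in v)
    return out, stat


R = [[4.0, 2.0], [2.0, 3.0]]
out, stat = beamform_and_detect(cholesky(R), [1.0, 1.0], [1.0, 0.0])
# Direct inversion of this R gives s^T R^{-1} x = 1/8 and a statistic of 1/24.
print(round(out, 6), round(stat, 6))
```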

A dual-link CC-NUMA System Tolerant to the Multiprogramming Environment (다중 프로그램 환경에 적합한 이중 연결 CC-NUMA 시스템)

  • Suh, Hyo-Joong
    • The KIPS Transactions:PartA / v.11A no.3 / pp.199-206 / 2004
  • Under multiprogramming, the performance of a multiprocessor system is affected by the operating system's process allocation policy. Communication cost is lowest when related processes are placed on adjacent processors. However, effective allocation is difficult in practice, and executing the allocation policy itself consumes computation time. Dual-ring CC-NUMA systems show a considerable performance difference depending on the process allocation policy, because of the many unbalanced memory transactions on the interconnection network. In this paper, I propose a load-balanced dual-link CC-NUMA system that does not require a process allocation policy. Program-driven simulation results show that the proposed system exhibits no notable difference across allocation policies, while the dual-ring systems gain 10% in performance from process allocation. In addition, the proposed system outperforms the dual-ring systems by about 1.5 times.
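
Why placement matters on a ring interconnect can be illustrated with a simple hop-count model. This is a hypothetical sketch, assuming one hop per link and ignoring contention and coherence-protocol traffic; it does not model the paper's dual-ring or dual-link networks, and the function names are illustrative only.

```python
from typing import Dict, List, Tuple


def ring_hops(src: int, dst: int, n: int, bidirectional: bool) -> int:
    """Links traversed from node src to node dst on an n-node ring."""
    forward = (dst - src) % n
    # With links in both directions a message can take the shorter way round.
    return min(forward, n - forward) if bidirectional else forward


def total_hops(placement: Dict[str, int], pairs: List[Tuple[str, str]],
               n: int, bidirectional: bool) -> int:
    """Total hop count for messages between communicating process pairs."""
    return sum(ring_hops(placement[a], placement[b], n, bidirectional)
               for a, b in pairs)


pairs = [("p0", "p1"), ("p1", "p0")]  # request and reply between two related processes
adjacent = {"p0": 0, "p1": 1}
scattered = {"p0": 0, "p1": 4}
for name, placement in (("adjacent", adjacent), ("scattered", scattered)):
    print(name,
          total_hops(placement, pairs, 8, bidirectional=False),
          total_hops(placement, pairs, 8, bidirectional=True))
```

In this toy model a request-reply round trip on a one-way ring always costs n hops regardless of placement, while with both directions available adjacent placement cuts it to 2 hops.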

The Impact of Parallel Computer Architectures on Business Operations (병렬컴퓨터 구조가 업무에 미치는 영향)

  • Korea Database Promotion Center
    • Digital Contents / no.10 s.65 / pp.89-97 / 1998
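  • In this article, we discuss misconceptions about the three kinds of parallel computer architecture that currently dominate the market: the symmetric multiprocessor (SMP) architecture, the cluster architecture, and the ccNUMA architecture (better known as NUMA). The discussion focuses on how the characteristics of each architecture affect OLTP environments, decision-support workloads, high availability, and system management.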

The ATM SAR Processor Optimized for VoDSL Service (VoDSL 서비스에 최적화된 ATM SAR 프로세서)

  • 손윤식; 정정화
    • Journal of the Institute of Electronics Engineers of Korea SD / v.40 no.10 / pp.9-16 / 2003
  • In this paper, we propose an ATM processor suitable for VoDSL subscriber equipment. The processor consists of an ATM block, an AAL protocol block and an ATS scheduler, and supports up to 4 VCCs carrying data and voice traffic over the ATM network. The proposed ATS scheduler can guarantee the QoS of voice traffic and supports multiple AAL2 packets. The ATM processor was fabricated on the 0.35-micron process line of Hynix Semiconductor and provides a maximum data transfer rate of 52 Mbps. We also implemented the LAD, a VoDSL subscriber device. Experimental results on a test-bed network show that the proposed hardware successfully serves most VoDSL service applications.
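
As a rough illustration of how a scheduler can protect voice QoS, the sketch below segments payloads into 48-byte ATM cell payloads and serves a voice queue with strict priority over a data queue. It is a hypothetical simplification: it omits the 5-byte cell header, AAL2 packet multiplexing and AAL trailers, and it is not the ATS scheduler proposed in the paper.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

CELL_PAYLOAD = 48  # bytes of payload in one ATM cell (the 5-byte header is not modelled)
Cell = Tuple[int, bytes]  # (VCC identifier, 48-byte payload)


def segment(vcc: int, data: bytes) -> List[Cell]:
    """Split data into 48-byte cell payloads, zero-padding the last one.

    Hypothetical simplification: no AAL2 packet headers, AAL5 trailer or CRC.
    """
    return [(vcc, data[i:i + CELL_PAYLOAD].ljust(CELL_PAYLOAD, b"\x00"))
            for i in range(0, len(data), CELL_PAYLOAD)]


class PriorityCellScheduler:
    """Strict priority: a queued voice cell is always sent before any data cell."""

    def __init__(self) -> None:
        self.voice: Deque[Cell] = deque()
        self.data: Deque[Cell] = deque()

    def enqueue(self, cells: List[Cell], is_voice: bool) -> None:
        (self.voice if is_voice else self.data).extend(cells)

    def next_cell(self) -> Optional[Cell]:
        """Cell for the next transmission slot, or None if nothing is queued."""
        if self.voice:
            return self.voice.popleft()
        if self.data:
            return self.data.popleft()
        return None


sched = PriorityCellScheduler()
sched.enqueue(segment(1, b"d" * 100), is_voice=False)  # 3 data cells on VCC 1
sched.enqueue(segment(2, b"v" * 20), is_voice=True)    # 1 voice cell on VCC 2
print([sched.next_cell()[0] for _ in range(4)])  # [2, 1, 1, 1]
```

Strict priority keeps voice delay low but can starve data traffic under sustained voice load, so a practical scheduler would usually bound it, for example with rate limits or weighted service.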

A 32-bit Microprocessor with enhanced digital signal process functionality (디지털 신호처리 기능을 강화한 32비트 마이크로프로세서)

  • Moon, Sang-ook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.820-822 / 2005
  • We have designed a 32-bit microprocessor with fixed-point digital signal processing functionality. The processor combines general-purpose microprocessor and digital signal processor functionality using reduced instruction set computer (RISC) design principles. It has functional units for arithmetic operations, digital signal processing and memory access. These units operate in parallel to remove the stall cycles that would otherwise follow DSP or load/store instructions, which usually need one or more latency cycles in addition to the first issue cycle. High performance was achieved with these parallel functional units within a sophisticated five-stage pipeline structure.
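
Fixed-point DSP support in processors of this kind commonly centers on a saturating multiply-accumulate (MAC). The sketch below models that operation, assuming Q15 operands and a 32-bit accumulator; these formats and the function names are illustrative assumptions, not the instruction set of the processor in the paper, and pipelining is not modelled.

```python
from typing import Sequence

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate(v: int, lo: int, hi: int) -> int:
    """Clamp v to [lo, hi] instead of letting it wrap around."""
    return max(lo, min(hi, v))


def mac_q15(acc: int, a: int, b: int) -> int:
    """One saturating multiply-accumulate: acc + a*b.

    a and b are Q15 values (16-bit, 15 fractional bits); their product is Q30,
    and acc is assumed to be a 32-bit Q30 accumulator.
    """
    return saturate(acc + a * b, INT32_MIN, INT32_MAX)


def dot_q15(xs: Sequence[int], hs: Sequence[int]) -> int:
    """Q15 dot product (e.g. one FIR filter output), rounded back to Q15."""
    acc = 0
    for x, h in zip(xs, hs):
        acc = mac_q15(acc, x, h)
    # Q30 -> Q15: add half an LSB, shift right by 15, then saturate to 16 bits.
    return saturate((acc + (1 << 14)) >> 15, INT16_MIN, INT16_MAX)


half = 1 << 14  # 0.5 in Q15
print(dot_q15([half], [half]))  # 8192, i.e. 0.25 in Q15
```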

Low Power Mapping Algorithm Considering Data Transfer Time for CGRA (데이터를 고려한 저전력 소모 CGRA 매핑 알고리즘)

  • Kim, Yong-Joo; Youn, Jong-Hee; Cho, Doo-San; Paek, Yun-Heung
    • The KIPS Transactions:PartA / v.19A no.1 / pp.17-22 / 2012
  • Demand for high-performance processors is soaring as the mobile and small electronic device market expands. A CGRA (Coarse-Grained Reconfigurable Architecture) is a processor that satisfies both performance and low-power demands, and it is a strong reconfigurable alternative to an ASIC. This paper presents a novel low-power mapping algorithm that optimizes the number of computation resources used in the mapping phase by taking data transfer time into account. Compared with the previous mapping algorithm, ours reduces energy consumption by up to 73% and by 56.4% on average.

Implementation and Translation of Major OpenMP Directives for Chip Multiprocessor without using OS (단일 칩 다중 프로세서상에서 운영체제를 사용하지 않은 OpenMP 구현 및 주요 디렉티브 변환)

  • Jeun, Woo-Chul; Ha, Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory / v.34 no.4 / pp.145-157 / 2007
  • OpenMP is an attractive parallel programming model for chip multiprocessors, because no standard parallel programming method exists for them and parallel programs are easy to write in OpenMP. However, chip multiprocessor systems can have various architectures depending on their target applications, so OpenMP must be implemented differently for each system. In this paper, we propose an implementation of OpenMP, and an effective translation of its major directives, for a chip multiprocessor without an OS; it improves performance without special hardware and without extending the OpenMP directives. We present experimental results on our target platform, the CT3400.
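
Without an OS to create threads, one plausible lowering of a `parallel for` directive is for every core to run the same code, compute its own iteration range from its core ID, and then synchronize at a barrier. The sketch below shows a common block-partitioning rule for a static schedule; the rule, the names `static_chunk` and `run_parallel_for`, and the sequential simulation of the cores are assumptions, not the translation scheme used in the paper or on the CT3400.

```python
from typing import Callable, Tuple


def static_chunk(lo: int, hi: int, num_cores: int, core_id: int) -> Tuple[int, int]:
    """Half-open iteration range [start, end) of the loop lo..hi-1 for one core.

    Iterations are split into num_cores nearly equal contiguous blocks; the
    first (n % num_cores) cores each take one extra iteration.
    """
    base, extra = divmod(hi - lo, num_cores)
    start = lo + core_id * base + min(core_id, extra)
    end = start + base + (1 if core_id < extra else 0)
    return start, end


def run_parallel_for(lo: int, hi: int, num_cores: int,
                     body: Callable[[int], None]) -> None:
    """Sequential stand-in for each core running its own chunk.

    On a real CMP every core would call static_chunk with its own hardware
    core ID, run its chunk concurrently, and then wait at a barrier.
    """
    for core in range(num_cores):
        start, end = static_chunk(lo, hi, num_cores, core)
        for i in range(start, end):
            body(i)


squares = [0] * 10


def square_into(i: int) -> None:
    squares[i] = i * i


print([static_chunk(0, 10, 4, c) for c in range(4)])  # [(0, 3), (3, 6), (6, 8), (8, 10)]
run_parallel_for(0, 10, 4, square_into)
```

Because each core derives its range from its ID alone, the translated code needs no run-time scheduler, which suits a system without an OS.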

Optimized Implementation of Scalable Multi-Precision Multiplication Method on RISC-V Processor for High-Speed Computation of Post-Quantum Cryptography (차세대 공개키 암호 고속 연산을 위한 RISC-V 프로세서 상에서의 확장 가능한 최적 곱셈 구현 기법)

  • Seo, Hwa-jeong; Kwon, Hyeok-dong; Jang, Kyoung-bae; Kim, Hyunjun
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.3 / pp.473-480 / 2021
  • To achieve high-speed implementations of post-quantum cryptography, primitive operations should be tailored to the architecture of the target processor. In this paper, we present an optimized implementation of multiplication on a RISC-V processor for post-quantum cryptography. In particular, the column-wise multiplication algorithm is optimized with the primitive instructions of the RISC-V processor, improving the performance of 256-bit and 512-bit multiplication by 19% and 8%, respectively, over previous work. Lastly, we suggest an instruction extension for high-speed multiplication on RISC-V processors.
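
Column-wise (product-scanning) multiplication produces the result one output word at a time: all partial products a[i]·b[j] with i + j = k are summed into a three-word accumulator, word k is written out, and the accumulator shifts down one word. The sketch below models that loop with 32-bit words; the word size and function names are assumptions, Python's big integers serve only to check the result, and it does not reproduce the paper's RISC-V instruction scheduling or proposed extension.

```python
from typing import List

W = 32                  # word size, assumed to match 32-bit RISC-V (RV32) registers
MASK = (1 << W) - 1


def to_words(x: int, n: int) -> List[int]:
    """Split x into n little-endian 32-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_words(words: List[int]) -> int:
    """Reassemble little-endian 32-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(words))


def product_scanning_mul(a: List[int], b: List[int]) -> List[int]:
    """Column-wise (product-scanning) multiplication of two n-word operands."""
    n = len(a)
    result = [0] * (2 * n)
    r0 = r1 = r2 = 0  # three-word column accumulator r2:r1:r0
    for k in range(2 * n - 1):
        # Add every partial product a[i] * b[j] with i + j == k.
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            p = a[i] * b[k - i]
            lo, hi = p & MASK, p >> W  # on RV32M these halves come from MUL and MULHU
            t = r0 + lo
            r0 = t & MASK
            t = r1 + hi + (t >> W)     # propagate the carry out of the low word
            r1 = t & MASK
            r2 += t >> W
        result[k] = r0                 # column k is now complete
        r0, r1, r2 = r1, r2, 0         # shift the accumulator down one word
    result[2 * n - 1] = r0
    return result


x, y = 2**255 - 19, 2**256 - 1  # 256-bit operands, 8 words each
assert from_words(product_scanning_mul(to_words(x, 8), to_words(y, 8))) == x * y
```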

SAMBA Type MPSoC Bus Architecture Optimization under Performance Constraints (성능 제약 조건 하에서의 SAMBA 형 MPSoC 버스 구조 최적화)

  • Kim, Hong-Yeom; Jung, Sung-Chul; Shin, Hyun-Chul
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.1 / pp.94-101 / 2010
  • Optimizing the interconnects among processors and memories becomes important as multiple processors and memories are integrated on a Multi-Processor System-on-Chip (MPSoC). Since the optimal interconnection architecture usually depends on the application, a systematic design methodology that handles various data transfer requirements is necessary. In this paper, we focus on bus interconnection for MPSoC applications that use 4 to 16 processors. We propose a new systematic bus design methodology under performance constraints using Single Arbitration Multiple Bus Accesses (SAMBA) style bus architectures. The method finds an optimized bus architecture that satisfies the performance constraints of a single application or of multiple applications. Compared with the unoptimized architecture, our method can reduce the bus switch logic circuits significantly (in some cases by more than 50%). Furthermore, low-cost bus architectures that satisfy the performance constraints of multiple applications can be found.