• Title/Summary/Keyword: Multiply paper

Analysis of Tensor Processing Unit and Simulation Using Python (텐서 처리부의 분석 및 파이썬을 이용한 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.3 / pp.165-171 / 2019
  • Studies of computer architecture have shown that major improvements in price-energy-performance stem from domain-specific hardware development. This paper analyzes the tensor processing unit (TPU), an ASIC that accelerates inference in artificial neural networks (NN). The core of the TPU is a MAC matrix multiplier capable of high-speed operation, together with software-managed on-chip memory. The execution model of the TPU meets the response-time requirements of artificial neural networks better than the existing CPU and GPU execution models, and it does so with a small area and low power consumption even though it contains many MACs and a large memory. On benchmarks run under the TensorFlow framework, the TPU achieves higher performance and better power efficiency than the CPU or GPU. In this paper, we analyze the TPU, simulate OpenTPU, which is modeled in Python, and synthesize the matrix multiplication unit, which is the key hardware.
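
To make the role of the MAC matrix multiplier concrete, the following is a minimal sketch of matrix multiplication written as explicit multiply-accumulate steps. It is an illustration only, not the OpenTPU model; the function name `mac_matmul` and the plain nested-list representation are assumptions.

```python
def mac_matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m) using explicit MAC steps.

    Hypothetical helper for illustration; a hardware matrix unit performs
    many of these MAC steps in parallel rather than one at a time.
    """
    if not a or not b or any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions of a and b must match")
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0  # accumulator of a single MAC cell
            for t in range(k):
                acc += a[i][t] * b[t][j]  # one multiply-accumulate
            out[i][j] = acc
    return out
```

Each output element costs k MAC operations, so an n x k by k x m product needs n*k*m of them, which is why a dense array of MAC cells dominates such a design.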

A Study on Illumination Normalization Method based on Bilateral Filter for Illumination Invariant Face Recognition (조명 환경에 강인한 얼굴인식 성능향상을 위한 Bilateral 필터 기반 조명 정규화 방법에 관한 연구)

  • Lee, Sang-Seop;Lee, Su-Young;Kim, Joong-Kyu
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.4 / pp.49-55 / 2010
  • Cast shadows caused by illumination conditions can degrade face recognition systems that use reflectance images. Consequently, the cast shadow area must be separated from the feature area to improve recognition accuracy. A bilateral filter smooths an image while preserving edges by means of a nonlinear combination of nearby pixel values. Possessing these characteristics, the filter suits the Retinex-based illumination estimation process. In this paper, we therefore propose a new illumination normalization method for face images based on the bilateral filter. The proposed method produces a reflectance image in which the cast shadow area is preserved relatively accurately, because the filter coefficient is designed as the product of the proximity and the discontinuity of pixels in the input image. The performance of our method is measured by the recognition accuracy of principal component analysis (PCA) and compared with other conventional illumination normalization methods.
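
The idea that the filter coefficient is a product of a proximity term and a discontinuity term can be sketched in one dimension as follows. This is a generic bilateral filter with Gaussian kernels, not the authors' normalization method; the function name and the parameters `radius`, `sigma_s`, and `sigma_r` are assumptions.

```python
import math


def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=10.0):
    """Edge-preserving smoothing of a 1-D signal (illustrative sketch)."""
    out = []
    for i, center in enumerate(signal):
        num = 0.0
        den = 0.0
        lo = max(0, i - radius)
        hi = min(len(signal), i + radius + 1)
        for j in range(lo, hi):
            # Proximity: neighbours that are spatially close weigh more.
            proximity = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            # Discontinuity: neighbours with a very different value weigh less.
            similarity = math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2))
            weight = proximity * similarity  # coefficient is the product
            num += weight * signal[j]
            den += weight
        out.append(num / den)
    return out
```

Across a sharp step the similarity term drives the weights of pixels on the far side toward zero, so the edge survives while flat regions are averaged.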

Design of Bit Manipulation Accelerator for Communication DSP (통신용 DSP를 위한 비트 조작 연산 가속기의 설계)

  • Jeong Sug H.;Sunwoo Myung H.
    • Journal of the Institute of Electronics Engineers of Korea TC / v.42 no.8 s.338 / pp.11-16 / 2005
  • This paper proposes a bit manipulation accelerator (BMA) with application-specific instructions that efficiently supports scrambling, convolutional encoding, puncturing, and interleaving. Conventional DSPs cannot perform bit manipulation functions effectively because they have multiply-accumulate (MAC) oriented data paths and word-based functions. The proposed accelerator, however, processes bit manipulation functions efficiently using parallel shift and Exclusive-OR (XOR) operations and bit insertion/extraction operations on multiple data. The proposed BMA has been modeled in VHDL and synthesized using the SEC $0.18\mu m$ standard cell library, and its gate count is only about 1,700 gates. Performance comparisons show that the number of clock cycles can be reduced by about $40\%\sim80\%$ for scrambling, convolutional encoding, and interleaving compared with existing DSPs.
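
As an illustration of the shift and XOR operations that scrambling relies on, here is a minimal bit-serial additive scrambler. The generator polynomial x^7 + x^4 + 1 (the one used by IEEE 802.11) and the seed value are assumptions, since the abstract does not name a polynomial; this is a software sketch, not the BMA instruction set.

```python
def scramble(bits, state=0b1011101):
    """Additive scrambler over a 7-bit LFSR, polynomial x^7 + x^4 + 1 (assumed).

    `bits` is a list of 0/1 values and `state` is a non-zero 7-bit seed.
    Running the same function again with the same seed descrambles the data.
    """
    out = []
    for b in bits:
        # Feedback bit: XOR of the x^7 and x^4 register taps.
        fb = ((state >> 6) ^ (state >> 3)) & 1
        # Shift the register left and insert the feedback bit.
        state = ((state << 1) | fb) & 0x7F
        out.append(b ^ fb)  # XOR the data bit with the LFSR sequence
    return out
```

On a word-oriented DSP each of these steps costs several instructions per bit, which is the overhead a dedicated shift/XOR data path is meant to avoid.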

A Design of Low Power MAC Operator with Fault Tolerance (에러 내성을 갖는 저전력 MAC 연산기 설계)

  • Jung, Han-Sam;Ku, Sung-Kwan;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.11 / pp.50-55 / 2008
  • As more DSP functionality is integrated into embedded mobile devices, power consumption and device reliability have emerged as crucial issues. As the complexity of mobile embedded designs increases rapidly, verifying the functionality of mobile devices has become extremely difficult. Designs with error (fault) tolerance are therefore often required, since this capability lets a design operate properly even in the presence of some errors. However, fault-tolerant designs may suffer significant power overhead because fault tolerance is often achieved by resource replication. In this paper, we propose a low-power, fault-tolerant MAC (multiply-and-accumulate) design. The proposed MAC design is based on multiple barrel shifters, since MAC designs built from barrel shifters and adders are known to be excellent in terms of power consumption.
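
A multiply-accumulate built from shifters and adders can be sketched in software as a shift-and-add loop. This is the textbook technique only; it does not model the paper's barrel-shifter arrangement or its fault-tolerance mechanism, and the function name is hypothetical.

```python
def shift_add_mac(acc, a, b):
    """Return acc + a * b using only shifts and additions.

    `b` must be a non-negative integer; each set bit of `b` contributes
    one shifted copy of `a` to the accumulator.
    """
    if b < 0:
        raise ValueError("b must be non-negative in this sketch")
    shift = 0
    while b:
        if b & 1:
            acc += a << shift  # add the partial product for this bit
        b >>= 1
        shift += 1
    return acc
```

For example, `shift_add_mac(10, 7, 13)` adds the partial products `7 << 0`, `7 << 2`, and `7 << 3` to the initial accumulator value 10.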

Performance Analysis of Bandwidth Allocation Scheme using POBP Method in ATM Networks. (ATM 네트워크에서 POBP 방식을 이용한 대역폭 할당 방법의 성능분석)

  • 한상엽;박광채
    • Journal of the Korea Institute of Information and Communication Engineering / v.4 no.3 / pp.537-548 / 2000
  • ATM is an effective information transmission method that uses statistical multiplexing and can efficiently accommodate traffic of various transmission rates. However, excessive cell loss and cell delay can occur under temporary overload. Effective traffic control and network resource management are therefore required to guarantee QoS (Quality of Service) for users and to maximize bandwidth utilization for networks. In this paper, we propose the POBP (PushOut BP) scheme, which combines the pushout scheme with BP (Back Pressure), a reactive control scheme recommended by the ATM Forum, to guarantee QoS in two-stage ATM switch networks. The proposed method requires accurate real-time information on the bandwidth in use and performs traffic shaping. Through these processes, we can not only use otherwise unused bandwidth effectively but also guarantee fair bandwidth utilization and reduce the cell loss probability caused by congestion in two-stage ATM networks.

The Improvement of High Convergence Speed using LMS Algorithm of Data-Recycling Adaptive Transversal Filter in Direct Sequence Spread Spectrum (직접순차 확산 스펙트럼 시스템에서 데이터 재순환 적응 횡단선 필터의 LMS 알고리즘을 이용한 고속 수렴 속도 개선)

  • Kim, Gwang-Jun;Yoon, Chan-Ho;Kim, Chun-Suk
    • Journal of the Korea Institute of Information and Communication Engineering / v.9 no.1 / pp.22-33 / 2005
  • In this paper, an efficient signal interference control technique that improves the convergence speed of the LMS algorithm is introduced for the adaptive transversal filter in DS/SS systems. The convergence characteristics of the proposed algorithm, whose coefficients are adapted multiple times within a symbol period by recycling the received data, are analyzed to prove the improvement in convergence speed theoretically. Increasing the step-size parameter ${\mu}$ controls the rate of convergence of the algorithm. An increase in the step-size parameter ${\mu}$ also reduces the variation in the experimentally computed learning curve. Increasing the eigenvalue spread slows the rate of convergence of the adaptive equalizer. While increasing the steady-state value of the average squared error, the proposed algorithm also demonstrates superior signal interference control over the conventional filter algorithm, increasing the convergence speed by (B+1) times owing to the data-recycling LMS technique.
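
One plausible reading of adapting the coefficients several times per symbol by recycling received data is sketched below: after the ordinary LMS update on the newest input vector, the update is repeated on the B most recent stored input vectors. The exact recycling rule of the paper is not given in the abstract, so the buffering scheme, the function name, and the parameter `recycles` (standing in for B) are assumptions.

```python
from collections import deque


def data_recycling_lms(x, d, num_taps, mu, recycles):
    """LMS transversal filter with (recycles + 1) updates per input sample.

    x: input samples, d: desired samples, mu: step size.
    Returns the final weights and the a-priori error for each new sample.
    """
    w = [0.0] * num_taps
    # Newest-first buffer of (input vector, desired value) pairs.
    history = deque(maxlen=recycles + 1)
    errors = []
    for n in range(num_taps - 1, len(x)):
        u = x[n - num_taps + 1:n + 1][::-1]  # tap-input vector, newest first
        history.appendleft((u, d[n]))
        for k, (u_k, d_k) in enumerate(history):
            e = d_k - sum(wi * ui for wi, ui in zip(w, u_k))
            if k == 0:
                errors.append(e)  # error on the new sample before adapting
            # Standard LMS weight update: w <- w + mu * e * u
            w = [wi + mu * e * ui for wi, ui in zip(w, u_k)]
    return w, errors
```

With `recycles=0` this reduces to ordinary LMS; larger values trade extra multiply-accumulate work per symbol for fewer symbols to converge.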

An Aspect-based Testing Framework for Performance Evaluation of Composite Service (조합된 서비스의 성능 평가를 위한 Aspect 기반 테스팅 프레임워크)

  • Kim, Jong-Phil;Hong, Jang-Eui
    • KIPS Transactions on Software and Data Engineering / v.1 no.3 / pp.149-158 / 2012
  • As service-oriented software is considered one of the solutions for fulfilling users' needs in the internet service environment, the demand for developing reliable services by composing internet services has increased. A critical issue in this development approach, however, is to satisfy the performance requirements as well as the functional correctness of the services under development, because impatient users click a service's request button repeatedly rather than wait even briefly. This paper proposes a framework for examining the performance of a composite service. Our testing framework provides service developers with response-time data by measuring service execution time. We develop an Aspect-based timer service and weave it into existing services to measure their execution time. Additionally, we perform experiments to confirm the usefulness of performance testing for composite services. The framework supports the development of a well-performing service by replacing a slow service with a new service that becomes a component of the composite service.
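
The effect of weaving a timer around an existing service can be approximated in Python with a decorator, which plays the role of around-advice. This is only an analogy to the Aspect-based timer service in the abstract; the names `timed` and `log` are hypothetical and no aspect-oriented library is involved.

```python
import functools
import time


def timed(log):
    """Wrap a callable so each call appends (name, elapsed seconds) to `log`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the wrapped service raises.
                log.append((func.__name__, time.perf_counter() - start))
        return wrapper
    return decorator
```

The wrapped service needs no changes to its own code, which is the property that makes the approach suitable for measuring third-party components of a composite service.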

AE32000B: a Fully Synthesizable 32-Bit Embedded Microprocessor Core

  • Kim, Hyun-Gyu;Jung, Dae-Young;Jung, Hyun-Sup;Choi, Young-Min;Han, Jung-Su;Min, Byung-Gueon;Oh, Hyeong-Cheol
    • ETRI Journal / v.25 no.5 / pp.337-344 / 2003
  • In this paper, we introduce a fully synthesizable 32-bit embedded microprocessor core called the AE32000B. The AE32000B core is based on the extendable instruction set computer architecture, so it has high code density and a low memory access rate. In order to improve the performance of the core, we developed and adopted various design options, including the load extension register instruction (LERI) folding unit, a high performance multiply and accumulate (MAC) unit, various DSP units, and an efficient coprocessor interface. The instructions per cycle count of the Dhrystone 2.1 benchmark for the designed core is about 0.86. We verified the synthesizability and the area and time performances of our design using two CMOS standard cell libraries: a 0.35-${\mu}m$ library and a 0.18-${\mu}m$ library. With the 0.35-${\mu}m$ library, the core can be synthesized with about 47,000 gates and operate at 70 MHz or higher, while it can be synthesized with about 53,000 gates and operate at 120 MHz or higher with the 0.18-${\mu}m$ library.

Implementation of 24bit Sigma-delta D/A Converter for an Audio (오디오용 24bit 시그마-델타 D/A 컨버터 구현)

  • Heo, Jeong-Hwa;Park, Sang-Bong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.8 no.4 / pp.53-58 / 2008
  • This paper designs a sigma-delta D/A converter with high resolution and low power consumption. It reorganizes the input data according to the LJ, RJ, or I2S mode and the bit mode of the A/D converter's output data. The D/A converter decodes the original analog signal through HBF (half-band filter), hold, and 5th-order CIFB (Cascaded Integrators with distributed Feedback as well as distributed input coupling) sigma-delta modulation blocks. It uses repeated addition instead of multiplication to improve chip area and performance. In addition, half-band filters of similar architecture are combined into one block, and a sample-hold block is used instead of the sinc filter. We propose a simple D/A converter with reduced area. The filters in the block were analyzed using MATLAB. The top block was designed with the top-down method in Verilog. The designed block was fabricated using the Samsung 0.35um CMOS standard cell library. The chip area is 1500*1500um.
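
To show what a sigma-delta modulation block does, here is a first-order modulator, deliberately much simpler than the 5th-order CIFB structure named in the abstract. It needs only additions and a comparison, and the function name is hypothetical.

```python
def sigma_delta_1st(samples):
    """First-order sigma-delta modulator (simplified stand-in, not 5th-order CIFB).

    `samples` are values in (-1, 1); the result is a list of +1.0 / -1.0
    outputs whose running average tracks the input.
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback  # accumulate the quantization error
        feedback = 1.0 if integrator >= 0 else -1.0  # 1-bit quantizer
        bits.append(feedback)
    return bits
```

Averaging the output over a long window recovers the input level, which is the job of the analog low-pass stage that follows the modulator in a real converter.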

IEEE-754 Floating-Point Divider for Embedded Processors (내장형 프로세서를 위한 IEEE-754 고성능 부동소수점 나눗셈기의 설계)

  • Jeong, Jae-Won;Hong, In-Pyo;Jeong, Woo-Kyong;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.7 / pp.66-73 / 2002
  • As floating-point operations become widely used in applications such as computer graphics and high-definition DSP, the need for fast division has increased. However, conventional floating-point dividers occupy a large hardware area and create bottlenecks for floating-point operations as a whole. In this paper, a high-performance, small-area floating-point divider suitable for embedded processors is designed using the series expansion algorithm. The algorithm is selected to use two MAC (Multiply-ACcumulate) units for quadratic convergence to the correct quotient. The two MAC units are shared with the SIMD-DSP features, so the additional area needed for division alone is very small. The proposed divider supports all rounding modes defined by the IEEE 754 standard, and error estimation is performed to obtain appropriate precision.
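
Division by series expansion with two multipliers and quadratic convergence is commonly realized as a Goldschmidt-style iteration, sketched below. The abstract does not name the exact variant, so treating it as Goldschmidt, along with the function name, the iteration count, and the normalized divisor range, is an assumption; rounding modes and initial lookup tables are not modeled.

```python
def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d for a divisor normalized to 0.5 <= d < 1.

    Both n and d are multiplied by the same factor each round, so the
    ratio is unchanged while d converges quadratically to 1.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("divisor must be normalized to [0.5, 1)")
    for _ in range(iterations):
        f = 2.0 - d  # correction factor; equals 1 + e when d = 1 - e
        n *= f       # could run on the first multiplier
        d *= f       # could run on the second multiplier, in parallel
    return n
```

If d = 1 - e, then d * f = 1 - e^2, so the error squares on every iteration; starting from e <= 0.5, six iterations are enough for double precision in this sketch.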