• Title/Summary/Keyword: FPGA 설계

Search Result 1,067, Processing Time 0.023 seconds

Microcode based Controller for Compact CNN Accelerators Aimed at Mobile Devices (모바일 디바이스를 위한 소형 CNN 가속기의 마이크로코드 기반 컨트롤러)

  • Na, Yong-Seok;Son, Hyun-Wook;Kim, Hyung-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.3
    • /
    • pp.355-366
    • /
    • 2022
  • This paper proposes a microcode-based neural network accelerator controller for artificial intelligence accelerators that can be reconstructed using a programmable architecture and provide the advantages of low-power and ultra-small chip size. In order for the target accelerator to support various neural network models, the neural network model can be converted into microcode through microcode compiler and mounted on accelerator to control the operators of the accelerator such as datapath and memory access. While the proposed controller and accelerator can run various CNN models, in this paper, we tested them using the YOLOv2-Tiny CNN model. Using a system clock of 200 MHz, the Controller and accelerator achieved an inference time of 137.9 ms/image for VOC 2012 dataset to detect object, 99.5ms/image for mask detection dataset to detect wearing mask. When implementing an accelerator equipped with the proposed controller as a silicon chip, the gate count is 618,388, which corresponds to 65.5% reduction in chip area compared with an accelerator employing a CPU-based controller (RISC-V).

Improvement of Residual Delay Compensation Algorithm of KJJVC (한일상관기의 잔차 지연 보정 알고리즘의 개선)

  • Oh, Se-Jin;Yeom, Jae-Hwan;Roh, Duk-Gyoo;Oh, Chung-Sik;Jung, Jin-Seung;Chung, Dong-Kyu;Oyama, Tomoaki;Kawaguchi, Noriyuki;Kobayashi, Hideyuki;Kawakami, Kazuyuki;Ozeki, Kensuke;Onuki, Hirohumi
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.14 no.2
    • /
    • pp.136-146
    • /
    • 2013
  • In this paper, the residual delay compensation algorithm is proposed for FX-type KJJVC. In case of initial version as that design algorithm of KJJVC, the integer calculation and the cos/sin table for the phase compensation coefficient were introduced in order to speed up of calculation. The mismatch between data timing and residual delay phase and also between bit-jump and residual delay phase were found and fixed. In final design of KJJVC residual delay compensation algorithm, the initialization problem on the rotation memory of residual delay compensation was found when the residual delay compensated value was applied to FFT-segment, and this problem is also fixed by modifying the FPGA code. Using the proposed residual delay compensation algorithm, the band shape of cross power spectrum becomes flat, which means there is no significant loss over the whole bandwidth. To verify the effectiveness of proposed residual delay compensation algorithm, we conducted the correlation experiments for real observation data using the simulator and KJJVC. We confirmed that the designed residual delay compensation algorithm is well applied in KJJVC, and the signal to noise ratio increases by about 8%.

Design and Implementation of a Hardware-based Transmission/Reception Accelerator for a Hybrid TCP/IP Offload Engine (하이브리드 TCP/IP Offload Engine을 위한 하드웨어 기반 송수신 가속기의 설계 및 구현)

  • Jang, Han-Kook;Chung, Sang-Hwa;Yoo, Dae-Hyun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.9
    • /
    • pp.459-466
    • /
    • 2007
  • TCP/IP processing imposes a heavy load on the host CPU when it is processed by the host CPU on a very high-speed network. Recently the TCP/IP Offload Engine (TOE), which processes TCP/IP on a network adapter instead of the host CPU, has become an attractive solution to reduce the load in the host CPU. There have been two approaches to implement TOE. One is the software TOE in which TCP/IP is processed by an embedded processor and the other is the hardware TOE in which TCP/IP is processed by a dedicated ASIC. The software TOE has poor performance and the hardware TOE is neither flexible nor expandable enough to add new features. In this paper we designed and implemented a hybrid TOE architecture, in which TCP/IP is processed by cooperation of hardware and software, based on an FPGA that has two embedded processor cores. The hybrid TOE can have high performance by processing time-critical operations such as making and processing data packets in hardware. The software based on the embedded Linux performs operations that are not time-critical such as connection establishment, flow control and congestions, thus the hybrid TOE can have enough flexibility and expandability. To improve the performance of the hybrid TOE, we developed a hardware-based transmission/reception accelerator that processes important operations such as creating data packets. In the experiments the hybrid TOE shows the minimum latency of about $19{\mu}s$. The CPU utilization of the hybrid TOE is below 6 % and the maximum bandwidth of the hybrid TOE is about 675 Mbps.

Development of a Small Animal Positron Emission Tomography Using Dual-layer Phoswich Detector and Position Sensitive Photomultiplier Tube: Preliminary Results (두층 섬광결정과 위치민감형광전자증배관을 이용한 소동물 양전자방출단층촬영기 개발: 기초실험 결과)

  • Jeong, Myung-Hwan;Choi, Yong;Chung, Yong-Hyun;Song, Tae-Yong;Jung, Jin-Ho;Hong, Key-Jo;Min, Byung-Jun;Choe, Yearn-Seong;Lee, Kyung-Han;Kim, Byung-Tae
    • The Korean Journal of Nuclear Medicine
    • /
    • v.38 no.5
    • /
    • pp.338-343
    • /
    • 2004
  • Purpose: The purpose of this study was to develop a small animal PET using dual layer phoswich detector to minimize parallax error that degrades spatial resolution at the outer part of field-of-view (FOV). Materials and Methods: A simulation tool GATE (Geant4 Application for Tomographic Emission) was used to derive optimal parameters of small PET, and PET was developed employing the parameters. Lutetium Oxyorthosilicate (LSO) and Lutetium-Yttrium Aluminate-Perovskite(LuYAP) was used to construct dual layer phoswitch crystal. $8{\times}8$ arrays of LSO and LuYAP pixels, $2mm{\times}2mm{\times}8mm$ in size, were coupled to a 64-channel position sensitive photomultiplier tube. The system consisted of 16 detector modules arranged to one ring configuration (ring inner diameter 10 cm, FOV of 8 cm). The data from phoswich detector modules were fed into an ADC board in the data acquisition and preprocessing PC via sockets, decoder block, FPGA board, and bus board. These were linked to the master PC that stored the events data on hard disk. Results: In a preliminary test of the system, reconstructed images were obtained by using a pair of detectors and sensitivity and spatial resolution were measured. Spatial resolution was 2.3 mm FWHM and sensitivity was 10.9 $cps/{\mu}Ci$ at the center of FOV. Conclusion: The radioactivity distribution patterns were accurately represented in sinograms and images obtained by PET with a pair of detectors. These preliminary results indicate that it is promising to develop a high performance small animal PET.

An Improvement of Implementation Method for Multi-Layer AHB BusMatrix (ML-AHB 버스 매트릭스 구현 방법의 개선)

  • Hwang Soo-Yun;Jhang Kyoung-Sun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.629-638
    • /
    • 2005
  • In the System on a Chip design, the on chip bus is one of the critical factors that decides the overall system performance. Especially, in the case or reusing the IPs such as processors, DSPs and multimedia IPs that requires higher bandwidth, the bandwidth problems of on chip bus are getting more serious. Recently ARM proposes the Multi-Layer AHB BusMatrix that is a highly efficient on chip bus to solve the bandwidth problems. The Multi-Layer AHB BusMatrix allows parallel access paths between multiple masters and slaves in a system. This is achieved by using a more complex interconnection matrix and gives the benefit of increased overall bus bandwidth, and a more flexible system architecture. However, there is one clock cycle delay for each master in existing Multi-Layer AHB BusMatrix whenever the master starts new transactions or changes the slave layers because of the Input Stage and arbitration logic realized with Moore type. In this paper, we improved the existing Multi-Layer AHB BusMatrix architecture to solve the one clock cycle delay problems and to reduce the area overhead of the Input Stage. With the elimination of the Input Stage and some restrictions on the arbitration scheme, we tan take away the one clock cycle delay and reduce the area overhead. Experimental results show that the end time of total bus transaction and the average latency time of improved Multi-Layer AHB BusMatrix are improved by $20\%\;and\;24\%$ respectively. in ease of executing a number of transactions by 4-beat incrementing burst type. Besides the total area and the clock period are reduced by $22\%\;and\;29\%$ respectively, compared with existing Multi-layer AHB BusMatrix.

A Hardware Implementation of the Underlying Field Arithmetic Processor based on Optimized Unit Operation Components for Elliptic Curve Cryptosystems (타원곡선을 암호시스템에 사용되는 최적단위 연산항을 기반으로 한 기저체 연산기의 하드웨어 구현)

  • Jo, Seong-Je;Kwon, Yong-Jin
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.1
    • /
    • pp.88-95
    • /
    • 2002
  • In recent years, the security of hardware and software systems is one of the most essential factor of our safe network community. As elliptic Curve Cryptosystems proposed by N. Koblitz and V. Miller independently in 1985, require fewer bits for the same security as the existing cryptosystems, for example RSA, there is a net reduction in cost size, and time. In this thesis, we propose an efficient hardware architecture of underlying field arithmetic processor for Elliptic Curve Cryptosystems, and a very useful method for implementing the architecture, especially multiplicative inverse operator over GF$GF (2^m)$ onto FPGA and futhermore VLSI, where the method is based on optimized unit operation components. We optimize the arithmetic processor for speed so that it has a resonable number of gates to implement. The proposed architecture could be applied to any finite field $F_{2m}$. According to the simulation result, though the number of gates are increased by a factor of 8.8, the multiplication speed We optimize the arithmetic processor for speed so that it has a resonable number of gates to implement. The proposed architecture could be applied to any finite field $F_{2m}$. According to the simulation result, though the number of gates are increased by a factor of 8.8, the multiplication speed and inversion speed has been improved 150 times, 480 times respectively compared with the thesis presented by Sarwono Sutikno et al. [7]. The designed underlying arithmetic processor can be also applied for implementing other crypto-processor and various finite field applications.

Family Structure and Succession of the Late Chosun Seen through Male Adoption (양자제도를 통해 본 조선후기 가족구조와 가계계승: 의성김씨 호구단자 분석을 중심으로)

  • Park, Soo-Mi
    • Korea journal of population studies
    • /
    • v.30 no.2
    • /
    • pp.71-95
    • /
    • 2007
  • This paper attempts to identify the principle of family succession and family patterns of yangban in the late Chosun period through an analysis of male adaptation cases found in family registration records. The primary source of analysis is the family registration documents of Uiseong Kim's from the late 17th century to the early 20th century. As a result, it is found that there is a substantial change in the patterns of family from the early and mid Chosun period to the late Chosun period. The change is the strengthening of the principle of patriarchy succession through male adoption. Looking at the data as a whole, the average number of household members is increased and the membership of kinship also expanded. In contrast to the family patterns of the early Chosun period, not only the patterns of Uiseong Kim's family are predominately immediate family or collateral family but also the majority is extended family in the 18th and 19th centuries. The male adoption cases recorded in Uiseong Kim's family registration documents take up 33.8% of the male adoption cases in the entire family registration documents. This goes to show that the strengthening of the principle of primogeniture succession at a time when child mortality rate is very high resulted in the increase of male adoption. In conclusion, the late Chosun society was a society where the seat of primogeniture was much more important than immediate hereditary members in the family succession.