Search | Korea Science

Memory data layout and DMA transfer technique research For efficient data transfer of CNN accelerator (CNN 가속기의 효율적인 데이터 전송을 위한 메모리 데이터 레이아웃 및 DMA 전송기법 연구)

Cho, Seok-Jae;Park, Sungkyung;Park, Chester Sungchung
- Journal of IKEEE / v.24 no.2 / pp.559-569 / 2020
CNN, one of the deep learning algorithms, is used in artificial intelligence applications that store convolution-layer data in off-chip memory. DMA can reduce the processor load of each data transfer. It can also reduce application performance degradation by varying the order in which convolution-layer data is transmitted to the global buffer of the accelerator. For basic layouts with contiguous memory addresses, SG-DMA showed about a 3.4-fold improvement in DMA pre-setting performance compared with ordinary DMA, and for Ideal layouts with non-contiguous memory addresses, ordinary DMA was about 1396 cycles faster than SG-DMA. Experiments showed that combining memory data layout and DMA can reduce the DMA preset load by about 86 percent.
https://doi.org/10.7471/ikeee.2020.24.2.559

System Design of 900MHz RFID Educational System including the Active Tag (능동형 태그를 포함한 900MHz RFID 교육용 시스템의 설계)

Kim, H.C.;Ohlzahas, A.;Kim, J.M.;Jin, H.S.;Cho, D.G.;Chung, J.S.;Kang, O.H.;Jung, K.W.
- Journal of Internet Computing and Services / v.8 no.4 / pp.51-59 / 2007
This paper presents the development of an RFID educational system based on a 900MHz air interface between the reader and the active tag. The software of the reader and the active tag is developed in an embedded environment, and the PC software that controls the reader runs on Windows OS, with the PC operating as the server. The AT89C51ED2 VLSI chip is used as the processor of the reader and the active tag. As the development environment, the Keil compiler is used for the reader and the active tag, which are programmed in C. Visual C in Visual Studio is used as the development language on the PC acting as the server. To verify the function of the system, the PC gets the tag's identification number through the reader and sends data to be written to the active tag's memory as a "write" operation. Finally, the data written to the active tag's memory is sent back to the PC via the reader as a "read" operation and compared with the data originally sent to the tag.

Design of Real-Time PreProcessor for Image Enhancement of CMOS Image Sensor (CMOS 이미지 센서의 영상 개선을 위한 실시간 전처리 프로세서의 설계)

Jung, Yun-Ho;Lee, Joon-Hwan;Kim, Jae-Seok;Lim, Won-Bae;Hur, Bong-Soo;Kang, Moon-Gi
- Journal of the Institute of Electronics Engineers of Korea SD / v.38 no.8 / pp.62-71 / 2001
This paper presents the design of a real-time digital image enhancement preprocessor for a CMOS image sensor. A CMOS image sensor offers various advantages, but it provides lower-quality images than a CCD does. To compensate for the physical limitations of the CMOS sensor, a spatially adaptive contrast enhancement algorithm was incorporated into the preprocessor together with color interpolation, gamma correction, and automatic exposure control. An efficient hardware architecture for the preprocessor is proposed and was simulated in VHDL. It is composed of about 19K logic gates, which is suitable for a low-cost one-chip PC camera. The test system was implemented on an Altera Flex EPF10KGC503-3 FPGA chip in real-time mode and performed successfully.

DEVELOPMENT OF CCD IMAGING SYSTEM USING THERMOELECTRIC COOLING METHOD (열전 냉각방식을 이용한 극미광 영상장비 개발)

Park, Young-Sik;Lee, Chung-Woo;Jin, Ho;Han, Won-Yong;Nam, Uk-Won;Lee, Yong-Sam
- Journal of Astronomy and Space Sciences / v.17 no.1 / pp.53-66 / 2000
We developed a low-light CCD imaging system using a thermoelectric cooling method, in collaboration with a company, to design a commercial model. It consists of a Kodak KAF-0401E (768$\times$512 pixels) CCD chip and a thermoelectric module manufactured by Thermotek. This TEC system can reach an operating temperature of $-25^{\circ}C$. We employed a Uniblitz VS25s shutter, which has a minimum exposure time of 80 ms. The system components are an interface card using a Korea Astronomy Observatory (hereafter KAO) ISA bus controller and an image acquisition unit with an AD9816 chip, which is a 12-bit video processor. The performance test showed that this imaging system operates well within the initial specification of our design. It shows a dark current of less than 0.4 e-/pixel/sec at a temperature of $-10^{\circ}C$, a linearity of 99.9$\pm$0.1%, a gain of 4.24 e-/adu, and a system noise of 25.3 e- (rms). For low-temperature CCD operation, we designed a TEC that uses a one-stage Peltier module and a forced-air heat exchanger. This TEC imaging system enables accurate photometry ($\pm$0.01 mag) even though the CCD is not at 'conventional' cryogenic temperatures (140 K). The system can be a useful instrument for other imaging applications. Finally, with this system, we obtained several images of astronomical objects for system performance tests.

Parallel SystemC Cosimulation using Virtual Synchronization (가상 동기화 기법을 이용한 SystemC 통합시뮬레이션의 병렬 수행)

Yi, Young-Min;Kwon, Seong-Nam;Ha, Soon-Hoi
- Journal of KIISE:Computer Systems and Theory / v.33 no.12 / pp.867-879 / 2006
This paper concerns fast and time-accurate HW/SW cosimulation for MPSoC (Multi-Processor System-on-Chip) architectures in which multiple software and/or hardware components exist. It is becoming more and more common to use an MPSoC architecture to design complex embedded systems. In cosimulation of such an architecture, as the number of component simulators participating in the cosimulation increases, the time synchronization overhead among simulators increases, resulting in low overall cosimulation performance. Although SystemC cosimulation frameworks show high cosimulation performance, that performance is inversely proportional to the number of simulators. In this paper, we extend a novel technique called virtual synchronization, which boosts cosimulation speed by reducing time synchronization overhead: (1) SystemC simulation is supported seamlessly in the virtual synchronization framework without modifying the SystemC kernel; (2) parallel execution of component simulators with virtual synchronization is supported. We compared the performance and accuracy of the proposed parallel SystemC cosimulation framework with MaxSim, a well-known commercial SystemC cosimulation framework. The proposed framework was 11 times faster for an H.263 decoder example, while the error was kept below 5%.

A Level One Cache Organization for Chip-Size Limited Single Processor (칩의 크기가 제한된 단일칩 프로세서를 위한 레벨 1 캐시구조)

Ju YoungKwan;Kim Sukil
- The KIPS Transactions:PartA / v.12A no.2 s.92 / pp.127-136 / 2005
This paper measures, by simulation, the proper ratio of the size of the demand-fetch cache $L_1$ to that of the prefetch cache $L_P$ when the total size of $L_1$ and $L_P$, which together form the space-limited level 1 cache of a single microprocessor chip, is fixed. Our experiments showed that when the combined size of $L_1$ and $L_P$ is 16 KB, a level 1 cache organization with a 4 KB $L_P$, using OBL as the prefetch technique and FIFO as the cache replacement policy, gave the best performance. The analysis also showed that when the combined size of $L_1$ and $L_P$ is 32 KB or more, dynamic filtering is the more advantageous prefetch technique for $L_P$. The best performance was obtained by splitting the level 1 cache into a 28 KB $L_1$ and a 4 KB $L_P$ when 32 KB of space is available, and into a 48 KB $L_1$ and a 16 KB $L_P$ when 64 KB is available.
https://doi.org/10.3745/KIPSTA.2005.12A.2.127
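The abstract above pairs a demand-fetch cache with a prefetch cache that uses OBL (one-block lookahead) and FIFO replacement. The following is only a minimal sketch of that general idea, not the authors' simulator: the function names, the "prefetch on every access" policy, the use of FIFO for the demand cache as well, and the block-number trace are all assumptions made for illustration.

```python
from collections import deque

def fifo_insert(cache, block, capacity):
    """Insert a block into a FIFO cache, evicting the oldest entry if full."""
    cache.append(block)
    if len(cache) > capacity:
        cache.popleft()

def simulate(trace, l1_blocks, lp_blocks):
    """Return the hit ratio of a demand cache plus an OBL prefetch cache.

    Assumed policy: every access to block b prefetches block b + 1 into the
    prefetch cache unless it is already cached somewhere.
    """
    l1 = deque()  # demand-fetch cache (FIFO here, as a simplification)
    lp = deque()  # prefetch cache with FIFO replacement
    hits = 0
    for b in trace:
        if b in l1 or b in lp:
            hits += 1
        else:
            fifo_insert(l1, b, l1_blocks)   # demand fetch on a miss
        nxt = b + 1                         # one-block lookahead
        if nxt not in l1 and nxt not in lp:
            fifo_insert(lp, nxt, lp_blocks)
    return hits / len(trace)

sequential = list(range(10))
with_prefetch = simulate(sequential, l1_blocks=4, lp_blocks=2)
without_prefetch = simulate(sequential, l1_blocks=4, lp_blocks=0)
print(with_prefetch, without_prefetch)
```

On a purely sequential trace, every access after the first is served from the prefetch cache, while the same trace with no prefetch capacity never hits. Real workloads fall between these extremes, which is why the ratio of the two cache sizes has to be measured rather than assumed.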

Low Power Digital Servo Architecture for Optical Disc (광디스크 디지털 서보의 저전력 구현 아키텍쳐)

Huh, Jun-Ho;Kim, Soo-Won
- Journal of the Institute of Electronics Engineers of Korea SC / v.38 no.2 / pp.31-37 / 2001
Digital servo implementation in optical servo chips has been in the spotlight because it is easy to integrate with other blocks, is less sensitive to temperature variation, and adapts more flexibly to system variations such as the pick-up. As a result, optical disc players that adopt a digital servo are increasing in the market. However, one drawback of a digital servo with an embedded digital signal processor is power consumption, which is one of the most important factors in a portable optical disc player system. For that reason, this paper introduces a new architecture that reduces the power consumption of the digital servo by reducing the DSP load at the cost of a minimal increase in hardware size. The main idea exploits a characteristic of the CDP: most operations, and most of the DSP's operating steps, are used at the initial time, yet most power is consumed in play mode. Therefore, if the operating steps for digital filtering in play mode can be greatly reduced, the power consumption of the overall system can be greatly reduced. This paper shows an example of a low-power digital servo architecture whose current is reduced by almost 83% compared with a digital servo that does not apply the low-power architecture introduced in this paper.

Design and Implementation of a Main-Memory Database System for Real-time Mobile GIS Application (실시간 모바일 GIS 응용 구축을 위한 주기억장치 데이터베이스 시스템 설계 및 구현)

Kang, Eun-Ho;Yun, Suk-Woo;Kim, Kyung-Chang
- The KIPS Transactions:PartD / v.11D no.1 / pp.11-22 / 2004
As random access memory chips get cheaper, it becomes affordable to build main memory-based database systems. Consequently, reducing cache misses emerges as the most important issue in current main memory databases, since CPU speeds have been increasing at 60% per year, compared to memory speeds at 10% per year. In this paper, we design and implement a main-memory database system for real-time mobile GIS. Our system is composed of 5 modules: the interface manager provides the interface for PDA users; the memory data manager controls spatial and non-spatial data in main memory using virtual memory techniques; the query manager processes spatial and non-spatial queries; the index manager manages the MR-tree index for spatial data and the T-tree index for non-spatial data; the GIS server interface provides the interface with disk-based GIS. The proposed MR-tree propagates node splits upward only if one of the internal nodes on the insertion path has empty space. Thus, the internal nodes of the MR-tree are almost 100% full. Our experimental study shows that the two-dimensional MR-tree performs search up to 2.4 times faster than the ordinary R-tree. To use virtual memory techniques, the memory data manager uses page tables for spatial data, non-spatial data, the T-tree, and the MR-tree. It also uses indirect addressing techniques for fast reloading from disk.
https://doi.org/10.3745/KIPSTD.2004.11D.1.011
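The growth rates quoted in the abstract above (CPU speed 60% per year, memory speed 10% per year) compound, which is why cache misses become more expensive over time. The short sketch below only works through that arithmetic; the function name `speed_gap` and the 10-year horizon are illustrative assumptions, not figures from the paper.

```python
def speed_gap(years, cpu_growth=0.60, mem_growth=0.10):
    """Relative CPU-to-memory speed gap after `years` of compound growth.

    The default rates are the ones quoted in the abstract; the result is the
    factor by which the gap has widened relative to year 0.
    """
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years

for n in (1, 5, 10):
    print(n, round(speed_gap(n), 2))
```

Under these rates the gap widens by a factor of about 1.45 every year, so after a decade it has grown more than forty-fold.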

Memory Reduction of IFFT Using Combined Integer Mapping for OFDM Transmitters (CIM(Combined Integer Mapping)을 이용한 OFDM 송신기의 IFFT 메모리 감소)

Lee, Jae-Kyung;Jang, In-Gul;Chung, Jin-Gyun;Lee, Chul-Dong
- Journal of the Institute of Electronics Engineers of Korea TC / v.47 no.10 / pp.36-42 / 2010
The FFT (Fast Fourier Transform) processor is one of the key components in the implementation of OFDM systems for many wireless standards such as IEEE 802.22. To improve the performance of FFT processors, various studies have been carried out to reduce the complexity of multipliers, memory interfaces, control schemes, and so on. While the number of FFT stages increases logarithmically ($\log_2 N$) as the FFT point size ($N$) increases, the number of required registers (or memories) increases linearly. In large point-size FFT designs, the registers occupy more than 70% of the chip area. In this paper, to reduce the memory size of the IFFT for OFDM transmitters, we propose a new IFFT design method based on a combined mapping of modulated data, pilot, and null signals. The proposed method focuses on reducing the sizes of the registers in the first two stages of the IFFT architecture, since the first two stages require 75% of the total registers. Simulations of a 2048-point IFFT design for cognitive radio systems show that the proposed IFFT design method achieves more than 38.5% area reduction compared with previous IFFT designs.
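The 75% figure in the abstract above can be checked with a short calculation. The sketch below assumes a radix-2 single-path delay-feedback (SDF) pipeline, in which stage $s$ needs a delay buffer of $N/2^{s+1}$ samples; the abstract does not name the architecture, so this is an illustrative assumption, and `sdf_buffer_sizes` is a hypothetical helper.

```python
import math

def sdf_buffer_sizes(n_points):
    """Per-stage delay-buffer sizes for an assumed radix-2 SDF pipeline."""
    stages = int(math.log2(n_points))  # log2(N) stages for a power-of-two N
    # Stage 0 buffers N/2 samples, stage 1 buffers N/4, and so on down to 1.
    return [n_points >> (s + 1) for s in range(stages)]

sizes = sdf_buffer_sizes(2048)
first_two_share = sum(sizes[:2]) / sum(sizes)
print(len(sizes), sum(sizes), round(first_two_share, 3))
```

For a 2048-point transform this gives 11 stages and 2047 buffered samples in total, of which the first two stages hold 1536, or roughly three quarters. That is consistent with the abstract's focus on the first two stages.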

Search Result 109, Processing Time 0.022 seconds
