• Title/Summary/Keyword: Processing-in-Memory

Search results: 1,836

Design and Verification of PCI 2.2 Target Controller to support Prefetch Request (프리페치 요구를 지원하는 PCI 2.2 타겟 컨트롤러 설계 및 검증)

  • Hyun Eugin;Seong Kwang-Su
    • The KIPS Transactions:PartA
    • /
    • v.12A no.6 s.96
    • /
    • pp.523-530
    • /
    • 2005
  • When a PCI 2.2 bus master requests data using the Memory Read command, a target device may hold the PCI bus for a long time without transferring data because it needs time to prepare the data internally. Because this situation decreases both PCI bus utilization and data transfer efficiency, the PCI specification recommends using the Delayed Transaction mechanism to improve system performance. However, the mechanism cannot fully improve performance because the target device does not know the exact size of the prefetched data. In previous work, we proposed a new method called Prefetch Request, used when a bus master intends to read data from the target device. In this paper, we design a PCI 2.2 controller and a local device that support the proposed method. The designed PCI 2.2 controller has a simple local interface and converts the PCI protocol into the local protocol, so typical users who do not know the PCI protocol can easily design a PCI target device using the proposed controller. We perform basic behavioral verification, hardware design verification, and random test verification to verify the designed hardware. We also build a test bench and define assembler instructions, and we propose a random testing environment, consisting of a reference model, a random generator, and a compare engine, to efficiently verify corner cases. This verification environment is effective at finding errors that are not detected by general test vectors. Simulation under the proposed test environment also shows that the proposed method achieves about 9% higher data transfer efficiency than Delayed Transaction.
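
For illustration only, the following Python sketch mirrors the random-testing structure named in the abstract (random generator, reference model, compare engine); the classes and transaction fields are hypothetical stand-ins, not the authors' test bench, which is built around assembler instructions and PCI bus simulation.

```python
import random

# Sketch of the random-test flow: a random stimulus generator drives both a
# reference model and the design under test (DUT); a compare engine flags
# mismatches. All names here are illustrative assumptions.

class ReferenceModel:
    """Golden model: returns the expected data for a prefetch read."""
    def __init__(self, memory):
        self.memory = memory

    def read(self, addr, size):
        return self.memory[addr:addr + size]

class Dut:
    """Stand-in for the simulated PCI 2.2 target controller."""
    def __init__(self, memory):
        self.memory = memory

    def read(self, addr, size):
        # In a real flow this would drive PCI bus cycles in simulation.
        return self.memory[addr:addr + size]

def random_test(num_transactions=1000, mem_size=4096):
    memory = [random.randrange(256) for _ in range(mem_size)]
    ref, dut = ReferenceModel(memory), Dut(memory)
    for _ in range(num_transactions):
        size = random.choice([4, 8, 16, 32])         # random prefetch size
        addr = random.randrange(0, mem_size - size)  # random start address
        expected, actual = ref.read(addr, size), dut.read(addr, size)
        if expected != actual:                       # compare engine
            raise AssertionError(f"Mismatch at 0x{addr:X}: {actual} != {expected}")
    print("All random transactions matched the reference model.")

if __name__ == "__main__":
    random_test()
```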

Data Cache System based on the Selective Bank Algorithm for Embedded System (내장형 시스템을 위한 선택적 뱅크 알고리즘을 이용한 데이터 캐쉬 시스템)

  • Jung, Bo-Sung;Lee, Jung-Hoon
    • The KIPS Transactions:PartA
    • /
    • v.16A no.2
    • /
    • pp.69-78
    • /
    • 2009
  • One of the most effective ways to improve cache performance is to exploit both the temporal and spatial locality exhibited by a program's execution characteristics. In this paper we present a high-performance, low-power cache structure with a bank selection mechanism that enhances the exploitation of spatial and temporal locality. The proposed cache system consists of two parts: a main direct-mapped cache with a small block size and a fully associative buffer with a large block size that is a multiple of the small block size. In particular, the main direct-mapped cache is constructed as two banks for low power consumption and stores small blocks selected from the fully associative buffer by the proposed bank selection algorithm. Using the bank selection algorithm and three state bits, we selectively extend the lifetime of small blocks with high temporal locality by storing them in the main direct-mapped cache. This approach effectively reduces conflict misses and cache pollution at the same time. According to the simulation results, the average miss ratio, compared with victim and STAS caches of the same size, is improved by about 23% and 32% respectively for MiBench applications. The average memory access time is reduced by about 14% and 18% compared with the victim and STAS caches, respectively. Energy consumption of the proposed cache is also around 10% lower than that of the other cache systems we examine.
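
As a rough illustration of the two-part structure, the following Python sketch models a lookup through two direct-mapped banks backed by a fully associative buffer; the bank-selection policy, block sizes, and promotion rule are placeholder assumptions, since the abstract does not give the algorithm's details.

```python
# Two small-block direct-mapped banks in front of a large-block fully
# associative buffer. Parity-based bank selection is an assumed placeholder,
# not the paper's selective bank algorithm.

SMALL_BLOCK = 8           # bytes per small block (assumed)
LARGE_BLOCK = 32          # bytes per buffer block (multiple of SMALL_BLOCK)
SETS_PER_BANK = 64

class SelectiveBankCache:
    def __init__(self):
        self.banks = [dict() for _ in range(2)]   # direct-mapped banks: index -> tag
        self.buffer = {}                          # fully associative buffer: large-block tag -> present

    def access(self, addr):
        small_idx = addr // SMALL_BLOCK
        bank = self.banks[small_idx % 2]          # placeholder bank selection
        index, tag = small_idx % SETS_PER_BANK, small_idx // SETS_PER_BANK
        if bank.get(index) == tag:
            return "bank hit"
        large_tag = addr // LARGE_BLOCK
        if large_tag in self.buffer:
            bank[index] = tag                     # promote hot small block into a bank
            return "buffer hit"
        self.buffer[large_tag] = True             # fetch large block from memory
        return "miss"

cache = SelectiveBankCache()
print([cache.access(0) for _ in range(3)])        # ['miss', 'buffer hit', 'bank hit']
```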

A Service Architecture to support IP Multicast Service over UNI 4.0 based ATM Networks (UNI 4.0 기반 ATM 망에서의 IP 멀티캐스트 지원 방안을 위한 서비스 구조)

  • Lee, Mee-Jeong;Jung, Sun;Kim, Ye-kyung
    • Journal of KIISE:Information Networking
    • /
    • v.27 no.3
    • /
    • pp.348-359
    • /
    • 2000
  • Most important real-time multimedia applications require multipoint transmission. To support these applications in ATM-based Internet environments, it is important to provide efficient IP multicast transport over ATM networks. The IETF proposed MARS (Multicast Address Resolution Server) as the service architecture for transporting connectionless IP multicast flows over connection-oriented ATM VCs. MARS assumes UNI 3.0/3.1 signalling. Since UNI 3.0/3.1 provides no means for receivers to request to join a multicast ATM VC, MARS provides an overlay service that relays join requests from IP multicast group members to the sources of the multicast group. Later, the ATM Forum standardized UNI 4.0 signalling, which provides a new signalling mechanism called LIJ (Leaf Initiated Join). LIJ enables receivers to signal the source of an ATM VC directly in order to join. In this paper, we propose a new service architecture, named UNI4MARS, that provides IP multicast flow transport over ATM networks deploying UNI 4.0 signalling. It comprises the same service components as MARS. The main functionality of UNI4MARS is to supply source information to receivers so that they can exploit LIJ to join multicast ATM VCs dynamically. The implementation overheads of UNI4MARS and MARS are compared through a series of simulations. The simulation results show that UNI4MARS supports dynamic IP multicast group changes more efficiently with respect to processing, memory, and bandwidth overhead.


A Security SoC embedded with ECDSA Hardware Accelerator (ECDSA 하드웨어 가속기가 내장된 보안 SoC)

  • Jeong, Young-Su;Kim, Min-Ju;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.7
    • /
    • pp.1071-1077
    • /
    • 2022
  • A security SoC that can be used to implement elliptic curve cryptography (ECC) based public-key infrastructures was designed. The security SoC has an architecture in which a hardware accelerator for the elliptic curve digital signature algorithm (ECDSA) is interfaced with a Cortex-A53 CPU over the AXI4-Lite bus. The ECDSA hardware accelerator, which consists of a high-performance ECC processor, a SHA3 hash core, a true random number generator (TRNG), a modular multiplier, BRAM, and a control FSM, was designed to perform high-performance ECDSA signature generation and verification with minimal CPU control. The security SoC was implemented on a Zynq UltraScale+ MPSoC device for hardware-software co-verification, and evaluation showed that ECDSA signature generation or verification can be performed about 1,000 times per second at a clock frequency of 150 MHz. The ECDSA hardware accelerator was implemented using 74,630 LUTs, 23,356 flip-flops, 32 kb of BRAM, and 36 DSP blocks.
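
For context, the two operations the accelerator offloads look as follows in software. This sketch uses the Python cryptography package with the P-256 curve and SHA3-256 as illustrative assumptions (the abstract does not name the curve); it is not the SoC's driver interface.

```python
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes

# Software view of ECDSA signature generation and verification, the operations
# the hardware accelerator performs. Curve and hash choices are assumptions.

message = b"sensor firmware image"

private_key = ec.generate_private_key(ec.SECP256R1())                 # key pair (the TRNG's role in hardware)
signature = private_key.sign(message, ec.ECDSA(hashes.SHA3_256()))    # signature generation

public_key = private_key.public_key()
public_key.verify(signature, message, ec.ECDSA(hashes.SHA3_256()))    # raises InvalidSignature on failure
print("signature verified")
```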

An Efficient WLAN Device Power Control Technique for Streaming Multimedia Contents over Mobile IP Storage (모바일 IP 스토리지 상에서 멀티미디어 컨텐츠 실행을 위한 효율적인 무선랜 장치 전력제어 기법)

  • Nam, Young-Jin;Choi, Min-Seok
    • The KIPS Transactions:PartA
    • /
    • v.16A no.5
    • /
    • pp.357-368
    • /
    • 2009
  • Mobile IP storage has been proposed to overcome the storage limitations of flash memory and hard disks; it provides virtually unlimited storage capacity to mobile devices over wireless IP networks. However, the battery lifetime of mobile devices drops rapidly because the WLAN device consumes power continuously while multimedia contents are streamed from mobile IP storage. This paper proposes an energy-efficient WLAN device power control technique for streaming multimedia contents from mobile IP storage. The proposed technique consists of a prefetch buffer input/output module, a WLAN device power control module, and a reconfigurable prefetch buffer module. It adaptively determines the size of the prefetch buffer according to the quality of the multimedia contents, and it dynamically controls the power mode of the WLAN device through power on/off operations while streaming. We evaluate the performance of the proposed technique on a PXA270-based mobile device running embedded Linux 2.6.11 with the Intel iSCSI reference code and a WLAN device. Extensive experiments reveal that the proposed technique can reduce the energy consumption of the WLAN device by up to a factor of 8.5 for QVGA multimedia contents, compared with no power control.
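
A minimal sketch of the buffer-driven power control idea follows, assuming a simple radio on/off interface and a bitrate-based buffer size; these interfaces and parameters are illustrative, not the paper's implementation.

```python
# Prefetch a chunk of media with the radio on, power the radio down, and play
# from the buffer until it drains. Radio/IpStorage and the buffer-sizing rule
# are assumed stand-ins.

class Radio:
    def power_on(self):  print("WLAN: active")
    def power_off(self): print("WLAN: powered off")

class IpStorage:
    """Stand-in for an iSCSI-backed mobile IP storage volume."""
    def read(self, offset, length):
        return b"\x00" * length                  # pretend to fetch media data

def prefetch_buffer_size(bitrate_kbps, seconds=10):
    # Size the buffer to hold `seconds` of playback at the content bitrate.
    return bitrate_kbps * 1000 // 8 * seconds

def stream(content_length, bitrate_kbps):
    radio, storage = Radio(), IpStorage()
    buf_size = prefetch_buffer_size(bitrate_kbps)
    consumed = 0
    while consumed < content_length:
        radio.power_on()
        chunk = storage.read(consumed, min(buf_size, content_length - consumed))
        radio.power_off()                        # radio sleeps while the buffer is played back
        consumed += len(chunk)                   # (playback of `chunk` omitted)

stream(content_length=2_000_000, bitrate_kbps=512)   # e.g., QVGA-class content
```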

Adverse Effects on EEGs and Bio-Signals Coupling on Improving Machine Learning-Based Classification Performances

  • SuJin Bak
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.10
    • /
    • pp.133-153
    • /
    • 2023
  • In this paper, we propose a novel approach to investigating brain-signal measurement technology using electroencephalography (EEG). Traditionally, researchers have combined EEG signals with bio-signals (BSs) to enhance the classification performance of emotional states. Our objective was to explore the synergistic effects of coupling EEG and BSs and to determine whether the EEG+BS combination improves the classification accuracy of emotional states compared with using EEG alone or combining EEG with pseudo-random signals (PS) generated arbitrarily by random generators. Employing four feature extraction methods, we examined four combinations, EEG alone, EEG+BS, EEG+BS+PS, and EEG+PS, using data from two widely used open datasets. Emotional states (task versus rest) were classified with Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) classifiers. Our results revealed that with the highest-accuracy method, SVM-FFT, the average error rates of EEG+BS were 4.7% and 6.5% higher than those of EEG+PS and EEG alone, respectively. We also conducted a thorough analysis of EEG+BS by adding numerous PSs. The error rate of EEG+BS+PS displayed a V-shaped curve, initially decreasing due to the deep double descent phenomenon and then increasing due to the curse of dimensionality. Consequently, our findings suggest that the EEG+BS combination may not always yield promising classification performance.
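
As a generic illustration of an SVM-FFT pipeline of the kind mentioned above (FFT-magnitude features fed to an SVM), the sketch below uses synthetic data; the channel counts, feature details, and labels are assumptions, not the paper's datasets or setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def fft_features(trials):
    # trials: (n_trials, n_channels, n_samples) -> (n_trials, n_channels * n_bins)
    spectrum = np.abs(np.fft.rfft(trials, axis=-1))
    return spectrum.reshape(trials.shape[0], -1)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((200, 8, 256))       # 200 trials, 8 EEG channels (synthetic)
bs = rng.standard_normal((200, 2, 256))        # 2 bio-signal channels (synthetic)
labels = rng.integers(0, 2, size=200)          # task vs. rest

X = fft_features(np.concatenate([eeg, bs], axis=1))    # the EEG+BS combination
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```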

Verifying Execution Prediction Model based on Learning Algorithm for Real-time Monitoring (실시간 감시를 위한 학습기반 수행 예측모델의 검증)

  • Jeong, Yoon-Seok;Kim, Tae-Wan;Chang, Chun-Hyon
    • The KIPS Transactions:PartA
    • /
    • v.11A no.4
    • /
    • pp.243-250
    • /
    • 2004
  • Monitoring is used to check whether a real-time system provides its services on time. Generally, real-time monitoring focuses on investigating the current status of a real-time system. To support stable performance, however, a real-time system should not only report the current status of real-time processes but also predict their executions. Legacy prediction models have several limitations when applied to real-time monitoring. First, they perform a static prediction only after a real-time process has finished. Second, they require a statistical pre-analysis before prediction. Third, the transition probabilities and clustering data are not based on current data. We propose an execution prediction model based on a learning algorithm to solve these problems and apply it to real-time monitoring. This model removes unnecessary pre-processing and supports precise prediction based on current data. In addition, it supports multi-level prediction through trend analysis of past execution data. Above all, the model is designed to support dynamic prediction performed during a real-time process's execution. Experimental results show that the judgment accuracy is greater than 80% when the training set size is over 10, and, for multi-level prediction, that the prediction error is minimized when the number of executions exceeds the training set size. The proposed model has limitations: it uses the simplest learning algorithm, and it does not consider a multi-regional space model managing CPU, memory, and I/O data. Nevertheless, the execution prediction model based on a learning algorithm proposed in this paper can be used in areas related to real-time monitoring and control.
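
A minimal sketch of an online, trend-based execution-time predictor in the spirit described above; the sliding-window trend estimate is an assumed placeholder, since the abstract does not specify the learning algorithm used.

```python
from collections import deque

# Online predictor updated while the monitored real-time process runs:
# keep the last N observations (the "training set") and extrapolate one
# step along their trend. Window size and trend rule are assumptions.

class ExecutionPredictor:
    def __init__(self, training_set_size=10):
        self.history = deque(maxlen=training_set_size)

    def observe(self, execution_time_ms):
        self.history.append(execution_time_ms)

    def predict(self):
        if len(self.history) < 2:
            return self.history[-1] if self.history else None
        avg = sum(self.history) / len(self.history)
        trend = (self.history[-1] - self.history[0]) / (len(self.history) - 1)
        return avg + trend                      # extrapolate along the recent trend

predictor = ExecutionPredictor()
for t in [10.2, 10.4, 10.9, 11.1, 11.6]:
    predictor.observe(t)
print("next execution time estimate (ms):", predictor.predict())
```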

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.1-25
    • /
    • 2020
  • In this paper, we present an application system architecture that provides accurate, fast, and efficient automatic gasometer reading. The system captures a gasometer image with a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning. In general, an image contains many types of characters, and optical character recognition extracts all of them, but some applications need to ignore characters that are not of interest and focus only on specific types. In an automatic gasometer reading system, for example, only the device ID and gas usage amount need to be extracted from gasometer images to bill users; strings such as the device type, manufacturer, manufacturing date, and specification are not valuable to the application. The application therefore has to analyze only the regions of interest and the specific character types to extract valuable information. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the regions of interest. We built three neural networks for the application system: the first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID strings, the second is another convolutional neural network that transforms the spatial information of a region of interest into sequential feature vectors, and the third is a bidirectional long short-term memory network that converts the sequential features into character strings through time-series analysis. In this work, the strings of interest are the device ID, which consists of 12 Arabic numerals, and the gas usage amount, which consists of 4 to 5 Arabic numerals. All system components are implemented in the Amazon Web Services cloud with Intel Xeon E5-2686 v4 CPUs and an NVIDIA Tesla V100 GPU. The system architecture adopts a master-slave processing structure for efficient, fast parallel processing that copes with about 700,000 requests per day. A mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes each reading request into a FIFO (First In First Out) input queue. The slave process consists of the three deep neural networks that perform character recognition and runs on the NVIDIA GPU. The slave process continuously polls the input queue for recognition requests; when a request arrives, it converts the image into the device ID string, the gas usage amount string, and their position information, returns the information to an output queue, and goes back to polling the input queue. The master process takes the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks: 22,985 images for training and validation and 4,135 images for testing. For each training epoch, we randomly split the 22,985 images into training and validation sets with an 8:2 ratio. The 4,135 test images were categorized into five types (normal, noise, reflection, scale, and slant): normal data are clean images, noise means images with noise, reflection means images with light reflection in the gasometer region, scale means images with small object size due to long-distance capture, and slant means images that are not horizontally level. The final string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
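
As a structural illustration of the master/slave queueing described above, the Python sketch below uses a FIFO input queue polled by a worker that stubs out the recognition networks; the queue layout and field names are assumptions for illustration only.

```python
import queue
import threading

# Master pushes reading requests into a FIFO input queue; a slave worker
# (the GPU-side recognizers, stubbed here) polls it and posts results to
# an output queue for delivery back to the mobile device.

input_queue: queue.Queue = queue.Queue()     # FIFO of (request_id, image) from the master
output_queue: queue.Queue = queue.Queue()    # (request_id, device_id, usage) results

def recognize(image):
    # Placeholder for the CNN detector + CRNN recognizers running on the GPU.
    return "123456789012", "0042"

def slave_worker():
    while True:
        request_id, image = input_queue.get()            # poll for work
        device_id, usage = recognize(image)
        output_queue.put((request_id, device_id, usage))
        input_queue.task_done()

def master_submit(request_id, image):
    input_queue.put((request_id, image))                 # enqueue a mobile request

threading.Thread(target=slave_worker, daemon=True).start()
master_submit(1, b"<jpeg bytes>")
input_queue.join()
print(output_queue.get())                                # deliver back to the mobile device
```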

X-tree Diff: An Efficient Change Detection Algorithm for Tree-structured Data (X-tree Diff: 트리 기반 데이터를 위한 효율적인 변화 탐지 알고리즘)

  • Lee, Suk-Kyoon;Kim, Dong-Ah
    • The KIPS Transactions:PartC
    • /
    • v.10C no.6
    • /
    • pp.683-694
    • /
    • 2003
  • We present X-tree Diff, a change detection algorithm for tree-structured data. Our work is motivated by the need to monitor massive volumes of web documents and detect suspicious changes, called defacement attacks, on web sites. In this context, the algorithm must be very efficient in speed and memory use. X-tree Diff uses a special ordered labeled tree, the X-tree, to represent XML/HTML documents. X-tree nodes have a special field, tMD, which stores a 128-bit hash value representing the structure and data of the subtree, so that identical subtrees from the old and new versions can be matched. During this process, X-tree Diff applies the Rule of Delaying Ambiguous Matchings: it performs exact matching only where a node in the old version has a one-to-one correspondence with a node in the new version, delaying all the others, which drastically reduces the possibility of wrong matchings. X-tree Diff propagates such exact matchings upwards in Step 2 and obtains more matchings downwards from the roots in Step 3. In Step 4, the nodes to be inserted or deleted are decided. We also show that X-tree Diff runs in O(n), where n is the number of nodes in the X-trees, in the worst case as well as the average case; this result is even better than that of the BULD Diff algorithm, which is O(n log n) in the worst case. We experimented with X-tree Diff on real data, about 11,000 home pages from about 20 web sites, rather than on synthetic documents manipulated for experimentation. Currently, the X-tree Diff algorithm is used in a commercial hacking detection system, the WIDS (Web-Document Intrusion Detection System), which finds changes in registered web sites and reports suspicious changes to users.
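
The tMD idea can be illustrated with a short Python sketch: each node hashes its label, text, and children's hashes into a 128-bit digest, so identical subtrees in the old and new trees can be matched by digest lookup. This shows only the hashing-and-matching principle, not the full X-tree Diff steps.

```python
import hashlib

# Each node's tMD-style digest covers its own label/text and its children's
# digests; equal digests indicate structurally and textually identical subtrees.

class XNode:
    def __init__(self, label, text="", children=None):
        self.label, self.text = label, text
        self.children = children or []
        self.tmd = self._compute_tmd()

    def _compute_tmd(self):
        h = hashlib.md5()                      # 128-bit digest, matching the field width in the paper
        h.update(self.label.encode())
        h.update(self.text.encode())
        for child in self.children:
            h.update(child.tmd)
        return h.digest()

def index_by_tmd(root, table=None):
    table = {} if table is None else table
    table.setdefault(root.tmd, []).append(root)
    for child in root.children:
        index_by_tmd(child, table)
    return table

old = XNode("html", children=[XNode("body", children=[XNode("p", "hello")])])
new = XNode("html", children=[XNode("body", children=[XNode("p", "hello!")])])

old_index = index_by_tmd(old)
matched = [digest for digest in index_by_tmd(new) if digest in old_index]
print(f"{len(matched)} identical subtree hash(es) found")
```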

GPU Based Feature Profile Simulation for Deep Contact Hole Etching in Fluorocarbon Plasma

  • Im, Yeon-Ho;Chang, Won-Seok;Choi, Kwang-Sung;Yu, Dong-Hun;Cho, Deog-Gyun;Yook, Yeong-Geun;Chun, Poo-Reum;Lee, Se-A;Kim, Jin-Tae;Kwon, Deuk-Chul;Yoon, Jung-Sik;Kim, Dae-Woong;You, Shin-Jae
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2012.08a
    • /
    • pp.80-81
    • /
    • 2012
  • Recently, one of the critical issues in etching processes for nanoscale devices has been achieving ultra-high aspect ratio contact (UHARC) profiles without anomalous behaviors such as sidewall bowing and twisting. To achieve this goal, fluorocarbon plasmas, whose major advantage is sidewall passivation, have commonly been used with numerous additives to obtain ideal etch profiles. However, they still face formidable challenges such as tight limits on sidewall bowing and control of randomly distorted features in nanoscale etch profiles. Furthermore, the absence of available plasma simulation tools has made it difficult to develop revolutionary technologies, including novel plasma chemistries and plasma sources, to overcome these process limitations. As an effort to address these issues, we performed fluorocarbon surface kinetic modeling based on experimental plasma diagnostic data for a silicon dioxide etching process under inductively coupled C4F6/Ar/O2 plasmas. For this work, the SiO2 etch rates were investigated with bulk plasma diagnostic tools such as a Langmuir probe, a cutoff probe, and a quadrupole mass spectrometer (QMS). The surface chemistry of the etched samples was measured with an X-ray photoelectron spectrometer. To measure plasma parameters, a self-cleaning RF Langmuir probe was used to cope with polymer deposition on the probe tip, and the results were double-checked with the cutoff probe, which is known to be a precise diagnostic tool for electron density measurement. In addition, neutral and ion fluxes from the bulk plasma were monitored with appearance methods using the QMS signal. Based on these experimental data, we propose a phenomenological, realistic two-layer surface reaction model of the SiO2 etch process under the overlying polymer passivation layer, considering the material balance of deposition and etching through a steady-state fluorocarbon layer. The predicted surface reaction modeling results show good agreement with the experimental data. Building on these plasma-surface reaction studies, we developed a 3D topography simulator using a multi-layer level set algorithm and a new memory-saving technique suitable for 3D UHARC etch simulation. Ballistic transport of neutral and ion species inside the feature profile was treated by deterministic and Monte Carlo methods, respectively. For ultra-high aspect ratio contact hole etching, it is well known that realistic treatment of this ballistic transport imposes a huge computational burden. To address this issue, the related computational codes were efficiently parallelized for GPU (Graphics Processing Unit) computing, so that the total computation time was reduced by more than a factor of a few hundred compared with the serial version. Finally, the 3D topography simulator was integrated with the ballistic transport module and the etch reaction model, and realistic etch-profile simulations accounting for the sidewall polymer passivation layer were demonstrated.
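
As a toy illustration of the Monte Carlo ballistic ion transport mentioned above, the sketch below samples near-vertical ion trajectories into a cylindrical contact hole and counts how many reach the bottom; the geometry and angular spread are assumptions, and the paper's simulator traces such trajectories over full 3D level-set profiles on the GPU.

```python
import math
import random

# Sample ion arrival angles, trace straight-line trajectories into a
# cylindrical hole, and count how many reach the bottom rather than the
# sidewall. The shrinking bottom flux at high aspect ratio is what makes
# UHARC transport so expensive to simulate.

def fraction_reaching_bottom(aspect_ratio, angular_sigma_deg, n_ions=50_000):
    radius = 0.5
    depth = 2 * radius * aspect_ratio                 # aspect ratio = depth / diameter
    hits = 0
    for _ in range(n_ions):
        # Entry point uniform over the hole opening.
        r = radius * math.sqrt(random.random())
        phi = random.uniform(0, 2 * math.pi)
        x, y = r * math.cos(phi), r * math.sin(phi)
        # Near-vertical direction with a small Gaussian angular spread (assumed).
        theta = abs(random.gauss(0.0, math.radians(angular_sigma_deg)))
        psi = random.uniform(0, 2 * math.pi)
        dx, dy = math.tan(theta) * math.cos(psi), math.tan(theta) * math.sin(psi)
        # Lateral position when the ion reaches the bottom plane.
        xb, yb = x + dx * depth, y + dy * depth
        if xb * xb + yb * yb <= radius * radius:
            hits += 1
    return hits / n_ions

for ar in (10, 30, 60):
    print(f"aspect ratio {ar:>2}: {fraction_reaching_bottom(ar, 3.0):.3f} of ions reach the bottom")
```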
