• Title/Summary/Keyword: Processing-in-Memory

A Design of 4×4 Block Parallel Interpolation Motion Compensation Architecture for 4K UHD H.264/AVC Decoder (4K UHD급 H.264/AVC 복호화기를 위한 4×4 블록 병렬 보간 움직임보상기 아키텍처 설계)

  • Lee, Kyung-Ho;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • v.50 no.5
    • pp.102-111
    • 2013
  • In this paper, we propose a $4{\times}4$ block-parallel interpolation architecture for high-performance H.264/AVC motion compensation in real-time 4K UHD ($3840{\times}2160$) video processing. To improve throughput, interpolation is performed on $4{\times}4$ blocks in parallel. To supply the $9{\times}9$ reference data needed for interpolation, we design a 2D cache buffer consisting of $9{\times}9$ memory arrays. We minimize redundant storage of reference pixels by applying the Search Area Stripe Reuse (SASR) scheme, and implement a high-speed plane interpolator with a 3-stage pipeline (horizontal/vertical 1/2 interpolation, diagonal 1/2 interpolation, and 1/4 interpolation). The proposed architecture was simulated with a 0.13 µm standard cell library. Its maximum operating frequency is 150 MHz and its gate count is 161K gates. Running at 150 MHz, the proposed H.264/AVC motion compensator supports 4K UHD video at 72 frames per second.
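Why a $4{\times}4$ block needs a $9{\times}9$ reference window follows from the H.264 luma half-sample filter, whose six taps (1, -5, 20, 20, -5, 1) reach two samples to the left and three to the right: 4 + 2 + 3 = 9. The sketch below is a plain software illustration, not the paper's hardware; it computes only the horizontal half-sample (`b`) and quarter-sample (`a`) positions, and the helper names are hypothetical. It also checks the throughput implied by the abstract: $3840 \times 2160 \times 72 / 150\,\text{MHz} \approx 3.98$ pixels per cycle, i.e. roughly one 16-pixel block every 4 cycles (ignoring chroma and bi-prediction).

```python
TAPS = (1, -5, 20, 20, -5, 1)   # H.264 luma 6-tap half-sample filter


def clip1(v: int) -> int:
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))


def half_h(win, r, c):
    """Horizontal half-sample between win[r][c] and win[r][c + 1] ('b' position)."""
    acc = sum(t * win[r][c - 2 + i] for i, t in enumerate(TAPS))
    return clip1((acc + 16) >> 5)


def interpolate_block_4x4(win):
    """Half ('b') and quarter ('a') samples for a 4x4 block taken from a 9x9 window.

    The block's integer samples sit at win[2..5][2..5]; the filter reaches columns
    c-2..c+3, so columns 0..8 (9 in total) are needed. Vertical filtering would
    need rows 0..8 in the same way.
    """
    assert len(win) == 9 and all(len(row) == 9 for row in win)
    half = [[half_h(win, 2 + y, 2 + x) for x in range(4)] for y in range(4)]
    # Quarter sample 'a' averages the integer sample G and the half sample b.
    quarter = [[(win[2 + y][2 + x] + half[y][x] + 1) >> 1 for x in range(4)] for y in range(4)]
    return half, quarter


def pixels_per_cycle(width, height, fps, clock_hz):
    """Luma pixels that must be produced per clock cycle."""
    return width * height * fps / clock_hz
```

For a linear ramp `win[r][c] = 10 * c`, the half sample between columns 2 and 3 comes out as 25 and the quarter sample as 23.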

Algorithm to Search for the Original Song from a Cover Song Using Inflection Points of the Melody Line (멜로디 라인의 변곡점을 활용한 커버곡의 원곡 검색 알고리즘)

  • Lee, Bo Hyun;Kim, Myung
    • KIPS Transactions on Software and Data Engineering
    • v.10 no.5
    • pp.195-200
    • 2021
  • With the growth of video-sharing platforms, the number of uploaded videos is exploding. Many of these videos contain music, including cover songs. To protect music copyrights, an algorithm that finds the original song of a cover song is essential. However, finding the original is difficult because a cover song may change the composition, tempo, and overall structure of the original. To date, no effective algorithm for this task is known. In this paper, we propose an algorithm that searches for the original song of a cover song using the inflection points of the melody line. Inflection points are the characteristic points of change in a melody sequence. The proposed algorithm compares the original and the cover using the sequence of inflection points in a representative phrase of the original song. Because it relies on the characteristics of this representative phrase, the algorithm maintains good search performance even when the cover modifies the overall composition of the song. In addition, because it uses only the features of the inflection-point sequence, its memory usage is very low. The efficiency of the algorithm was verified through performance evaluation.
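The abstract does not define inflection points or the comparison measure precisely, so the following is only a sketch under two stated assumptions: an inflection point is taken to be a note where the melodic direction reverses (a local maximum or minimum, skipping repeated notes), and a phrase is matched by sliding its interval signature over the song's and taking the smallest mean absolute difference. All function names are hypothetical. Using intervals between turning points makes the signature insensitive to transposition and note durations.

```python
import math


def inflection_points(pitches):
    """Indices where the melodic direction reverses (assumed definition); repeats are skipped."""
    points, prev_dir = [], 0
    for i in range(1, len(pitches)):
        d = (pitches[i] > pitches[i - 1]) - (pitches[i] < pitches[i - 1])
        if d == 0:                      # repeated note: no direction information
            continue
        if prev_dir and d != prev_dir:  # direction flipped at the previous note
            points.append(i - 1)
        prev_dir = d
    return points


def signature(pitches):
    """Pitch intervals between successive inflection points (key- and duration-invariant)."""
    pts = [pitches[i] for i in inflection_points(pitches)]
    return [b - a for a, b in zip(pts, pts[1:])]


def phrase_distance(phrase, song):
    """Best mean absolute difference between the phrase signature and any window of the song's."""
    a, b = signature(phrase), signature(song)
    if not a or len(b) < len(a):
        return math.inf
    return min(
        sum(abs(x - y) for x, y in zip(a, b[s:s + len(a)])) / len(a)
        for s in range(len(b) - len(a) + 1)
    )
```

A cover that transposes the phrase up five semitones and holds some notes longer, for example, still yields the same signature and a distance of 0.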

Comparison of Korean Real-time Text-to-Speech Technology Based on Deep Learning (딥러닝 기반 한국어 실시간 TTS 기술 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • v.7 no.1
    • pp.640-645
    • 2021
  • A deep-learning-based end-to-end TTS system consists of a Text2Mel module, which generates a spectrogram from text, and a vocoder module, which synthesizes the speech signal from the spectrogram. Applying deep learning to TTS has recently improved the intelligibility and naturalness of synthesized speech to a level close to human speech. However, inference for speech synthesis is much slower than with conventional methods. Inference speed can be improved by a non-autoregressive method, which generates speech samples in parallel, independently of previously generated samples. In this paper, we introduce FastSpeech, FastSpeech 2, and FastPitch as Text2Mel technologies, and Parallel WaveGAN, Multi-band MelGAN, and WaveGlow as vocoder technologies that apply the non-autoregressive method, and we implement them to verify whether they can run in real time. Based on the measured real-time factor (RTF), the experimental results show that all of the presented methods are capable of real-time processing. Moreover, except for WaveGlow, the trained models are about tens to hundreds of megabytes in size, so they can be applied to embedded environments with limited memory.
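The real-time factor is commonly defined as synthesis time divided by the duration of the audio produced, so an RTF below 1 means faster than real time (some papers use the inverse, so the convention should be checked). The minimal sketch below assumes that definition; `synthesize` is a hypothetical callable standing in for a Text2Mel model plus vocoder that returns raw audio samples.

```python
import time


def real_time_factor(synthesis_seconds: float, n_samples: int, sample_rate: int) -> float:
    """RTF = time spent synthesizing / duration of the generated audio."""
    return synthesis_seconds / (n_samples / sample_rate)


def measure_rtf(synthesize, text: str, sample_rate: int) -> float:
    """Time a (hypothetical) synthesis callable that returns a sequence of samples."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)
```

For example, 0.25 s spent producing one second of 22.05 kHz audio gives an RTF of 0.25.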

Radix-4 Trellis Parallel Architecture and Trace Back Viterbi Decoder with Backward State Transition Control (Radix-4 트렐리스 병렬구조 및 역방향 상태천이의 제어에 의한 역추적 비터비 디코더)

  • 정차근
    • Journal of the Institute of Electronics Engineers of Korea SP
    • v.40 no.5
    • pp.397-409
    • 2003
  • This paper describes an implementation of a trace-back Viterbi decoder with a radix-4 trellis parallel architecture and backward state transition control, and presents the results of applying it to high-speed wireless LAN. Compared with an M-step trellis architecture, a radix-4 parallelized Viterbi decoder not only improves throughput with a simple structure but also has a short processing delay and little overhead circuitry. Building on these features, this paper presents a novel Viterbi decoder for the radix-4 trellis parallelized structure, composed of branch metric computation, an ACS architecture, and trace-back decoding by sequential control of backward state transitions. With the proposed architecture, variable code rates obtained by puncturing the base code can easily be decoded by a single unified Viterbi decoder, and no additional circuits or peripheral control logic are required. The trace-back scheme with backward state transition control performs sequential decoding in step with the ACS cycle clock, without additional circuitry for survivor memory control. To evaluate its usefulness, the proposed method is applied to the channel CODEC of IEEE 802.11a high-speed wireless LAN, and HDL simulation results are presented.
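The idea behind a radix-4 trellis is that two radix-2 stages are merged, so each ACS step chooses among four predecessor states and emits two decoded bits. The software sketch below illustrates that on the rate-1/2, constraint-length-7 code with generators 133 and 171 (octal) used by IEEE 802.11a, with hard decisions and no puncturing; the register bit ordering is an assumption, and the simple survivor-table trace-back stands in for, rather than reproduces, the paper's backward state transition control.

```python
import math

K = 7                       # constraint length of the IEEE 802.11a convolutional code
GENS = (0o133, 0o171)       # generator polynomials (octal); bit ordering here is assumed
NSTATES = 1 << (K - 1)      # 64 trellis states
MASK = NSTATES - 1


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def step(state: int, bit: int):
    """One radix-2 trellis stage: shift in `bit`, return (next_state, output_bits)."""
    reg = (state << 1) | bit
    return reg & MASK, tuple(parity(reg & g) for g in GENS)


def encode(bits):
    """Rate-1/2 encoder terminated with K-1 zero tail bits."""
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):
        state, o = step(state, b)
        out.extend(o)
    return out


def viterbi_radix4(received):
    """Hard-decision Viterbi decoding that merges two trellis stages per ACS step."""
    n_stages = len(received) // 2
    assert n_stages % 2 == 0, "this sketch assumes an even number of trellis stages"
    metrics = [0] + [math.inf] * (NSTATES - 1)   # encoder starts in state 0
    survivors = []                               # per ACS step: state -> (prev_state, b1, b2)
    for t in range(0, n_stages, 2):
        r = received[2 * t: 2 * t + 4]           # 4 code bits = 2 trellis stages
        new_metrics = [math.inf] * NSTATES
        table = [None] * NSTATES
        for s in range(NSTATES):
            if metrics[s] == math.inf:
                continue
            for b1 in (0, 1):
                s1, o1 = step(s, b1)
                for b2 in (0, 1):
                    s2, o2 = step(s1, b2)
                    branch = sum(x != y for x, y in zip(o1 + o2, r))  # Hamming branch metric
                    cand = metrics[s] + branch                        # add
                    if cand < new_metrics[s2]:                        # compare-select (4 inputs)
                        new_metrics[s2] = cand
                        table[s2] = (s, b1, b2)
        metrics = new_metrics
        survivors.append(table)
    # Trace back from state 0, where the zero tail leaves the encoder.
    state, decoded = 0, []
    for table in reversed(survivors):
        prev, b1, b2 = table[state]
        decoded.extend((b2, b1))                 # collected newest-first
        state = prev
    decoded.reverse()
    return decoded[: n_stages - (K - 1)]         # drop the tail bits
```

Because each ACS step consumes two stages, the number of sequential steps is halved at the cost of a four-way instead of two-way compare, which is the throughput trade-off the abstract refers to.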

Design and Verification of PCI 2.2 Target Controller to support Prefetch Request (프리페치 요구를 지원하는 PCI 2.2 타겟 컨트롤러 설계 및 검증)

  • Hyun Eugin;Seong Kwang-Su
    • The KIPS Transactions:PartA
    • v.12A no.6 s.96
    • pp.523-530
    • 2005
  • When a PCI 2.2 bus master requests data with the Memory Read command, the target device may hold the PCI bus for a long time without transferring any data, because it needs time to prepare the data internally. Because this reduces both PCI bus utilization and data transfer efficiency, the PCI specification recommends the Delayed Transaction mechanism to improve system performance. However, this mechanism cannot fully improve performance because the target device does not know the exact size of the data to prefetch. In previous work, we proposed a new method, called Prefetch Request, for use when a bus master reads data from a target device. In this paper, we design a PCI 2.2 controller and a local device that support the proposed method. The designed PCI 2.2 controller has a simple local interface and converts the PCI protocol into the local protocol, so typical users who do not know the PCI protocol can easily design a PCI target device with it. To verify the designed hardware, we propose basic behavioral verification, hardware design verification, and random test verification. We also build a test bench and define assembler instructions. In addition, we propose a random testing environment, consisting of a reference model, a random generator, and a compare engine, to efficiently verify corner cases; this environment is effective at finding errors that general test vectors do not detect. Finally, simulation in the proposed test environment shows that the proposed method achieves about $9\%$ higher data transfer efficiency than the Delayed Transaction mechanism.
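The random testing environment described above (reference model, random generator, compare engine) can be sketched generically in software. In the sketch below, both models and all names are hypothetical: the reference model is a plain word-addressed memory, the device model serves burst reads of a requested length through a prefetch buffer, and the compare engine records every read whose result differs from the reference.

```python
import random


class ReferenceModel:
    """Golden model: a plain word-addressed memory."""

    def __init__(self, size):
        self.mem = [0] * size

    def write(self, addr, data):
        self.mem[addr] = data

    def read(self, addr, length):
        return self.mem[addr:addr + length]


class PrefetchTarget:
    """Hypothetical device model: fills a prefetch buffer of exactly `length` words, then returns it."""

    def __init__(self, size):
        self.mem = [0] * size

    def write(self, addr, data):
        self.mem[addr] = data

    def read(self, addr, length):
        buffer = [self.mem[a] for a in range(addr, addr + length)]
        return buffer


def random_test(ref, dut, size, n_ops, seed=0):
    """Random generator + compare engine: returns the list of mismatching reads."""
    rng = random.Random(seed)
    mismatches = []
    for i in range(n_ops):
        if rng.random() < 0.5:                        # random single-word write to both models
            addr, data = rng.randrange(size), rng.randrange(1 << 32)
            ref.write(addr, data)
            dut.write(addr, data)
        else:                                         # random burst read, kept within bounds
            length = rng.randint(1, 8)
            addr = rng.randrange(size - length + 1)
            expected, got = ref.read(addr, length), dut.read(addr, length)
            if expected != got:
                mismatches.append((i, addr, length, expected, got))
    return mismatches
```

A fixed seed makes any failing sequence reproducible, which matters when a corner case found by random stimulus has to be replayed for debugging.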

Data Cache System based on the Selective Bank Algorithm for Embedded System (내장형 시스템을 위한 선택적 뱅크 알고리즘을 이용한 데이터 캐쉬 시스템)

  • Jung, Bo-Sung;Lee, Jung-Hoon
    • The KIPS Transactions:PartA
    • v.16A no.2
    • pp.69-78
    • 2009
  • One of the most effective ways to improve cache performance is to exploit both the temporal and spatial locality inherent in program execution. In this paper, we present a high-performance, low-power cache structure with a bank selection mechanism that improves the exploitation of spatial and temporal locality. The proposed cache system consists of two parts: a main direct-mapped cache with a small block size, and a fully associative buffer with a large block size that is a multiple of the small block size. In particular, the main direct-mapped cache is organized as two banks for low power consumption and stores small blocks selected from the fully associative buffer by the proposed bank selection algorithm. Using this algorithm and three state bits, we selectively extend the lifetime of small blocks with high temporal locality by storing them in the main direct-mapped cache. This approach reduces conflict misses and cache pollution at the same time. According to the simulation results on MiBench applications, the average miss ratio is improved by about 23% and 32% compared with Victim and STAS caches of the same size, respectively, and the average memory access time is reduced by about 14% and 18%, respectively. The energy consumption of the proposed cache is also about 10% lower than that of the other cache systems examined.
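The two-part organization can be sketched as a small behavioral model: a direct-mapped cache of small blocks plus a fully associative LRU buffer of large blocks, where small blocks that were actually referenced are moved into the direct-mapped cache when their large block is evicted. This is a simplified stand-in for the paper's bank selection algorithm, which uses three state bits and two banks that are not modeled here; the sizes and the promotion rule are assumptions.

```python
from collections import OrderedDict

SMALL = 8          # small block size in bytes (assumed)
LARGE = 32         # large block size in bytes, a multiple of SMALL (assumed)
DM_LINES = 64      # lines in the direct-mapped cache (assumed)
BUF_ENTRIES = 4    # entries in the fully associative buffer (assumed)


class HybridCache:
    """Direct-mapped cache of small blocks plus a fully associative LRU buffer of large blocks."""

    def __init__(self):
        self.dm = [None] * DM_LINES     # small-block number held in each line
        self.buf = OrderedDict()        # large-block number -> referenced small blocks, LRU first
        self.hits = self.misses = 0

    def access(self, addr: int) -> bool:
        small, large = addr // SMALL, addr // LARGE
        if self.dm[small % DM_LINES] == small:
            self.hits += 1
            return True
        if large in self.buf:                   # spatial locality: hit in the large block
            self.buf.move_to_end(large)
            self.buf[large].add(small)
            self.hits += 1
            return True
        self.misses += 1
        if len(self.buf) == BUF_ENTRIES:
            _, referenced = self.buf.popitem(last=False)   # evict the LRU large block
            for sb in referenced:                          # keep only small blocks that were used
                self.dm[sb % DM_LINES] = sb
        self.buf[large] = {small}
        return False
```

A sequential scan benefits from the large blocks (one miss per 32 bytes), while a small block that was used before its large block was evicted can still hit later in the direct-mapped part.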

A Service Architecture to support IP Multicast Service over UNI 4.0 based ATM Networks (UNI 4.0 기반 ATM 망에서의 IP 멀티캐스트 지원 방안을 위한 서비스 구조)

  • Lee, Mee-Jeong;Jung, Sun;Kim, Ye-kyung
    • Journal of KIISE:Information Networking
    • v.27 no.3
    • pp.348-359
    • 2000
  • Most important real-time multimedia applications require multipoint transmission. To support these applications in ATM-based Internet environments, efficient IP multicast transport over ATM networks is important. The IETF proposed MARS (Multicast Address Resolution Server) as a service architecture for transporting connectionless IP multicast flows over connection-oriented ATM VCs. MARS assumes UNI 3.0/3.1 signalling. Because UNI 3.0/3.1 provides no means for receivers to request to join a multicast ATM VC, MARS provides an overlay service that relays join requests from IP multicast group members to the sources of the multicast group. Later, the ATM Forum standardized UNI 4.0 signalling, which includes a new mechanism called LIJ (Leaf Initiated Join) that lets receivers signal the source of an ATM VC directly to join it. In this paper, we propose a new service architecture, named UNI4MARS, that transports IP multicast flows over ATM networks using UNI 4.0 signalling. It comprises the same service components as MARS. Its main function is to provide source information to receivers so that they can use LIJ to join multicast ATM VCs dynamically. The implementation overheads of UNI4MARS and MARS are compared through a series of simulations. The results show that UNI4MARS supports dynamic IP multicast group changes more efficiently in terms of processing, memory, and bandwidth overhead.
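The difference in roles can be sketched as a toy model: in MARS the server relays joins to the sources, whereas in UNI4MARS the server only hands receivers the list of sources, and each receiver then signals every source directly with LIJ. Everything below is hypothetical (class names, string addresses), and the LIJ signalling itself is only simulated by recording the joined (source, group) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Uni4MarsServer:
    """Hypothetical registry mapping an IP multicast group to the ATM addresses of its sources."""
    sources: dict = field(default_factory=dict)

    def register_source(self, group: str, atm_addr: str) -> None:
        self.sources.setdefault(group, set()).add(atm_addr)

    def resolve_sources(self, group: str) -> list:
        return sorted(self.sources.get(group, set()))


@dataclass
class Receiver:
    atm_addr: str
    joined: set = field(default_factory=set)   # (source ATM address, group) pairs

    def join_group(self, server: Uni4MarsServer, group: str) -> set:
        # Instead of the server relaying a join to each source (the MARS overlay),
        # the receiver learns the sources and signals each one itself.
        for src in server.resolve_sources(group):
            self.joined.add((src, group))      # stands in for a UNI 4.0 LIJ request
        return self.joined
```

The server never carries join traffic in this model, which is where the processing and bandwidth savings reported in the abstract would come from.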

A Security SoC embedded with ECDSA Hardware Accelerator (ECDSA 하드웨어 가속기가 내장된 보안 SoC)

  • Jeong, Young-Su;Kim, Min-Ju;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • v.26 no.7
    • pp.1071-1077
    • 2022
  • A security SoC that can be used to implement public-key infrastructures based on elliptic curve cryptography (ECC) was designed. In this SoC, a hardware accelerator for the elliptic curve digital signature algorithm (ECDSA) is interfaced with a Cortex-A53 CPU through the AXI4-Lite bus. The ECDSA accelerator, consisting of a high-performance ECC processor, a SHA3 hash core, a true random number generator (TRNG), a modular multiplier, BRAM, and a control FSM, was designed to perform ECDSA signature generation and signature verification at high performance with minimal CPU control. The SoC was implemented on a Zynq UltraScale+ MPSoC device for hardware-software co-verification, and evaluation showed that about 1,000 signature generations or verifications per second can be achieved at a clock frequency of 150 MHz. The ECDSA accelerator uses 74,630 LUTs, 23,356 flip-flops, 32 kb of BRAM, and 36 DSP blocks.
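The accelerator's blocks map onto the steps of textbook ECDSA: the SHA3 core hashes the message, the TRNG supplies the per-signature nonce $k$, the ECC processor performs scalar multiplication, and the modular multiplier handles arithmetic modulo the group order. The pure-Python sketch below shows those steps; the abstract does not name a curve, so secp256k1 is used purely as an assumption, `secrets` stands in for the TRNG, and the code is for illustration only (it is not constant-time and must not be used in production).

```python
import hashlib
import secrets

# secp256k1 domain parameters (the curve choice is an assumption)
P = 2**256 - 2**32 - 977
A, B = 0, 7
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def point_add(p1, p2):
    """Affine point addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication (the ECC processor's job)."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def hash_to_int(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha3_256(msg).digest(), "big")   # SHA3 hash core


def keygen():
    d = secrets.randbelow(N - 1) + 1
    return d, scalar_mult(d, G)


def sign(d, msg: bytes):
    z = hash_to_int(msg)
    while True:
        k = secrets.randbelow(N - 1) + 1          # nonce from the (simulated) TRNG
        r = scalar_mult(k, G)[0] % N
        if r == 0:
            continue
        s = pow(k, -1, N) * (z + r * d) % N
        if s != 0:
            return r, s


def verify(q, msg: bytes, sig) -> bool:
    r, s = sig
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    z = hash_to_int(msg)
    x = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, q))
    return x is not None and x[0] % N == r
```

Signing needs one scalar multiplication and verification two, which is why the ECC processor dominates the throughput figure quoted in the abstract.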

An Efficient WLAN Device Power Control Technique for Streaming Multimedia Contents over Mobile IP Storage (모바일 IP 스토리지 상에서 멀티미디어 컨텐츠 실행을 위한 효율적인 무선랜 장치 전력제어 기법)

  • Nam, Young-Jin;Choi, Min-Seok
    • The KIPS Transactions:PartA
    • v.16A no.5
    • pp.357-368
    • 2009
  • Mobile IP storage has been proposed to overcome the storage limitations of flash memory and hard disks. It provides practically unlimited storage space to mobile devices over wireless IP networks. However, when multimedia content is streamed from mobile IP storage, the continuous use of the WLAN device drains the battery of the mobile device rapidly. This paper proposes an energy-efficient WLAN device power control technique for streaming multimedia content from mobile IP storage. The technique consists of a prefetch buffer input/output module, a WLAN device power control module, and a reconfigurable prefetch buffer module. It adaptively determines the size of the prefetch buffer according to the quality of the multimedia content, and dynamically controls the power mode of the WLAN device by switching it on and off while the content is streamed. We evaluate the technique on a PXA270-based mobile device running embedded Linux 2.6.11, the Intel iSCSI reference code, and a WLAN device. Extensive experiments show that, compared with no power control, the technique reduces the energy consumption of the WLAN device by up to a factor of 8.5 for QVGA multimedia content.
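Why buffer size and content quality matter can be seen from an idealized fill/drain model, which is an assumption and not the paper's algorithm: with playback bitrate $r$, network throughput $T$, and buffer size $B$, the radio is on for $B/(T-r)$ while refilling and off for $B/r$ while playback drains the buffer. Without overhead, the on fraction reduces to $r/T$ regardless of $B$; the buffer size only matters through a fixed per-cycle switching cost (here a hypothetical `t_switch`, counted as on-time), which a larger buffer amortizes.

```python
def wlan_on_fraction(bitrate, throughput, buffer_bits, t_switch=0.0):
    """Fraction of time the radio is on in an idealized fill/drain cycle.

    bitrate and throughput in bits/s, buffer_bits in bits, t_switch in seconds
    (assumed fixed wake-up overhead per cycle, charged as on-time).
    """
    if throughput <= bitrate:
        return 1.0                                    # radio can never switch off
    fill_time = buffer_bits / (throughput - bitrate)  # radio on: buffer refills while playing
    drain_time = buffer_bits / bitrate                # radio off: playback empties the buffer
    return (fill_time + t_switch) / (fill_time + drain_time)
```

With $r = 1$, $T = 5$, and a 0.5 s switching cost, a buffer of 10 units gives an on fraction of 0.24 and a buffer of 20 units gives 0.22, approaching $r/T = 0.2$; higher-quality (higher-bitrate) content raises the fraction, which is consistent with sizing the buffer by content quality.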

Adverse Effects on EEGs and Bio-Signals Coupling on Improving Machine Learning-Based Classification Performances

  • SuJin Bak
    • Journal of the Korea Society of Computer and Information
    • v.28 no.10
    • pp.133-153
    • 2023
  • In this paper, we propose a novel approach to investigating brain-signal measurement technology using electroencephalography (EEG). Researchers have traditionally combined EEG signals with bio-signals (BSs) to improve the classification performance of emotional states. Our objective was to examine the synergistic effect of coupling EEG and BSs, and to determine whether EEG+BS improves the classification accuracy of emotional states compared with EEG alone or EEG combined with pseudo-random signals (PSs) generated arbitrarily by random generators. Using four feature extraction methods, we examined four combinations (EEG alone, EEG+BS, EEG+BS+PS, and EEG+PS) on two widely used open datasets. Emotional states (task versus rest) were classified with Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) classifiers. With SVM-FFT, the most accurate configuration, the average error rate of EEG+BS was 4.7% and 6.5% higher than those of EEG+PS and EEG alone, respectively. We also analyzed EEG+BS in detail by adding numerous PSs. The error rate of EEG+BS+PS followed a V-shaped curve: it first decreased because of the deep double descent phenomenon and then increased, which is attributed to the curse of dimensionality. These findings suggest that combining EEG and BSs does not always yield better classification performance.
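The four feature combinations compared in the study can be illustrated by concatenating spectral features from each source. The sketch below is a single-channel simplification built on stated assumptions: a naive DFT magnitude stands in for the paper's FFT features, each pseudo-random signal is uniform noise from a seeded generator, and the function names are hypothetical. It makes visible how every added PS channel grows the feature vector, which is the dimensionality effect the abstract links to the rising error rate.

```python
import cmath
import random


def dft_magnitudes(signal, n_bins):
    """Normalized magnitudes of the first n_bins DFT coefficients (naive DFT, for illustration)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))) / n
        for k in range(n_bins)
    ]


def build_feature_sets(eeg, bs, n_bins, n_pseudo, seed=0):
    """Feature vectors for EEG, EEG+BS, EEG+PS, and EEG+BS+PS (single-channel simplification)."""
    rng = random.Random(seed)
    f_eeg = dft_magnitudes(eeg, n_bins)
    f_bs = dft_magnitudes(bs, n_bins)
    f_ps = []
    for _ in range(n_pseudo):                  # each pseudo-random signal adds n_bins features
        ps = [rng.uniform(-1.0, 1.0) for _ in range(len(eeg))]
        f_ps += dft_magnitudes(ps, n_bins)
    return {
        "EEG": f_eeg,
        "EEG+BS": f_eeg + f_bs,
        "EEG+PS": f_eeg + f_ps,
        "EEG+BS+PS": f_eeg + f_bs + f_ps,
    }
```

The resulting vectors would then be fed to a classifier such as an SVM; that step is omitted here.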