• Title/Summary/Keyword: Memory Bandwidth

Search Result 240, Processing Time 0.023 seconds

Performance Comparison of GMM and HMM Approaches for Bandwidth Extension of Speech Signals (음성신호의 대역폭 확장을 위한 GMM 방법 및 HMM 방법의 성능평가)

  • Song, Geun-Bae;Kim, Austin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.3
    • /
    • pp.119-128
    • /
    • 2008
  • This paper analyzes the relationship between two representative statistical methods for bandwidth extension (BWE): Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM) ones, and compares their performances. The HMM method is a memory-based system which was developed to take advantage of the inter-frame dependency of speech signals. Therefore, it could be expected to estimate better the transitional information of the original spectra from frame to frame. To verify it, a dynamic measure that is an approximation of the 1st-order derivative of spectral function over time was introduced in addition to a static measure. The comparison result shows that the two methods are similar in the static measure, while, in the dynamic measure, the HMM method outperforms explicitly the GMM one. Moreover, this difference increases in proportion to the number of states of HMM model. This indicates that the HMM method would be more appropriate at least for the 'blind BWE' problem. On the other hand, nevertheless, the GMM method could be treated as a preferable alternative of the HMM one in some applications where the static performance and algorithm complexity are critical.

Bi-directional Bus Architecture Suitable to Multitasking in MPEG System (MPEG 시스템용 다중 작업에 적합한 양방향 버스 구조)

  • Jun Chi-hoon;Yeon Gyu-sung;Hwang Tae-jin;Wee Jae-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.4 s.334
    • /
    • pp.9-18
    • /
    • 2005
  • This paper proposes the novel synchronous segmented bus architecture that has the pipeline bus architecture based on OCP(open core protocol) and the memory-oriented bus for MPEG system. The proposed architecture has bus architectures that support the memory interface for image data processing of MPEG system. Also it has the segmented hi-directional multiple bus architecture for multitasking processing by using multi -masters/multi - slave. In the scheme address of masters and slaves are fixed so that they are arranged for the location of IP cores according to operational characteristics of the system for efficient data processing. Also the bus architecture adopts synchronous segmented bus architecture for reuse of IP's and architecture or developed chips. This feature is suitable to the high performance and low power multimedia SoC systum by inherent characteristics of multitasking operation and segmented bus. Proposed bus architecture can have up to 3.7 times improvement in the effective bandwidth md up to 4 times reduction in the communication latency.

A Design of 256GB volume DRAM-based SSD(Solid State Drive) (256GB 용량 DRAM기반 SSD의 설계)

  • Ko, Dea-Sik;Jeong, Seung-Kook
    • Journal of Advanced Navigation Technology
    • /
    • v.13 no.4
    • /
    • pp.509-514
    • /
    • 2009
  • In this paper, we designed and analyzed 256GB DRAM-based SSD storage using DDR1 memory and PCI-e interface. SSD is a storage system that uses DRAM or NAND Flash as primary storage media. Since the SSD read and write data directly to memory chips, which results in storage speeds far greater than conventional magnetic storage devices, HDD. Architecture of the proposed SSD system has performance of high speed data processing duo to use multiple RAM disks as primary storage and PCI-e interface bus as communication path of RAM disks. We constructed experimental system with UNIX, Windows/Linux server, SAN Switch, and Ethernet Switch and measured IOPS and bandwidth of proposed SSD using IOmeter. In experimental results, it has been shown that IOPS, 470,000 and bandwidth,800MB/sec of the DDR-1 SSD is better than those of the HDD and Flash-based SSD.

  • PDF

A Design of Pipeline Chain Algorithm Based on Circuit Switching for MPI Broadcast Communication System (MPI 브로드캐스트 통신을 위한 서킷 스위칭 기반의 파이프라인 체인 알고리즘 설계)

  • Yun, Heejun;Chung, Wonyoung;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37B no.9
    • /
    • pp.795-805
    • /
    • 2012
  • This paper proposes an algorithm and a hardware architecture for a broadcast communication which has the worst bottleneck among multiprocessor using distributed memory architectures. In conventional system, The pipelined broadcast algorithm is an algorithm which takes advantage of maximum bandwidth of communication bus. But unnecessary synchronization process are repeated, because the pipelined broadcast sends the data divided into many parts. In this paper, the MPI unit for pipeline chain algorithm based on circuit switching removing the redundancy of synchronization process was designed, the proposed architecture was evaluated by modeling it with systemC. Consequently, the performance of the proposed architecture was highly improved for broadcast communication up to 3.3 times that of systems using conventional pipelined broadcast algorithm, it can almost take advantage of the maximum bandwidth of transmission bus. Then, it was implemented with VerilogHDL, synthesized with TSMC 0.18um library and implemented into a chip. The area of synthesis results occupied 4,700 gates(2 input NAND gate) and utilization of total area is 2.4%. The proposed architecture achieves improvement in total performance of MPSoC occupying relatively small area.

An Adaptive Buffer Tuning Mechanism for striped transport layer connection on multi-homed mobile host (멀티홈 모바일 호스트상에서 스트라이핑 전송계층 연결을 위한 적응형 버퍼튜닝기법)

  • Khan, Faraz-Idris;Huh, Eui-Nam
    • Journal of Internet Computing and Services
    • /
    • v.10 no.4
    • /
    • pp.199-211
    • /
    • 2009
  • Recent advancements in wireless networks have enabled support for mobile applications to transfer data over heterogeneous wireless paths in parallel using data striping technique [2]. Traditionally, high performance data transfer requires tuning of multiple TCP sockets, at sender's end, based on bandwidth delay product (BDP). Moreover, traditional techniques like Automatic TCP Buffer Tuning (ATBT), which balance memory and fulfill network demand, is designed for wired infrastructure assuming single flow on a single socket. Hence, in this paper we propose a buffer tuning technique at senders end designed to ensure high performance data transfer by striping data at transport layer across heterogeneous wireless paths. Our mechanism has the capability to become a resource management system for transport layer connections running on multi-homed mobile host supporting features for wireless link i.e. mobility, bandwidth fluctuations, link level losses. We show that our proposed mechanism performs better than ATBT, in efficiently utilizing memory and achieving aggregate throughput.

  • PDF

Design of HEVC Motion Estimation Engine with Search Window Data Reuse and Early Termination (탐색 영역 데이터의 재사용 및 조기중단이 가능한 HEVC 움직임 추정 엔진 설계)

  • Hur, Ahrum;Park, Taewook;Lee, Seongsoo
    • Journal of IKEEE
    • /
    • v.20 no.3
    • /
    • pp.273-278
    • /
    • 2016
  • In HEVC variable block size motion estimation, same search window data are duplicatedly used in each block size. It increases memory bandwidth, and it is difficult to exploit early termination. In this paper, largest block size and its corresponding smaller block sizes with same positions are performed at the same time. It reduces memory bandwidth and computation by reusing search window data and computation results. In the early termination, image quality can be degraded when it determines early termination by observing largest block size only, since smaller block sizes cannot be equally terminated due to their relative positions. So, in this paper, processing order of early termination is changed to perform smaller block sizes in turns. The designed motion estimation engine was described in Verilog HDL and it was synthesized and verified in 0.18um process technology. Its gate count and maximum operating frequency are 36,101 gates and 263.15 MHz, respectively.

A VIA-based RDMA Mechanism for High Performance PC Cluster Systems (고성능 PC 클러스터 시스템을 위한 VIA 기반 RDMA 메커니즘 구현)

  • Jung In-Hyung;Chung Sang-Hwa;Park Sejin
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.11
    • /
    • pp.635-642
    • /
    • 2004
  • The traditional communication protocols such as TCP/IP are not suitable for PC cluster systems because of their high software processing overhead. To eliminate this overhead, industry leaders have defined the Virtual Interface Architecture (VIA). VIA provides two different data transfer mechanisms, a traditional Send/Receive model and the Remote Direct Memory Access (RDMA) model. RDMA is extremely efficient way to reduce software overhead because it can bypass the OS and use the network interface controller (NIC) directly for communication, also bypass the CPU on the remote host. In this paper, we have implemented VIA-based RDMA mechanism in hardware. Compared to the traditional Send/Receive model, the RDMA mechanism improves latency and bandwidth. Our RDMA mechanism can also communicate without using remote CPU cycles. Our experimental results show a minimum latency of 12.5${\mu}\textrm{s}$ and a maximum bandwidth of 95.5MB/s. As a result, our RDMA mechanism allows PC cluster systems to have a high performance communication method.

Design for Enhanced Precision in 300 mm Wafer Full-Field TTV Measurement (300 mm 웨이퍼의 전영역 TTV 측정 정밀도 향상을 위한 모듈 설계)

  • An-Mok Jeong;Hak-Jun Lee
    • Journal of the Microelectronics and Packaging Society
    • /
    • v.30 no.3
    • /
    • pp.88-93
    • /
    • 2023
  • As the demand for High Bandwidth Memory (HBM) increases and the handling capability of larger wafers expands, ensuring reliable Total Thickness Variation (TTV) measurement for stacked wafers becomes essential. This study presents the design of a measurement module capable of measuring TTV across the entire area of a 300mm wafer, along with estimating potential mechanical measurement errors. The module enables full-area measurement by utilizing a center chuck and lift pin for wafer support. Modal analysis verifies the structural stability of the module, confirming that both the driving and measuring parts were designed with stiffness exceeding 100 Hz. The mechanical measurement error of the designed module was estimated, resulting in a predicted measurement error of 1.34 nm when measuring the thickness of a bonding wafer with a thickness of 1,500 ㎛.

4-way Search Window for Improving The Memory Bandwidth of High-performance 2D PE Architecture in H.264 Motion Estimation (H.264 움직임추정에서 고속 2D PE 아키텍처의 메모리대역폭 개선을 위한 4-방향 검색윈도우)

  • Ko, Byung-Soo;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.6
    • /
    • pp.6-15
    • /
    • 2009
  • In this paper, a new 4-way search window is designed for the high-performance 2D PE architecture in H.264 Motion Estimation(ME) to improve the memory bandwidth. While existing 2D PE architectures reuse the overlapped data of adjacent search windows scanned in 1 or 3-way, the new window utilizes the overlapped data of adjacent search windows as well as adjacent multiple scanning (window) paths to enhance the reusage of retrieved search window data. In order to scan adjacent windows and multiple paths instead of single raster and zigzag scanning of adjacent windows, bidirectional row and column window scanning results in the 4-way(up. down, left, right) search window. The proposed 4-way search window could improve the reuse of overlapped window data to reduce the redundancy access factor by 3.1, though the 1/3-way search window redundantly requires $7.7{\sim}11$ times of data retrieval. Thus, the new 4-way search window scheme enhances the memory bandwidth by $70{\sim}58%$ compared with 1/3-way search window. The 2D PE architecture in H.264 ME for 4-way search window consists of $16{\times}16$ pe array. computing the absolute difference between current and reference frames, and $5{\times}16$ reusage array, storing the overlapped data of adjacent search windows and multiple scanning paths. The reference data could be loaded upward and downward into the new 2D PE depending on scanning direction, and the reusage array is combined with the pe array rotating left as well as right to utilize the overlapped data of adjacent multiple scan paths. In experiments, the new implementation of 4-way search window on Magnachip 0.18um could deal with the HD($1280{\times}720$) video of 1 reference frame, $48{\times}48$ search area and $16{\times}16$ macroblock by 30fps at 149.25MHz.

BIM Geometry Cache Structure for Data Streaming with Large Volume (대용량 BIM 형상 데이터 스트리밍을 위한 캐쉬 구조)

  • Kang, Tae-Wook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.18 no.9
    • /
    • pp.1-8
    • /
    • 2017
  • The purpose of this study is to propose a cache structure for processing large-volume building information modeling (BIM) geometry data,whereit is difficult to allocate physical memory. As the number of BIM orders has increased in the public sector, it is becoming more common to visualize and calculate large-volume BIM geometry data. Design and review collaboration can require a lot of time to download large-volume BIM data through the network. If the BIM data exceeds the physical free-memory limit, visualization and geometry computation cannot be possible. In order to utilize large amounts of BIM data on insufficient physical memory or a low-bandwidth network, it is advantageous to cache only the data necessary for BIM geometry rendering and calculation time. Thisstudy proposes acache structure for efficiently rendering and calculating large-volume BIM geometry data where it is difficult to allocate enough physical memory.