Search | Korea Science

Memory Delay Comparison between 2D GPU and 3D GPU (2차원 구조 대비 3차원 구조 GPU의 메모리 접근 효율성 분석)

Jeon, Hyung-Gyu;Ahn, Jin-Woo;Kim, Jong-Myon;Kim, Cheol-Hong
- Journal of the Korea Society of Computer and Information / v.17 no.7 / pp.1-11 / 2012
As process technology scales down, the number of cores integrated into a processor increases dramatically, leading to significant performance improvement. In particular, a GPU (Graphics Processing Unit) containing many cores can provide high computational performance by maximizing parallelism. In the GPU architecture, main memory access latency is one of the major factors limiting performance improvement. In this work, we quantitatively analyze the performance improvement of the 3D GPU architecture over the 2D GPU architecture and investigate the potential problems of the 3D GPU architecture. In general, memory instructions account for 30% of total instructions, and global/local memory instructions constitute 60% of total memory instructions. Therefore, the 3D GPU is expected to improve performance significantly over the 2D GPU by reducing the delay of memory instructions. However, according to our experimental results, the 3D architecture improves GPU performance by only 2% compared to the 2D architecture, because the performance reduction due to the memory bottleneck increases by 245% in the 3D GPU architecture compared to the 2D architecture. This paper provides a guideline for suitable memory design by analyzing the efficiency of the memory architecture in the 3D GPU architecture.
https://doi.org/10.9708/jksci.2012.17.7.001
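
The two instruction-mix figures quoted in the abstract above can be combined into a single share of all instructions. The sketch below does that arithmetic. The Amdahl-style limit at the end additionally assumes that instruction share equals time share, which the abstract does not state, so it is a rough illustration only. The function names are hypothetical.

```python
def global_local_share(memory_share, global_local_within_memory):
    """Share of all instructions that are global/local memory instructions."""
    return memory_share * global_local_within_memory


def amdahl_limit(accelerated_share):
    """Speedup if the accelerated share took zero time (Amdahl's law limit).

    Assumption (not from the abstract): instruction share equals time share.
    """
    return 1.0 / (1.0 - accelerated_share)


share = global_local_share(0.30, 0.60)  # figures quoted in the abstract
limit = amdahl_limit(share)
print(f"global/local share of all instructions: {share:.2f}")
print(f"idealized speedup limit: {limit:.2f}x")
```

Memory instructions usually take longer than arithmetic ones, so the real time share is likely larger than the instruction share, and the printed limit is not a prediction of the paper's results.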

A Proposal of fast Algorithms of ITU-T G.723.1 for Efficient Multichannel Implementation (효율적인 다채널 구현을 위한 ITU-T G.723,1 음성 부호화기 고속 알고리듬 제안)

정성교;박영철;윤성완;차일환;윤대희
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.67-70 / 2000
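With the recent spread and rapid popularization of the Internet, there have been many attempts to transmit or store speech over networks. This paper proposes fast algorithms for an efficient multichannel implementation of the ITU-T G.723.1 dual-rate speech coder, which is widely used as a speech coding standard for multimedia transmission over networks. The fast algorithms are applied to the adaptive codebook search and the fixed codebook search, which account for much of the computation in the encoding process. In the adaptive codebook search, the computational load is reduced by searching for the delay and the gain sequentially instead of using the conventional method of finding them jointly. For the fixed codebook search, which uses a different algorithm depending on the bit rate, the following fast algorithms are proposed. At the high rate (6.3 kbit/s), which uses MP-MLQ (Multi-Pulse Maximum Likelihood Quantization), the computation is reduced by searching the pulses at equal intervals. At the low rate (5.3 kbit/s), which uses ACELP (Algebraic CELP), a depth-first tree search that divides the pulses into pairs and finds them sequentially is applied instead of the conventional nested-loop search. Subjective speech quality tests confirmed that the proposed methods cause no degradation in speech quality compared with the conventional methods. An implementation of the fast algorithms on the TMS320C6201 fixed-point DSP confirmed that the coder can be realized with 10.29 MIPS at the high rate and 8.70 MIPS at the low rate.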
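
The idea of searching the delay and the gain sequentially rather than jointly can be sketched with a simplified single-tap pitch search. This is a generic illustration, not the paper's algorithm and not the actual G.723.1 adaptive codebook procedure; the function name, the lag range, and the test signal are all hypothetical.

```python
def sequential_pitch_search(target, past_excitation, min_lag, max_lag):
    """Single-tap illustration: choose the delay first, then the gain.

    Step 1 picks the lag that maximizes corr^2 / energy, which does not
    depend on the gain. Step 2 computes the gain for that one lag only.
    Requires max_lag <= len(past_excitation).
    """
    n = len(target)
    best = None  # (score, lag, corr, energy)
    for lag in range(min_lag, max_lag + 1):
        start = len(past_excitation) - lag
        # Candidate vector: the last `lag` samples, repeated to length n.
        candidate = [past_excitation[start + (i % lag)] for i in range(n)]
        corr = sum(t * c for t, c in zip(target, candidate))
        energy = sum(c * c for c in candidate)
        if energy == 0.0 or corr <= 0.0:
            continue  # no usable positive-gain match at this lag
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, lag, corr, energy)
    if best is None:
        return None, 0.0
    _, lag, corr, energy = best
    return lag, corr / energy  # least-squares gain for the chosen lag


pattern = [1.0, 0.0, -1.0, 2.0, 0.5]
history = pattern * 4                    # past excitation with period 5
frame = [0.5 * x for x in pattern * 2]   # same shape, half the amplitude
lag, gain = sequential_pitch_search(frame, history, min_lag=3, max_lag=8)
print(lag, gain)
```

A joint search would evaluate every (delay, gain) pair from a gain codebook; the sequential form evaluates each delay once and quantizes a single gain afterwards, which is where the saving comes from.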

Design of serializability Algorithm for Concurrency Control of Multi Transaction in Database (데이터베이스에서 다중 트랜잭션의 동시성 제어를 위한 직렬성 알고리즘 설계)

김홍진;오상엽;김영선
- Journal of the Korea Society of Computer and Information / v.6 no.2 / pp.1-7 / 2001
Database development requires transaction management of the operations performed on data, efficient database management, and protection of information, as well as new thinking about data security. When users access data, transaction concurrency is controlled by the user's security authentication and the security level of the data. Because existing secure algorithms focus on removing the security channel, they fail to satisfy serializability for high-level transactions, which are repeatedly delayed by low-level transactions. The proposed algorithm therefore prevents serializability violations, improving the efficiency of concurrency control and avoiding the resource waste caused by the re-execution and delay of high-level transactions.
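
The serializability property discussed above can be checked with the textbook precedence-graph test for conflict serializability. The sketch below is that generic test, not the algorithm proposed in the paper, and it ignores security levels entirely; the transaction names and schedules are hypothetical.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """Build edges Ti -> Tj for conflicting operations, in schedule order.

    `schedule` is a list of (transaction, operation, item) tuples, where
    operation is "r" (read) or "w" (write).
    """
    edges = defaultdict(set)
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            conflicting = item1 == item2 and "w" in (op1, op2)
            if t1 != t2 and conflicting:
                edges[t1].add(t2)
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph has no cycle."""
    edges = precedence_graph(schedule)
    state = {}  # transaction -> "visiting" or "done"

    def has_cycle(node):
        state[node] = "visiting"
        for nxt in edges.get(node, ()):
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    transactions = sorted({t for t, _, _ in schedule})
    return not any(t not in state and has_cycle(t) for t in transactions)


interleaved = [("T_high", "r", "x"), ("T_low", "w", "x"),
               ("T_low", "r", "y"), ("T_high", "w", "y")]
serial = [("T_low", "w", "x"), ("T_low", "r", "y"),
          ("T_high", "r", "x"), ("T_high", "w", "y")]
print(is_conflict_serializable(interleaved), is_conflict_serializable(serial))
```

In the interleaved schedule each transaction must precede the other, so the graph has a cycle and the schedule is rejected; the serial schedule has a single edge and passes.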

Fast Elliptic Curve Cryptosystems using Anomalous Bases over Finite Fields (유한체위에서의 근점기저를 이용한 고속 타원곡선 암호법)

Kim, Yong-Tae
- The Journal of the Korea institute of electronic communication sciences / v.10 no.3 / pp.387-393 / 2015
In electronic commerce and secret communication based on ECC over finite fields, if the sender and the receiver use different bases of the finite field, communication is always delayed. In this paper, we analyze the number of basis transformations needed for electronic signatures in electronic commerce and secret communication based on ECC over finite fields between H/W and S/W implementation systems, and we introduce the anomalous basis of finite fields using AOP, which is efficient for both H/W and S/W implementation systems and requires no basis transformations. We then propose a new multiplier based on the anomalous basis of finite fields using AOP, which reduces the running time by 25% compared with the multiplier based on finite fields using trinomials with polynomial bases.
https://doi.org/10.13067/JKIECS.2015.10.3.387

Transcoding Algorithm for SMV and AMR Speech Coder (SMV와 AMR 음성부호화기를 위한 상호부호화 알고리즘)

Lee, Duck-Jong;Jeong, Gyu-Hyeok;Lee, In-Sung
- The Journal of the Acoustical Society of Korea / v.27 no.8 / pp.427-434 / 2008
In this paper, a transcoding algorithm for the SMV and AMR speech coders is proposed. In applications requiring interoperability between different networks, two speech coders must work together in a cascaded connection, known as tandem. Tandem, one of the simplest methods, has several problems such as long delay, high complexity, and quality degradation due to two complete encoding/decoding processes. These problems can be solved by using a transcoding algorithm. The proposed algorithm consists of LSP (Line Spectral Pair) conversion, pitch delay conversion, and fast fixed codebook search. The evaluation results show that the proposed algorithm achieves speech quality equivalent to that of tandem with reduced computational complexity and delay.
https://doi.org/10.7776/ASK.2008.27.8.427

Design of High-Speed Parallel Multiplier over Finite Field $GF(2^m)$ (유한체 $GF(2^m)$상의 고속 병렬 승산기의 설계)

Seong Hyeon-Kyeong
- Journal of the Institute of Electronics Engineers of Korea SC / v.43 no.5 s.311 / pp.36-43 / 2006
In this paper we present a new high-speed parallel multiplier for performing the bit-parallel multiplication of two polynomials in the finite fields $GF(2^m)$. Before constructing the multiplier circuits, we construct the MOD operation part, which generates the result of bit-parallel multiplication with one coefficient of the multiplier polynomial after performing the parallel multiplication of the multiplicand polynomial with an irreducible polynomial. The basic cells of the MOD operation part have two AND gates and two XOR gates. Using these MOD operation parts, we can obtain the result of the bit-parallel multiplication of two polynomials. Extending this process, we show the design of the generalized circuits for degree m and a simple example of constructing the multiplier circuit over the finite field $GF(2^4)$. The presented multiplier is also simulated with PSpice. The multiplier presented in this paper uses the MOD operation parts with the basic cells repeatedly, is easy to extend to the multiplication of two polynomials in finite fields with very large degree m, and is suitable for VLSI. In addition, because the circuit uses no memory elements inside the multiplier, the propagation delay generated by the gates during operation is low, and the multiplier achieves high-speed operation.
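
Multiplication in $GF(2^m)$ reduces to AND and XOR operations on coefficient bits, which a short software model makes concrete. The sketch below is a plain shift-and-add multiplier with modular reduction, not the paper's MOD-cell circuit. The irreducible polynomial $x^4 + x + 1$ is an assumption chosen for the $GF(2^4)$ example; the abstract does not say which polynomial the paper uses.

```python
def gf2m_multiply(a, b, m, irreducible):
    """Multiply two GF(2^m) elements given as integers (bit i = coeff of x^i).

    `irreducible` includes the x^m term, e.g. 0b10011 for x^4 + x + 1.
    """
    result = 0
    for _ in range(m):
        if b & 1:             # AND: select the partial product for this bit
            result ^= a       # XOR: add it (addition in GF(2) is XOR)
        b >>= 1
        a <<= 1               # multiply the multiplicand by x
        if a & (1 << m):      # degree reached m: reduce modulo the polynomial
            a ^= irreducible
    return result


M = 4
POLY = 0b10011  # x^4 + x + 1 (assumed for illustration)

# x * x^3 = x^4, which reduces to x + 1 under this polynomial.
print(bin(gf2m_multiply(0b0010, 0b1000, M, POLY)))
```

A hardware design unrolls this loop into combinational logic, so every partial product and reduction step is computed by gates in parallel instead of over m iterations.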

A Study on Improved Image Matching Method using the CUDA Computing (CUDA 연산을 이용한 개선된 영상 매칭 방법에 관한 연구)

Cho, Kyeongrae;Park, Byungjoon;Yoon, Taebok
- Journal of the Korea Academia-Industrial cooperation Society / v.16 no.4 / pp.2749-2756 / 2015
Recently, as the quality of data increases, the time required to process images has become a problem, and image processing algorithms need to be accelerated. This study compares the computing speed and performance gains of a recognition system on a conventional CPU, with OpenMP, and with CUDA (Compute Unified Device Architecture). In the character recognition system, matching is performed between the input character data and the character data learned by the system, in an environment where the character region is recognized well; the learned images of the English alphabet have a constant font and a standardized size, and an image matching method for computing the match was implemented. Using OpenMP with the four cores of an Intel i5 2500, the speed-up over the existing CPU implementation did not reach four times because of delays in the data partition and merge operations, and an improvement of about 3.2 times was obtained. With the parallel processing of the video card using CUDA, a GPGPU (General-Purpose GPU) programming platform, a performance gain of about 21 times over sequential CPU-based processing was confirmed.
https://doi.org/10.5762/KAIS.2015.16.4.2749
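
The OpenMP figure above (about 3.2 times on four cores) can be turned into a parallel efficiency and an estimated serial fraction. The sketch below uses the standard efficiency definition and the Karp-Flatt metric; attributing the whole shortfall to a serial fraction is an assumption made for illustration, since the abstract only names partition and merge delays as the cause.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of the ideal linear speedup that was achieved."""
    return speedup / workers


def karp_flatt_serial_fraction(speedup, workers):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


speedup, cores = 3.2, 4  # figures quoted in the abstract
print(f"efficiency: {parallel_efficiency(speedup, cores):.2f}")
print(f"serial fraction: {karp_flatt_serial_fraction(speedup, cores):.3f}")
```

The same functions do not apply directly to the GPU result of about 21 times, because the abstract does not give the number of GPU cores involved.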

Robust Double Deadbeat Control of Single-Phase UPS Inverter (단상 UPS 인버터의 강인한 2중 데드비트제어)

박지호;허태원;안인모;이현우;정재륜;우정인
- Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.15 no.6 / pp.65-72 / 2001
This paper deals with a novel fully digital control of the single-phase PWM (Pulse Width Modulation) inverter for UPS (Uninterruptible Power Supply). The voltage and current of the output filter capacitor are used as state variables for the feedback control input. In the proposed scheme, a double deadbeat control consisting of a minor current control loop and a major voltage control loop has been developed. In addition, a second-order deadbeat current control, in which the current becomes exactly equal to its reference in two sampling periods without error or overshoot, is proposed to remove the influence of the calculation time delay. The load current is predicted to compensate for the load disturbance. The simulation and experimental results show that the proposed system offers an output voltage with THD (Total Harmonic Distortion) of less than 5% at a full nonlinear load.
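
The THD figure quoted above follows the usual definition: the RMS of the harmonic components divided by the RMS of the fundamental. The sketch below computes it for made-up harmonic amplitudes; the voltages are hypothetical and are not measurements from the paper.

```python
import math


def total_harmonic_distortion(fundamental_rms, harmonic_rms):
    """THD as a ratio: sqrt(sum of squared harmonic RMS values) / fundamental."""
    return math.sqrt(sum(h * h for h in harmonic_rms)) / fundamental_rms


# Hypothetical output voltage spectrum: 220 V fundamental plus three harmonics.
thd = total_harmonic_distortion(220.0, [6.0, 4.0, 3.0])
print(f"THD = {thd * 100:.2f}%  (below 5%: {thd < 0.05})")
```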

Stochastic Glitch Estimation and Path Balancing for Statistical Optimization (통계적 최적화를 위한 확률적 글리치 예측 및 경로 균등화 방법)

Shin Ho-Soon;Kim Ju-Ho;Lee Hyung-Woo
- Journal of the Institute of Electronics Engineers of Korea SD / v.43 no.8 s.350 / pp.35-43 / 2006
In this paper, we propose a new method for power optimization that uses path balancing based on stochastic estimation of glitches in Statistical Static Timing Analysis (SSTA). The proposed method estimates the probability of glitch occurrence using the tightness probability of each node in the timing graph. In addition, we propose an efficient gate sizing technique for glitch reduction that accurately calculates the effect of sizing on delay, considering the probability of glitch occurrence. The efficiency of the proposed method has been verified on ISCAS85 benchmark circuits with $0.16{\mu}m$ model parameters. Experimental results show up to 8.6% accuracy improvement in glitch estimation and 9.5% optimization improvement.
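
In SSTA, the tightness probability is commonly defined as the probability that one arrival time exceeds another when both are modeled as Gaussian random variables. The sketch below computes that standard quantity; how the paper maps it to a glitch probability is not described in the abstract, so that step is not modeled. The arrival-time values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tightness_probability(mean_a, std_a, mean_b, std_b, correlation=0.0):
    """P(A > B) for jointly Gaussian arrival times A and B."""
    variance = std_a ** 2 + std_b ** 2 - 2.0 * correlation * std_a * std_b
    if variance <= 0.0:
        # Degenerate case: the difference A - B is deterministic.
        return 1.0 if mean_a > mean_b else 0.0
    return normal_cdf((mean_a - mean_b) / math.sqrt(variance))


# Hypothetical arrival times (ns) at the two inputs of a gate.
p = tightness_probability(mean_a=1.20, std_a=0.05, mean_b=1.15, std_b=0.05)
print(f"P(input A arrives later than input B) = {p:.3f}")
```

A value near 0.5 means the two paths arrive at almost the same time on average, while values near 0 or 1 indicate that one path consistently dominates.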

Hardware Design for Timing Synchronization of OFDM-Based WAVE Systems (OFDM 기반 WAVE 시스템의 시간동기 하드웨어 설계)

Huynh, Tronganh;Kim, Jin-Sang;Cho, Won-Kyung
- The Journal of Korean Institute of Communications and Information Sciences / v.33 no.4A / pp.473-478 / 2008
WAVE is a short-to-medium range communication standard that supports both public safety and private operations in roadside-to-vehicle and vehicle-to-vehicle communication environments. The core technology of the physical layer in WAVE is orthogonal frequency division multiplexing (OFDM), which is sensitive to timing synchronization errors. In addition, minimizing the latency of the communication link is an essential requirement of a WAVE system. In this paper, a robust, low-complexity, and low-latency timing synchronization algorithm suitable for WAVE systems and its efficient hardware architecture are proposed. A comparison between the proposed algorithm and other algorithms in terms of computational complexity and latency shows the advantage of the proposed algorithm. The proposed architecture does not require RAM (Random Access Memory), which can affect the pipelining ability and high-speed operation of the hardware implementation. Synchronization error rate (SER) evaluation using both Matlab and FPGA implementation shows that the proposed algorithm performs well compared with the existing algorithms.
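
Timing synchronization in OFDM receivers is often based on a delay-and-correlate metric over a repeated preamble. The sketch below shows that generic technique, not the algorithm proposed in the paper, which the abstract does not describe. The period of 16 samples, the window length, the threshold, and the synthetic noise-free signal are all assumptions for illustration.

```python
import cmath


def delay_correlate_metric(samples, period, window):
    """Normalized correlation between the signal and a delayed copy of itself."""
    metrics = []
    for d in range(len(samples) - period - window + 1):
        corr = sum(samples[d + k].conjugate() * samples[d + k + period]
                   for k in range(window))
        energy = sum(abs(samples[d + k + period]) ** 2 for k in range(window))
        metrics.append(abs(corr) ** 2 / energy ** 2 if energy > 0.0 else 0.0)
    return metrics


def first_above(metrics, threshold):
    """Index of the first metric value at or above the threshold, else None."""
    for index, value in enumerate(metrics):
        if value >= threshold:
            return index
    return None


# Synthetic test signal: 20 silent samples, then a 16-sample symbol sent 4 times.
short_symbol = [cmath.exp(1j * 0.7 * k * k) for k in range(16)]
received = [0j] * 20 + short_symbol * 4
metric = delay_correlate_metric(received, period=16, window=16)
print(first_above(metric, 0.99))
```

The metric needs only a delay line and running sums, so it can be pipelined in hardware without storing whole symbols, although real receivers must also handle noise, frequency offset, and the plateau that follows the first peak.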

Search Result 451, Processing Time 0.023 seconds
