• Title/Summary/Keyword: computational efficiency (연산 효율)


Processing-in-Memory Architecture for Enhanced Convolutional Neural Network Performance (합성곱 신경망 성능 향상을 위한 메모리 내 연산 구조)

  • Kun-Mo Jeong;Ho-Yun Youm;Han-Jun Kim
    • Annual Conference of KIPS / 2024.05a / pp.61-64 / 2024
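  • With the growing demand for high-performance computing devices, hardware architectures that enable computation inside memory have recently been introduced. This paper proposes a new processing-in-memory architecture that integrates compute units into conventional DRAM. Optimized for data-intensive convolutional neural network (CNN) workloads, the architecture keeps the existing memory structure and adds branches to it, improving both the speed and the energy efficiency of CNN operations. Experiments with various CNN models, including VGG19, AlexNet, and ResNet-50, show that the PINN architecture achieves up to a 2.95x performance improvement over prior work. These results suggest that PINN technology can overcome the limits of storage and compute performance and meet the demands of advanced applications such as machine learning.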

Fast Geometric Transformations of 3D Images Represented by an Octree (8진트리로 표현된 3차원 영상의 빠른 기하학적 변환)

  • Heo, Yeong-Nam;Park, Seung-Jin;Kim, Eung-Gon
    • The Transactions of the Korea Information Processing Society / v.2 no.6 / pp.831-838 / 1995
  • Geometric transformations require many operations when displaying moving 3D objects on the screen, and fast computation is an important problem in CAD and animation applications. The general method for computing the transformed coordinates of an object represented by an octree must perform the operations on every node. This paper proposes an efficient method that converts the rectangular coordinates of the vertices of the octree nodes into coordinates in the universe space using basic vectors, in order to quickly compute geometric transformations of 3D images represented by an octree. The coordinates of the vertices of each octant are computed using the formula presented here, which requires additions and multiplications by powers of 2. The method has a very fast execution time and is compared with the general computation method.
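The abstract does not give its formula, so the following is only a minimal sketch of the general idea, under stated assumptions: the universe cube has its corner at the origin, the transformation is affine, and the function names (`transform_point`, `transformed_node_vertices`) and the `(i, j, k)` path encoding are hypothetical. The matrix is applied once to the three edge vectors of the universe; every deeper node is then reached with halvings and additions only.

```python
from itertools import product


def transform_point(matrix, translation, point):
    """Baseline: apply the affine map p -> matrix * p + translation directly."""
    return tuple(
        sum(matrix[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def transformed_node_vertices(matrix, translation, side, path):
    """Return the 8 transformed vertices of the octree node reached by `path`.

    `path` is a sequence of (i, j, k) child selectors, each component 0 or 1
    (a hypothetical encoding). Only the three edge vectors of the universe
    cube are multiplied by the matrix; descending one level halves them and
    adds the selected ones to the node origin.
    """
    # Transformed edge vectors of the universe cube (the only matrix products).
    basis = [tuple(matrix[r][axis] * side for r in range(3)) for axis in range(3)]
    origin = tuple(translation)  # image of the universe corner (0, 0, 0)
    for selector in path:
        basis = [tuple(c / 2 for c in vec) for vec in basis]  # power-of-2 scaling
        for axis, bit in enumerate(selector):
            if bit:
                origin = tuple(o + b for o, b in zip(origin, basis[axis]))
    # The node's corners are the origin plus every subset of its edge vectors.
    vertices = []
    for bits in product((0, 1), repeat=3):
        vertex = origin
        for axis, bit in enumerate(bits):
            if bit:
                vertex = tuple(a + b for a, b in zip(vertex, basis[axis]))
        vertices.append(vertex)
    return vertices
```

Because the map is affine, the result matches applying `transform_point` to each corner of the node, which is the per-vertex baseline the sketch avoids.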


Modular Exponentiation by m-Numeral System (m-진법 모듈러 지수연산)

  • Lee, Sang-Un
    • The KIPS Transactions:PartC / v.18C no.1 / pp.1-6 / 2011
  • The performance and practicality of a cryptosystem for encryption, decryption, and primality testing are primarily determined by the implementation efficiency of the modular exponentiation $a^b \pmod{n}$. To compute $a^b \pmod{n}$, standard binary squaring still seems to be the best choice. However, the d-ary method ($d = 2, 3, 4, 5, 6$) is more efficient when $b$ has many bits. This paper proposes modular exponentiation based on an m-numeral system. The method can be applied when $b \equiv 0 \pmod{m}$, $2 \leq m \leq 16$. It also proposes another method that exits the algorithm early when the result is 1 or $a$.
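For orientation, here is a minimal sketch of the two textbook baselines the abstract refers to: binary square-and-multiply and a generic base-m (m-ary) variant. This is not the paper's m-numeral algorithm or its early-exit rule; the function names are assumptions chosen for illustration.

```python
def modexp_binary(a, b, n):
    """Right-to-left binary square-and-multiply: a**b mod n for b >= 0."""
    result = 1 % n
    a %= n
    while b > 0:
        if b & 1:                 # current low bit of the exponent is set
            result = (result * a) % n
        a = (a * a) % n           # square for the next bit
        b >>= 1
    return result


def modexp_m_ary(a, b, n, m=4):
    """Left-to-right base-m exponentiation (textbook m-ary method)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    a %= n
    # Precompute a**0 .. a**(m-1) mod n, one entry per possible digit.
    table = [1 % n]
    for _ in range(m - 1):
        table.append((table[-1] * a) % n)
    # Base-m digits of the exponent, least significant first.
    digits = []
    while b > 0:
        b, digit = divmod(b, m)
        digits.append(digit)
    result = 1 % n
    for digit in reversed(digits):
        result = modexp_binary(result, m, n)    # shift by one base-m digit
        result = (result * table[digit]) % n    # add the digit's contribution
    return result
```

Both functions should agree with Python's built-in `pow(a, b, n)`; a larger `m` trades a bigger precomputed table for fewer table multiplications per exponent digit.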

An Efficient Computation of FFT for MPEG/Audio Psycho-Acoustic Model (MPEG 심리음향모델의 고속 구현을 위한 효율적 FFT 연산)

  • 송건호;이근섭;박영철;윤대희
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.6 / pp.261-269 / 2004
  • In this paper, an efficient algorithm for computing the FFT in the MPEG/audio Layer III (MP3) encoder is proposed. The proposed algorithm performs a full-band 1024-point FFT by computing 32-point FFTs of 32 subband outputs. To reduce the aliasing caused by the analysis filter bank, an aliasing cancellation butterfly is developed. A major benefit of the proposed algorithm is its computational saving: it saves 40 to 50% of the FFT computations, which reduces the complexity of PAM-2 by about 20%.

Efficient Convolutional Neural Network with low Complexity (저연산량의 효율적인 콘볼루션 신경망)

  • Lee, Chanho;Lee, Joongkyung;Ho, Cong Ahn
    • Journal of IKEEE / v.24 no.3 / pp.685-690 / 2020
  • We propose an efficient convolutional neural network, based on MobileNet V2, with much lower computational complexity and higher accuracy for mobile or edge devices. The proposed network consists of bottleneck layers with larger expansion factors and an adjusted number of channels, and it excludes a few layers; as a result, the computational complexity is reduced by half. The performance of the proposed network is verified by measuring its accuracy and its execution times on a CPU and a GPU using the ImageNet100 dataset. In addition, the execution time on the GPU depends on the CNN architecture.
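To make "computational complexity" concrete, the sketch below counts multiply-accumulate operations (MACs) for one MobileNet V2 style bottleneck block (1x1 expansion, depthwise convolution, 1x1 projection). It is a rough counting model, not the paper's measurement: the function names are hypothetical, 'same' padding is assumed, and bias, batch normalization, and activations are ignored.

```python
def conv_macs(h, w, c_in, c_out, kernel=3, stride=1):
    """MACs of a standard kernel x kernel convolution with 'same' padding."""
    h_out, w_out = -(-h // stride), -(-w // stride)   # ceiling division
    return h_out * w_out * c_in * c_out * kernel * kernel


def bottleneck_macs(h, w, c_in, c_out, expansion=6, kernel=3, stride=1):
    """MACs of an inverted-residual bottleneck block (counting model only)."""
    hidden = c_in * expansion
    h_out, w_out = -(-h // stride), -(-w // stride)
    # 1x1 expansion runs at input resolution; assumed skipped when expansion == 1.
    expand = 0 if expansion == 1 else h * w * c_in * hidden
    # Depthwise convolution: one kernel x kernel filter per hidden channel.
    depthwise = h_out * w_out * hidden * kernel * kernel
    # 1x1 projection back down to c_out channels.
    project = h_out * w_out * hidden * c_out
    return expand + depthwise + project
```

In this model the two 1x1 layers scale with `c_in * expansion`, which is why the expansion factor and the channel counts are the natural knobs when trading accuracy against cost.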

Adaptive Motion Vector Resampling Method for Efficient Resizing Transcoding (효율적인 크기조절 트랜스코딩을 위한 적응적 움직임 벡터 재산출 방법)

  • Lee, Kyu-Chan;Kim, Seong-Hoon;Oh, Seoung-Jun;Park, Ho-Chong;Ahn, Chang-Beom;Seo, Jeong-Il
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2005.11a / pp.169-172 / 2005
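  • In resizing transcoding, motion vector re-estimation requires a large amount of computation, so reducing it is necessary for real-time processing. This paper proposes a method that adaptively chooses how the predicted motion vector is computed for various video sequences, reducing computation without degrading picture quality compared with existing methods. Using the overall motion magnitude and the homogeneity of the motion vectors, the method sets the predicted motion vector component to 0 without any resampling when the motion is small; when the motion is large, it adaptively selects either the mean or the median as the predicted motion vector component, depending on the degree of homogeneity. For more efficient motion vector computation, the proposed procedure is applied separately to the horizontal and vertical components. In experiments against the weighted-average method, which takes a weighted mean, the numbers of additions and multiplications in motion vector resampling decreased by about 96% and 42% on average, respectively, at the same PSNR.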
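The selection rule described above can be sketched roughly as follows. Several details are assumptions rather than the paper's definitions: the thresholds, the use of the mean absolute value as the motion magnitude, the use of the value range as the homogeneity measure, and the choice of the mean for homogeneous vectors and the median otherwise. Scaling the result by the resize ratio is omitted.

```python
from statistics import mean, median


def predict_mv_component(components, motion_threshold=1.0, spread_threshold=2.0):
    """Pick a predicted motion vector component from the source-block components.

    Both thresholds are hypothetical tuning values, not taken from the paper.
    """
    magnitude = mean(abs(c) for c in components)
    if magnitude < motion_threshold:
        return 0.0                      # small motion: skip resampling entirely
    spread = max(components) - min(components)
    if spread <= spread_threshold:
        return mean(components)         # homogeneous vectors (assumed: use the mean)
    return median(components)           # non-homogeneous (assumed: use the median)


def predict_mv(vectors, **thresholds):
    """Apply the rule separately to horizontal and vertical components."""
    xs = [v[0] for v in vectors]
    ys = [v[1] for v in vectors]
    return (
        predict_mv_component(xs, **thresholds),
        predict_mv_component(ys, **thresholds),
    )
```

For example, `predict_mv([(8, 0), (9, 0), (8, 1), (9, 0)])` treats the horizontal components as large and homogeneous and the vertical ones as small, so only the horizontal component needs any arithmetic beyond the checks.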


Efficient short-length running convolution algorithm using filter banks (필터 뱅크를 사용한 효율적인 short-length running convolution 알고리즘)

  • Jang Young-Beom;Oh Se-Man;Lee Won-Sang
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.6 / pp.187-194 / 2005
  • In this paper, an efficient and fast algorithm that reduces the amount of computation in FIR (finite impulse response) filtering is proposed. The proposed algorithm supports parallel processing of arbitrary size, and its structures are easily derived. Furthermore, it is shown that the number of multiplications per sample is reduced, and that the number of instructions on a MAC (multiply-accumulate) processor is also reduced. For the theoretical improvement, the number of subfilters is compared with that of the conventional algorithm. In addition, it is shown that the number of elements required for a hardwired implementation is reduced compared with the conventional algorithm.
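As background, the sketch below shows the standard 2-parallel fast FIR decomposition, a common short-length running convolution building block. It is not the filter-bank derivation of this paper. Function names are assumptions, and non-empty inputs are assumed. The input and the filter are split into even and odd polyphase branches, and three half-length subfilters replace the four a direct polyphase form would need.

```python
def convolve(x, h):
    """Direct linear convolution using len(x) * len(h) multiplications."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fast_fir_2_parallel(x, h):
    """2-parallel fast FIR: three half-length subfilters instead of four."""
    n_out = len(x) + len(h) - 1
    # Pad to even lengths so both polyphase branches have the same size.
    x = list(x) + [0] * (len(x) % 2)
    h = list(h) + [0] * (len(h) % 2)
    x0, x1 = x[0::2], x[1::2]          # even / odd input samples
    h0, h1 = h[0::2], h[1::2]          # even / odd filter taps
    a = convolve(x0, h0)               # H0 X0
    b = convolve(x1, h1)               # H1 X1
    c = convolve([p + q for p, q in zip(x0, x1)],
                 [p + q for p, q in zip(h0, h1)])   # (H0 + H1)(X0 + X1)
    y = [0] * (len(x) + len(h) - 1)
    for k in range(len(a)):
        y[2 * k] += a[k]                   # even outputs: H0 X0 ...
        y[2 * k + 2] += b[k]               # ... plus H1 X1 delayed by one pair
        y[2 * k + 1] += c[k] - a[k] - b[k] # odd outputs: H0 X1 + H1 X0
    return y[:n_out]                       # drop samples created by padding
```

The output equals `convolve(x, h)`; the saving comes from computing the cross terms as `c - a - b`, at the cost of a few extra additions before and after the subfilters.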

A Technique for the Active Rule Condition (능동규칙의 조건부 처리 기법)

  • 이기욱
    • Journal of the Korea Society of Computer and Information / v.3 no.4 / pp.49-54 / 1998
  • Because the database operations that process the condition part of an active rule take considerable time, they have an important effect on the efficiency of an active database system. Their processing time should be minimized in order to improve the efficiency of the system. Previous work is limited to basic database operations and some aggregate functions. In this paper, a processing technique that uses the structuralization of relations and a relation state table is suggested. The processing time for basic database operations can be reduced by structuring relations into a classification tree and by introducing a deletion information table. With the introduction of a binary search tree and a relation state table, aggregate functions, which have a high processing cost, can be processed effectively, and the functionality of the active database system can be maximized.


Topic maps Matching and Merging Techniques based on Partitioning of Topics (토픽 분할에 의한 토픽맵 매칭 및 통합 기법)

  • Kim, Jung-Min;Chung, Hyun-Sook
    • The KIPS Transactions:PartD / v.14D no.7 / pp.819-828 / 2007
  • In this paper, we propose a topic maps matching and merging approach based on the syntactic or semantic characteristics and constraints of topic maps. Previous schema matching approaches were developed to enhance the effectiveness and generality of matching techniques. However, they are inefficient because they must transform the input ontologies into graphs and take into account all the nodes and edges of those graphs, which requires a great amount of processing time. Currently, the standard languages for developing ontologies are RDF/OWL and Topic Maps. We propose an enhanced matching and merging technique based on topic partitioning, several matching operations, and merging conflict detection.

Hardware Design of Efficient Montgomery Multiplier for Low Area RSA (저면적 RSA를 위한 효율적인 Montgomery 곱셈기 하드웨어 설계)

  • Nti, Richard B.;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.575-577 / 2017
  • In public key cryptography such as RSA, modular exponentiation is the most time-consuming operation. RSA's modular exponentiation can be computed by repeated modular multiplication. To attain high efficiency for RSA, fast modular multiplication algorithms have been proposed to speed up encryption and decryption. Montgomery multiplication is limited by the carry propagation delay from the addition of long operands. In this paper, we propose a hardware structure that reduces the area of the Montgomery multiplication implementation for lightweight applications of RSA. Experimental results show that the new design achieves higher performance and a smaller hardware area. Frequencies of 884.9 MHz and 250 MHz were achieved with 84K and 56K gates, respectively, using 90 nm technology.
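The relationship between modular exponentiation and Montgomery multiplication can be sketched in software as follows. This is the textbook Montgomery reduction (REDC) with arbitrary-precision integers, not the proposed hardware structure. Function names are assumptions, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, bits):
    """Precompute constants for an odd modulus n with R = 2**bits > n."""
    if n % 2 == 0 or n >= (1 << bits):
        raise ValueError("n must be odd and smaller than 2**bits")
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r      # n * n_prime == -1 (mod R)
    return r, n_prime


def redc(t, n, bits, n_prime):
    """Montgomery reduction: return t * R**-1 mod n for 0 <= t < n * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m*n == 0 (mod R)
    u = (t + m * n) >> bits             # exact division by R (a shift)
    return u - n if u >= n else u       # single conditional subtraction


def montgomery_modexp(a, e, n, bits):
    """Compute a**e mod n by repeated Montgomery multiplication."""
    r, n_prime = montgomery_setup(n, bits)
    a_bar = (a * r) % n                 # a in Montgomery form
    x_bar = r % n                       # 1 in Montgomery form
    for bit in bin(e)[2:]:              # exponent bits, most significant first
        x_bar = redc(x_bar * x_bar, n, bits, n_prime)
        if bit == "1":
            x_bar = redc(x_bar * a_bar, n, bits, n_prime)
    return redc(x_bar, n, bits, n_prime)  # convert back from Montgomery form
```

Inside the loop every reduction is a mask, a shift, and at most one subtraction, which is why the long additions `t + m * n` (and their carry chains) dominate a hardware implementation.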
