• Title/Summary/Keyword: vector-parallel performance analysis

A Study on GPU Computing of Bi-conjugate Gradient Method for Finite Element Analysis of the Incompressible Navier-Stokes Equations (유한요소 비압축성 유동장 해석을 위한 이중공액구배법의 GPU 기반 연산에 대한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Jung, Hye Dong;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B / v.40 no.9 / pp.597-604 / 2016
  • A parallel bi-conjugate gradient algorithm was developed in CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using a splitting P2P1 finite element method. An asymmetric stenotic flow problem was solved to validate the proposed algorithm, and the parallel performance of the GPU was then examined by measuring elapsed times. GPU performance for sparse matrix-vector multiplication was also investigated using a matrix from a fluid-structure interaction problem. A kernel was written to compute the inner products of each row of the sparse matrix with a vector simultaneously. The kernel was further optimized using both parallel reduction and memory coalescing. The effect of the warp on the parallel performance of the CUDA implementation was also examined during kernel construction. In double precision, the GPU computation was more than 7 times faster than a single CPU.
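The row-wise inner products described in this abstract can be illustrated with a minimal sequential sketch in Python. It is not the authors' CUDA kernel: the CSR storage layout, the function names `csr_matvec` and `tree_reduce`, and the small test matrix are all assumptions made for illustration. Each row's inner product is independent of the others, which is what a GPU kernel can exploit, and the pairwise sum mimics the order of a parallel reduction.

```python
def tree_reduce(items):
    """Sum values pairwise, mirroring the order of a parallel (tree) reduction."""
    items = list(items)
    if not items:
        return 0.0
    while len(items) > 1:
        half = (len(items) + 1) // 2
        # Element i is combined with element i + half when that partner exists.
        items = [
            items[i] + items[i + half] if i + half < len(items) else items[i]
            for i in range(half)
        ]
    return items[0]


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format (assumed layout)."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # Inner product of one sparse row with x; rows do not depend on each other.
        products = [values[k] * x[col_idx[k]] for k in range(start, end)]
        y.append(tree_reduce(products))
    return y


# Hypothetical 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
y = csr_matvec(values, col_idx, row_ptr, x)
print(y)
```

On a GPU, the loop over rows would typically be distributed across threads or warps; the sketch keeps it sequential so that the arithmetic is easy to check by hand.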

Fine-tuning SVM for Enhancing Speech/Music Classification (SVM의 미세조정을 통한 음성/음악 분류 성능향상)

  • Lim, Chung-Soo;Song, Ji-Hyun;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.2 / pp.141-148 / 2011
  • Support vector machines have been studied and used extensively in pattern recognition for years. One interesting application of this technique is speech/music classification for a standardized codec such as the 3GPP2 selectable mode vocoder. In this paper, we propose a novel approach that improves the speech/music classification of support vector machines. While conventional support vector machine optimization techniques apply during the training phase, the proposed technique can be adopted in the classification phase. The proposed approach can therefore be developed and employed in parallel with conventional optimizations, resulting in a synergistic boost in classification performance. We first analyze the impact of the kernel width parameter on the classifications made by support vector machines. From this analysis, we observe that the outputs of support vector machines can be fine-tuned with the kernel width parameter. To make the most of this capability, we identify a strong correlation among neighboring input frames and use this correlation information as a guide for adjusting the kernel width parameter. According to the experimental results, the proposed algorithm has the potential to improve the performance of support vector machines.
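How a kernel width parameter can shift an SVM output at classification time can be sketched as follows. The abstract does not name the kernel, so a Gaussian (RBF) kernel is assumed here; the support vectors, coefficients, bias, and the names `rbf_kernel` and `svm_decision` are hypothetical and chosen only for illustration.

```python
import math


def rbf_kernel(a, b, width):
    """Gaussian kernel exp(-||a - b||^2 / (2 * width^2)); the kernel choice is an assumption."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def svm_decision(x, support_vectors, dual_coefs, bias, width):
    """Decision value f(x) = sum_i coef_i * K(sv_i, x) + bias for a trained SVM."""
    return sum(
        coef * rbf_kernel(sv, x, width)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias


# Hypothetical trained model: one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 0.0)]
dual_coefs = [1.0, -1.0]  # alpha_i * y_i, made-up values
bias = 0.0

frame = (0.5, 0.0)  # hypothetical input feature vector
for width in (0.5, 1.0, 2.0):
    # The trained coefficients stay fixed; only the width changes at classification time.
    print(width, svm_decision(frame, support_vectors, dual_coefs, bias, width))
```

The same input yields decision values of different magnitude as the width changes, which is the property the abstract says can be used for fine-tuning. How the width is chosen from the correlation among neighboring frames is specific to the paper and is not reproduced here.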

Sentiment Analysis using Robust Parallel Tri-LSTM Sentence Embedding in Out-of-Vocabulary Word (Out-of-Vocabulary 단어에 강건한 병렬 Tri-LSTM 문장 임베딩을 이용한 감정분석)

  • Lee, Hyun Young;Kang, Seung Shik
    • Smart Media Journal / v.10 no.1 / pp.16-24 / 2021
  • Existing word embedding methods such as word2vec represent only the words that occur in the raw training corpus as fixed-length vectors in a continuous vector space, so the out-of-vocabulary (OOV) problem often arises in morphologically rich languages. The same holds for sentence embedding: when the meaning of a sentence is represented as a fixed-length vector by combining the vectors of its constituent words, OOV words make it difficult to represent the sentence meaningfully. In particular, because Korean is an agglutinative language that combines lexical and grammatical morphemes, handling OOV words is an important factor in improving performance. In this paper, we propose a parallel Tri-LSTM sentence embedding that is robust to the OOV problem by extending the use of morphological information of words to the sentence level. From a sentiment analysis task on a Korean corpus, we found empirically that the character unit is better than the morpheme unit as an embedding unit for Korean sentence embedding. We achieved 86.17% accuracy on the sentiment analysis task with the parallel bidirectional Tri-LSTM sentence encoder.
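The OOV problem the abstract describes can be made concrete with a small sketch that compares a word-level vocabulary with a character-level one. This is not the Tri-LSTM encoder itself; the toy word list and the helper names `build_vocab` and `oov_rate` are assumptions for illustration only.

```python
def build_vocab(units):
    """Map each distinct unit (word or character) to an integer index."""
    return {unit: index for index, unit in enumerate(sorted(set(units)))}


def oov_rate(units, vocab):
    """Fraction of units that have no entry in the vocabulary."""
    units = list(units)
    if not units:
        return 0.0
    return sum(unit not in vocab for unit in units) / len(units)


# Hypothetical training words: "movie", "fun", "exists", "does not exist".
train_words = ["영화", "재미", "있다", "없다"]
word_vocab = build_vocab(train_words)
char_vocab = build_vocab(ch for word in train_words for ch in word)

# An unseen surface form built from characters that did occur in training.
unseen_word = "재미있다"
word_oov = oov_rate([unseen_word], word_vocab)  # looked up as one whole word
char_oov = oov_rate(unseen_word, char_vocab)    # looked up character by character
print(word_oov, char_oov)
```

The unseen word has no word-level entry, while every one of its characters is covered, so a character-unit encoder still receives usable inputs.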

Analytical Prediction and Experimental Verification of Electromagnetic Performance of a Surface-Mounted Permanent Magnet Motor having a Fractional Slot/Pole Number Combination

  • Hong, Sang-A;Choi, Jang-Young;Jang, Seok-Myeong
    • Journal of Magnetics / v.19 no.1 / pp.84-89 / 2014
  • This paper presents an analytical prediction and experimental verification of the electromagnetic performance of a parallel magnetized surface-mounted permanent magnet (SPM) motor having a fractional number of slots per pole combination. On the basis of a two-dimensional (2-D) polar coordinate system and a magnetic vector potential, analytical solutions for flux density produced by the permanent magnets (PMs) and stator windings are derived. Then, analytical solutions for back-electromotive force (emf) and electromagnetic torque are derived from these field solutions. The analytical results are thoroughly validated with 2-D nonlinear finite element (FE) analysis results. Finally, the experimental back-emf and electromagnetic torque measurements are presented to test the validity of the analysis.
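For orientation, the standard 2-D magnetic vector potential relations in polar coordinates that such an analysis typically starts from are sketched below. The notation ($A_z$ for the axial component of the vector potential, $\mathbf{M}$ for the magnetization, $\mu_0$ for the permeability of free space) is assumed here rather than taken from the paper, and the paper's own region definitions, boundary conditions, and series solutions are not reproduced.

```latex
% Governing equations for the axial vector potential A_z(r, \theta)
\nabla^2 A_z = 0
  \quad \text{(air gap)}, \qquad
\nabla^2 A_z = -\mu_0 \left( \nabla \times \mathbf{M} \right)_z
  \quad \text{(permanent magnets)}

% Flux density components obtained from \mathbf{B} = \nabla \times \mathbf{A}
B_r = \frac{1}{r} \frac{\partial A_z}{\partial \theta}, \qquad
B_\theta = -\frac{\partial A_z}{\partial r}
```

Back-emf and torque expressions are then derived from these field components, as the abstract states.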

Design and Implementation of a DSP Chip for Portable Multimedia Applications (휴대 멀티미디어 응용을 위한 DSP 칩 설계 및 구현)

  • 윤성현;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.12 / pp.31-39 / 1998
  • This paper presents the design and implementation of a new multimedia fixed-point DSP (MDSP) core for portable multimedia applications. The MDSP instruction set was designed through an analysis of multimedia algorithms and DSP instruction sets. The MDSP architecture employs parallel processing techniques, such as SIMD and vector processing, as well as DSP techniques. The instruction set can handle various data formats, and the MDSP can perform two MAC operations in parallel. The switching network and packing network can increase performance by overlapping data rearrangement cycles with computation cycles. We designed Verilog HDL models and used the 0.6 $\mu\textrm{m}$ Samsung KG75000 SOG library. The total gate count is 68,831 and the clock frequency is 30 MHz.
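The idea of performing two MAC operations in parallel can be sketched in Python as a dot product that consumes two sample pairs per step with two independent accumulators. The actual MDSP instruction semantics, word widths, and saturation behavior are not given in the abstract, so the function `dual_mac_dot` and its integer inputs are purely illustrative assumptions.

```python
def dual_mac_dot(a, b):
    """Dot product using two accumulators, emulating two MACs issued per step."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    acc0 = 0
    acc1 = 0
    # Each iteration stands in for one cycle in which both MAC units are busy.
    for i in range(0, len(a) - 1, 2):
        acc0 += a[i] * b[i]          # first MAC unit
        acc1 += a[i + 1] * b[i + 1]  # second MAC unit
    if len(a) % 2:
        # An odd-length input leaves one sample pair for a final single MAC.
        acc0 += a[-1] * b[-1]
    return acc0 + acc1


# Hypothetical fixed-point samples represented as plain integers.
samples = [1, 2, 3, 4, 5]
coeffs = [6, 7, 8, 9, 10]
result = dual_mac_dot(samples, coeffs)
print(result)
```

With two accumulators the loop runs about half as many steps as a single-MAC loop, which is the source of the speed-up in a dual-MAC datapath.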

Fast Evaluation of Sound Radiation by Vibrating Structures with ACTRAN/AR

  • Migeot, Jean-Louis;Lielens, Gregory;Coyette, Jean-Pierre
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2008.11a / pp.561-562 / 2008
  • The numerical analysis of sound radiation by vibrating structures is a well-known and mature technology used in many industries. Accurate methods based on the boundary or finite element method have been developed successfully over the last two decades and are now available in standard CAE tools. These methods are, however, known to require significant computational resources, which furthermore increase very quickly with the frequency of interest. The low speed of most current methods is a main obstacle to the systematic use of acoustic CAE in industrial design processes. In this paper we present a set of innovative techniques that significantly speed up the calculation of acoustic radiation indicators (acoustic pressure, velocity, intensity and power; contribution vectors). The modeling is based on the well-known combination of finite elements and infinite elements, and combines the following ingredients to obtain very high performance: (1) a multi-frontal massively parallel sparse direct solver; (2) a multi-frequency solver based on the Krylov method; (3) the use of pellicular acoustic modes as a vector basis for representing acoustic excitations; (4) the numerical evaluation of Green functions related to the specific geometry of the problem under investigation. All these ingredients are embedded in the ACTRAN/AR CAE tool, which provides unprecedented performance for acoustic radiation analysis. The method will be demonstrated on several applications taken from various industries.

Parallel Computation on the Three-dimensional Electromagnetic Field by the Graph Partitioning and Multi-frontal Method (그래프 분할 및 다중 프론탈 기법에 의거한 3차원 전자기장의 병렬 해석)

  • Kang, Seung-Hoon;Song, Dong-Hyeon;Choi, JaeWon;Shin, SangJoon
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.50 no.12 / pp.889-898 / 2022
  • In this paper, a parallel computing method for the three-dimensional electromagnetic field is proposed. The electromagnetic scattering analysis is based on the time-harmonic vector wave equation and the finite element method. Edge-based elements and a second-order absorbing boundary condition are used. The elemental numerical integration and the matrix assembly are parallelized by allocating a partitioned finite element subdomain to each processor. The graph partitioning library METIS is employed to generate the subdomains. The large sparse matrix computation is carried out by MUMPS, a parallel computing library based on the multi-frontal method. The accuracy of the program is validated by comparison against the Mie-series analytical solution and results from ANSYS HFSS. In addition, the scalability is verified by measuring the speed-up in terms of the number of processors used. The electromagnetic scattering analysis is performed for a perfect electric conductor sphere, isotropic/anisotropic dielectric spheres, and a missile configuration. The algorithm of the present program will be applied to the finite element and tearing method, aiming for further extended parallel computing performance.
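Measuring speed-up in terms of the number of processors, as the abstract describes, usually amounts to dividing the single-processor time by the parallel time. A minimal sketch follows; the function name `speedup_table` and the timing values are made-up placeholders, not results from the paper.

```python
def speedup_table(timings):
    """Return {processors: (speedup, efficiency)} from elapsed times in seconds.

    `timings` maps a processor count to a wall-clock time and must include
    the single-processor baseline under key 1.
    """
    baseline = timings[1]
    return {
        procs: (baseline / elapsed, baseline / (elapsed * procs))
        for procs, elapsed in sorted(timings.items())
    }


# Placeholder timings, invented for illustration only.
timings = {1: 100.0, 2: 50.0, 4: 25.0, 8: 20.0}
table = speedup_table(timings)
for procs, (speedup, efficiency) in table.items():
    print(procs, speedup, efficiency)
```

A parallel efficiency that falls below 1 as processors are added, as in the last placeholder row, is the usual sign that communication or serial portions are starting to dominate.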

Pose and Expression Invariant Alignment based Multi-View 3D Face Recognition

  • Ratyal, Naeem;Taj, Imtiaz;Bajwa, Usama;Sajid, Muhammad
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.10 / pp.4903-4929 / 2018
  • In this study, a fully automatic pose and expression invariant 3D face alignment algorithm is proposed to handle frontal and profile face images, based on a two-pass coarse-to-fine alignment strategy. The first pass of the algorithm coarsely aligns the face images to an intrinsic coordinate system (ICS) through a single 3D rotation, and the second pass aligns them at a fine level using a minimum nose tip-scanner distance (MNSD) approach. For facial recognition, multi-view faces are synthesized to exploit real 3D information and to test the efficacy of the proposed system. Because of its optimal separating hyperplane (OSH), a Support Vector Machine (SVM) is employed in the multi-view face verification (FV) task. In addition, a multi-stage unified classifier based face identification (FI) algorithm is employed, which combines results from seven base classifiers, two parallel face recognition algorithms and an exponential rank combiner, all in a hierarchical manner. The performance figures of the proposed methodology are corroborated by extensive experiments performed on four benchmark datasets: GavabDB, Bosphorus, UMB-DB and FRGC v2.0. Results show marked improvement in alignment accuracy and recognition rates. Moreover, a computational complexity analysis carried out for the proposed algorithm reveals its superiority in terms of computational efficiency as well.