• Title/Summary/Keyword: in-memory computing

A Study on GPU Computing of Bi-conjugate Gradient Method for Finite Element Analysis of the Incompressible Navier-Stokes Equations (유한요소 비압축성 유동장 해석을 위한 이중공액구배법의 GPU 기반 연산에 대한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Jung, Hye Dong;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B / v.40 no.9 / pp.597-604 / 2016
  • A parallel bi-conjugate gradient algorithm was developed in CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using a splitting P2P1 finite element method. An asymmetric stenotic flow problem was solved to validate the proposed algorithm, and the parallel performance of the GPU was then examined by measuring elapsed times. The GPU performance of sparse matrix-vector multiplication was also investigated using a matrix from a fluid-structure interaction problem. A kernel was written to compute the inner products of all rows of the sparse matrix with a vector simultaneously, and it was optimized using both parallel reduction and memory coalescing. The effect of the warp on the parallel performance of the CUDA kernel was also examined. In double precision, the GPU computation was more than 7 times faster than a single CPU.
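The row-wise inner products described in this abstract can be illustrated with a minimal sequential sketch. It assumes the matrix is stored in CSR (compressed sparse row) format, which the abstract does not state, and it emulates the pairwise pattern of a parallel reduction in plain Python. The names `tree_reduce` and `csr_spmv` are hypothetical; this is not the authors' CUDA kernel.

```python
def tree_reduce(values):
    """Sum values pairwise, the pattern a parallel reduction follows."""
    vals = list(values)
    if not vals:
        return 0.0
    while len(vals) > 1:
        if len(vals) % 2:              # pad odd-length levels with a neutral element
            vals.append(0.0)
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]


def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a CSR matrix (assumed storage format).

    Each row is an independent inner product, so on a GPU the outer loop
    could be distributed over threads or warps.
    """
    y = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        products = [data[k] * x[indices[k]] for k in range(start, end)]
        y.append(tree_reduce(products))
    return y


# Example matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(indptr, indices, data, x)
```

Storing the nonzeros of a row contiguously, as CSR does, is what makes coalesced reads possible when neighboring threads process neighboring entries.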

Korean Abbreviation Generation using Sequence to Sequence Learning (Sequence-to-sequence 학습을 이용한 한국어 약어 생성)

  • Choi, Su Jeong;Park, Seong-Bae;Kim, Kweon-Yang
    • KIISE Transactions on Computing Practices / v.23 no.3 / pp.183-187 / 2017
  • Smartphone users prefer fast reading and texting, so they frequently abbreviate words and phrases. Abbreviations are now widely used, from chat terms to technical terms. Collecting abbreviations would therefore help many services, including information retrieval and recommendation systems. However, collecting them manually takes too much effort and cost, because new abbreviations are coined whenever new material such as a TV program or a phenomenon appears. Abbreviations therefore need to be generated automatically. Existing methods for generating Korean abbreviations are rule-based. The rule-based approach is limited in that it cannot generate irregular abbreviations. Another problem is choosing the correct abbreviation among the candidates generated by the rules. To address these limitations, we propose a method that generates Korean abbreviations automatically using sequence-to-sequence learning. Sequence-to-sequence learning can generate irregular abbreviations and avoids the problem of choosing the correct abbreviation among candidates, so it is well suited to generating Korean abbreviations. We evaluate the proposed method on two types of datasets. The experimental results show that our method is effective for irregular abbreviations.

Performance Evaluation of the GPU Architecture Executing Parallel Applications (병렬 응용프로그램 실행 시 GPU 구조에 따른 성능 분석)

  • Choi, Hong-Jun;Kim, Cheol-Hong
    • The Journal of the Korea Contents Association / v.12 no.5 / pp.10-21 / 2012
  • The role of the GPU has evolved from graphics-specific processing to general-purpose processing with the development of the unified shader core architecture. In particular, methods for executing general-purpose parallel applications on GPUs have been researched intensively, since the parallel hardware can be utilized efficiently when such applications are executed. However, current GPU architectures have limitations in executing general-purpose parallel applications, since the GPU is not yet specialized for general-purpose computing. To improve GPU performance for general-purpose parallel applications, the GPU architecture must evolve. In this work, we analyze GPU performance while varying the number of cores and the clock frequency. Our simulation results show that GPU performance improves by up to 125.8% as the number of cores increases and by up to 16.2% as the clock frequency increases. However, the improvement saturates even as the number of cores and the clock frequency continue to increase, because limited memory bandwidth prevents data from being supplied to the GPU fast enough. Consequently, to achieve high performance on a GPU, computational resources must be considered more carefully.
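The saturation described in this abstract can be illustrated with a simple roofline-style estimate. This is a swapped-in textbook model and not the simulator used in the paper; the function name, parameters, and all numbers below are hypothetical.

```python
def attainable_gflops(cores, clock_ghz, flops_per_cycle, bandwidth_gbs, intensity):
    """Roofline-style bound on throughput in GFLOP/s.

    intensity is the arithmetic intensity in FLOPs per byte moved.
    Performance is capped by whichever roof is lower: compute or memory.
    """
    compute_roof = cores * clock_ghz * flops_per_cycle
    memory_roof = bandwidth_gbs * intensity
    return min(compute_roof, memory_roof)


# Hypothetical machine: 100 GB/s of bandwidth, kernel with 2 FLOPs per byte.
results = {
    cores: attainable_gflops(cores, 1.0, 2, 100.0, 2.0)
    for cores in (64, 128, 256)
}
```

In this toy setting, doubling the cores from 64 to 128 helps, but going from 128 to 256 changes nothing because the memory roof of 200 GFLOP/s has already been reached.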

Frequently Occurred Information Extraction from a Collection of Labeled Trees (라벨 트리 데이터의 빈번하게 발생하는 정보 추출)

  • Paik, Ju-Ryon;Nam, Jung-Hyun;Ahn, Sung-Joon;Kim, Ung-Mo
    • Journal of Internet Computing and Services / v.10 no.5 / pp.65-78 / 2009
  • The most commonly adopted approach to finding valuable information in tree data is to extract frequently occurring subtree patterns. Because mining frequent tree patterns has a wide range of applications, such as XML mining, web usage mining, bioinformatics, and network multicast routing, many algorithms have recently been proposed to find such patterns. However, existing tree mining algorithms suffer from several serious pitfalls when finding frequent tree patterns in massive tree datasets. Some of the major problems are due to (1) modeling data as a hierarchical tree structure, (2) the high computational cost of candidate maintenance, (3) repeated scans of the input dataset, and (4) high memory dependency. These problems stem from the fact that most of these algorithms are based on the well-known Apriori algorithm and use the anti-monotone property for candidate generation and frequency counting. To solve these problems, we adopt a pattern-growth approach rather than the Apriori approach, and we extract maximal frequent subtree patterns instead of all frequent subtree patterns. The proposed method not only removes the step of pruning infrequent subtrees but also eliminates candidate subtree generation entirely. Hence, it significantly improves the whole mining process.

Adaptation Techniques of an Object-based MPEG-4 Player to PDA (객체 기반 MPEG-4 재생 기술의 PDA 적응 기법)

  • Kim, Nam-Young;Kim, Sang-Wook
    • Journal of KIISE:Software and Applications / v.33 no.2 / pp.220-230 / 2006
  • As computing techniques and mobile devices have developed, the demand for multimedia content in mobile environments has increased. The multimedia content provided on PDAs has so far been limited to materials such as video and audio. MPEG-4 is the international standard for storing and communicating multimedia information such as video, audio, images, text, and two-dimensional objects, and it can present a variety of multimedia content by using adaptation techniques. However, since most MPEG-4 content is authored for the desktop rather than the PDA, it cannot readily be played on a PDA, which requires low power consumption and has limited memory capacity and a constrained GUI. In this paper, we propose adaptation techniques that can present MPEG-4 content on a PDA using scene composition with MPEG-4. The proposed scheme consists of three subparts: physical adaptation, variation adaptation, and resource adaptation. Physical adaptation adjusts for the physical differences between the authoring environment and the playback environment. The event adaptation part transforms events used on the desktop into events used for playback on the PDA. Resource adaptation improves playback efficiency by using the essential information table in the BIFS parser. Applying the proposed scheme to an MPEG-4 player, we find that MPEG-4 content is played efficiently on a PDA.

Reverse Baby-step $2^k$-ary Adult-step Method for 𝜙(n) Decryption of Asymmetric-key RSA (비대칭키 RSA의 𝜙(n) 해독을 위한 역 아기걸음- 2k-ary 성인걸음법)

  • Lee, Sang-Un
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.25-31 / 2014
  • When the public key $e$ and the composite number $n=pq$ are disclosed but the private key $d$ is not, a message encrypted with asymmetric-key RSA is decrypted by obtaining $\phi(n)=(p-1)(q-1)=n+1-(p+q)$ and then computing $d=e^{-1} \pmod{\phi(n)}$. The most commonly used decryption approach is integer factorization, using either $n/p=q$ or $a^2 \equiv b^2 \pmod{n}$ with $a=(p+q)/2$ and $b=(q-p)/2$. However, many of the RSA numbers remain unfactored. This paper therefore applies the baby-step giant-step discrete logarithm method and $2^k$-ary modular exponentiation to obtain $\phi(n)$ directly. The proposed algorithm performs a reverse baby-step and a $2^k$-ary adult-step. As a result, it reduces the execution time of the basic adult-step to $1/2^k$ of its original value and the memory from $m=\lceil\sqrt{n}\rceil$ to $l$, where $a^l > n$, so that $\phi(n)$ is obtained within $l$ executions.
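The two building blocks named in this abstract, the relation between $\phi(n)$ and $d$ and $2^k$-ary modular exponentiation, can be sketched as follows. This is a toy illustration with small textbook primes, not the paper's reverse baby-step algorithm; the function names are hypothetical, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
def private_exponent(n, e, p, q):
    """Return (phi, d) using phi(n) = n + 1 - (p + q) and d = e^-1 mod phi(n)."""
    phi = n + 1 - (p + q)
    assert phi == (p - 1) * (q - 1)    # both forms of phi(n) agree
    return phi, pow(e, -1, phi)        # modular inverse (Python 3.8+)


def pow_2k_ary(base, exponent, modulus, k=2):
    """Left-to-right 2^k-ary modular exponentiation.

    The exponent is processed k bits at a time using a table of
    base^0 .. base^(2^k - 1), which reduces the number of multiplications
    compared with bit-by-bit square-and-multiply.
    """
    table = [1 % modulus] * (1 << k)
    for i in range(1, 1 << k):
        table[i] = (table[i - 1] * base) % modulus
    digits = []                         # base-2^k digits, least significant first
    while exponent > 0:
        digits.append(exponent & ((1 << k) - 1))
        exponent >>= k
    result = 1 % modulus
    for digit in reversed(digits):
        for _ in range(k):              # raise the running result to the power 2^k
            result = (result * result) % modulus
        result = (result * table[digit]) % modulus
    return result


# Toy parameters (far too small for real use).
p, q, e = 61, 53, 17
n = p * q
phi, d = private_exponent(n, e, p, q)
message = 65
ciphertext = pow_2k_ary(message, e, n)
recovered = pow_2k_ary(ciphertext, d, n, k=3)
```

Once $\phi(n)$ is known by any means, $d$ follows from a single modular inversion, which is why the paper targets $\phi(n)$ directly.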

Development of a Remote Multi-Task Debugger for Qplus-T RTOS (Qplus-T RTOS를 위한 원격 멀티 태스크 디버거의 개발)

  • 이광용;김흥남
    • Journal of KIISE:Computing Practices and Letters / v.9 no.4 / pp.393-409 / 2003
  • In this paper, we present a multi-task debugging environment for Qplus-T embedded systems such as internet information appliances. We propose the structure and functions of a remote multi-task debugging environment that supports effective cross-development, and we enhance the communication architecture between the host and target systems to provide a more efficient cross-development environment. The remote development toolset, called Q+Esto, consists of several independent support tools: an interactive shell, a remote debugger, a resource monitor, a target manager, and a debug agent. Except for the debug agent, all of these support tools reside on the host system. Using the remote multi-task debugger on the host, the developer can spawn and debug tasks on the target run-time system. The debugger can also be attached to already-running tasks spawned from the application or from the interactive shell. Application code can be viewed as C/C++ source or as assembly-level code. The debugger incorporates a variety of display windows for source, registers, local and global variables, stack frames, memory, event traces, and so on. The target manager implements common functions shared by the Q+Esto tools, such as host-target communication, object file loading, and management of the target-resident host tools' memory pool and the target system's symbol table. These functions are called Open C APIs, and they greatly improve the extensibility of the Q+Esto toolset. The Q+Esto target manager is responsible for communication between the host and the target system. Its counterpart on the target system, which communicates with the host target manager, is called the debug agent. The debug agent is a daemon task on the real-time operating system of the target. It receives debugging requests from the host tools, including the debugger, via the target manager, interprets the requests, executes them, and sends the results back to the host.

Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports (다중 최소 임계치 기반 빈발 패턴 마이닝의 성능분석)

  • Ryang, Heungmo;Yun, Unil
    • Journal of Internet Computing and Services / v.14 no.6 / pp.1-8 / 2013
  • Data mining techniques are used to find important and meaningful information in huge databases, and pattern mining, which discovers useful patterns in such databases, is one of the most significant of them. Frequent pattern mining, one kind of pattern mining, extracts patterns that have higher frequencies than a minimum support threshold; these are called frequent patterns. Traditional frequent pattern mining uses a single minimum support threshold for the whole database. This single support model implicitly assumes that all items in the database have the same nature. In real-world applications, however, each item can have its own characteristics, so a pattern mining technique that reflects those characteristics is required. In frequent pattern mining that ignores the nature of items, the single minimum support threshold must be set very low to mine patterns containing rare items, but this yields too many patterns that include meaningless items. Conversely, no patterns containing rare items can be mined if the threshold is too high. This dilemma is called the rare item problem. To solve it, early research proposed approximate approaches that split the data into several groups according to item frequencies or that group related rare items. However, because they are approximate, these methods cannot find all frequent patterns, including rare frequent patterns. Hence, a pattern mining model with multiple minimum supports was proposed to solve the rare item problem. In this model, each item has its own minimum support threshold, called the MIS (Minimum Item Support), which is calculated from item frequencies in the database. By applying the MIS, the multiple minimum supports model finds all rare frequent patterns without generating meaningless patterns or losing significant ones. Candidate patterns are extracted during frequent pattern mining, and in the single minimum support model the frequencies of the candidates are compared with the single minimum support only. Therefore, the characteristics of the items that make up a candidate pattern are not reflected, and the rare item problem occurs. To address this issue, the multiple minimum supports model uses the minimum MIS value among the items in a candidate pattern as the minimum support threshold for that pattern. To mine frequent patterns, including rare frequent patterns, efficiently with this concept, tree-based algorithms for the multiple minimum supports model sort the items in a tree in descending order of MIS, whereas those for the single minimum support model sort items in descending order of frequency. In this paper, we study the characteristics of frequent pattern mining based on multiple minimum supports and evaluate its performance against a general frequent pattern mining algorithm in terms of runtime, memory usage, and scalability. Experimental results show that the algorithm based on multiple minimum supports outperforms the one based on a single minimum support but requires more memory for the MIS information. Moreover, both algorithms show good scalability.
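The thresholding rule in this abstract (a candidate pattern is tested against the minimum MIS of its items) can be sketched with a brute-force miner. The MIS assignment `max(beta * support, least_support)` is an assumption borrowed from a common formulation in the multiple-minimum-supports literature; the abstract says only that MIS is derived from item frequencies. The function names, parameters, and toy transactions are hypothetical, and the exhaustive enumeration stands in for the tree-based algorithms the paper evaluates.

```python
from itertools import combinations


def item_supports(transactions):
    """Relative frequency of each item across all transactions."""
    counts = {}
    for transaction in transactions:
        for item in set(transaction):
            counts[item] = counts.get(item, 0) + 1
    total = len(transactions)
    return {item: count / total for item, count in counts.items()}


def assign_mis(supports, beta=0.5, least_support=0.15):
    """Assumed MIS rule: scale each item's support, with a lower bound."""
    return {item: max(beta * s, least_support) for item, s in supports.items()}


def mine_with_mis(transactions, mis, max_len=3):
    """Brute-force mining: a pattern is frequent if its support reaches
    the smallest MIS among its items."""
    sets = [set(t) for t in transactions]
    total = len(sets)
    frequent = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(sorted(mis), size):
            support = sum(1 for s in sets if s.issuperset(candidate)) / total
            if support >= min(mis[item] for item in candidate):
                frequent[candidate] = support
    return frequent


transactions = [
    ["bread", "milk"], ["bread", "milk"], ["bread", "milk", "caviar"],
    ["bread"], ["milk"], ["bread", "milk"], ["bread", "caviar"],
    ["milk"], ["bread", "milk"], ["bread"],
]
mis = assign_mis(item_supports(transactions))
frequent = mine_with_mis(transactions, mis)
```

Here the rare item `caviar` gets a low MIS, so the pattern `("bread", "caviar")` with support 0.2 is kept, although it would be discarded under a single threshold equal to the MIS of `milk` (0.35).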

Enhancing Predictive Accuracy of Collaborative Filtering Algorithms using the Network Analysis of Trust Relationship among Users (사용자 간 신뢰관계 네트워크 분석을 활용한 협업 필터링 알고리즘의 예측 정확도 개선)

  • Choi, Seulbi;Kwahk, Kee-Young;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.113-127 / 2016
  • Among recommendation techniques, collaborative filtering (CF) is commonly recognized as the most effective for implementing recommender systems. CF has been widely studied and adopted in both academic and real-world applications. The basic idea of CF is to create recommendations by finding correlations between the users of a recommendation system. A CF system compares users based on how similar they are and recommends products to a user by using the evaluations that other like-minded people gave each product. Computing the similarity of evaluations among users is therefore very important in CF, because recommendation quality depends on it. Typical CF uses users' explicit numeric ratings of items (i.e., quantitative information) when computing the similarities among users. In other words, numeric ratings have been the sole source of user preference information in traditional CF. However, user ratings do not always fully reflect users' actual preferences. According to several studies, users may more readily accept recommendations from people they consider reliable when purchasing goods. Trust relationships can thus be regarded as an informative source for identifying user preferences accurately. Against this background, we propose a new hybrid recommender system that fuses CF and social network analysis (SNA). The proposed system adopts a recommendation algorithm that additionally reflects the results of SNA. In detail, the system is based on conventional memory-based CF, but it is designed to use both users' numeric ratings and trust relationship information between users when calculating user similarities. For this purpose, the system creates and uses not only a user-item rating matrix but also a user-to-user trust network. We propose two alternatives for calculating the similarity between users. The first calculates the degree of similarity between users by utilizing in-degree and out-degree centrality, which are indices representing central location in the social network. We named these approaches 'Trust CF - All' and 'Trust CF - Conditional'. The second gives a neighbor's score a higher weight when the target user trusts the neighbor directly or indirectly. A direct or indirect trust relationship can be identified by searching the trust network of users. In this study, we call this approach 'Trust CF - Search'. To validate the applicability of the proposed system, we used experimental data provided by LibRec, which was crawled from the entire FilmTrust website. It consists of movie ratings and a trust relationship network indicating who trusts whom among users. The experimental system was implemented using Microsoft Visual Basic for Applications (VBA) and UCINET 6. To examine the effectiveness of the proposed system, we compared its performance with that of a conventional CF system. Recommender performance was evaluated using the average MAE (mean absolute error). The analysis confirmed that when the in-degree centrality index of the trust network was applied without conditions (i.e., Trust CF - All), the accuracy (MAE = 0.565134) was lower than that of conventional CF (MAE = 0.564966). When the in-degree centrality index was applied only to users whose out-degree centrality exceeded a certain threshold (i.e., Trust CF - Conditional), the proposed system improved the accuracy slightly (MAE = 0.564909) compared with traditional CF. The algorithm that searches the trust network of users (i.e., Trust CF - Search) showed the best performance (MAE = 0.564846). A paired-samples t-test showed that Trust CF - Search outperformed conventional CF at the 10% significance level. Our study sheds light on the application of users' trust relationship network information for facilitating electronic commerce by recommending proper items to users.
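The idea behind 'Trust CF - Search' and the MAE metric can be sketched as follows. The abstract does not give the weighting formula, so the multiplicative `boost`, the hop limit `max_hops`, the plain weighted-average predictor, and all function names are hypothetical choices made only for illustration; the authors' implementation used VBA and UCINET 6.

```python
from collections import deque


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def trusts(trust_net, source, target, max_hops=2):
    """Breadth-first search: does source reach target within max_hops trust edges?"""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in trust_net.get(user, ()):
            if neighbor == target:
                return True
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return False


def predict(ratings, sims, trust_net, user, item, boost=1.5):
    """Similarity-weighted average of neighbor ratings, with the weight of
    directly or indirectly trusted neighbors multiplied by boost (assumed rule)."""
    numerator = denominator = 0.0
    for other, sim in sims[user].items():
        if sim <= 0 or item not in ratings[other]:
            continue
        weight = sim * (boost if trusts(trust_net, user, other) else 1.0)
        numerator += weight * ratings[other][item]
        denominator += weight
    return numerator / denominator if denominator else None


ratings = {"a": {}, "b": {"m": 4.0}, "c": {"m": 2.0}}
sims = {"a": {"b": 0.5, "c": 0.5}}
plain = predict(ratings, sims, {}, "a", "m")
trusted = predict(ratings, sims, {"a": ["b"]}, "a", "m")
error = mean_absolute_error([trusted], [4.0])
```

With equal similarities the plain prediction is the midpoint of the two neighbor ratings, and trusting neighbor `b` pulls the prediction toward `b`'s rating.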

Crosshole EM 2.5D Modeling by the Extended Born Approximation (확장된 Born 근사에 의한 시추공간 전자탐사 2.5차원 모델링)

  • Cho, In-Ky;Suh, Jung-Hee
    • Geophysics and Geophysical Exploration / v.1 no.2 / pp.127-135 / 1998
  • The Born approximation is widely used for solving complex scattering problems in electromagnetics. Approximating the total internal electric field by the background field is reasonable for small material contrasts, as long as the scatterer is not too large and the frequency is not too high. However, in many geophysical applications, moderate and high conductivity contrasts cause both the real and imaginary parts of the internal electric field to differ greatly from the background field. In the extended Born approximation, which can improve the accuracy of the Born approximation dramatically, the total electric field in the integral over the scattering volume is approximated by the background electric field projected onto a depolarization tensor. Finite difference and finite element methods are usually used in EM scattering problems with a 2D model and a 3D source because of their capability for simulating complex subsurface conductivity distributions. The price paid for a 3D source is that many wavenumber-domain solutions and their inverse Fourier transform must be computed. In these differential equation methods, the whole area, including homogeneous regions, must be discretized, which increases the number of nodes and the matrix size. Therefore, the differential equation methods need a lot of computing time and a large amount of memory. In this study, an EM modeling program for a 2D model and a 3D source is developed based on the extended Born approximation. The solution is very fast and stable. Using the program, crosshole EM responses with a vertical magnetic dipole source are obtained, and the results are compared with those of 3D integral equation solutions. The agreement between the integral equation solution and the extended Born approximation is remarkable over the entire frequency range, but it degrades as the conductivity contrast between the anomalous body and the background medium increases. The extended Born approximation is accurate when the conductivity contrast is lower than 1:10. Therefore, the location and conductivity of the anomalous body can be estimated effectively by the extended Born approximation, although a quantitative estimate of conductivity is difficult when the conductivity contrast is too high.
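The two approximations contrasted in this abstract can be written out in their commonly cited general 3D form. The notation is an assumption and is not taken from the paper: $\mathbf{E}_b$ is the background field, $\Delta\sigma$ the anomalous conductivity, $V$ the scattering volume, and $\mathbf{G}$ the electric Green's tensor with constant factors absorbed into it. The paper itself works with a 2D model and a 3D source in the wavenumber domain.

$$\mathbf{E}(\mathbf{r}) = \mathbf{E}_b(\mathbf{r}) + \int_V \mathbf{G}(\mathbf{r},\mathbf{r}')\,\Delta\sigma(\mathbf{r}')\,\mathbf{E}(\mathbf{r}')\,dv'$$

The Born approximation replaces the unknown field inside the integral by the background field:

$$\mathbf{E}(\mathbf{r}') \approx \mathbf{E}_b(\mathbf{r}'), \quad \mathbf{r}' \in V$$

The extended Born approximation instead projects the background field through a depolarization tensor $\boldsymbol{\Gamma}$:

$$\mathbf{E}(\mathbf{r}') \approx \boldsymbol{\Gamma}(\mathbf{r}')\,\mathbf{E}_b(\mathbf{r}'), \qquad \boldsymbol{\Gamma}(\mathbf{r}') = \left[\mathbf{I} - \int_V \mathbf{G}(\mathbf{r}',\mathbf{r}'')\,\Delta\sigma(\mathbf{r}'')\,dv''\right]^{-1}$$

Because $\boldsymbol{\Gamma}$ depends only on the model and not on the source, no large system of equations has to be solved for the total field, which is consistent with the speed reported in the abstract.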