• Title/Summary/Keyword: Parallel Search

Optimal Design Technique of Nielsen Arch Bridges by Using Genetic Algorithm (유전자 알고리즘을 이용한 닐센아치교의 최적설계기법)

  • Lee, Kwang Su;Chung, Young Soo
    • Journal of Korean Society of Steel Construction / v.21 no.4 / pp.361-373 / 2009
  • This paper proposes an optimal-design technique for Nielsen arch bridges based on a genetic algorithm. The design parameters were the arch-rise ratio and the steel weight ratio of the Nielsen arch bridge, and optimal-design techniques were used to analyze the behavior of the bridge. The optimal parameter values were determined for the estimated optimal level. Determining the parameters requires standardizing the concepts of safety, utility, and economy as the critical factors of a structure. For this purpose, a genetic algorithm was used, whose ability to search for a global optimum is superior to that of other optimization techniques, with the total weight of the structure as the objective function. The constraints for the optimization were displacement, internal stress, and time and space. The structural analysis combined small-displacement theory with the genetic algorithm, and the runtime was reduced through parallel processing. The technique developed in this study was used to derive the optimal arch-rise ratio, steel weight ratio, and optimal-design domain, and is presented so that it can be applied in industry.

A Heuristic for Parallel Machine Scheduling Depending on Job Characteristics (작업의 특성에 종속되는 병렬기계의 일정계획을 위한 발견적 기법)

  • 이동현;이경근;김재균;박창권;장길상
    • Journal of the Korean Operations Research and Management Science Society / v.17 no.1 / pp.41-41 / 1992
  • In real-world situations, some jobs frequently need to be processed only on certain machines because of capacity restrictions such as tools, fixtures, or material handling equipment. In this paper we consider a non-preemptive scheduling problem with n jobs and m parallel machines divided into two machine groups. The objective is to minimize the sum of earliness and tardiness, given different release times and due dates. The problem is formulated as a mixed integer programming problem and is proved to be NP-complete, so a heuristic is developed to solve it. To illustrate its suitability and efficiency, the proposed heuristic is compared with a genetic algorithm and tabu search on a large number of randomly generated test problems from a ship engine assembly shop. The experimental results show that the proposed algorithm yields good solutions efficiently.
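The objective named in this abstract, the sum of earliness and tardiness, can be illustrated with a minimal Python sketch. It only evaluates a given schedule; it is not the heuristic from the paper. The `Job` fields, the function names, and the rule that each machine starts a job as soon as it is released and the machine is free (no inserted idle time) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    release: float     # earliest time the job may start
    processing: float  # processing time on its machine
    due: float         # due date


def machine_cost(sequence):
    """Earliness plus tardiness of jobs run in order on one machine.

    Assumption: non-preemptive, and each job starts as soon as both the
    machine is free and the job has been released.
    """
    t = 0.0
    cost = 0.0
    for job in sequence:
        t = max(t, job.release) + job.processing  # completion time
        cost += abs(t - job.due)  # earliness if early, tardiness if late
    return cost


def schedule_cost(schedule):
    """Total cost of a schedule given as {machine name: ordered job list}."""
    return sum(machine_cost(seq) for seq in schedule.values())
```

A schedule is passed as a dictionary from machine name to an ordered list of jobs, so a machine-eligibility restriction would be enforced by whatever builds that dictionary, not by the evaluator.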

Parallel Range Query Processing with R-tree on Multi-GPUs (다중 GPU를 이용한 R-tree의 병렬 범위 질의 처리 기법)

  • Ryu, Hongsu;Kim, Mincheol;Choi, Wonik
    • Journal of KIISE / v.42 no.4 / pp.522-529 / 2015
  • Ever since the R-tree was proposed to index multi-dimensional data, many efforts have been made to improve its query performance. One common approach is to parallelize query processing on multi-core architectures. To this end, a GPU-based R-tree has recently been proposed. However, even though a GPU-based R-tree can improve query performance, it is limited in its ability to handle large volumes of data because GPUs have limited physical memory. To address this problem, we propose MGR-tree (Multi-GPU R-tree), which can manage large volumes of data by distributing nodes across multiple GPUs. Our experiments show that MGR-tree is up to 9.1 times faster than a sequential search on a GPU and up to 1.6 times faster than a conventional GPU-based R-tree.

Performance evaluation of hybrid acquisition in CDMA systems (DS/CDMA 시스템에서 하이브리드 동기 획득의 성능 분석)

  • 강법주;강창언
    • The Journal of Korean Institute of Communications and Information Sciences / v.23 no.4 / pp.914-925 / 1998
  • This paper evaluates the hybrid acquisition performance for the pilot signal in the direct sequence code division multiple access (DS/CDMA) forward link. The hybrid acquisition scheme combines two schemes, parallel and serial acquisition. The mean acquisition time of the proposed scheme is derived for both the best case (the correct code-phase offsets are included in one subset) and the worst case (the correct code-phase offsets lie at the boundary of two subsets), which are caused by the distribution of the correct code-phase offsets in the subset. Expressions for the detection, false alarm, and miss probabilities are derived for the case of multiple correct code-phase offsets and a multipath Rayleigh fading channel. Numerical results present the hybrid acquisition performance with respect to design parameters such as the postdetection integration length in the search and verification modes, the subset size, and the number of I/Q noncoherent correlators, and compare hybrid acquisition with parallel acquisition in terms of the minimum acquisition time under the same hardware complexity.

Optimization of Warp-wide CUDA Implementation for Parallel Shifted Sort Algorithm (병렬 Shifted Sort 알고리즘의 Warp 단위 CUDA 구현 최적화)

  • Park, Taejung
    • Journal of Digital Contents Society / v.18 no.4 / pp.739-745 / 2017
  • This paper presents and discusses an implementation of the GPU shifted sorting method for finding approximate k nearest neighbors that executes within a "warp", the minimum execution unit in the GPU parallel architecture. It also compares the results with two other common nearest neighbor search methods, a GPU-based kd-tree and the ANN (Approximate Nearest Neighbor) library. The proposed implementation focuses on cases where k is small, i.e. 2, 4, 8, and 16, which can be handled efficiently within a warp, since applications very commonly use small values of k. The paper also discusses ways to optimize the implementation by improving memory management in a loop that uses the CUB open library and by adopting CUDA instructions supported by GPU hardware. The proposed implementation shows a more than 16-fold speed-up over other GPU-based methods in the tests, suggesting that the improvement would grow for larger input data.

Development and Performance Evaluation of Parallel Sequence Analysis System on PC-Cluster (PC-Cluster 기반 병렬형 유전자 서열 검색 시스템의 개발 및 성능 평가)

  • Shin Yong-Won;Park Jeong-Seon
    • Journal of Biomedical Engineering Research / v.25 no.6 / pp.617-621 / 2004
  • Recently, researchers in bioinformatics have needed to analyze thousands of genome sequences efficiently, following the introduction of new analysis methods and technologies such as the genome expression microchip. This rapid growth in bioengineering calls for computing resources that can analyze genome sequences quickly, but such resources have not been introduced because of the enormous investment they require. The core of this study is an integrated environment based on a PC-Cluster system with high-speed access of up to 155 Mbps, together with a continuous collection system for domestic and international bio-information. The results of the study are the establishment and stabilization of an information and communication infrastructure, the establishment and stabilization of a high-performance computer network of up to 155 Mbps, the development of a PC-Cluster system with 32 nodes, a parallel BLAST on the cluster system that provides scalable speedup in terms of response time, and the development of a collection and search system for bio-information.

Conversion of Large RDF Data using Hash-based ID Mapping Tables with MapReduce Jobs (맵리듀스 잡을 사용한 해시 ID 매핑 테이블 기반 대량 RDF 데이터 변환 방법)

  • Kim, InA;Lee, Kyu-Chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.236-239 / 2021
  • With the growth of AI technology, the scale of Knowledge Graphs continues to expand. Knowledge Graphs are mainly expressed as RDF representations that consist of connected triples. Many RDF stores compress RDF triples by transforming them into condensed IDs. However, transforming a large number of RDF triples incurs high processing time and memory overhead because it requires searching a large ID mapping table. In this paper, we propose a method of converting RDF triples using hash-based ID mapping tables with MapReduce, a software framework for parallel, distributed processing. The proposed method not only transforms RDF triples into integer-based IDs but also improves the conversion speed and memory overhead. In our experiment with the proposed method on LUBM, the size of the dataset was reduced by a factor of about 3.8 and the conversion took about 106 seconds.

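The idea of replacing RDF terms with integer IDs through a hash-based mapping table can be sketched in a few lines of Python. This is a single-process illustration only: it uses an in-memory `dict` (a hash table) and does not reproduce the MapReduce jobs or the table layout of the paper. The function name `encode_triples` and the ID assignment order are assumptions.

```python
def encode_triples(triples):
    """Map every RDF term to an integer ID via a hash table.

    Returns the encoded triples and the term-to-ID mapping table.
    Assumption: IDs are assigned in order of first appearance.
    """
    term_to_id = {}
    encoded = []
    for subject, predicate, obj in triples:
        ids = []
        for term in (subject, predicate, obj):
            if term not in term_to_id:          # average O(1) hash lookup
                term_to_id[term] = len(term_to_id)
            ids.append(term_to_id[term])
        encoded.append(tuple(ids))
    return encoded, term_to_id
```

Because each lookup is a hash probe, not a scan of the mapping table, the cost per term stays roughly constant as the table grows, which is the property the abstract relies on.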

VLSI Design for Folded Wavelet Transform Processor using Multiple Constant Multiplication (MCM과 폴딩 방식을 적용한 웨이블릿 변환 장치의 VLSI 설계)

  • Kim, Ji-Won;Son, Chang-Hoon;Kim, Song-Ju;Lee, Bae-Ho;Kim, Young-Min
    • Journal of Korea Multimedia Society / v.15 no.1 / pp.81-86 / 2012
  • This paper presents a VLSI design for a lifting-based discrete wavelet transform (DWT) 9/7 filter using a multiplierless multiple constant multiplication (MCM) architecture. The proposed design is based on the lifting scheme and uses pattern search for a folded architecture. Shift-add operations are adopted to optimize the multiplication process. The conventional serial operations of the lifting data flow can be optimized into parallel ones by employing parallelization and pipelining techniques. The optimized design has a simple hardware architecture and requires less computation without performance degradation. Furthermore, hardware utilization reaches 100%, and the number of registers required is significantly reduced. To compare our work with previous methods, we implemented the architecture in Verilog HDL. We also ran simulations based on logic synthesis using $0.18\,\mu\text{m}$ CMOS standard cells. The proposed architecture shows hardware reductions of up to 60.1% and 44.1%, respectively, at a 200 MHz clock compared with previous works. These implementation results indicate that the proposed design performs efficiently in terms of hardware cost, area, and power consumption.
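The shift-add operation mentioned in this abstract replaces a multiplication by a constant with shifts and additions. A minimal Python sketch follows, using plain binary decomposition of a non-negative integer constant. It does not model the MCM sharing of partial products or the pattern search described in the paper, and the function name is an assumption.

```python
def shift_add_multiply(x, constant):
    """Multiply integer x by a non-negative integer constant.

    Uses only shifts and additions: one addition per set bit of the
    constant, as a multiplierless datapath would.
    """
    if constant < 0:
        raise ValueError("constant must be non-negative in this sketch")
    result = 0
    shift = 0
    while constant:
        if constant & 1:           # this bit contributes x * 2**shift
            result += x << shift
        constant >>= 1
        shift += 1
    return result
```

For example, multiplying by 13 (binary 1101) becomes `(x << 3) + (x << 2) + x`. An MCM design goes further by reusing such shifted terms across several constants.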

Improving Scalability using Parallelism in RFID Privacy Protection (RFID 프라이버시 보호에서 병행성을 이용한 확장성 개선)

  • Shin Myeong-Sook;Lee Joon
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.8 / pp.1428-1434 / 2006
  • In this paper, we propose a scheme that addresses privacy infringement in RFID systems while improving the scalability of the back-end server. As RFID/USN has become an important subject, many approaches have been proposed and applied. However, the limitations of RFID, namely low computation power and storage, make privacy protection difficult. The Hash Chain scheme is known to guarantee forward security, confidentiality, and indistinguishability. Nevertheless, it requires a large amount of computation to identify tags in the back-end server. In this paper, we introduce an efficient key search method, the Hellman method, to reduce the computational complexity in the back-end server. The Hellman method algorithm proceeds through pre-computation and (re)search. After applying the Hellman method to the Hash Chain scheme, we compared preservation and key reference, then analyzed them and applied them in parallel. While guaranteeing the security requirements of the existing privacy protection scheme, comparing key references reduced the server's computational complexity from $O(m)$ to $O\left(\frac{m^{2/3}}{w}\right)$ compared with the existing scheme.
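To show why tag identification is costly for the back-end server in a hash-chain scheme, here is a minimal Python sketch of the exhaustive baseline search. It assumes the common construction in which a tag emits `G(s)` and then updates its secret to `H(s)`; the abstract does not spell out the construction, so the functions `H` and `G`, the use of SHA-256, and all names are illustrative assumptions. The Hellman method and the parallel search from the paper are not implemented here.

```python
import hashlib


def H(secret):
    """Chain-update hash (assumed): next secret = H(current secret)."""
    return hashlib.sha256(b"H" + secret).digest()


def G(secret):
    """Output hash (assumed): the value the tag sends to the reader."""
    return hashlib.sha256(b"G" + secret).digest()


def tag_response(secret):
    """Return (response sent by the tag, updated secret kept by the tag)."""
    return G(secret), H(secret)


def identify(response, initial_secrets, max_steps):
    """Exhaustive server-side search over every tag and chain position."""
    for tag_id, secret in initial_secrets.items():
        for step in range(max_steps):
            if G(secret) == response:
                return tag_id, step
            secret = H(secret)
    return None
```

The nested loop costs up to (number of tags) x (chain length) hash evaluations per query, which is the workload that a precomputed time-memory trade-off such as the Hellman method is meant to cut down.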

VLSI Implementation of Low-Power Motion Estimation Using Reduced Memory Accesses and Computations (메모리 호출과 연산횟수 감소기법을 이용한 저전력 움직임추정 VLSI 구현)

  • Moon, Ji-Kyung;Kim, Nam-Sub;Kim, Jin-Sang;Cho, Won-Kyung
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.5A / pp.503-509 / 2007
  • Low-power motion estimation is required for video coding in portable information devices. In this paper, we propose a low-power motion estimation algorithm and a 1-D systolic array VLSI architecture using the full search block matching algorithm (FSBMA). The main sources of power dissipation in FSBMA are complex computations and frequent memory accesses for data in the search area. In the proposed algorithm, memory accesses and computations are reduced by using a 1-D PE (processing element) array architecture that performs motion estimation of two neighboring blocks in parallel and by skipping unnecessary computations during motion estimation. The VLSI implementation results show that the proposed architecture can reduce power dissipation by 9.3% and can operate two times faster than an existing low-power motion estimator.
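Full search block matching, the baseline this abstract builds on, can be sketched in plain Python as follows. The sketch uses the sum of absolute differences (SAD) as the matching cost, which is a common choice but an assumption here, and it does not include the paper's computation skipping or its two-block parallel PE array. Frames are lists of rows of integer pixel values, and all function names are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement in the search window and return
    (best cost, dx, dy); candidates outside the frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + n <= height
                      and 0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every candidate re-reads an n x n block of the reference frame, which makes the memory traffic and arithmetic cost named in the abstract visible: (2r + 1)^2 candidates of n^2 pixel differences each for a search range of r.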