• Title/Summary/Keyword: Parallel Search

Two-Phase Distributed Evolutionary algorithm with Inherited Age Concept

  • Kang, Young-Hoon;Z. Zenn Bien
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 2001.10a
    • /
    • pp.101.4-101
    • /
    • 2001
  • Evolutionary algorithms have received remarkable attention owing to their model-free, population-based parallel search attributes, and many successful results have been reported. However, most evolutionary algorithms share some problems. The critical one is that finding the global optimum of a multimodal objective function takes a long time or many generations. Another is that they usually cannot find all the local optima because they concentrate on the search for the global optimum. In addition, if the objective function has several global optima, it may be very difficult to find all of them because of the global characteristics of the selection methods. To cope with these problems, we first propose a preprocessing step, the grid-filtering algorithm (GFA), and propose a new distributed evolutionary ...

K-Nearest Neighbor Associative Memory with Reconfigurable Word-Parallel Architecture

  • An, Fengwei;Mihara, Keisuke;Yamasaki, Shogo;Chen, Lei;Mattausch, Hans Jurgen
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.4
    • /
    • pp.405-414
    • /
    • 2016
  • IC implementations provide high performance for the computationally expensive task of pattern matching but have relatively low flexibility for satisfying different applications. In this paper, we report an associative memory architecture for k-nearest-neighbor (KNN) search, which is one of the most basic algorithms in pattern matching. The designed architecture features reconfigurable vector-component parallelism, enabled by programmable switching circuits between vector components, and a dedicated majority-vote circuit. In addition, the main time-consuming part of KNN is handled by weighted frequency dividers based on a clock-mapping concept, which drastically reduce the in-principle exponential increase of the worst-case search-clock number with the bit width of vector components to only a linear increase. A test chip in 180 nm CMOS technology, which has 32 rows with 8 parallel 8-bit vector components in each row, consumes 61.4 mW in total at peak and only 11.9 mW for nearest squared Euclidean distance search (at 45.58 MHz and 1.8 V).
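
The KNN search that the chip accelerates can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: it ranks reference vectors by squared Euclidean distance and takes a majority vote over the k nearest. The function names, the toy reference data, and the labels are assumptions made for illustration.

```python
from collections import Counter


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length integer vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(query, references, labels, k=3):
    """Return the majority label among the k reference vectors nearest to query."""
    # Rank reference rows by distance to the query (done sequentially here;
    # the chip described above searches all rows in parallel).
    ranked = sorted(
        range(len(references)),
        key=lambda i: squared_euclidean(query, references[i]),
    )
    # Majority vote over the labels of the k nearest rows.
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical reference memory: five 2-component vectors with class labels.
refs = [[0, 0], [1, 1], [9, 9], [10, 10], [2, 1]]
labs = ["a", "a", "b", "b", "a"]
print(knn_classify([1, 0], refs, labs, k=3))
```

With k=1 the same function reduces to the nearest-distance search for which the abstract reports the lower power figure.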

Problem space based search algorithm for manufacturing process with rework probabilities affecting product quality and tardiness (Rework 확률이 제품의 품질과 납기준수에 영향을 주는 공정을 위한 문제공간기반 탐색 알고리즘)

  • Kang, Yong-Ha;Lee, Young-Sup;Shin, Hyun-Joon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.10 no.7
    • /
    • pp.1702-1710
    • /
    • 2009
  • In this paper, we propose a problem space based search (PSBS) algorithm to solve the parallel machine scheduling problem with rework probabilities. For each pair of a machine and a job type, the rework probability of each job on a machine can be obtained from historical data. Neighborhoods are generated by perturbing four problem data vectors (processing times, due dates, setup times, and rework probabilities) and evaluated through the efficient dispatching heuristic (EDDR). The proposed algorithm is evaluated in terms of maximum lateness and the number of reworked jobs. We show that the PSBS algorithm considerably improves on the result obtained by EDDR.
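
The idea of searching in problem space can be sketched as follows: perturb the problem data, run a dispatching heuristic on the perturbed data, and evaluate the resulting sequence on the original data. This sketch makes several simplifying assumptions that are not from the paper: it perturbs only due dates, uses a plain earliest-due-date (EDD) rule in place of EDDR, ignores setup times and rework, and uses hypothetical function names and parameters.

```python
import heapq
import random


def max_lateness(sequence, jobs, n_machines):
    """List-schedule jobs in the given order on identical machines; return max lateness."""
    free_at = [0.0] * n_machines  # min-heap of machine release times
    worst = float("-inf")
    for j in sequence:
        proc, due = jobs[j]
        finish = heapq.heappop(free_at) + proc  # earliest available machine
        heapq.heappush(free_at, finish)
        worst = max(worst, finish - due)
    return worst


def edd_order(due_dates):
    """Job indices sorted by earliest due date."""
    return sorted(range(len(due_dates)), key=lambda j: due_dates[j])


def problem_space_search(jobs, n_machines, iters=200, spread=0.3, seed=0):
    """Perturb due dates, re-run EDD, and keep the best sequence found."""
    rng = random.Random(seed)
    dues = [d for _, d in jobs]
    best_seq = edd_order(dues)  # unperturbed EDD is the starting point
    best = max_lateness(best_seq, jobs, n_machines)
    for _ in range(iters):
        perturbed = [d * (1 + rng.uniform(-spread, spread)) for d in dues]
        seq = edd_order(perturbed)
        value = max_lateness(seq, jobs, n_machines)  # scored on original data
        if value < best:
            best, best_seq = value, seq
    return best, best_seq


# Hypothetical jobs as (processing_time, due_date) pairs.
jobs = [(3, 4), (2, 2), (4, 5), (1, 9)]
baseline = max_lateness(edd_order([d for _, d in jobs]), jobs, 2)
best, best_seq = problem_space_search(jobs, 2)
```

Because the unperturbed EDD sequence seeds the search, the returned value is never worse than the EDD baseline in this sketch.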

Graph Convolutional - Network Architecture Search : Network architecture search Using Graph Convolution Neural Networks (그래프 합성곱-신경망 구조 탐색 : 그래프 합성곱 신경망을 이용한 신경망 구조 탐색)

  • Su-Youn Choi;Jong-Youel Park
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.1
    • /
    • pp.649-654
    • /
    • 2023
  • This paper proposes the design of a neural network structure search model using graph convolutional neural networks. Because deep learning works as a black box, it is difficult to verify whether a designed model has a structure with optimized performance. A neural network structure search model is composed of a recurrent neural network that creates a model and a convolutional neural network that is the generated network. Conventional neural network structure search models use recurrent neural networks, but in this paper we propose GC-NAS, which uses graph convolutional neural networks instead of recurrent neural networks to create convolutional neural network models. The proposed GC-NAS uses the Layer Extraction Block to explore depth, and the Hyper Parameter Prediction Block to explore spatial and temporal information (hyper parameters) in parallel, based on the depth information. Because the depth information is reflected, the search area is wider, and because the parallel search is conducted with depth information, the purpose of the model's search area is clear; GC-NAS is therefore judged to be superior in theoretical structure to conventional neural network structure search models. Through its graph convolutional neural network block and graph generation algorithm, GC-NAS is expected to solve the problems of the high-dimensional time axis and the limited spatial search range of recurrent neural networks in existing neural network structure search models. In addition, we hope that the GC-NAS proposed in this paper will encourage active research on applying graph convolutional neural networks to neural network structure search.

Design of Omok AI using Genetic Algorithm and Game Trees and Their Parallel Processing on the GPU (유전 알고리즘과 게임 트리를 병합한 오목 인공지능 설계 및 GPU 기반 병렬 처리 기법)

  • Ahn, Il-Jun;Park, In-Kyu
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.37 no.2
    • /
    • pp.66-75
    • /
    • 2010
  • This paper proposes an efficient method for designing and implementing the artificial intelligence (AI) of the 'omok' game on the GPU. The proposed AI has a cooperative structure that combines a min-max game tree and a genetic algorithm. Since the evaluation function is computationally intensive but is applied independently to many candidates in the solution space, it is computed on the GPU in a massively parallel way. The implementation on NVIDIA CUDA and the experimental results show that the GPU significantly outperforms the CPU: the parallel game tree and the genetic algorithm run more than 400 times and 300 times faster, respectively, on the GPU than on the CPU. In the proposed cooperative AI, a selective search using the genetic algorithm is performed after the full search using the game tree, both to search the solution space more efficiently and to avoid thread overflow. Experimental results show that the proposed algorithm enhances the AI significantly and makes it run within the time limit given by the game's rules.
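
The min-max game tree mentioned in the abstract can be sketched on a toy tree. This is a generic sequential illustration, not the paper's CUDA implementation: the nested-list tree representation and the leaf scores are assumptions, and each leaf stands in for one independent call to the evaluation function.

```python
def minimax(node, maximizing=True):
    """Return the min-max value of a game tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node  # leaf: static evaluation score of a board position
    # Children are evaluated independently of one another, which is the
    # property that makes the evaluation step suitable for parallel execution.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical two-ply tree: three candidate moves, two replies each.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree))
```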

A Study on the Efficient m-step Parallel Generalization

  • Kim, Sun-Kyung
    • Proceedings of the Korea Society of Information Technology Applications Conference
    • /
    • 2005.11a
    • /
    • pp.13-16
    • /
    • 2005
  • Methods for specific problems should have low communication costs relative to their computation costs, and in specific applications algorithms need to be developed and mapped onto parallel computer architectures. Main memory access in shared-memory systems and global communication in message-passing systems degrade the computation speed. In this paper, we find that the m-step generalization of the block Lanczos method enhances parallel properties by forming m simultaneous blocks of search direction vectors. QR factorization, which slows computation on parallel computers, is not necessary in the m-step block Lanczos method. The m-step method minimizes the number of synchronization points, which in turn minimizes global communication compared to the standard methods.

Implementation of High-Speed Reed-Solomon Decoder Using the Modified Euclid's Algorithm (개선된 수정 유클리드 알고리듬을 이용한 고속의 Reed-Solomon 복호기의 설계)

  • 김동선;최종찬;정덕진
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.7
    • /
    • pp.909-915
    • /
    • 1999
  • In this paper, we propose an efficient VLSI architecture for a Reed-Solomon (RS) decoder. To improve the speed, we develop an architecture featuring parallel and pipelined processing. To implement this architecture, we analyze the RS decoding algorithm and Horner's algorithm for parallel processing, and we also modify Euclid's algorithm so that an efficient parallel structure can be applied in the RS decoder. To evaluate the proposed architecture, we compare the proposed RS decoder with Shao's time-domain decoder and obtain a 10% improvement in area efficiency and a threefold increase in speed. In addition, we implemented the proposed RS decoder on an Altera Flex10K-50 FPGA.
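
Horner's rule is commonly used in RS decoders to evaluate the received polynomial at a field element, one multiply-and-add per coefficient. The sketch below shows the sequential form only. The field GF(2^8), the reduction polynomial 0x11D, and the function names are assumptions for illustration; the abstract does not state the code parameters or how the computation is parallelized.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two elements of GF(2^8) using shift-and-XOR with reduction by poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2^m) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly  # reduce when the degree reaches 8
        b >>= 1
    return result


def horner_eval(coeffs, x):
    """Evaluate a polynomial over GF(2^8) at x; coeffs are highest degree first."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c  # one multiply-add step per coefficient
    return acc


# x + 2 has the root x = 2, so evaluating it there gives a zero "syndrome".
print(horner_eval([1, 2], 2))
```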

Resource management for moldable parallel tasks supporting slot time in the Cloud

  • Li, Jianmin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.9
    • /
    • pp.4349-4371
    • /
    • 2019
  • Moldable parallel tasks are widely used in areas such as weather forecasting, biocomputing, and mechanical calculation. Scheduling moldable parallel tasks is difficult when both the deadline and the speedup must be considered. Past work mainly focuses on LA (List Algorithms) or OMA (Optimizing the Middle Algorithms). Unlike prior work, our work normalizes the execution time so that all tasks have the same range of normalized execution time, [0,1]; then, based on the normalized execution time, a method searches for the reference execution time without considering the deadlines of the tasks. Based on the reference execution time, we obtain an initial scheduling result using the AFCFS (Adaptive First Comes First Served) policy. Finally, a heuristic approach is used to improve the performance of the initial scheduling result. We call our method HSRET (a Heuristic Scheduling method based on Reference Execution Time). Comparisons with other methods show that HSRET performs well in AWT (Average Waiting Time), AET (Average Execution Time), and PUT (Percentages of Unfinished Tasks).
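
One way to bring every task's execution time into the common range [0,1] is min-max scaling over the task's possible processor allocations. The abstract does not give the normalization formula, so the sketch below is an assumption, as are the function name and the sample times.

```python
def normalize_times(times):
    """Map a task's execution times (one per processor allocation) onto [0, 1]."""
    lo, hi = min(times), max(times)
    if hi == lo:
        # A task whose time does not depend on the allocation maps to 0.
        return [0.0 for _ in times]
    return [(t - lo) / (hi - lo) for t in times]


# Hypothetical execution times of one moldable task on 1, 2, 4, and 8 processors.
print(normalize_times([10, 6, 4, 2]))
```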

An Efficient Parallel Simulation Algorithm on Recursive Feedforward Network (Recursive Feedforword Network 상에서의 효율적인 병렬 시뮬레이션 알고리즘)

  • 옥시건
    • Journal of the Korea Society for Simulation
    • /
    • v.4 no.2
    • /
    • pp.79-92
    • /
    • 1995
  • In this paper we present an efficient parallel simulation algorithm for recursive feedforward networks (RFN) that reduces the simulation delay while decreasing the number of null messages compared to the previous result. As a preprocessing step, we first determine the group and type of each output channel of the nodes using the DFS (Depth First Search) algorithm, and show the number of null messages as well as the simulation scheme. With the new scheme, we decide whether null messages are sent to an output channel according to the group to which it belongs.

Built-in self test for testing neighborhood pattern sensitive faults in content addressable memories (Content addressable memory의 이웃패턴감응고장 테스트를 위한 내장된 자체 테스트 기법)

  • 강용석;이종철;강성호
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.8
    • /
    • pp.1-9
    • /
    • 1998
  • A new parallel test algorithm and a built-in self test (BIST) architecture are developed to test various types of functional faults efficiently in content addressable memories (CAMs). In test mode, the read operation is replaced by one parallel content addressable search operation, and the write operation is performed in parallel with small modifications to the peripheral circuits. The results show that efficient and practical testing with very low complexity and area overhead can be achieved.
