• Title/Summary/Keyword: parallel computer processing


A Graph Model and Analysis Algorithm for cDNA Microarray Image (cDNA 마이크로어레이 이미지를 위한 그래프 모델과 분석 알고리즘)

  • Jung, Ho-Youl;Hwang, Mi-Nyeong;Yu, Young-Jung;Cho, Hwan-Gue
    • Journal of KIISE: Computer Systems and Theory / v.29 no.7 / pp.411-421 / 2002
  • In this paper, we propose a new image analysis algorithm for microarray processing and a method for locating grid cells using the topology of the grid spots. A microarray is a device that enables parallel experiments on 10,000 to 100,000 test genes in order to measure gene expression. Because a single experiment produces a huge amount of data, automated image analysis is needed. The final output of a microarray experiment is a set of 16-bit gray-level image files consisting of grid-structured spots. We propose an algorithm that determines the address of each spot (its spot indices) using a graph structure built from the image data, and a method that determines the precise location and shape of each spot by measuring the inclination of the grid structure. Results of several experiments on real data sets are given.

Accelerating Medical Image Processing on Integrated GPU Using OpenCL (OpenCL을 이용한 내장형 GPU에서의 의학영상처리 가속화)

  • Kim, Beom-Jun;Shin, Byeong-seok
    • Journal of the Korea Computer Graphics Society / v.23 no.2 / pp.1-10 / 2017
  • A variety of filters are applied to improve the quality of noisy, low-resolution medical images. This is necessary to reduce the patient's radiation dose and to improve the utilization of conventional spherical imaging equipment. Conventionally, filtering is performed on the CPU of a PC. However, it is difficult to produce results in real time when applying various calculations and filters to high-resolution images of the human body using only the CPU of a PC of the kind used in hospitals. In this paper, we analyze the structure and performance of the Intel GPU integrated into the CPU and propose a method for performing image filtering using the parallel processing capabilities of OpenCL. By applying complex filters with high computational cost to medical images, high-quality images can be generated in real time.

Developing a Simulator of the Capture Process in Towed Fishing Gears by Chaotic Fish Behavior Model and Parallel Computing

  • Kim Yong-Hae;Ha Seok-Wun;Jun Yong-Kee
    • Fisheries and Aquatic Sciences / v.7 no.3 / pp.163-170 / 2004
  • A fishing simulator for towed fishing gear was developed in order to mimic fish behavior in the capture process and to investigate fishing selectivity. A fish behavior model using a psycho-hydraulic wheel activated by stimuli was established, introducing Lorenz chaos equations and a neural network system to generate the components of realistic fish capture processes. Fish positions within the specified gear geometry are calculated from the normalized intensities of the stimuli from fishing gear components or neighboring fish, and these are then related to the sensitivities and abilities of the fish. The model is applied to four towed gears (a bottom trawl, a midwater trawl, a two-boat seine, and an anchovy boat seine) and to the 17 fish species mainly caught. An Alpha cluster computer system and Fortran MPI (Message Passing Interface) parallel programming were used for rapid calculation and mass data processing in this chaotic behavior model. The simulation results can be presented as animations of fish movement relative to the fishing gear, using OpenGL and C graphics programming, and as catch data and selectivity analyses. The simulator's results closely matched field studies of the same gears, so it can be used in further studies of fishing gear design, selectivity prediction, and indoor training systems.

Design of a set of One-to-Many Node-Disjoint and Nearly Shortest Paths on Recursive Circulant Networks

  • Chung, Ilyong
    • Journal of Korea Multimedia Society / v.16 no.7 / pp.897-904 / 2013
  • The recursive circulant network $G(N,d)$ can be widely used in the design and implementation of parallel processing architectures. It consists of $N$ identical nodes, each connected through bidirectional, point-to-point communication channels to different neighbors by jumps of $d^i$, where $0 \leq i \leq \lceil \log_d N \rceil - 1$. In this paper, we investigate message routing on $G(2^m,4)$, a special kind of RCN, which is key to the performance of this network. On $G(2^m,4)$, we would like to transmit $k$ packets from a source node to $k$ destination nodes simultaneously along paths on this network, with the $i^{th}$ packet transmitted along the $i^{th}$ path, where $1 \leq k \leq m-1$ and $0 \leq i \leq m-1$. In order for all packets to arrive at their destination nodes quickly and securely, we present an $O(m^4)$ routing algorithm on $G(2^m,4)$ that generates a set of one-to-many node-disjoint and nearly shortest paths, where each path is either shortest or nearly shortest and the total length of these paths is nearly minimum, since the paths are mainly determined by employing the Hungarian method.
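
The adjacency rule in this abstract can be illustrated with a minimal Python sketch. It assumes that "jumping $d^i$" means node $v$ is linked to $v \pm d^i \pmod N$, and the function name `rcn_neighbors` is hypothetical. It only enumerates neighbors; it does not reproduce the paper's routing algorithm.

```python
def rcn_neighbors(n, d, v):
    """Neighbors of node v in G(n, d), assuming links to v +/- d**i (mod n)
    for 0 <= i <= ceil(log_d n) - 1."""
    if n < 2 or d < 2:
        raise ValueError("expected n >= 2 and d >= 2")
    neighbors = set()
    jump = 1  # d**0
    # For integer i, i <= ceil(log_d n) - 1 holds exactly when d**i < n,
    # so integer arithmetic avoids floating-point logarithms.
    while jump < n:
        neighbors.add((v + jump) % n)
        neighbors.add((v - jump) % n)
        jump *= d
    neighbors.discard(v)  # guard against a jump that wraps back onto v
    return sorted(neighbors)


if __name__ == "__main__":
    # G(2^4, 4): jumps of 1 and 4 from node 0.
    print(rcn_neighbors(16, 4, 0))
```

Under this reading, node 0 of $G(16,4)$ is adjacent to nodes 1, 4, 12, and 15.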

Refined fixed granularity algorithm on Networks of Workstations (NOW 환경에서 개선된 고정 분할 단위 알고리즘)

  • Gu, Bon-Geun
    • The KIPS Transactions: Part A / v.8A no.2 / pp.117-124 / 2001
  • In a NOW (Networks of Workstations) environment, load sharing plays a very important role in improving performance. The known load sharing strategies are fixed-granularity, variable-granularity, and adaptive-granularity. The variable-granularity algorithm is sensitive to various parameters, whereas the Send algorithm, which implements the fixed-granularity strategy, is robust to task granularity, and the performance difference between Send and the variable-granularity algorithm is not substantial. In the Send algorithm, however, computing time and communication time do not overlap, so long network latency affects the execution time of the parallel program. In this paper, we propose the preSend algorithm, in which the master node sends data to the slave nodes in advance, without waiting for partial results from the slaves. Because the master has already sent the next data, the slave nodes can process it without idle time. The preSend algorithm thus overlaps computing time and communication time, which reduces the influence of long network latency and the execution time of the parallel program on the NOW. To compare the execution times of the two algorithms, we use $320 \times 320$ matrix multiplication. The comparison shows that the preSend algorithm has a shorter execution time than the Send algorithm.
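
The benefit of overlapping communication with computation can be sketched with a simple timing model. This is an illustrative assumption, not the paper's analysis: it considers one master and one slave, equal-sized tasks, and ignores the time to return results. The function names and the parameters `comm` and `compute` (per-task transfer and processing times) are hypothetical.

```python
def send_time(n_tasks, comm, compute):
    """No overlap: each task is transferred, then processed, before the
    next transfer starts."""
    return n_tasks * (comm + compute)


def presend_time(n_tasks, comm, compute):
    """Overlap: the next task is transferred while the current one is
    being processed (a two-stage pipeline)."""
    if n_tasks == 0:
        return 0
    # The first transfer cannot be hidden. After that, each step costs the
    # slower of the two stages, and the last task still has to be processed.
    return comm + (n_tasks - 1) * max(comm, compute) + compute


if __name__ == "__main__":
    for comm, compute in [(1, 3), (3, 1), (2, 2)]:
        print(comm, compute,
              send_time(10, comm, compute),
              presend_time(10, comm, compute))
```

In this model the overlapped schedule is never slower than the sequential one, and the gain is largest when transfer and processing times are similar.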


Critical Current Degradation Analysis in HTS Pancake Coil due to Self Field Effects

  • Nah, Wan-Soo;Joo, Jin-Ho;Yoo, Jai-Moo
    • Progress in Superconductivity / v.1 no.1 / pp.68-72 / 1999
  • Since the discovery of high-Tc superconductors, great effort has been devoted to developing high-performance HTS magnets for eventual application in power system devices. Magnet designers, however, have had difficulty estimating the maximum operating current of a designed magnet from tested short-sample data, because the critical current density degrades in the magnet. A similar problem applies to HTS electrical bus bars. It has been found that the critical current of stacked Bi-2223 tapes is much less than the sum of the critical currents of the individual tapes, which is mainly attributed to self magnetic fields. Furthermore, since the critical current degradation of Bi-2223 tape is greater in a magnetic field normal to the tape surface than in a parallel one, detailed magnetic field configurations are required to reduce the self-field effects. In this paper, we calculate the self-field effects of a stacked conductor, defining self-field factors for the magnetic fields normal and parallel to the tape surface. Finally, the critical current degradation in the HTS magnet is explained by the introduced self-field factors of the stacked conductor.


7.7 Gbps Encoder Design for IEEE 802.11ac QC-LDPC Codes

  • Jung, Yong-Min;Chung, Chul-Ho;Jung, Yun-Ho;Kim, Jae-Seok
    • JSTS: Journal of Semiconductor Technology and Science / v.14 no.4 / pp.419-426 / 2014
  • This paper proposes a high-throughput encoding process and encoder architecture for quasi-cyclic low-density parity-check (QC-LDPC) codes in the IEEE 802.11ac standard. To achieve high throughput with low complexity, the encoding process and encoder architecture are based on partially parallel processing. Forward and backward accumulations are performed in one clock cycle to increase the encoding throughput. A low-complexity cyclic shifter is also proposed to minimize the hardware overhead of combinational logic in the encoder architecture. The proposed encoder is rate compatible, supporting the various code rates and codeword block lengths of IEEE 802.11ac systems. It is implemented in 130-nm CMOS technology. For the (1944, 1620) irregular code, a throughput of 7.7 Gbps is achieved at a clock frequency of 100 MHz. The gate count of the proposed encoder core is about 96 K.
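
Why a cyclic shifter is central to a QC-LDPC encoder can be shown with a small Python sketch. In a quasi-cyclic code, the parity-check matrix is built from cyclically shifted identity sub-matrices, so multiplying a bit block by one of them over GF(2) is just a rotation of that block. The block size `z = 7` and the function names below are illustrative assumptions, and the shift direction is a convention; the sketch does not model the paper's shifter design.

```python
def circulant(z, s):
    """z x z identity matrix with its columns cyclically shifted by s."""
    return [[1 if c == (r + s) % z else 0 for c in range(z)] for r in range(z)]


def matvec_gf2(matrix, bits):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, bits)) % 2 for row in matrix]


def cyclic_shift(bits, s):
    """Rotate the block left by s positions."""
    s %= len(bits)
    return bits[s:] + bits[:s]


if __name__ == "__main__":
    z, s = 7, 3
    block = [1, 0, 0, 1, 1, 0, 1]
    # Both computations give the same result; the shift needs no multipliers.
    print(matvec_gf2(circulant(z, s), block))
    print(cyclic_shift(block, s))
```

As a side note on the reported figures, 7.7 Gbps at 100 MHz corresponds to 77 bits per clock cycle on average.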

Design of modified HN for High Data Transmission (고속 데이터 전송을 위한 변형 해밍망 설계)

  • Kwon, Yong-Kwang
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.7 / pp.251-257 / 2014
  • The Viterbi algorithm (VA) is used to estimate the state transitions of a discrete-time finite state machine (FSM) in an uncorrelated noisy environment. This paper modifies the Hamming network to estimate the state transitions of finite state machines and proposes state-parallel and block-parallel Viterbi decoders. The modified Hamming network (mHN) decodes convolutional codes as correctly as a conventional Viterbi decoder. Furthermore, the complexity of the proposed Viterbi decoder is approximately 10% lower than that of a conventional Viterbi decoder, and its processing time is improved by approximately 40%.
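
For readers unfamiliar with the conventional baseline, the following is a minimal sketch of a hard-decision Viterbi decoder in Python. It is not the modified Hamming network from the paper. The rate-1/2, constraint-length-3 code with generators (7, 5) in octal is an assumed example, and all names are hypothetical.

```python
G = (0b111, 0b101)       # assumed example generators, (7, 5) in octal
K = 3                    # constraint length
N_STATES = 1 << (K - 1)  # the FSM state is the last K-1 input bits


def parity(x):
    return bin(x).count("1") & 1


def step(state, bit):
    """One FSM transition: return (next_state, output_bits)."""
    reg = (bit << (K - 1)) | state  # newest bit sits in the top position
    return reg >> 1, tuple(parity(reg & g) for g in G)


def encode(bits):
    state, out = 0, []
    for b in bits:
        state, symbol = step(state, b)
        out.extend(symbol)
    return out


def viterbi_decode(received):
    """Return the input sequence whose encoding is closest in Hamming
    distance to the received bits (encoder assumed to start in state 0)."""
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)
    paths = [[] for _ in range(N_STATES)]
    for t in range(0, len(received), len(G)):
        symbol = received[t:t + len(G)]
        new_metric = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue  # state not reachable yet
            for b in (0, 1):
                nxt, out = step(s, b)
                m = metric[s] + sum(x != y for x, y in zip(out, symbol))
                if m < new_metric[nxt]:  # keep only the survivor per state
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    best = min(range(N_STATES), key=lambda s: metric[s])
    return paths[best]


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # last two zeros flush the encoder
    noisy = encode(message)
    noisy[3] ^= 1                             # inject a single bit error
    print(viterbi_decode(noisy) == message)
```

The inner loop over states is independent for each state at a given time step, which is what makes state-parallel hardware implementations possible.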

A novel hardware design for SIFT generation with reduced memory requirement

  • Kim, Eung Sup;Lee, Hyuk-Jae
    • JSTS: Journal of Semiconductor Technology and Science / v.13 no.2 / pp.157-169 / 2013
  • The Scale Invariant Feature Transform (SIFT) generates image features that are widely used to match objects in different images. Previous work on hardware-based SIFT implementation requires excessive internal memory and hardware logic [1]. In this paper, a new hardware organization is proposed that implements SIFT with less memory and hardware cost than the previous work. To this end, a parallel Gaussian filter bank is adopted to eliminate the buffers that store intermediate results, because parallel operation makes all intermediate results available at the same time. Furthermore, the processing order is changed from raster-scan order to block-by-block order so that the line buffer storing the source image is also reduced in size. These techniques trade a reduction in memory size for a slight increase in execution time and external memory bandwidth. As a result, the memory size is reduced by 94.4%. The proposed hardware for SIFT implementation includes the descriptor generation block, which is omitted in the previous work [1]. The addition of hardwired descriptor generation improves the computation speed by about 30 times compared with the previous work.

A Parallelization Technique with Integrated Multi-Threading for Video Decoding on Multi-core Systems

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.10 / pp.2479-2496 / 2013
  • Increasing demand for Full High-Definition (FHD) and Ultra High-Definition (UHD) video services has led to active research on high-speed video processing. The widespread deployment of multi-core systems has accelerated studies on high-resolution video processing based on the parallelization of multimedia software. Although parallelizing a specific decoding step may partially improve decoding performance, such partial parallelization may not yield sufficient overall improvement. In particular, entropy decoding has often been considered separately from the other decoding steps because it cannot be parallelized easily. In this paper, we propose a parallelization technique called Integrated Multi-Threaded Parallelization (IMTP), which considers the parallelization of the entropy decoding step together with the other decoding steps in an integrated fashion. We used the Simultaneous Multi-Threading (SMT) technique with appropriate thread scheduling techniques to achieve the best performance for the entire decoding process. The proposed IMTP method achieves a speedup of up to 3.35 times in total decoding time over a conventional decoding technique for H.264/AVC videos.
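
The remark that parallelizing only some decoding steps yields limited gains can be made concrete with Amdahl's law, sketched below in Python. This is a general illustration, not the paper's performance model; the function name and the example fractions are assumptions and do not come from the paper.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Overall speedup when only part of the work runs on n_workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # Hypothetical fractions of decoding time that can run in parallel.
    for fraction in (0.5, 0.8, 0.95):
        print(fraction, round(amdahl_speedup(fraction, 4), 2))
```

If half of the decoding time stays serial, four workers give a speedup of only 1.6, and no number of workers can reach 2. This is why bringing a hard-to-parallelize step into the parallel portion matters for overall speedup.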