• Title/Summary/Keyword: and Parallel Processing


A Study on the Hierarchical Coding of the Angiography by Using the Scalable Structure in the MPACS System (MPACS 시스템에서 Scalable 구조를 이용한 심장 조영상의 계층적 부호화에 관한 연구)

  • Han, Young-Oh;Jung, Jae-Woo;Ahn, Jin-Ho;Park, Jong-Kwan;Shin, Joon-In;Park, Sang-Hui
    • Proceedings of the KOSOMBE Conference / v.1995 no.05 / pp.235-238 / 1995
  • In this paper, we propose an effective coding method for angiography that uses a scalable structure in the frequency domain for MPACS (Medical Picture Archiving and Communication System). We employ the subband decomposition method and the MPEG-2 system, the international standard coding method for general moving pictures. After subband decomposition splits an input image into 4 bands in the spatial frequency domain, the motion-compensated DPCM coding method of MPEG-2 is carried out for each subband. As a result, an easily controllable coding structure is obtained by composing a compound bit stream for each subband group. The simulation results of the proposed scheme for angiography are as follows. A scalable structure that can be easily controlled under transmission loss or band limitation can be accomplished in the MPEG-2 structure by a subband decomposition that minimizes the side information. By reducing the search area of the motion vector to between -4 and 3, the processing speed of the codec is more than doubled without loss of picture quality compared with the conventional DCT coefficient decomposition method. The processing speed improves considerably further when the subbands are processed in parallel in hardware.

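
The 4-band split described in the abstract above can be illustrated with a one-level 2D Haar analysis. This is only a sketch: the abstract does not name the filter bank, so the Haar filter, the band names, and the function names `haar_subbands` and `haar_reconstruct` are assumptions made for illustration. The MPEG-2 motion-compensated DPCM stage is not modeled.

```python
def haar_subbands(image):
    """Split an even-sized 2D image (list of rows) into four subbands.

    Assumption: a one-level Haar filter bank; the paper's actual filters
    are not given in the abstract.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    low_low, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 4)  # low-pass in both directions
            h_row.append((a - b + d - e) / 4)   # detail across columns
            v_row.append((a + b - d - e) / 4)   # detail across rows
            d_row.append((a - b - d + e) / 4)   # diagonal detail
        low_low.append(ll_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return low_low, horiz, vert, diag


def haar_reconstruct(low_low, horiz, vert, diag):
    """Invert haar_subbands exactly (up to floating-point rounding)."""
    image = []
    for ll_row, h_row, v_row, d_row in zip(low_low, horiz, vert, diag):
        top, bottom = [], []
        for ll, h, v, dg in zip(ll_row, h_row, v_row, d_row):
            top += [ll + h + v + dg, ll - h + v - dg]
            bottom += [ll + h - v - dg, ll - h - v + dg]
        image += [top, bottom]
    return image
```

Each band is a quarter-size image that depends only on its own 2x2 input blocks, which is consistent with the abstract's remark that the subbands can be coded in parallel. Dropping the detail bands before reconstruction gives a coarser picture, which is the scalability idea in miniature.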

Cerebrocortical Regions Associated with Implicit and Explicit Memory Retrieval Under the Conceptual Processing: BOLD Functional MR Imaging

  • Kim, Hyung-Joong;Kang, Hyung-Geun;Seo, Jung-Jin;Jung, Kwang-Woo;Eun, Sung-Jong;Park, Jin-Kyun;Yoon, Woong;Park, Tae-Jin
    • Proceedings of the KSMRM Conference / 2002.11a / pp.111-111 / 2002
  • Purpose: This study compares the distinct brain activation between implicit and explicit memory retrieval tasks using non-invasive blood-oxygenation-level-dependent (BOLD) functional magnetic resonance imaging (fMRI). Materials & Methods: Seven right-handed, healthy volunteers aged 21-25 years (mean: 22 years) were scanned with a 1.5T Signa Horizon Echospeed MR imager (GE Medical Systems, Milwaukee, U.S.A.). During the implicit and explicit memory retrieval tasks of previously learned words under conceptual processing, we acquired fMRI data using gradient-echo EPI with 50 ms TE, 3000 ms TR, 26 cm${\times}$26 cm field-of-view, 128${\times}$128 matrix, and ten slices (6 mm slice thickness, 1 mm gap) parallel to the AC-PC (anterior commissure-posterior commissure) line. Using statistical parametric mapping (SPM99), functional activation maps were reconstructed and quantified.


Efficient Construction of Large Scale Steiner Tree using Polynomial-Time Approximation Scheme (PTAS를 이용한 대형 스타이너 트리의 효과적인 구성)

  • Kim, In-Bum
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.5 / pp.25-34 / 2010
  • The Steiner minimum tree problem asks for a tree that connects all input terminal nodes and that, by introducing additional nodes called Steiner points, can be shorter than the minimum spanning tree; the problem is NP-complete. Although diverse heuristic methods can be applied, most of them incur heavy computation and long waits when the number of input nodes is large. For such inputs, this paper suggests an efficient PTAS approximation method that produces candidate unit Steiner trees with portals in the bottom layer, merges them hierarchically to construct their parent Steiner trees in the upper layers, and swiftly builds the final approximate Steiner tree in the top layer. An experiment with 16,000 input nodes and 16 unit areas in the bottom layer shows an execution time improvement of 85.4% in serial processing and 98.9% in parallel processing compared with the pure Steiner heuristic method, at the cost of a 0.24% overhead in tree length. The suggested PTAS Steiner tree method can therefore be applied widely to build large-scale approximate Steiner trees quickly.
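
To make the gap between a minimum spanning tree and a Steiner tree concrete, the sketch below compares the two on the four corners of a unit square, a standard textbook case. It does not implement the paper's PTAS or its layered merging; the function name `mst_length` and the choice of example are assumptions made for illustration.

```python
import math


def mst_length(points):
    """Total edge length of a Euclidean minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the tree to each node
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


# Terminals: corners of the unit square.
terminals = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Two Steiner points on the horizontal midline; at these positions the
# incident edges meet at 120 degrees.
offset = 1.0 / (2.0 * math.sqrt(3.0))
steiner_points = [(offset, 0.5), (1.0 - offset, 0.5)]

mst_only = mst_length(terminals)
with_steiner = mst_length(terminals + steiner_points)
```

Here `mst_only` is 3 and `with_steiner` is 1 + √3, so the two added points shorten the tree by roughly 9%. Spanning the terminals together with candidate Steiner points is a common building block of Steiner heuristics, though whether the paper uses it is not stated in the abstract.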

Embedding a Mesh into a Crossed Cube (메쉬의 교차큐브에 대한 임베딩)

  • Kim, Sook-Yeon
    • The KIPS Transactions:PartA / v.15A no.6 / pp.301-308 / 2008
  • The crossed cube has received great attention because its properties equal or surpass those of the hypercube, which is widely known as a versatile parallel processing system. It is known that a mesh of size $2{\times}2^m$ can be embedded into a crossed cube with dilation 1 and expansion 1, and a mesh of size $4{\times}2^m$ with dilation 1 and expansion 2. However, as far as we know, it has remained a conjecture that a mesh with more than eight rows and columns can be embedded into a crossed cube with dilation 1. In this paper, we show that a mesh of size $2^n{\times}2^m$ can be embedded into a crossed cube with dilation 1 and expansion $2^{n-1}$, where $n{\geq}1$ and $m{\geq}3$.
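
The terms dilation and expansion can be illustrated with the classic Gray-code embedding of a mesh into an ordinary hypercube. This is not the paper's crossed-cube construction: the crossed cube has a different edge set, and the function names below are hypothetical. The sketch only shows how the two measures are checked for a simpler host graph.

```python
def gray(i):
    """Binary reflected Gray code: consecutive integers differ in one bit."""
    return i ^ (i >> 1)


def embed(row, col, m):
    """Map mesh node (row, col) of a 2^n x 2^m mesh to a hypercube node label."""
    return (gray(row) << m) | gray(col)


def hamming(a, b):
    """Hypercube distance between two node labels."""
    return bin(a ^ b).count("1")


def dilation(n, m):
    """Largest host distance between the images of adjacent mesh nodes."""
    rows, cols = 1 << n, 1 << m
    worst = 0
    for r in range(rows):
        for c in range(cols):
            u = embed(r, c, m)
            if r + 1 < rows:
                worst = max(worst, hamming(u, embed(r + 1, c, m)))
            if c + 1 < cols:
                worst = max(worst, hamming(u, embed(r, c + 1, m)))
    return worst


def expansion(n, m):
    """Host node count divided by guest node count for an (n+m)-cube host."""
    return (1 << (n + m)) / ((1 << n) * (1 << m))
```

Dilation 1 means every mesh edge maps onto a single host edge, and expansion 1 means the host has no spare nodes. By the same definition, the expansion of $2^{n-1}$ quoted in the abstract means the crossed cube has $2^{n-1}$ times as many nodes as the mesh.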

Real-time Depth Image Refinement using Hierarchical Joint Bilateral Filter (계층적 결합형 양방향 필터를 이용한 실시간 깊이 영상 보정 방법)

  • Shin, Dong-Won;Hoa, Yo-Sung
    • Journal of Broadcast Engineering / v.19 no.2 / pp.140-147 / 2014
  • In this paper, we propose a method for real-time depth image refinement. To improve the quality of the depth map acquired from a Kinect camera, we employ constant memory and texture memory, which are suitable for 2D image processing on the graphics processing unit (GPU). In addition, we apply the joint bilateral filter (JBF) in parallel to accelerate the overall execution. To enhance the quality of the depth image, we apply the JBF hierarchically using the compute unified device architecture (CUDA). Finally, we obtain the refined depth image. Experimental results show that the proposed real-time depth image refinement algorithm improves the subjective quality of the depth image and runs at 260 frames per second.
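
A joint bilateral filter weights each neighboring depth sample by its spatial distance and by the intensity difference in a guide (color) image. The single-level, CPU-only sketch below shows that weighting; the parameter names, default values, and Gaussian kernels are assumptions, and the paper's hierarchical scheme and CUDA memory layout are not reproduced.

```python
import math


def joint_bilateral_filter(depth, guide, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Smooth `depth` while respecting edges found in `guide`.

    depth, guide: equally sized 2D lists of numbers.
    radius, sigma_s, sigma_r: illustrative values, not taken from the paper.
    """
    rows, cols = len(depth), len(depth[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            num = 0.0
            den = 0.0
            for j in range(max(0, y - radius), min(rows, y + radius + 1)):
                for i in range(max(0, x - radius), min(cols, x + radius + 1)):
                    dist2 = (j - y) ** 2 + (i - x) ** 2
                    diff = guide[j][i] - guide[y][x]
                    # Spatial weight times range weight from the guide image.
                    w = math.exp(-dist2 / (2 * sigma_s ** 2)
                                 - diff * diff / (2 * sigma_r ** 2))
                    num += w * depth[j][i]
                    den += w
            out[y][x] = num / den  # den >= 1 because the center weight is 1
    return out
```

Every output pixel is computed independently from a small read-only window, so the two outer loops map naturally onto one GPU thread per pixel, which is the kind of parallelism the abstract describes.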

Hybrid FFT processor design using Parallel PD adder circuit (병렬 PD가산회로를 이용한 Hybrid FFT 연산기 설계)

  • 김성대;최전균;안점영;송홍복
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2000.10a / pp.499-503 / 2000
  • The use of the FFT (fast Fourier transform) is extended from binary to multiple-valued logic (MVL) circuits. A multiple-valued FFT circuit can be implemented using current-mode CMOS techniques, reducing the transistor and wire counts between devices to half of those of a binary implementation. For the adder processing in the FFT, we use a number representation based on redundant digit sets, called the redundant positive-digit number representation, which permits carry-propagation-free addition. The designed multiple-valued FFT, which internally uses a PD (positive-digit) adder with the digit set {0, 1, 2, 3}, has attractive features in speed, regularity of structure, and reduced complexity of active elements and interconnections. For the multiplier processing, we use a multiple-valued LUT (look-up table) to facilitate simple mathematical operations on the stored digits. Finally, a multiple-valued 8-point FFT operation is used as an example to illustrate how a multiple-valued FFT can be beneficial.


An Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.7 / pp.53-61 / 2004
  • In this paper, we propose a new MAC (multiplier-accumulator) architecture for high-speed multiplication-accumulation. We use the MBA (modified radix-4 Booth algorithm), which is based on the 1's complement number system, and a CSA (carry-save adder) for adding the partial products. During the addition of the partial products, the signed numbers in 1's complement form after Booth encoding are converted into 2's complement signed numbers in the CSA tree. Since a 2-bit CLA (carry look-ahead adder) is used to add the lower bits of the partial products, the input bit width of the final adder and the overall delay of the critical path are reduced. The proposed MAC was applied to the DWT (discrete wavelet transform) filtering operation for JPEG2000, which showed its potential for practical application. Finally, a comparison with the previous architecture confirmed improved performance in terms of hardware resources and delay.
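
The radix-4 Booth recoding behind the MBA can be sketched in software. The version below is the textbook recoding of a 2's complement multiplier into digits from {-2, -1, 0, 1, 2}; it uses plain integer arithmetic and does not model the paper's 1's complement partial products, CSA tree, or 2-bit CLA. The function names are hypothetical.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits.

    Returns digits d_0..d_{bits/2-1}, least significant first, with
    multiplier == sum(d_i * 4**i) and each d_i in {-2, -1, 0, 1, 2}.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    pattern = multiplier & ((1 << bits) - 1)  # 2's complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, bits=8):
    """Multiply x by y using one partial product per Booth digit of y."""
    return sum(d * x * 4 ** i
               for i, d in enumerate(booth_radix4_digits(y, bits)))


def mac(acc, x, y, bits=8):
    """One multiply-accumulate step: acc + x * y."""
    return acc + booth_multiply(x, y, bits)
```

The recoding halves the number of partial products relative to bit-by-bit multiplication (four digits for an 8-bit multiplier), which is why it suits a parallel MAC where the partial products are summed in an adder tree.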

Fast View Synthesis Using GPGPU (GPGPU를 이용한 고속 영상 합성 기법)

  • Shin, Hong-Chang;Park, Han-Hoon;Park, Jong-Il
    • Journal of Broadcast Engineering / v.13 no.6 / pp.859-874 / 2008
  • In this paper, we develop a fast view synthesis method that generates multiple intermediate views in real time for a 3D display system when the camera geometry and depth maps of the reference views are given in advance. The proposed method achieves faster view synthesis than previous GPU approaches by processing all the computations required for view synthesis in parallel. Specifically, we use CUDA™ (by NVIDIA) to control the GPU device. To increase the processing speed, we adapted all the view synthesis processes to the single instruction multiple data (SIMD) structure that is a main feature of CUDA, maximized the use of the high-speed memories on the GPU device, and optimized the implementation. As a result, we could synthesize 9 intermediate view images of 720 by 480 pixels within 0.128 second.

A Study of the Radiation Characteristics of Novel Printed Antenna Composed of Dual Elements with Different Shape (다른 형태를 가진 2소자 프린트 안테나의 방사특성에 관한 연구)

  • Lee, Chai-Bong;Kim, Jung-Hyun
    • Journal of the Institute of Convergence Signal Processing / v.9 no.2 / pp.141-145 / 2008
  • When current flows in parallel lines of different lengths, it has been found that radiation occurs due to the common-mode current, and a small, lightweight antenna composed of dual elements based on this principle has been proposed. However, this antenna is difficult to produce because the linear antenna is built by combining wires. In this paper, we improve the linear antenna and design a planar antenna composed of dual elements of different lengths on a planar printed board so that it is easy to design and produce. Furthermore, an antenna with a wide-band characteristic is also designed on the same board. The radiation pattern is similar to that of a dipole antenna owing to the triangular patch S, the notch and two tapers in patch S, and the notch and two tapers in the antenna element. As a result, we were able to design an antenna operating over a wider bandwidth (bandwidth ratio of about 58%, $VSWR{\le}2$).


Analysis of GPU Performance and Memory Efficiency according to Task Processing Units (작업 처리 단위 변화에 따른 GPU 성능과 메모리 접근 시간의 관계 분석)

  • Son, Dong Oh;Sim, Gyu Yeon;Kim, Cheol Hong
    • Smart Media Journal / v.4 no.4 / pp.56-63 / 2015
  • Modern GPUs can execute massively parallel computation by exploiting many GPU cores. The GPGPU architecture, one approach to exploiting the outstanding computational resources of the GPU, effectively executes general-purpose applications as well as graphics applications. In this paper, we investigate how the number of CTAs (cooperative thread arrays) on an SM (streaming multiprocessor) affects memory efficiency and performance, since analyzing this relation can guide researchers who study GPUs in improving performance. Our simulation results show that, for most benchmarks, increasing the number of CTAs on an SM improves performance. Some benchmarks, on the other hand, show no performance improvement, either because the kernel generates too few CTAs or because not enough CTAs execute simultaneously. To classify the performance analysis more precisely, we also analyze the relations between performance and memory stalls, DRAM stalls due to interconnect congestion, and pipeline stalls at the memory stage. We expect that our analysis will help studies that aim to improve parallelism and memory efficiency on GPGPU architectures.
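
The two reasons the abstract gives for flat performance (too few CTAs in the kernel, or too few CTAs resident at once) can be expressed as a small resource calculation. The sketch below is an assumption-laden illustration: the per-SM limits are hypothetical defaults rather than figures from the paper or any specific GPU, and the function names are invented for this example.

```python
import math


def resident_ctas_per_sm(threads_per_cta, regs_per_thread, smem_per_cta,
                         max_threads=1536, max_regs=32768,
                         max_smem=49152, max_ctas=8):
    """Upper bound on CTAs that fit on one SM at the same time.

    All max_* defaults are hypothetical hardware limits chosen for the
    example; substitute the limits of the GPU being studied.
    """
    if threads_per_cta <= 0:
        raise ValueError("threads_per_cta must be positive")
    limits = [max_ctas, max_threads // threads_per_cta]
    if regs_per_thread > 0:
        limits.append(max_regs // (regs_per_thread * threads_per_cta))
    if smem_per_cta > 0:
        limits.append(max_smem // smem_per_cta)
    return min(limits)  # the scarcest resource decides


def ctas_in_flight_per_sm(grid_ctas, num_sms, resident_limit):
    """CTAs actually running on the busiest SM for a given kernel launch."""
    return min(resident_limit, math.ceil(grid_ctas / num_sms))
```

With these assumed limits, a kernel using 256 threads per CTA, 32 registers per thread, and 16 KB of shared memory per CTA is capped by shared memory, and a launch of only 20 CTAs across 15 SMs cannot fill even that cap. Raising the permitted CTA count helps only when neither of these bounds is already the bottleneck.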