• Title/Summary/Keyword: and Parallel Processing

Accelerating Medical Image Processing on Integrated GPU Using OpenCL (OpenCL을 이용한 내장형 GPU에서의 의학영상처리 가속화)

  • Kim, Beom-Jun;Shin, Byeong-seok
    • Journal of the Korea Computer Graphics Society / v.23 no.2 / pp.1-10 / 2017
  • A variety of filters are applied to improve the quality of noisy, low-resolution medical images. This is necessary to reduce the patient's radiation dose and to make better use of older, conventional imaging equipment. Conventionally, filtering is performed on the CPU of a PC. However, it is difficult to produce results in real time when applying multiple calculations and filters to high-resolution images of the human body using only the CPU of a typical hospital PC. In this paper, we analyze the structure and performance of the GPU integrated into Intel CPUs and propose a method for performing image filtering using the parallel processing capabilities of OpenCL. By applying filters with high computational complexity to medical images in this way, high-quality images can be generated in real time.
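The data-parallel structure that makes image filtering a good fit for a GPU can be sketched in plain Python. This is not the authors' OpenCL code: the 3x3 mean filter, the border clamping, and the names `mean_filter_pixel` and `mean_filter` are assumptions chosen for illustration. The point is only that each output pixel depends on a small input neighborhood and on no other output pixel, so the per-pixel function plays the role of an OpenCL work-item.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_filter_pixel(image, x, y):
    """Work-item body: 3x3 mean around (x, y), clamping coordinates at the borders."""
    height, width = len(image), len(image[0])
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), height - 1)
            xx = min(max(x + dx, 0), width - 1)
            total += image[yy][xx]
    return total / 9.0


def filter_row(image, y):
    """Filter one row; rows are independent, so they can be dispatched concurrently."""
    return [mean_filter_pixel(image, x, y) for x in range(len(image[0]))]


def mean_filter(image, workers=4):
    """Apply the filter to every pixel, one task per row (illustrative decomposition)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: filter_row(image, y), range(len(image))))
```

Python threads will not give a real speedup here; the sketch shows only how the work divides. On a GPU, the same per-pixel body would run as one work-item per pixel.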

Prefetching Policy based on File Access Pattern and Cache Area (파일 접근 패턴과 캐쉬 영역을 고려한 선반입 기법)

  • Lim, Jae-Deok;Hwang-Bo, Jun-Hyeong;Koh, Kwang-Sik;Seo, Dae-Hwa
    • The KIPS Transactions:PartA / v.8A no.4 / pp.447-454 / 2001
  • Various caching and prefetching algorithms have been investigated to identify an effective method for improving the performance of I/O devices. A prefetching algorithm decreases the processing time of a system by reducing the number of disk accesses needed when I/O is requested. This paper proposes the AMBA prefetching method, an extended version of the OBA prefetching method. The AMBA method prefetches blocks continuously as long as sufficient disk bandwidth is available, so efficient prefetching can be expected even when the data request rate is very high. To prevent cache pollution, the AMBA method also limits the number of data blocks to be prefetched within the cache area. It can be implemented in a user-level file system on the Linux operating system. In particular, the proposed prefetching policy improves system performance by about 30 to 40% for large files that are accessed sequentially.
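The two constraints the abstract describes (keep prefetching while disk bandwidth allows, but cap the prefetched blocks held in the cache) can be sketched as a small planning function. This is a hypothetical illustration, not the paper's algorithm: `plan_prefetch`, `prefetch_quota`, and `bandwidth_free` are invented names, and treating bandwidth as a count of block reads is a simplifying assumption.

```python
def plan_prefetch(last_block, file_blocks, cached, prefetch_quota, bandwidth_free):
    """Choose which blocks to prefetch after a sequential read of `last_block`.

    file_blocks    -- total number of blocks in the file
    cached         -- set of block numbers already in the cache
    prefetch_quota -- hypothetical cap on prefetched blocks (limits cache pollution)
    bandwidth_free -- hypothetical number of block reads the disk can absorb now
    """
    budget = min(prefetch_quota, bandwidth_free)
    plan = []
    candidate = last_block + 1
    # Walk forward through the file until the budget or the file is exhausted.
    while candidate < file_blocks and len(plan) < budget:
        if candidate not in cached:
            plan.append(candidate)
        candidate += 1
    return plan
```

With `prefetch_quota=1` the function fetches only the next block, which is how a one-block-ahead policy behaves. Larger quotas read further ahead while bandwidth is available.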

Parallel solution of linear systems on the CRAY-2 using multi/micro tasking library (CRAY-2에서 멀티/마이크로 태스킹 라이브러리를 이용한 선형시스템의 병렬해법)

  • Ma, Sang-Back
    • The Transactions of the Korea Information Processing Society / v.4 no.11 / pp.2711-2720 / 1997
  • Multitasking and microtasking on CRAY machines provide yet another way to improve computational power. Since the CRAY-2 has 4 processors, a speedup of up to 4 can be achieved with properly designed algorithms. In this paper we present two parallelizations of linear system solution on the CRAY-2 using the multitasking and microtasking libraries. One is LU decomposition of dense matrices, and the other is the iterative solution of large sparse linear systems with the preconditioner proposed by Radicati di Brozolo. In the first case we realized a speedup of 1.3 with 2 processors for a matrix of dimension 600 using multitasking, and in the second case a speedup of around 3 with 4 processors for a matrix of dimension 8192 using microtasking. In the first case the speedup is limited by the nonuniform vector lengths. In the second case the ILU(0) preconditioner with Radicati's technique seems to realize a reasonably high speedup with 4 processors.
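The reported speedups can be put on a common scale with a short calculation. Parallel efficiency is speedup divided by processor count. The Amdahl's-law serial fraction is added here only as one possible reading of the numbers; the abstract itself attributes the first limit to nonuniform vector lengths, not to a serial fraction. Function names are illustrative.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def amdahl_serial_fraction(speedup, processors):
    """Serial fraction f implied by Amdahl's law S = 1 / (f + (1 - f) / p)."""
    return (processors / speedup - 1) / (processors - 1)


# Figures quoted in the abstract: (speedup, processors)
lu_case = (1.3, 2)      # dense LU, multitasking
sparse_case = (3.0, 4)  # preconditioned iterative solver, microtasking

for name, (s, p) in (("LU", lu_case), ("sparse", sparse_case)):
    print(name, round(parallel_efficiency(s, p), 3), round(amdahl_serial_fraction(s, p), 3))
```

Under this reading, the dense LU case reaches 65% efficiency and the sparse iterative case about 75%.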

Implementation of Omni-directional Image Viewer Program for Effective Monitoring (효과적인 감시를 위한 전방위 영상 기반 뷰어 프로그램 구현)

  • Jeon, So-Yeon;Kim, Cheong-Hwa;Park, Goo-Man
    • Journal of Broadcast Engineering / v.23 no.6 / pp.939-946 / 2018
  • In this paper, we implement a viewer program for effective monitoring using omni-directional images. The program consists of four modes: Normal mode, ROI (Region of Interest) mode, Tracking mode, and Auto-rotation mode, and the results for each mode are displayed simultaneously. In Normal mode, the wide-angle image is rendered as a spherical image to enable pan, tilt, and zoom. In ROI mode, a selected area is displayed enlarged. In Auto-rotation mode, the position of the object is mapped to the rotation angle of the spherical image, which makes it possible to keep following an object so that it does not leave the spherical image view in Tracking mode. The multiple modes are processed in parallel to improve processing speed. Compared with a surveillance system that has a limited angle of view, this has the advantage that various angles can be seen.
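Mapping an object's position to a rotation angle can be sketched as follows. The abstract does not state the image projection, so this example assumes an equirectangular frame, where horizontal position maps linearly to yaw and vertical position to pitch. The function name and angle conventions are hypothetical.

```python
def pixel_to_rotation(x, y, width, height):
    """Map pixel (x, y) of an equirectangular frame to (yaw, pitch) in degrees.

    Assumed convention: yaw runs from -180 at the left edge to +180 at the right,
    pitch from +90 at the top to -90 at the bottom.
    """
    yaw = (x / width - 0.5) * 360.0
    pitch = (0.5 - y / height) * 180.0
    return yaw, pitch
```

Rotating the sphere by the returned yaw and pitch centers the view on the object, which is what keeps a tracked object on screen.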

Artificial Intelligence-based Classification Scheme to improve Time Series Data Accuracy of IoT Sensors (IoT 센서의 시계열 데이터 정확도 향상을 위한 인공지능 기반 분류 기법)

  • Kim, Jin-Young;Sim, Isaac;Yoon, Sung-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.4 / pp.57-62 / 2021
  • As parallel computing capability for artificial intelligence improves, artificial intelligence technology is expanding into various industries. In particular, artificial intelligence is being introduced to process the enormous amounts of data generated by IoT sensors. However, applying AI techniques to IoT networks is limited by the fact that IoT produces time series data, in which the importance of data changes over time. In this paper, we propose time-weighted and user-state-based artificial intelligence processing techniques to process IoT sensor data effectively. The technique aims to classify IoT sensor data effectively through a pre-processing step, performed before artificial intelligence learning, that personalizes the time series data and weights it according to time and the usage status of personal data. Based on this research, methods of applying artificial intelligence learning in various fields can be proposed.
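One simple way to make newer samples count more than older ones is an exponential decay weight. The abstract does not specify the weighting function, so the half-life form below, and the names `time_weights` and `weighted_mean`, are assumptions for illustration only.

```python
def time_weights(timestamps, now, half_life):
    """Weight each sample by 0.5 ** (age / half_life); the newest sample gets weight 1."""
    return [0.5 ** ((now - t) / half_life) for t in timestamps]


def weighted_mean(values, weights):
    """Weighted average, e.g. as a pre-processed feature fed to a classifier."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Three readings taken at t = 0, 10, 20 and evaluated at t = 20.
weights = time_weights([0, 10, 20], now=20, half_life=10)
feature = weighted_mean([1.0, 2.0, 3.0], weights)
```

The weighted feature sits closer to the most recent reading than the plain average would.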

Customized AI Exercise Recommendation Service for the Balanced Physical Activity (균형적인 신체활동을 위한 맞춤형 AI 운동 추천 서비스)

  • Chang-Min Kim;Woo-Beom Lee
    • Journal of the Institute of Convergence Signal Processing / v.23 no.4 / pp.234-240 / 2022
  • This paper proposes a customized AI exercise recommendation service that balances the relative amount of exercise according to the working environment of each occupation. The WISDM database, collected using acceleration and gyro sensors, is a dataset that classifies physical activities into 18 categories. Our system groups the 18 physical activities into 3 activity types (whole body, upper body, and lower body) and then recommends an adaptive exercise based on the analyzed activity type. A one-dimensional convolutional neural network is used to classify physical activity. The proposed model is composed of convolution blocks in which 1D convolution layers with kernels of various sizes are connected in parallel. By applying multiple 1D convolution layers to the input pattern, the convolution blocks can effectively extract detailed local features of the input pattern of the kind extracted by deep neural network models. In a performance comparison with a previous recurrent neural network, the proposed method achieved an accuracy of 98.4%.
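A convolution block with parallel branches can be sketched in plain Python: several 1D kernels of different sizes are applied to the same input, and each produces one feature channel. This is a minimal illustration, not the paper's model. The kernel values, the zero padding, and the function names are assumptions, and learned weights, biases, and activations are left out.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def parallel_conv_block(signal, kernels):
    """Apply every kernel to the same input; return one feature channel per kernel."""
    return [conv1d_same(signal, kernel) for kernel in kernels]


# Branches with kernel sizes 1, 3, and 5 see progressively wider context.
channels = parallel_conv_block([1, 2, 3, 4], [[1], [1, 1, 1], [1, 1, 1, 1, 1]])
```

Because every branch pads to the same output length, the channels can be stacked and passed on to the next layer.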

Design and Implementation of Distributed Cluster Supporting Dynamic Down-Scaling of the Cluster (노드의 동적 다운 스케일링을 지원하는 분산 클러스터 시스템의 설계 및 구현)

  • Woo-Seok Ryu
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.2 / pp.361-366 / 2023
  • Apache Hadoop, a representative framework for distributed processing of big data, has the advantage that cluster size can be increased to thousands of nodes to improve parallel distributed processing performance. However, reducing the size of a cluster is limited to permanently decommissioning nodes that are defective or have degraded performance, which makes it difficult to operate multiple nodes flexibly in small clusters. In this paper, we discuss the problems that occur when removing nodes from a Hadoop cluster and propose a dynamic down-scaling technique for managing the distributed cluster more flexibly. To this end, we design and implement a modified Hadoop system and interfaces that support dynamic down-scaling: instead of decommissioning a node when it is removed from the cluster, the node is temporarily paused and reconnected when necessary. Experimental results verify that effective downsizing can be performed without performance degradation.

Design of a High-Speed Data Packet Allocation Circuit for Network-on-Chip (NoC 용 고속 데이터 패킷 할당 회로 설계)

  • Kim, Jeonghyun;Lee, Jaesung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.459-461 / 2022
  • One of the big differences between a Network-on-Chip (NoC) and an existing parallel processing system based on an off-chip network is that data packet routing is performed using a centralized control scheme. In such an environment, the best-effort packet routing problem becomes a real-time assignment problem in which the arrival time and processing time of data packets are the costs. In this paper, the Hungarian algorithm, a representative algorithm for reducing the computational complexity of the linear algebraic formulation of the assignment problem, is implemented in the form of a hardware accelerator. Logic synthesis using the TSMC 0.18 µm standard cell library shows that, compared to a circuit implementing the original operation sequence of the Hungarian algorithm, the circuit designed through case analysis of the cost distribution reduces area by about 16% and propagation delay by about 52%.
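For reference, the assignment problem that the Hungarian algorithm solves can be sketched in software. The code below is a standard O(n³) potentials-based formulation of the algorithm, not the paper's hardware design or its case-analysis optimization. The cost matrix is made up, with rows standing for packets and columns for resources as an illustrative reading of the abstract.

```python
def hungarian(cost):
    """Minimum-cost assignment of rows to columns (requires rows <= columns).

    Returns (total_cost, assignment) where assignment[i] is the column for row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials (1-indexed)
    v = [0] * (m + 1)    # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            # Find the unused column with the smallest reduced cost.
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so that column becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Augment: flip the matching along the path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return -v[0], assignment
```

For example, `hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` assigns row 0 to column 1, row 1 to column 0, and row 2 to column 2 for a total cost of 5.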

Closed-form and numerical solution of the static and dynamic analysis of coupled shear walls by the continuous method and the modified transfer matrix method

  • Mao C. Pinto
    • Structural Engineering and Mechanics / v.86 no.1 / pp.49-68 / 2023
  • This study investigates the static and dynamic structural analysis of symmetrical and asymmetrical coupled shear walls using the continuous and modified transfer matrix methods by idealizing the coupled shear wall as a three-field CTB-type replacement beam. The coupled shear wall is modeled as a continuous structure consisting of the parallel coupling of a Timoshenko beam in tension (with axial extensibility in the shear walls) and a shear beam (replacing the beam coupling effect between the shear walls). The variational method using the Hamilton principle is used to obtain the coupled differential equations and the boundary conditions associated with the model. Using the continuous method, closed-form analytical solutions to the differential equation for the coupled shear wall with uniform properties along the height are derived, and a numerical solution using the modified transfer matrix is proposed to overcome the difficulty of coupled shear walls with non-uniform properties along the height. The computational advantage of the modified transfer matrix method compared to the classical method is shown. The results of the numerical examples and the parametric analysis show that the proposed analytical and numerical model and method are accurate and reliable, involve reduced processing time for generalized static and dynamic structural analysis of coupled shear walls at a preliminary stage, and can be used as a verification method in the final stage of the project.

A Comparison of TDMA, Dirty Paper Coding, and Beamforming for Multiuser MIMO Relay Networks

  • Li, Jianing;Zhang, Jianhua;Zhang, Yu;Zhang, Ping
    • Journal of Communications and Networks / v.10 no.2 / pp.186-193 / 2008
  • A two-hop multiple-input multiple-output (MIMO) relay network comprising a multiple-antenna source, an amplify-and-forward MIMO relay, and many potential users is studied in this paper. Taking the achievable sum rate as the performance metric, a joint design method for the processing units of the BS and the relay node is proposed. The optimal structures are given, which decompose the multiuser MIMO relay channel into several parallel single-input single-output relay channels. With these structures, the signal-to-noise ratio at the destination users is derived, and the power allocation is proved to be a convex problem. We also show that a high sum rate can be achieved by pairing each link according to its magnitude. The sum rates of three broadcast strategies, time division multiple access (TDMA) to the strongest user, dirty paper coding (DPC), and beamforming (BF), are investigated. The sum rate bounds of these strategies and the gain in sum capacity (achieved by DPC) over TDMA and BF are given. These results make it easy to determine how far TDMA and BF are from being optimal in terms of the achievable sum rate.