• Title/Summary/Keyword: Parallel Calculation

FPGA Design of a Parallel Canny Edge Detector with Optimized Local Buffers (로컬 버퍼 최적화를 통한 병렬 처리 캐니 경계선 검출기의 FPGA 설계)

  • Ingi Min;Suhyun Sim;Seungwon Hwang;Sunhee Kim
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.4
    • /
    • pp.59-65
    • /
    • 2023
  • Edge detection is one of the most fundamental operations in image processing and computer vision. The Canny edge detection algorithm performs well and is widely used, but its complexity makes real-time processing difficult. In this study, the equations required by the algorithm were simplified to facilitate hardware implementation, and a parallel structure was used to increase calculation speed. In particular, the size and management of the local buffers were chosen with parallel processing and filter size in mind so that data could be processed without bottlenecks. The design was written in Verilog and implemented on an FPGA to verify its operation and performance.
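
The abstract does not say which equations were simplified. As a minimal sketch of one simplification commonly used in hardware edge detectors, the gradient magnitude sqrt(Gx^2 + Gy^2) can be approximated by |Gx| + |Gy|, which avoids the square root. The function names and the choice of 3x3 Sobel kernels below are assumptions for illustration, not the authors' design.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient_l1(window):
    """Approximate the gradient magnitude of a 3x3 window as |Gx| + |Gy|."""
    gx = sum(SOBEL_X[r][c] * window[r][c] for r in range(3) for c in range(3))
    gy = sum(SOBEL_Y[r][c] * window[r][c] for r in range(3) for c in range(3))
    return abs(gx) + abs(gy)


def gradient_image(image):
    """Apply gradient_l1 to every interior pixel; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        # These three rows are what a hardware design would keep in line buffers.
        rows = image[y - 1:y + 2]
        for x in range(1, width - 1):
            window = [row[x - 1:x + 2] for row in rows]
            out[y][x] = gradient_l1(window)
    return out
```

In a parallel hardware version, several neighbouring windows would be evaluated in the same clock cycle, which is why the buffer size has to be matched to both the filter size and the degree of parallelism.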

Scattering Model for Electrical-Large Target Employing MLFMA and Radar Imaging Formation

  • Wu, Xia;Jin, Yaqiu
    • Journal of electromagnetic engineering and science
    • /
    • v.10 no.3
    • /
    • pp.166-170
    • /
    • 2010
  • High-frequency approaches have usually been applied to numerically calculate electromagnetic scattering from electrically large three-dimensional (3D) objects, but the accuracy and feasibility of these geometrical and physical optics (GO-PO) approaches still leave room for improvement. In this paper, a new framework is developed that calculates the near-field scattering of an electrically large 3D target using the multilevel fast multipole algorithm (MLFMA) and generates radar images using a fast back-projection (FBP) algorithm. MPI (Message Passing Interface) parallel computing is used to greatly improve calculation efficiency. Finally, a simple example of a perfectly electric conducting (PEC) patch and a canonical case of an F-16 Fighting Falcon are presented.

Practical Calculation of Iron Loss for Cylindrical Linear Machine

  • Jeong, Sung-In
    • Journal of Electrical Engineering and Technology
    • /
    • v.13 no.5
    • /
    • pp.1901-1907
    • /
    • 2018
  • This paper studies accurate iron loss calculation for a cylindrical linear machine used in a free-piston engine. It shows that power loss in ferromagnetic laminations under magnetic flux can be predicted accurately by explicitly considering how the hysteresis, classical, and excess loss components depend on the magnetic induction derivative. Significant iron loss in the armature core not only compromises machine efficiency but may also cause excessive heating, which could lead to irreversible deterioration in machine performance. Correct prediction of power losses under a distorted flux waveform is therefore an important prerequisite to machine design, particularly for large apparatus subject to stringent efficiency standards. Finally, the paper discusses iron loss in various materials of the cylindrical linear electric machine in terms of geometric and electrical parameters. It gives detailed information about the design and design rules of the cylindrical linear machine, and in parallel makes tools available for calculation, simulation, and design.

A Study on Distributed System Construction and Numerical Calculation Using Raspberry Pi

  • Ko, Young-ho;Heo, Gyu-Seong;Lee, Sang-Hyun
    • International journal of advanced smart convergence
    • /
    • v.8 no.4
    • /
    • pp.194-199
    • /
    • 2019
  • As system performance increases, more data is processed in parallel rather than serially. Today's CPU architectures are designed to exploit multiple cores, and data processing methods are being developed to enable parallel processing accordingly. In recent years desktop CPUs have gained more cores, data volumes are growing exponentially, and the need for data processing is also growing as artificial intelligence develops. Neural networks in artificial intelligence are built from matrices, which makes them well suited to parallel processing. Against this background, this paper aims to speed up processing by building a cluster and a parallel processing system from Raspberry Pi boards. The Raspberry Pi is a credit card-sized single-board computer made by the Raspberry Pi Foundation in England, developed for education in schools and developing countries. It is cheap, and because many people use it, information about it is easy to find. A distributed processing system must be supported by programs that connect multiple computers in parallel and run on the resulting system. The Raspberry Pi boards are connected to a switching hub, communicate with each other over the internal network, and implement parallel processing using the Message Passing Interface (MPI). Parallel programs can be written in Python, and C or Fortran can also be used. The system was tested by multiplying a two-dimensional array of size 10000 by 0.1. The tests showed a reduction in computation time and showed that the work can be parallelized up to the maximum number of cores in the system. The systems in this paper were built on Linux-based single-board computers and are thought to require testing on systems in other environments.
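
The test described above (scaling a two-dimensional array by a constant) follows a scatter, compute, gather pattern. The sketch below shows only that decomposition, using threads from the Python standard library as a stand-in for MPI ranks; it is not the authors' program, and the function names are hypothetical. On the cluster each block would be sent to a separate MPI process instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(matrix, parts):
    """Split a list of rows into `parts` contiguous blocks of nearly equal size."""
    base, extra = divmod(len(matrix), parts)
    blocks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        blocks.append(matrix[start:start + size])
        start += size
    return blocks


def scale_rows(rows, factor):
    """Work done by one worker: multiply every element of its block by factor."""
    return [[value * factor for value in row] for row in rows]


def parallel_scale(matrix, factor, workers=4):
    """Scatter row blocks to workers, scale them, and gather the results in order."""
    blocks = split_rows(matrix, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scaled = list(pool.map(scale_rows, blocks, [factor] * len(blocks)))
    return [row for block in scaled for row in block]
```

Because of the global interpreter lock, CPython threads will not speed up this arithmetic; the point is the row-block decomposition, which maps directly onto one block per core.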

A Disk Allocation Scheme for High-Performance Parallel File System (고성능 병렬화일 시스템을 위한 디스크 할당 방법)

  • Park, Kee-Hyun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.9
    • /
    • pp.2827-2835
    • /
    • 2000
  • In recent years, much attention has been focused on improving the processing speed of I/O devices, which is essential in large-scale data processing areas such as multimedia. Studies on high-performance parallel file systems are one such effort. In this paper, an efficient disk allocation scheme is proposed for high-performance parallel file systems. Specifically, the parallelism of a parallel disk file is defined using the data declustering characteristic of the given parallel file. Based on this concept, the proposed scheme calculates the appropriate degree of data declustering on disks for each parallel file so as to obtain the maximum throughput when more than one parallel file is used at the same time. Since the calculation for obtaining the maximum throughput becomes too complex as the number of parallel files increases, an approximate disk allocation algorithm is also proposed. The approximate algorithm is very simple and gives especially good results when the I/O workload is high. In addition, it is shown that the approximate algorithm provides the optimal disk allocation for maximum throughput when the arrival rate of I/O requests is infinite.

High-Performance VLSI Architecture for Stereo Vision (스테레오 비전을 위한 고성능 VLSI 구조)

  • Seo, Youngho;Kim, Dong-Wook
    • Journal of Broadcast Engineering
    • /
    • v.18 no.5
    • /
    • pp.669-679
    • /
    • 2013
  • This paper proposes a new VLSI (Very Large Scale Integration) architecture for real-time stereo matching. By analyzing the calculations in stereo matching, we minimized the amount of calculation and the number of memory accesses. Based on this analysis, we propose a new stereo matching calculation cell and a hardware architecture that expands the cell in parallel so that the cost function is calculated concurrently for all pixels in the search range. We then extend this architecture to calculate the cost function over a 2-dimensional region. The implemented hardware operates at a clock frequency of at least 250 MHz in an FPGA (Field Programmable Gate Array) environment and achieves 805 fps for a search range of 64 pixels and an image size of $640{\times}480$.
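
To make "calculates the cost function for all pixels in a search range" concrete, here is a minimal software sketch for one scanline. The abstract does not name the cost function, so the sum of absolute differences (SAD) used here is an assumption, as are the function names. The loop over `d` is the part a parallel architecture would evaluate concurrently, one cell per disparity.

```python
def sad_costs(left_row, right_row, x, search_range, half_window=1):
    """Return the SAD cost of left pixel x for each disparity 0..search_range-1."""
    if x - half_window - (search_range - 1) < 0 or x + half_window >= len(left_row):
        raise ValueError("window or search range falls outside the scanline")
    costs = []
    for d in range(search_range):  # one hardware cell per disparity
        cost = 0
        for k in range(-half_window, half_window + 1):
            cost += abs(left_row[x + k] - right_row[x + k - d])
        costs.append(cost)
    return costs


def best_disparity(left_row, right_row, x, search_range, half_window=1):
    """Pick the disparity with the lowest cost (first one on ties)."""
    costs = sad_costs(left_row, right_row, x, search_range, half_window)
    return min(range(len(costs)), key=costs.__getitem__)
```

Extending the window from one row to several rows gives the 2-dimensional cost mentioned in the abstract.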

Molecular Dynamics Free Energy Simulation Study to Rationalize the Relative Activities of PPAR δ Agonists

  • Lee, Woo-Jin;Park, Hwang-Seo;Lee, Sangyoub
    • Bulletin of the Korean Chemical Society
    • /
    • v.29 no.2
    • /
    • pp.363-371
    • /
    • 2008
  • We assess the usefulness of molecular dynamics free energy (MDFE) simulation with explicit solvent as a computational method for discovering effective agonists for PPARδ, in terms of accuracy and computing cost. For this purpose, we establish an efficient computational protocol for thermodynamic integration (TI) that is superior to the free energy perturbation (FEP) method in a parallel computing environment. Using this protocol, the relative binding affinities of GW501516 and its derivatives for PPARδ are calculated. The accuracy of the protocol was evaluated in two steps. First, we devise a thermodynamic cycle to calculate the absolute and relative hydration free energies of test molecules, which allows a self-consistent check of the accuracy of the calculation protocol. Second, the calculated relative binding affinities of the selected ligands are compared with experimental IC50 values. The average deviation of the calculated binding free energies from the experimental results is at most 1 kcal/mol. The computational efficiency of the protocol is also assessed by comparing its execution times with those of the sequential version of the TI protocol. The results show that the calculation is accelerated by a factor of 4 compared with the sequential run. Based on the calculations with the parallel computational protocol, a new GW501516 derivative is proposed as a potential agonist.
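
Thermodynamic integration estimates a free energy difference as the integral of the ensemble average of dU/dλ over the coupling parameter λ from 0 to 1. The simulation at each λ value is independent of the others, which is presumably what makes the method convenient to run in parallel. The sketch below only shows the final integration step with the trapezoidal rule; the function name and any input values are hypothetical, and the authors' actual protocol is not described in the abstract.

```python
def thermodynamic_integration(lambdas, mean_dudl):
    """Trapezoidal estimate of the integral of <dU/dlambda> over lambda.

    lambdas   -- increasing coupling-parameter values, e.g. from 0.0 to 1.0
    mean_dudl -- ensemble average of dU/dlambda from the simulation at each value
    """
    if len(lambdas) != len(mean_dudl) or len(lambdas) < 2:
        raise ValueError("need at least two matching lambda windows")
    total = 0.0
    for i in range(1, len(lambdas)):
        width = lambdas[i] - lambdas[i - 1]
        total += 0.5 * width * (mean_dudl[i] + mean_dudl[i - 1])
    return total
```

Each entry of `mean_dudl` would come from a separate simulation, so the windows can be distributed across processors and only this cheap summation has to run at the end.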

Real-time Stabilization Method for Video acquired by Unmanned Aerial Vehicle (무인 항공기 촬영 동영상을 위한 실시간 안정화 기법)

  • Cho, Hyun-Tae;Bae, Hyo-Chul;Kim, Min-Uk;Yoon, Kyoungro
    • Journal of the Semiconductor & Display Technology
    • /
    • v.13 no.1
    • /
    • pp.27-33
    • /
    • 2014
  • Because unmanned aerial vehicles (UAVs) are lightweight, video captured from them is affected by the natural environment, especially wind, and the UAV's shaking makes the video shake. The objective of this paper is to produce stabilized video by removing the shakiness from video acquired by a UAV. The stabilizer estimates the camera's motion by calculating the optical flow between two successive frames. The estimated camera motion contains both intended movements and unintended shaking. The unintended movements are eliminated by a smoothing process. Experimental results showed that the proposed method performs almost as well as other off-line stabilizers. However, estimating the camera's motion, i.e., calculating the optical flow, becomes a bottleneck for real-time stabilization. To solve this problem, we built a parallel stabilizer that produces stabilized video at an average of 30 frames per second. The proposed method can be used for video acquired by UAVs and also for shaky video from non-professional users. It can also be used in other fields that require object tracking or accurate image analysis and representation.
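
The smoothing step can be illustrated on a single motion component. In this sketch the per-frame motions (which the paper obtains from optical flow) are accumulated into a camera path, the path is smoothed, and the difference between the two is the correction applied to each frame. The abstract does not specify the smoothing filter, so the moving average and the function names here are assumptions.

```python
def cumulative(motions):
    """Accumulate per-frame motion estimates into a camera path."""
    path, total = [], 0.0
    for m in motions:
        total += m
        path.append(total)
    return path


def moving_average(path, radius):
    """Smooth the path with a centred window, truncated at both ends."""
    out = []
    for i in range(len(path)):
        lo, hi = max(0, i - radius), min(len(path), i + radius + 1)
        out.append(sum(path[lo:hi]) / (hi - lo))
    return out


def stabilizing_corrections(motions, radius=2):
    """Per-frame offset that moves the actual path onto the smoothed path."""
    path = cumulative(motions)
    smooth = moving_average(path, radius)
    return [s - p for s, p in zip(smooth, path)]
```

A steady pan (constant motion) yields zero correction away from the ends of the clip, so intended movement is preserved while frame-to-frame jitter is averaged out.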

Architecture design for speeding up Multi-Access Memory System(MAMS) (Multi-Access Memory System(MAMS)의 속도 향상을 위한 아키텍처 설계)

  • Ko, Kyung-sik;Kim, Jae Hee;Lee, S-Ra-El;Park, Jong Won
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.54 no.6
    • /
    • pp.55-64
    • /
    • 2017
  • High-capacity, high-definition image applications need to process considerable amounts of data at high speed, so users of these applications demand a high-speed parallel execution system. To increase the speed of a parallel execution system, Park (2004) proposed a technique called MAMS (Multi-Access Memory System), which lets several execution units access data without conflicts among the parallel processing memories. Since then, many studies on MAMS have been conducted, extending the technique to MAMS-PP16 and MAMS-PP64, among others. As a memory architecture for parallel processing, MAMS must be built on one chip; therefore, a way to achieve the same functionality as the existing MAMS with a smaller architecture needs to be studied. This study proposes a method of miniaturizing the MAMS architecture. The ACR (Address Calculation and Routing) and MMS (Memory Module Selection) circuits deliver data from the memories to the parallel execution units (PEs); the proposed design omits the MMS circuit and instead builds its function inside the ACR circuit from one shift and as many conditional statements as there are memory modules. To verify the performance of the realized architecture, the processing time of the proposed MAMS-PP64 was measured in an image correlation test, and the results showed that the ratio of the image correlation from the proposed architecture improved by 1.05 on average.

Dynamics Modeling and Control of a Delta High-speed Parallel Robot (Delta 고속 병렬로봇의 동역학 모델링 및 제어)

  • Kim, Han Sung
    • Journal of the Korean Society of Manufacturing Process Engineers
    • /
    • v.13 no.5
    • /
    • pp.90-97
    • /
    • 2014
  • This paper presents a simplified dynamics model, dynamics simulations, and computed torque control experiments for the Delta high-speed parallel robot. Using the Newton-Euler method, a simplified but accurate dynamics model is derived under practical assumptions. Accurate and fast calculation of the dynamics is essential to computed torque control in high-speed applications. The simplified dynamics equation was found to be in very good agreement with the ADAMS model, and the calculation time of the inverse kinematics and inverse dynamics is about 0.04 ms. In the dynamics simulations, the cycle trajectory along the y-axis requires less peak motor torque, a lower angular velocity, and less power than that along the x-axis. The computed torque control scheme can reduce the position error by half compared with a PD control scheme. Finally, the developed Delta parallel robot prototype, half the size of the ABB Flexpicker robot, can achieve a cycle time of 0.43 sec with a 1.0 kg payload.
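
For readers unfamiliar with computed torque control, the textbook single-joint form is sketched below: the commanded torque is the model inertia times a PD-corrected reference acceleration, plus the modelled bias torque (gravity, Coriolis, friction). This is a generic illustration, not the paper's Delta robot model, where the inertia and bias terms would come from the multi-axis inverse dynamics; all parameter names are hypothetical.

```python
def computed_torque(q, qd, q_des, qd_des, qdd_des, inertia, bias, kp, kv):
    """Single-joint computed torque law.

    tau = inertia * (qdd_des + kv * (qd_des - qd) + kp * (q_des - q)) + bias
    """
    position_error = q_des - q
    velocity_error = qd_des - qd
    reference_accel = qdd_des + kv * velocity_error + kp * position_error
    return inertia * reference_accel + bias
```

With zero tracking error the law reduces to pure feedforward from the dynamics model, which is why the model has to be both accurate and fast enough to evaluate inside every control cycle.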