• Title/Summary/Keyword: in-memory computing

Adaptive White Point Extraction based on Dark Channel Prior for Automatic White Balance

  • Jo, Jieun;Im, Jaehyun;Jang, Jinbeum;Yoo, Yoonjong;Paik, Joonki
    • IEIE Transactions on Smart Processing and Computing / v.5 no.6 / pp.383-389 / 2016
  • This paper presents a novel automatic white balance (AWB) algorithm for consumer imaging devices. While existing AWB methods require reference white patches to correct color, the proposed method performs the AWB function using only an input image in two steps: i) white point detection, and ii) color constancy gain computation. Based on the dark channel prior assumption, a white point or region can be accurately extracted, because the intensity of a sufficiently bright achromatic region is higher than that of other regions in all color channels. In order to finally correct the color, the proposed method computes color constancy gain values based on the Y component in the XYZ color space. Experimental results show that the proposed method gives better color-corrected images than recent existing methods. Moreover, the proposed method is suitable for real-time implementation, since it does not need a frame memory for iterative optimization. As a result, it can be applied to various consumer imaging devices, including mobile phone cameras, compact digital cameras, and computational cameras with coded color.
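The two steps named in the abstract above (white point detection, then gain computation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes linear RGB pixels in [0, 1], uses a per-pixel dark channel (the minimum over R, G, B) with no local patch, picks the top fraction of pixels as white-point candidates, and takes Y from the standard linear-sRGB-to-XYZ weights. The function names and the `top_fraction` parameter are hypothetical.

```python
def estimate_white_point(pixels, top_fraction=0.05):
    """Average the pixels whose smallest channel is largest.

    A bright achromatic pixel is high in all three channels, so its
    per-pixel dark channel min(r, g, b) is high as well.
    """
    ranked = sorted(pixels, key=min, reverse=True)
    count = max(1, int(len(ranked) * top_fraction))
    brightest = ranked[:count]
    return tuple(sum(p[c] for p in brightest) / count for c in range(3))


def luminance_y(rgb):
    """Y component of XYZ for linear sRGB primaries (assumed color space)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def constancy_gains(white):
    """Per-channel gains that map the estimated white point to (Y, Y, Y)."""
    y = luminance_y(white)
    return tuple(y / max(channel, 1e-6) for channel in white)


def white_balance(pixels, top_fraction=0.05):
    """Single pass over the image: estimate white, compute gains, apply them."""
    white = estimate_white_point(pixels, top_fraction)
    gains = constancy_gains(white)
    corrected = [
        tuple(min(1.0, p[c] * gains[c]) for c in range(3)) for p in pixels
    ]
    return corrected, white, gains
```

Because the gains come from one pass over the input image, the sketch needs no iterative optimization, which is consistent with the real-time claim in the abstract.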

A NOVEL PARALLEL METHOD FOR SPECKLE MASKING RECONSTRUCTION USING THE OPENMP

  • LI, XUEBAO;ZHENG, YANFANG
    • Journal of The Korean Astronomical Society / v.49 no.4 / pp.157-162 / 2016
  • High-resolution reconstruction techniques such as speckle masking are developed to enhance the spatial resolution of observational images from ground-based solar telescopes. Near real-time reconstruction performance is achieved on a high performance cluster using the Message Passing Interface (MPI). However, much of the time in such a speckle reconstruction is spent reconstructing solar subimages. We design and implement a novel parallel method for speckle masking reconstruction of a solar subimage on a shared memory machine using OpenMP. Real tests are performed to verify the correctness of our codes. We present the details of several parallel reconstruction steps. The parallel implementation of the various modules shows a large speed increase compared to the single-thread serial implementation, and a speedup of about 2.5 is achieved in one subimage reconstruction. The timing results for reconstructing one subimage of 256×256 pixels show a clear advantage as the number of threads increases. This novel parallel method can be valuable in real-time reconstruction of solar images, especially after porting to a high performance cluster.

Design and Implementation of Auto Set-up Program for SFP Module by using VEE (VEE를 이용한 SFP 모듈 자동 설정 프로그램 설계 및 개발)

  • Choi, Jeoung-Hoon;Jun, Byung-Uk;Koo, Yong-Wan
    • Journal of Internet Computing and Services / v.8 no.2 / pp.67-76 / 2007
  • Data used by the SFP module are stored in the A0 and A2 memory areas defined by the SFP-MSA standard. In this paper, an auto set-up program for the SFP module is designed and implemented. To implement the Digital Diagnostic Monitoring Interface, specific values are written into designated registers of the LD driver IC over an RS232 communication channel. Agilent VEE is used as the programming language for factory automation, and the implementation is structured around the optical characteristics of the SFP module and the SFP-MSA standard. The implemented program has been applied on the manufacturing floor, where the system achieved a greater effect than that obtained with 6-Sigma.

Selecting a Synthesizable RISC-V Processor Core for Low-cost Hardware Devices

  • Gookyi, Dennis Agyemanh Nana;Ryoo, Kwangki
    • Journal of Information Processing Systems / v.15 no.6 / pp.1406-1421 / 2019
  • The Internet-of-Things (IoT) has been deployed in almost every facet of our day-to-day activities. This is possible because sensing and data collection devices have been given computing and communication capabilities. The devices implement System-on-Chips (SoCs) that incorporate many functionalities, yet they are severely constrained in terms of memory capacity, hardware area, and power consumption. As the functionality of sensing devices increases, there is a need for low-cost synthesizable processors to handle control, interfacing, and error processing. The first step in selecting a synthesizable processor core for low-cost devices is to examine its hardware resource utilization to make sure that it fulfills the requirements of the device. This paper analyzes the hardware resource usage of ten synthesizable processors that implement the Reduced Instruction Set Computer Five (RISC-V) Instruction Set Architecture (ISA). All ten processors are synthesized using Vivado v2018.02. The maximum frequency, area, and power reports are extracted and compared to determine which processor is ideal for low-cost hardware devices.

A Study on Context Aware Middleware Design and Application (상황인식 미들웨어의 설계와 적용에 관한 연구)

  • Jang, Dong-Wook;Sohn, Surg-Won;Han, Kwang-Rok
    • The KIPS Transactions:PartD / v.18D no.5 / pp.393-402 / 2011
  • This paper describes the design and application of middleware that is essential to a context-aware system. We define a transducer interface protocol to handle a variety of context data. To process data systematically between middleware modules, a message-oriented middleware is designed and implemented. The context-aware middleware adopts a service-oriented architecture so that the functions in each module are independent and the system scales well. Passing messages between modules reduces the complexity of application development. To demonstrate the usefulness of the proposed context-aware middleware, we carried out experiments in a bridge health monitoring system and verified its efficacy.

GCC2Verilog Compiler Toolset for Complete Translation of C Programming Language into Verilog HDL

  • Huong, Giang Nguyen Thi;Kim, Seon-Wook
    • ETRI Journal / v.33 no.5 / pp.731-740 / 2011
  • Reconfigurable computing using a field-programmable gate-array (FPGA) device has become a promising solution in system design because of its power efficiency and design flexibility. To bring the benefits of FPGAs to many application programmers, there has been intensive research on automatic translation from high-level programming languages (HLL) such as C and C++ into hardware. However, the large gap in syntax and semantics between hardware and software programming makes the translation challenging. In this paper, we introduce a new approach to the translation that uses the widely used GCC compiler. By simply adding a hardware description language (HDL) backend to the existing state-of-the-art compiler, we could minimize the effort to implement the translator while supporting the full features of the HLL in the HLL-to-HDL translation and providing high performance. Our translator, called GCC2Verilog, was implemented as a GCC cross compiler targeting FPGAs instead of microprocessor architectures. Our experiment shows that we could achieve a speedup of up to 34 times, and 17 times on average, with 4-port memory over PICO microprocessor execution in selected EEMBC benchmarks.

Particle Swarm Optimization in Gated Recurrent Unit Neural Network for Efficient Workload and Resource Management (효율적인 워크로드 및 리소스 관리를 위한 게이트 순환 신경망 입자군집 최적화)

  • Ullah, Farman;Jadhav, Shivani;Yoon, Su-Kyung;Nah, Jeong Eun
    • Journal of the Semiconductor & Display Technology / v.21 no.3 / pp.45-49 / 2022
  • The fourth industrial revolution, the internet of things, and the expansion of online web services have driven exponential growth in the number and deployment of cloud data centers (CDCs). The cloud is emerging as a new paradigm for delivering Internet-based computing services. Because workloads and resource availability are dynamic and non-linear, efficient workload and resource management is a critical problem. In this paper, we propose a particle swarm optimization (PSO) based gated recurrent unit (GRU) neural network to efficiently predict the future CPU and memory usage in cloud data centers. We investigate the hyper-parameters of the GRU to obtain a better model for predicting cloud resources. We use the Google Cluster traces to evaluate the proposed PSO-GRU prediction. The experimental results show the effectiveness of the proposed algorithm.
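As a rough illustration of how PSO can search hyper-parameters, the sketch below runs a textbook global-best PSO over two values. It is not the paper's PSO-GRU procedure. No GRU is trained here; `toy_validation_loss` is a hypothetical stand-in for the validation error of a model with the given hidden size and log10 learning rate, and the swarm settings (`inertia`, `c_personal`, `c_global`) are common defaults chosen as assumptions.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Global-best particle swarm optimization over a box-bounded space."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]          # personal bests
    best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=best_val.__getitem__)
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (best_pos[i][d] - positions[i][d])
                    + c_global * r2 * (global_pos[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(hi, max(lo, moved))  # clamp to bounds
            value = objective(positions[i])
            if value < best_val[i]:
                best_pos[i], best_val[i] = positions[i][:], value
                if value < global_val:
                    global_pos, global_val = positions[i][:], value
    return global_pos, global_val


def toy_validation_loss(params):
    """Hypothetical loss surface with its minimum at 64 units, lr = 1e-3."""
    hidden_units, log_lr = params
    return ((hidden_units - 64.0) / 64.0) ** 2 + (log_lr + 3.0) ** 2


best_params, best_loss = pso_minimize(
    toy_validation_loss, bounds=[(8.0, 256.0), (-5.0, -1.0)]
)
```

In a real setting the objective would train and validate a GRU on usage traces for each candidate, which makes every evaluation expensive; that cost is why the particle and iteration counts are kept small here.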

Abnormal Electrocardiogram Signal Detection Based on the BiLSTM Network

  • Asif, Husnain;Choe, Tae-Young
    • International Journal of Contents / v.18 no.2 / pp.68-80 / 2022
  • The health of the human heart is commonly measured using ECG (Electrocardiography) signals. To identify any anomaly in the human heart, the time sequence of ECG signals is examined manually by a cardiologist or cardiac electrophysiologist. Lightweight anomaly detection on ECG signals in an embedded system is expected to become popular in the near future because of the increasing number of heart disease symptoms. Some previous research uses deep learning networks such as LSTM and BiLSTM to detect anomalous signals without any handcrafted features. Unfortunately, lightweight LSTMs show low precision, and heavy LSTMs require substantial computing power and large volumes of labeled data for symptom classification. This paper proposes an ECG anomaly detection system based on a two-level BiLSTM that achieves acceptable precision while remaining lightweight enough to be usable at home. This paper also presents a new threshold technique that considers the statistics of the current ECG pattern. The proposed BiLSTM model detects ECG signal anomalies with F1 scores of 0.467 to 1.0, compared with 0.426 to 0.978 for a similar LSTM-based model, except on one highly noisy dataset.
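The F1 score reported above is the harmonic mean of precision and recall. The sketch below computes it and pairs it with one simple statistics-based threshold. The abstract does not specify the paper's threshold technique, so the rule used here (mean plus `k` standard deviations of the current recording's anomaly scores) is an assumption, and `adaptive_threshold`, `k`, and the score values are hypothetical.

```python
from statistics import mean, pstdev


def adaptive_threshold(scores, k=1.5):
    """Threshold derived from the statistics of the current recording.

    Assumed rule: flag anything more than k standard deviations above
    the mean anomaly score.
    """
    return mean(scores) + k * pstdev(scores)


def f1_score(predicted, actual):
    """F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-beat anomaly scores; beats 4 and 8 are labeled abnormal.
scores = [0.10, 0.12, 0.09, 0.11, 0.95, 0.10, 0.13, 0.08, 0.90, 0.10]
labels = [False, False, False, False, True, False, False, False, True, False]

threshold = adaptive_threshold(scores)
flags = [s > threshold for s in scores]
score = f1_score(flags, labels)
```

A larger `k` makes the detector more conservative: it raises precision at the cost of recall, which is the trade-off the F1 score summarizes.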

MPI-OpenMP Hybrid Parallelization for Multibody Peridynamic Simulations (다물체 페리다이나믹 해석을 위한 MPI-OpenMP 혼합 병렬화)

  • Lee, Seungwoo;Ha, Youn Doh
    • Journal of the Computational Structural Engineering Institute of Korea / v.33 no.3 / pp.171-178 / 2020
  • In this study, we develop MPI-OpenMP hybrid parallelization for multibody peridynamic simulations. Peridynamics is suitable for analyzing complicated dynamic fractures and various discontinuities. However, compared with a conventional finite element method, nonlocal interactions in peridynamics cost more time and memory. In multibody peridynamic analysis, the costs increase due to the additional interactions that occur when computing the nonlocal contact and ghost interlayer models between adjacent bodies. The costs become excessive when further refinement and smaller time steps are required in cases of high-velocity impact fracturing or similar instances. Thus, high computational efficiency and performance can be achieved by parallelization and optimization of multibody peridynamic simulations. The analytical code is developed using an Intel Fortran MPI compiler and OpenMP in NURION of the KISTI HPC center and parallelized through MPI-OpenMP hybrid parallelization. Further parallelization is conducted by hybridizing with OpenMP threads in each MPI process. We also try to minimize communication operations by model-based decomposition of MPI processes. The numerical results for the impact fracturing of multiple bodies show that the computing performance improves significantly with MPI-OpenMP hybrid parallelization.

Domain Decomposition Approach Applied for Two- and Three-dimensional Problems via Direct Solution Methodology

  • Kwak, Jun Young;Cho, Haeseong;Chun, Tae Young;Shin, SangJoon;Bauchau, Olivier A.
    • International Journal of Aeronautical and Space Sciences / v.16 no.2 / pp.177-189 / 2015
  • This paper presents an all-direct domain decomposition approach for large-scale structural analysis. The proposed approach achieves computational robustness and efficiency by enforcing the compatibility of the displacement field across the sub-domain boundaries via local Lagrange multipliers and an augmented Lagrangian formulation (ALF). The proposed domain decomposition approach is compared to the existing FETI approach in terms of computational time and memory usage. The parallel implementation of the proposed algorithm is described in detail. Finally, a preliminary validation of the proposed approach is attempted, and the numerical results for two- and three-dimensional problems are compared to those obtained through a dual-primal FETI approach. The results indicate an improvement in performance as a result of implementing the proposed approach.
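For orientation, a generic augmented Lagrangian functional with local multipliers can be written as follows. This is the textbook form for linear statics, not necessarily the exact functional used in the paper, and every symbol is assumed notation: $K_i$, $f_i$, and $u_i$ are the stiffness matrix, load vector, and displacements of sub-domain $i$; $B_i$ extracts its boundary degrees of freedom; $c$ is an independent interface displacement and $L_i$ maps it onto the boundary of sub-domain $i$; $\lambda_i$ are the local Lagrange multipliers; and $p > 0$ is a penalty parameter.

```latex
\Pi(u_1,\dots,u_N,\,c,\,\lambda_1,\dots,\lambda_N)
  = \sum_{i=1}^{N} \left[
      \tfrac{1}{2}\, u_i^{T} K_i\, u_i - u_i^{T} f_i
      + \lambda_i^{T} \left( B_i u_i - L_i c \right)
      + \tfrac{p}{2} \left\lVert B_i u_i - L_i c \right\rVert^{2}
    \right]
```

Stationarity with respect to each $\lambda_i$ recovers the compatibility condition $B_i u_i = L_i c$ on every sub-domain boundary. The penalty term vanishes at the solution, so it leaves the solution unchanged while adding stiffness to the constrained directions.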