• Title/Summary/Keyword: Parallel data processing

Search Result 751, Processing Time 0.03 seconds

Simulation of YUV-Aware Instructions for High-Performance, Low-Power Embedded Video Processors (고성능, 저전력 임베디드 비디오 프로세서를 위한 YUV 인식 명령어의 시뮬레이션)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.5
    • /
    • pp.252-259
    • /
    • 2007
  • With the rapid development of multimedia applications and wireless communication networks, consumer demand for video-over-wireless capability on mobile computing systems is growing rapidly. In this regard, this paper introduces YUV-aware instructions that enhance the performance and efficiency in the processing of color image and video. Traditional multimedia extensions (e.g., MMX, SSE, VIS, and AltiVec) depend solely on generic subword parallelism whereas the proposed YUV-aware instructions support parallel operations on two-packed 16-bit YUV (6-bit Y, 5-bits U, V) values in a 32-bit datapath architecture, providing greater concurrency and efficiency for color image and video processing. Moreover, the ability to reduce data format size reduces system cost. Experiment results on a representative dynamically scheduled embedded superscalar processor show that YUV-aware instructions achieve an average speedup of 3.9x over the baseline superscalar performance. This is in contrast to MMX (a representative Intel#s multimedia extension), which achieves a speedup of only 2.1x over the same baseline superscalar processor. In addition, YUV-aware instructions outperform MMX instructions in energy reduction (75.8% reduction with YUV-aware instructions, but only 54.8% reduction with MMX instructions over the baseline).

Efficient FPGA Logic Design for Rotatory Vibration Data Acquisition (회전체 진동 데이터 획득을 위한 효율적인 FPGA 로직 설계)

  • Lee, Jung-Sik;Ryu, Deung-Ryeol
    • 전자공학회논문지 IE
    • /
    • v.47 no.4
    • /
    • pp.18-27
    • /
    • 2010
  • This paper is designed the efficient Data Acquisition System for an vibration of rotatory machines. The Data Acquisition System is consist of the analog logic having signal filer and amplifier, and digital logic with ADC, DSP, FPGA and FIFO memory. The vibration signal of rotatory machines acquired from sensors is controlled by the FPGA device through the analog logic and is saved to FIFO memory being converted analog to digital signal. The digital signal process is performed by the DSP using the vibration data in FIFO memory. The vibration factor of the rotatory machinery analysis and diagnosis is defined the RMS, Peak to Peak, average, GAP, FFT of vibration data and digital filtering by DSP, and is need to follow as being happened the event of vibration and make an application to an warning system. It takes time to process the several analysis step of all vibration data and the event follow, also special event. It should be continuously performed the data acquisition and the process, however during processing the input signal the DSP can not be performed to the acquisited data after then, also it will be lose the data at several channel. Therefore it is that the system uses efficiently the DSP and FPGA devices for reducing the data lose, it design to process a part of the signal data to FPGA from DSP in order to minimize the process time, and a process to parallel process system, as a result of design system it propose to method of faster process and more efficient data acquisition system by using DSP and FPGA than signal DSP system.

Development of PC-based and portable high speed impedance analyzer for biosensor (바이오센서를 위한 PC 기반의 휴대용 고속 임피던스 분석기 개발)

  • Kim, Gi-Ryon;Kim, Gwang-Nyeon;Heo, Seung-Deok;Lee, Seung-Hoon;Choi, Byeong-Cheol;Kim, Cheol-Han;Jeon, Gye-Rok;Jung, Dong-Keun
    • Journal of Sensor Science and Technology
    • /
    • v.14 no.1
    • /
    • pp.33-41
    • /
    • 2005
  • For more convenient electrode-electrolyte interface impedance analysis in biosensor, a stand-alone impedance measurement system is required. In our study, we developed a PC-based portable system to analyze impedance of the electrochemical cell using microprocessor. The devised system consists of signal generator, programmable amplifiers, A/D converter, low pass filter, potentiostat, I/V converter, microprocessor, and PC interface. As a microprocessor, PIC16F877 which has the processing speed of 5 MIPS was used. For data acquisition, the sampling rate at 40 k samples/sec, resolution of 12 bit is used. RS-232 with 115.2 kbps speed is used for the PC communication. The square wave was used as stimuli signal for impedance analysis and voltage-controlled current measurement method of three-electrode-method were adopted. Acquired voltage and current data are calculated to multifrequency impedance signal after Fourier transform. To evaluate the implemented system, we set up the dummy cell as equivalent circuit of which was composed of resistor, parallel circuit of capacitor and resistor connected in parallel and measured the impedance of the dummy cell; the result showed that there exist accuracy within 5 % errors and reproduction within 1 % errors compared to output of Hioki LCR tester and HP impedance analyzer as a standard product. These results imply that it is possible to analyze electrode-electrolyte interface impedance quantitatively in biosensor and to implement the more portable high speed impedance analysis system compared to existing systems.

Efficient Collaboration Method Between CPU and GPU for Generating All Possible Cases in Combination (조합에서 모든 경우의 수를 만들기 위한 CPU와 GPU의 효율적 협업 방법)

  • Son, Ki-Bong;Son, Min-Young;Kim, Young-Hak
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.7 no.9
    • /
    • pp.219-226
    • /
    • 2018
  • One of the systematic ways to generate the number of all cases is a combination to construct a combination tree, and its time complexity is O($2^n$). A combination tree is used for various purposes such as the graph homogeneity problem, the initial model for calculating frequent item sets, and so on. However, algorithms that must search the number of all cases of a combination are difficult to use realistically due to high time complexity. Nevertheless, as the amount of data becomes large and various studies are being carried out to utilize the data, the number of cases of searching all cases is increasing. Recently, as the GPU environment becomes popular and can be easily accessed, various attempts have been made to reduce time by parallelizing algorithms having high time complexity in a serial environment. Because the method of generating the number of all cases in combination is sequential and the size of sub-task is biased, it is not suitable for parallel implementation. The efficiency of parallel algorithms can be maximized when all threads have tasks with similar size. In this paper, we propose a method to efficiently collaborate between CPU and GPU to parallelize the problem of finding the number of all cases. In order to evaluate the performance of the proposed algorithm, we analyze the time complexity in the theoretical aspect, and compare the experimental time of the proposed algorithm with other algorithms in CPU and GPU environment. Experimental results show that the proposed CPU and GPU collaboration algorithm maintains a balance between the execution time of the CPU and GPU compared to the previous algorithms, and the execution time is improved remarkable as the number of elements increases.

Channel Searching Method of IEEE 802.15.4 Nodes for Avoiding WiFi Traffic Interference (WiFi 트래픽 간섭을 피하기 위한 IEEE 802.15.4 노드의 채널탐색방법)

  • Song, Myong Lyol
    • Journal of Internet Computing and Services
    • /
    • v.15 no.2
    • /
    • pp.19-31
    • /
    • 2014
  • In this paper, a parallel backoff delay procedure on multiple IEEE 802.15.4 channels and a channel searching method considering the frequency spectrum of WiFi traffic are studied for IEEE 802.15.4 nodes to avoid the interference from WiFi traffic. In order to search the channels being occupied by WiFi traffic, we analyzed the methods measuring the powers of adjacent channels simultaneously, checking the duration of measured power levels greater than a threshold, and finding the same periodicity of sampled RSSI data as the beacon frame by signal processing. In an wireless channel overlapped with IEEE 802.11 network, the operation of CSMA-CA algorithm for IEEE 802.15.4 nodes is explained. A method to execute a parallel backoff procedure on multiples IEEE 802.15.4 channels by an IEEE 802.15.4 device is proposed with the description of its algorithm. When we analyze the data measured by the experimental system implemented with the proposed method, it is observed that medium access delay times increase at the same time in the associated IEEE 802.15.4 channels that are adjacent each other during the generation of WiFi traffic. A channel evaluation function to decide the interference from other traffic on an IEEE 802.15.4 channel is defined. A channel searching method considering the channel evaluations on the adjacent channels together is proposed in order to search the IEEE 802.15.4 channels interfered by WiFi, and the experimental results show that it correctly finds the channels interfered by WiFi traffic.

Verification of safety integrity for vital data processing device through quantitative safety analysis (정량적 안전성 분석을 통한 Vital 데이터 처리장치의 안전무결성 요구사항 검증)

  • Choi, Jin-Woo;Park, Jae-Young
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.7
    • /
    • pp.4863-4870
    • /
    • 2015
  • Currently, as a priority to secure the safety of the railway signalling system, verification for satisfy of the safety integrity requirements(SIR) is required to the essential elements. Safety Integrity Requirements(SIR) verification is performed based on the system safety analysis. But the probability of securing basic data for system safety analysis significantly dropped because there is no experience yet performed in the country. Therefore we are had to rely on a qualitative analysis. There are methods such as qualitative risk analysis matrix, and risk graphs. The qualitative analysis is wide, the width of the accident. However, the reliability of the result is significantly less has a disadvantage. Therefore, it should be parallel quantitative safety analysis of the system/products in order to compensate for the disadvantages of the qualitative analysis. This paper presents a quantitative safety analysis method to overcome the disadvantages of the qualitative analysis. And through a result, highly reliable Safety Integrity Requirements(SIR) verification measures proposed. Verification results, the dangerous failure incidence for vital data processing device was calculated to be $1.172279{\times}10^{-9}$. The result was verified to exceed the required safety integrity targets more.

(Theoretical Performance analysis of 12Mbps, r=1/2, k=7 Viterbi deocder and its implementation using FPGA for the real time performance evaluation) (12Mbps, r=1/2, k=7 비터비 디코더의 이론적 성능분석 및 실시간 성능검증을 위한 FPGA구현)

  • Jeon, Gwang-Ho;Choe, Chang-Ho;Jeong, Hae-Won;Im, Myeong-Seop
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.39 no.1
    • /
    • pp.66-75
    • /
    • 2002
  • For the theoretical performance analysis of Viterbi Decoder for wireless LAN with data rate 12Mbps, code rate 1/2 and constraint length 7 defined in IEEE 802.11a, the transfer function is derived using Cramer's rule and the first-event error probability and bit error probability is derived under the AWGN. In the design process, input symbol is quantized into 16 steps for 4 bit soft decision and register exchange method instead of memory method is proposed for trace back, which enables the majority at the final decision stage. In the implementation, the Viterbi decoder based on parallel architecture with pipelined scheme for processing 12Mbps high speed data rate and AWGN generator are implemented using FPGA chips. And then its performance is verified in real time.

Utilizing Local Bilingual Embeddings on Korean-English Law Data (한국어-영어 법률 말뭉치의 로컬 이중 언어 임베딩)

  • Choi, Soon-Young;Matteson, Andrew Stuart;Lim, Heui-Seok
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.10
    • /
    • pp.45-53
    • /
    • 2018
  • Recently, studies about bilingual word embedding have been gaining much attention. However, bilingual word embedding with Korean is not actively pursued due to the difficulty in obtaining a sizable, high quality corpus. Local embeddings that can be applied to specific domains are relatively rare. Additionally, multi-word vocabulary is problematic due to the lack of one-to-one word-level correspondence in translation pairs. In this paper, we crawl 868,163 paragraphs from a Korean-English law corpus and propose three mapping strategies for word embedding. These strategies address the aforementioned issues including multi-word translation and improve translation pair quality on paragraph-aligned data. We demonstrate a twofold increase in translation pair quality compared to the global bilingual word embedding baseline.

Development and Speed Comparison of Convolutional Neural Network Using CUDA (CUDA를 이용한 Convolutional Neural Network의 구현 및 속도 비교)

  • Ki, Cheol-min;Cho, Tai-Hoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.05a
    • /
    • pp.335-338
    • /
    • 2017
  • Currently Artificial Inteligence and Deep Learning are social issues, and These technologies are applied to various fields. A good method among the various algorithms in Artificial Inteligence is Convolutional Neural Network. Convolutional Neural Network is a form that adds convolution layers that extracts features by convolution operation on a general neural network method. If you use Convolutional Neural Network as small amount of data, or if the structure of layers is not complicated, you don't have to pay attention to speed. But the learning time is long as the size of the learning data is large and the structure of layers is complicated. So, GPU-based parallel processing is a lot. In this paper, we developed Convolutional Neural Network using CUDA and Learning speed is faster and more efficient than the method using the CPU.

  • PDF

A Fast Parity Resynchronization Scheme for Small and Mid-sized RAIDs (중소형 레이드를 위한 빠른 패리티 재동기화 기법)

  • Baek, Sung Hoon;Park, Ki-Wong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.2 no.10
    • /
    • pp.413-420
    • /
    • 2013
  • Redundant arrays of independent disks (RAID) without a power-fail-safe component in small and mid-sized business suffers from intolerably long resynchronization time after a unclean power-failure. Data blocks and a parity block in a stripe must be updated in a consistent manner, however a data block may be updated but the corresponding parity block may not be updated when a power goes off. Such a partially modified stripe must be updated with a correct parity block. However, it is difficult to find which stripe is partially updated (inconsistent). The widely-used traditional parity resynchronization manner is a intolerably long process that scans the entire volume to find and fix inconsistent stripes. This paper presents a fast resynchronization scheme with a negligible overhead for small and mid-sized RAIDs. The proposed scheme is integrated into a software RAID driver in a Linux system. According to the performance evaluation, the proposed scheme shortens the resynchronization process from 200 minutes to 5 seconds with 2% overhead for normal I/Os.