• Title/Summary/Keyword: and Parallel Processing


Adaptive Pipeline Architecture for an Asynchronous Embedded Processor (비동기식 임베디드 프로세서를 위한 적응형 파이프라인 구조)

  • Lee, Seung-Sook;Lee, Je-Hoon;Lim, Young-Il;Cho, Kyoung-Rok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.1 / pp.51-58 / 2007
  • This paper presents an adaptive pipeline architecture for a high-performance, low-power asynchronous processor. The proposed pipeline employs a stage-skipping and a stage-combining scheme. The stage-skipping scheme skips the operation of a bubble stage, i.e., a pipeline stage that is not used by the instruction being executed. In the stage-combining scheme, two consecutive stages are joined into one stage when the latter stage is empty. Both schemes reduce processing time and power consumption. The proposed architecture also supports multi-processing in the EX stage, which executes four instructions in parallel. To estimate the efficiency of the proposed pipeline architecture, we designed an asynchronous microprocessor and synthesized it to a gate-level design using a 0.35-μm CMOS standard cell library. The performance of the target processor was evaluated with SPEC2000 benchmark programs. The proposed architecture showed about 2.3 times higher speed than its asynchronous counterpart, AMULET3i. These results indicate that the proposed pipeline schemes and architecture can be used for asynchronous high-speed processor design.
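
As a reading aid, the sketch below is a minimal behavioral Python model of the two schemes named in the abstract: a token skips a bubble stage it does not need (stage-skipping) and fires together with an empty successor stage (stage-combining). The five stage names, the instruction's stage set, and the occupancy model are assumptions for illustration, not the authors' asynchronous design.

```python
# Illustrative model only: which stages an instruction would actually occupy
# under stage-skipping and stage-combining, given a pipeline snapshot.
FULL_PIPE = ["IF", "ID", "EX", "MEM", "WB"]

def effective_path(needed, occupancy):
    """needed: set of stage names the instruction uses.
    occupancy: dict stage -> True if that stage currently holds a token."""
    path, i = [], 0
    while i < len(FULL_PIPE):
        stage = FULL_PIPE[i]
        if stage not in needed:                 # stage-skipping: bubble stage
            i += 1
            continue
        nxt = FULL_PIPE[i + 1] if i + 1 < len(FULL_PIPE) else None
        if nxt in needed and not occupancy.get(nxt, False):
            path.append(stage + "+" + nxt)      # stage-combining: latter stage empty
            i += 2
        else:
            path.append(stage)
            i += 1
    return path

# An ALU instruction that never touches MEM, entering an empty pipeline:
print(effective_path({"IF", "ID", "EX", "WB"},
                     {s: False for s in FULL_PIPE}))   # ['IF+ID', 'EX', 'WB']
```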

Hardware Design of High Performance In-loop Filter in HEVC Encoder for Ultra HD Video Processing in Real Time (UHD 영상의 실시간 처리를 위한 고성능 HEVC In-loop Filter 부호화기 하드웨어 설계)

  • Im, Jun-seong;Dennis, Gookyi;Ryoo, Kwang-ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.401-404 / 2015
  • This paper proposes a high-performance in-loop filter for an HEVC (High Efficiency Video Coding) encoder aimed at real-time Ultra HD video processing. HEVC uses an in-loop filter, consisting of a deblocking filter and SAO (Sample Adaptive Offset), to mitigate the image degradation caused by quantization error. In the proposed in-loop filter hardware architecture, the deblocking filter and SAO form a 2-level hybrid pipeline based on the 32×32 CTU to reduce execution time. The deblocking filter is implemented as a 6-stage pipeline, and the proposed efficient filtering order minimizes memory accesses and simplifies the reference memory structure. The SAO is implemented as a 2-stage pipeline for pixel classification and application of the SAO parameters, and it uses two three-layered parallel buffers to simplify pixel processing and reduce operation cycles. The proposed in-loop filter architecture is designed in Verilog HDL and implemented with 205K logic gates in a TSMC 0.13-μm process. At 110 MHz, the proposed in-loop filter encoder supports 4K Ultra HD video encoding at 30 fps in real time.
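
For a sense of what the 110 MHz / 30 fps claim implies, the short calculation below (an order-of-magnitude check, not a figure from the paper) converts it into a per-CTU cycle budget for 4K UHD frames split into 32×32 CTUs.

```python
# Implied cycle budget per CTU for 3840x2160 @ 30 fps with 32x32 CTUs at 110 MHz.
import math

width, height, fps, ctu, clock_hz = 3840, 2160, 30, 32, 110e6

ctus_per_frame = math.ceil(width / ctu) * math.ceil(height / ctu)  # 120 * 68 = 8160
ctus_per_second = ctus_per_frame * fps                             # 244,800
budget = clock_hz / ctus_per_second                                # ~449 cycles/CTU

print(f"{ctus_per_frame} CTUs per frame; budget ~{budget:.0f} cycles per CTU")
```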


Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.3 / pp.669-678 / 2018
  • This study developed information technology infrastructure for building a driving-environment analysis platform that uses various kinds of big data, such as vehicle sensing data and public data. First, on the hardware side, a small platform server with a parallel structure for distributed big data processing was developed. Next, on the software side, programs for big data collection/storage, processing/analysis, and information visualization were developed. The collection software was implemented as a collection interface using Kafka, Flume, and Sqoop. The storage software was divided between the Hadoop distributed file system and a Cassandra DB according to how the data are used. The processing software performs spatial-unit matching and time-interval interpolation/aggregation of the collected data by applying a grid-index method. The analysis software was developed as an analytical tool based on the Zeppelin notebook for applying and evaluating the developed algorithms. Finally, the information visualization software was developed as a Web GIS engine program for providing and visualizing various kinds of driving-environment information. In the performance evaluation, the number of executors, the optimal memory capacity, and the number of cores for the developed server were derived, and the computation performance was superior to that of the other cloud computing environment.
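
The grid-index spatial matching mentioned above can be pictured with the small sketch below: coordinates are bucketed into fixed-size grid cells so records can be joined and aggregated per cell. The 0.001-degree cell size and the field names are assumptions for illustration, not the platform's actual parameters.

```python
# Toy grid-index aggregation: bucket (lon, lat) samples into cells, then
# aggregate a value (here, mean speed) per cell.
from collections import defaultdict

CELL = 0.001  # grid cell size in degrees (assumed)

def grid_key(lon, lat, cell=CELL):
    """Integer (x, y) index of the grid cell containing the coordinate."""
    return int(lon // cell), int(lat // cell)

def mean_speed_by_cell(records):
    """records: iterable of dicts with 'lon', 'lat', 'speed' keys (assumed schema)."""
    cells = defaultdict(list)
    for r in records:
        cells[grid_key(r["lon"], r["lat"])].append(r["speed"])
    return {k: sum(v) / len(v) for k, v in cells.items()}

sample = [{"lon": 127.0011, "lat": 37.5012, "speed": 42.0},
          {"lon": 127.0013, "lat": 37.5014, "speed": 38.0}]
print(mean_speed_by_cell(sample))   # both points fall in the same cell
```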

Multi-threaded Web Crawling Design using Queues (큐를 이용한 다중스레드 방식의 웹 크롤링 설계)

  • Kim, Hyo-Jong;Lee, Jun-Yun;Shin, Seung-Soo
    • Journal of Convergence for Information Technology / v.7 no.2 / pp.43-51 / 2017
  • Background/Objectives: The purpose of this study is to design and implement a multi-threaded web crawler using queues that resolves the time delay of the single-process approach, the cost increase of the parallel-processing approach, and the waste of manpower, by utilizing multiple bots connected over a wide-area network. Methods/Statistical analysis: This study designs and analyzes applications that run on independent systems configured as multi-threaded systems using queues. Findings: We propose a multi-threaded web crawler design using queues. The throughput of web documents can be analyzed per client and per thread according to the given formula, and the efficiency and the optimal number of clients can be confirmed by checking the efficiency of each thread. The proposed system is based on distributed processing: clients in independent environments retrieve web documents quickly and reliably using queues and threads. Application/Improvements: Rather than a web crawler designed for a particular site, what is needed is a general-purpose web crawler that applies queues and multiple threads to navigate and collect various web sites quickly and efficiently.
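
The queue-plus-threads pattern this design builds on can be sketched in a few lines of Python: worker threads pull URLs from a shared queue, fetch them, and record the result. The seed URLs, thread count, and error handling here are placeholders, not the authors' implementation.

```python
# Minimal multi-threaded fetcher using a shared queue and sentinel shutdown.
import queue
import threading
import urllib.request

url_queue = queue.Queue()
results, results_lock = [], threading.Lock()

def worker():
    while True:
        url = url_queue.get()
        if url is None:                     # sentinel: stop this worker
            break
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = resp.read()
            with results_lock:
                results.append((url, len(body)))
        except Exception:
            pass                            # a real crawler would log and retry

NUM_THREADS = 4                             # assumed; the paper tunes this per client
threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()

for url in ("https://example.com", "https://example.org"):   # placeholder seeds
    url_queue.put(url)
for _ in threads:
    url_queue.put(None)                     # one sentinel per worker
for t in threads:
    t.join()

print(results)
```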

Shallow Marine Seismic Refraction Data Acquisition and Interpretation Using digital Technique (디지털 技法을 이용한 淺海底 屈折法 彈性波 探査資料의 取得과 解析)

  • 이호영;김철민
    • 한국해양학회지 (Journal of the Oceanological Society of Korea) / v.27 no.1 / pp.19-34 / 1992
  • Marine seismic refraction surveys have been carried out by the Korea Institute of Geology, Mining and Materials (KIGAM) since 1984. The recording of refraction data was based on analog instrumentation, so the resolution of the data was not good enough to distinguish many layers. The objective of interpreting seismic refraction data is to determine the intervals of the layers beneath the sea floor and the propagation velocities of the critically refracted seismic waves through them. To determine intervals and velocities precisely, the resolution of the refraction data must be enhanced. The intent of this study is to improve the quality of shallow marine refraction data with a digital technique, using a microcomputer-based acquisition and processing system. The system consists of an IBM AT-compatible microcomputer, an analog-to-digital (A/D) converter, a mass storage unit, and a parallel processing board. The A/D converter has 12 bits of precision and a 250 kHz conversion rate, and a magneto-optical disk drive is used for mass storage of the refraction data. Shallow marine seismic refraction surveys were carried out with this system at six locations off the Ulsan and Pusan areas, with the refraction data acquired by radio sonobuoy. After basic computer processing, refraction profiles were produced on a laser printer with 300 dpi resolution. From the digital refraction profiles 5-9 layers were interpreted, whereas only 2-4 layers were interpreted from the analog profiles. The propagation velocities of the sediments were interpreted as 1.6-2.1 km/sec, and those of the acoustic basement as 2.4-2.7 km/sec off the Ulsan area and 4.8 km/sec off the Pusan area.
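
For readers unfamiliar with how layer intervals follow from such picks, the standard two-layer refraction relation is sketched below with velocities inside the ranges reported above; the intercept time is a purely illustrative value, not one from the survey.

```python
# Two-layer refraction interpretation: depth to the refractor from the
# intercept time t_i of the refracted arrival,
#   h = t_i * v1 * v2 / (2 * sqrt(v2**2 - v1**2))
import math

def refractor_depth(v1, v2, t_intercept):
    return t_intercept * v1 * v2 / (2.0 * math.sqrt(v2 ** 2 - v1 ** 2))

v1, v2 = 1.8, 2.5        # km/s, within the sediment and basement ranges reported
t_i = 0.040              # s, hypothetical intercept time picked from a profile

print(f"refractor depth ~ {refractor_depth(v1, v2, t_i) * 1000:.0f} m")   # ~52 m
```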


Simulation of YUV-Aware Instructions for High-Performance, Low-Power Embedded Video Processors (고성능, 저전력 임베디드 비디오 프로세서를 위한 YUV 인식 명령어의 시뮬레이션)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE: Computing Practices and Letters / v.13 no.5 / pp.252-259 / 2007
  • With the rapid development of multimedia applications and wireless communication networks, consumer demand for video-over-wireless capability on mobile computing systems is growing rapidly. In this regard, this paper introduces YUV-aware instructions that enhance performance and efficiency in color image and video processing. Traditional multimedia extensions (e.g., MMX, SSE, VIS, and AltiVec) depend solely on generic subword parallelism, whereas the proposed YUV-aware instructions support parallel operations on two packed 16-bit YUV values (6-bit Y; 5-bit U and V) in a 32-bit datapath architecture, providing greater concurrency and efficiency for color image and video processing. Moreover, the reduced data format size lowers system cost. Experimental results on a representative dynamically scheduled embedded superscalar processor show that the YUV-aware instructions achieve an average speedup of 3.9x over the baseline superscalar performance. This is in contrast to MMX (a representative Intel multimedia extension), which achieves a speedup of only 2.1x over the same baseline processor. In addition, the YUV-aware instructions outperform MMX in energy reduction (a 75.8% reduction with the YUV-aware instructions versus only a 54.8% reduction with MMX, relative to the baseline).
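
To make the packed data format concrete, the sketch below packs two 16-bit YUV values (6-bit Y, 5-bit U, 5-bit V) into one 32-bit word. The bit layout within the 16 bits is an assumption for illustration; the paper's actual instruction semantics are not reproduced here.

```python
# Hypothetical 6:5:5 YUV packing, two pixels per 32-bit word.

def pack_yuv(y, u, v):
    """One pixel: Y in bits 15-10, U in bits 9-5, V in bits 4-0 (assumed layout)."""
    return ((y & 0x3F) << 10) | ((u & 0x1F) << 5) | (v & 0x1F)

def unpack_yuv(p):
    return (p >> 10) & 0x3F, (p >> 5) & 0x1F, p & 0x1F

def pack_pair(p0, p1):
    """Two packed 16-bit pixels held in one 32-bit datapath word."""
    return (p1 << 16) | p0

word = pack_pair(pack_yuv(40, 12, 20), pack_yuv(63, 31, 31))
lo, hi = word & 0xFFFF, word >> 16
print(unpack_yuv(lo), unpack_yuv(hi))   # (40, 12, 20) (63, 31, 31)
```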

Efficient Collaboration Method Between CPU and GPU for Generating All Possible Cases in Combination (조합에서 모든 경우의 수를 만들기 위한 CPU와 GPU의 효율적 협업 방법)

  • Son, Ki-Bong;Son, Min-Young;Kim, Young-Hak
    • KIPS Transactions on Computer and Communication Systems / v.7 no.9 / pp.219-226 / 2018
  • One systematic way to generate all possible cases of a combination is to construct a combination tree, which has time complexity O(2^n). Combination trees are used for various purposes, such as the graph homogeneity problem, the initial model for calculating frequent item sets, and so on. However, algorithms that must search all cases of a combination are difficult to use in practice because of this high time complexity. Nevertheless, as the amount of data grows and studies increasingly seek to exploit it, the need to enumerate all cases is also increasing. Recently, as GPU environments have become widespread and easily accessible, various attempts have been made to reduce runtime by parallelizing algorithms that have high time complexity in a serial environment. Because generating all cases of a combination is sequential and the sub-task sizes are biased, it is not well suited to a naive parallel implementation; the efficiency of a parallel algorithm is maximized when all threads have tasks of similar size. In this paper, we propose a method for efficient collaboration between the CPU and GPU to parallelize the problem of generating all cases. To evaluate the proposed algorithm, we analyze its theoretical time complexity and compare its measured execution time with those of other algorithms in CPU and GPU environments. Experimental results show that, compared to previous algorithms, the proposed CPU-GPU collaboration algorithm keeps the execution times of the CPU and GPU balanced, and its execution time improves remarkably as the number of elements increases.
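
The load-balancing issue described above can be illustrated with a small sketch: when C(n, k) combinations are split by their first element, the block sizes C(n-1-i, k-1) are heavily biased, so blocks have to be assigned so that the two processors receive similar totals. The 50/50 target ratio below is an assumption, not the paper's tuned CPU/GPU split.

```python
# Assign first-element blocks of C(n, k) to two workers with balanced totals.
from math import comb

def split_by_first_element(n, k, ratio=0.5):
    sizes = [(i, comb(n - 1 - i, k - 1)) for i in range(n - k + 1)]
    target = sum(s for _, s in sizes) * ratio
    a, b, acc = [], [], 0
    for i, s in sizes:
        if acc + s <= target:       # greedy fill toward the target share
            a.append(i)
            acc += s
        else:
            b.append(i)
    return a, b, acc, sum(s for _, s in sizes) - acc

part_a, part_b, work_a, work_b = split_by_first_element(30, 5)
print(len(part_a), "blocks /", work_a, "combinations for one worker")
print(len(part_b), "blocks /", work_b, "combinations for the other")
```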

A Study of Big data-based Machine Learning Techniques for Wheel and Bearing Fault Diagnosis (차륜 및 차축베어링 고장진단을 위한 빅데이터 기반 머신러닝 기법 연구)

  • Jung, Hoon;Park, Moonsung
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.1 / pp.75-84 / 2018
  • Increasing the operation rate of components and stabilizing operation through timely management of core parts are crucial for improving the efficiency of the railroad maintenance industry. Demand for condition-diagnosis technology for rolling stock components, based on history management and automated big data analysis, has increased in order both to raise the reliability of the core components and to reduce their maintenance cost, in line with the trend toward rapid maintenance. This study developed a big data platform-based system for managing rolling stock component condition that acquires, processes, and analyzes, in real time, the big data generated by the onboard and wayside devices of railroad cars. The system can monitor the condition of railroad car components and of system resources in real time. The study also proposed a machine learning technique that enables distributed, parallel processing of the acquired big data and automatic component fault diagnosis. Tests using the virtual instance generation system of Amazon Web Services proved that the algorithm applying the distributed and parallel technology decreased the runtime, and confirmed a fault diagnosis model using random forest machine learning that predicts the condition of bearing and wheel parts with 83% accuracy.
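
As a hypothetical sketch of the modelling step (not the authors' pipeline), the snippet below trains a random forest classifier on synthetic placeholder features standing in for bearing/wheel condition data; it does not reproduce the reported 83% accuracy.

```python
# Synthetic random forest fault-diagnosis sketch (placeholder data only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(60, 10, n),     # e.g. bearing temperature (assumed feature)
    rng.normal(0.5, 0.2, n),   # e.g. vibration RMS (assumed feature)
])
y = (X[:, 0] + 40 * X[:, 1] + rng.normal(0, 5, n) > 85).astype(int)   # 1 = fault

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, model.predict(X_te)))
```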

A LDPC Decoder for DVB-S2 Standard Supporting Multiple Code Rates (DVB-S2 기반에서 다양한 부호화 율을 지원하는 LDPC 복호기)

  • Ryu, Hye-Jin;Lee, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.2 / pp.118-124 / 2008
  • For forward error correction, DVB-S2, the digital video broadcasting forward error coding and modulation standard for satellite television, uses a scheme based on the concatenation of an outer BCH code with an inner LDPC code. In DVB-S2 the LDPC codes are defined for 11 different code rates, which means that a DVB-S2 LDPC decoder should support multiple code rates. Seven of the 11 code rates (3/5, 2/3, 3/4, 4/5, 5/6, 8/9, and 9/10) are regular, and the remaining four (1/4, 1/3, 2/5, and 1/2) are irregular. In this paper we propose a flexible decoder for the regular LDPC codes. We combine a partially parallel decoding architecture, which has advantages in chip size, memory efficiency, and processing rate, with a Benes network to implement a DVB-S2 LDPC decoder that supports multiple code rates with a block size of 64,800 and can configure the interconnection between the variable nodes and the check nodes according to the parity-check matrix. The proposed decoder runs correctly at a clock frequency of 200 MHz, enabling a decoding throughput of 193.2 Mbps. The area of the proposed decoder is 16.261 mm² and its power dissipation is 198 mW at a supply voltage of 1.5 V.
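
As a rough sanity check (not a figure from the paper), the quoted block size, throughput, and clock frequency imply the following per-codeword cycle budget; whether the 193.2 Mbps refers to coded or information bits, and how many decoding iterations it assumes, are not stated here.

```python
# Implied cycle budget per 64,800-bit codeword at 193.2 Mbps and 200 MHz.
block_bits = 64_800
throughput_bps = 193.2e6
clock_hz = 200e6

codewords_per_s = throughput_bps / block_bits       # ~2981 codewords/s
cycles_per_codeword = clock_hz / codewords_per_s    # ~67,000 cycles

print(f"{codewords_per_s:.0f} codewords/s, ~{cycles_per_codeword:.0f} cycles per codeword")
```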

Development of PC-based and portable high speed impedance analyzer for biosensor (바이오센서를 위한 PC 기반의 휴대용 고속 임피던스 분석기 개발)

  • Kim, Gi-Ryon;Kim, Gwang-Nyeon;Heo, Seung-Deok;Lee, Seung-Hoon;Choi, Byeong-Cheol;Kim, Cheol-Han;Jeon, Gye-Rok;Jung, Dong-Keun
    • Journal of Sensor Science and Technology / v.14 no.1 / pp.33-41 / 2005
  • For more convenient analysis of electrode-electrolyte interface impedance in biosensors, a stand-alone impedance measurement system is required. In this study, we developed a PC-based portable system that analyzes the impedance of an electrochemical cell using a microprocessor. The devised system consists of a signal generator, programmable amplifiers, an A/D converter, a low-pass filter, a potentiostat, an I/V converter, a microprocessor, and a PC interface. A PIC16F877, which has a processing speed of 5 MIPS, was used as the microprocessor. Data are acquired at a sampling rate of 40 ksamples/sec with 12-bit resolution, and RS-232 at 115.2 kbps is used for PC communication. A square wave was used as the stimulus signal for impedance analysis, and a voltage-controlled current measurement with a three-electrode method was adopted. The acquired voltage and current data are converted into multifrequency impedance values by means of a Fourier transform. To evaluate the implemented system, we set up a dummy cell whose equivalent circuit consisted of a resistor together with a parallel circuit of a capacitor and a resistor, and measured its impedance; the results showed accuracy within 5% error and reproducibility within 1% error compared with the outputs of a Hioki LCR tester and an HP impedance analyzer used as references. These results imply that it is possible to analyze electrode-electrolyte interface impedance in biosensors quantitatively and to implement a more portable, high-speed impedance analysis system than existing ones.
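
The Fourier-transform step described above amounts to taking Z(f) = V(f)/I(f) at the harmonics of the square-wave stimulus. The sketch below illustrates this on a simulated dummy cell; the excitation frequency and component values are assumptions, while the 40 kS/s sampling rate follows the abstract.

```python
# FFT-based multifrequency impedance estimation on a simulated dummy cell
# (R1 in series with R2 parallel to C; values assumed for illustration).
import numpy as np

fs, f0, N = 40_000, 100, 4000          # sample rate (from abstract), assumed f0, samples
t = np.arange(N) / fs
R1, R2, C = 1_000.0, 10_000.0, 1e-7    # assumed dummy-cell values

v = np.sign(np.sin(2 * np.pi * f0 * t))            # square-wave stimulus voltage

# Synthesize the resulting current in the frequency domain (stand-in for the
# sampled I/V-converter output), then estimate Z(f) = V(f) / I(f).
freqs = np.fft.rfftfreq(N, 1 / fs)
Z_cell = R1 + 1.0 / (1.0 / R2 + 1j * 2 * np.pi * freqs * C)
i = np.fft.irfft(np.fft.rfft(v) / Z_cell, n=N)

V_spec, I_spec = np.fft.rfft(v), np.fft.rfft(i)
for k in (1, 3, 5):                                # odd harmonics carry the energy
    idx = int(round(k * f0 * N / fs))
    print(f"{k * f0:4d} Hz: |Z| ~ {abs(V_spec[idx] / I_spec[idx]):.0f} ohm")
```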