• Title/Summary/Keyword: Parallel data processing

Search Result 751, Processing Time 0.032 seconds

Design of a scalable general-purpose parallel associative processor using content-addressable memory (Content-Addressable Memory를 이용한 확장 가능한 범용 병렬 Associative Processor 설계)

  • Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.2 s.344
    • /
    • pp.51-59
    • /
    • 2006
  • Von Neumann architecture suffers from the interface between the central processing unit and the memory, which is called 'Von Neumann bottleneck' In this paper, we propose a scalable general-purpose associative processor (AP) based on content-addressable memory (CAM) which solves this problem and is suitable for the search-oriented applications. We propose an efficient instruction set and a structural scalability to extend for larger applications. We define twelve instructions and provide some reduced instructions to speed up which execute two instructions in a single instruction cycle. The proposed AP performs in a bit-serial, word-parallel fashion and can be considered as a 32-bit general-purpose parallel processor with a massively parallel SIMD structure. We design and simulate a maximum/minumum search greater-than/less-than search, and parallel addition to verify the proposed architecture. The algorithms are executed in a constant time O(k) regardless of the number of input data.

Three-Parallel Reed-Solomon based Forward Error Correction Architecture for 100Gb/s Optical Communications (100Gb/s급 광통신시스템을 위한 3-병렬 Reed-Solomon 기반 FEC 구조 설계)

  • Choi, Chang-Seok;Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.11
    • /
    • pp.48-55
    • /
    • 2009
  • This paper presents a high-speed Forward Error Correction (FEC) architecture based on three-parallel Reed-Solomon (RS) decoder for next-generation 100-Gb/s optical communication systems. A high-speed three-parallel RS(255,239) decoder has been designed and the derived structure can also be applied to implement the 100-Gb/s RS-FEC architecture. The proposed 100-Gb/s RS-FEC has been implemented with 0.13-${\mu}m$ CMOS standard cell technology in a supply voltage of 1.2V. The implementation results show that 16-Ch. RS-FEC architecture can operate at a clock frequency of 300MHz and has a throughput of 115-Gb/s for 0.13-${\mu}m$ CMOS technology. As a result, the proposed three-parallel RS-FEC architecture has a much higher data processing rate and low hardware complexity compared with the conventional two-parallel, three-parallel and serial RS-FEC architectures.

A Study on Adaptive Parallel Computability in Many-Task Computing on Hadoop Framework (하둡 기반 대규모 작업처리 프레임워크에서의 Adaptive Parallel Computability 기술 연구)

  • Jik-Soo, Kim
    • Journal of Broadcast Engineering
    • /
    • v.24 no.6
    • /
    • pp.1122-1133
    • /
    • 2019
  • We have designed and implemented a new data processing framework called MOHA(Mtc On HAdoop) which can effectively support Many-Task Computing(MTC) applications in a YARN-based Hadoop platform. MTC applications can be composed of a very large number of computational tasks ranging from hundreds of thousands to millions of tasks, and each MTC application may have different resource usage patterns. Therefore, we have implemented MOHA-TaskExecutor(a pilot-job that executes real MTC application tasks)'s Adaptive Parallel Computability which can adaptively execute multiple tasks simultaneously, in order to improve the parallel computability of a YARN container and the overall system throughput. We have implemented multi-threaded version of TaskExecutor which can "independently and dynamically" adjust the number of concurrently running tasks, and in order to find the optimal number of concurrent tasks, we have employed Hill-Climbing algorithm.

Prediction-Based Parallel Gate-Level Timing Simulation Using Spatially Partial Simulation Strategy (공간적 부분시뮬레이션 전략이 적용된 예측기반 병렬 게이트수준 타이밍 시뮬레이션)

  • Han, Jaehoon;Yang, Seiyang
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.8 no.3
    • /
    • pp.57-64
    • /
    • 2019
  • In this paper, an efficient prediction-based parallel simulation method using spatially partial simulation strategy is proposed for improving both the performance of the event-driven gate-level timing simulation and the debugging efficiency. The proposed method quickly generates the prediction data on-the-fly, but still accurately for the input values and output values of parallel event-driven local simulations by applying the strategy to the simulation at the higher abstraction level. For those six designs which had used for the performance evaluation of the proposed strategy, our method had shown about 3.7x improvement over the most general sequential event-driven gate-level timing simulation, 9.7x improvement over the commercial multi-core based parallel event-driven gate-level timing simulation, and 2.7x improvement over the best of previous prediction-based parallel simulation results, on average.

A Massively Parallel Algorithm for Fuzzy Vector Quantization (퍼지 벡터 양자화를 위한 대규모 병렬 알고리즘)

  • Huynh, Luong Van;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.16A no.6
    • /
    • pp.411-418
    • /
    • 2009
  • Vector quantization algorithm based on fuzzy clustering has been widely used in the field of data compression since the use of fuzzy clustering analysis in the early stages of a vector quantization process can make this process less sensitive to its initialization. However, the process of fuzzy clustering is computationally very intensive because of its complex framework for the quantitative formulation of the uncertainty involved in the training vector space. To overcome the computational burden of the process, this paper introduces an array architecture for the implementation of fuzzy vector quantization (FVQ). The arrayarchitecture, which consists of 4,096 processing elements (PEs), provides a computationally efficient solution by employing an effective vector assignment strategy during the clustering process. Experimental results indicatethat the proposed parallel implementation providessignificantly greater performance and efficiency than appropriately scaled alternative array systems. In addition, the proposed parallel implementation provides 1000x greater performance and 100x higher energy efficiency than other implementations using today's ARMand TI DSP processors in the same 130nm technology. These results demonstrate that the proposed parallel implementation shows the potential for improved performance and energy efficiency.

Research on Big Data Integration Method

  • Kim, Jee-Hyun;Cho, Young-Im
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.1
    • /
    • pp.49-56
    • /
    • 2017
  • In this paper we propose the approach for big data integration so as to analyze, visualize and predict the future of the trend of the market, and that is to get the integration data model using the R language which is the future of the statistics and the Hadoop which is a parallel processing for the data. As four approaching methods using R and Hadoop, ff package in R, R and Streaming as Hadoop utility, and Rhipe and RHadoop as R and Hadoop interface packages are used, and the strength and weakness of four methods are described and analyzed, so Rhipe and RHadoop are proposed as a complete set of data integration model. The integration of R, which is popular for processing statistical algorithm and Hadoop contains Distributed File System and resource management platform and can implement the MapReduce programming model gives us a new environment where in R code can be written and deployed in Hadoop without any data movement. This model allows us to predictive analysis with high performance and deep understand over the big data.

Optimized Hardware Design using Sobel and Median Filters for Lane Detection

  • Lee, Chang-Yong;Kim, Young-Hyung;Lee, Yong-Hwan
    • Journal of Advanced Information Technology and Convergence
    • /
    • v.9 no.1
    • /
    • pp.115-125
    • /
    • 2019
  • In this paper, the image is received from the camera and the lane is sensed. There are various ways to detect lanes. Generally, the method of detecting edges uses a lot of the Sobel edge detection and the Canny edge detection. The minimum use of multiplication and division is used when designing for the hardware configuration. The images are tested using a black box image mounted on the vehicle. Because the top of the image of the used the black box is mostly background, the calculation process is excluded. Also, to speed up, YCbCr is calculated from the image and only the data for the desired color, white and yellow lane, is obtained to detect the lane. The median filter is used to remove noise from images. Intermediate filters excel at noise rejection, but they generally take a long time to compare all values. In this paper, by using addition, the time can be shortened by obtaining and using the result value of the median filter. In case of the Sobel edge detection, the speed is faster and noise sensitive compared to the Canny edge detection. These shortcomings are constructed using complementary algorithms. It also organizes and processes data into parallel processing pipelines. To reduce the size of memory, the system does not use memory to store all data at each step, but stores it using four line buffers. Three line buffers perform mask operations, and one line buffer stores new data at the same time as the operation. Through this work, memory can use six times faster the processing speed and about 33% greater quantity than other methods presented in this paper. The target operating frequency is designed so that the system operates at 50MHz. It is possible to use 2157fps for the images of 640by360 size based on the target operating frequency, 540fps for the HD images and 240fps for the Full HD images, which can be used for most images with 30fps as well as 60fps for the images with 60fps. The maximum operating frequency can be used for larger amounts of the frame processing.

Design and Implementation of an Efficient Web Services Data Processing Using Hadoop-Based Big Data Processing Technique (하둡 기반 빅 데이터 기법을 이용한 웹 서비스 데이터 처리 설계 및 구현)

  • Kim, Hyun-Joo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.1
    • /
    • pp.726-734
    • /
    • 2015
  • Relational databases used by structuralizing data are the most widely used in data management at present. However, in relational databases, service becomes slower as the amount of data increases because of constraints in the reading and writing operations to save or query data. Furthermore, when a new task is added, the database grows and, consequently, requires additional infrastructure, such as parallel configuration of hardware, CPU, memory, and network, to support smooth operation. In this paper, in order to improve the web information services that are slowing down due to increase of data in the relational databases, we implemented a model to extract a large amount of data quickly and safely for users by processing Hadoop Distributed File System (HDFS) files after sending data to HDFSs and unifying and reconstructing the data. We implemented our model in a Web-based civil affairs system that stores image files, which is irregular data processing. Our proposed system's data processing was found to be 0.4 sec faster than that of a relational database system. Thus, we found that it is possible to support Web information services with a Hadoop-based big data processing technique in order to process a large amount of data, as in conventional relational databases. Furthermore, since Hadoop is open source, our model has the advantage of reducing software costs. The proposed system is expected to be used as a model for Web services that provide fast information processing for organizations that require efficient processing of big data because of the increase in the size of conventional relational databases.

A Parallel Task Oriented Memory Manager for Dynamic Objects (동적 객체에 대한 병렬 타스크 중심의 메모리 관리기)

  • Kim, Eun-Jeong;Bae, Jong-Min
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.5
    • /
    • pp.1391-1400
    • /
    • 1997
  • 공유 메모리 다중 프로세서상에서 많은 동적 객체를 생성하는 언어가 실행될 때, 동적 객체에 대한 메모리 관리 알고리즘은 프로그램의 실행 속도에 큰 영향을 미친다. 본 논문에서는 이러한 환경에서 프로그램의 성능을 향상 시킬 수 있는 새로운 메모리 관리 알고리즘을 제안한다. 이를 위해 힘 영역의 할당 및 회수 작업을 병렬 타스크 중심으로 행한다. 또한 동적 객체를 병렬 타스크사이에 공유 되는 객체(shared data) 와 비공유 객체(mon-shared data)로 구분하고, 힘 영역을 공동 영역과 전용 영역으로 분리 한다. 이는 병렬 타스크가 동적으로 스케줄링되는 것을 자유롭게 하고 창조 지역성 을 높이는 효과가 있으며, 전용 영역에 대한 메모리 재사용으로 인하여 볼용 셀수집기의 수행 횟수를 줄일 수 있다.

  • PDF

A New Architecture of Call Processor Based On Data flow System (데이타 흐름 시스템을 이용한 호처리 프로세서의 구조)

  • Lim, In-Taek;Lee, Sung-Gyu;Han, Young-Chul
    • Proceedings of the KIEE Conference
    • /
    • 1987.07b
    • /
    • pp.965-968
    • /
    • 1987
  • Conventional major electronic switching systems based on stored program control employ a Von Neumann styled control processor. It has strict limitations such that it essentially lacks concurrency in executing instructions, which have brought the software bottleneck problem, and the capabilities of call processing are restricted by expanding system's capacity. In this paper, a new architecture of call control processor based on the data flow system is proposed, aiming at fundamental resolution for these limitations. The processor has a number of advantages in such as expansibility of system's capacity, parallel processing of calls, and so on.

  • PDF