• Title/Summary/Keyword: Parallel data processing


Image Browse for JPEG Decoder

  • Chong, Ui-Pil
    • Journal of IKEEE
    • /
    • v.2 no.1 s.2
    • /
    • pp.96-100
    • /
    • 1998
  • Due to the expected widespread use of DCT-based image/video coding standards, it is advantageous to process data directly in the DCT domain rather than decoding the source back to the spatial domain. The block processing algorithm provides a parallel processing method, since multiple input data are processed in the block filter structure; hence a fast implementation of the algorithm is well suited. In this paper, we propose JPEG browsing by Block Transform Domain Filtering (BTDF) using subband filter banks. Instead of decompressing the entire image from its compressed format to retrieve it at full resolution, a user can select the required level of expansion $(2^N \times 2^N)$. This approach also reduces CPU time by reducing the number of multiplications through BTDF in the filter banks. (A sketch of DCT-domain browsing follows this entry.)

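The entry above retrieves a reduced-resolution image directly from DCT-domain data. As a minimal illustration of that idea (not the paper's BTDF filter-bank method; all names are hypothetical), the sketch below builds a browse image by keeping only the top-left $k \times k$ coefficients of each 8×8 DCT block and inverse-transforming them:

```python
# Minimal DCT-domain browse sketch (illustrative, not the paper's BTDF method).
# Keeping the top-left k x k coefficients of each 8x8 DCT block and applying a
# k x k inverse DCT yields a (k/8)-scale image without full decompression.
import numpy as np
from scipy.fft import dctn, idctn

def dct_blocks(image):
    """Split an 8x8-aligned grayscale image into a grid of 8x8 DCT blocks."""
    h, w = image.shape
    blocks = image.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2)
    return dctn(blocks, axes=(-2, -1), norm="ortho")

def browse_image(coeffs, k):
    """Reconstruct a reduced-resolution image from k x k low-frequency coefficients."""
    low = coeffs[..., :k, :k]                       # truncate high frequencies
    pix = idctn(low, axes=(-2, -1), norm="ortho")   # k x k pixels per block
    bh, bw = pix.shape[:2]
    return pix.swapaxes(1, 2).reshape(bh * k, bw * k) * (k / 8)  # energy rescale

img = np.random.rand(64, 64)                 # stand-in for a decoded JPEG plane
thumb = browse_image(dct_blocks(img), k=2)   # 1/4-scale browse image
print(thumb.shape)                           # (16, 16)
```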

Reducing False Sharing based on Memory Reference Patterns in Distributed Shared Memory Systems (분산 공유 메모리 시스템에서 메모리 참조 패턴에 근거한 거짓 공유 감속 기법)

  • Jo, Seong-Je
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.4
    • /
    • pp.1082-1091
    • /
    • 2000
  • In Distributed Shared Memory systems, false sharing occurs when two different data items that are not shared, but are accessed by two different processors, are allocated to a single block; it is an important factor in degrading system performance. This paper first analyzes the shared memory allocation and reference patterns of parallel applications that allocate memory for shared data objects using a dynamic memory allocator. The shared objects are sequentially allocated and generally show different reference patterns. If objects of the same size are requested successively as many times as the number of processors, each object is referenced by only one particular processor. If objects of the same size are requested successively many more times than the number of processors, two or more successive objects are referenced by only particular processors. On the basis of these analyses, we propose a memory allocation scheme that allocates objects requested by different processors to different pages, and we evaluate the existing memory allocation techniques for reducing false sharing faults. Our allocation scheme removes a considerable number of false sharing faults for some applications at the cost of a small amount of additional memory. (A sketch of the page-based allocation policy follows this entry.)

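As a hedged illustration of the allocation policy described above (a simulation of the idea, not the authors' implementation; the page size and bookkeeping structures are assumptions), the sketch below places objects requested by different processors on different pages, so no page ever holds the private data of two processors:

```python
# Illustrative simulation of a false-sharing-avoiding allocation policy:
# objects requested by different processors are placed on different pages,
# so no page (coherence block) holds private data of two processors.
PAGE_SIZE = 4096  # assumed page/block size

class PerProcessorPageAllocator:
    def __init__(self):
        self.next_page = 0
        self.pools = {}  # cpu_id -> (current page number, bytes used)

    def alloc(self, cpu_id, size):
        page, used = self.pools.get(cpu_id, (None, PAGE_SIZE))
        if used + size > PAGE_SIZE:          # open a fresh page for this CPU
            page, used = self.next_page, 0
            self.next_page += 1
        self.pools[cpu_id] = (page, used + size)
        return page, used                    # (page number, offset)

alloc = PerProcessorPageAllocator()
# Successive same-size requests from different CPUs land on different pages,
# so a write by CPU 0 never invalidates a block holding CPU 1's objects.
print(alloc.alloc(cpu_id=0, size=64))  # (0, 0)
print(alloc.alloc(cpu_id=1, size=64))  # (1, 0)
print(alloc.alloc(cpu_id=0, size=64))  # (0, 64)
```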

A study on the implementation simulation and system for 2-D doppler system using second-order sampling (2차 샘플링을 이용한 2-D 도플러 시스템의 시뮬레이션과 시스템구현에 관한 연구)

  • 임춘성;임용곤
    • Journal of Biomedical Engineering Research
    • /
    • v.11 no.1
    • /
    • pp.147-156
    • /
    • 1990
  • A two-dimensional pulsed Doppler system for ultrasonic blood-velocity Doppler signals is studied and implemented. The second-order sampling method and serial data processing procedures are utilized in the system, which eliminates the detuning problems at the phase channels in the quadrature detection method as well as in the channels of parallel data processing. The digital signal processor used in this system allows hardware savings and flexible design options. The efficiency of various mean frequency estimators in the second-order sampling system is examined by computer simulation as a function of the intersequence sample delay time. The temporal delay for the quadrature component is varied among $1/(4f_o)$, $3/(4f_o)$, and $5/(4f_o)$, where $f_o$ is the center frequency of the transducer. It is found that the autocorrelator is the optimum frequency estimator for second-order sampling with intersequence sample delays of $1/(4f_o)$, $3/(4f_o)$, and $5/(4f_o)$. Qualitative variation and information proportional to the blood velocity in the vessel system are obtained in the in-vivo experiments. (A sketch of the autocorrelation mean-frequency estimator follows this entry.)

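The autocorrelation (lag-one) mean-frequency estimator named above is commonly written as $\hat{f} = \frac{f_{PRF}}{2\pi}\,\arg\big(\sum_n z[n]\,z^*[n-1]\big)$ for complex quadrature samples $z[n]$. A minimal sketch under assumed parameters (the PRF and Doppler shift below are made up for illustration):

```python
# Lag-one autocorrelation (Kasai-type) mean Doppler frequency estimator.
# z[n] are complex quadrature samples at the pulse repetition frequency,
# as produced by second-order sampling. Parameters are assumptions.
import numpy as np

PRF = 5_000.0          # pulse repetition frequency in Hz (assumption)
f_doppler = 800.0      # simulated mean Doppler shift in Hz

n = np.arange(64)
z = np.exp(2j * np.pi * f_doppler * n / PRF)                  # ideal Doppler tone
z += 0.1 * (np.random.randn(64) + 1j * np.random.randn(64))   # additive noise

r1 = np.sum(z[1:] * np.conj(z[:-1]))     # lag-one autocorrelation
f_hat = PRF / (2 * np.pi) * np.angle(r1)
print(f"estimated mean frequency: {f_hat:.1f} Hz")  # close to 800 Hz
```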

Design of Maneuvering Target Tracking System Using Data Fusion Capability of Neural Networks (신경망의 자료 융합 능력을 이용한 기동 표적 추적 시스템의 설계)

  • Kim, Haeng-Koo;Jin, Seung-Hee;Yoon, Tae-Sung;Park, Jin-Bae;Joo, Young-Hoon
    • Proceedings of the KIEE Conference
    • /
    • 1998.07b
    • /
    • pp.552-554
    • /
    • 1998
  • In target tracking problems, the fixed-gain Kalman filter is primarily used to predict a target state vector. This filter, however, has poor precision for maneuvering targets, while it performs well for non-maneuvering targets. To overcome this problem, this paper proposes a system that estimates the acceleration with neural networks using the input estimation technique. The ability to efficiently fuse information of different forms is one of the major capabilities of trained multi-layer neural networks. The primary motivation for employing neural networks in these applications comes from the efficiency with which more features can be utilized as inputs for estimating target maneuvers. The parallel processing capability of a properly trained neural network permits fast processing of features to yield correct acceleration estimates. The features used as inputs are extracted from combinations of innovation data and heading changes, for which we set up a two-dimensional model. The properly trained neural network outputs the acceleration estimates and compensates the primary Kalman filter. Finally, the proposed system shows optimal performance. (A sketch of the feature-fusion network follows this entry.)

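As a hedged sketch of the fusion idea above (the architecture, layer sizes, and weights are all assumptions, and the network is shown untrained), a small multi-layer network maps innovation and heading-change features to an acceleration estimate that can be fed back to correct the Kalman prediction:

```python
# Minimal feature-fusion MLP sketch: innovation components and heading change
# in a 2-D model are fused into an acceleration estimate (untrained weights).
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 3)) * 0.1, np.zeros(8)   # 3 features -> 8 hidden
W2, b2 = rng.normal(size=(2, 8)) * 0.1, np.zeros(2)   # 8 hidden  -> (ax, ay)

def estimate_acceleration(innov_x, innov_y, heading_change):
    """Fuse Kalman innovation and heading change into an acceleration estimate."""
    x = np.array([innov_x, innov_y, heading_change])
    h = np.tanh(W1 @ x + b1)   # hidden layer
    return W2 @ h + b2         # (ax, ay) estimate

a_hat = estimate_acceleration(innov_x=12.0, innov_y=-3.5, heading_change=0.08)
print(a_hat)  # would compensate the fixed-gain Kalman filter's prediction
```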

Development of a Systolic Array Design System(SADS) (시스톨릭 어레이 설계 시스템의 개발)

  • Yu, Gi-Hyeong;Lee, Seong-U;Park, Dong-Gi;Kim, Yun-Ho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.5
    • /
    • pp.1380-1390
    • /
    • 1997
  • This paper presents a systolic array design method which derives 1- or 2-dimensional optimal planar systolic arrays from a given n-dimensional problem represented as a regular recurrence equation, and its implementation, called the Systolic Array Design System (SADS). The SADS parses a regular recurrence equation and extracts information such as the problem space, data dependence vectors, and initial data positions. Systolic arrays are automatically derived by applying a space-time transformation to the information obtained in the parsing phase. The SADS allows us to verify the parallel execution of the derived systolic array through a graphical interface. (A worked space-time mapping example follows this entry.)

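To make the space-time transformation concrete, the sketch below (an illustration of the technique, not SADS itself; the mapping choices are assumptions) maps the 3-D index space of the matrix-multiplication recurrence $c(i,j,k) = c(i,j,k-1) + a(i,k)\,b(k,j)$ onto a 2-D processor array with schedule $t = i + j + k$ and projection along the $k$ axis:

```python
# Space-time transformation sketch for the matmul recurrence
#   c(i,j,k) = c(i,j,k-1) + a(i,k) * b(k,j)
# Schedule vector s = (1,1,1) gives time t = i + j + k; projecting along the
# k axis allocates index point (i,j,k) to processor (i,j).
N = 3
DEPENDENCES = [(0, 0, 1)]  # c flows along k (a and b reuse omitted for brevity)

def space_time(i, j, k):
    return (i, j), i + j + k   # (processor coordinates, time step)

# Validity check: every dependence must advance time, i.e. s . d > 0.
assert all(sum(d) > 0 for d in DEPENDENCES)

schedule = {}
for i in range(N):
    for j in range(N):
        for k in range(N):
            pe, t = space_time(i, j, k)
            schedule.setdefault(t, []).append((pe, (i, j, k)))

for t in sorted(schedule):   # computations with equal t run in parallel
    print(t, schedule[t])
```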

Implementation and Performance Analysis of Hadoop MapReduce over Lustre Filesystem (러스터 파일 시스템 기반 하둡 맵리듀스 실행 환경 구현 및 성능 분석)

  • Kwak, Jae-Hyuck;Kim, Sangwan;Huh, Taesang;Hwang, Soonwook
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.8
    • /
    • pp.561-566
    • /
    • 2015
  • Hadoop is becoming widely adopted in scientific and commercial areas as an open-source distributed data processing framework. Recently, attempts have been made to apply high-performance computing technologies to Hadoop for real-time processing and analysis of data. In this paper, we have extended the Hadoop Filesystem library to support Lustre, a popular high-performance parallel distributed filesystem, and implemented the Hadoop MapReduce execution environment over the Lustre filesystem. We analysed Hadoop MapReduce over Lustre using standard Hadoop benchmark tools and found that it performs 2 to 13 times better than a typical Hadoop MapReduce execution.
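One reason a shared parallel filesystem such as Lustre can help is that map output becomes directly visible to reducers through the shared mount, instead of being served over the network from each node's local disk. The sketch below is a conceptual, single-machine imitation of that flow using ordinary POSIX files (all paths and names are hypothetical; it is not the authors' Hadoop adapter):

```python
# Conceptual map/shuffle/reduce over a shared POSIX directory, imitating how a
# parallel filesystem mount lets reducers read map output directly instead of
# shuffling it over the network. Illustrative only.
import os, collections, glob

SHARED = "/tmp/shared_fs"          # stand-in for a Lustre mount point
os.makedirs(SHARED, exist_ok=True)

def map_task(task_id, text, n_reducers):
    """Write (word, 1) pairs into per-reducer partition files on the shared FS."""
    parts = [open(f"{SHARED}/map{task_id}.part{r}", "w") for r in range(n_reducers)]
    for word in text.split():
        parts[hash(word) % n_reducers].write(word + "\n")
    for f in parts:
        f.close()

def reduce_task(r):
    """Read this reducer's partitions straight from the shared filesystem."""
    counts = collections.Counter()
    for path in glob.glob(f"{SHARED}/map*.part{r}"):
        with open(path) as f:
            counts.update(line.strip() for line in f)
    return counts

map_task(0, "lustre hadoop lustre", n_reducers=2)
map_task(1, "hadoop mapreduce", n_reducers=2)
print(reduce_task(0) + reduce_task(1))   # merged word counts
```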

A Study on Architecture of Parallel Deblocking Filter for H.264/AVC (H.264/AVC용 병렬 디블록킹 필터의 아키텍처에 관한 연구)

  • Sonh, Seung-Il;Kim, Won-Sam
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.4
    • /
    • pp.766-772
    • /
    • 2007
  • H.264/AVC is a new international standard for the compression of video images, in which a deblocking filter has been adopted to remove blocking artifacts. This paper proposes an efficient deblocking filter architecture for H.264/AVC. By exploiting the data dependence between neighboring $4 \times 4$ blocks, the memory size is reduced and the throughput of the deblocking filter is increased. Compared with conventional deblocking filters, the proposed architecture improves deblocking performance by a factor of 1.75 to 4.23. Hence, the proposed architecture can perform real-time deblocking for high-resolution ($2048 \times 1024$) video applications.
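The dependence the entry exploits is that filtering a 4×4 block's edges requires its left and top neighbors to have been filtered first, which yields an anti-diagonal wavefront of blocks that can be processed in parallel. A small scheduling sketch of that dependence pattern (an illustration, not the paper's hardware):

```python
# Wavefront schedule for 4x4 blocks with left/top dependences: block (i, j)
# can be filtered once (i-1, j) and (i, j-1) are done, so all blocks on the
# same anti-diagonal i + j are independent and can run in parallel.
def wavefronts(rows, cols):
    fronts = {}
    for i in range(rows):
        for j in range(cols):
            t = i + j                      # earliest step given the dependences
            fronts.setdefault(t, []).append((i, j))
    return fronts

for t, blocks in sorted(wavefronts(4, 4).items()):
    print(f"step {t}: {blocks}")   # parallelism grows to min(rows, cols)
```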

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, realizing flexible storage expansion functions for processing a massive amount of unstructured log data, and executing the considerable number of functions needed to categorize and analyze the stored unstructured log data, is difficult in existing computing environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to handle with the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions that allow it to continue operating after recovering from a malfunction. Finally, by establishing a distributed database using NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data; moreover, their strict schemas make it difficult to expand nodes by distributing the stored data across various nodes when the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases offer, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. The data models of NoSQL are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is adopted because its flexible schema structure makes it easy to process unstructured log data, it facilitates flexible node expansion when the amount of data increases rapidly, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis from the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance is carried out against a log data processing system that uses only MySQL, and it demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insertion performance evaluation of MongoDB for various chunk sizes.
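As a hedged sketch of the document-oriented storage and per-type aggregation described above (the collection names, document fields, and connection URI are assumptions, not the authors' schema), using the standard PyMongo API:

```python
# Minimal PyMongo sketch: schema-free insertion of heterogeneous log documents
# and an aggregation that counts logs per type and hour. Names are assumptions.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
logs = client["bank"]["logs"]

# Unstructured logs: documents need not share a schema.
logs.insert_many([
    {"type": "transfer", "ts": datetime(2013, 9, 1, 10, 5), "amount": 150000},
    {"type": "login",    "ts": datetime(2013, 9, 1, 10, 7), "branch": "Seoul"},
])

# Aggregate log counts per (type, hour) -- the kind of result a log graph
# generator module could plot.
pipeline = [
    {"$group": {
        "_id": {"type": "$type", "hour": {"$hour": "$ts"}},
        "count": {"$sum": 1},
    }},
]
for row in logs.aggregate(pipeline):
    print(row)
```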

A Study on Improved Image Matching Method using the CUDA Computing (CUDA 연산을 이용한 개선된 영상 매칭 방법에 관한 연구)

  • Cho, Kyeongrae;Park, Byungjoon;Yoon, Taebok
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.4
    • /
    • pp.2749-2756
    • /
    • 2015
  • Recently, as the amount and quality of data have increased, image processing has become time-consuming, and acceleration of image processing algorithms is required. This paper compares a traditional CPU-based character recognition system with OpenMP- and CUDA (Compute Unified Device Architecture)-based implementations in terms of computing speed and performance. The system learns English alphabet character images whose fonts are uniform and standardized in size, and recognizes input characters by image matching against the learned region. When the algorithm was parallelized with OpenMP on the four cores of an Intel i5 2500, the overhead of partitioning and merging the data kept the speedup below the ideal factor of four, yielding an improvement of about 3.2 times over the existing CPU implementation. In contrast, performing the matching on the graphics card with GPGPU (General Purpose GPU) programming based on CUDA achieved a performance gain of about 21 times compared with the sequential CPU-based processing.
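The sub-linear OpenMP speedup reported above (3.2x on 4 cores) is what Amdahl's law predicts when part of the work, here the data partition and merge steps, stays serial. A worked check under that assumption:

```python
# Amdahl's law check: if speedup on n cores is S = 1 / ((1 - p) + p / n),
# an observed 3.2x on 4 cores implies the parallelizable fraction p below.
n, S = 4, 3.2
p = (1 - 1 / S) / (1 - 1 / n)   # solve S = 1 / ((1 - p) + p/n) for p
print(f"parallel fraction p = {p:.3f}")                     # ~0.917
print(f"serial overhead (partition/merge) = {1 - p:.3f}")   # ~0.083
print(f"speedup ceiling as n -> inf: {1 / (1 - p):.1f}x")   # ~12x
```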

A Fast Processor Architecture and 2-D Data Scheduling Method to Implement the Lifting Scheme 2-D Discrete Wavelet Transform (리프팅 스킴의 2차원 이산 웨이브릿 변환 하드웨어 구현을 위한 고속 프로세서 구조 및 2차원 데이터 스케줄링 방법)

  • Kim Jong Woog;Chong Jong Wha
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.4 s.334
    • /
    • pp.19-28
    • /
    • 2005
  • In this paper, we proposed a fast parallel 2-D discrete wavelet transform hardware architecture based on the lifting scheme. The proposed architecture improves the 2-D processing speed and reduces the internal memory buffer size. Previous lifting-scheme-based parallel 2-D wavelet transform architectures consisted of row-direction and column-direction modules, each a pair of prediction and update filter modules. In the 2-D wavelet transform, the column-direction processing uses the row-direction results, which are generated in row-direction rather than column-direction order, so most hardware architectures need an internal buffer memory. The proposed architecture focuses on reducing the internal memory buffer size and the total calculation time. To reduce the total calculation time, we proposed a 4-way data flow scheduling and a memory-based parallel hardware architecture. The 4-way data flow scheduling increases the row-direction parallelism and reduces the initial latency before the row-direction calculation starts. In this architecture, the internal buffer memory does not store the results of the row-direction calculation; instead, it holds the intermediate values of the column-direction calculation. This method is very effective for column-direction processing, because the input data of the column direction are not generated in column-direction order. The proposed architecture was implemented in VHDL on an Altera Stratix device. The implementation results show that the overall calculation time is reduced from $N^2/2+\alpha$ to $N^2/4+\beta$, and the internal buffer memory size is reduced by around $50\%$ compared with previous works.
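For readers unfamiliar with the lifting scheme itself: a 1-D lifting step splits the samples into even/odd halves, predicts the odd samples from the evens, and updates the evens from the prediction residuals; the 2-D transform of the entry applies such passes along rows and then columns. A minimal sketch using the reversible CDF 5/3 lifting steps (as in JPEG 2000; the choice of this particular wavelet is an assumption, not necessarily the paper's filter):

```python
# One level of the 1-D CDF 5/3 lifting wavelet transform (integer, reversible).
# Predict: d[n] = x[2n+1] - floor((x[2n] + x[2n+2]) / 2)
# Update:  s[n] = x[2n]   + floor((d[n-1] + d[n] + 2) / 4)
import numpy as np

def lifting_53(x):
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    even_r = np.append(even[1:], even[-1])   # symmetric boundary extension
    d = odd - ((even + even_r) >> 1)         # predict step (high-pass)
    d_l = np.insert(d[:-1], 0, d[0])
    s = even + ((d_l + d + 2) >> 2)          # update step (low-pass)
    return s, d

s, d = lifting_53([10, 12, 14, 20, 18, 16, 12, 8])
print("low-pass :", s)   # approximation subband
print("high-pass:", d)   # detail subband
# A 2-D level applies this along every row, then along every column of the
# row results -- the ordering mismatch that the entry's scheduling addresses.
```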