• Title/Summary/Keyword: Parallel data processing

Search Result 751, Processing Time 0.035 seconds

Declustering Method for Moving Object Database (이동체 데이터베이스를 위한 디클러스터링 정책)

  • Seo YoungDuk;Hong EnSuk;Hong BongHee
    • The KIPS Transactions:PartD
    • /
    • v.11D no.7 s.96
    • /
    • pp.1399-1408
    • /
    • 2004
  • Because there are so many spatio-temporal data in Moving Object Databases, a single disk system can not gain the fast response time and tota throughput. So it is needed to take a parallel processing system for the high effectiveness query process. In these existing parallel process-ing system. it does not consider characters of moving object data. Moving object data have to be thought about continuous report to the Moving Object Databases. So it is necessary think about the new Declustering System for the high performance system. In this paper, we propose the new Dechustering Policies of Moving objet data for high effectiveness query processing. At first, consider a spatial part of MBB(Minimum Bounding Box) then take a SD(SemiAllocation Disk) value. Second time, consider a SD value and time value which is node made at together as SDT-Proximity. And for more accuracy Declustering effect, consider a Load Balancing. Evaluation shows performance improvement of aver-age %15\%$ compare with Round-Robin method about $5\%\;and\;10\%$ query area. And performance improvement of average $6\%$ compare with Spatial Proximity method.

A Study on the Data Collection Methods based Hadoop Distributed Environment (하둡 분산 환경 기반의 데이터 수집 기법 연구)

  • Jin, Go-Whan
    • Journal of the Korea Convergence Society
    • /
    • v.7 no.5
    • /
    • pp.1-6
    • /
    • 2016
  • Many studies have been carried out for the development of big data utilization and analysis technology recently. There is a tendency that government agencies and companies to introduce a Hadoop of a processing platform for analyzing big data is increasing gradually. Increased interest with respect to the processing and analysis of these big data collection technology of data has become a major issue in parallel to it. However, study of the collection technology as compared to the study of data analysis techniques, it is insignificant situation. Therefore, in this paper, to build on the Hadoop cluster is a big data analysis platform, through the Apache sqoop, stylized from relational databases, to collect the data. In addition, to provide a sensor through the Apache flume, a system to collect on the basis of the data file of the Web application, the non-structured data such as log files to stream. The collection of data through these convergence would be able to utilize as a basic material of big data analysis.

An Efficient Angular Space Partitioning Based Skyline Query Processing Using Sampling-Based Pruning (데이터 샘플링 기반 프루닝 기법을 도입한 효율적인 각도 기반 공간 분할 병렬 스카이라인 질의 처리 기법)

  • Choi, Woosung;Kim, Minseok;Diana, Gromyko;Chung, Jaehwa;Jung, Soonyong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.1
    • /
    • pp.1-8
    • /
    • 2017
  • Given a multi-dimensional dataset of tuples, a skyline query returns a subset of tuples which are not 'dominated' by any other tuples. Skyline query is very useful in Big data analysis since it filters out uninteresting items. Much interest was devoted to the MapReduce-based parallel processing of skyline queries in large-scale distributed environment. There are three requirements to improve parallelism in MapReduced-based algorithms: (1) workload should be well balanced (2) avoid redundant computations (3) Optimize network communication cost. In this paper, we introduce MR-SEAP (MapReduce sample Skyline object Equality Angular Partitioning), an efficient angular space partitioning based skyline query processing using sampling-based pruning, which satisfies requirements above. We conduct an extensive experiment to evaluate MR-SEAP.

Big Data-based Sensor Data Processing and Analysis for IoT Environment (IoT 환경을 위한 빅데이터 기반 센서 데이터 처리 및 분석)

  • Shin, Dong-Jin;Park, Ji-Hun;Kim, Ju-Ho;Kwak, Kwang-Jin;Park, Jeong-Min;Kim, Jeong-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.1
    • /
    • pp.117-126
    • /
    • 2019
  • The data generated in the IoT environment is very diverse. Especially, the development of the fourth industrial revolution has made it possible to increase the number of fixed and unstructured data generated in manufacturing facilities such as Smart Factory. With Big Data related solutions, it is possible to collect, store, process, analyze and visualize various large volumes of data quickly and accurately. Therefore, in this paper, we will directly generate data using Raspberry Pi used in IoT environment, and analyze using various Big Data solutions. Collected by using an Sqoop solution collected and stored in the database to the HDFS, and the process is to process the data by using the solutions available Hive parallel processing is associated with Hadoop. Finally, the analysis and visualization of the processed data via the R programming will be used universally to end verification.

Artificial Intelligence-based Classification Scheme to improve Time Series Data Accuracy of IoT Sensors (IoT 센서의 시계열 데이터 정확도 향상을 위한 인공지능 기반 분류 기법)

  • Kim, Jin-Young;Sim, Isaac;Yoon, Sung-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.21 no.4
    • /
    • pp.57-62
    • /
    • 2021
  • As the parallel computing capability for artificial intelligence improves, the field of artificial intelligence technology is expanding in various industries. In particular, artificial intelligence is being introduced to process data generated from IoT sensors that have enoumous data. However, the limitation exists when applying the AI techniques on IoT network because IoT has time series data, where the importance of data changes over time. In this paper, we propose time-weighted and user-state based artificial intelligence processing techniques to effectively process IoT sensor data. This technique aims to effectively classify IoT sensor data through a data pre-processing process that personalizes time series data and places a weight on the time series data before artificial intelligence learning and use status of personal data. Based on the research, it is possible to propose a method of applying artificial intelligence learning in various fields.

Parallel k-Modes Algorithm for Spark Framework (스파크 프레임워크를 위한 병렬적 k-Modes 알고리즘)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.10
    • /
    • pp.487-492
    • /
    • 2017
  • Clustering is a technique which is used to measure similarities between data in big data analysis and data mining field. Among various clustering methods, k-Modes algorithm is representatively used for categorical data. To increase the performance of iterative-centric tasks such as k-Modes, a distributed and concurrent framework Spark has been received great attention recently because it overcomes the limitation of Hadoop. Spark provides an environment that can process large amount of data in main memory using the concept of abstract objects called RDD. Spark provides Mllib, a dedicated library for machine learning, but Mllib only includes k-means that can process only continuous data, so there is a limitation that categorical data processing is impossible. In this paper, we design RDD for k-Modes algorithm for categorical data clustering in spark environment and implement an algorithm that can operate effectively. Experiments show that the proposed algorithm increases linearly in the spark environment.

The Analysis of the GPS Data Processing of the NGII CORS by Bernese and TGO (Bernese와 TGO에 의한 국내 GPS 상시관측소 자료처리 결과 분석)

  • Kim, Ji-Woon;Kwon, Jay-Hyoun;Lee, Ji-Sun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.26 no.6
    • /
    • pp.549-559
    • /
    • 2008
  • This study verified the limitations of commercial GPS data processing software and the applicability on precise positioning through comparing the processing results between Bernese and TGO under various conditions. To achieve the goal, we selected three nationwide station data and two smaller local data to constitute networks. By using Bernese and TGO, those networks are processed through the baseline analysis and the network adjustment. The comparative analysis was carried out, in terms of software, baseline length and network scale, observation duration, and number of fixed points. In the comparison between softwares, the scientific software was excellent in accuracy. It was confirmed that, as GPS-related technology is developed, the performance of the receiver was enhanced. And, in parallel with this, even the functionalities of the commercial software were tremendously enhanced. The difference, however, in result between the scientific and commercial software are still exist even if it is not big. Therefore, this study confirms that the scientific software should be used when the most precise position is necessary to be computed, especially if baseline vectors are big.

RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment

  • Muhammad Faseeh Qureshi, Nawab;Shin, Dong Ryeol
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.9
    • /
    • pp.4063-4086
    • /
    • 2016
  • Cloud computing is a robust technology, which facilitate to resolve many parallel distributed computing issues in the modern Big Data environment. Hadoop is an ecosystem, which process large data-sets in distributed computing environment. The HDFS is a filesystem of Hadoop, which process data blocks to the cluster nodes. The data block placement has become a bottleneck to overall performance in a Hadoop cluster. The current placement policy assumes that, all Datanodes have equal computing capacity to process data blocks. This computing capacity includes availability of same storage media and same processing performances of a node. As a result, Hadoop cluster performance gets effected with unbalanced workloads, inefficient storage-tier, network traffic congestion and HDFS integrity issues. This paper proposes a storage-tier-aware Robust Data Placement (RDP) scheme, which systematically resolves unbalanced workloads, reduces network congestion to an optimal state, utilizes storage-tier in a useful manner and minimizes the HDFS integrity issues. The experimental results show that the proposed approach reduced unbalanced workload issue to 72%. Moreover, the presented approach resolve storage-tier compatibility problem to 81% by predicting storage for block jobs and improved overall data block placement by 78% through pre-calculated computing capacity allocations and execution of map files over respective Namenode and Datanodes.

Performance Analysis of Implementation on Image Processing Algorithm for Multi-Access Memory System Including 16 Processing Elements (16개의 처리기를 가진 다중접근기억장치를 위한 영상처리 알고리즘의 구현에 대한 성능평가)

  • Lee, You-Jin;Kim, Jea-Hee;Park, Jong-Won
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.3
    • /
    • pp.8-14
    • /
    • 2012
  • Improving the speed of image processing is in great demand according to spread of high quality visual media or massive image applications such as 3D TV or movies, AR(Augmented reality). SIMD computer attached to a host computer can accelerate various image processing and massive data operations. MAMS is a multi-access memory system which is, along with multiple processing elements(PEs), adequate for establishing a high performance pipelined SIMD machine. MAMS supports simultaneous access to pq data elements within a horizontal, a vertical, or a block subarray with a constant interval in an arbitrary position in an $M{\times}N$ array of data elements, where the number of memory modules(MMs), m, is a prime number greater than pq. MAMS-PP4 is the first realization of the MAMS architecture, which consists of four PEs in a single chip and five MMs. This paper presents implementation of image processing algorithms and performance analysis for MAMS-PP16 which consists of 16 PEs with 17 MMs in an extension or the prior work, MAMS-PP4. The newly designed MAMS-PP16 has a 64 bit instruction format and application specific instruction set. The author develops a simulator of the MAMS-PP16 system, which implemented algorithms can be executed on. Performance analysis has done with this simulator executing implemented algorithms of processing images. The result of performance analysis verifies consistent response of MAMS-PP16 through the pyramid operation in image processing algorithms comparing with a Pentium-based serial processor. Executing the pyramid operation in MAMS-PP16 results in consistent response of processing time while randomly response time in a serial processor.

On Effective Slack Reclamation in Task Scheduling for Energy Reduction

  • Lee, Young-Choon;Zomaya, Albert Y.
    • Journal of Information Processing Systems
    • /
    • v.5 no.4
    • /
    • pp.175-186
    • /
    • 2009
  • Power consumed by modern computer systems, particularly servers in data centers has almost reached an unacceptable level. However, their energy consumption is often not justifiable when their utilization is considered; that is, they tend to consume more energy than needed for their computing related jobs. Task scheduling in distributed computing systems (DCSs) can play a crucial role in increasing utilization; this will lead to the reduction in energy consumption. In this paper, we address the problem of scheduling precedence-constrained parallel applications in DCSs, and present two energy- conscious scheduling algorithms. Our scheduling algorithms adopt dynamic voltage and frequency scaling (DVFS) to minimize energy consumption. DVFS, as an efficient power management technology, has been increasingly integrated into many recent commodity processors. DVFS enables these processors to operate with different voltage supply levels at the expense of sacrificing clock frequencies. In the context of scheduling, this multiple voltage facility implies that there is a trade-off between the quality of schedules and energy consumption. Our algorithms effectively balance these two performance goals using a novel objective function and its variant, which take into account both goals; this claim is verified by the results obtained from our extensive comparative evaluation study.