• Title/Summary/Keyword: Distributed data collection

Design of Advanced Collection Manager Service for Grid-IR System Based on OGSA-DAI component (그리드 정보검색 시스템을 위한 OGSA-DAI 기반 확장된 Collection Manager 서비스 설계)

  • Kim, Hyukho;Kim, Yangwoo
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.846-848 / 2009
  • Interest in accessing and integrating massive distributed data resources has increased recently. This paper presents the Advanced Collection Manager (CM) service, built on the OGSA-DAI component, which can access and integrate distributed data resources. The Advanced CM service supports data resources of various types, and it can query, update, transform, and deliver data in cooperation with other services in the Grid Information Retrieval (Grid-IR or GIR) system. As a result, it can access and manage data resources more flexibly and efficiently.

Distributed System Architecture Modeling of a Performance Monitoring and Reporting Tool (분산 시스템의 성능 모니터링과 레포팅 툴의 아키텍처 모델링)

  • Kim, Ki;Choi, Eun-Mi
    • Journal of the Korea Society for Simulation / v.12 no.3 / pp.69-81 / 2003
  • To manage a cluster of distributed server systems, several management aspects must be considered: configuration management, fault management, performance management, and user management. System performance monitoring and reporting play an important role in performance and fault management. In this paper, we present distributed system architecture modeling of a performance monitoring and reporting tool. We introduce the architecture models of four subsystems: node agent, data collection, performance management & report, and DB schema. The performance-related information collected from distributed servers is categorized into performance counters, event data for system status changes, service quality, and system configuration data. To analyze this performance information, we evaluate data correlation in several ways. Using results from a real company site and from a simulation of artificial workload, we show an example of performance collection and analysis. Since our report tool detects system faults and node component failures and analyzes performance through resource usage and service quality, it can provide information for server load balancing in the short term, and the causes of system faults and support for scale-out and scale-up decisions in the long term.

Improvement of IoT sensor data loss rate of wireless network-based smart factory management system

  • Tae-Hyung Kim;Young-Gon Kim
    • International journal of advanced smart convergence / v.12 no.2 / pp.173-181 / 2023
  • Data collection is an essential element in the construction and operation of a smart factory. The quality of data collection is greatly influenced by network conditions, and existing wireless network systems for IoT inevitably lose data depending on wireless signal strength. This data loss has increased system instability, because incorrect data leads to misinformation. This study designs a distributed MQTT IoT smart sensor and gateway structure that supports wireless multicasting for smooth sensor data collection. With this structure, significant results in service latency and packet data loss rate were obtained even in a wireless environment, unlike with the MQTT QoS-based system. This study therefore makes it possible to implement a data collection management system optimized for the domestic smart factory manufacturing environment, one that can prevent data loss and delay caused by abnormal data generation and minimize the need for management personnel.

Development of Big-data Management Platform Considering Docker Based Real Time Data Connecting and Processing Environments (도커 기반의 실시간 데이터 연계 및 처리 환경을 고려한 빅데이터 관리 플랫폼 개발)

  • Kim, Dong Gil;Park, Yong-Soon;Chung, Tae-Yun
    • IEMEK Journal of Embedded Systems and Applications / v.16 no.4 / pp.153-161 / 2021
  • Real-time access is required to handle continuous and unstructured data, and its management should remain flexible under dynamic conditions. A platform can be built to collect, store, and process data on either a local server or multiple servers. The former, centralized method is easy to control but creates an overload problem because all processing occurs in one unit; the latter, distributed method performs parallel processing, so it responds quickly and can easily scale system capacity, but its design is complex. This paper takes the latter approach, providing data collection and processing on one platform to derive significant insights from the various data held by an enterprise or agency; the results are intuitively available on dashboards, and Spark is used to improve distributed processing performance. All services are distributed and managed using Docker. All of the data used in this study was collected from Kafka; for a file size of 4.4 gigabytes, the data processing time in Spark cluster mode is 2 minutes 15 seconds, about 3 minutes 19 seconds faster than in local mode.

Implementation of Efficient Distributed Crawler through Stepwise Crawling Node Allocation

  • Kim, Hyuntae;Byun, Junhyung;Na, Yoseph;Jung, Yuchul
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.15-31 / 2020
  • Various websites have been created due to the increased use of the Internet, and the number of documents distributed through these websites has increased proportionally. However, it is not easy to collect newly updated documents rapidly. Web crawling methods have been used to continuously collect and manage new documents, but existing crawling systems that use a single node show limited performance. Furthermore, distributed crawlers have a problem with effective management of crawling nodes. This study proposes an efficient distributed crawler through stepwise crawling node allocation, which identifies websites' properties and establishes crawling policies based on those properties to collect a large number of documents from multiple websites. The proposed crawler can calculate the number of documents included in a website, compare data collection time and the amount of data collected based on the number of nodes allocated to a specific website by repeatedly visiting the website, and automatically allocate the optimal number of nodes to each website for crawling. An experiment applies the proposed and single-node methods to 12 different websites; the results indicate that the proposed crawler's data collection time decreased significantly compared with that of a single-node crawler. This result is obtained because the proposed crawler applies data collection policies tailored to each website. In addition, the work rate of the proposed model is confirmed to have increased.

Implementation of AIoT Edge Cluster System via Distributed Deep Learning Pipeline

  • Jeon, Sung-Ho;Lee, Cheol-Gyu;Lee, Jae-Deok;Kim, Bo-Seok;Kim, Joo-Man
    • International journal of advanced smart convergence / v.10 no.4 / pp.278-288 / 2021
  • Recently, IoT systems have become cloud-based, so that continuous, large amounts of data collected from sensor nodes are processed in a data server through the cloud. However, in the centralized configuration of large-scale cloud computing, computational processing must be performed at a physical location where data collection and processing take place, and the need for edge computers to reduce the network load of the cloud system is gradually growing. In this paper, a cluster system consisting of 6 inexpensive Raspberry Pi boards was constructed to perform fast data processing. We propose a "Kubernetes cluster system (KCS)" that handles large-scale data collection and analysis through model distribution and a data pipeline. To evaluate performance, an ensemble deep learning model was built, and the accuracy, processing performance, and processing time of the proposed KCS and of model distribution were compared and analyzed. As a result, the ensemble model was excellent in accuracy, but the KCS implemented as a data pipeline proved superior in processing speed.

Segmentation and Classification of Lidar data

  • Tseng, Yi-Hsing;Wang, Miao
    • Proceedings of the KSRS Conference / 2003.11a / pp.153-155 / 2003
  • Laser scanning has become a viable technique for collecting a large amount of accurate 3D point data densely distributed on the scanned object surface. The inherent 3D nature of the sub-randomly distributed point cloud provides abundant spatial information. Extracting valuable spatial information from laser-scanned data, such as digital elevation models, building models, and vegetation volumes, has become an active research topic. The sub-randomly distributed point cloud should be segmented and classified before spatial information is extracted. This paper investigates some existing segmentation methods and then proposes an octree-based split-and-merge segmentation method that divides lidar data into clusters belonging to 3D planes. The classification of lidar data can then be performed based on the derived attributes of the extracted 3D planes. Test results on both ground and airborne lidar data show the potential of this method for extracting spatial features from lidar data.

Collection Fusion Algorithm in Distributed Multimedia Databases (분산 멀티미디어 데이터베이스에 대한 수집 융합 알고리즘)

  • Kim, Deok-Hwan;Lee, Ju-Hong;Lee, Seok-Lyong;Chung, Chin-Wan
    • Journal of KIISE:Databases / v.28 no.3 / pp.406-417 / 2001
  • With the advances in multimedia databases on the World Wide Web, it has become more important to provide users with the capability to search distributed multimedia data. While there have been many studies on database selection and collection fusion for text databases, multimedia databases on the Web have autonomous and heterogeneous properties and mainly use content-based retrieval. The collection fusion problem of multimedia databases is concerned with merging the results retrieved by content-based retrieval from heterogeneous multimedia databases on the Web. This problem is crucial for search in distributed multimedia databases; however, it has not been studied yet. This paper provides novel algorithms for processing the collection fusion of heterogeneous multimedia databases on the Web. We propose two heuristic algorithms for estimating the number of objects to be retrieved from local databases and an algorithm using linear regression. Extensive experiments show the effectiveness and efficiency of these algorithms. These algorithms can provide the basis for distributed content-based retrieval algorithms for multimedia databases on the Web.

A Secure Healthcare System Using Holochain in a Distributed Environment

  • Jong-Sub Lee;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / v.15 no.4 / pp.261-269 / 2023
  • We propose a Holochain-based security and privacy protection system for resource-constrained IoT healthcare systems. Analysis and performance evaluation confirmed that these characteristics operate effectively in the IoT healthcare environment. The system proposed in this paper consists of four main layers aimed at the secure collection, transmission, storage, and processing of important medical data in IoT healthcare environments. The first layer, the perception layer, consists of various IoT devices, such as wearable devices, sensors, and other medical devices. These devices collect patient health data and pass it on to the network layer. The second layer, the network connectivity layer, assigns an IP address to the collected data and ensures that the data is transmitted reliably over the network. Transmission takes place via standardized protocols, which ensures data reliability and availability. The third layer, the distributed cloud layer, is a Holochain-based distributed data store that holds important medical information collected from resource-limited IoT devices. This layer manages data integrity and access control, and allows users to share data securely. Finally, the fourth layer, the application layer, provides useful information and services to end users, patients, and healthcare professionals. The structuring and presentation of data and the interaction between applications are managed at this layer. This structure aims to provide security, privacy, and resource efficiency suitable for IoT healthcare systems, in contrast to traditional centralized or blockchain-based systems. We design and propose a Holochain-based security and privacy protection system for a better IoT healthcare system.

A Study on the Big Data Analysis System for Searching of the Flooded Road Areas (도로 침수영역의 탐색을 위한 빅데이터 분석 시스템 연구)

  • Song, Youngmi;Kim, Chang Soo
    • Journal of Korea Multimedia Society / v.18 no.8 / pp.925-934 / 2015
  • As global warming gradually increases the frequency of natural disasters, the risk of flooding due to typhoons and torrential rain has also increased. In particular, roads are flooded by sudden torrential rain, causing vehicle damage and personal injury. Because a road can become inundated in an instant, rapid data collection and a quick response system need to be studied. Our research proposes a big data analysis system, based on the collected information, together with a variety of information collection methods for searching for road areas flooded by torrential rain. The data related to flooded roads include SNS data, meteorological data, road link data, and so on. The big data analysis system is implemented as a distributed processing system based on the Hadoop platform.