• Title/Summary/Keyword: Distributed Data

Rhipe Platform for Big Data Processing and Analysis (빅데이터 처리 및 분석을 위한 Rhipe 플랫폼)

  • Jung, Byung Ho;Shin, Ji Eun;Lim, Dong Hoon
    • The Korean Journal of Applied Statistics / v.27 no.7 / pp.1171-1185 / 2014
  • Rhipe, which integrates R with the Hadoop environment, makes it possible to process and analyze massive amounts of data in a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe on actual and simulated data of various sizes. Experimental results comparing the computing speeds of the pseudo-distributed and fully-distributed modes of a Hadoop cluster showed that the fully-distributed mode was faster than the pseudo-distributed mode, and that its computing speed improved as the number of data nodes increased. We also compared the performance of our Rhipe implementation with the stats and biglm packages based on bigmemory. The results showed that our Rhipe implementation outperformed the other packages owing to parallel processing, since the number of map tasks grows with the size of the data.
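
The regression described above decomposes naturally into map tasks that compute partial cross-product sums and a reduce step that combines them. Rhipe itself is an R package on Hadoop, so the following Python snippet is only a minimal sketch of that decomposition (simulated data, no Hadoop dependency, all names hypothetical):

```python
import numpy as np

def map_partial_sums(X_part, y_part):
    """Map step: each data partition contributes its X'X and X'y partial sums."""
    return X_part.T @ X_part, X_part.T @ y_part

def reduce_and_solve(partials):
    """Reduce step: sum the partial results and solve the normal equations."""
    XtX = sum(p[0] for p in partials)
    Xty = sum(p[1] for p in partials)
    return np.linalg.solve(XtX, Xty)

# Toy usage: split one simulated dataset into "map task" chunks.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 3))])
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(size=1000)
chunks = [(X[i:i + 250], y[i:i + 250]) for i in range(0, 1000, 250)]
beta = reduce_and_solve([map_partial_sums(Xc, yc) for Xc, yc in chunks])
print(beta)  # approximately [1.0, 2.0, -0.5, 0.3]
```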

A Study on the Design and Implementation of the Lightweight Object Model Supporting Distributed Trader (분산 트레이더를 지원하는 경량 (lightweight) 객체 모델 설계 및 구현 방안 연구)

  • Jin, Myeong-Suk;Song, Byeong-Gwon
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1050-1061 / 2000
  • This paper presents a new object model, LOM (Lightweight Object Model), and an implementation method for a distributed trader in heterogeneous distributed computing environments, including mobile networks. A trader is a third-party object that enables clients to find suitable servers providing the most appropriate services in a distributed environment with dynamic reconfiguration of services and servers. Trading service requires a simpler and more specific object model than generic object models, which provide rich multimedia data types and semantic characteristics with complex data structures. LOM supports a new reference attribute type instead of the relationship, inheritance, and composite attribute types of general object-oriented models, and therefore has simple data structures. In LOM, the modelling step also includes specifying information about users and access rights to objects for security in the mobile environment, as well as developing the distributed storage for the trading service. Finally, we propose an implementation method for the distributed trader that integrates the LOM information object model with the OMG (Object Management Group) computational object model.
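
The abstract gives no code for LOM or the trader; purely as an illustrative sketch of a trader holding flat, lightweight objects whose links are reference attributes (object ids) rather than inheritance or relationship types, one might write (all names hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class LOMObject:
    """Flat, lightweight object: plain attributes plus reference attributes
    (object ids) instead of inheritance/relationship/composite types."""
    oid: str
    attributes: dict = field(default_factory=dict)
    references: dict = field(default_factory=dict)  # name -> referenced oid

class Trader:
    """Third-party object that matches client requests to exported service offers."""
    def __init__(self):
        self.offers = {}  # oid -> LOMObject describing a service offer

    def export(self, offer: LOMObject):
        self.offers[offer.oid] = offer

    def lookup(self, service_type: str):
        return [o for o in self.offers.values()
                if o.attributes.get("service_type") == service_type]

    def resolve(self, oid: str):
        return self.offers.get(oid)

# Hypothetical usage: a server exports an offer, a client looks it up.
trader = Trader()
trader.export(LOMObject("offer:1",
                        {"service_type": "print", "location": "site-A"},
                        {"provider": "server:42"}))
print([o.oid for o in trader.lookup("print")])
```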

Distributed Incremental Approximate Frequent Itemset Mining Using MapReduce

  • Mohsin Shaikh;Irfan Ali Tunio;Syed Muhammad Shehram Shah;Fareesa Khan Sohu;Abdul Aziz;Ahmad Ali
    • International Journal of Computer Science & Network Security / v.23 no.5 / pp.207-211 / 2023
  • Traditional methods for data mining typically assume that the data is small, centralized, memory-resident, and static. This assumption is no longer acceptable, because datasets are growing very fast and quickly become huge, so there is a fast-growing need to manage data with efficient mining algorithms. In such a scenario it is inevitable to carry out data mining in a distributed environment, and Frequent Itemset Mining (FIM) is no exception; thus the need for an efficient incremental mining algorithm arises. We propose Distributed Incremental Approximate Frequent Itemset Mining (DIAFIM), an incremental FIM algorithm that works in a distributed, parallel MapReduce environment. The key contribution of this research is devising an incremental mining algorithm that works in the distributed, parallel MapReduce environment.
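
The incremental and approximation details of DIAFIM are not spelled out in the abstract; the sketch below only illustrates the generic map/reduce itemset-counting core that such algorithms build on (Python stand-ins for the map and reduce phases, hypothetical min_support):

```python
from itertools import combinations
from collections import Counter

def map_transactions(transactions, k):
    """Map: emit (itemset, 1) for every k-itemset in each transaction."""
    for t in transactions:
        for itemset in combinations(sorted(set(t)), k):
            yield itemset, 1

def reduce_counts(pairs, min_support):
    """Reduce: sum the counts per itemset and keep only the frequent ones."""
    counts = Counter()
    for itemset, c in pairs:
        counts[itemset] += c
    return {i: c for i, c in counts.items() if c >= min_support}

# Toy usage on one "partition"; a real job would shuffle the pairs across
# reducers and merge counts from new data increments with previous results.
data = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "c", "d"]]
print(reduce_counts(map_transactions(data, 2), min_support=2))
```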

Design of GlusterFS Based Big Data Distributed Processing System in Smart Factory (스마트 팩토리 환경에서의 GlusterFS 기반 빅데이터 분산 처리 시스템 설계)

  • Lee, Hyeop-Geon;Kim, Young-Woon;Kim, Ki-Young;Choi, Jong-Seok
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.1 / pp.70-75 / 2018
  • A smart factory is an intelligent factory that can enhance productivity, quality, customer satisfaction, and so on by applying information and communications technology to the entire production process, including design and development, manufacturing, and distribution and logistics. The precise amount of data generated in a smart factory varies with the factory's size and the state of its facilities. Regardless, it would be difficult to apply traditional production management systems to a smart factory environment, because such an environment generates vast amounts of data. For this reason, the need has arisen for a distributed big-data processing system that can process large amounts of data. This article therefore designs a Gluster File System (GlusterFS)-based distributed big-data processing system for use in a smart factory environment. Compared with existing distributed processing systems, the proposed system reduces the system load and the risk of data loss through the distribution and management of network traffic.
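
The abstract does not describe the design internals; as a rough, assumption-laden illustration of how a GlusterFS-style distributed volume spreads files across storage nodes by hashing file names (hypothetical brick names, simplified hash):

```python
import hashlib

BRICKS = ["node1:/brick", "node2:/brick", "node3:/brick"]  # hypothetical

def place(filename, bricks=BRICKS):
    """Pick a brick by hashing the file name, roughly how a distributed
    (DHT-style) GlusterFS volume spreads files across storage nodes."""
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    return bricks[h % len(bricks)]

# Sensor data files from the factory floor end up spread over the bricks,
# so no single node carries all of the write traffic.
for f in ["press-01.log", "weld-07.log", "paint-03.log"]:
    print(f, "->", place(f))
```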

A Differential Data Replicator in Distributed Environments

  • Lee, Wookey;Park, Jooseok;Sukho Kang
    • The Journal of Information Technology and Database / v.3 no.2 / pp.3-24 / 1996
  • In this paper a data replicator scheme with a distributed join architecture is suggested, together with its cost functions and performance results. The contribution of this scheme is not only minimizing the number of base relation locks on distributed database tables but also remarkably reducing the amount of remote transmission, which helps make the distributed database system practical. The differential files, derived from the active log of the DBMS, are what mainly enable the scheme to reduce the number of base relation locks, and the amount of data transported between the relevant sites is curtailed by tuple reduction procedures. We then describe the data replicator algorithm with its cost function and show performance results compared with the semi-join scheme in distributed environments.
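
A minimal sketch of the differential idea, assuming a key-addressable replica and a log-derived differential holding inserted and deleted tuples (the paper's actual cost functions and tuple-reduction procedures are not reproduced here):

```python
def apply_differential(replica, diff):
    """Apply a log-derived differential file to a replicated table.
    `replica` maps primary key -> tuple; `diff` holds the changes captured
    from the DBMS log since the last refresh."""
    for key in diff.get("deleted", []):
        replica.pop(key, None)
    for key, row in diff.get("inserted", {}).items():
        replica[key] = row          # updates arrive as delete + insert
    return replica

# Toy usage: only the differential travels to the remote site,
# not the whole base relation.
replica = {1: ("Kim", "Seoul"), 2: ("Lee", "Busan")}
diff = {"deleted": [2], "inserted": {3: ("Park", "Daegu")}}
print(apply_differential(replica, diff))
```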

Design and Implementation of a Distributed Data Mining Framework (분산된 데이터마이닝을 위한 프레임워크의 설계 및 구현)

  • Kadel, Prakash;Choi, Ho-Jin
    • Proceedings of the Korean Information Science Society Conference / 2007.06c / pp.336-340 / 2007
  • We envisage that grid computing environments allow us to implement distributed data mining services, that is, those applications which analyze large sets of geographically distributed databases and information using the computational power and resources of a grid environment. This paper describes an experimental framework towards such a distributed data mining approach, including design considerations and a prototype implementation. Based on the "Knowledge Grid" architecture suggested by Cannataro et al., we identify four major components - user node, broker node, data node, and computation node - and define their individual roles. For implementing the prototype, we have investigated methods for utilizing distributed resources within a grid computing environment, e.g., communication and coordination among the various resources available.
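
The four node roles are only named in the abstract; the following Python sketch is a hypothetical illustration of a broker node routing a mining task to a computation node co-located with the data node that holds the target dataset (all interfaces assumed, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class DataNode:
    name: str
    datasets: list
    site: str

@dataclass
class ComputationNode:
    name: str
    site: str
    load: float  # current utilization, 0.0 - 1.0

class BrokerNode:
    """Receives a mining request from a user node and decides where to run it."""
    def __init__(self, data_nodes, comp_nodes):
        self.data_nodes = data_nodes
        self.comp_nodes = comp_nodes

    def schedule(self, dataset, task):
        data = next(d for d in self.data_nodes if dataset in d.datasets)
        # Prefer computation nodes co-located with the data, then least loaded.
        candidates = sorted(self.comp_nodes,
                            key=lambda c: (c.site != data.site, c.load))
        return {"task": task, "dataset": dataset,
                "data_node": data.name, "run_on": candidates[0].name}

# Hypothetical usage (request coming from a user node):
broker = BrokerNode([DataNode("d1", ["sales"], "siteA")],
                    [ComputationNode("c1", "siteA", 0.4),
                     ComputationNode("c2", "siteB", 0.1)])
print(broker.schedule("sales", "association-rules"))
```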

Spatial Operation Allocation Scheme over Common Query Regions for Distributed Spatial Data Stream Processing (분산 공간 데이터 스트림 처리에서 질의 영역의 겹침을 고려한 공간 연산 배치 기법)

  • Chung, Weon-Il
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.6 / pp.2713-2719 / 2012
  • With the increase in various location-based services, distributed data stream processing techniques have been widely studied to provide high scalability and availability. Previous research on balancing the load of distributed nodes does not consider the geographic characteristics of spatial data streams; as a result, distributing operations on adjacent spatial regions across nodes increases the overall system load. We propose an operation allocation scheme that considers the characteristics of spatial operations in order to process spatial data streams effectively in distributed computing environments. The proposed method takes a share-maximizing approach that preferentially assigns spatial operations sharing common query regions to the same node, so that adjacent spatial operations on overlapping regions are not split across different nodes.
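
As a toy illustration of the share-maximizing idea, the greedy sketch below assigns an operation to a node that already hosts an operation with an overlapping query region, and otherwise to the least-loaded node (rectangular query regions and node names are hypothetical):

```python
def overlaps(a, b):
    """Axis-aligned overlap test for query regions (xmin, ymin, xmax, ymax)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def allocate(operations, nodes):
    """Greedy allocation: operations whose query regions overlap an already
    placed operation go to the same node; otherwise pick the emptiest node."""
    placement, assigned = {}, {n: [] for n in nodes}
    for op_id, region in operations:
        target = None
        for node, placed in assigned.items():
            if any(overlaps(region, r) for _, r in placed):
                target = node
                break
        if target is None:
            target = min(assigned, key=lambda n: len(assigned[n]))
        assigned[target].append((op_id, region))
        placement[op_id] = target
    return placement

# Hypothetical query regions: q1 and q2 overlap, q3 is disjoint.
ops = [("q1", (0, 0, 10, 10)), ("q2", (5, 5, 15, 15)), ("q3", (50, 50, 60, 60))]
print(allocate(ops, ["nodeA", "nodeB"]))  # q1 and q2 share a node
```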

A DDMPF(Distributed Data Management Protocol using FAT) Design of Self-organized Storage for Negotiation among a Client and Servers based on Clouding (클라우딩 기반에서 클라이언트와 서버간 협상을 위한 자가 조직 저장매체의 DDMPF(Distributed Data Management Protocol using FAT) 설계)

  • Lee, Byung-Kwan;Jeong, Eun-Hee;Yang, Seung-Hae
    • Journal of Korea Multimedia Society / v.15 no.8 / pp.1048-1058 / 2012
  • This paper proposes the DDMPF (Distributed Data Management Protocol using FAT), which prevents data loss and preserves the security of self-organized storage by comprising a client, storage servers, and a verification server in a cloud environment. The DDMPF builds a self-organized storage server and, in contrast to the centralization problem and the data loss caused by storage server failures in existing cloud storage, avoids data loss by decentralizing the partitioned data, while improving the efficiency of distributed data management with a FAT (File Allocation Table). The DDMPF also improves data reliability by having the verification server check the data integrity of the storage server, and strengthens security through double encryption with the client's private key and the system's master key using the EC-DH algorithm. Additionally, the DDMPF limits the number of verification servers, detects flooding attacks by setting a TS (Time Stamp) on each verification request message, and detects replay attacks by using a nonce value generated anew whenever verification is requested.
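
Leaving aside the verification servers and EC-DH double encryption, the FAT-style bookkeeping can be sketched as follows (hypothetical server names, no cryptography, purely illustrative):

```python
def store(data: bytes, servers, chunk_size=4):
    """Split data into partitions, spread them over storage servers round-robin,
    and record the placement in a FAT-like allocation table."""
    fat, storage = [], {s: {} for s in servers}
    for i in range(0, len(data), chunk_size):
        server = servers[(i // chunk_size) % len(servers)]
        storage[server][i] = data[i:i + chunk_size]
        fat.append((i, server))          # offset -> which server holds it
    return fat, storage

def retrieve(fat, storage):
    """Reassemble the file by following the allocation table."""
    return b"".join(storage[s][off] for off, s in fat)

fat, storage = store(b"distributed self-organized storage",
                     ["srvA", "srvB", "srvC"])
assert retrieve(fat, storage) == b"distributed self-organized storage"
print(fat[:3])
```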

DISTRIBUTED WEB GIS SERVICE BASED ON XML AND INTEROPERABILITY

  • Kim, Do-Hyun
    • Proceedings of the KSRS Conference / 2002.10a / pp.145-150 / 2002
  • Web GIS (Geographic Information Systems) service systems provide various GIS services for analyzing and displaying spatial data with a friendly user interface. These services are expanding the business domain, and many users want to access various distributed spatial data. However, it is difficult to access diverse data sources because of differing spatial data formats and data access methods. In this paper, we design and implement web GIS services based on the interoperability and GML (Geography Markup Language) of the OGC (Open GIS Consortium) in a distributed web environment. Interoperability provides a uniform method of accessing distributed data sources based on Microsoft's OLE DB technology, and GML supports web GIS services based on XML. We design these GIS services as components using UML (Unified Modeling Language), an object-oriented modeling language for specifying, visualizing, constructing, and documenting the artifacts of a software system. They were also developed in an object-oriented computing environment, which provides interoperability, language independence, an easy development environment, and reusability.
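
GML encodes geographic features in XML, so standard XML tooling can read them; the snippet below parses a simplified, hypothetical feature document (its element names are illustrative, not the paper's schema):

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
doc = f"""
<FeatureCollection xmlns:gml="{GML_NS}">
  <cityMember>
    <name>Daejeon</name>
    <gml:Point><gml:coordinates>127.38,36.35</gml:coordinates></gml:Point>
  </cityMember>
</FeatureCollection>
"""

root = ET.fromstring(doc)
for member in root.findall("cityMember"):
    name = member.findtext("name")
    coords = member.findtext(f"{{{GML_NS}}}Point/{{{GML_NS}}}coordinates")
    lon, lat = map(float, coords.split(","))
    print(name, lon, lat)  # a web GIS service would hand this to the renderer
```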

Design and evaluation of a GQS-based time-critical event dissemination for distributed clouds

  • Bae, Ihn-Han
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.989-998 / 2011
  • Cloud computing provides computation, software, data access, and storage services that do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Cloud computing providers have set up several data centers at different geographical locations over the Internet in order to optimally serve the needs of their customers around the world. One of the fundamental challenges in geographically distributed clouds is to provide efficient algorithms for supporting inter-cloud data management and dissemination. In this paper, we propose a group quorum system (GQS)-based dissemination method for improving inter-cloud interoperability in time-critical event dissemination services such as computing policy updates, message sharing, and event notification. The proposed GQS-based method organizes the distributed clouds into a group quorum ring overlay to support constant event dissemination latency. Our numerical results show that the GQS-based method improves efficiency compared with the Chord-based and Plume methods.
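
The abstract does not give the GQS construction; as a toy illustration of the general quorum idea behind constant-latency dissemination, the classic grid quorum below guarantees that the publisher's quorum and any reader's quorum intersect, so an event is found without flooding every node (this is a generic quorum sketch, not the paper's exact overlay):

```python
import math

def grid_quorum(node, n):
    """Classic grid quorum: lay the n ring positions on a sqrt(n) x sqrt(n)
    grid; a node's quorum is its whole row plus its whole column, so any
    two quorums intersect in at least one position."""
    side = int(math.isqrt(n))
    assert side * side == n, "toy example assumes a perfect square"
    r, c = divmod(node, side)
    row = {r * side + j for j in range(side)}
    col = {i * side + c for i in range(side)}
    return row | col

def publish(event, source, n, stores):
    for member in grid_quorum(source, n):   # push the event to the source quorum
        stores[member].append(event)

def read(node, n, stores):
    seen = []
    for member in grid_quorum(node, n):     # any quorum meets the source's quorum
        seen.extend(stores[member])
    return set(seen)

n = 16
stores = {i: [] for i in range(n)}
publish("policy-update-v2", source=3, n=n, stores=stores)
print(read(10, n, stores))  # the event is found without contacting all 16 nodes
```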