• Title/Summary/Keyword: Bigdata server

Designing Cost Effective Open Source System for Bigdata Analysis (빅데이터 분석을 위한 비용효과적 오픈 소스 시스템 설계)

  • Lee, Jong-Hwa;Lee, Hyun-Kyu
    • Knowledge Management Research / v.19 no.1 / pp.119-132 / 2018
  • Many advanced products and services are emerging in the market thanks to data-driven technologies such as the Internet of Things (IoT), big data, and AI. Building a data processing system in an IoT network environment is not simple to configure, and it is constrained by the high cost of a high-performance server environment. This paper therefore designs a low-cost, practical development environment for a large-scale data analysis computing platform using open source software. Specifically, the study implements a big data processing system using Raspberry Pi, an ultra-small PC environment, and open source APIs. The work includes building a portable server system, building a web server for web mining, developing Python IDE classes for crawling, and developing R libraries for NLP and visualization. Through this research, we will develop a web environment that can control real-time data collection and analysis of web media in a mobile environment, and present it as a curriculum for non-IT specialists.

Development of a CUBRID-Based Distributed Parallel Query Processing System

  • Kim, Hyeong-Il;Yang, HyeonSik;Yoon, Min;Chang, Jae-Woo
    • Journal of Information Processing Systems / v.13 no.3 / pp.518-532 / 2017
  • Due to the rapid growth in the amount of data, research on big data processing has attracted attention. For big data processing, CUBRID Shard supports parallel query processing by dividing the database across a number of CUBRID servers. However, CUBRID Shard can answer a user's query only when the query needs to access a single CUBRID server rather than multiple ones. To solve this problem, we propose a CUBRID-based distributed parallel query processing system that can answer a user's query in a parallel and distributed manner. Through a performance evaluation, we show that the proposed system achieves 2-3 times better query processing time than the existing CUBRID Shard.
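
The multi-server case described in this abstract can be illustrated with a generic scatter-gather pattern: the same query is sent to every shard in parallel and the partial results are merged. This is only a minimal sketch, not the paper's system and not the CUBRID API; the in-memory `SHARDS` data and the names `query_shard` and `scatter_gather` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for the rows held by one database server.
SHARDS = [
    [{"id": 1, "amount": 10}, {"id": 4, "amount": 40}],
    [{"id": 2, "amount": 20}, {"id": 5, "amount": 50}],
    [{"id": 3, "amount": 30}],
]


def query_shard(rows, min_amount):
    """Evaluate the filter locally on a single shard."""
    return [row for row in rows if row["amount"] >= min_amount]


def scatter_gather(shards, min_amount):
    """Send the same query to every shard in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, min_amount), shards))
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row["id"])


result = scatter_gather(SHARDS, 30)
```

A real system would also need to push down aggregates and handle joins that span shards, which this sketch does not attempt.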

Implementation and Performance Analysis of Efficient Big Data Processing System Through Dynamic Configuration of Edge Server Computing and Storage Modules (BigCrawler: 엣지 서버 컴퓨팅·스토리지 모듈의 동적 구성을 통한 효율적인 빅데이터 처리 시스템 구현 및 성능 분석)

  • Kim, Yongyeon;Jeon, Jaeho;Kang, Sungjoo
    • IEMEK Journal of Embedded Systems and Applications / v.16 no.6 / pp.259-266 / 2021
  • Edge computing enables real-time big data processing by performing computation close to the physical location of the user or data source. However, in an edge computing environment, various situations that affect big data processing performance may arise from temporary service requirements or changes in physical resources in the field. In this paper, we propose BigCrawler, a system that dynamically configures the computing module and storage module according to the big data collection status and computing resource usage in the edge computing environment. We also analyze the characteristics of big data processing workloads according to the arrangement of the computing and storage modules.

An Implementation of Federated Learning based on Blockchain (블록체인 기반의 연합학습 구현)

  • Park, June Beom;Park, Jong Sou
    • The Journal of Bigdata / v.5 no.1 / pp.89-96 / 2020
  • Deep learning using artificial neural networks has recently been researched and developed in various fields such as image recognition, big data, and data analysis. Federated learning has emerged to address data privacy invasion and the growing cost and time required for training. It offers learning techniques that bring the benefits of a distributed processing system while solving problems of existing deep learning, but problems remain with the server-client structure and with the motivation to provide training data. We therefore replaced the role of the server in federated learning with a blockchain system, and conducted research to solve the privacy and security problems associated with federated learning. In addition, we implemented a blockchain-based system that motivates users by compensating them for the data they provide, and that requires lower maintenance costs while maintaining the same accuracy as existing learning. In this paper, we present experimental results that show the validity of the blockchain-based system, and compare existing federated learning with blockchain-based federated learning. As future work, we conclude the paper by presenting solutions to security problems and applicable business fields.
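
For readers unfamiliar with the aggregation step that a federated learning coordinator performs, whether that coordinator is a server or, as in this paper, a blockchain, the following is a minimal sketch of weighted federated averaging. It is a commonly used aggregation rule and is an assumption here: the abstract does not say which rule the authors use, and nothing blockchain-related is modeled. The function name and the toy data are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client model weights, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients with 1 and 3 local training samples respectively.
weights = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_weights = federated_average(weights, sizes)
```

Only model weights are exchanged in this step; the raw training data stays with each client.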

Design of Cloud Service Platform for eGovernment

  • LEE, Choong Hyong
    • International Journal of Internet, Broadcasting and Communication / v.13 no.1 / pp.201-209 / 2021
  • The term eGovernment, or e-Government, refers to the use of communication technology devices such as computers and the Internet to provide public services to citizens and others. E-government gives citizens new opportunities to access the government directly and conveniently, while the government provides citizens with direct services. Cloud computing, meanwhile, enables users to use computer system resources on demand, especially data storage (cloud storage) and computing power, without having to manage them directly. The term is commonly used to describe data centers that are available to many users over the Internet. Today's dominant large clouds are distributed across multiple central servers; a server that is relatively close to the user can be designated an edge server. However, despite the prevalence of e-government and cloud computing, the two concepts have evolved separately, and research attempts to combine them have been insufficient. For this reason, in this work we aim to produce independent and objective analysis results by separating the analysis of e-government cloud service platforms into progressive steps. This work is done through an analysis of the development process and architectural composition of the e-government development standard framework and the cloud platform PaaS-TA. In addition, this study is expected to derive implications, from an analysis perspective, for the direction and service composition of the e-government cloud service platform currently being pursued.

Blockchain Technology for Healthcare Big Data Sharing (헬스케어 빅데이터 유통을 위한 블록체인기술 활성화 방안)

  • Yu, Hyeong Won;Lee, Eunsol;Kho, Wookyun;Han, Ho-seong;Han, Hyun Wook
    • The Journal of Bigdata / v.3 no.1 / pp.73-82 / 2018
  • At the core of future medicine is the realization of precision medicine centered on individuals. For this, we need an open ecosystem in which healthcare data can be viewed, managed, and distributed anytime, anywhere. However, since healthcare data involves sensitive personal information, a significant level of both reliability and security is required. To solve this problem, the healthcare industry is paying attention to blockchain technology. Unlike existing information and communication infrastructure, which stores and manages transaction information on a central server, blockchain technology is a distributed network in which data is distributed and managed by all users participating in the network. In this study, we not only discuss the technical and legal aspects necessary for demonstrating healthcare data distribution using blockchain technology but also introduce the KOREN SDI Network-based Healthcare Big Data Distribution Demonstration Study. In addition, we discuss policy strategies for promoting blockchain technology in healthcare.

Design and Implementation of Efficient Storage and Retrieval Technology of Traffic Big Data (교통 빅데이터의 효율적 저장 및 검색 기술의 설계와 구현)

  • Kim, Ki-su;Yi, Jae-Jin;Kim, Hong-Hoi;Jang, Yo-lim;Hahm, Yu-Kun
    • The Journal of Bigdata / v.4 no.2 / pp.207-220 / 2019
  • Recent developments in information and communication technology have enabled the deployment of sensor-based data to provide real-time services. In Korea, the Korea Transportation Safety Authority collects driving information from all commercial vehicles through a fitted digital tachograph (DTG). The information gathered using DTGs can be utilized in various ways in the field of transportation. Notably, in autonomous driving, real-time analysis of this information can be used to prevent or respond to dangerous driving behavior. However, a traditional database system is limited in its ability to process a large amount of data at a level suitable for real-time services. In particular, because of this technical problem, processing large quantities of traffic big data for real-time analysis of commercial vehicle operation information has never been attempted in Korea. To solve this problem, this study optimized a new database server system and confirmed that a real-time service is possible. The constructed database system is expected to be used to secure the base data needed to establish digital twin and autonomous driving environments.

Efficient distributed consensus optimization based on patterns and groups for federated learning (연합학습을 위한 패턴 및 그룹 기반 효율적인 분산 합의 최적화)

  • Kang, Seung Ju;Chun, Ji Young;Noh, Geontae;Jeong, Ik Rae
    • Journal of Internet Computing and Services / v.23 no.4 / pp.73-85 / 2022
  • In the era of the 4th industrial revolution, where automation and connectivity are maximized with artificial intelligence, the importance of collecting and using data for model updates is increasing. To create a model using artificial intelligence technology, it is usually necessary to gather data in one place so that the model can be updated, but this can infringe users' privacy. In this paper, we introduce federated learning, a distributed machine learning method that updates models cooperatively without directly sharing the distributed data, and present a study that optimizes distributed consensus among participants without relying on a server. In addition, we propose a pattern- and group-based distributed consensus optimization algorithm that generates patterns and groups based on the Kirkman Triple System and performs parallel updates and communication. This algorithm guarantees more privacy than the existing distributed consensus optimization algorithm and reduces the communication time until the model converges.
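
To make the grouping idea concrete, here is a minimal sketch of a Kirkman triple system for 9 participants: 4 rounds, each splitting the participants into 3 disjoint groups of 3, such that every pair of participants shares a group exactly once. It uses the standard construction from lines of the affine plane over Z_3. How the paper derives its patterns and groups from such a system is not stated in the abstract, so this shows only the combinatorial object, and the function name `kirkman_9` is hypothetical.

```python
from itertools import combinations


def kirkman_9():
    """Return 4 rounds, each a partition of participants 0..8 into 3 triples."""
    rounds = []
    # Participant id = 3*x + y for a point (x, y) over Z_3.
    # Lines y = m*x + b give one round (parallel class) per slope m.
    for m in range(3):
        rounds.append([
            tuple(sorted(3 * x + (m * x + b) % 3 for x in range(3)))
            for b in range(3)
        ])
    # Vertical lines x = c form the fourth round.
    rounds.append([tuple(3 * c + y for y in range(3)) for c in range(3)])
    return rounds


rounds = kirkman_9()
# Collect every pair that ever shares a group, to check the design property.
pairs = [frozenset(p) for r in rounds for t in r for p in combinations(t, 2)]
```

Because the groups within a round are disjoint, they can in principle be processed in parallel, which is consistent with the parallel updates the abstract mentions.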

An elastic distributed parallel Hadoop system for bigdata platform and distributed inference engines (동적 분산병렬 하둡시스템 및 분산추론기에 응용한 서버가상화 빅데이터 플랫폼)

  • Song, Dong Ho;Shin, Ji Ae;In, Yean Jin;Lee, Wan Gon;Lee, Kang Se
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1129-1139 / 2015
  • The inference process generates additional triples from knowledge represented as RDF triples in semantic web technology. Tens of millions of triples as initial big data, together with the additionally inferred triples, become a knowledge base for applications such as QA (question and answer) systems. The inference engine requires more computing resources to process the triples generated during inference. Additional computing resources supplied by the underlying resource pool in cloud computing can shorten the execution time. This paper presents an algorithm that allocates the number of computing nodes "elastically" at runtime on Hadoop, depending on the size of the knowledge data fed in. The proposed model has a layered architecture: the top layer for applications, the middle layer for the distributed parallel inference engine that processes the triples, and the lower layer for elastic Hadoop and server virtualization. System algorithms and test data are analyzed and discussed in this paper. The model has the benefit that the rich body of legacy Hadoop applications can run faster on this system without any modification.
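
The idea of choosing the node count from the input size can be sketched as a simple clamped proportional rule. This is an illustration only; the abstract does not give the paper's allocation algorithm, and the function name, the per-node capacity, and the node limits below are all hypothetical values.

```python
import math


def nodes_for(triple_count, triples_per_node=5_000_000, min_nodes=1, max_nodes=16):
    """Pick a node count proportional to input size, clamped to the pool limits.

    All thresholds are hypothetical; a real policy would be tuned from measurements.
    """
    wanted = math.ceil(triple_count / triples_per_node)
    return max(min_nodes, min(max_nodes, wanted))


small = nodes_for(12_000_000)      # needs three nodes at 5M triples per node
capped = nodes_for(1_000_000_000)  # limited by the size of the resource pool
```

Since inference produces additional triples at runtime, such a rule would have to be re-evaluated as the knowledge base grows.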

Design of Client-Server Model For Effective Processing and Utilization of Bigdata (빅데이터의 효과적인 처리 및 활용을 위한 클라이언트-서버 모델 설계)

  • Park, Dae Seo;Kim, Hwa Jong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.109-122 / 2016
  • Recently, big data analysis has become a field of interest to individuals and non-experts as well as companies and professionals. Accordingly, it is used for marketing and for solving social problems by analyzing data that is publicly released or collected directly. In Korea, various companies and individuals are attempting big data analysis, but it is difficult from the initial stage because of limited disclosure of big data and difficulties in collection. System improvements for promoting big data and big data disclosure services are being carried out in Korea and abroad, mainly in the form of services that open public data, such as Korea's Government 3.0 (data.go.kr). In addition to government efforts, services that share data held by corporations or individuals are in operation, but it is difficult to find useful data because little data is shared. Moreover, big data traffic problems can occur because the entire dataset must be downloaded and examined in order to grasp its attributes and basic information. Therefore, a new system for big data processing and utilization is needed. First, big data pre-analysis technology is needed to solve the big data sharing problem. Pre-analysis is a concept proposed in this paper: it means providing users with results generated by analyzing the data in advance. Through pre-analysis, the usability of big data can be improved by providing information that conveys the properties and characteristics of the data when a data user searches for it. In addition, sharing the summary data or sample data generated through pre-analysis avoids the security problems that may occur when the original data is disclosed, enabling big data sharing between the data provider and the data user.
Second, appropriate preprocessing results must be generated quickly according to the disclosure level of the raw data or the network status, and provided to users through distributed big data processing using Spark. Third, to solve the big traffic problem, the system monitors network traffic in real time. When preprocessing the data requested by a user, it must reduce the data to a size that the current network can handle before transmitting it, so that no big traffic occurs. In this paper, we present various data sizes according to the level of disclosure through pre-analysis. This method is expected to produce lower traffic volume than the conventional method of sharing only raw data across many systems. We describe how to solve the problems that occur when big data is released and used, and how to facilitate sharing and analysis. The client-server model uses Spark for fast analysis and processing of user requests. It consists of a Server Agent and a Client Agent, deployed on the server side and the client side respectively. The Server Agent is the agent required by the data provider; it performs pre-analysis of big data to generate a Data Descriptor containing information on Sample Data, Summary Data, and Raw Data. It also performs fast and efficient big data preprocessing through distributed big data processing and continuously monitors network traffic. The Client Agent is placed on the data user side. It can quickly search big data through the Data Descriptor produced by the pre-analysis, and can request the desired data from the server and download it according to the level of disclosure. The Server Agent and the Client Agent are separated when the data provider publishes data for use by data users.
In particular, we focus on big data sharing, distributed big data processing, and the big traffic problem, construct the detailed modules of the client-server model, and present the design of each module. In a system designed on the basis of the proposed model, a user who acquires data analyzes it in the desired direction or preprocesses it into new data. By analyzing the newly processed data through the Server Agent, the data user takes on the role of data provider. The data provider can likewise obtain useful statistical information from the Data Descriptor of the data it discloses and become a data user, performing new analysis using the sample data. In this way, raw data is processed and the processed big data is used by other users, forming a natural sharing environment. The roles of data provider and data user are not fixed, which yields an ideal sharing service in which everyone can be both a provider and a user. The client-server model solves the problem of sharing big data, provides a free sharing environment in which big data can be disclosed securely, and makes big data easy to find.
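
The pre-analysis flow in this abstract can be sketched in a few lines: the provider side builds a descriptor holding summary and sample data, and the response sent to a user is the largest form permitted by the disclosure level that also fits the current network budget. This is a hedged, single-machine illustration and not the authors' design; the paper uses Spark for distributed processing, while the function names, the dictionary layout, the disclosure labels (`"raw"`, `"sample"`, `"summary"`), and the item-count budget below are all hypothetical.

```python
import random
import statistics


def build_descriptor(values, sample_size=3, seed=0):
    """Server-side pre-analysis: summary statistics plus a small random sample."""
    rng = random.Random(seed)
    return {
        "summary": {
            "count": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        },
        "sample": rng.sample(values, min(sample_size, len(values))),
        "raw_size": len(values),
    }


def choose_payload(descriptor, values, disclosure, budget_items):
    """Return the largest form allowed by the disclosure level that fits the budget."""
    if disclosure == "raw" and descriptor["raw_size"] <= budget_items:
        return values
    if disclosure in ("raw", "sample") and len(descriptor["sample"]) <= budget_items:
        return descriptor["sample"]
    return descriptor["summary"]


data = list(range(1, 101))
descriptor = build_descriptor(data)
# Raw data is permitted here, but the network budget only allows the sample.
payload = choose_payload(descriptor, data, "raw", budget_items=10)
```

A client can inspect `descriptor["summary"]` before deciding whether to request anything larger, which is the traffic-saving behavior the abstract describes.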