• Title/Summary/Keyword: Distributed Storage (분산 스토리지)


Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, realizing flexible storage expansion for a massive amount of unstructured log data, as well as the considerable number of functions needed to categorize and analyze the stored unstructured log data, is difficult in existing computing environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to handle with the existing computing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including the ability to flexibly expand resources such as storage space and memory when storage must be extended or log data increase rapidly. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions so that it can continue to operate after recovering from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, the strict schemas of relational databases make it difficult to expand nodes when the stored data must be distributed across various nodes as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. The data models of NoSQL databases are usually classified into key-value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a free schema structure, is used in the proposed system. MongoDB is adopted because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data increases rapidly, and provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in graphs according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, covering log data insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is identified through a log data insert performance evaluation of MongoDB for various chunk sizes.
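
As a concrete illustration of the storage side described above, the following is a minimal sketch, not the authors' code, of how heterogeneous bank log records could be inserted into a MongoDB collection and how Auto-Sharding might be enabled on it; the connection string, database, collection, and field names are all hypothetical.

```python
# A minimal sketch (not the authors' code) of storing heterogeneous bank log
# records in MongoDB and enabling Auto-Sharding so storage grows with the data.
# The connection string, database, collection, and field names are hypothetical.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed mongos router address
db = client["bank_logs"]                           # hypothetical database name

# Unstructured log records: each document may carry different fields,
# which a fixed relational schema could not accommodate easily.
db.raw_logs.insert_many([
    {"ts": datetime.now(timezone.utc), "type": "transfer", "branch": "A01",
     "payload": {"amount": 120000, "channel": "mobile"}},
    {"ts": datetime.now(timezone.utc), "type": "login_fail", "branch": "B07",
     "payload": {"reason": "bad_password", "retries": 3}},
])

# Enable Auto-Sharding on the collection (requires a sharded cluster);
# the hashed _id shard key is only an illustrative choice.
client.admin.command("enableSharding", "bank_logs")
client.admin.command("shardCollection", "bank_logs.raw_logs", key={"_id": "hashed"})
```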

Exploratory Study on the Construction of Sewol Ferry Memorial Space: Focusing on the Perspective of Placeness and Dark Tourism (세월호 기억공간 건립 관련 탐색적 연구: 장소성 및 다크투어리즘 관점을 중심으로)

  • Chun, Myung-Hoon;Ko, Jeong-Min
    • The Journal of the Korea Contents Association / v.22 no.8 / pp.126-139 / 2022
  • This study aims to present criteria for the location selection and content composition of the Sewol ferry memorial space from the perspectives of placeness and dark tourism, key concepts in cultural tourism theory, and to draw related academic concepts and implications. First, the location selection of the Sewol ferry memorial space was proposed by applying the concept of placeness to each important location, and the direction of the content composition was presented based on content components such as characters, stories, and places, together with dark tourism composition directions. As a result, it is suggested that when the Sewol ferry memorial space is designed by systematizing a storytelling-based narrative structure and implementing it concretely, the space can be constructed by distributing sites and creating courses based on that narrative structure, rather than by focusing on the selection of one specific location. In addition, it was argued that when the Sewol ferry memorial space is built across various sites, such as the 416 Life Safety Park and Mokpo New Port, the content composition criteria should be established and applied in a more comprehensive and systematic way that considers the connection between placeness and dark tourism. This study, which seeks to answer research questions on major issues surrounding the Sewol ferry memorial space from the perspectives of placeness and dark tourism, is expected to serve as a reference for the construction of the memorial space in the future.

Implementation and Performance Analysis of Event Processing and Buffer Managing Techniques for DDS (고성능 데이터 발간/구독 미들웨어의 이벤트, 버퍼 처리 기술 및 성능 분석)

  • Yoon, Gunjae;Choi, Hoon
    • Journal of KIISE / v.44 no.5 / pp.449-459 / 2017
  • Data Distribution Service (DDS) is a communication middleware that supports flexible, scalable, real-time communication. This paper describes several techniques to improve the performance of DDS middleware. Detailed events for the internal behavior of the middleware are defined, and a DDS message is disassembled into several submessages, each an independent, meaningful unit, for event-driven structuring that reduces processing complexity. A history cache management technique is also proposed; it exploits the fact that status accesses and random accesses to the history cache occur particularly frequently in DDS. These methods have been implemented in EchoDDS, the DDS implementation developed by our team, and showed improved performance.
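
The history cache idea can be illustrated with a small, self-contained sketch; this is not EchoDDS internals, and the class and field names are invented for illustration. It shows a KEEP_LAST-style reader cache that answers status queries from a cached counter and supports random access by sequence number without scanning the stored samples.

```python
# A simplified illustration (not EchoDDS internals) of a reader history cache
# that favors cheap status queries and random access by sequence number.
from collections import OrderedDict

class HistoryCache:
    def __init__(self, depth):
        self.depth = depth                 # KEEP_LAST history depth
        self.samples = OrderedDict()       # seq_num -> entry, insertion-ordered
        self.not_read = 0                  # cached status counter

    def add(self, seq_num, sample):
        """Insert a sample; evict the oldest one beyond the history depth."""
        if len(self.samples) >= self.depth:
            _, old = self.samples.popitem(last=False)
            if not old["read"]:
                self.not_read -= 1
        self.samples[seq_num] = {"data": sample, "read": False}
        self.not_read += 1

    def get(self, seq_num):
        """Random access by sequence number without scanning the cache."""
        entry = self.samples.get(seq_num)
        if entry and not entry["read"]:
            entry["read"] = True
            self.not_read -= 1
        return None if entry is None else entry["data"]

    def unread_count(self):
        """Status access answered from the cached counter, not a scan."""
        return self.not_read
```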

A Name-based Service Discovering Mechanism for Efficient Service Delivery in IoT (IoT에서 효율적인 서비스 제공을 위한 이름 기반 서비스 탐색 메커니즘)

  • Cho, Kuk-Hyun;Kim, Jung-Jae;Ryu, Minwoo;Cha, Si-Ho
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.6 / pp.46-54 / 2018
  • The Internet of Things (IoT) is an environment in which various devices provide services to users through communications. Because of the nature of the IoT, data are stored and distributed across heterogeneous information systems. In this situation, IoT end applications should be able to access data without knowing where the data are located or what type of storage holds them. This mechanism is called Service Discovery (SD). However, problems arise because current SD architectures search for data on physical devices. First, turnaround time increases because services are searched for based on physical location. Second, a separate data structure is needed to manage devices and services. These problems increase the administrator's service configuration complexity. As a result, the device-oriented SD structure is not suitable for the IoT. Therefore, we propose an SD structure called Name-based Service-centric Service Discovery (NSSD). NSSD provides name-based, centralized SD and uses the IoT edge gateway as a cache server to speed up service discovery. Simulation results show that NSSD improves average turnaround time by roughly a factor of two compared to existing domain name system and distributed hash table SD architectures.
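
The following is a minimal sketch of the name-based discovery idea, with an edge gateway caching name-to-endpoint resolutions from a central registry; all class names, service names, and endpoints are hypothetical and are not taken from the NSSD implementation.

```python
# A minimal sketch (hypothetical names, not the NSSD implementation) of
# name-based service discovery in which an IoT edge gateway caches
# name -> endpoint mappings resolved from a central registry.
import time

class CentralRegistry:
    """Central name server mapping service names to device endpoints."""
    def __init__(self):
        self.table = {}                      # e.g. "campus/lab2/humidity" -> endpoint

    def register(self, name, endpoint):
        self.table[name] = endpoint

    def resolve(self, name):
        return self.table.get(name)

class EdgeGateway:
    """Edge gateway acting as a discovery cache to shorten turnaround time."""
    def __init__(self, registry, ttl=30.0):
        self.registry = registry
        self.ttl = ttl
        self.cache = {}                      # name -> (endpoint, expiry)

    def discover(self, name):
        hit = self.cache.get(name)
        if hit and hit[1] > time.time():     # serve from cache while fresh
            return hit[0]
        endpoint = self.registry.resolve(name)   # fall back to central lookup
        if endpoint is not None:
            self.cache[name] = (endpoint, time.time() + self.ttl)
        return endpoint

registry = CentralRegistry()
registry.register("campus/lab2/humidity", "coap://10.0.0.12:5683/humidity")
gw = EdgeGateway(registry)
print(gw.discover("campus/lab2/humidity"))   # first call hits the registry
print(gw.discover("campus/lab2/humidity"))   # second call is a cache hit
```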

A Study on the Test Results and Implementation of Correlated Result Saving System using the Gluster File System (Gluster 파일시스템을 이용한 상관자료 수집 시스템 구축 및 시험고찰)

  • Yeom, Jae-Hwan;Oh, Se-Jin;Roh, Duk-Gyoo;Jung, Dong-Kyu;Hwang, Ju-Yeon;Oh, Chungsik;Kim, Hyo-Ryoung
    • Journal of the Institute of Convergence Signal Processing / v.17 no.2 / pp.53-60 / 2016
  • In this paper, we introduce the implementation and test results of a new correlated result storage system designed to achieve the full performance of the Daejeon hardware correlator. Recently, observations at 8 Gbps, the maximum observing rate of the KVN (Korean VLBI Network), have been carried out, and they require correlation processing with the Daejeon hardware correlator; a new correlation result storage system has therefore become necessary. The maximum correlation result output rate of the Daejeon hardware correlator is 1.4 GB/sec at a 25.6 ms integration time. The conventional correlation result storage system cannot cope with this maximum output rate, so the output rate was limited to one quarter of it; that is, among the Daejeon hardware correlator's four input ports, three were limited to correspond to an observation rate of 1 Gbps. The new storage system adopts the Gluster file system, one of the latest technologies used in storage systems. In tests, the system met the Daejeon hardware correlator's maximum output rate, sustaining 350 MB/sec on each of the four optical outputs, or 1.4 GB/sec in total.
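
A back-of-the-envelope check of the rates quoted above, assuming decimal MB/GB units: four optical outputs at 350 MB/sec sum to the correlator's 1.4 GB/sec maximum, and the 25.6 ms integration period then implies roughly 36 MB per correlation dump (the per-dump figure is a derived estimate, not a number stated in the abstract).

```python
# Back-of-the-envelope check of the quoted rates (decimal MB/GB assumed):
# four optical outputs at 350 MB/s must add up to the correlator's 1.4 GB/s.
ports = 4
per_port_mb_s = 350                                # MB/s written per optical output
total_gb_s = ports * per_port_mb_s / 1000          # 4 * 350 MB/s = 1.4 GB/s aggregate

integration_s = 0.0256                             # 25.6 ms integration period
per_dump_mb = total_gb_s * 1000 * integration_s    # ~35.8 MB per correlation dump

print(f"aggregate rate: {total_gb_s:.1f} GB/s, per-dump size: {per_dump_mb:.1f} MB")
```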

Analysis of Factors for Korean Women's Cancer Screening through Hadoop-Based Public Medical Information Big Data Analysis (Hadoop기반의 공개의료정보 빅 데이터 분석을 통한 한국여성암 검진 요인분석 서비스)

  • Park, Min-hee;Cho, Young-bok;Kim, So Young;Park, Jong-bae;Park, Jong-hyock
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.10 / pp.1277-1286 / 2018
  • In this paper, we provide an Apache Hadoop-based cloud environment that offers flexible scalability of computing resources for the analysis of public medical information big data. The system can quickly and flexibly extend storage, memory, and other resources as log data accumulate or grow over time. In addition, when real-time analysis of the accumulated unstructured log data is required, the system adopts a Hadoop-based analysis module to overcome the processing limits of existing analysis tools, providing quick and reliable parallel distributed processing of a large amount of log data. For the big data analysis, frequency analysis and chi-square tests were performed. In addition, multivariate logistic regression analysis was performed at a significance level of 0.05 on the significant variables (p<0.05), and multivariate logistic regression analysis was carried out for each of the three models.
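
To make the analysis steps concrete, here is a minimal sketch using synthetic data, not the study's dataset: a chi-square test on a screening-by-factor contingency table followed by a multivariate logistic regression whose coefficients and p-values would be checked against the 0.05 level. The variable names are illustrative only.

```python
# A minimal sketch (synthetic data, not the study's dataset) of the two analysis
# steps described above: a chi-square test of independence, then a multivariate
# logistic regression whose p-values are compared against the 0.05 level.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "screened":  rng.integers(0, 2, n),   # 1 = attended cancer screening (synthetic)
    "age_group": rng.integers(0, 3, n),   # coded age bands (illustrative covariate)
    "insured":   rng.integers(0, 2, n),   # illustrative covariate
})

# Chi-square test of independence between screening attendance and insurance status.
table = pd.crosstab(df["screened"], df["insured"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.3f}")

# Multivariate logistic regression; in the paper, variables with p < 0.05
# would be retained for the final model.
X = sm.add_constant(df[["age_group", "insured"]])
model = sm.Logit(df["screened"], X).fit(disp=0)
print(model.summary2().tables[1])         # coefficients with p-values
```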