Title/Summary/Keyword: Distributed DB

Development of the design methodology for large-scale database based on MongoDB

  • Lee, Jun-Ho; Joo, Kyung-Soo
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.57-63 / 2017
  • Big data, which has recently grown at a rapid pace, is characterized by continuous generation, large volume, and unstructured formats. Existing relational database technologies are inadequate for handling such big data because of their limited processing speed and the significant cost of expanding storage. Big data processing technologies, normally based on distributed file systems, distributed database management, and parallel processing, have therefore arisen as the core technology for implementing big data repositories. In this paper, we propose a design methodology for large-scale databases based on MongoDB by extending the information engineering methodology built on the E-R data model.
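
As an illustration of the kind of design decision such a methodology governs, the sketch below maps a 1:N relationship from an E-R model (an order and its line items) to a single embedded MongoDB document via the MongoDB Java driver. The collection and field names are hypothetical, not taken from the paper:

```java
// Hypothetical sketch: a 1:N E-R relationship (Order -< OrderLine) collapsed
// into one MongoDB document, as document-oriented design methodologies suggest.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.List;

public class ErToDocumentSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                    client.getDatabase("shop").getCollection("orders");

            // The child entity (order lines) is embedded rather than joined,
            // trading relational normalization for single-document reads.
            orders.insertOne(new Document("orderId", 1001)
                    .append("customer", "Hong Gil-dong")
                    .append("lines", List.of(
                            new Document("sku", "A-100").append("qty", 2),
                            new Document("sku", "B-200").append("qty", 1))));
        }
    }
}
```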

An Efficient Design and Implementation of an MdbULPS in a Cloud-Computing Environment

  • Kim, Myoungjin; Cui, Yun; Lee, Hanku
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.8 / pp.3182-3202 / 2015
  • Flexibly expanding the storage capacity required to process large amounts of rapidly increasing unstructured log data is difficult in a conventional computing environment. In addition, implementing a log processing system that categorizes and analyzes unstructured log data is extremely difficult. To overcome such limitations, we propose and design a MongoDB-based unstructured log processing system (MdbULPS) for collecting, categorizing, and analyzing log data generated by banks. The proposed system includes a Hadoop-based analysis module for reliable parallel-distributed processing of massive log data. Furthermore, because the Hadoop distributed file system (HDFS) stores data by generating replicas of collected log data in block units, the proposed system offers automatic recovery against system failures and data loss. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. To evaluate the proposed system, we conducted three performance tests on a local twelve-node test bed: comparing our system with a MySQL-based approach, comparing it with an HBase-based approach, and varying the chunk size option. The experiments showed that our system performs better at processing unstructured log data.
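
For reference, the chunk-size option varied in the third test above is a cluster-wide setting stored in MongoDB's config database. The sketch below (MongoDB Java driver; the mongos host name and the 64 MB value are illustrative assumptions) shows one way to change it:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;

public class ChunkSizeSketch {
    public static void main(String[] args) {
        // Connect through a mongos router; the host name is an assumption.
        try (MongoClient client = MongoClients.create("mongodb://mongos-host:27017")) {
            // Same effect as the documented mongo-shell procedure:
            //   use config
            //   db.settings.updateOne({_id: "chunksize"}, {$set: {value: 64}}, {upsert: true})
            client.getDatabase("config").getCollection("settings").updateOne(
                    Filters.eq("_id", "chunksize"),
                    Updates.set("value", 64), // chunk size in megabytes
                    new UpdateOptions().upsert(true));
        }
    }
}
```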

Specification Technique of EJB-Based Application Using Design by Contract Approach

  • 노혜민; 유철중
    • Journal of KIISE: Software and Applications / v.29 no.12 / pp.895-906 / 2002
  • With the growing interest in distributed web applications, interest in EJB, the server-side Java component architecture that lets developers write business logic without writing code for the complicated distributed framework underneath, is also increasing. Despite this interest, however, efforts toward the reliability of such systems have been insufficient. In this paper, we therefore propose a specification technique that applies the Design by Contract (DbC) approach, which can raise software reliability in object-oriented system development, to writing formal specifications of EJB-based applications. Through this specification technique, developers can gain confidence in the reliability of EJB-based application development.
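
To make the DbC idea concrete, here is a minimal sketch of contract checks on an EJB-style business method in plain Java. The class and method names are hypothetical, and the paper's own specification notation is not reproduced:

```java
// Hypothetical example: Design-by-Contract-style checks on an EJB-like
// business method. Names (AccountService, withdraw) are illustrative only.
public class AccountService {
    private long balanceInCents;

    public AccountService(long openingBalanceInCents) {
        // Invariant at construction: a balance is never negative.
        if (openingBalanceInCents < 0) {
            throw new IllegalArgumentException("negative opening balance");
        }
        this.balanceInCents = openingBalanceInCents;
    }

    /** Withdraws the given amount, enforcing pre- and postconditions. */
    public long withdraw(long amountInCents) {
        // Preconditions: a positive amount that the balance covers.
        if (amountInCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (amountInCents > balanceInCents) {
            throw new IllegalStateException("insufficient funds");
        }

        long before = balanceInCents;
        balanceInCents -= amountInCents;

        // Postcondition: the balance decreased by exactly the requested
        // amount (checked only when the JVM runs with -ea).
        assert balanceInCents == before - amountInCents : "postcondition violated";
        return balanceInCents;
    }
}
```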

ACE-based Log Mining System for Distributed FTP Servers

  • Min, Su-Hong; Cho, Dong-Sub
    • Proceedings of the KIEE Conference / 2002.11c / pp.465-468 / 2002
  • Today, large corporations are constructing distributed server environments, operating web, FTP, mail, and DB servers on heterogeneous platforms. This raises the problem that an administrator must manage each server individually. In this paper, we present a log mining system for distributed FTP servers based on ACE. The proposed log mining system builds on the ACE (Adaptive Communication Environment) framework and data mining techniques, and provides unified management of the distributed FTP servers.

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment

  • Kim, Myoungjin; Han, Seungho; Cui, Yun; Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the wealth of information created when computer systems operate, are used in many ways, from system inspection and process optimization to customized user services. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amounts of log data produced by banks. Most of the log data generated in banking operations come from handling clients' business; a separate log data processing system is therefore needed to gather, store, categorize, and analyze them. However, in existing computing environments it is difficult to realize the flexible storage expansion required to process massive amounts of unstructured log data, and to execute the many functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to handle with the existing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment and can flexibly expand computing resources such as storage space and memory under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of massive log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating blocks of the aggregated log data, the proposed system can recover automatically and continue operating after a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have complex, strict schemas that are inappropriate for unstructured log data, and they cannot easily add nodes to distribute stored data when the volume of data rapidly increases. NoSQL databases do not provide the complex computations that relational databases offer, but they can easily expand through node dispersion as data volume grows; they are non-relational databases with a structure appropriate for processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, or document-oriented. Of these, the proposed system uses MongoDB, the representative document-oriented store, which has a free schema structure. MongoDB was chosen because its flexible schema makes unstructured log data easy to process, it facilitates node expansion when the amount of data grows rapidly, and it provides an auto-sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over each bank's entire client business process are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module renders the analysis results of the MongoDB module, the Hadoop-based analysis module, and the MySQL module by analysis time and type of the aggregated log data, and presents them to the user through a web interface. Log data requiring real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in graphs according to the user's various analysis conditions; they are also processed in parallel-distributed fashion by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, measuring log insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through a MongoDB log-insertion performance evaluation over various chunk sizes.
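
As a rough illustration of the MongoDB module's role, the sketch below shards a collection and inserts a schema-free log document, using the MongoDB Java driver. All database, collection, and field names are assumptions, not taken from the paper:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.time.Instant;

public class LogInsertSketch {
    public static void main(String[] args) {
        // Connect through a mongos router of a sharded cluster (host assumed).
        try (MongoClient client = MongoClients.create("mongodb://mongos-host:27017")) {
            // Hashed sharding on _id spreads inserts across shards; MongoDB's
            // auto-sharding then splits and migrates chunks automatically.
            client.getDatabase("admin").runCommand(
                    new Document("enableSharding", "logs"));
            client.getDatabase("admin").runCommand(
                    new Document("shardCollection", "logs.bank_logs")
                            .append("key", new Document("_id", "hashed")));

            MongoCollection<Document> logs =
                    client.getDatabase("logs").getCollection("bank_logs");

            // Unstructured logs need no fixed schema: each document may carry
            // different fields depending on the business event that produced it.
            logs.insertOne(new Document("ts", Instant.now().toString())
                    .append("branch", "seoul-01")
                    .append("event", "transfer")
                    .append("payload", new Document("amount", 150000)
                            .append("channel", "mobile")));
        }
    }
}
```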

Comparison and Analysis of Metadata Schema for Academic Paper Integrated DB

  • Choi, Wonjun; Hwang, Hyekyong; Kim, Jeonghwan; Lee, Kangsandajeong; Lim, Seokjong
    • The Journal of the Korea Contents Association / v.20 no.2 / pp.689-699 / 2020
  • The National Science and Technology Information Center (NDSL) database, which provides domestic and foreign academic papers, collects, builds, and manages data gathered from various sources. In this study, we analyzed the schemas and metadata of the distributed paper DBs currently constructed and managed, in order to derive an integrated DB schema that can manage high-value-added papers efficiently. The final academic information data items were determined through comparison and analysis with the Web of Science and SCOPUS schemas currently purchased and held. The academic information data items constructed and serviced through this study were organized into seven groups, namely papers, authors, abstracts, institutions, themes, journals, and references, and defined as the core contents under construction. The integrated DB schema was created through this study, and the results will be used as a basis for constructing an integrated DB of high-quality academic papers and designing the integrated system.
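
For illustration only, the seven item groups could be carried by a flat record like the following Java sketch; the field granularity is an assumption, not the study's actual integrated schema:

```java
// Hypothetical: the seven core item groups from the study, as a Java record.
// A real integrated-DB schema would refine each group into many fields.
import java.util.List;

public record PaperRecord(
        String title,                 // paper
        List<String> authors,         // authors
        String abstractText,          // abstract
        List<String> institutions,    // institutions
        List<String> themes,          // themes
        String journal,               // journal
        List<String> references) {}   // references
```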

Analysis of Characteristics in the Land Cover Types of Inland Wetlands Using the National Wetland DB in South Korea

  • Lee, Ye-Seul; Yoon, Hye-Yeon; Lee, Seong-Ho; Jang, Dong-Ho; Yun, Kwang-Sung; Lee, Chang-Su
    • Journal of The Geomorphological Association of Korea / v.27 no.4 / pp.71-88 / 2020
  • This study modified the attributes and boundaries of the inland wetland types through structural editing of the National Wetland DB, and analyzed the land cover characteristics of South Korea's inland wetlands, both by basin and as a whole. The inland wetlands of the Gangwon Basin had a small area of open water, and the ratio of natural barren land was high, reflecting the characteristics of the upper reaches of the large rivers in the eastern and western parts of Gangwon Province. The Geum River Basin had a high percentage of agricultural land due to the development of large alluvial plains, and the ratio of artificial barren land was low, so the wetland elements providing ecosystem services were distributed evenly. The Nakdong River Basin had a high proportion of open water, as water levels in the channel rose after the installation of the Four Major Rivers Project weirs, and the ratio of natural barren land was low. Moreover, the water levels of the main tributaries flowing into the Nakdong River drainage system were not high, so the ratio of vegetation cover was high. The Yeongsan River Basin also showed a high proportion of open water, and the distribution of natural barren land differed between the Yeongsan River and Seomjin River basins. Finally, sand and gravel supplied to rivers during precipitation were deposited along the main stream of the Han River Basin, and the difference between low and high water levels in the area was large, reflecting the characteristics of a river mouth, so natural barren land consisting of clay was distributed there.

Implementation of Text-to-Speech Terminal System by Distributed Database

  • 김영길; 박창현; 양윤기
    • Proceedings of the IEEK Conference / 2003.07e / pp.2431-2434 / 2003
  • In this research, our goal is to realize a Korean distributed TTS system with server/client functionality over a wireless network. The speech databases and some routines of the TTS system reside on the server, which has ample computing power; we built Korean speech databases and investigated a DB organization suitable for distributed TTS. We also designed a terminal with the minimum configuration needed to operate the TTS, along with a suitable protocol, and verified the operation of the distributed TTS.
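
A conceptual sketch of the server/client split described above, in Java: the terminal sends text, and the server, which holds the speech DB, returns synthesized audio bytes. The length-prefixed wire format and port are assumptions, not the paper's protocol:

```java
// Hypothetical terminal-side client for a distributed TTS server.
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class TtsClientSketch {
    public static byte[] synthesize(String host, String text) throws IOException {
        try (Socket socket = new Socket(host, 5000);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            // Request: length-prefixed UTF-8 text to synthesize.
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length);
            out.write(utf8);
            out.flush();

            // Response: length-prefixed synthesized audio bytes.
            int audioLen = in.readInt();
            byte[] audio = new byte[audioLen];
            in.readFully(audio);
            return audio;
        }
    }
}
```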

An Implementation of MongoDB-based Distributed Triple Store on Jena Framework

  • Ahn, Jinhyun; Yang, Sungkwon; Lee, Munhwan; Jung, Jinuk; Kim, Eung-Hee; Im, Dong-Hyuk; Kim, Hong-Gee
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.1615-1617 / 2015
  • With growing interest in sharing data over the Web, data in the form of RDF triples is increasing explosively. It is therefore important to develop a triple store that can store large-scale RDF data and support fast SPARQL query processing. Jena-TDB, an Apache project and one of the best-known open-source triple stores, is implemented on the Jena framework. However, because Jena-TDB runs on a single machine, it cannot handle large-scale RDF data. In this paper, we propose Jena-MongoDB, a Jena-framework-based triple store that uses MongoDB. Because it is built on the Jena framework, it can be used through the same interface as the existing Jena-TDB and supports the latest standard SPARQL syntax. Because it uses MongoDB, it can also operate in a distributed environment. Experimental results for SPARQL query processing on the large-scale LUBM dataset show that Jena-MongoDB answers queries faster than Jena-TDB.
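
For context, the Jena framework interface that Jena-MongoDB preserves looks like the sketch below. This uses Jena's standard in-memory model; the paper's system would swap a MongoDB-backed graph in behind the same API:

```java
// Standard Jena usage: build a tiny RDF model and run a SPARQL query over it.
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;

public class JenaQuerySketch {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        Resource alice = model.createResource("http://example.org/alice");
        Property knows = model.createProperty("http://xmlns.com/foaf/0.1/knows");
        alice.addProperty(knows, model.createResource("http://example.org/bob"));

        String sparql = "SELECT ?s ?o WHERE { ?s <http://xmlns.com/foaf/0.1/knows> ?o }";
        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(sparql), model)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                QuerySolution row = rs.next();
                System.out.println(row.get("s") + " knows " + row.get("o"));
            }
        }
    }
}
```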

Effects of Bambusae Caulis in Liquamen on Blood Sugar in db/db Mice

  • Cheong, Ki-Sang; Choi, Chan-Hun; Jang, Kyeong-Seon
    • Journal of Physiology & Pathology in Korean Medicine / v.17 no.1 / pp.177-182 / 2003
  • This study was carried out to examine the effects of Bambusae Caulis in Liquamen on blood sugar in db/db mice. Refined Bambusae Caulis in Liquamen C and D (BCL.C, BCL.D), manufactured by a high-temperature production process, and Bambusae Caulis in Liquamen (H-BCL), manufactured and distributed by HANLIM PHARM. CO., LTD., were used. The Bambusae Caulis in Liquamen extracted during the bamboo charcoal manufacturing process was filtered and refined. The preparations were administered orally to mice for 6 weeks, and their anti-diabetic effects were examined in terms of blood sugar, creatinine, BUN, and GPT in db/db mice. The results were as follows: the amount of glucose was slightly decreased (P<0.05) in the BCL.C-treated groups compared with the control, and significantly decreased (P<0.01) in the BCL.D- and H-BCL-treated groups compared with the control. The amount of creatinine did not show any differences among the four groups. The amount of blood urea nitrogen did not show any differences in the BCL.C-treated groups, but showed a significant decrease in the BCL.D- and H-BCL-treated groups. The amount of GPT did not show any differences in the BCL.D-treated groups, but showed a significant increase in the BCL.C- and H-BCL-treated groups.