• Title/Summary/Keyword: Data Lake Framework (데이터레이크 프레임워크)

Search results: 4

Apache NiFi-based ETL Process for Building Data Lakes (데이터 레이크 구축을 위한 Apache NiFi기반 ETL 프로세스)

  • Lee, Kyoung Min;Lee, Kyung-Hee;Cho, Wan-Sup
    • The Journal of Bigdata, v.6 no.1, pp.145-151, 2021
  • In recent years, digital data has been generated in all areas of human activity, and there are many attempts to store and process that data safely in order to develop useful services. A data lake is a data repository that is independent of both the source of the data and the analytical frameworks that use it. In this paper, we design and implement a tool that safely stores the various big data generated by smart cities in a data lake and applies ETL so that the data can be used by services, together with the web-based tools needed to use it effectively. The series of processes (ETL) that quality-checks and refines source data, stores it safely in a data lake, and manages it according to data life cycle policies is typically labor-intensive and requires costly infrastructure, development, and maintenance. The proposed tool makes it possible to configure, execute, and monitor ETL jobs and to manage the data life cycle visually and efficiently, without specialized knowledge of the IT field. Separately, a data quality checklist guide is needed to store and use reliable data in the data lake. In addition, data migration and deletion cycles should be set and scheduled with the data life cycle management tool to reduce data management costs.
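The ETL pattern the abstract describes (quality-check and refine source records, then apply migration/deletion cycles from a life cycle policy) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the required fields, the 90-day migration cycle, and the 365-day deletion cycle are all hypothetical placeholders for values an operator would set in the web-based tool.

```python
import datetime as dt

# Hypothetical quality check: a record passes if all required
# fields are present and non-empty (stand-in for a checklist guide).
REQUIRED_FIELDS = {"sensor_id", "timestamp", "value"}

def passes_quality_check(record: dict) -> bool:
    return all(record.get(f) not in (None, "") for f in REQUIRED_FIELDS)

# Hypothetical life cycle policy: migrate records to cold storage
# after 90 days and delete them after 365 days (assumed cycles).
def lifecycle_action(ingested: dt.date, today: dt.date,
                     migrate_after: int = 90,
                     delete_after: int = 365) -> str:
    age = (today - ingested).days
    if age >= delete_after:
        return "delete"
    if age >= migrate_after:
        return "migrate"
    return "keep"

records = [
    {"sensor_id": "s1", "timestamp": "2021-01-01T00:00", "value": 3.2},
    {"sensor_id": "s2", "timestamp": "", "value": 1.1},  # fails the check
]
# Only quality-checked records are stored in the lake.
clean = [r for r in records if passes_quality_check(r)]
```

In a real deployment these two steps would run as visually configured flow components (e.g. NiFi processors) rather than inline functions, but the split between ingestion-time quality gating and scheduled life cycle actions is the same.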

Draft Design of AI Services through Concept Extension of Connected Data Architecture (Connected Data Architecture 개념의 확장을 통한 AI 서비스 초안 설계)

  • Cha, ByungRae;Park, Sun;Oh, Su-Yeol;Kim, JongWon
    • Smart Media Journal, v.7 no.4, pp.30-36, 2018
  • A single-domain model such as the DataLake framework is in the spotlight because it can improve data efficiency and process data more intelligently in big data environments, where large-scale business systems generate huge amounts of data. In particular, efficient operation of network, storage, and computing resources within a logical single-domain model is very important for processing data that is physically partitioned across multiple sites. Building on the advantages of the DataLake framework, we define and extend the concept of Connected Data Architecture (CDA) and the functions of the DataLake framework for integrating multiple sites across various domains and managing the life cycle of data. We also propose a design for CDA-based AI services and utilization scenarios in various application domains.

Design and Verification of Connected Data Architecture Concept employing DataLake Framework over Abyss Storage Cluster (Abyss Storage Cluster 기반 DataLake Framework의 Connected Data Architecture 개념 설계 및 검증)

  • Cha, ByungRae;Cha, Yun-Seok;Park, Sun;Shin, Byeong-Chun;Kim, JongWon
    • Smart Media Journal, v.7 no.3, pp.57-63, 2018
  • With the many types of data generated as the business environment shifts with the growth of an organization or enterprise, there is a need to improve data-processing efficiency in a smarter way with a single-domain model such as a Data Lake. In particular, creating a logical single-domain model from physically partitioned multi-site data is very important for the efficient operation of finite, shared computing resources. Building on the advantages of the existing Data Lake framework, we define the CDA concept (Connected Data Architecture concept) and the functions of the Data Lake Framework over Abyss Storage for integrating multiple sites across various application domains and managing the data life cycle. We also design and verify Interface #2 and Interface #3 of the CDA concept.

Draft Design of DataLake Framework based on Abyss Storage Cluster (Abyss Storage Cluster 기반의 DataLake Framework의 설계)

  • Cha, ByungRae;Park, Sun;Shin, Byeong-Chun;Kim, JongWon
    • Smart Media Journal, v.7 no.1, pp.9-15, 2018
  • As an organization or enterprise grows in size, many different types of data are generated in different systems, and there is a need to improve efficiency by processing that data more intelligently. A DataLake provides a single domain model that accurately describes the data and can represent the most important data for the entire business. To realize the benefits of a DataLake, it is important to know how a DataLake may be expected to work and which architectural components help to build a fully functional one. DataLake components have a life cycle that follows the data flow: as data flows into the DataLake from the point of acquisition, its metadata is captured and managed, along with data traceability, data lineage, and security aspects based on data sensitivity, across its life cycle. For these reasons, we have designed the DataLake Framework based on the Abyss Storage Cluster.
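The ingestion-time metadata capture the abstract describes (traceability, lineage, and a sensitivity tag recorded alongside the data) can be sketched as below. This is an illustrative assumption, not the paper's design: the function name, the `abyss://` target URI, and the sensitivity labels are hypothetical, and a content hash stands in for traceability.

```python
import datetime as dt
import hashlib

# Hypothetical ingestion step: capture metadata alongside the payload.
# - content_sha256 gives traceability (the stored bytes can be verified)
# - lineage records where the data came from and where it was written
# - sensitivity drives security decisions across the life cycle
def ingest(payload: bytes, source: str, target: str, sensitivity: str) -> dict:
    return {
        "content_sha256": hashlib.sha256(payload).hexdigest(),
        "lineage": {"source": source, "target": target},
        "sensitivity": sensitivity,
        "ingested_at": dt.datetime.now(dt.timezone.utc).isoformat(),
    }

# Example: a raw dump from an edge gateway landing in the raw zone.
meta = ingest(b"raw sensor dump", "edge-gateway-01", "abyss://lake/raw/", "low")
```

Because the metadata is captured at acquisition time rather than reconstructed later, lineage and sensitivity stay attached to the data through every subsequent life cycle stage.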