A Data Quality Management Maturity Model

  • Ryu, Kyung-Seok;Park, Joo-Seok;Park, Jae-Hong
    • ETRI Journal
    • /
    • v.28 no.2
    • /
    • pp.191-204
    • /
    • 2006
  • Many previous studies of data quality have focused on realizing and evaluating both data value quality and data service quality. These studies revealed that poor data value quality and poor data service quality were caused by poor data structure quality. In this study we focus on metadata management, namely data structure quality, and introduce a data quality management maturity model. We empirically show that data quality improves as data management matures.

Development Procedure of Data Organization of Data Repositories for Construction Engineering Research Cyberinfrastructure (건설공학 연구의 사이버 인프라를 위한 데이터 저장소의 데이터 구성의 단계적 개발방법)

  • Lee, Chang-Ho
    • Journal of the Architectural Institute of Korea
    • /
    • v.36 no.10
    • /
    • pp.177-188
    • /
    • 2020
  • The cyberinfrastructure for construction engineering research provides construction engineering researchers and engineers with a research environment that includes a data repository, tools, and other computing services through the internet. As a main component of the cyberinfrastructure, the data repository stores research project data and serves data curation through data uploads and downloads. Since data curation naturally depends on how the data is organized in the repository, the data organization is important for practically useful data repositories. This paper uses the notation of classes and attributes of a data model to discuss the procedural steps for developing an efficient data organization for data repositories such as the data depot of DesignSafe for natural hazards engineering. The development begins with defining the uses and the size of the data repository. The basic organization of the repository's main data is explored, and the data is then elaborated. After the usage of the data is evaluated against a number of evaluation criteria, the data organization is improved based on the evaluation results. These steps are repeated, in various possible sequences, until an efficient data organization for construction engineering research is reached (a minimal sketch of such a class/attribute model follows below).
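
To make the class/attribute notation concrete, here is a minimal sketch of what such a data model might look like in code. The classes, attributes, and example values are illustrative assumptions, not the schema developed in the paper.

```python
# A sketch of a class/attribute data model for a research data
# repository. All class names, attributes, and values are hypothetical.
from dataclasses import dataclass, field


@dataclass
class DataFile:
    name: str
    format: str                # e.g. "csv", "jpg"
    size_bytes: int


@dataclass
class Experiment:
    title: str
    facility: str              # e.g. a shake-table laboratory
    files: list[DataFile] = field(default_factory=list)


@dataclass
class ResearchProject:
    project_id: str
    discipline: str            # e.g. "natural hazards engineering"
    experiments: list[Experiment] = field(default_factory=list)


# Elaboration: attributes are split or added only after the intended
# uses and the size of the repository have been defined and evaluated.
project = ResearchProject("PRJ-0001", "natural hazards engineering")
project.experiments.append(Experiment("Shake-table test", "Lab A"))
project.experiments[0].files.append(DataFile("run01.csv", "csv", 1_048_576))
print(project)
```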

Street Fashion Information Analysis System Design Using Data Fusion

  • Park, Hee-Chang;Park, Hye-Won
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • pp.35-45
    • /
    • 2005
  • Data fusion is a method for combining data. The purpose of this study is to design and implement a street fashion information analysis system using data fusion. The system can offer varied and practical information because it fuses image data and survey data for street fashion. Data fusion methods include the exact matching method, the judgemental matching method, the probability matching method, the statistical matching method, and the data linking method. In this study, we use the exact matching method (a minimal sketch of exact matching appears below). Our system supports visual information analysis from the customer's viewpoint because it can analyze both the individual and the fused image and survey data.

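The exact matching method selected in the abstract joins records from the two sources that carry the same identifier. Below is a minimal sketch under that assumption; the field names and values are invented for illustration.

```python
# Exact matching: fuse image records and survey records that share the
# same identifier. Field names and values are hypothetical.
image_data = [
    {"respondent_id": 1, "photo": "street_001.jpg", "style": "casual"},
    {"respondent_id": 2, "photo": "street_002.jpg", "style": "formal"},
]
survey_data = [
    {"respondent_id": 1, "age": 24, "preferred_brand": "A"},
    {"respondent_id": 2, "age": 31, "preferred_brand": "B"},
]

surveys_by_id = {rec["respondent_id"]: rec for rec in survey_data}

# Keep only records whose key appears in both sources (exact match).
fused = [
    {**img, **surveys_by_id[img["respondent_id"]]}
    for img in image_data
    if img["respondent_id"] in surveys_by_id
]
print(fused[0])  # one fused image+survey record
```
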
Capturing Data from Untapped Sources using Apache Spark for Big Data Analytics (빅데이터 분석을 위해 아파치 스파크를 이용한 원시 데이터 소스에서 데이터 추출)

  • Nichie, Aaron;Koo, Heung-Seo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.65 no.7
    • /
    • pp.1277-1282
    • /
    • 2016
  • The term "Big Data" has been defined to encapsulate a broad spectrum of data sources and data formats. It is often described to be unstructured data due to its properties of variety in data formats. Even though the traditional methods of structuring data in rows and columns have been reinvented into column families, key-value or completely replaced with JSON documents in document-based databases, the fact still remains that data have to be reshaped to conform to certain structure in order to persistently store the data on disc. ETL processes are key in restructuring data. However, ETL processes incur additional processing overhead and also require that data sources are maintained in predefined formats. Consequently, data in certain formats are completely ignored because designing ETL processes to cater for all possible data formats is almost impossible. Potentially, these unconsidered data sources can provide useful insights when incorporated into big data analytics. In this project, using big data solution, Apache Spark, we tapped into other sources of data stored in their raw formats such as various text files, compressed files etc and incorporated the data with persistently stored enterprise data in MongoDB for overall data analytics using MongoDB Aggregation Framework and MapReduce. This significantly differs from the traditional ETL systems in the sense that it is compactible regardless of the data formats at source.
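
As a rough illustration of the flow the abstract describes, the sketch below reads raw and compressed text files with Spark, shapes them into records, and combines them with data already stored in MongoDB. The paths, field names, and aggregation pipeline are assumptions, not the project's actual code.

```python
# Read raw/compressed text files with Apache Spark, shape them into
# records, and combine them with existing enterprise data in MongoDB.
from pyspark import SparkContext
from pymongo import MongoClient

sc = SparkContext(appName="untapped-sources")

# textFile reads plain and gzip-compressed files alike.
raw = sc.textFile("hdfs:///data/raw/*.log.gz")
records = (
    raw.map(lambda line: line.split(","))
       .filter(lambda fields: len(fields) == 3)
       .map(lambda f: {"ts": f[0], "source": f[1], "value": float(f[2])})
)

# Persist alongside existing enterprise data, then analyze everything
# together with the MongoDB Aggregation Framework.
coll = MongoClient()["enterprise"]["events"]
coll.insert_many(records.collect())
totals = coll.aggregate([
    {"$group": {"_id": "$source", "total": {"$sum": "$value"}}}
])
for doc in totals:
    print(doc)
```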

Study on Data Control System Design Method with Complex Data-Algorithm Data Processing (복합적 자료-알고리즘 자료처리 방식을 적용한 자료처리 시스템 설계 방안 연구)

  • Kim, Min Wook;Park, Yeon Gu;Yi, Jonghyuk;Lee, Jeong-Deok
    • Journal of Satellite, Information and Communications
    • /
    • v.10 no.3
    • /
    • pp.11-15
    • /
    • 2015
  • In this study, we present an architecture design for the data control system in a water hazard information platform, analyzing the complexity of its data processing. Generally, data control systems in data collection and analysis platforms are based on constant data-algorithm processing, meaning that the mapping between data and algorithms is fixed. However, the number of data processing tasks in a data control system rapidly increases as the complexity of the system grows. To hold down the number of data processing tasks, dynamic data-algorithm processing, in which the mapping is chosen at run time, can be applied to the data control system (a sketch contrasting the two styles follows below). After comparing the data-algorithm processing methods, we suggest a design method for the data control system that best fits the water hazard information platform.
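
The contrast between constant and dynamic data-algorithm processing can be pictured as the difference between hand-wiring each (data, algorithm) pair and routing data through one dispatch table at run time. The sketch below illustrates the dynamic style; the data types and processing steps are invented examples, not the platform's actual algorithms.

```python
# Dynamic data-algorithm processing: one dispatch table routes each
# observation to its pipeline, so adding a data source does not
# multiply processing code. Types and steps are hypothetical.

def quality_check(obs):
    return {**obs, "checked": True}

def interpolate(obs):
    return {**obs, "gap_filled": True}

# Data type -> processing pipeline, resolved at run time.
PIPELINES = {
    "rainfall":    [quality_check, interpolate],
    "water_level": [quality_check],
}

def process(obs):
    result = obs
    for step in PIPELINES[obs["type"]]:
        result = step(result)
    return result

print(process({"type": "rainfall", "station": "S01", "value": 12.5}))
```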

A Data Design for Increasing the Usability of Subway Public Data

  • Min, Meekyung
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.11 no.4
    • /
    • pp.18-25
    • /
    • 2019
  • The public data portal provides various public data created by the government in the form of files and open APIs. In order to increase the usability of public open data, a variety of information should be provided in a form that is convenient for users. This requires a structured design plan for the public data. In this paper, we propose a data design method to improve the usability of the Seoul subway public data. For the study, we first identify properties of the current subway public data and then classify the data based on these properties. The properties used as classification criteria are stored properties, derived properties, static properties, and dynamic properties. We also analyze the limitations of the current data for each property. Based on this analysis, we classify the currently used subway public data into code entities, base entities, and history entities, and present an improved design of the entities according to this classification (a sketch of this entity split appears below). In addition, we propose data retrieval functions to increase the utilization of the data. If the data is designed according to the proposal of this paper, it will be possible to solve the duplication and inconsistency problems of the currently used data and to implement more structured data. As a result, more functions can be provided for users, which is the basis for increasing the usability of subway public data.
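
As a rough picture of the proposed classification, the sketch below separates a code entity, a base entity, and a history entity, and computes a derived property on demand instead of storing it. All field names and values are hypothetical, not the paper's actual design.

```python
# Code entity: fixed code table.
line_codes = {"1": "Line 1", "2": "Line 2"}

# Base entity: stored, static properties.
stations = [
    {"station_id": "0150", "name": "Seoul Station", "line_code": "1"},
]

# History entity: dynamic, time-stamped properties.
ridership_history = [
    {"station_id": "0150", "date": "2019-10-01", "boardings": 52000},
    {"station_id": "0150", "date": "2019-10-02", "boardings": 48700},
]

# A derived property (total boardings per station) is computed on
# demand, avoiding the duplication and inconsistency noted above.
total = sum(r["boardings"] for r in ridership_history
            if r["station_id"] == "0150")
print(total)
```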

Design of Client-Server Model For Effective Processing and Utilization of Bigdata (빅데이터의 효과적인 처리 및 활용을 위한 클라이언트-서버 모델 설계)

  • Park, Dae Seo;Kim, Hwa Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.109-122
    • /
    • 2016
  • Recently, big data analysis has developed into a field of interest to individuals and non-experts as well as companies and professionals. Accordingly, it is used for marketing and social problem solving by analyzing data that is currently open or collected directly. In Korea, various companies and individuals are attempting big data analysis, but it is difficult from the initial stage of analysis because of limited big data disclosure and collection difficulties. Nowadays, system improvements for big data activation and big data disclosure services are being carried out in Korea and abroad, mainly as services for opening public data, such as the domestic Government 3.0 portal (data.go.kr). In addition to the efforts made by the government, services that share data held by corporations or individuals are running, but it is difficult to find useful data because of the lack of shared data. In addition, big data traffic problems can occur because the entire data set must be downloaded and examined in order to grasp the attributes of and basic information about the shared data. Therefore, a new system for big data processing and utilization is needed. First, big data pre-analysis technology is needed as a way to solve the big data sharing problem. Pre-analysis is a concept proposed in this paper to solve the problem of sharing big data; it means providing users with results generated by analyzing the data in advance. Through pre-analysis, it is possible to improve the usability of big data by providing information that conveys the properties and characteristics of big data when a data user searches for it. In addition, by sharing the summary data or sample data generated through pre-analysis, it is possible to avoid the security problems that may occur when the original data is disclosed, thereby enabling big data sharing between the data provider and the data user. Second, it is necessary to quickly generate appropriate preprocessing results according to the disclosure level or network status of the raw data and to provide the results to users through distributed big data processing using Spark. Third, to solve the big traffic problem, the system monitors network traffic in real time; when preprocessing the data requested by the user, it preprocesses the data to a size transferable on the current network, so that no big traffic occurs. In this paper, we present various data sizes according to the disclosure level through pre-analysis. This method is expected to produce a low traffic volume compared with the conventional method of sharing only raw data across many systems. We describe how to solve the problems that occur when big data is released and used, and how to facilitate its sharing and analysis. The client-server model uses Spark for fast analysis and processing of user requests, with a Server Agent and a Client Agent deployed on the server and client sides respectively. The Server Agent, required by the data provider, performs pre-analysis of the big data to generate a Data Descriptor containing information on Sample Data, Summary Data, and Raw Data; it also performs fast and efficient big data preprocessing through distributed processing and continuously monitors network traffic. The Client Agent is placed on the data user side; it can search the big data through the Data Descriptor, the result of the pre-analysis, and request the desired data from the server to download the big data according to the disclosure level. In particular, we focus on big data sharing, distributed big data processing, and the big traffic problem, construct the detailed modules of the client-server model, and present the design of each module. In a system designed on the basis of the proposed model, a user who acquires data analyzes it in a desired direction or preprocesses new data; by publishing the newly processed data through a Server Agent, the data user takes on the role of data provider. The data provider can likewise obtain useful statistical information from the Data Descriptor of the data it discloses and become a data user performing new analysis with the sample data. In this way, raw data is processed and the processed big data is utilized by users, forming a natural sharing environment. The roles of data provider and data user are not fixed, yielding an ideal shared service in which everyone can be both a provider and a user. The client-server model thus solves the big data sharing problem and provides a free and secure sharing environment for big data disclosure (a minimal sketch of the Data Descriptor idea follows below).
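
A minimal sketch of the Data Descriptor idea follows: the Server Agent's pre-analysis yields summary data and a small sample that the Client Agent can search before requesting raw data. The field names and the pre_analyze() logic are assumptions for illustration, not the paper's specification.

```python
# Data Descriptor produced by a hypothetical pre-analysis step.
from dataclasses import dataclass
from statistics import mean


@dataclass
class DataDescriptor:
    name: str
    row_count: int
    summary: dict          # pre-computed summary data
    sample: list           # small sample rows safe to share
    raw_uri: str           # raw data, fetched only per disclosure level


def pre_analyze(name: str, rows: list, raw_uri: str) -> DataDescriptor:
    values = [r["value"] for r in rows]
    summary = {"mean": mean(values), "min": min(values), "max": max(values)}
    return DataDescriptor(name, len(rows), summary, rows[:3], raw_uri)


rows = [{"value": v} for v in (3.0, 7.5, 9.0, 4.2)]
desc = pre_analyze("sensor-2016", rows, "hdfs:///share/sensor-2016")
print(desc.summary)   # the client inspects this before any download
```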

A Comparative Study of Big Data, Open Data, and My Data (빅데이터, 오픈데이터, 마이데이터의 비교 연구)

  • Park, Jooseok
    • The Journal of Bigdata
    • /
    • v.3 no.1
    • /
    • pp.41-46
    • /
    • 2018
  • With the advent of the fourth industrial revolution, data has become a very important resource, and the present era is called the 'Data Revolution Age.' It is said that the Data Revolution Age started with Big Data, accelerated with Open Data, and is completed with My Data. In this paper, we compare Big Data, Open Data, and My Data, and suggest the roles and effects of My Data as a digital resource.

Modeling and Implementation of Public Open Data in NoSQL Database

  • Min, Meekyung
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.10 no.3
    • /
    • pp.51-58
    • /
    • 2018
  • In order to utilize the various data provided by the Korean public open data portal, the data should be systematically managed using a database. Since the range of open data is enormous and the amount of data continues to increase, it is preferable to use a database capable of processing big data in order to analyze and utilize it. This paper proposes a data modeling and implementation method suitable for public data. The target data is subway-related data provided by the public open data portal. The schema of the public data related to Seoul metro stations is analyzed and its problems are presented. To solve these problems, this paper proposes a method to normalize and structure the subway data and model it in a NoSQL database. In addition, the implementation is shown using MongoDB, a document-based database capable of processing big data (a minimal document-model sketch follows below).
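
The sketch below illustrates the document-based modeling the abstract describes, embedding per-line details inside one station document in MongoDB. The collection layout and field names are assumptions for illustration, not the paper's final model.

```python
# Model subway station data as MongoDB documents.
from pymongo import MongoClient

db = MongoClient()["subway"]

# Embed per-line details in one station document instead of spreading
# them over several flat, duplicated records.
db.stations.insert_one({
    "station_id": "0150",
    "name": "Seoul Station",
    "lines": [
        {"line": "1", "transfer": True},
        {"line": "4", "transfer": True},
    ],
})

# Documents are then queried directly, e.g. all stations on Line 4.
for st in db.stations.find({"lines.line": "4"}, {"name": 1, "_id": 0}):
    print(st)
```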

Component Development and Importance Weight Analysis of Data Governance (Data Governance 구성요소 개발과 중요도 분석)

  • Jang, Kyoung-Ae;Kim, Woo-Je
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.41 no.3
    • /
    • pp.45-58
    • /
    • 2016
  • Data are important in an organization because they are used in making decisions and obtaining insights. Furthermore, given the increasing importance of data in modern society, data governance is required to increase an organization's competitiveness. However, the concept of data governance has caused confusion because of the myriad guidelines proposed by related institutions and researchers. In this study, we re-establish the ambiguous concept of data governance and derive its top-level components by analyzing previous research. This study identifies the components of data governance and quantitatively analyzes the relations between these components using DEMATEL and context analysis techniques, which are often used to solve complex problems. Three higher components (data compliance management, data quality management, and data organization management) and 13 lower components are derived as data governance components. Furthermore, the importance analysis shows that data quality management, data compliance management, and data organization management are the top components of data governance, in order of priority. This study can be used as a basis for presenting standards or establishing the concept of data governance (a sketch of the DEMATEL computation follows below).
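
For readers unfamiliar with DEMATEL, the sketch below shows its core computation: normalize a direct-influence matrix and derive the total-relation matrix T = D(I - D)^(-1), from which prominence (r + c) and relation (r - c) scores follow. The 3x3 matrix is invented purely to illustrate the arithmetic; it is not the paper's survey data.

```python
# DEMATEL core computation on a hypothetical 3x3 direct-influence
# matrix (rows/columns could stand for compliance, quality, and
# organization management; the scores here are made up).
import numpy as np

A = np.array([[0, 3, 2],
              [2, 0, 3],
              [1, 2, 0]], dtype=float)

D = A / A.sum(axis=1).max()              # normalize by max row sum
T = D @ np.linalg.inv(np.eye(3) - D)     # total-relation matrix

r, c = T.sum(axis=1), T.sum(axis=0)
print("prominence (r+c):", r + c)        # importance of each component
print("relation   (r-c):", r - c)        # net cause (+) or effect (-)
```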