• Title/Summary/Keyword: Data


A Data Quality Management Maturity Model

  • Ryu, Kyung-Seok;Park, Joo-Seok;Park, Jae-Hong
    • ETRI Journal / v.28 no.2 / pp.191-204 / 2006
  • Many previous studies of data quality have focused on the realization and evaluation of both data value quality and data service quality. These studies revealed that poor data value quality and poor data service quality were caused by poor data structure. In this study we focus on metadata management, namely, data structure quality and introduce the data quality management maturity model as a preferred maturity model. We empirically show that data quality improves as data management matures.


The Data Sharing Economy and Open Governance of Big Data as Public Good

  • LEE, Jung Wan
    • The Journal of Asian Finance, Economics and Business / v.8 no.11 / pp.87-96 / 2021
  • Data-driven markets depend on access to data as a resource for products and services. Since the quality of information that can be drawn from data increases with the amount and quality of the available data, businesses in the data economy have a strong interest in accessing data from other market players and sharing data with other stakeholders. Despite the growing need for data access and evidence of its economic and social benefits, data access and sharing remain below their potential. Individuals, businesses, and governments often face barriers to data access, which may be compounded by a reluctance to share, both within and across sectors. To address these challenges, this paper looks for possible solutions for a better data-sharing economy. This paper 1) discusses the opportunities and challenges of open data and the data-sharing economy, the limitations of private-sector data, and issues with open government data; 2) introduces open government data initiatives and open governance network initiatives; and 3) suggests possible solutions, including governance and management, legal and policy frameworks, and technical standards for open data, and proposes an open data governance model for the data-sharing economy.

A Study on the Data-Based Organizational Capabilities by Convergence Capabilities Level of Public Data (공공데이터 융합역량 수준에 따른 데이터 기반 조직 역량의 연구)

  • Jung, Byoungho;Joo, Hyungkun
    • Journal of Korea Society of Digital Industry and Information Management / v.18 no.4 / pp.97-110 / 2022
  • The purpose of this study is to analyze the level of public data convergence capabilities of administrative organizations and to explore the important variables in data-based organizational capabilities. The theoretical background covers public data and its use activation, joint use, convergence, administrative organizations, and convergence constraints, explained with reference to the Public Data Act, the Electronic Government Act, and the Data-Based Administrative Act. The research model treats data-based organizational capabilities as affected by data-based administrative capabilities, public data operation capabilities, and public data operation constraints. It also tests whether data-based organizational operation capabilities differ by level of data convergence capability. The analysis used hierarchical cluster analysis and multiple regression analysis. First, hierarchical cluster analysis classified the organizations into three groups: a group that uses only public, structured data; a group that uses public data in both structured and unstructured form; and a group that uses both public and private data. Second, the critical variables for data-based organizational operation capabilities were data-based administrative planning and administrative technology, supervisory organizations and technical systems for public data convergence, and constraints on data sharing and market transactions. Finally, the essential independent variables for data-based organizational competencies differ by group. As a theoretical implication, this research updates the management information systems literature by explaining the Public Data Act, the Electronic Government Act, and the Data-Based Administrative Act.
As a practical implication, promoting the use of public data requires establishing data standardization, improving search convenience, and eliminating lukewarm attitudes and selfish behavior toward data sharing.

Study on Data Control System Design Method with Complex Data-Algorithm Data Processing (복합적 자료-알고리즘 자료처리 방식을 적용한 자료처리 시스템 설계 방안 연구)

  • Kim, Min Wook;Park, Yeon Gu;Yi, Jonghyuk;Lee, Jeong-Deok
    • Journal of Satellite, Information and Communications / v.10 no.3 / pp.11-15 / 2015
  • In this study, we present an architecture design for the data control system of a water hazard information platform, based on an analysis of the complexity of its data processing. Data control systems in data collection and analysis platforms are generally based on constant data-algorithm data processing, meaning that the pairing between data and algorithms is fixed. However, the number of data-processing operations in such systems is rapidly increasing as system complexity grows. To hold down the number of data-processing operations, dynamic data-algorithm data processing can be applied to the data control system. After comparing each data-algorithm data processing method, we suggest a design method for a data control system optimized for the water hazard information platform.
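To make the constant-versus-dynamic distinction concrete, here is a minimal sketch, assuming that "constant" processing means every algorithm is bound to every data source in advance and "dynamic" processing means a (data, algorithm) pair is bound only when requested. The source names, algorithm registry, and function names are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical analysis algorithms; each maps a series of readings to one value.
ALGORITHMS: Dict[str, Callable[[List[float]], float]] = {
    "mean": lambda xs: sum(xs) / len(xs),
    "max": max,
    "min": min,
}

Result = Dict[Tuple[str, str], float]


def constant_processing(data: Dict[str, List[float]]) -> Result:
    """Fixed binding: every algorithm is run on every data source,
    so the work grows as (#sources x #algorithms)."""
    return {
        (source, name): algo(values)
        for source, values in data.items()
        for name, algo in ALGORITHMS.items()
    }


def dynamic_processing(data: Dict[str, List[float]],
                       requests: Set[Tuple[str, str]]) -> Result:
    """Dynamic binding: a (source, algorithm) pair is run only when requested."""
    return {(source, name): ALGORITHMS[name](data[source]) for source, name in requests}
```

With three sources and three algorithms, the fixed scheme always performs nine operations, while the dynamic scheme performs only as many as were requested; that gap widens as sources and algorithms are added.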

Capturing Data from Untapped Sources using Apache Spark for Big Data Analytics (빅데이터 분석을 위해 아파치 스파크를 이용한 원시 데이터 소스에서 데이터 추출)

  • Nichie, Aaron;Koo, Heung-Seo
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.7 / pp.1277-1282 / 2016
  • The term "Big Data" has been defined to encapsulate a broad spectrum of data sources and data formats. It is often described as unstructured data because of the variety of its formats. Even though the traditional method of structuring data in rows and columns has been reinvented as column families and key-value stores, or replaced entirely by JSON documents in document-based databases, data still have to be reshaped to conform to a certain structure in order to be stored persistently on disk. ETL processes are key to restructuring data. However, ETL processes incur additional processing overhead and require data sources to be maintained in predefined formats. Consequently, data in certain formats are ignored completely, because designing ETL processes to cater for all possible data formats is almost impossible. These unconsidered data sources could provide useful insights when incorporated into big data analytics. In this project, using the big data solution Apache Spark, we tapped into other data sources stored in their raw formats, such as various text files and compressed files, and combined that data with enterprise data stored persistently in MongoDB for overall data analytics using the MongoDB Aggregation Framework and MapReduce. This differs significantly from traditional ETL systems in that it is compatible regardless of the data format at the source.
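The core idea of reading raw files without first forcing them through a predefined ETL schema can be sketched in plain Python. This is not Spark and not the authors' code; it assumes only two raw formats (plain text and gzip-compressed text, chosen by file suffix) and simply wraps each non-empty line in a minimal record tagged with its source file. The function names are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Dict, Iterator, List


def read_raw_lines(path: Path) -> Iterator[str]:
    """Yield non-empty text lines from a plain or gzip-compressed file.

    The format is chosen by suffix only (an assumption for this sketch).
    """
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield line


def capture_records(paths: List[Path]) -> List[Dict[str, str]]:
    """Wrap every raw line in a minimal record tagged with its source file,
    so heterogeneous files can be analyzed alongside structured data."""
    return [{"source": p.name, "text": line} for p in paths for line in read_raw_lines(p)]
```

In a Spark setting the same role would be played by reading the raw files into a distributed collection; the plain-Python version is used here only to keep the example self-contained.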

A Data Design for Increasing the Usability of Subway Public Data

  • Min, Meekyung
    • International Journal of Internet, Broadcasting and Communication / v.11 no.4 / pp.18-25 / 2019
  • The public data portal provides various public data created by the government in the form of files and open APIs. To increase the usability of public open data, it should provide users with a variety of information and be convenient to use. This requires a structured data design for the public data. In this paper, we propose a data design method to improve the usability of Seoul subway public data. We first identify properties of the current subway public data and then classify the data based on these properties. The properties used as classification criteria are stored properties, derived properties, static properties, and dynamic properties. We also analyze the limitations of the current data for each property. Based on this analysis, we classify the subway public data currently in use into code entities, base entities, and history entities, and present an improved design of the entities according to this classification. In addition, we propose data retrieval functions to increase the utilization of the data. If the data is designed according to the proposed design, it will be possible to solve the duplication and inconsistency problems of the data currently in use and to implement more structured data. As a result, more functions can be provided to users, which is the basis for increasing the usability of subway public data.
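The entity classification described above (code, base, and history entities; stored versus derived and static versus dynamic properties) can be illustrated with a small data model. This is a hypothetical sketch, not the paper's schema: the class names, fields, and the `daily_totals` retrieval function are assumptions chosen to show how a derived property can be computed instead of stored, which is one way to avoid the duplication and inconsistency the abstract mentions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class LineCode:
    """Code entity: a small, static lookup table (e.g. subway line codes)."""
    code: str
    name: str


@dataclass
class Station:
    """Base entity: static, stored properties of a station."""
    station_id: str
    name: str
    line: LineCode


@dataclass
class RidershipRecord:
    """History entity: dynamic, time-stamped facts about a station."""
    station_id: str
    day: date
    boardings: int
    alightings: int

    @property
    def total(self) -> int:
        # Derived property: computed on demand rather than stored,
        # so it cannot become inconsistent with its inputs.
        return self.boardings + self.alightings


def daily_totals(history: List[RidershipRecord], station_id: str) -> Dict[date, int]:
    """A simple retrieval function over the history entity."""
    return {r.day: r.total for r in history if r.station_id == station_id}
```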

A Comparative Study of Big Data, Open Data, and My Data (빅데이터, 오픈데이터, 마이데이터의 비교 연구)

  • Park, Jooseok
    • The Journal of Bigdata / v.3 no.1 / pp.41-46 / 2018
  • With the advent of the fourth industrial revolution, data has become a very important resource, and the present is called the 'Data Revolution Age.' It is said that the Data Revolution Age started with Big Data, accelerated with Open Data, and was completed with My Data. In this paper, we compare Big Data, Open Data, and My Data, and suggest the roles and effects of My Data as a digital resource.

Modeling and Implementation of Public Open Data in NoSQL Database

  • Min, Meekyung
    • International Journal of Internet, Broadcasting and Communication / v.10 no.3 / pp.51-58 / 2018
  • To utilize the various data provided by the Korea public open data portal, the data should be managed systematically using a database. Since the range of open data is enormous and the amount of data continues to increase, it is preferable to use a database capable of processing big data in order to analyze and utilize it. This paper proposes a data modeling and implementation method suitable for public data. The target data is subway-related data provided by the public open data portal. The schemas of the public data related to Seoul Metro stations are analyzed and their problems are presented. To solve these problems, this paper proposes a method to normalize and structure the subway data and model it in a NoSQL database. In addition, the implementation result is shown using MongoDB, a document-based database capable of processing big data.
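A common way to restructure flat tabular rows for a document database is to group them into one document per entity and embed repeated values as arrays. The sketch below assumes a hypothetical flat input with `station_id`, `station_name`, and `line` columns; it is an illustration of document modeling in general, not the schema proposed in the paper.

```python
from typing import Dict, List


def rows_to_documents(rows: List[Dict[str, str]]) -> List[dict]:
    """Group flat (station, line) rows into one document per station.

    Station attributes that repeat across rows are stored once, and the lines
    serving the station are embedded as an array inside the document.
    """
    docs: Dict[str, dict] = {}
    for row in rows:
        doc = docs.setdefault(row["station_id"], {
            "_id": row["station_id"],
            "name": row["station_name"],
            "lines": [],
        })
        if row["line"] not in doc["lines"]:  # drop duplicate rows
            doc["lines"].append(row["line"])
    return list(docs.values())
```

The resulting dictionaries have the shape of MongoDB documents (using `_id` as the key) and could be passed to a driver such as PyMongo's `insert_many`; that step is omitted to keep the example self-contained.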

Design of Client-Server Model For Effective Processing and Utilization of Bigdata (빅데이터의 효과적인 처리 및 활용을 위한 클라이언트-서버 모델 설계)

  • Park, Dae Seo;Kim, Hwa Jong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.109-122 / 2016
  • Recently, big data analysis has become a field of interest to individuals and non-experts as well as companies and professionals. Accordingly, it is used for marketing and social problem solving by analyzing data that is currently open or collected directly. In Korea, various companies and individuals are attempting big data analysis, but they face difficulties from the initial stage because of limited big data disclosure and difficulties in collection. Today, system improvements for big data activation and big data disclosure services are being carried out in Korea and abroad, mainly services that open public data, such as the domestic Government 3.0 portal (data.go.kr). In addition to government efforts, services that share data held by corporations or individuals are running, but it is difficult to find useful data because little data is shared. Moreover, big traffic problems can occur because the entire dataset must be downloaded and examined just to grasp its attributes and basic information. Therefore, a new system for big data processing and utilization is needed. First, big data pre-analysis technology is needed to solve the big data sharing problem. Pre-analysis, a concept proposed in this paper, means providing users with results generated by analyzing the data in advance. Pre-analysis improves the usability of big data by giving data users information about its properties and characteristics when they search for big data. In addition, sharing the summary data or sample data generated through pre-analysis solves the security problems that may occur when the original data is disclosed, thereby enabling big data sharing between data providers and data users.
Second, appropriate preprocessing results must be generated quickly according to the disclosure level of the raw data or the network status, and delivered to users through distributed big data processing using Spark. Third, to solve the big traffic problem, the system monitors network traffic in real time. When preprocessing the data requested by a user, the data must be reduced to a size the current network can handle before it is transmitted, so that no big traffic occurs. In this paper, we present various data sizes according to the level of disclosure through pre-analysis. This method is expected to produce lower traffic volume than the conventional method of sharing only raw data across many systems. We describe how to solve the problems that occur when big data is released and used, and how to facilitate sharing and analysis. The client-server model uses Spark for fast analysis and processing of user requests. It consists of a Server Agent and a Client Agent, deployed on the server side and the client side, respectively. The Server Agent is required by the data provider; it performs pre-analysis of big data to generate a Data Descriptor containing information on Sample Data, Summary Data, and Raw Data. It also performs fast, efficient big data preprocessing through distributed processing and continuously monitors network traffic. The Client Agent is placed on the data user side. It can search big data quickly through the Data Descriptor produced by pre-analysis, and can request the desired data from the server for download according to the level of disclosure. The model separates the Server Agent from the Client Agent when the data provider publishes data for use by the data user.
In particular, we focus on big data sharing, distributed big data processing, and the big traffic problem, construct the detailed modules of the client-server model, and present the design method of each module. In a system designed on the basis of the proposed model, a user who acquires data can analyze it in the desired direction or preprocess it into new data. By analyzing the newly processed data through the Server Agent, the data user takes on the role of data provider. The data provider can likewise obtain useful statistical information from the Data Descriptor of the data it discloses and become a data user, performing new analyses on the sample data. In this way, raw data is processed and the processed big data is used by others, naturally forming a shared environment. The roles of data provider and data user are not distinguished, providing an ideal sharing service in which everyone can be both a provider and a user. The client-server model solves the problem of sharing big data, provides a free sharing environment for secure big data disclosure, and provides an ideal sharing service in which big data is easy to find.
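The pre-analysis idea (publish a Data Descriptor with Summary Data and Sample Data, then choose what to transmit based on disclosure level and network conditions) can be sketched in a few lines. Everything here is a hypothetical illustration: the `DataDescriptor` fields, the three disclosure levels ordered raw > sample > summary, the fixed bytes-per-value estimate, and the byte budget standing in for real-time traffic monitoring are all assumptions, and no Spark processing is shown.

```python
import random
import statistics
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class DataDescriptor:
    """Hypothetical pre-analysis result that is published instead of raw data."""
    row_count: int
    summary: dict        # Summary Data: simple statistics over the values
    sample: List[float]  # Sample Data: a small random subset


def pre_analyze(values: List[float], sample_size: int = 5, seed: int = 0) -> DataDescriptor:
    rng = random.Random(seed)  # seeded so the published sample is reproducible
    sample = rng.sample(values, min(sample_size, len(values)))
    summary = {"min": min(values), "max": max(values), "mean": statistics.mean(values)}
    return DataDescriptor(len(values), summary, sample)


def choose_payload(values: List[float], descriptor: DataDescriptor, disclosure: str,
                   budget_bytes: int, bytes_per_value: int = 8) -> Tuple[str, Union[list, dict]]:
    """Return the richest payload allowed by both the disclosure level
    ("raw" > "sample" > "summary") and the current network budget."""
    if disclosure == "raw" and len(values) * bytes_per_value <= budget_bytes:
        return "raw", values
    if disclosure in ("raw", "sample") and len(descriptor.sample) * bytes_per_value <= budget_bytes:
        return "sample", descriptor.sample
    return "summary", descriptor.summary
```

The design choice worth noting is the fallback order: when the network budget is too small for the permitted payload, the sketch degrades to the next smaller one rather than refusing, which matches the abstract's goal of avoiding big traffic while still serving the request.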

Strategy for Establishing a Rights Processing Platform to Enhance the Utilization of Open Data (공공데이터 활용성 제고를 위한 권리처리 플랫폼 구축 전략)

  • Sim, Junbo;Kwon, Hun-yeong
    • Journal of Information Technology Services / v.21 no.3 / pp.27-42 / 2022
  • Open Data is an essential resource for the data industry. The 'Act On Promotion Of The Provision And Use Of Public Data', enacted on July 30, 2013, requires public institutions to manage the quality of Open Data and provide it to the public. This legislation established the legal basis for public access to Open Data. Furthermore, public institutions are prohibited from developing and providing open data services that duplicate or resemble those of the private sector, and private start-ups using open data are supported. However, as demand for Open Data gradually increases, cases in which public institutions refuse to provide, or interrupt the provision of, the Open Data they hold are also increasing. Accordingly, the 'Open Data Mediation Committee' was established and operates so that the right to use data can be remedied through a simple dispute mediation procedure rather than complicated administrative litigation. The main issues in dispute settlement so far have usually concerned the rights of third parties: open data containing personal information, private information such as trade secrets, and copyrights. In addition, non-open data cannot be provided without the consent of the information subject. Rather than converting non-open data into open data through de-identification, positive results can be expected if consent is obtained through active rights processing by the personal information subject. Not only can information subjects use the Public MyData Service, but Open Data applicants will also be able to secure higher-quality Open Data, which will help foster the private data industry. This study derives a plan to establish a rights processing platform to enhance the usability of Open Data that includes private information such as personal information, trade secrets, and copyright, which have been at issue in the provision of Open Data since 2014.
With that, the proposals in this study are expected to serve as a stepping stone for revitalizing private start-ups through wider use of Open Data and for improving public convenience through the Public MyData services of information subjects.
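The consent-based rights processing described above reduces, at its simplest, to a release check: data carrying third-party rights may be provided only once each right has been cleared. The sketch below is a hypothetical simplification, not the platform the study proposes: it tracks consent per right type rather than per individual rights holder, and the `Right`, `Dataset`, `outstanding_rights`, and `can_release` names are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Right(Enum):
    PERSONAL_INFO = "personal information"
    TRADE_SECRET = "trade secret"
    COPYRIGHT = "copyright"


@dataclass
class Dataset:
    name: str
    rights: Set[Right] = field(default_factory=set)    # third-party rights attached to the data
    consents: Set[Right] = field(default_factory=set)  # rights whose holders have consented


def outstanding_rights(ds: Dataset) -> Set[Right]:
    """Rights that still block release of the dataset."""
    return ds.rights - ds.consents


def can_release(ds: Dataset) -> bool:
    """Release only when every attached third-party right is cleared by consent."""
    return not outstanding_rights(ds)
```

Reporting the outstanding rights, rather than only a yes/no answer, is what would let such a platform route a request to the specific information subjects or rights holders whose consent is still missing.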