• Title/Summary/Keyword: Public Open Datasets


Development of a Method for Analyzing and Visualizing Concept Hierarchies based on Relational Attributes and its Application on Public Open Datasets

  • Hwang, Suk-Hyung
    • Journal of the Korea Society of Computer and Information / v.26 no.9 / pp.13-25 / 2021
  • In the age of digital innovation based on the Internet, Information and Communication, and Artificial Intelligence technologies, huge numbers of datasets are being generated, collected, accumulated, and opened on the web by various public institutions providing useful public information. To analyze data and gain useful insights from it, Formal Concept Analysis (FCA) has been successfully used for analyzing, classifying, clustering, and visualizing data based on the binary relation between objects and attributes in the dataset. In this paper, we present an approach for enhancing the analysis of relational attributes of data within the extended framework of FCA, which is designed to classify, conceptualize, and visualize sets of objects described not only by attributes but also by relations between these objects. Several experiments carried out with the proposed tool, RCA wizard, on public open datasets demonstrate the validity and usability of our approach for generating and visualizing conceptual hierarchies to extract more useful knowledge from datasets. The proposed approach can be used as a useful tool for effective data analysis, classification, clustering, visualization, and exploration.
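
As an illustration of the binary object-attribute relation that FCA works on, the following is a minimal sketch, not the RCA wizard tool or the paper's method. It enumerates the formal concepts (extent, intent pairs) of a tiny context by brute force. The dataset names and attributes are hypothetical, and the function names are chosen here for illustration only.

```python
from itertools import combinations

# Hypothetical toy context: each object (a dataset) maps to its attributes.
context = {
    "bus_stops": {"geo", "csv"},
    "air_quality": {"geo", "api"},
    "budget": {"csv"},
}

def common_attributes(objects, context):
    """Attributes shared by all given objects (every attribute if none given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared

def objects_having(attributes, context):
    """Objects that carry every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def formal_concepts(context):
    """Brute-force enumeration: close every subset of objects.

    Exponential in the number of objects, so only suitable for toy inputs.
    """
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

concepts = formal_concepts(context)
# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents yields the concept hierarchy (lattice) that FCA tools visualize; practical implementations use far more efficient algorithms than this enumeration.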

Opening the Nation: Leveraging Open Data to Create New Business and Provide Services

  • Cruz, Ruth Angelie B.;Lee, Hong Joo
    • Knowledge Management Research / v.16 no.4 / pp.157-168 / 2015
  • Opening government data has been one of the main goals of nations building their e-government structures. Nonetheless, beyond publishing government data for public viewing, the bigger concern now is promoting the use and proving the usefulness of available public data. To do this, governments must not only publicize data but also publish the kind of data that infomediaries and developers can use to create new products and services for citizens. This research investigates 30 open data use cases of South Korea as listed in Data.go.kr. This study aims to contribute to a better understanding of open dataset utilization in a technologically advanced and well-developed nation and to provide useful insights on how open data is currently being used, how it is opening up new business, and, more importantly, how it is contributing to civic society by providing services to the public.

A Public Open Civil Complaint Data Analysis Model to Improve Spatial Welfare for Residents - A Case Study of Community Welfare Analysis in Gangdong District - (거주민 공간복지 향상을 위한 공공 개방 민원 데이터 분석 모델 - 강동구 공간복지 분석 사례를 중심으로 -)

  • Shin, Dongyoun
    • Journal of KIBIM / v.13 no.3 / pp.39-47 / 2023
  • This study aims to introduce a model for enhancing community well-being through the utilization of public open data. To objectively assess abstract notions of residential satisfaction, text data from complaints is analyzed. By leveraging accessible public data, costs related to data collection are minimized. Initially, relevant text data containing civic complaints is collected and refined by removing extraneous information. This processed data is then combined with meaningful datasets and subjected to topic modeling, a text mining technique. The insights derived are visualized using Geographic Information System (GIS) and Application Programming Interface (API) data. The efficacy of this analytical model was demonstrated in the Godeok/Gangil area. The proposed methodology allows for comprehensive analysis across time, space, and categories. This flexible approach involves incorporating specific public open data as needed, all within the overarching framework.

Knowledge Graph of Administrative Codes in Korea: The Case for Improving Data Quality and Interlinking of Public Data

  • Haklae Kim
    • Journal of Information Science Theory and Practice / v.11 no.3 / pp.43-57 / 2023
  • Government codes are created and utilized to streamline and standardize government administrative procedures. They are generally employed in government information systems. Because they are included in open datasets of public data, users must be able to understand them. However, information that can be used to comprehend administrative code is lost during the process of releasing data in the government system, making it difficult for data consumers to grasp the code and limiting the connection or convergence of different datasets that use the same code. This study proposes a way to employ the administrative code produced by the Korean government as a standard in a public data environment on a regular basis. Because consumers of public data are barred from accessing government systems, a means of universal access to administrative code is required. An ontology model is used to represent the administrative code's data structure and meaning, and the full administrative code is built as a knowledge graph. The knowledge graph thus created is used to assess the accuracy and connection of administrative codes in public data. The method proposed in this study has the potential to increase the quality of coded information in public data as well as data connectivity.

Knowledge Mining from Many-valued Triadic Dataset based on Concept Hierarchy (개념계층구조를 기반으로 하는 다치 삼원 데이터집합의 지식 추출)

  • Suk-Hyung Hwang;Young-Ae Jung;Se-Woong Hwang
    • Journal of Platform Technology / v.12 no.3 / pp.3-15 / 2024
  • Knowledge mining is a research field that applies various techniques such as data modeling, information extraction, analysis, visualization, and result interpretation to find valuable knowledge from diverse large datasets. It plays a crucial role in transforming raw data into useful knowledge across domains such as business, healthcare, and scientific research. In this paper, we propose analytical techniques for performing knowledge discovery and data mining from various data by extending the Formal Concept Analysis method. The paper defines models for representing the diverse formats and structures of the data to be analyzed, including many-valued data tables and triadic data tables, as well as algorithms for data processing (dyadic scaling and flattening), the construction of concept hierarchies, and the extraction of association rules. The usefulness of the proposed technique is empirically demonstrated by conducting experiments applying the proposed method to public open data.
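
To make the idea of flattening a triadic data table concrete, here is a minimal sketch under stated assumptions. It is not the algorithm defined in the paper. It uses one common form of flattening, in which each (attribute, condition) pair becomes a single binary attribute, and then computes the confidence of one association rule on the flattened context. The triples and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical triadic data: (object, attribute, condition) triples.
triples = [
    ("library", "open", "weekday"),
    ("library", "open", "weekend"),
    ("clinic", "open", "weekday"),
    ("clinic", "free", "weekday"),
]

def flatten(triples):
    """Turn triples into a dyadic context: object -> set of (attribute, condition)."""
    context = defaultdict(set)
    for obj, attr, cond in triples:
        context[obj].add((attr, cond))
    return dict(context)

def support(context, pairs):
    """Fraction of objects that have every (attribute, condition) pair."""
    pairs = set(pairs)
    return sum(1 for attrs in context.values() if pairs <= attrs) / len(context)

def confidence(context, premise, conclusion):
    """Confidence of the rule premise -> conclusion (0.0 if premise never holds)."""
    premise_support = support(context, premise)
    if premise_support == 0:
        return 0.0
    return support(context, set(premise) | set(conclusion)) / premise_support

flat = flatten(triples)
rule_conf = confidence(flat, {("open", "weekday")}, {("open", "weekend")})
print(flat)
print(rule_conf)
```

Once the data is in this dyadic form, ordinary FCA concept construction and rule extraction can be applied to it unchanged, which is the usual motivation for flattening.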


Assessment of Needs and Accessibility Towards Health Insurance Claims Data (연구를 위한 건강보험 청구자료 요구 및 이용 요인분석)

  • Lee, Jung-A;Oh, Ju-Hwan;Moon, Sang-Jun;Lim, Jun-Tae;Lee, Jin-Seok;Lee, Jin-Yong;Kim, Yoon
    • Health Policy and Management / v.21 no.1 / pp.77-92 / 2011
  • Objectives: This study examined health policy researchers' needs for and accessibility to health insurance claim datasets according to their academic capacity. Methods: An online questionnaire to capture relevant proxy variables for academic needs, accessibility, and research capacity was constructed based on previous studies. The survey was delivered to active health policy researchers through three major scholarly associations in South Korea. Seven hundred and one scholars responded while the survey was open for 12 days (starting on December 20th, 2010). Descriptive statistics and logistic regression analysis were carried out. Results: Regardless of the definition of operational needs, the prevalent needs of survey respondents were not met by the current provision of claim data. Greater research capacity was shown to be correlated with increased demand for claim data, and attempts to obtain claim datasets were positively correlated with research capacity. A greater research capacity, however, was not necessarily correlated with better accessibility to the claim data. Conclusions: The substantial unmet need for claim data among the healthcare policy research community calls for establishing proactive institutions that could systematically prepare and make available public datasets and provide call-in services to facilitate proper handling of data.

A Study on the Service of the Integrated Administrative Information Dataset Management System (행정정보 데이터세트 종합관리시스템의 서비스 방안 연구)

  • Kim, Ji-Hye;Yoon, Sung-Ho;Yang, Dongmin
    • Journal of Korean Society of Archives and Records Management / v.22 no.2 / pp.27-49 / 2022
  • According to the amendment of the Enforcement Decree of the Public Records Management Act in 2020, an administrative information dataset record management plan will be enacted, and the National Archives of Korea plans to establish an integrated administrative information dataset management system to support it. However, there is no specific service plan that considers the characteristics of the datasets and the Management Reference Table. Therefore, this paper compared and analyzed the current status of dataset services at 14 domestic and foreign public data portals and archives websites, derived implications, and proposed 6 service plans applicable to the integrated administrative information dataset management system. The results of this study are expected to promote the use of administrative datasets and the activation of related services.

FAIR Principle-Based Metadata Assessment Framework (FAIR 원칙 기반 메타데이터 평가 프레임워크)

  • Park, Jin Hyo;Kim, Sung-Hee;Youn, Joosang
    • KIPS Transactions on Computer and Communication Systems / v.11 no.12 / pp.461-468 / 2022
  • With the development of the big data industry, cases of providing data utilization services on digital platforms are increasing. In this regard, research in data-related fields is being conducted to apply the FAIR principles, which can be used to assess (meta)data quality, service, and function, to data quality evaluation. In particular, the European Open Data Portal applies an assessment model based on the FAIR principles. Based on this, a data maturity assessment is conducted and the results are disclosed in reports every year. However, public data portals do not conduct data maturity evaluations based on metadata. In this paper, we propose and evaluate a new model for data maturity evaluation on a big data platform built for multiple domestic public data portals and data transactions, based on the FAIR principles used for data maturity evaluation in Europe's open data portals. The proposed maturity evaluation model evaluates the quality of public data portal datasets.

Access to and Utilization of the Open Source Data-related to Adolescent Health (청소년 건강관련 공개자료 접근 및 활용에 관한 고찰)

  • Lee, Jae-Eun;Sung, Jung-Hye;Lee, Won-Jae;Moon, In-Ok
    • The Journal of Korean Society for School & Community Health Education / v.11 no.1 / pp.67-78 / 2010
  • Background & Objectives: The current trend is that funding agencies require investigators to share their data with others. However, there is limited guidance on how to access and utilize the shared data. We sought to determine what the common data sharing practices in the U.S.A. are, what data related to adolescent health are freely available, and how to deal with large datasets that adopt a complex study design. Methods: The study included only research data related to adolescent health that were collected in the U.S.A. and accessible without restriction through the internet. Only raw data, not aggregated data, were considered for the study. Major keywords for the web search were "adolescent", "children", "health", and "school". Results: Current approaches for public health data sharing lacked common standards and varied largely due to the data's complex nature, large size, local expertise, and internal procedures. Some common data sharing practices are unlimited access, formal screened access, restricted access, and informal exclusive access. The Inter-University Consortium for Political and Social Research and the Centers for Disease Control and Prevention were the best data depositories. "Data on the net" was a search engine for websites providing freely available data. Six freely available datasets related to adolescent health were identified. The importance and methods of incorporating complex research design into analysis were discussed. Conclusion: There have been various attempts to standardize the process for open access and open data using information technology concepts. However, it may not be easy for researchers to adapt to this technology. Therefore, the guidance provided by this study may help researchers enhance the accessibility to and the utilization of open source data.


Digital Epidemiology: Use of Digital Data Collected for Non-epidemiological Purposes in Epidemiological Studies

  • Park, Hyeoun-Ae;Jung, Hyesil;On, Jeongah;Park, Seul Ki;Kang, Hannah
    • Healthcare Informatics Research / v.24 no.4 / pp.253-262 / 2018
  • Objectives: We reviewed digital epidemiological studies to characterize how researchers are using digital data by topic domain, study purpose, data source, and analytic method. Methods: We reviewed research articles published within the last decade that used digital data to answer epidemiological research questions. Data were abstracted from these articles using a data collection tool that we developed. Finally, we summarized the characteristics of the digital epidemiological studies. Results: We identified six main topic domains: infectious diseases (58.7%), non-communicable diseases (29.4%), mental health and substance use (8.3%), general population behavior (4.6%), environmental, dietary, and lifestyle (4.6%), and vital status (0.9%). We identified four categories for the study purpose: description (22.9%), exploration (34.9%), explanation (27.5%), and prediction and control (14.7%). We identified eight categories for the data sources: web search query (52.3%), social media posts (31.2%), web portal posts (11.9%), webpage access logs (7.3%), images (7.3%), mobile phone network data (1.8%), global positioning system data (1.8%), and others (2.8%). Of these, 50.5% used correlation analyses, 41.3% regression analyses, 25.6% machine learning, and 19.3% descriptive analyses. Conclusions: Digital data collected for non-epidemiological purposes are being used to study health phenomena in a variety of topic domains. Digital epidemiology requires access to large datasets and advanced analytics. Ensuring open access is clearly at odds with the desire to have as little personal data as possible in these large datasets to protect privacy. Establishment of data cooperatives with restricted access may be a solution to this dilemma.