• Title/Summary/Keyword: 데이터셋 목록 (dataset list)

Search results: 9 (processing time: 0.025 seconds)

Designing Dataset Management and Service System for Digital Libraries Using DCAT (DCAT을 활용한 디지털도서관 데이터셋 관리와 서비스 설계)

  • Park, Jin Ho
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.53 no.2
    • /
    • pp.247-266
    • /
    • 2019
  • The purpose of this study is to propose the W3C standard DCAT for managing and servicing datasets, which are becoming increasingly important as new knowledge information resources. To do this, we first analyzed the classes and properties of the four core classes of DCAT. We then modeled and presented a system that can manage and service various datasets in a digital library based on DCAT. The system is divided into source data, dataset management, linked data connection, and user service. In particular, a DCAT mapping function is proposed for dataset management; this feature can ensure the interoperability of various datasets.
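
Below is a minimal sketch, using rdflib, of how DCAT classes such as dcat:Catalog, dcat:Dataset, and dcat:Distribution can describe a library dataset. The URIs, titles, and format values are hypothetical, and this is not the mapping function proposed in the paper.

```python
# Minimal DCAT description of a hypothetical library dataset with rdflib.
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

catalog = URIRef("http://example.org/catalog")      # hypothetical catalog URI
dataset = URIRef("http://example.org/dataset/1")    # hypothetical dataset URI
dist = URIRef("http://example.org/dataset/1/csv")   # hypothetical distribution URI

# dcat:Catalog lists one or more dcat:Dataset resources
g.add((catalog, RDF.type, DCAT.Catalog))
g.add((catalog, DCTERMS.title, Literal("Digital Library Catalog", lang="en")))
g.add((catalog, DCAT.dataset, dataset))

# Core descriptive metadata of the dataset
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example research dataset", lang="en")))
g.add((dataset, DCAT.keyword, Literal("digital library")))
g.add((dataset, DCAT.distribution, dist))

# dcat:Distribution points at an actual downloadable representation
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.downloadURL, URIRef("http://example.org/files/data.csv")))
g.add((dist, DCTERMS.format, Literal("text/csv")))

print(g.serialize(format="turtle"))
```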

The digital transformation of mask dance movement in intangible cultural asset based on human pose recognition (휴먼포즈 인식을 적용한 무형문화재 탈춤 동작 디지털전환)

  • SooHyuong Kang;SungGeon Park;KwangYoung Park
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.678-680
    • /
    • 2023
  • This study aims to digitize the movements of talchum (Korean mask dance), which was inscribed on the UNESCO Representative List of the Intangible Cultural Heritage of Humanity in 2022, and to pass this information on to future generations. For data collection, 39 intangible cultural heritage holders and transmitters belonging to 13 talchum organizations designated as national intangible cultural heritage and 5 organizations designated as city/provincial intangible cultural heritage wore inertial motion capture equipment, and data were collected with eight cameras. For data processing, bounding boxes were annotated; talchum pose estimation used YOLO v8, and for movement classification a CNN model was combined with YOLO v8 to classify 130 talchum movements. As a result, the model achieved an mAP@50 of 0.953, an mAP@50-95 of 0.596, and an accuracy of 70%. We expect that talchum classification performance will improve further as the training dataset grows and data quality improves.
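
Below is a minimal sketch of the pose-estimation step with the Ultralytics YOLOv8 pose model; the input file name is hypothetical, and the paper's combined YOLO v8 + CNN classifier for the 130 movement classes is only indicated in a comment, since its architecture is not described in the abstract.

```python
# Pose estimation on a single frame with a pretrained YOLOv8 pose model.
from ultralytics import YOLO

pose_model = YOLO("yolov8n-pose.pt")            # pretrained pose-estimation weights
results = pose_model("mask_dance_frame.jpg")    # hypothetical input frame

for r in results:
    # r.boxes holds person bounding boxes, r.keypoints the body keypoints
    if r.keypoints is not None:
        kpts = r.keypoints.xy    # tensor of shape (num_persons, num_keypoints, 2)
        print(kpts.shape)
        # A separate CNN classifier over keypoint (or crop) sequences would
        # assign one of the 130 talchum movement classes at this point.
```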

Activity Type Detection Of Random Forest Model Using UWB Radar And Indoor Environmental Measurement Sensor (UWB 레이더와 실내 환경 측정 센서를 이용한 랜덤 포레스트 모델의 재실활동 유형 감지)

  • Park, Jin Su;Jeong, Ji Seong;Yang, Chul Seung;Lee, Jeong Gi
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.6
    • /
    • pp.899-904
    • /
    • 2022
  • As the world becomes an aging society due to a declining birth rate and increasing life expectancy, a system for managing the health of the elderly population is needed. In particular, various studies on occupancy and activity types are being conducted for smart home care services for indoor health management. In this paper, we propose a random forest model for smart home care services that classifies activity type as well as occupancy status from indoor temperature, humidity, CO2, and fine dust values together with UWB radar positioning. The experiment measures indoor environment and occupant positioning data at 2-second intervals using three sensors that measure indoor temperature and humidity, CO2, and fine dust, and two UWB radars. After correcting outliers and missing values, the measured data are divided into an 80% training set and a 20% test set, and the random forest model is applied to evaluate the list of important variables, accuracy, sensitivity, and specificity.
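
Below is a minimal sketch of the described pipeline in scikit-learn: an 80/20 split, a random forest classifier, feature importances, and per-class sensitivity and specificity derived from the confusion matrix. The synthetic columns are hypothetical stand-ins for the sensor and UWB positioning features.

```python
# Random forest occupancy/activity classification on synthetic stand-in data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for the 2-second-interval sensor and UWB positioning data;
# in practice these columns would come from the measured dataset.
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "temp": rng.normal(24, 2, n),
    "humidity": rng.normal(45, 8, n),
    "co2": rng.normal(700, 150, n),
    "pm25": rng.normal(15, 5, n),
    "uwb_x": rng.uniform(0, 5, n),
    "uwb_y": rng.uniform(0, 5, n),
    "activity_type": rng.choice(["absent", "resting", "moving"], n),
})

X = df.drop(columns=["activity_type"])
y = df["activity_type"]

# 80/20 train/test split, as in the paper
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("feature importances:", dict(zip(X.columns, model.feature_importances_)))

# Per-class sensitivity and specificity from the confusion matrix
cm = confusion_matrix(y_test, pred, labels=model.classes_)
for i, label in enumerate(model.classes_):
    tp = cm[i, i]
    fn = cm[i].sum() - tp
    fp = cm[:, i].sum() - tp
    tn = cm.sum() - tp - fn - fp
    print(label, "sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```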

Analysis and Recognition of Depressive Emotion through NLP and Machine Learning (자연어처리와 기계학습을 통한 우울 감정 분석과 인식)

  • Kim, Kyuri;Moon, Jihyun;Oh, Uran
    • The Journal of the Convergence on Culture Technology
    • /
    • v.6 no.2
    • /
    • pp.449-454
    • /
    • 2020
  • This paper proposes a machine learning-based emotion analysis system that detects a user's depression from their SNS posts. We first compiled a list of Korean keywords related to depression, then used these to create training data by crawling Twitter: 1,297 positive and 1,032 negative tweets in total. Finally, to identify the best machine learning model for text-based depression detection, we compared RNN, LSTM, and GRU models in terms of performance. Our experimental results show that the GRU model achieved an accuracy of 92.2%, which is 2~4% higher than the other models. We expect that the findings of this paper can be used to prevent depression by analyzing users' SNS posts.
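
Below is a minimal sketch of a GRU-based binary tweet classifier in TensorFlow/Keras; the vocabulary size, sequence length, layer sizes, and the placeholder data are illustrative assumptions, not the authors' exact configuration.

```python
# GRU binary classifier over padded token-id sequences (placeholder data).
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 40         # assumed maximum tokens per tweet

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.GRU(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # depressive vs. non-depressive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# x: (num_tweets, MAX_LEN) padded token ids, y: 0/1 labels (random placeholders here)
x = np.random.randint(0, VOCAB_SIZE, size=(100, MAX_LEN))
y = np.random.randint(0, 2, size=(100,))
model.fit(x, y, epochs=3, batch_size=16, validation_split=0.2)
```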

Evaluation of English Term Extraction based on Inner/Outer Term Statistics

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.4
    • /
    • pp.141-148
    • /
    • 2020
  • Automatic term extraction recognizes domain-specific terms from a collection of domain-specific text. Previous term extraction methods operate effectively in an unsupervised manner, extracting candidate terms and assigning importance scores to them. For the calculation of term importance scores, this study focuses on utilizing the sets of inner and outer terms of a candidate term. For a candidate term, its inner terms are shorter terms that belong to the candidate term as components, and its outer terms are longer terms that include the candidate term as a component. This work presents various functions that compute, for a candidate term, a term strength from either the set of its inner terms or the set of its outer terms. In addition, a term importance scoring method is devised based on the C-value score and the term strength values obtained from the sets of inner and outer terms. Experimental evaluations using the GENIA and ACL RD-TEC 2.0 datasets compare and analyze the effectiveness of the proposed term extraction methods for English. The proposed method performed better than the baseline method by up to 1% and 3% on the GENIA and ACL datasets, respectively.
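
Below is a minimal sketch of the standard C-value score together with a simple outer-term statistic; the paper's specific inner/outer term-strength functions and their combination with C-value are not reproduced, and the toy candidate terms and frequencies are invented for illustration.

```python
# C-value scoring over a toy set of candidate terms (tuples of words).
import math
from collections import defaultdict

def c_value(freq):
    """freq: dict mapping candidate term (tuple of words) -> corpus frequency."""
    # outer[a] = candidate terms that properly contain a as a contiguous subsequence
    outer = defaultdict(list)
    for a in freq:
        for b in freq:
            if len(b) > len(a) and any(
                b[i:i + len(a)] == a for i in range(len(b) - len(a) + 1)
            ):
                outer[a].append(b)
    scores = {}
    for a, f in freq.items():
        length_w = math.log2(max(len(a), 2))   # avoid log2(1) = 0 for unigrams
        if outer[a]:
            nested = sum(freq[b] for b in outer[a]) / len(outer[a])
            scores[a] = length_w * (f - nested)
        else:
            scores[a] = length_w * f
    return scores, outer

freq = {                       # toy candidate-term frequencies
    ("cell",): 30,
    ("cell", "line"): 12,
    ("human", "cell", "line"): 5,
}
scores, outer = c_value(freq)
print(scores)
print({a: len(bs) for a, bs in outer.items()})   # outer-term counts per candidate
```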

A Study on Design of Metadata Management Demonstration System for damage prediction from storm and flood (풍수해 피해예측지도 메타데이터 관리 시범 시스템 설계에 대한 연구)

  • Lim, So Mang;Baeck, Seung Hyub;Hwang, Eui Ho
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2017.05a
    • /
    • pp.472-472
    • /
    • 2017
  • As damage from disasters increases sharply, the need for storm and flood damage prediction to prevent such damage has grown, and various related studies are under way. Ministries and local governments produce various disaster maps, and these maps contain diverse data and differing information depending on how and for what purpose they were produced; this study is therefore carried out to standardize the data and information and to find, link, and utilize the required information efficiently. Metadata means information about data and describes the origin and flow of changes in the data. Metadata-related standards include ISO 19115 (international standard), KS X ISO 19115 (national standard), TTAS.KO-10.0139 (distribution catalog standard), and TTAS.IS-19115 (management standard). Following the international standard, this study proposes a metadata design and a plan for building a management system for the systematic management of storm and flood damage prediction maps. The basic design direction for the demonstration system was set in consideration of the relevant standards, the characteristics of the information, and user levels, and was reflected in drafting a metadata standard for storm and flood damage prediction map information. As a result, the metadata package was defined as a total of nine sections (classes), and the metadata entity set information was organized by defining and linking sub-entities. To present the design of the demonstration system, the research proceeded in the order of surveying and deriving DB items, building a data linkage and utilization model, and developing a prototype. In addition, data mapping materials were prepared on topics such as the classification of data items to be displayed, disaster-prevention utilization stages, and regional divisions, and a metadata DB was built for each data item according to the criteria set by the basic design direction, thereby designing the demonstration system for storm and flood damage prediction map metadata management. The results of this study can be used in building standard data and a standard data model for storm and flood damage prediction maps and are expected to contribute to research on standardization and linked utilization.
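
Below is a minimal sketch of what a single ISO 19115-style metadata record for a damage prediction map might look like as a data structure; the field names and values are illustrative assumptions, since the abstract does not enumerate the nine sections (classes) or their sub-entities.

```python
# Illustrative metadata record for a storm/flood damage prediction map layer.
from dataclasses import dataclass, field
from typing import List

@dataclass
class MapMetadata:
    identifier: str                    # unique ID of the map layer
    title: str                         # name of the damage prediction map
    abstract: str                      # description of the map contents
    region: str                        # administrative region covered
    prevention_stage: str              # disaster-prevention utilization stage
    keywords: List[str] = field(default_factory=list)
    lineage: str = ""                  # origin and change history of the data

record = MapMetadata(
    identifier="fldmap-0001",
    title="Storm and flood damage prediction map (example)",
    abstract="Hypothetical example record for a metadata management system.",
    region="Example-si",
    prevention_stage="preparedness",
    keywords=["flood", "damage prediction", "metadata"],
    lineage="Derived from simulation outputs; all fields are illustrative.",
)
print(record)
```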


Prediction of Agricultural Purchases Using Structured and Unstructured Data: Focusing on Paprika (정형 및 비정형 데이터를 이용한 농산물 구매량 예측: 파프리카를 중심으로)

  • Somakhamixay Oui;Kyung-Hee Lee;HyungChul Rah;Eun-Seon Choi;Wan-Sup Cho
    • The Journal of Bigdata
    • /
    • v.6 no.2
    • /
    • pp.169-179
    • /
    • 2021
  • Consumers' food consumption behavior is likely to be affected not only by structured data such as consumer panel data but also by unstructured data such as mass media and social media. In this study, a deep learning-based consumption prediction model is built and verified on a fused dataset linking structured and unstructured data related to food consumption. The results show that model accuracy improved when structured and unstructured data were combined, and that the unstructured data improved the model's predictive power. Using the SHAP technique to identify variable importance, variables related to blog and video data were found at the top of the list and had a positive correlation with the amount of paprika purchased. In addition, the experimental results confirmed that the machine learning model showed higher accuracy than the deep learning model and could be an efficient alternative to existing time series analysis modeling.
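
Below is a minimal sketch of ranking variable importance with SHAP over a tree-based purchase-amount model; the feature columns and the synthetic data are hypothetical stand-ins for the structured (panel) and unstructured (blog/video) variables used in the paper.

```python
# SHAP-based feature importance for a tree model on synthetic stand-in data.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "panel_household_size": rng.integers(1, 5, 200),   # structured (panel) feature
    "blog_mention_count": rng.integers(0, 50, 200),    # unstructured (blog) feature
    "video_view_count": rng.integers(0, 1000, 200),    # unstructured (video) feature
})
# Synthetic purchase amount with some dependence on the unstructured features
y = 0.5 * X["blog_mention_count"] + 0.01 * X["video_view_count"] + rng.normal(0, 1, 200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Mean absolute SHAP value per feature approximates its overall importance
importance = np.abs(shap_values).mean(axis=0)
print(dict(zip(X.columns, importance)))
```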

A Study on the Design of Metadata Elements in Textbooks (교과서 메타데이터 요소 설계에 관한 연구)

  • Euikyung Oh
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.401-408
    • /
    • 2023
  • The purpose of this study is to design textbook metadata as a basic task for building a textbook database. To this end, reading textbooks were defined as a category of textbooks, and a metadata development methodology was established through a review of previous research. To ensure that bibliographically essential elements are not omitted, the catalog description elements of institutions that collect, accumulate, and service textbooks, such as the National Library of Korea, were investigated. The elements of Dublin Core, MODS, and KEM were mapped to derive elements suitable for describing textbooks. Finally, a set of textbook metadata elements consisting of 14 elements in three categories (bibliography, context, and textbook characteristics) was presented by adding publication type, genre, and curriculum period elements. The 14 elements are title, author, publication, format, identifier, language, location, subject name, annotation, genre, table of contents, subject, curriculum period, and curriculum information. This study contributes to the field by discussing how to organize textbook resources as national knowledge resources; future studies are proposed to evaluate usability by applying the metadata elements to actual textbooks and to revise and supplement them according to the evaluation results.
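
Below is a minimal sketch of a flat textbook metadata record using the 14 elements listed above; the values are hypothetical, and the assignment of elements to the three categories is not specified in the abstract, so no grouping is imposed.

```python
# Flat record covering the 14 textbook metadata elements (hypothetical values).
textbook_record = {
    "title": "Example Reading Textbook",
    "author": "Example Author",
    "publication": "Example Publisher, 2023",
    "format": "print",
    "identifier": "ISBN 000-0-00-000000-0",           # placeholder identifier
    "language": "ko",
    "location": "Example Library",
    "subject_name": "Korean Language",
    "annotation": "Hypothetical description of the textbook.",
    "genre": "reading textbook",
    "table_of_contents": ["Unit 1", "Unit 2"],
    "subject": "Korean",
    "curriculum_period": "2015 revised curriculum",   # illustrative value
    "curriculum_information": "Middle school, grade 1",
}
print(len(textbook_record), "elements")
```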

A Method for Same Author Name Disambiguation in Domestic Academic Papers (국내 학술논문의 동명이인 저자명 식별을 위한 방법)

  • Shin, Daye;Yang, Kiduk
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.28 no.4
    • /
    • pp.301-319
    • /
    • 2017
  • Author name disambiguation involves identifying an author with different names, or different authors with the same name. It is important for correctly assessing authors' research achievements and finding experts in given areas, as well as for the effective operation of scholarly information services such as citation indexes. In this study, we performed error correction and normalization of the data and applied rule-based author name disambiguation, comparing it with a baseline machine learning disambiguation to see whether human intervention could improve machine learning performance. The improvement of over 0.1 in F-measure by the corrected and normalized email-based author name disambiguation over machine learning demonstrates the potential of human pattern identification and inference, which enabled the data correction and normalization process as well as the formation of the rule-based disambiguation, to complement machine learning's weaknesses and improve author name disambiguation results.
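
Below is a minimal sketch of the email-based rule idea described above: records sharing a normalized author name are split into separate identities when their normalized e-mail addresses differ. The records and field names are hypothetical.

```python
# Rule-based grouping of author records by (normalized name, normalized email).
from collections import defaultdict

def normalize(text):
    """Lowercase and strip whitespace so trivial variants compare equal."""
    return text.strip().lower().replace(" ", "")

def disambiguate(records):
    """records: list of dicts with 'name' and 'email' keys; returns identity groups."""
    identities = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["email"]))
        identities[key].append(rec)
    return identities

papers = [
    {"name": "Kim Minsu", "email": "minsu.kim@univ-a.ac.kr", "paper": "P1"},
    {"name": "Kim Minsu", "email": "mskim@univ-b.ac.kr",     "paper": "P2"},
    {"name": "Kim Minsu", "email": "minsu.kim@univ-a.ac.kr", "paper": "P3"},
]
for (name, email), recs in disambiguate(papers).items():
    print(name, email, [r["paper"] for r in recs])
# Two distinct identities are recovered for the same author name.
```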