• Title/Summary/Keyword: 데이터셋 검색 (dataset search)

Search Result 93, Processing Time 0.024 seconds

Performance Comparison and Analysis of Embedding methods based on Clustering Algorithms (클러스터링 알고리즘 기반의 임베딩 기법 성능 비교 및 분석)

  • Park, Jungmin;Park, Heemin;Yang, Seona;Sun, Yuxiang;Lee, Yongju
    • Proceedings of the Korean Society of Broadcast Engineers Conference / fall / pp.164-167 / 2021
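  • Research on advanced search techniques such as knowledge graphs has recently been actively pursued, led by Google, Amazon, LOD, and others. However, how data are embedded and used in deep learning within large-scale knowledge graph indexing systems has received comparatively little study. This paper therefore evaluates the performance of embedding models to analyze which model yields the best knowledge embedding method for a given dataset.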


Study on Knowledge Augmented Prompting for Text to SPARQL (Text to SPARQL을 위한 지식 증강 프롬프팅 연구)

  • Yeonjin Lee;Jeongjae Nam;Wooyoung Kim;Wooju Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.185-189 / 2023
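  • Text to SPARQL is a form of knowledge-graph-based question answering: the task of converting a natural-language question into a knowledge graph search query. Because a SPARQL query must be written based on the information in the knowledge graph, code generation approaches using existing language models do not work well. We therefore propose a methodology that augments the prompt with knowledge graph information in order to solve Text to SPARQL with a large language model. In addition, to examine the effect of using multilingual information, we ran experiments that cross Korean and English labels. We also translated QALD-10, a representative Text to SPARQL benchmark dataset, into Korean and released it for Korean Text to SPARQL experiments. Using these data, we experimentally demonstrated the effectiveness of knowledge-augmented prompting.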
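
The idea of augmenting a prompt with knowledge graph information can be illustrated with a minimal sketch. It is not the authors' prompt format: the function name `build_prompt`, the instruction wording, and the candidate identifier lists are all assumptions made for illustration, and the step that retrieves candidate entities and relations from the graph is left out.

```python
def build_prompt(question, entities, relations):
    """Assemble a Text to SPARQL prompt that lists knowledge graph labels.

    `entities` and `relations` map identifiers to human-readable labels.
    How these candidates are retrieved is outside this sketch (assumption).
    """
    lines = [
        "Translate the question into a SPARQL query.",
        "Use only the identifiers listed below.",
        "",
        "Entities:",
    ]
    lines += [f"- {ident}: {label}" for ident, label in entities.items()]
    lines.append("Relations:")
    lines += [f"- {ident}: {label}" for ident, label in relations.items()]
    lines += ["", f"Question: {question}", "SPARQL:"]
    return "\n".join(lines)


# Illustrative inputs only; the labels could equally be Korean or English.
prompt = build_prompt(
    "What is the capital of Germany?",
    entities={"wd:Q183": "Germany"},
    relations={"wdt:P36": "capital"},
)
print(prompt)
```

Swapping the label values between languages while keeping the identifiers fixed is one simple way to set up the kind of cross-lingual label comparison the abstract mentions.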


Tag Value Measurement Algorithm for Personalized Recommendation (개인화 추천을 위한 태그 가치 측정 알고리즘)

  • Jeong, Kwang-Jae;Park, Gun-Woo;Lee, Sang-Hoon
    • Proceedings of the Korea Information Processing Society Conference / 2010.04a / pp.1078-1081 / 2010
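  • Under the influence of Web 2.0, tagging systems enable highly flexible and effective classification of the content that floods the Internet. Most Web 2.0 sites display tags related to the tags of the retrieved information, providing a service that leads users to other related content. Tags created by content users can be applied to the same content in different ways depending on personal inclination, so tag-based search can show low accuracy. This paper proposes a tag value measurement algorithm that identifies the characteristics of human interaction in tag selection and can recommend tags for the content an individual prefers and needs. We identify the factors that influence decision-making in content selection, apply the dataset of MovieLens, a movie recommendation service, and compare the prediction accuracy of tag recommendation to present improved tag value estimation results.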

Interoperability between NoSQL and RDBMS via Auto-mapping Scheme in Distributed Parallel Processing Environment (분산병렬처리 환경에서 오토매핑 기법을 통한 NoSQL과 RDBMS와의 연동)

  • Kim, Hee Sung;Lee, Bong Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2067-2075 / 2017
  • Big data processing has recently become an important issue. As huge amounts of data are generated, data processing capability is becoming increasingly important. For big data processing, both the Hadoop distributed file system and NoSQL data stores based on unstructured data processing are receiving a lot of attention. However, NoSQL still has problems and is inconvenient to use. For low-volume data, NoSQL MapReduce normally consumes unnecessary processing time and requires considerably more data retrieval time than an RDBMS. To address this problem, this paper proposes an interworking scheme between NoSQL and a conventional RDBMS. The developed auto-mapping scheme chooses an appropriate database (NoSQL or RDBMS) depending on the amount of data, which results in faster search times. Experimental results for a specific data set show that the database interworking scheme reduces data search time by up to 35%.
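
The core decision described above, picking a backend from the amount of data, can be sketched as a simple threshold rule. This is only an illustration under stated assumptions: the abstract does not give the actual criterion, so the function name `choose_backend` and the cut-over value `ROW_THRESHOLD` are hypothetical.

```python
# Hypothetical cut-over point; the paper's actual criterion is not given
# in the abstract and would need to be tuned per deployment.
ROW_THRESHOLD = 100_000


def choose_backend(estimated_rows: int, threshold: int = ROW_THRESHOLD) -> str:
    """Route small workloads to the RDBMS and large ones to NoSQL."""
    if estimated_rows < 0:
        raise ValueError("estimated_rows must be non-negative")
    return "nosql" if estimated_rows >= threshold else "rdbms"


for rows in (500, 100_000, 25_000_000):
    print(rows, "->", choose_backend(rows))
```

The rationale follows the abstract: for low-volume data the fixed overhead of MapReduce dominates, so the relational store is the faster choice below the threshold.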

Implementation of Ontology-based Service by Exploiting Massive Crime Investigation Records: Focusing on Intrusion Theft (대규모 범죄 수사기록을 활용한 온톨로지 기반 서비스 구현 - 침입 절도 범죄 분야를 중심으로 -)

  • Ko, Gun-Woo;Kim, Seon-Wu;Park, Sung-Jin;No, Yoon-Joo;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science / v.53 no.1 / pp.57-81 / 2019
  • An ontology is a dictionary with a complex structure that defines the terms related to specific knowledge in a particular field and the relationships between them. There have been various attempts to construct ontologies in Korea and abroad, but no case in which large-scale crime investigation records were built into an ontology and a service was implemented on top of it. This paper therefore describes the process of constructing an ontology from information extracted from unstructured crime investigation documents in the intrusion theft field, and of implementing an ontology-based search service and a crime spot recommendation service. To assess the performance of the search service, we measured Top-K accuracy, one of the accuracy measures for event search, and obtained a maximum accuracy of 93.52% on the experimental data set. In addition, we obtained a suitable clue field combination for the entire experimental data set, and we can calibrate the field location information in the database with an F1-measure of 76.19%.
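
For readers unfamiliar with the metric, Top-K accuracy is the fraction of queries whose correct item appears among the first K returned results. The sketch below uses the common single-target definition and made-up identifiers; the paper's exact evaluation protocol is not stated in the abstract.

```python
def top_k_accuracy(rankings, targets, k):
    """Fraction of queries whose target appears in the top-k results.

    `rankings[i]` is the ranked result list for query i and `targets[i]`
    is the single correct item for that query.
    """
    if len(rankings) != len(targets) or not targets:
        raise ValueError("rankings and targets must be non-empty and aligned")
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return hits / len(targets)


# Made-up case identifiers for three queries.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "f", "z"]
print(top_k_accuracy(rankings, targets, k=2))
print(top_k_accuracy(rankings, targets, k=3))
```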

Implementation of Responsive Web Application for Location-based Semantic Search (위치기반 시맨틱 검색을 위한 반응형 웹 애플리케이션 구현)

  • Lee, Suhyoung;Lee, Yongju
    • The Journal of Korean Institute of Information Technology / v.17 no.5 / pp.1-12 / 2019
  • Unlike existing Open APIs, Linked Data form a huge knowledge base that supports high-level SPARQL queries, and new content can be created efficiently by mashing up information from various datasets. This paper implements a responsive web application for location-based semantic search. We mash up DBpedia, a kind of Linked Data, with the GoogleMap API provided by Google, and provide a semantic browser function for viewing detailed information about retrieved objects. By applying responsive web design, our system can be used in various access environments such as PC and mobile. We compare the functional specifications of the implemented system with those of existing systems with similar functions. The comparison shows the superiority of our system in various aspects, such as the use of semantics, a link-based browser, and the mashup function.

A music similarity function based on probabilistic linear discriminant analysis for cover song identification (커버곡 검색을 위한 확률적 선형 판별 분석 기반 음악 유사도)

  • Seo, Jin Soo;Kim, Junghyun;Kim, Hyemi
    • The Journal of the Acoustical Society of Korea / v.41 no.6 / pp.662-667 / 2022
  • Computing music similarity is an indispensable component of a music search service. This paper focuses on learning a music similarity function in order to boost cover song identification performance. Using probabilistic linear discriminant analysis, we construct a latent music space in which the distances between cover song pairs decrease while the distances between non-cover song pairs increase. We derive a music similarity function by testing the hypothesis that two songs share the same latent variable, using probabilistic models that assume the observed music features are generated from the learned latent music space. Experiments on two cover music datasets show that the proposed music similarity improves cover song identification performance.

Audio fingerprint matching based on a power weight (파워 가중치를 이용한 오디오 핑거프린트 정합)

  • Seo, Jin Soo;Kim, Junghyun;Kim, Hyemi
    • The Journal of the Acoustical Society of Korea / v.38 no.6 / pp.716-723 / 2019
  • Fingerprint matching accuracy is essential in deploying a music search service. This paper presents a method that improves fingerprint matching accuracy by using auxiliary information called the power weight. The power weight is the expected robustness of each hash bit. While the previous power mask binarizes the expected robustness into strong and weak bits, the proposed method uses a real-valued function of the expected robustness as weights for fingerprint matching. To counter the increased storage cost, we propose a compression method for the power weight, which has strong temporal correlation. Experiments on publicly available music datasets confirmed that the proposed power weight is effective in improving fingerprint matching performance.
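
The difference between a binary power mask and real-valued weights can be sketched as a weighted Hamming distance over hash bits. This is an illustration, not the paper's formula: the normalization, the function name `weighted_hamming`, and the toy weight values are assumptions, and the abstract does not specify how the weights are derived from the expected robustness.

```python
def weighted_hamming(query_bits, ref_bits, weights):
    """Normalized weighted Hamming distance between two bit sequences.

    Each mismatching bit contributes its weight, so bits expected to be
    robust count more than fragile ones. With all weights equal this is
    the plain bit error rate.
    """
    if not (len(query_bits) == len(ref_bits) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mismatch = sum(w for q, r, w in zip(query_bits, ref_bits, weights) if q != r)
    return mismatch / total


query = [1, 0, 1, 1]
reference = [1, 1, 1, 0]
weights = [0.9, 0.1, 0.8, 0.2]  # toy robustness values (assumption)

print(weighted_hamming(query, reference, [1, 1, 1, 1]))  # unweighted baseline
print(weighted_hamming(query, reference, weights))
```

In this toy case the two mismatches fall on low-weight bits, so the weighted distance is smaller than the unweighted bit error rate, which is the intended effect of trusting robust bits more.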

Query Processing of Spatio-temporal Trajectory for Moving Objects (이동 객체를 위한 시공간 궤적의 질의 처리)

  • Byoungwoo Oh
    • Journal of Platform Technology / v.11 no.1 / pp.52-59 / 2023
  • The importance of spatio-temporal trajectories for contact tracing has increased due to the recent COVID-19 pandemic. Spatio-temporal trajectories store time and spatial data of moving objects. In this paper, I propose query processing for spatio-temporal trajectories of moving objects. The spatio-temporal trajectory model of moving objects has point type spatial data for storing locations and timestamp type temporal data for time. A trajectory query is a query to search for pairs of users who have been in close contact by boarding the same bus. To process the trajectory query, I use the Geolife dataset provided by Microsoft. The proposed trajectory query processing method divides trajectory data by date and checks whether users' trajectories were nearby for each date to generate information about contacts as the result.
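
The processing steps described above (split trajectory points by date, then check per date whether two users were nearby) can be sketched as follows. The distance and time thresholds, the function names, and the sample coordinates are assumptions for illustration; the sketch does not model the same-bus condition and uses a brute-force pairwise comparison rather than any index the paper may use.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6_371_000.0 * asin(sqrt(a))


def find_contacts(points, max_dist_m=50.0, max_gap=timedelta(minutes=1)):
    """Return sorted (date, user_a, user_b) tuples for users who were nearby.

    `points` is an iterable of (user, timestamp, lat, lon). Thresholds are
    illustrative assumptions, not values from the paper.
    """
    by_date = defaultdict(lambda: defaultdict(list))
    for user, ts, lat, lon in points:
        by_date[ts.date()][user].append((ts, lat, lon))  # partition by date

    contacts = set()
    for day, users in by_date.items():
        for a, b in combinations(sorted(users), 2):
            if any(
                abs(t1 - t2) <= max_gap
                and haversine_m(la1, lo1, la2, lo2) <= max_dist_m
                for t1, la1, lo1 in users[a]
                for t2, la2, lo2 in users[b]
            ):
                contacts.add((day, a, b))
    return sorted(contacts)


# Made-up sample points; user names and coordinates are hypothetical.
sample = [
    ("alice", datetime(2008, 10, 23, 8, 0, 0), 39.9840, 116.3180),
    ("bob", datetime(2008, 10, 23, 8, 0, 30), 39.9841, 116.3181),
    ("carol", datetime(2008, 10, 23, 8, 0, 10), 39.9000, 116.4000),
    ("bob", datetime(2008, 10, 24, 9, 0, 0), 39.9000, 116.4000),
]
print(find_contacts(sample))
```

Partitioning by date keeps each pairwise comparison small, since points from different days can never satisfy the time-gap condition.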


Analysis and Recognition of Depressive Emotion through NLP and Machine Learning (자연어처리와 기계학습을 통한 우울 감정 분석과 인식)

  • Kim, Kyuri;Moon, Jihyun;Oh, Uran
    • The Journal of the Convergence on Culture Technology / v.6 no.2 / pp.449-454 / 2020
  • This paper proposes a machine learning-based emotion analysis system that detects a user's depression through their SNS posts. We first made a list of Korean keywords related to depression, then used them to create training data by crawling Twitter: 1,297 positive and 1,032 negative tweets in total. Lastly, to identify the best machine learning model for text-based depression detection, we compared the performance of RNN, LSTM, and GRU. Our experimental results showed that the GRU model achieved an accuracy of 92.2%, which is 2-4% higher than the other models. We expect that the findings of this paper can be used to prevent depression by analyzing users' SNS posts.