• Title/Summary/Keyword: retrieval effectiveness


Using Different Properties of Weighting Schemes for High Retrieval Effectiveness (높은 검색 효과를 위한 다른 특성을 갖는 가중치 기법의 이용)

  • 이준호
    • Proceedings of the Korean Society for Information Management Conference / 1995.08a / pp.33-36 / 1995
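  • It is known that different representations of queries or documents, and different retrieval techniques, retrieve different sets of documents. Recent work has exploited this property and shown that combining multiple representations or retrieval techniques yields higher retrieval effectiveness. This paper describes how higher retrieval effectiveness can also be obtained by combining weighting schemes with different properties under a single representation of queries and documents and a single retrieval technique. We classify the types of documents and describe the properties of the weighting schemes, and on that basis explain that weighting schemes with different properties retrieve different types of documents. Experiments then confirm that combining weighting schemes with different properties yields higher retrieval effectiveness.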
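The idea of combining weighting schemes under one retrieval technique can be illustrated with a minimal sketch. The abstract does not name the schemes or the combination rule, so everything here is an assumption: `score_tf` (raw term frequency), `score_cosine` (length-normalized term frequency), min-max normalization, and fusion by summing the normalized scores are all chosen only for illustration.

```python
import math
from collections import Counter


def score_tf(query, doc):
    """Scheme A (assumed): raw term-frequency overlap, which tends to favor long documents."""
    counts = Counter(doc)
    return float(sum(counts[t] for t in query))


def score_cosine(query, doc):
    """Scheme B (assumed): term frequency divided by the document vector length."""
    counts = Counter(doc)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return 0.0
    return sum(counts[t] for t in query) / norm


def min_max(scores):
    """Rescale scores to [0, 1] so the two schemes are comparable before fusion."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combined_ranking(query, docs):
    """Rank document indices by the sum of the normalized scores of both schemes."""
    a = min_max([score_tf(query, d) for d in docs])
    b = min_max([score_cosine(query, d) for d in docs])
    fused = [x + y for x, y in zip(a, b)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
```

Summing normalized scores is only one possible fusion rule; the paper's own combination method may differ.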


A Comparative Study on the Effectiveness of Hangul Natural Language Retrieval Using KT Test Set (KT Test Set을 이용한 우리말 자연언어검색의 효율성에 관한 비교연구)

  • 이현아;김성혁
    • Proceedings of the Korean Society for Information Management Conference / 1995.08a / pp.37-40 / 1995
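  • This study evaluates the retrieval effectiveness of search-term expansion as a way to overcome the loss of recall caused by the specificity of index terms and search terms in natural language retrieval systems. Using a Korean-language database, it compares three kinds of queries in terms of retrieval effectiveness: the original query written in natural language by a subject expert (Q1); queries whose search terms were expanded using the similarity between the search terms of the original query and the index terms in the database (Q2(0.2), Q2(0.3)); and a query whose search terms were expanded in natural language by a subject-expert user taking the semantic relations of Q1 into account (Q3). In the experiments, average recall was highest for Q2(0.2), followed by Q2(0.3), Q3, and Q1. Average precision was highest for Q3, followed by Q2(0.3), Q1, and Q2(0.2).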
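Threshold-based expansion of the Q2 kind can be sketched as follows. The abstract does not say how term similarity is measured, so the Jaccard overlap of posting sets used in `term_similarity`, the `postings` dictionary, and the function names are all assumptions made for illustration; only the notion of a similarity threshold such as 0.2 or 0.3 comes from the abstract.

```python
def term_similarity(term_a, term_b, postings):
    """Jaccard similarity of the sets of documents containing each term (assumed measure)."""
    docs_a = postings.get(term_a, set())
    docs_b = postings.get(term_b, set())
    union = docs_a | docs_b
    if not union:
        return 0.0
    return len(docs_a & docs_b) / len(union)


def expand_query(query_terms, postings, threshold):
    """Add every index term whose similarity to some original query term meets the threshold."""
    expanded = list(query_terms)
    for index_term in sorted(postings):
        if index_term in expanded:
            continue
        if any(term_similarity(q, index_term, postings) >= threshold
               for q in query_terms):
            expanded.append(index_term)
    return expanded
```

A lower threshold admits more terms, which is consistent with the reported pattern of Q2(0.2) having the highest average recall and the lowest average precision.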


Towards a small language model powered chain-of-reasoning for open-domain question answering

  • Jihyeon Roh;Minho Kim;Kyoungman Bae
    • ETRI Journal / v.46 no.1 / pp.11-21 / 2024
  • We focus on open-domain question-answering tasks that involve a chain-of-reasoning, which are primarily implemented using large language models. With an emphasis on cost-effectiveness, we designed EffiChainQA, an architecture centered on the use of small language models. We employed a retrieval-based language model to address the limitations of large language models, such as the hallucination issue and the lack of updated knowledge. To enhance reasoning capabilities, we introduced a question decomposer that leverages a generative language model and serves as a key component in the chain-of-reasoning process. To generate training data for our question decomposer, we leveraged ChatGPT, which is known for its data augmentation ability. Comprehensive experiments were conducted using the HotpotQA dataset. Our method outperformed several established approaches, including the Chain-of-Thoughts approach, which is based on large language models. Moreover, our results are on par with those of state-of-the-art Retrieve-then-Read methods that utilize large language models.

KOREAN TOPIC MODELING USING MATRIX DECOMPOSITION

  • June-Ho Lee;Hyun-Min Kim
    • East Asian mathematical journal / v.40 no.3 / pp.307-318 / 2024
  • This paper explores the application of matrix factorization, specifically CUR decomposition, in the clustering of Korean language documents by topic. It addresses the unique challenges of Natural Language Processing (NLP) in dealing with the Korean language's distinctive features, such as agglutinative words and morphological ambiguity. The study compares the effectiveness of Latent Semantic Analysis (LSA) using CUR decomposition with the classical Singular Value Decomposition (SVD) method in the context of Korean text. Experiments are conducted using Korean Wikipedia documents and newspaper data, providing insight into the accuracy and efficiency of these techniques. The findings demonstrate the potential of CUR decomposition to improve the accuracy of document clustering in Korean, offering a valuable approach to text mining and information retrieval in agglutinative languages.

Design and Implementation of an Interactive Streaming Platform for Supporting Instant Retrieval of Product Information in Product Placement Advertisement (간접광고에서 제품 정보의 즉각적 검색을 지원하는 인터렉티브 동영상 플랫폼 설계 및 구현)

  • Im, Hyeon-Jin;Cho, Dae-Soo
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.5 / pp.931-938 / 2020
  • Recently, with the expanding use of cross media, the public no longer just watches a broadcast but also consumes various information about the actors, stories, products, etc. that appear in it. However, the devices used for viewing and for searching are different, which is inconvenient, and because of the time gap before the desired information is obtained through a search, viewers have difficulty obtaining detailed information about a target product after encountering a product placement advertisement. In addition, it is difficult for advertisers to confirm the effect of product placement advertising through the reactions of viewers who have encountered it. In this paper, we propose an interactive streaming platform that supports instant retrieval of product information by including product placement advertisement information in the broadcast. With this platform, viewers can quickly receive detailed information about products on the screen by triggering an event when a product of interest appears while watching the broadcast, and advertisers can check the effectiveness of product placement advertisements by receiving interactive responses from viewers.

An Efficient Inverted Index Technique based on RDBMS for XML Documents (XML 문서에 대한 RDBMS에 기반을 둔 효율적인 역색인 기법)

  • 서치영;이상원;김형주
    • Journal of KIISE:Databases / v.30 no.1 / pp.27-40 / 2003
  • The inverted index widely used in information retrieval must be extended for XML documents so that XML information retrieval systems can support containment queries. Previous work suggested two methods for storing the inverted index and processing containment queries for XML documents: using an RDBMS or using an inverted list engine. The extension of the inverted index in that previous work has two drawbacks. One is that using an RDBMS performs much worse than using an inverted list engine. The other is that when containment queries are processed in an RDBMS, the number of join operations increases with the path length of the query, and each join always takes place between large tables. In this paper, we extend the inverted index in a different way to solve these problems and show the effectiveness of using an RDBMS.
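A containment query over an inverted index stored in relational tables might look like the sketch below. This is not the paper's extended index: the `elements` and `words` tables, their columns, and the region-based encoding (an element contains a word when the word position lies strictly between the element's start and stop positions) are assumptions used only to show why such queries turn into joins. SQLite stands in for the RDBMS.

```python
import sqlite3


def find_containing_elements(element_rows, word_rows, tag, word):
    """Return (doc, start, stop) for each element named `tag` that contains `word`."""
    conn = sqlite3.connect(":memory:")
    # One row per XML element: its document, tag name, and position interval.
    conn.execute(
        "CREATE TABLE elements (doc INTEGER, tag TEXT, start INTEGER, stop INTEGER)"
    )
    # One row per word occurrence: its document and position.
    conn.execute("CREATE TABLE words (doc INTEGER, word TEXT, pos INTEGER)")
    conn.executemany("INSERT INTO elements VALUES (?, ?, ?, ?)", element_rows)
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)", word_rows)
    result = conn.execute(
        """
        SELECT DISTINCT e.doc, e.start, e.stop
        FROM elements AS e
        JOIN words AS w
          ON w.doc = e.doc AND w.pos > e.start AND w.pos < e.stop
        WHERE e.tag = ? AND w.word = ?
        ORDER BY e.doc, e.start
        """,
        (tag, word),
    ).fetchall()
    conn.close()
    return result
```

With this encoding, each additional step in a query path adds another join of the same kind, which matches the drawback the abstract describes.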

Image Retrieval using Local Color Histogram and Shape Feature (지역별 색상 분포 히스토그램과 모양 특징을 이용한 영상 검색)

  • 정길선;김성만;이양원
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 1999.05a / pp.50-54 / 1999
  • This paper proposes an image retrieval system using color and shape features. The color feature consists of the four largest values among the maxima extracted from local color distribution histograms. Preprocessing for the shape feature consists of edge extraction, extraction of the weighted central point, and angular sampling. The shape feature uses the sum of the distances from the weighted central point to the contour, together with their variation and max/min. Similarity is estimated by comparing the features of the query image with those of the images in the database, and candidate images are retrieved in order of similarity. We evaluate the effectiveness of the shape and color features in an experiment on two hundred closed images. On average, recall and precision are 0.72 and 0.53, respectively. The proposed method is therefore shown to be useful.
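For reference, recall and precision for a single query are conventionally computed as in the sketch below. The function name and the use of item identifiers are illustrative assumptions; the abstract reports only the averaged values and does not describe its evaluation procedure in detail.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given retrieved and relevant item ids."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of retrieved items that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant items that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

Averaging these per-query values over all test queries gives figures of the kind reported above.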


Searching Human Motion Data by Sketching 3D Trajectories (3차원 이동 궤적 묘사를 통한 인간 동작 데이터 검색)

  • Lee, Kang Hoon
    • Journal of the Korea Computer Graphics Society / v.19 no.2 / pp.1-8 / 2013
  • Captured human motion data has been widely utilized for understanding the mechanism of human motion and synthesizing the animation of virtual characters. Searching for desired motions in given motion data is an important prerequisite for analyzing and editing those selected motions. This paper presents a new method of content-based motion retrieval that does not need additional metadata such as keywords. While existing search methods have focused on skeletal configurations of body pose or planar trajectories of locomotion, our method receives a three-dimensional trajectory as its input query and retrieves a set of motion intervals in which the trajectories of body parts such as hands, feet, and pelvis are similar to the input trajectory. In order to allow the user to intuitively sketch spatial trajectories, we used the Leap Motion controller, which can precisely trace finger movements, as the input device for our experiments. We evaluated the effectiveness of our approach by conducting a user study in which the users searched for dozens of pre-selected motions in basketball motion data including a variety of moves such as dribbling and shooting.

A Development of Educational Program for Evaluating the Efficiency of Warehouse System (창고 시스템의 효율성 평가를 위한 교육용 프로그램 개발)

  • Kim, Moon-Ki;Kim, Hee-Sung
    • The Journal of Korean Institute for Practical Engineering Education / v.4 no.1 / pp.80-85 / 2012
  • The importance of the warehouse is increasing as its role in modern industry changes from storage to circulation, facilitating purchasing, production, storage, and distribution activities through the introduction of information systems. In this study, a program is developed in C# for evaluating the efficiency of an automated storage and retrieval system (AS/RS). A simulation is run for eight operating schemes formed by combining three conditions (the storing method, the shape of the automated warehouse, and the sequence in which commands are performed), and the travel distance of the stacker crane is calculated using the same gateway data. Using this program, the optimal operating scheme can be proposed based on the analyzed simulation results. The program demonstrates its effectiveness and applicability through the simulation and can be utilized in courses related to factory facilities.


Keyword Extraction from News Corpus using Modified TF-IDF (TF-IDF의 변형을 이용한 전자뉴스에서의 키워드 추출 기법)

  • Lee, Sung-Jick;Kim, Han-Joon
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.59-73 / 2009
  • Keyword extraction is an essential technique for text mining applications such as information retrieval, text categorization, summarization, and topic detection. A set of keywords extracted from large-scale electronic document data is used as significant features for text mining algorithms and contributes to improving the performance of document browsing, topic detection, and automated text classification. This paper presents a keyword extraction technique that can be used to detect topics for each news domain from a large document collection of internet news portal sites. We use six variants of the traditional TF-IDF weighting model. On top of the TF-IDF model, we propose a word filtering technique called 'cross-domain comparison filtering'. To demonstrate the effectiveness of our method, we analyzed the usefulness of keywords extracted from Korean news articles and presented the changes in the keywords of each news domain over time.
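As a baseline for the weighting model mentioned above, a plain TF-IDF keyword ranking can be sketched as follows. This is the textbook form (relative term frequency times the log of inverse document frequency), not any of the paper's six variants, and it does not implement 'cross-domain comparison filtering', which the abstract does not define. The function name, `top_k` parameter, and tokenized-list input are assumptions.

```python
import math
from collections import Counter


def tf_idf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring terms of each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

Terms that occur in every document receive a score of zero, so words common to all news domains fall to the bottom of each ranking.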
