• Title/Abstract/Keywords: Language-Based Retrieval Model

Search results: 73 items (processing time: 0.026 s)

Towards a small language model powered chain-of-reasoning for open-domain question answering

  • Jihyeon Roh;Minho Kim;Kyoungman Bae
    • ETRI Journal / Vol. 46, No. 1 / pp. 11-21 / 2024
  • We focus on open-domain question-answering tasks that involve a chain-of-reasoning, which are primarily implemented using large language models. With an emphasis on cost-effectiveness, we designed EffiChainQA, an architecture centered on the use of small language models. We employed a retrieval-based language model to address the limitations of large language models, such as the hallucination issue and the lack of updated knowledge. To enhance reasoning capabilities, we introduced a question decomposer that leverages a generative language model and serves as a key component in the chain-of-reasoning process. To generate training data for our question decomposer, we leveraged ChatGPT, which is known for its data augmentation ability. Comprehensive experiments were conducted using the HotpotQA dataset. Our method outperformed several established approaches, including the Chain-of-Thoughts approach, which is based on large language models. Moreover, our results are on par with those of state-of-the-art Retrieve-then-Read methods that utilize large language models.

Information Retrieval System: Condor

  • 박순철;안동언
    • 한국산업정보학회논문지 / Vol. 8, No. 4 / pp. 31-37 / 2003
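  • This study examines Condor, a large-scale information retrieval system that supports multilingual queries. The system was developed by a consortium of Jeonbuk National University, Searchline Co., Ltd. (서치라인), and Carnegie Mellon University. Its query processing is based on a probabilistic model, and it offers the document clustering function found in recent information retrieval systems. Its distinguishing features are that it handles multilingual queries and that it clusters and summarizes documents online around the query. The system has already been tested on 30 million Korean web pages and has been shown to be stable.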

XML-based Retrieval System for SCORM-based Virtual Learning Contents

  • 최병욱;송미숙;조정원
    • 컴퓨터교육학회논문지 / Vol. 6, No. 1 / pp. 9-17 / 2003
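  • XML (eXtensible Markup Language), the next-generation Internet standard language, separates data, presentation, and structure, which makes content easy to reuse and restructure in any environment. This paper implements an efficient, user-oriented retrieval system for XML documents, restricting the scope to the multimedia content of a virtual education system. The system designs content on the basis of SCO (Sharable Content Object)-level metadata defined in SCORM (Sharable Content Object Reference Model), which has been proposed as a virtual education standard, and indexes each document by keyword, element, and attribute. In addition, the user interface lays out the element search screen structurally, so users can search without prior knowledge of the DTD (Document Type Definition). It also offers two result views, an XML document restructured with XML-QL and an HTML view produced with XSL (eXtensible Stylesheet Language), which widens the user's choices.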
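
The three index granularities named in this abstract (keyword, element, attribute) can be illustrated with a minimal sketch. This is not the authors' implementation: the sample record, its element and attribute names, and the function `build_indexes` are all hypothetical and are not taken from SCORM or from the paper.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical SCO-style metadata record. The element and attribute names
# are illustrative only; they do not come from the SCORM specification.
SAMPLE = """
<sco id="sco-001">
  <title lang="en">Introduction to XML</title>
  <description>Basic XML syntax and document structure</description>
</sco>
"""


def build_indexes(doc_id, xml_text):
    """Build three inverted indexes for one XML document.

    keyword index:   word                -> set of document ids
    element index:   tag name            -> set of document ids
    attribute index: (tag, name, value)  -> set of document ids
    """
    keyword_idx = defaultdict(set)
    element_idx = defaultdict(set)
    attribute_idx = defaultdict(set)

    root = ET.fromstring(xml_text.strip())
    for elem in root.iter():
        element_idx[elem.tag].add(doc_id)
        for name, value in elem.attrib.items():
            attribute_idx[(elem.tag, name, value)].add(doc_id)
        # Index the lower-cased words of the element's own text content.
        for word in re.findall(r"\w+", (elem.text or "").lower()):
            keyword_idx[word].add(doc_id)
    return keyword_idx, element_idx, attribute_idx


keywords, elements, attributes = build_indexes("doc-1", SAMPLE)
print(sorted(elements))       # element names seen in the document
print(keywords["xml"])        # documents containing the word "xml"
```

Keeping the element index separate from the keyword index is what would let a search screen offer element names for selection without the user knowing the DTD.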

Retrieval methodology for similar NPP LCO cases based on domain specific NLP

  • No Kyu Seong;Jae Hee Lee;Jong Beom Lee;Poong Hyun Seong
    • Nuclear Engineering and Technology / Vol. 55, No. 2 / pp. 421-431 / 2023
  • Nuclear power plants (NPPs) have technical specifications (Tech Specs) to ensure that the equipment and key operating parameters necessary for safe operation are maintained within the limiting conditions for operation (LCO) determined by a safety analysis. The LCO of the Tech Specs, which identify the lowest functional capability of equipment required for safe operation of a facility, must be complied with for the safe operation of an NPP. Previous studies have used rule-based expert systems to aid LCO compliance; however, expert systems are clearly limited in their ability to implement rules for the many situations related to LCO. Therefore, in this study, we present a retrieval methodology for similar LCO cases to help determine whether an LCO is met. To reflect NPP-specific features in natural language processing, a domain dictionary was built, and the optimal term frequency-inverse document frequency variant was selected. Retrieval performance was improved by adding a Boolean retrieval model based on LCO-related terms to the vector space model. The developed domain dictionary and retrieval methodology are expected to be useful in determining whether an LCO is met.
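
As a rough illustration of combining a vector space model with a Boolean model, the sketch below ranks documents by TF-IDF cosine similarity after filtering them on required terms. Several points are assumptions rather than details from the paper: the smoothed IDF formula stands in for the unspecified "optimal" variant, the Boolean model is applied as a simple AND filter, whitespace tokenization replaces the domain dictionary, and the sample case texts are invented.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization stands in for the domain dictionary.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Assumption: smoothed IDF; the paper's selected variant is not stated.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]
    return vecs, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, required_terms):
    """Rank docs by cosine similarity, keeping only docs that contain
    every required term (Boolean AND filter)."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    required = {t.lower() for t in required_terms}
    hits = [(cosine(q, vec), i) for i, vec in enumerate(vecs)
            if required.issubset(vec)]
    return sorted(hits, reverse=True)


# Invented example cases, for illustration only.
cases = [
    "pump a inoperable lco not met",
    "pump b operable lco met",
    "valve leak reported",
]
for score, idx in retrieve("pump inoperable", cases, ["lco"]):
    print(round(score, 3), cases[idx])
```

Here the Boolean stage removes the third case before ranking because it lacks the required term, and the vector space stage orders the remaining cases by similarity to the query.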

Design and Implementation of the Video Query Processing Engine for Content-Based Query Processing

  • 조은희;김용걸;이훈순;정영은;진성일
    • 한국정보처리학회논문지 / Vol. 6, No. 3 / pp. 603-614 / 1999
  • As multimedia application services on high-speed information networks have developed rapidly, the need is growing for a video information management system that gives users an efficient way to retrieve video data. In this paper, we propose a video data model that integrates free annotations, image features, and spatial-temporal features for the purpose of improving content-based retrieval of video data. The proposed model can act as a generic video data model for multimedia applications and supports free annotations, image features, spatial-temporal features, and structure information of video data within the same framework. We also propose a video query language for efficiently specifying queries that access video clips in the video data. It can formalize various kinds of queries based on the video contents. Finally, we design and implement a query processing engine for efficient video data retrieval based on the proposed metadata model and the proposed video query language.

Object-Based Modeling and Language for an Object-Oriented Spatio-Temporal Database System

  • 김양희
    • 컴퓨터교육학회논문지 / Vol. 10, No. 2 / pp. 101-113 / 2007
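  • This paper introduces the data modeling and query language of an object-oriented spatio-temporal database system using object-oriented techniques. To handle spatio-temporal objects and spatio-temporal operators, we propose a two-level object-oriented data model consisting of a spatio-temporal object model and a spatio-temporal internal description model. We also propose STOQL, an object-oriented spatio-temporal query language. STOQL provides integrated functionality for varied output of spatial objects and for retrieval of spatio-temporal and non-spatial objects.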

Range Detection of Wa/Kwa Parallel Noun Phrase by Alignment method

  • 최용석;신지애;최기선;김기태;이상태
    • 한국감성과학회:학술대회논문집 / 한국감성과학회 2008 Fall Conference / pp. 90-93 / 2008
  • In natural language, repeated constituents in an expression are commonly left out, and the omitted constituents must be recovered when analyzing the meaning of the sentence. This paper addresses the recognition of the boundaries of parallel noun phrases by identifying the omitted constituents. Recognizing parallel noun phrases can greatly reduce complexity in the sentence parsing phase. Moreover, in natural language information retrieval, recognizing nouns together with their modifiers can play an important role in building indexes. We propose an unsupervised probabilistic model that identifies parallel cores as well as the boundaries of parallel noun phrases conjoined by a conjunctive particle. It is based on the idea of swapping constituents, utilizing symmetry (two or more identical constituents are repeated) and reversibility (the order of constituents is changeable) in parallel structure. Semantic features of the modifiers around the parallel noun phrase are also used in the probabilistic swapping model. The model is language-independent and is presented in this paper for parallel noun phrases in Korean. Experiments show that our probabilistic model outperforms a symmetry-based model and supervised machine-learning-based approaches.

Recent Development in Text-based Medical Image Retrieval

  • 황경훈;이해준;고건;김석균;선용한;최덕주
    • 대한의용생체공학회:의공학회지 / Vol. 36, No. 3 / pp. 55-60 / 2015
  • An effective image retrieval system is required as the amount of medical imaging data continues to increase. The authors review recent developments in text-based medical image retrieval, including the use of controlled vocabularies such as RadLex (Radiology Lexicon) and FMA (Foundational Model of Anatomy), natural language processing, semantic ontology, and image annotation and markup.

A Multimedia Query Language for Object-Oriented Multimedia Databases

  • 노윤묵;이석호;김규철
    • 전자공학회논문지B / Vol. 32B, No. 5 / pp. 671-682 / 1995
  • In this paper, we propose a multimedia query language, MQL, which defines and manipulates multimedia data as an integration of monomedia data in time and space. MQL is designed for a multimedia data model called the object-relationship model and is based on the multimedia object calculus, which formally describes operations on multimedia data. An SQL-like syntax is defined for class definition and for object manipulation such as retrieval, insert, update, and delete. We show how MQL can represent user queries using composite temporal-spatial class structures and various relationships, such as equivalence and sequence.

Fuzzy Theory based Electronic Commerce Navigation Agent that can Query by Natural Language

  • 김명순;정환묵
    • 한국지능시스템학회:학술대회논문집 / 한국퍼지및지능시스템학회 2001 Spring Conference Proceedings / pp. 270-273 / 2001
  • In this paper, we propose an intelligent navigation agent model for successive electronic commerce management. To give the agent intelligence, we use fuzzy theory, which is useful when keywords carry vague conditions that the system must process. Using this theory, we propose a model that processes vague keywords effectively. Through this model, we verified that it yields more appropriate navigation results than crisp keyword conditions do.
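
The contrast between a vague keyword and a crisp condition can be sketched with a fuzzy membership function. The abstract does not give the agent's actual membership functions or rules, so the linear "cheap" function, its price thresholds, and the function names below are assumptions made for illustration only.

```python
def cheap_membership(price, full=10_000, zero=50_000):
    """Degree in [0, 1] to which a price counts as 'cheap'.

    Assumed shape: fully cheap at or below `full`, not cheap at or above
    `zero`, and linearly decreasing in between.
    """
    if price <= full:
        return 1.0
    if price >= zero:
        return 0.0
    return (zero - price) / (zero - full)


def rank_by_vague_keyword(items, membership, threshold=0.0):
    """Score (name, price) pairs by membership degree and return the
    matches above `threshold`, best first."""
    scored = [(membership(price), name) for name, price in items]
    return sorted((s for s in scored if s[0] > threshold), reverse=True)


# Invented catalogue entries, for illustration only.
catalogue = [("radio", 5_000), ("camera", 30_000), ("laptop", 60_000)]
for degree, name in rank_by_vague_keyword(catalogue, cheap_membership):
    print(name, degree)
```

A crisp condition such as `price <= 10_000` would return only the first item, whereas the graded membership also returns the mid-priced item with a lower degree.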
