• Title/Summary/Keyword: Language-Based Retrieval Model

A Study on the Performance Analysis of Entity Name Recognition Techniques Using Korean Patent Literature

  • Gim, Jangwon
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.139-151 / 2020
  • Entity name recognition is the part of information extraction that extracts entity names from documents and classifies their types. It is widely used in natural language processing applications such as information retrieval, machine translation, and question answering systems. Various deep learning-based models have been proposed to improve entity name recognition performance, but few studies have compared and analyzed these models on Korean data. In this paper, we compare and analyze the performance of CRF, LSTM-CRF, BiLSTM-CRF, and BERT, which are actively used for entity name recognition, on Korean data. We also evaluate whether the embedding models widely used in recent natural language processing tasks affect the performance of the entity name recognition models. Experiments on patent data and a Korean corpus confirmed that BiLSTM-CRF with FastText embeddings showed the highest performance.

A Semantic Web Service for Tourism Information over the Mobile Web (시맨틱 웹에 기초한 모바일 관광정보 서비스)

  • Lee, Yang-Won
    • Journal of the Korean Geographical Society / v.42 no.5 / pp.788-807 / 2007
  • To better publish geographical information on the Web, it is important to understand how Web technologies are changing. Over the past decade, the Semantic Web has been developed by incorporating ontologies into the current Web, with the aim of making computers understand content rather than simply display it. An ontology, an explicit specification of a conceptualization, and the Semantic Web grounded on ontologies have the potential for effective sharing and appropriate retrieval of geographical information. This paper describes a Semantic Web Service over the mobile Web that offers pertinent tourism information according to user context. To do this, a tourism ontology was formalized in the PARA (Place-Attraction-Resource-Activity) ontology model by organizing tourist places, tourist attractions, tourism resources, and activities. Locational relationships between tourist places were also included in the PARA ontology model to take into account the movements of tourists on a railway network. The XML (Extensible Markup Language) Web Service in the middle tier manages the client-side request for information retrieval and the corresponding server-side response from the data provider. The PARA ontology was integrated into the XML Web Service for concept-based discovery of tourism information. The applicability of the proposed system was tested through a simulation experiment for Tokyo tourism.

Structured Information Modeling and Query Method for SMIL Documents (SMIL 문서의 구조 정보 모델 및 검색)

  • 류은숙;이기호;이규철
    • Journal of Korea Multimedia Society / v.7 no.3 / pp.293-307 / 2004
  • SMIL (Synchronized Multimedia Integration Language) documents are represented by logical structure information, spatial layout structure information, temporal synchronization structure information, and hyperlink structure information, according to the structural characteristics of SMIL documents, which are based on XML. This paper proposes an effective modeling and query method for the multi-structure information inherent in SMIL documents. In particular, we present an object-oriented model, using UML class diagrams, that represents the object classes for the structured information of SMIL documents together with their hierarchical structure and relationships. In addition, the object class definitions are specified in compliance with SQL3, the standard database language. We also propose an access method and a query representation for the hierarchical structure in order to retrieve the structural objects of SMIL documents efficiently.

Generative AI service implementation using LLM application architecture: based on RAG model and LangChain framework (LLM 애플리케이션 아키텍처를 활용한 생성형 AI 서비스 구현: RAG모델과 LangChain 프레임워크 기반)

  • Cheonsu Jeong
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.129-164 / 2023
  • As the use and adoption of Large Language Models (LLMs) expands with recent developments in generative AI technology, existing studies offer few actual application cases or implementation methods for using internal company data. Accordingly, this study presents a method of implementing generative AI services with an LLM application architecture built on the LangChain framework, which is the most widely used. To this end, we review various ways to overcome the problem of insufficient information when using LLMs and present specific solutions. We analyze fine-tuning and the direct use of document information, and look in detail at the main steps of information storage and retrieval using the retrieval-augmented generation (RAG) model. In particular, similar-context recommendation and question-answering (QA) systems are used as a way to store and search information in a vector store with the RAG model. In addition, the specific operation method and the major implementation steps and cases, including implementation source code and the user interface, are presented to enhance understanding of generative AI technology. This work is meaningful in enabling LLMs to be actively utilized when implementing services within companies.
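
The store-and-retrieve step described in this abstract can be illustrated with a minimal sketch. It does not use LangChain or any real embedding model: `embed`, `ToyVectorStore`, and `build_prompt` are hypothetical names, and a bag-of-words vector with cosine similarity stands in for learned embeddings. The final call to an LLM is omitted.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps each text next to its vector."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, store, k=2):
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n".join(store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real service the prompt returned by `build_prompt` would be passed to an LLM; the sketch only shows how retrieved internal documents end up in the prompt.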

Semantic Process Retrieval with Similarity Algorithms (유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안)

  • Lee, Hong-Joo;Klein, Mark
    • Asia Pacific Journal of Information Systems / v.18 no.1 / pp.79-96 / 2008
  • One of the roles of Semantic Web services is to execute dynamic intra-organizational services, including the integration and interoperation of business processes. Since different organizations design their processes differently, retrieving similar semantic business processes is necessary to support inter-organizational collaboration. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand the results of an exact matching engine when querying the OWL (Web Ontology Language) version of the MIT Process Handbook. The MIT Process Handbook is an electronic repository of best-practice business processes. The Handbook is intended to help people (1) redesign organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. To use the MIT Process Handbook for process retrieval experiments, we had to export it into an OWL-based format. We model the Process Handbook meta-model in OWL and export the processes in the Handbook as instances of the meta-model. Next, we need a sizable number of queries and their corresponding correct answers in the Process Handbook. Many previous studies devised artificial datasets composed of randomly generated numbers without real meaning and used subjective ratings for correct answers and similarity values between processes. To generate a semantics-preserving test data set, we create 20 variants of each target process that are syntactically different but semantically equivalent, using mutation operators. These variants represent the correct answers for the target process. We devise diverse similarity algorithms based on the values of process attributes and the structures of business processes.
We use simple similarity algorithms for text retrieval, such as TF-IDF and Levenshtein edit distance, to devise our approaches, and use a tree edit distance measure because semantic processes appear to have a graph structure. We also design similarity algorithms that consider the similarity of process structure, such as part processes, goals, and exceptions. Since we can identify relationships between a semantic process and its subcomponents, this information can be used to calculate similarities between processes. Dice's coefficient and Jaccard similarity measures are used to calculate the proportion of overlap between processes in diverse ways. We perform retrieval experiments to compare the performance of the devised similarity algorithms. We measure retrieval performance in terms of precision, recall, and F-measure, the harmonic mean of precision and recall. The tree edit distance shows the poorest performance on all measures. TF-IDF and the method combining the TF-IDF measure with Levenshtein edit distance perform better than the other devised methods. These two measures focus on the similarity between the names and descriptions of processes. In addition, we calculate a rank correlation coefficient, Kendall's tau-b, between the number of process mutations and the ranking of similarity values among the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and derivatives of these measures, show greater coefficients than measures based on the values of process attributes. However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, shows reasonably good performance in both experiments. For retrieving semantic processes, it therefore appears better to consider diverse aspects of process similarity, such as process structure and the values of process attributes.
We generate semantic process data and a dataset for retrieval experiments from the MIT Process Handbook repository. We propose imprecise query algorithms that expand the retrieval results of an exact matching engine such as SPARQL, and compare the retrieval performance of the similarity algorithms. As for limitations and future work, experiments with datasets from other domains are needed. Also, since many similarity values are available from diverse measures, better ways to identify relevant processes may be found by applying these values simultaneously.
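
Several of the measures named in this abstract have standard textbook definitions, sketched below for reference. This is not the authors' implementation: how they combine the measures (for example in Lev-TFIDF-JaccardAll) is not given here, and treating a process as a plain set of subcomponent names is an assumption made only for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice's coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def levenshtein(s, t):
    """Edit distance counting single-character insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # delete cs
                            curr[j - 1] + 1,             # insert ct
                            prev[j - 1] + (cs != ct)))   # substitute or match
        prev = curr
    return prev[-1]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, two processes whose part-process sets are `{"a", "b", "c"}` and `{"b", "c", "d"}` share two of four distinct parts, so `jaccard` gives 0.5 while `dice` gives 2/3.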

A Model of Natural Language Information Retrieval Using Main Keywords and Sub-keywords (주 키워드와 부 키워드를 이용한 자연언어 정보 검색 모델)

  • Kang, Hyun-Kyu;Park, Se-Young
    • The Transactions of the Korea Information Processing Society / v.4 no.12 / pp.3052-3062 / 1997
  • Information retrieval (IR) aims to retrieve relevant information that satisfies a user's information needs. However, a major role of IR systems is not just to generate sets of relevant documents but to help determine which documents are most likely to be relevant to the given requirements. Various attempts have been made in the recent past to use syntactic analysis methods to generate the complex constructions that are essential for content identification in automatic text analysis systems. Unfortunately, methods based on syntactic understanding alone are known not to be powerful enough to produce complete analyses of arbitrary text samples. In this paper, we present a document ranking method based on two-level ranking. The first level is used to retrieve the documents, and the second level to reorder the retrieved documents. The main keywords used in the first level are defined as nouns and/or compound nouns that possess good document discrimination power. The sub-keywords used in the second level are defined as adjectives, adverbs, and/or verbs that are not main keywords, together with function words. An empirical study was conducted on a Korean encyclopedia with 23,113 entries and 161 Korean natural language queries collected from end users. 85.0% of the natural language queries contained sub-keywords. The two-level document ranking method provides a significant improvement in retrieval effectiveness over traditional ranking methods.
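
The two-level idea can be sketched roughly as follows. The abstract does not give the scoring functions, so this sketch assumes simple keyword-overlap counts: documents are retrieved only if they match a main keyword, and sub-keyword matches then reorder documents with equal main-keyword scores. The function name `two_level_rank` and the whitespace tokenization are hypothetical, and keywords are assumed to be lowercase.

```python
def two_level_rank(docs, main_keywords, sub_keywords):
    """Rank documents in two levels.

    docs: dict mapping a document id to its text.
    Level 1 keeps documents containing at least one main keyword.
    Level 2 reorders documents with the same main score by sub-keyword matches.
    """
    main, sub = set(main_keywords), set(sub_keywords)
    scored = []
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        main_hits = len(main & tokens)
        if main_hits == 0:
            continue  # level 1: not retrieved at all
        sub_hits = len(sub & tokens)
        scored.append((main_hits, sub_hits, doc_id))
    # More main hits first, then more sub hits, then id for a stable order.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [doc_id for _, _, doc_id in scored]
```

With the query "river quickly", where "river" is the main keyword and "quickly" the sub-keyword, a document mentioning both is ranked above one that mentions only "river", and a document with neither is never retrieved.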

Reputation Analysis of Document Using Probabilistic Latent Semantic Analysis Based on Weighting Distinctions (가중치 기반 PLSA를 이용한 문서 평가 분석)

  • Cho, Shi-Won;Lee, Dong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.3 / pp.632-638 / 2009
  • Probabilistic Latent Semantic Analysis (PLSA) has many applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. In this paper, we propose an algorithm that uses a weighted PLSA model to find contextual phrases and opinions in documents. Traditional keyword search is unable to find the semantic relations between phrases. Overcoming this obstacle requires techniques for automatically classifying the semantic relations of phrases. Through experiments, we show that the proposed algorithm works well for discovering semantic relations of phrases and presents the semantic relations of phrases to the vector-space model. The proposed algorithm can support a variety of analyses, including document classification, online reputation analysis, and collaborative recommendation.

Fuzzy Theory based Electronic Commerce Navigation Agent that can Process Natural Language (자연어 처리가 가능한 퍼지 이론 기반 전자상거래 검색 에이전트)

  • 김명순;정환묵
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.3 / pp.246-251 / 2001
  • In this paper, we propose an intelligent navigation agent model for successful electronic commerce system management. Fuzzy theory is a very useful method when keywords express vague conditions that the system must process. Using fuzzy theory, we propose a model that can process vague keywords effectively. Through this model, we verified that more appropriate navigation results can be obtained than with crisp retrieval keyword conditions.
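
The contrast between vague and crisp keyword conditions can be shown with a small sketch. The abstract does not specify the membership functions, so the linear "cheap" function, its thresholds, and the names `cheap` and `fuzzy_search` are all assumptions made for illustration.

```python
def cheap(price, full=20.0, zero=50.0):
    """Degree (0..1) to which a price counts as 'cheap'.

    Fully cheap at or below `full`, not cheap at or above `zero`,
    and linearly decreasing in between. Thresholds are illustrative.
    """
    if price <= full:
        return 1.0
    if price >= zero:
        return 0.0
    return (zero - price) / (zero - full)


def fuzzy_search(products, membership, min_degree=0.0):
    """Return (name, degree) pairs with degree above min_degree, best match first."""
    scored = [(membership(price), name) for name, price in products.items()]
    scored = [(degree, name) for degree, name in scored if degree > min_degree]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(name, degree) for degree, name in scored]
```

A crisp condition such as `price <= 20` would return only the pen from `{"pen": 10, "bag": 35, "watch": 80}`, whereas the fuzzy version also returns the bag with degree 0.5, ranked below the pen.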

Incorporation of Fuzzy Theory with Heavyweight Ontology and Its Application on Vague Information Retrieval for Decision Making

  • Bukhari, Ahmad C.;Kim, Yong-Gi
    • International Journal of Fuzzy Logic and Intelligent Systems / v.11 no.3 / pp.171-177 / 2011
  • The decision making process depends on accurate and timely information. Obtaining precise information from the internet is becoming more difficult due to the continuous increase in vagueness and uncertainty in online information resources. This also poses a problem for blind people who want to make full use of the online resources available to other users for decision making in their daily lives. Ontology is considered one of today's emerging technologies for knowledge representation and information sharing. Fuzzy logic is a very popular artificial intelligence technique that deals with imprecision and uncertainty. Classical ontology deals well with crisp data but cannot give sufficient support for handling imprecise data or information. In this paper, we incorporate fuzzy logic into a heavyweight ontology to solve the problem of extracting imprecise information from heterogeneous, vague sources. The fuzzy ontology consists of fuzzy rules, fuzzy classes, and their properties with axioms. We use the Fuzzy OWL plug-in of Protege to model the fuzzy ontology. A prototype based on OWL-2 (Web Ontology Language 2), PAL (Protege Axiom Language), and fuzzy logic is developed in order to examine the effectiveness of the proposed system.

Semantic Image Retrieval Using Color Distribution and Similarity Measurement in WordNet (컬러 분포와 WordNet상의 유사도 측정을 이용한 의미적 이미지 검색)

  • Choi, Jun-Ho;Cho, Mi-Young;Kim, Pan-Koo
    • The KIPS Transactions: Part B / v.11B no.4 / pp.509-516 / 2004
  • Semantic interpretation of an image is incomplete without some mechanism for understanding semantic content that is not directly visible. For this reason, human-assisted content annotation attaches a textual description in natural language to an image. However, keyword-based retrieval operates at the level of syntactic pattern matching. In other words, dissimilarity computation among terms is usually done by string matching, not concept matching. In this paper, we propose a method for computerized semantic similarity calculation in WordNet space. We consider the edge, depth, link type, and density, as well as the existence of common ancestors. We also introduce a method that applies this similarity measurement to semantic image retrieval. To combine it with low-level features, we use a spatial color distribution model. When tested on an image set from Microsoft's 'Design Gallery Line', the proposed method outperforms other approaches.
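
To make concept matching (as opposed to string matching) concrete, the sketch below computes the well-known Wu-Palmer similarity on a tiny hand-made taxonomy. This is a swapped-in measure, not the one proposed in the paper: it uses only depth and the nearest common ancestor, and ignores link type and density. The `PARENT` table is a hypothetical stand-in for WordNet's hypernym hierarchy.

```python
# Hypothetical toy taxonomy: each concept maps to its parent (None for the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}


def path_to_root(node):
    """List of concepts from node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(path_to_root(node))


def lowest_common_ancestor(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    return None


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))
```

Here "dog" and "cat" share the ancestor "animal" and score 2/3, while "dog" and "car" meet only at the root and score 1/3, even though none of the three strings resemble each other.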