• Title/Summary/Keyword: Text Retrieval System


A Study on the Indexing System Using a Controlled Vocabulary and Natural Language in the Secondary Legal Information Full-Text Databases : an Evaluation and Comparison of Retrieval Effectiveness (2차 법률정보 전문데이터베이스에 있어서 통제어 색인시스템과 자연어 색인시스템의 검색효율 평가에 관한 연구)

  • Roh Jeong-Ran
    • Journal of the Korean Society for Library and Information Science / v.32 no.4 / pp.69-86 / 1998
  • The purpose of this study is to develop an indexing algorithm for secondary legal information based on the characteristics of legal information, to compare the controlled-vocabulary indexing system with the natural-language indexing system in secondary legal information full-text databases, and to demonstrate the propriety and superiority of the controlled-vocabulary indexing system. The results are as follows. 1) In secondary legal information full-text databases, the controlled-vocabulary indexing system is more effective than the natural-language indexing system in recall rate, precision rate, distribution of propriety, and the ability to find unique relevant records that the natural-language indexing system fails to find. 2) An indexing system that adds more words to the controlled vocabulary does not improve the recall rate or the precision rate compared with the indexing system using the controlled vocabulary alone. 3) An indexing system using the word-added controlled vocabulary with an extra weight does not improve the recall rate or the precision rate compared with the system using the word-added controlled vocabulary without an extra weight. This study indicates that, rather than approaching the information system in a linguistic, statistical, or structural way, it is necessary to build into the system the characteristic information that information experts recognize, that is, the experiential and inherent knowledge that only human beings have; such a system can be a more essential and intelligent information system.

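
The abstract above compares indexing systems by recall rate, precision rate, and the ability to find records the other system misses. As a minimal sketch of how these measures are usually computed for one query, the Python below uses the standard set-based definitions. The record IDs and result lists are hypothetical and are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def unique_relevant(retrieved_a, retrieved_b, relevant):
    """Relevant records found by system A that system B did not retrieve."""
    return (set(retrieved_a) & set(relevant)) - set(retrieved_b)


# Hypothetical relevance judgments and result lists for one query.
relevant = {"r1", "r2", "r3", "r4"}
controlled = ["r1", "r2", "r3", "r9"]        # controlled-vocabulary index
natural = ["r1", "r7", "r8", "r9", "r2"]     # natural-language index

print(precision_recall(controlled, relevant))
print(precision_recall(natural, relevant))
print(unique_relevant(controlled, natural, relevant))
```

Averaging these per-query values over a set of test queries gives the system-level rates that such a comparison would report.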

DESIGN OF METADATA MANAGEMENT SYSTEM FOR RETRIEVAL OF VIDEO DATA

  • Heo, Byeong-Mun;Lee, Yang-Koo;Chai, Duck-Jin;Wang, Ling;Lee, Yong-Mi;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.314-316 / 2007
  • With the development of Internet and network technology, requests for services that deliver large volumes of multimedia data have increased, and multimedia users want convenient and accurate systems for storing and retrieving multimedia content. To satisfy these requests, metadata management for the diverse information in multimedia content is very important. However, such metadata management is difficult because metadata standards differ according to the type of multimedia data and service. In this paper, we propose the structure of an integrated metadata management system that extends a previous text-based metadata management system to handle multimedia content metadata that are expressed differently depending on the multimedia data or service type.


Design and Implementation of a Low-level Storage Manager for Efficient Storage and Retrieval of Multimedia Data in NOD Services (NoD서비스용 멀티미디어 데이터의 효율적인 저장 및 검색을 위한 하부저장 관리자의 설계 및 구현)

  • Jin, Ki-Sung;Jung, Jae-Wuk;Chang, Jae-Woo
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1033-1043 / 2000
  • Recently, as user demand for NoD (News-on-Demand) has increased greatly, much research has been done to meet it. However, because of the short life cycle of news video data and the periodic change of video data depending on the anchor, it is difficult to apply conventional video storage techniques directly to NoD applications. To address this, we design and implement a low-level storage manager for efficient storage and retrieval of multimedia data in NoD services. Our low-level storage manager not only efficiently stores the video stream data of the news video itself but also handles its index information. It provides an inverted file method for efficient text-based retrieval and an X-tree index structure for high-dimensional feature vectors. In addition, it provides application program interfaces (APIs) for storing video objects and the index information extracted from hierarchical news video, and APIs for retrieving video objects easily by using cursors. Finally, we implement our low-level storage manager on the SHORE (Scalable Heterogeneous Object REpository) storage system, using standard C++ under the UNIX operating system.

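
The abstract above mentions an inverted file method for text-based retrieval. The following is a minimal in-memory sketch of the general idea in Python, not the storage manager described in the paper (which is built on SHORE in C++). The clip IDs, the captions, and the function names `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query term (AND query)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical captions attached to news video clips.
docs = {
    "clip1": "election results announced tonight",
    "clip2": "weather forecast for tonight",
    "clip3": "election debate highlights",
}
index = build_inverted_index(docs)
print(search(index, "election tonight"))
```

A lookup touches only the posting sets of the query terms, which is why an inverted file suits keyword search better than scanning every caption.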

A Hangul Document Classification System using Case-based Reasoning (사례기반 추론을 이용한 한글 문서분류 시스템)

  • Lee, Jae-Sik;Lee, Jong-Woon
    • Asia Pacific Journal of Information Systems / v.12 no.2 / pp.179-195 / 2002
  • In this research, we developed an efficient Hangul document classification system for text mining. By 'efficient' we mean maintaining acceptable classification performance while reducing computing time. In our system, given a query document, k documents are first retrieved from the document case base using the k-nearest neighbor technique, which is the main algorithm of case-based reasoning. Then the TFIDF method, the traditional vector model in information retrieval, is applied to the query document and the k retrieved documents to classify the query document. We call this procedure the 'CB_TFIDF' method. The results showed that the classification accuracy of CB_TFIDF was similar to that of the traditional TFIDF method, while the average time for classifying one document decreased remarkably.
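
The two-stage procedure in the abstract above (retrieve k cases first, then apply TFIDF only to the query and those k documents) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' CB_TFIDF implementation: the abstract does not say which similarity drives the first retrieval, so Jaccard overlap is assumed here, the smoothed IDF formula and the similarity-weighted vote are also assumptions, and the tiny English case base is hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Cheap term-overlap similarity used for the first-stage retrieval (assumed)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def tfidf_vectors(token_lists):
    """TF-IDF vectors computed over the given small document set only."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append({t: f * (math.log((1 + n) / (1 + df[t])) + 1) for t, f in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(query, case_base, k=3):
    """Stage 1: pick k nearest cases. Stage 2: TF-IDF cosine vote among them."""
    q = query.lower().split()
    cases = [(text.lower().split(), label) for text, label in case_base]
    nearest = sorted(cases, key=lambda c: jaccard(q, c[0]), reverse=True)[:k]
    vectors = tfidf_vectors([q] + [tokens for tokens, _ in nearest])
    scores = Counter()
    for vec, (_, label) in zip(vectors[1:], nearest):
        scores[label] += cosine(vectors[0], vec)
    return scores.most_common(1)[0][0]


# Hypothetical labeled case base.
case_base = [
    ("stock market prices fall", "economy"),
    ("central bank raises interest rates", "economy"),
    ("team wins championship game", "sports"),
    ("player scores in final game", "sports"),
    ("market rally lifts bank stock", "economy"),
]
print(classify("bank stock prices rise", case_base))
```

The time saving the abstract reports is plausible in a design like this because the TF-IDF vectors are built for only k + 1 documents per query instead of the whole collection.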

Image Classification Approach for Improving CBIR System Performance (콘텐트 기반의 이미지검색을 위한 분류기 접근방법)

  • Han, Woo-Jin;Sohn, Kyung-Ah
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.7 / pp.816-822 / 2016
  • Content-based image retrieval (CBIR) searches by image features such as local color, texture, and other image content information, unlike conventional search based on tags or text labels. In real-life data, the number of images with tags or labels is relatively small, so it is hard to find relevant images with a text-based approach. Existing image search methods based only on image feature similarity have limited performance and do not ensure that the results are what the user expected. In this study, we propose and validate a machine-learning-based approach to improve the performance of an image search engine. We note that when users search for relevant images with a query image, they expect the retrieved images to belong to the same category as the query. An image classification method is therefore combined with the traditional image feature similarity method. The proposed method is extensively validated on the public PASCAL VOC dataset, consisting of 11,530 images from 20 categories.
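
The abstract above does not say how the classifier and the feature similarity are combined. One simple possibility, shown here only as an assumed sketch, is to keep gallery images whose predicted category matches the query's predicted category and then rank them by cosine similarity of their feature vectors. The feature vectors, image IDs, and category labels below are hypothetical, and the category predictions are taken as given rather than produced by a real classifier.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search_same_category(query_vec, query_cat, gallery, top_n=3):
    """Rank by feature similarity, restricted to the query's predicted category.

    gallery is a list of (image_id, feature_vector, predicted_category).
    """
    scored = [
        (cosine(query_vec, vec), image_id)
        for image_id, vec, cat in gallery
        if cat == query_cat
    ]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_n]]


# Hypothetical gallery; dog_01 has features close to the cat images.
gallery = [
    ("cat_01", [0.90, 0.10, 0.00], "cat"),
    ("dog_01", [0.88, 0.12, 0.05], "dog"),
    ("cat_02", [0.70, 0.30, 0.10], "cat"),
    ("car_01", [0.10, 0.20, 0.90], "car"),
]
print(search_same_category([0.90, 0.10, 0.02], "cat", gallery))
```

A hard category filter like this drops visually similar images from other categories, which matches the user expectation the abstract describes, but it also makes the result depend on the classifier being right.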

A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method (Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상)

  • Lee, Yong-Hun;Lee, Sang-Bum
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.12 / pp.5089-5096 / 2010
  • Text categorization, which classifies documents according to some criteria, is one of the important functions of an information search system. The general method of categorization classifies the target documents by extracting important index words and assigning weights to them. The effectiveness of the weighting algorithm is therefore important, since the performance and correctness of text categorization depend entirely on it. In this paper, an enhanced method for text categorization that improves the word weighting technique is introduced. The Okapi BM25 method has proved its effectiveness in several information retrieval engines. We applied Okapi BM25 and showed its good performance in categorization. It is compared with various other word weighting methods: TF-IDF, TF-ICF, and TF-ISF. The target documents used for this experiment are from Reuters-21578, and the SVM and KNN algorithms are used. Finally, the modified Okapi BM25 shows the best performance.
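
As a rough illustration of BM25 used as a per-document word weight, the Python below computes a weight for every term of every document. It follows the commonly cited BM25 form with parameters `k1` and `b` and a non-negative IDF variant; it is not the modified BM25 of the paper, whose details the abstract does not give. The parameter defaults and the toy documents are assumptions.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    weighted = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        weighted.append({
            t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weighted


# Hypothetical tokenized documents.
docs = [
    ["oil", "prices", "rise", "oil"],
    ["grain", "exports", "fall"],
    ["oil", "exports", "rise"],
]
weights = bm25_weights(docs)
for w in weights:
    print(w)
```

The resulting dictionaries can serve as sparse feature vectors for a classifier such as SVM or KNN, in the same place where TF-IDF weights would otherwise be used.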

Text Corpus-based Question Answering System (문서 말뭉치 기반 질의응답 시스템)

  • Kim, Han-Joon;Kim, Min-Kyoung;Chang, Jae-Young
    • Journal of Digital Contents Society / v.11 no.3 / pp.375-383 / 2010
  • In developing question-answering (QA) systems, it is hard to analyze natural language questions syntactically and semantically and to find exact answers to given questions. To avoid these difficulties, we propose a new style of question-answering system that automatically generates natural language queries and lets users search for queries that fit given keywords. The key idea behind generating natural queries is that, after the named entity recognition technique is applied to significant sentences within text documents, we can generate a natural query (an interrogative sentence) for each named entity (such as a person, location, or time). Natural queries are divided into two types: the simple type and the sentence structure type. With a large database of question-answer pairs, the system can easily obtain natural queries and their corresponding answers for given keywords. The most important issue is how to generate meaningful queries that have unambiguous answers. To this end, we propose two principles for deciding which declarative sentences can be the sources of natural queries, and a pattern-based method for generating meaningful queries from the selected sentences.

The Character Recognition System of Mobile Camera Based Image (모바일 이미지 기반의 문자인식 시스템)

  • Park, Young-Hyun;Lee, Hyung-Jin;Baek, Joong-Hwan
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.5 / pp.1677-1684 / 2010
  • Recently, with the development of mobile phones and the spread of smartphones, a great deal of content has been developed. In particular, since small cameras are built into mobile devices, people are interested in developing image-based content, which has also become an important part of practical use. Among such applications, a character recognition system can be widely used in guidance systems for blind people, automatic robot navigation systems, automatic video retrieval and indexing systems, and automatic text translation systems. Therefore, this paper proposes a system that extracts text areas from natural images captured by a smartphone camera. The individual characters are recognized and the result is output as voice. Text areas are extracted using the AdaBoost algorithm, and individual characters are recognized using an error back-propagation neural network.

Development of Dental Consultation Chatbot using Retrieval Augmented LLM (검색 증강 LLM을 이용한 치과 상담용 챗봇 개발)

  • Jongjin Park
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.2 / pp.87-92 / 2024
  • In this paper, a RAG system was implemented using an existing Large Language Model (LLM) and Langchain library to develop a dental consultation chatbot. For this purpose, we collected contents from the webpage bulletin boards of domestic dental university hospitals and constructed consultation data with the advice and supervision of dental specialists. In order to divide the input consultation data into appropriate sizes, the chunk size and the size of the overlapping text in each chunk were set to 1001 and 100, respectively. As a result of the simulation, the Retrieval Augmented LLM searched for and output the consultation content that was most similar to the user input. It was confirmed that the accessibility of dental consultation and the accuracy of consultation content could be improved through the built chatbot.
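
To illustrate what a chunk size and an overlap size mean, the sketch below splits text into fixed-length character windows in which each chunk repeats the tail of the previous one. It is a simplified stand-in written in plain Python and does not reproduce the Langchain splitter used in the paper. The function name `split_with_overlap` is hypothetical, the defaults are the values quoted in the abstract, and treating those values as character counts is an assumption.

```python
def split_with_overlap(text, chunk_size=1001, overlap=100):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so content cut at a
    chunk boundary still appears intact in one of the two chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Hypothetical 2,500-character consultation text.
text = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(text)
print([len(c) for c in chunks])
```

In a retrieval-augmented setup, each chunk would then be embedded and stored so that the chunk most similar to the user input can be retrieved and passed to the LLM.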

A Study on Planning & Implementation of the Multimedia Meta Database and Digital Library's Integrated Information System for the Oceanographic Information Center (해양전문정보센터의 멀티미디어 메타데이터베이스 및 디지털도서관 통합정보시스템 구현에 관한 연구)

  • Han, Jong-Yup;Choi, Young-Jun
    • Journal of the Korean Society for Information Management / v.21 no.4 s.54 / pp.5-26 / 2004
  • A literature analysis was carried out for the planning and implementation of an integrated information system, combining a multimedia meta database and a digital library, to establish the various oceanographic resources of the Oceanographic Information Center, the first in Korea. The study covered resources ranging from printed matter and network resources to full text and VOD. The focus of the analysis lies in providing a practical integrated information retrieval service for oceanographic resources, based on a framework of effective MODS metadata with network resource description. The analyses included oceanographic resources, multimedia information processing, MODS metadata descriptive elements, metadata classification, system organization, and retrieval for the planning and implementation of the multimedia meta database system.