• Title/Summary/Keyword: Information Retrieval Engine

An Efficient Inverted Index Technique based on RDBMS for XML Documents (XML 문서에 대한 RDBMS에 기반을 둔 효율적인 역색인 기법)

  • 서치영;이상원;김형주
    • Journal of KIISE:Databases / v.30 no.1 / pp.27-40 / 2003
  • The inverted index widely used in information retrieval must be extended for XML documents so that XML information retrieval systems can support containment queries. As previous work suggested, there are two ways to store the inverted index and process containment queries for XML documents: using an RDBMS or using an inverted list engine. The way the previous work extends the inverted index has two drawbacks. One is that using an RDBMS performs much worse than using an inverted list engine. The other is that when containment queries are processed in an RDBMS, the number of join operations increases with the path length of the query, and each join always takes place between large tables. In this paper, we extend the inverted index in a different way to solve these problems and show the effectiveness of using an RDBMS.
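
To make the idea of a containment query concrete, here is a minimal sketch of one common encoding, in which every element and word is stored with its start and end position so that "A contains B" becomes an interval test. This is not necessarily the scheme the authors propose: the token format, the `Posting` fields, and the function names are all assumptions made for illustration. The nested loop in `containment_query` plays the role of the join that an RDBMS would run between the element table and the word table.

```python
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    doc_id: int
    begin: int  # token position where the element (or word) starts
    end: int    # token position where it ends; equals begin for a word
    level: int  # nesting depth (0 for the root element)


def build_index(docs):
    """docs: {doc_id: [tokens]}, where a token is '<tag>', '</tag>' or a word."""
    elements = defaultdict(list)
    words = defaultdict(list)
    for doc_id, tokens in docs.items():
        stack = []  # open elements as (tag, begin position)
        for pos, tok in enumerate(tokens):
            if tok.startswith("</"):
                tag, begin = stack.pop()
                elements[tag].append(Posting(doc_id, begin, pos, len(stack)))
            elif tok.startswith("<"):
                stack.append((tok[1:-1], pos))
            else:
                words[tok].append(Posting(doc_id, pos, pos, len(stack)))
    return elements, words


def contains(outer, inner):
    """True if `inner` lies strictly inside `outer` in the same document."""
    return (outer.doc_id == inner.doc_id
            and outer.begin < inner.begin
            and inner.end < outer.end)


def containment_query(elements, words, tag, word):
    """Ids of documents in which some <tag> element contains `word`."""
    return sorted({e.doc_id
                   for e in elements.get(tag, [])
                   for w in words.get(word, [])
                   if contains(e, w)})


docs = {
    1: ["<book>", "<title>", "xml", "</title>",
        "<body>", "index", "</body>", "</book>"],
    2: ["<book>", "<title>", "index", "</title>", "</book>"],
}
elements, words = build_index(docs)
title_hits = containment_query(elements, words, "title", "index")
book_hits = containment_query(elements, words, "book", "index")
```

A query with a longer path, such as book/title/"index", would chain one such interval test per step, which is the growth in join count that the abstract describes.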

MultiDisplay for HCI and Web Information Retrieval (HCI를 위한 다중 디스플레이와 웹 정보검색)

  • 양현택;박나연;김원중
    • Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.402-404 / 2000
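  • The WWW (World Wide Web) has become the most important means of delivering and acquiring information. Most network users rely on search engines to obtain information from the Web. However, as the variety and volume of information registered on the Web grow explosively, the lists of indexed results returned by search engines are too long, and many documents appear more than once, which keeps users from searching efficiently. This study presents an approach based on a MultiDisplay technique that is familiar to users and greatly reduces the time and effort required for Web information retrieval.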

A Search-Result Clustering Method based on Word Clustering for Effective Browsing of the Paper Retrieval Results (논문 검색 결과의 효과적인 브라우징을 위한 단어 군집화 기반의 결과 내 군집화 기법)

  • Bae, Kyoung-Man;Hwang, Jae-Won;Ko, Young-Joong;Kim, Jong-Hoon
    • Journal of KIISE:Software and Applications / v.37 no.3 / pp.214-221 / 2010
  • The search-results clustering problem is defined as the automatic and on-line grouping of similar documents in search results returned from a search engine. In this paper, we propose a new search-results clustering algorithm specialized for a paper search service. Our system consists of two algorithmic phases: Category Hierarchy Generation System (CHGS) and Paper Clustering System (PCS). In CHGS, we first build up the category hierarchy, called the Field Thesaurus, for each research field using an existing research category hierarchy (KOSEF's research category hierarchy) and the keyword expansion of the field thesaurus by a word clustering method using the K-means algorithm. Then, in PCS, the proposed algorithm determines the category of each paper using top-down and bottom-up methods. The proposed system can be used in the application areas for retrieval services in a specialized field such as a paper search service.
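
The word clustering step that the abstract attributes to K-means can be sketched as follows. This is a bare-bones K-means under stated assumptions: the two-dimensional word vectors are made up for illustration, the first `k` points serve as initial centroids so the run is deterministic, and nothing here reflects how the authors build their word features or their Field Thesaurus.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain K-means. Returns the cluster index assigned to each point."""
    centroids = [list(p) for p in points[:k]]  # assumption: naive initialization
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical word vectors (e.g. co-occurrence with two seed categories).
word_vectors = {
    "index": (1.0, 0.0),
    "query": (0.9, 0.1),
    "protein": (0.0, 1.0),
    "gene": (0.1, 0.9),
}
vocabulary = list(word_vectors)
labels = kmeans([word_vectors[w] for w in vocabulary], k=2)
clusters = dict(zip(vocabulary, labels))
```

Words that end up in the same cluster would be candidates for expanding the keyword set of one category.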

Design and Performance Evaluation of an Indexing Method for Partial String Searches (문자열 부분검색을 위한 색인기법의 설계 및 성능평가)

  • Gang, Seung-Heon;Yu, Jae-Su
    • The Transactions of the Korea Information Processing Society / v.6 no.6 / pp.1458-1467 / 1999
  • Existing index structures such as extendible hashing and the B+-tree do not fully support partial string searches. The inverted file method and the signature file method used in web retrieval engines also have problems: the former does not provide partial string searches, and the latter suffers from serious retrieval performance degradation. In this paper, we propose an efficient index method that supports partial string searches and achieves good retrieval performance. The proposed index method is based on the inverted file structure. To support partial string searches, it constructs the index file from patterns obtained by dividing terms into units of two syllables. We analyze the characteristics of the proposed method through simulation experiments using a wide range of parameter values. We also derive analytic performance evaluation models of the existing inverted file method, the signature file method, and the proposed index method in terms of retrieval time and storage overhead. A performance comparison based on the analytic models shows that the proposed method significantly improves retrieval performance over the existing methods.
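
The two-syllable pattern idea can be illustrated with a small inverted file keyed by overlapping two-character patterns. This is a sketch under assumptions, not the paper's index layout: it treats one character as one syllable (which holds for Hangul syllable blocks), uses overlapping patterns, and all function names are hypothetical.

```python
from collections import defaultdict


def two_syllable_patterns(term):
    """All overlapping two-character patterns of a term."""
    return {term[i:i + 2] for i in range(len(term) - 1)}


def build_pattern_index(terms):
    """Inverted file: pattern -> set of terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for pattern in two_syllable_patterns(term):
            index[pattern].add(term)
    return index


def partial_search(index, terms, query):
    """Terms that contain `query` as a substring."""
    patterns = two_syllable_patterns(query)
    if not patterns:
        # Query shorter than two characters: fall back to a full scan.
        return sorted(t for t in terms if query in t)
    # Candidates must contain every pattern of the query ...
    candidates = set.intersection(*(index.get(p, set()) for p in patterns))
    # ... and are then verified, since shared patterns do not imply adjacency.
    return sorted(t for t in candidates if query in t)


terms = ["정보검색", "검색엔진", "정보처리", "retrieval"]
index = build_pattern_index(terms)
hits = partial_search(index, terms, "검색")
```

The final substring check matters: a term can contain every pattern of the query without containing the query itself, so the pattern lookup only narrows the candidate set.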

A Study of Personalized Information Retrieval (개인화 정보 검색에 대한 연구)

  • Kim, Tae-Hwan;Jeon, Ho-Chul;Choi, Joong-Min
    • 한국HCI학회:학술대회논문집 / 2008.02a / pp.683-687 / 2008
  • Many researchers have implemented search algorithms for the World Wide Web. One of the best known is Google's PageRank technology. The PageRank approach computes the number of inlinks of each document and then presents documents in descending order of inlink count. However, it is difficult to find the results that a particular user needs, because this method finds documents that are valuable to the public rather than to an individual. To solve this problem, this paper proposes a personalized search engine that combines public value with personal value.
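
A small sketch of mixing a public score with a personal one is given below. It follows the abstract's simplified description (ranking by inlink count) rather than the full PageRank computation, and the linear blend, the `alpha` parameter, and the `personal_score` mapping are assumptions for illustration, not the authors' formula.

```python
from collections import Counter


def personalized_rank(links, personal_score, alpha=0.5):
    """Rank pages by alpha * public value + (1 - alpha) * personal value.

    links: iterable of (source, target) hyperlink pairs.
    personal_score: {page: value in [0, 1]} from a hypothetical user profile.
    """
    links = list(links)
    inlinks = Counter(dst for src, dst in links if src != dst)
    pages = {p for link in links for p in link} | set(personal_score)
    max_in = max(inlinks.values(), default=1)

    def score(page):
        public = inlinks.get(page, 0) / max_in  # normalized inlink count
        return alpha * public + (1 - alpha) * personal_score.get(page, 0.0)

    # Highest score first; ties broken by page name for a stable order.
    return sorted(pages, key=lambda p: (-score(p), p))


links = [("a", "c"), ("b", "c"), ("a", "b")]
profile = {"b": 1.0}
public_only = personalized_rank(links, profile, alpha=1.0)
blended = personalized_rank(links, profile, alpha=0.5)
```

With `alpha=1.0` the order depends only on inlinks, so page `c` comes first. With `alpha=0.5` the user's preference lifts page `b` above it.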

Optimized Multi Agent Personalized Search Engine

  • Disha Verma;Barjesh Kochar;Y. S. Shishodia
    • International Journal of Computer Science & Network Security / v.24 no.9 / pp.150-156 / 2024
  • With the advent of personalized search engines, a myriad of approaches came into practice. With the emergence of social media, personalization was extended to a different level. The main reason users preferred personalized engines over traditional search was the need for accurate and precise results. Short on time and patience, users did not want to browse several pages to find the result that suited them best. Personalized search engines could solve this problem effectively by understanding users through their profiles and histories, thus reducing uncertainty and ambiguity. However, as several layers of personalization were added to basic search, the response time and resource requirements (for profile storage) increased manifold. It is therefore time to focus on optimizing the layered architectures of personalization. The paper presents the layout of a multi-agent-based personalized search engine that works on histories and profiles. To store the huge amount of data, a distributed database is used at its core, so high availability, scaling, and geographic distribution are built in and easy to use. Results are first retrieved using a traditional search engine, and after a layer of personalization is applied, they are provided to the user. MongoDB is used to store profiles in a flexible form, thus improving the performance of the engine. Further, the Weighted Sum Model is used to rank the pages in the personalization layer.
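
The Weighted Sum Model mentioned at the end of the abstract scores each alternative as the weighted sum of its criterion values. The sketch below shows that calculation only. The criteria names (`relevance`, `profile_match`), the weights, and the page data are hypothetical and do not come from the paper.

```python
def weighted_sum_rank(pages, weights):
    """Rank pages with the Weighted Sum Model.

    pages: {url: {criterion: value scaled to [0, 1]}}
    weights: {criterion: weight}
    """
    def score(features):
        return sum(w * features.get(c, 0.0) for c, w in weights.items())

    # Highest score first; ties broken by URL for a stable order.
    return sorted(pages, key=lambda url: (-score(pages[url]), url))


# Hypothetical criteria: relevance from the base search engine and
# agreement with the stored user profile.
pages = {
    "p1": {"relevance": 0.9, "profile_match": 0.1},
    "p2": {"relevance": 0.6, "profile_match": 0.9},
}
personalized = weighted_sum_rank(pages, {"relevance": 0.5, "profile_match": 0.5})
unpersonalized = weighted_sum_rank(pages, {"relevance": 1.0, "profile_match": 0.0})
```

The model assumes that all criteria are on a comparable scale, so values would normally be normalized before the weights are applied.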

Analysis of Search Engine Use, Search Behaviors and Aptitude by Web Users (웹 이용자의 검색엔진 활용 및 탐색 행위와 성향 분석)

  • Rieh, Hae-Young
    • Journal of the Korean Society for Library and Information Science / v.36 no.3 / pp.69-91 / 2002
  • This study examines the overall user experience associated with Web search engine use, including selection, usage of search features, and evaluation. The data were collected through individual interviews with 28 faculty members and graduate students. It was found that users tend to select a search engine based on experience with and knowledge of certain features and on familiarity with the engine itself, more than on previous experience with search results. The results showed that users had mixed opinions regarding cross-language retrieval, while they did not believe that the use of operators affects the search results. It appears that users are interested in interface design as well as the accuracy of search results.

Implementation on the Filters Using Color and Intensity for the Content based Image Retrieval (내용기반 영상검색을 위한 색상과 휘도 정보를 이용한 필터 구현)

  • Noh, Jin-Soo;Baek, Chang-Hui;Rhee, Kang-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.44 no.1 / pp.122-129 / 2007
  • As the availability of image information has been increasing significantly, the need for systems that can manage image information is also increasing. Accordingly, we propose a content-based image retrieval (CBIR) method based on an efficient combination of a color feature and an image's shape and position information. As the color feature, an HSI color histogram is chosen, which is known to measure the spatial information of colors well. Shape and position information are obtained using Hu invariant moments on the luminance of the HSI model. For efficient similarity computation, the extracted features (color histogram, Hu invariant moments) are combined, and precision is then measured. In experiments using a database provided by http://www.freefoto.com, the proposed image search engine achieves 93% precision and can be applied successfully to image retrieval applications.
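
The color half of such a feature can be sketched as follows: convert RGB pixels to HSI with the standard geometric formulas, build a normalized hue histogram, and compare two histograms. The sketch covers only the color histogram, not the Hu invariant moments, and the bin count and the use of histogram intersection as the similarity measure are assumptions rather than details taken from the paper.

```python
import math


def rgb_to_hsi(r, g, b):
    """r, g, b in [0, 1]. Returns (hue in degrees, saturation, intensity)."""
    intensity = (r + g + b) / 3
    saturation = 0.0 if intensity == 0 else 1 - min(r, g, b) / intensity
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        hue = 0.0  # gray pixel: hue is undefined, 0 is used by convention
    else:
        hue = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        if b > g:
            hue = 360 - hue
    return hue, saturation, intensity


def hue_histogram(pixels, bins=12):
    """Normalized histogram of hue values over equal-width bins."""
    hist = [0.0] * bins
    for r, g, b in pixels:
        hue, _, _ = rgb_to_hsi(r, g, b)
        hist[int(hue // (360 / bins)) % bins] += 1
    total = sum(hist) or 1
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red_image = [(1.0, 0.0, 0.0)] * 4
green_image = [(0.0, 1.0, 0.0)] * 4
same = histogram_intersection(hue_histogram(red_image), hue_histogram(red_image))
different = histogram_intersection(hue_histogram(red_image), hue_histogram(green_image))
```

A full feature vector in the spirit of the abstract would append shape descriptors computed on the intensity channel to this histogram before comparing images.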

An Art Image Retrieval System Using Ontology Reasoning Engine (온톨로지 추론 엔진을 이용한 미술 작품 검색 시스템)

  • 한상진;조우상;이복주
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.139-141 / 2004
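  • This paper introduces an extended, semantics-based search method that replaces conventional keyword-matching search for retrieving works of art on the Web. Many ontology-related languages are available for building ontologies. Among them, recent research has focused on ontologies written in RDFS/RDF and OWL and on ontology reasoning. Whereas information retrieval so far has been simple syntax-oriented search, future information retrieval will develop into meaning-oriented, knowledge-based retrieval. Accordingly, this paper proposes a knowledge-based retrieval system that makes use of an ontology.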

Optimized Structures with Hop Constraints for Web Information Retrieval (Hop 제약조건이 고려된 최적화 웹정보검색)

  • Lee, Woo-Key;Kim, Ki-Baek;Lee, Hwa-Ki
    • Journal of the Korean Operations Research and Management Science Society / v.33 no.4 / pp.63-82 / 2008
  • The explosively growing attractiveness of the Web is creating significant demand for structural analysis of various web objects. The more web objects are available, the more difficult it is for clients (i.e., common web users and web robots) and servers (i.e., Web search engines) to retrieve what they really want. We focus on the structure of web objects by introducing optimization models for more convenient and effective information retrieval. For this purpose, we represent web objects and hyperlinks as a directed graph from which the optimal structures are derived in terms of rooted directed spanning trees and Top-k trees. Computational experiments are carried out on synthetic data as well as on real web sites' domains, with Lagrangian relaxation approaches applied to the Top-k trees and the hop constraints. In the experiments, our methods outperformed the conventional approaches, so that a complex web graph can be converted into optimally structured ones within a reasonable amount of computation time.
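
To show what a hop-constrained rooted tree over a web graph looks like, the sketch below extracts a breadth-first tree from a root page and stops at a hop limit. Breadth-first search is a much simpler stand-in chosen for illustration: it yields a minimum-hop tree, not the weighted optimum that the paper obtains with optimization models and Lagrangian relaxation. The graph and page names are hypothetical.

```python
from collections import deque


def bfs_tree_with_hop_limit(graph, root, max_hops):
    """Rooted directed tree covering pages within `max_hops` of the root.

    graph: {page: [pages it links to]}
    Returns {page: (parent, hops from root)}; the root has parent None.
    """
    tree = {root: (None, 0)}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        hops = tree[page][1]
        if hops == max_hops:
            continue  # hop constraint: do not expand beyond the limit
        for target in graph.get(page, []):
            if target not in tree:  # first visit fixes the tree edge
                tree[target] = (page, hops + 1)
                queue.append(target)
    return tree


site = {
    "home": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": ["home"],
}
tree = bfs_tree_with_hop_limit(site, "home", max_hops=2)
```

Here page `d` is three hops from `home`, so it is left out of the tree, and the cycle back to `home` is dropped because each page keeps only its first discovered parent.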