• Title/Summary/Keyword: web Indexing

k-Bitmap Clustering Method for XML Data based on Relational DBMS (관계형 DBMS 기반의 XML 데이터를 위한 k-비트맵 클러스터링 기법)

  • Lee, Bum-Suk;Hwang, Byung-Yeon
    • The KIPS Transactions:PartD / v.16D no.6 / pp.845-850 / 2009
  • The use of XML data has increased with the growth of the Web 2.0 environment. XML's advantages are recognized through its use as the underlying technology of RSS and ATOM for transferring information from blogs and news feeds. Bitmap clustering is a method that keeps an index in main memory on top of a relational DBMS, and it performed better than other XML indexing methods in evaluation. However, the existing method generates too many clusters, which degrades search quality. This paper proposes a k-Bitmap clustering method that generates a user-defined number k of clusters to solve this problem. The proposed method also keeps an additional inverted index for searching terms that are excluded from the representative bits of the k-Bitmap. Our evaluation shows that users can control the number of clusters. The method also achieves high recall in single-term search, and by keeping two indices it guarantees that the search result includes all documents related to the query.
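
The two-index idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the cluster assignment is taken as given, and the names `build_indices`, `search`, and `representative_terms` are hypothetical. It only shows how a per-cluster bitmap over representative terms can be combined with an inverted index for the remaining terms so that no matching document is missed.

```python
from collections import defaultdict


def build_indices(clusters, representative_terms):
    """Build per-cluster bitmaps plus an inverted index for the other terms.

    clusters: {cluster_id: {doc_id: set_of_terms}} (assumed to be given).
    representative_terms: terms that receive a bit position (an assumption).
    """
    bit_of = {term: i for i, term in enumerate(representative_terms)}
    cluster_bitmaps = {}
    inverted = defaultdict(set)
    for cluster_id, docs in clusters.items():
        bitmap = 0
        for doc_id, terms in docs.items():
            for term in terms:
                if term in bit_of:
                    bitmap |= 1 << bit_of[term]   # mark term as present in cluster
                else:
                    inverted[term].add(doc_id)    # excluded term: inverted index
        cluster_bitmaps[cluster_id] = bitmap
    return bit_of, cluster_bitmaps, dict(inverted)


def search(term, clusters, bit_of, cluster_bitmaps, inverted):
    """Single-term search: bitmap lookup first, inverted index otherwise."""
    if term in bit_of:
        mask = 1 << bit_of[term]
        hits = set()
        for cluster_id, bitmap in cluster_bitmaps.items():
            if bitmap & mask:  # only scan clusters whose bitmap has the bit set
                hits |= {d for d, terms in clusters[cluster_id].items() if term in terms}
        return hits
    return set(inverted.get(term, set()))


clusters = {
    0: {"d1": {"xml", "rss"}, "d2": {"xml", "atom"}},
    1: {"d3": {"sql", "rss"}},
}
indices = build_indices(clusters, ["xml", "sql"])
print(sorted(search("xml", clusters, *indices)))  # ['d1', 'd2']
print(sorted(search("rss", clusters, *indices)))  # ['d1', 'd3']
```

Here `rss` has no bit position, so it is answered from the inverted index, which is the role the abstract assigns to the second index.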

A Study on Radiological Image Retrieval System (방사선 의료영상 검색 시스템에 관한 연구)

  • Park, Byung-Rae;Shin, Yong-Won
    • Journal of radiological science and technology / v.28 no.1 / pp.19-24 / 2005
  • The purpose of this study was to design and implement a practical annotation-based radiological image retrieval system that helps radiological technologists accurately find educational and image information. For better retrieval performance on large image databases, we present an indexing technique that integrates the $B^+$-tree proposed by Bayer, used for indexing simple attributes, with an inverted file structure for medical text keywords taken from additional descriptions of the radiological images. We implemented the proposed retrieval system with Delphi under Windows XP. End users, who are radiological technologists, can store simple attribute information such as doctor name, operator name, body part, and disease, together with additional text-based descriptions and the radiological image itself, and can retrieve the desired results from large image databases by simple attributes and text keywords through a graphical user interface. Consequently, the proposed system can be used for effective clinical decisions on radiological images, for reducing education time by organizing knowledge, and for well-organized education in clinical fields. In addition, it is expected to develop into a decision support system through the future construction of a web-based integrated imaging system that includes general images and special contrast images.
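
A rough Python sketch of the combined index described in this abstract follows. It is an illustration under assumptions, not the authors' Delphi implementation: a sorted list searched with `bisect` stands in for the $B^+$-tree, and the class name `ImageIndex`, its methods, and the sample records are hypothetical.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class ImageIndex:
    """Attribute index (sorted list as a B+-tree stand-in) plus inverted file."""

    def __init__(self):
        self._by_attribute = []             # sorted (body_part, image_id) pairs
        self._postings = defaultdict(set)   # keyword -> image ids

    def add(self, image_id, body_part, description):
        insort(self._by_attribute, (body_part, image_id))
        for word in description.lower().split():
            self._postings[word].add(image_id)

    def by_body_part(self, body_part):
        # Jump to the first entry for this key, then scan the ordered run.
        start = bisect_left(self._by_attribute, (body_part,))
        ids = set()
        for part, image_id in self._by_attribute[start:]:
            if part != body_part:
                break
            ids.add(image_id)
        return ids

    def by_keyword(self, keyword):
        return set(self._postings.get(keyword.lower(), set()))

    def search(self, body_part, keyword):
        # Combine the attribute lookup with the text keyword lookup.
        return self.by_body_part(body_part) & self.by_keyword(keyword)


index = ImageIndex()
index.add("img1", "chest", "pneumonia left lobe")
index.add("img2", "chest", "normal")
index.add("img3", "knee", "fracture left")
print(sorted(index.search("chest", "left")))  # ['img1']
```

The ordered structure serves exact and range lookups on simple attributes, while the inverted file answers free-text keyword queries; intersecting the two result sets mirrors the combined retrieval the abstract describes.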

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan;Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.63-77 / 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries, and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output order. Also, current search tools cannot retrieve documents related to an already retrieved document from a gigantic collection of documents. The most important problem for many current search systems is to increase search quality, that is, to provide related documents and to keep the number of unrelated documents in the search results as low as possible. To address this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) for articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. References contained in academic articles give credit to previous work in the literature and provide a link between the "citing" and "cited" articles. A citation index indexes the citations that an article makes, linking the article with the cited works.
Citation indexes were originally designed mainly for information retrieval. The citation links allow navigating the literature in unique ways. Papers can be located independently of language and of the words in the title, keywords, or document. A citation index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article?). However, CiteSeer cannot index links between articles that researchers do not make themselves, because it indexes only the links that researchers create when they cite other articles. For the same reason, CiteSeer does not scale easily. These problems motivate the design of a more effective search system. This paper presents a method that extracts the subject and predicate of each sentence in a document. Each document is converted into a table in which every extracted predicate is checked against its possible subjects and objects. We build a hierarchical graph of each document from the table and then integrate the graphs of the documents. From the graph of the entire collection, we calculate the area of each document relative to the integrated documents, and we mark the relations among documents by comparing these areas. The paper also proposes a method for the structural integration of documents that retrieves documents from the graph, which makes it easier for the user to find information. We compared the performance of the proposed approaches with the Lucene search engine using ranking formulas. As a result, the F-measure is about 60%, which is about 15% better.
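
The backward and forward navigation that this abstract attributes to a citation index can be sketched in a few lines of Python. This is a generic illustration, not CiteSeer's implementation or the paper's graph method; the class name `CitationIndex` and the paper identifiers are hypothetical.

```python
from collections import defaultdict


class CitationIndex:
    """Stores citation links in both directions."""

    def __init__(self):
        self._cites = defaultdict(set)      # paper -> papers it cites
        self._cited_by = defaultdict(set)   # paper -> papers that cite it

    def add_citation(self, citing, cited):
        self._cites[citing].add(cited)
        self._cited_by[cited].add(citing)

    def backward(self, paper):
        """Navigate backward in time: the works this paper cites."""
        return sorted(self._cites.get(paper, ()))

    def forward(self, paper):
        """Navigate forward in time: later works that cite this paper."""
        return sorted(self._cited_by.get(paper, ()))


index = CitationIndex()
index.add_citation("paper_2005", "paper_1998")
index.add_citation("paper_2011", "paper_1998")
index.add_citation("paper_2011", "paper_2005")
print(index.backward("paper_2011"))  # ['paper_1998', 'paper_2005']
print(index.forward("paper_1998"))   # ['paper_2005', 'paper_2011']
```

The sketch also makes the limitation discussed in the abstract visible: two papers with no explicit citation between them are never linked, however related their content is.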

A Study on the Advanced Electronic Book System Based in Web (웹기반의 전자원문 관리 시스템에 관한 연구)

  • Nam, Young-Joon;Jeong, Eui-Seob;Yoo, Jae-Young;Cho, Hyun-Yang
    • Journal of the Korean BIBLIA Society for library and Information Science / v.16 no.2 / pp.139-156 / 2005
  • In this paper, we design and implement an electronic book system that provides a web-based interface for ebooks. The aim of this study is to optimize the reading and management of electronic text for its users (readers and librarians). The advanced functions of the electronic book system are the following: 1) The system does not depend on specific software or tools. 2) The system can minimize images (tables, images, icons, etc.) to improve the meaning and readability of information. 3) The system can reduce the effort of index extraction and of constructing the table of contents. 4) The system can collect, from various points of view, the user log files created while an ebook is read. 5) During reading, the system applies DRM by decoding and encoding the ebook.

Enhanced Method for Person Name Retrieval in Academic Information Service (학술정보서비스에서 인명검색 고도화 방법)

  • Han, Hee-Jun;Yae, Yong-Hee;You, Beom-Jong
    • The Journal of the Korea Contents Association / v.10 no.2 / pp.490-498 / 2010
  • Whether on the web or not, all academic information has a creator that produces it. The creator can be an individual, organization, institution, or country. Most information consists of a title, author, and content. An article is described by title, author, keywords, abstract, publisher, ISSN (International Standard Serial Number), and so on, and patent information consists of metadata such as invention title, applicant, inventors, agents, application number, and claim items. Most web-based academic information services provide search functions to users by processing these metadata, and search on the author field is important. In this paper, we propose effective index management for person name search, and search techniques that use a boosting factor and a near operation based on phrase search to improve the precision of search results. We also describe person name retrieval results that include alternative name expressions, co-authors, and persons in the same research field. The approach presented in this paper efficiently provides users with accurate data and additional search results.

Digital Competencies Required for Information Science Specialists at Saudi Universities

  • Yamani, Hanaa;AlHarthi, Ahmed;Elsigini, Waleed
    • International Journal of Computer Science & Network Security / v.21 no.2 / pp.212-220 / 2021
  • The objectives of this research were to identify the digital competencies required for information science specialists at Saudi universities and to examine whether there existed conspicuous differences in the standpoint of these specialists due to years of work experience with regard to the importance of these competencies. A descriptive analytical method was used to accomplish these objectives while extracting the required digital competency list and ascertaining its importance. The research sample comprised 24 experts in the field of information science from several universities in the Kingdom of Saudi Arabia. The participants in the sample were asked to complete a questionnaire prepared to acquire the pertinent data in the period between January 5, 2021 and January 20, 2021. The results reveal that the digital competencies required for information science specialists at Saudi universities encompass general features such as the ability to use computer, Internet, Web2, Web3, and smartphone applications, digital learning resource development, data processing (big data) and its sharing via the Internet, system analysis, dealing with multiple electronic indexing applications and learning management systems and its features, using electronic bibliographic control tools, artificial intelligence tools, cybersecurity system maintenance, ability to comprehend and use different programming languages, simulation, and augmented reality applications, and knowledge and skills for 3D printing. Furthermore, no statistically significant differences were observed between the mean ranks of scores of specialists with less than 10 years of practical experience and those with practical experience of 10 years or more with regard to conferring importance to digital competencies.

Development of TFSCAN, a Search Program (TFSCAN 검색 프로그램 TFSCAN의 개발)

  • Lee, Byung-Uk;Park, Kie-Jung;Kim, Ki-Bong;Park, Wan;Park, Yong-Ha
    • Microbiology and Biotechnology Letters / v.24 no.3 / pp.371-375 / 1996
  • TFD is a transcription factor database that consists of short functional DNA sequences, called signals, and their references. SIGNAL SCAN, developed by Dan S. Prestridge, is used to determine which TFD signals may exist in a DNA sequence. This program searches the TFD database using a simple character-string comparison algorithm. We developed TFSCAN, which aims to search for signals in an input DNA sequence more efficiently than SIGNAL SCAN. Our algorithm consists of two parts: one constructs an automaton by scanning the sequences of TFD, and the other searches for signals through this automaton. The time needed to search for signal-related references is greatly reduced by using an indexing method. TFSCAN is very simple to use, and its output is easy to read. To make TFSCAN available, we developed and installed a TFSCAN input form and a CGI program on the GINet Web server. The automaton-based algorithm showed a drastic improvement in computing time. This approach may be applicable to recognizing several biological patterns. We continue to develop our algorithm to optimize the automaton and to search more sensitively for signals.
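
The abstract does not say which automaton TFSCAN builds. As an illustration of the two-part design (build an automaton from the signal set, then scan the input once), the Python sketch below uses the well-known Aho-Corasick construction as a stand-in. The function names and the example signals are made up for the illustration and are not TFD entries.

```python
from collections import deque


def build_automaton(patterns):
    """Part 1: build a trie with failure links (Aho-Corasick) from the patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    queue = deque(goto[0].values())   # depth-1 states keep fail link 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit matches ending here
    return goto, fail, out


def find_signals(sequence, automaton):
    """Part 2: scan the sequence once and report (start_position, pattern)."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(sequence):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


automaton = build_automaton(["TATA", "ATA", "GC"])
print(find_signals("GTATAGC", automaton))
# [(1, 'TATA'), (2, 'ATA'), (5, 'GC')]
```

Unlike comparing each signal against the sequence separately, the scan reads every input character once regardless of how many signals are in the set, which is the kind of saving the abstract reports.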

Shape-Based Leaf Image Retrieval System (모양 기반의 식물 잎 이미지 검색 시스템)

  • Nam Yun-Young;Hwang Een-Jun
    • The KIPS Transactions:PartD / v.13D no.1 s.104 / pp.29-36 / 2006
  • In this paper, we present a leaf image retrieval system that represents and retrieves leaf images based on their shape. For more effective representation of leaf images, we improve an existing MPP algorithm. In addition, to reduce the response time, we propose a new dynamic matching algorithm that basically revises the Nearest Neighbor search. The system provides users with an interface for uploading query images and with tools for generating queries based on shape features, and it retrieves images based on their similarity. For convenience, users can easily query images by sketching a leaf shape or leaf arrangement on the web. In the experiment, we constructed an image database of Korean native plants and measured the system performance by counting the number of similar images retrieved for queries.

Equivalence Heuristics for Malleability-Aware Skylines

  • Lofi, Christoph;Balke, Wolf-Tilo;Guntzer, Ulrich
    • Journal of Computing Science and Engineering / v.6 no.3 / pp.207-218 / 2012
  • In recent years, the skyline query paradigm has been established as a reliable method for database query personalization. While early efficiency problems have been solved by sophisticated algorithms and advanced indexing, new challenges in skyline retrieval effectiveness continuously arise. In particular, the rise of the Semantic Web and linked open data leads to personalization issues where skyline queries cannot be applied easily. We addressed the special challenges presented by linked open data in previous work, and now further extend this work with a heuristic workflow to boost efficiency. This is necessary because the new view on linked open data dominance has serious implications for the efficiency of the actual skyline computation, since transitivity of the dominance relationships is no longer granted. Therefore, our contributions in this paper can be summarized as follows: we present an intuitive skyline query paradigm to deal with linked open data; we provide an effective dominance definition and establish its theoretical properties; we develop innovative skyline algorithms to deal with the resulting challenges; and we design efficient heuristics for the case of predicate equivalences that may often happen in linked open data. We extensively evaluate our new algorithms with respect to performance and the enriched skyline semantics.
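
For readers unfamiliar with the term, the classic skyline query that this work builds on can be sketched as follows. This is the standard Pareto-dominance definition with a naive quadratic scan, not the malleability-aware dominance or the heuristics proposed in the paper; the hotel data and the assumption that smaller values are better are illustrative only.

```python
def dominates(a, b):
    """a dominates b if it is no worse everywhere and strictly better somewhere.

    Assumes smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance_to_beach).
hotels = [(50, 8), (80, 2), (60, 9), (90, 3), (70, 5)]
print(skyline(hotels))  # [(50, 8), (80, 2), (70, 5)]
```

Classic dominance is transitive, which efficient skyline algorithms exploit to prune candidates early. The abstract's point is that this property is no longer granted under its linked open data dominance, so such pruning cannot be assumed.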

A Comparative Study of WWW Search Engine Performance (WWW 탐색도구의 색인 및 탐색 기능 평가에 관한 연구)

  • Chung Young-Mee;Kim Seong-Eun
    • Journal of the Korean Society for Library and Information Science / v.31 no.1 / pp.153-184 / 1997
  • The importance of WWW search services is increasing as Internet information resources explode. Nine current search services were first evaluated by descriptively comparing their features for indexing, searching, and ranking of search results. Second, a couple of search queries were used to evaluate the search performance of those services by three measures: retrieval effectiveness, the degree of overlap in the sites searched, and the degree of similarity between services. In this experiment, Alta Vista, HotBot, and Open Text Index showed better results for retrieval effectiveness. The level of similarity among the nine search services was extremely low.
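
The abstract does not give the formulas behind its measures. As one plausible reading, the Python sketch below computes precision as a retrieval-effectiveness measure and a Jaccard coefficient as the overlap between two services' result lists. Both choices, the function names, and the URLs are assumptions for illustration.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def overlap(results_a, results_b):
    """Jaccard coefficient of two result sets (assumed overlap measure)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


service_a = ["u1", "u2", "u3", "u4"]
service_b = ["u3", "u4", "u5"]
print(precision(service_a, {"u1", "u3"}))  # 0.5
print(overlap(service_a, service_b))       # 0.4
```

An overlap value near zero for most pairs of services would correspond to the extremely low similarity reported in the study.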
