• Title/Summary/Keyword: Semantic Indexing

A Study on Creation and Development of Folksonomy Tags on LibraryThing (폭소노미 태그의 생성과 성장에 관한 연구 - LibraryThing을 중심으로 -)

  • Kim, Dong-Suk; Chung, Yeon-Kyoung
    • Journal of the Korean Society for Library and Information Science, v.44 no.4, pp.203-230, 2010
  • This study analyzed the development and growth of folksonomy by examining the tags associated with 40 bestsellers on LibraryThing.com at 6-month intervals. It found that the value of tags does not decrease but grows in terms of both quantity and quality. Accordingly, we examined the major significance of the tags and their potential use as an expression of subjects. Our findings were as follows. First, the motivations for tagging can be categorized as personal information for retrieval purposes; self-fulfillment, such as a sense of achievement, the display of emotion, and sharing one's experiences with others; or an altruistic objective that emphasizes sociality, in the hope that one's actions might provide social benefits. According to our analysis, 74.12% of tags had a social motivation. Second, the total number of tags and their frequency of use increased over time. Third, the categories that showed a large increase in tag usage were dates of publication and reading, key words, main characters, and book reviews; tags related to subjects accounted for the highest proportion. Fourth, with respect to the Library of Congress Subject Headings (LCSH), multiple genres, key words, and main characters were assigned to books as tags, and more specific key words and other properties were added as time progressed. There was also a slight increase in the number of tags consistent with LCSH. Fifth, key tags could serve as a compilation of terms that reflects the knowledge base of the corresponding era. Thus, folksonomy should be continuously monitored for the quantitative and qualitative development of its tags, in order to remedy its formative disadvantages and identify its internal semantic significance, and it should be actively used in conjunction with taxonomy as a flexible compilation of terms that incorporates the history of a specific era.
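The growth measurements described above (tag totals per 6-month snapshot and the number of tags consistent with LCSH) can be illustrated with a minimal Python sketch. It assumes each snapshot is a flat list of tag applications for one book and that "consistent with LCSH" means an exact, case-insensitive match against a hypothetical list of heading terms; the study's actual coding procedure is not described here, and real LCSH headings (for example, those with subdivisions) would need more careful matching. The function `snapshot_stats` and the sample data are hypothetical.

```python
from collections import Counter


def snapshot_stats(tag_uses: list[str], lcsh_terms: set[str]) -> dict[str, int]:
    """Summarize one snapshot of a book's tags (one list entry per tag application)."""
    # Normalize case and whitespace so "Fiction" and "fiction" count as one tag.
    counts = Counter(tag.strip().lower() for tag in tag_uses)
    # Assumption: "consistent with LCSH" = exact, case-insensitive term match.
    lcsh = {term.lower() for term in lcsh_terms}
    return {
        "total_uses": sum(counts.values()),
        "distinct_tags": len(counts),
        "lcsh_matches": sum(1 for tag in counts if tag in lcsh),
    }


# Hypothetical snapshots of one book's tags taken six months apart.
lcsh_terms = {"Vampires", "Romance"}
first = ["fiction", "Fiction", "vampires", "to-read"]
second = first + ["romance", "vampires", "main character", "read 2010"]
for label, snapshot in (("t0", first), ("t0 + 6 months", second)):
    print(label, snapshot_stats(snapshot, lcsh_terms))
```

Comparing the two dictionaries shows growth in total uses, distinct tags, and LCSH-consistent tags between snapshots, which is the kind of change the study tracked over time.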

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan; Jeon, Ho-Cheol; Choi, Joong-Min
    • Journal of Intelligence and Information Systems, v.17 no.3, pp.63-77, 2011
  • The World Wide Web is a very large distributed digital information space. Since its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries, and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck; in fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. Retrieval effectiveness, however, is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user's query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output. Current search tools also cannot retrieve, from the gigantic collection of documents, the documents related to a retrieved document. The most important problem for many current search systems is therefore to improve the quality of search: to provide related documents and to keep the number of unrelated documents in the search results as low as possible. To address this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) of articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are useful for a number of purposes, including literature search and analysis of the academic literature. In this setting, references contained in academic articles give credit to previous work in the literature and provide a link between the "citing" and "cited" articles. A citation index indexes the citations that an article makes, linking the articles with the cited works.
Citation indexes were originally designed mainly for information retrieval. The citation links allow the literature to be navigated in unique ways: papers can be located independently of language and of the words in the title, keywords, or document. A citation index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article?). However, CiteSeer cannot index links between articles that researchers do not make, because it indexes only the links that researchers create when they cite other articles. For the same reason, CiteSeer does not scale easily. These problems motivated us to design a more effective search system. This paper presents a method that extracts the subject and predicate of each sentence in a document. Each document is converted into a tabular form in which each extracted predicate is checked against its possible subjects and objects. We build a hierarchical graph of each document from this table and then integrate the graphs of the documents. From the graph of all the documents, the area of each document is calculated relative to the integrated documents, and relations among the documents are marked by comparing these areas. The paper also proposes a method for the structural integration of documents that retrieves documents from the graph, making it easier for users to find information. We compared the performance of the proposed approach with the Lucene search engine using ranking formulas. The proposed approach achieved an F-measure of about 60%, about 15% better than Lucene.
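The citing/cited link structure described in this abstract, with navigation backward in time (cited articles) and forward in time (citing articles), can be sketched as two maps built from each article's reference list. This is a minimal illustration only: it assumes reference lists have already been extracted and resolved to article IDs, and it does not reproduce CiteSeer's ACI, which also has to locate and parse citations automatically. The function `build_citation_index` and the article IDs are hypothetical.

```python
from collections import defaultdict


def build_citation_index(
    references: dict[str, list[str]],
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Build backward (cited) and forward (citing) link maps.

    `references` maps each article ID to the IDs it cites (hypothetical data).
    """
    # Backward in time: the articles each paper cites.
    backward = {article: sorted(set(refs)) for article, refs in references.items()}
    # Forward in time: invert the links to find later papers that cite each one.
    forward: defaultdict[str, set[str]] = defaultdict(set)
    for citing, refs in references.items():
        for cited in refs:
            forward[cited].add(citing)
    return backward, {cited: sorted(citers) for cited, citers in forward.items()}


references = {
    "A": ["B", "C"],
    "B": ["C"],
    "D": ["A", "C"],
    "E": [],  # cites nothing and is cited by nothing
}
backward, forward = build_citation_index(references)
print(backward["A"])   # ['B', 'C']
print(forward["C"])    # ['A', 'B', 'D']
print("E" in forward)  # False
```

Article "E" never appears in the forward map and has an empty backward list: a purely citation-based index cannot relate articles that no researcher has linked, which is the limitation the abstract attributes to CiteSeer.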
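The title places the method on Formal Concept Analysis (FCA), and the predicate-versus-subject/object table described in the abstract resembles an FCA formal context: a binary table of objects and attributes. The sketch below assumes, hypothetically, that predicates are the objects and the subject/object terms checked against them are the attributes. It enumerates all formal concepts by closing the object rows under intersection, then links each concept to its direct subconcepts, which yields a hierarchy (the concept lattice). It is a generic FCA illustration for small tables, not the authors' extraction, area computation, or integration procedure; the function names and the sample table are hypothetical.

```python
from itertools import chain

Concept = tuple[frozenset[str], frozenset[str]]  # (extent, intent)


def formal_concepts(context: dict[str, set[str]]) -> list[Concept]:
    """Enumerate every formal concept of a small binary context.

    `context` maps each object (hypothetically, a predicate) to the set of
    attributes (hypothetically, subject/object terms) checked for it.
    """
    all_attrs = frozenset(chain.from_iterable(context.values()))
    # Every concept intent is an intersection of object rows; the empty
    # intersection is the full attribute set.
    intents = {all_attrs}
    for attrs in context.values():
        row = frozenset(attrs)
        intents |= {row & intent for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object whose row contains the whole intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Most general concepts (largest extents) first, ties broken by name.
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[0])))
    return concepts


def cover_edges(concepts: list[Concept]) -> list[tuple[int, int]]:
    """Return (parent, child) index pairs where child is a direct subconcept."""
    edges = []
    for i, (ext_i, _) in enumerate(concepts):
        for j, (ext_j, _) in enumerate(concepts):
            # Keep the edge only if no concept lies strictly between the two.
            if ext_j < ext_i and not any(
                ext_j < ext_k < ext_i for ext_k, _ in concepts
            ):
                edges.append((i, j))
    return edges


# Hypothetical predicate -> subject/object table for one document.
context = {
    "write": {"author", "paper"},
    "cite": {"paper", "article"},
    "publish": {"author", "paper", "journal"},
}
concepts = formal_concepts(context)
for idx, (extent, intent) in enumerate(concepts):
    print(idx, sorted(extent), sorted(intent))
print(cover_edges(concepts))
```

With this sample table the most general concept groups all three predicates under the shared term "paper", and the cover edges form the parent-child links of the hierarchy. The naive enumeration is exponential in the worst case, so a real system would use a dedicated FCA algorithm for larger tables.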