• Title/Summary/Keyword: web Indexing

Search results: 113

Semantic Conceptual Relational Similarity Based Web Document Clustering for Efficient Information Retrieval Using Semantic Ontology

  • Selvalakshmi, B; Subramaniam, M; Sathiyasekar, K
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.9 / pp.3102-3119 / 2021
  • In today's rapidly growing web, web publishing is largely about providing access to web resources. As the web grows, search engines face many challenges both in indexing web pages and in producing results for user queries. The web document clustering methods discussed in the literature struggle to achieve high clustering accuracy. This problem is mitigated by the proposed Semantic Conceptual Relational Similarity (SCRS) based clustering algorithm, which measures similarity by considering a document's relationships in two ways: first, the number of semantic relations of a document class that the input document covers, and second, the number of conceptual relations the input document covers toward a document class. Given a data set Ds, the method estimates the SCRS measure of each document Di toward each available document class; the class with the maximum SCRS is identified, and the document is indexed under that class. The SCRS measure is computed from the semantic relevance of the input document to each document of a class. Similarly, a Query Relational Semantic Score (QRSS) is computed for the input query toward each document class. Based on the QRSS, the matching document class is identified, and its documents are retrieved and ranked by QRSS to produce the final result. In both cases, the semantic measures are estimated from the concepts available in a semantic ontology. The proposed method produced efficient indexing results and also improved search efficiency.
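
The abstract describes SCRS only as two relation counts, so the Python below is a minimal sketch under stated assumptions, not the authors' algorithm. Concepts are plain strings, the ontology is a hypothetical dictionary of related concepts, and SCRS is assumed to be the unnormalized sum of (1) target concepts the source covers directly and (2) ontology relations leading from source concepts into the target. QRSS is assumed to reuse the same scoring for a query. Documents are indexed under the argmax class; retrieval picks the best class for the query and ranks its documents.

```python
# Hypothetical toy ontology: concept -> set of concepts it is related to.
ONTOLOGY = {
    "crawler": {"index", "web"},
    "index": {"search", "crawler"},
    "ontology": {"semantic", "concept"},
    "semantic": {"ontology", "concept"},
}


def related(concept):
    """Concepts linked to `concept` in the toy ontology (empty set if unknown)."""
    return ONTOLOGY.get(concept, set())


def scrs(source, target):
    """Assumed SCRS form: semantic coverage plus conceptual relation count.

    source: concept set of the input document (or query)
    target: concept set of a document class (or of one indexed document)
    """
    # Part 1: concepts of the target that the source covers directly.
    semantic = len(target & source)
    # Part 2: ontology relations leading from a source concept into the target.
    conceptual = sum(len(related(c) & target) for c in source)
    return semantic + conceptual


# Assumption: QRSS reuses the same scoring, applied to the query's concepts.
qrss = scrs


def index_document(doc_concepts, classes):
    """Return the name of the class with the maximum SCRS."""
    return max(classes, key=lambda name: scrs(doc_concepts, classes[name]))


def build_index(documents, classes):
    """Assign every document to its best class: {class: {doc_id: concepts}}."""
    index = {name: {} for name in classes}
    for doc_id, concepts in documents.items():
        index[index_document(concepts, classes)][doc_id] = concepts
    return index


def retrieve(query_concepts, classes, index):
    """Pick the class with the highest QRSS, then rank its documents by QRSS."""
    best = max(classes, key=lambda name: qrss(query_concepts, classes[name]))
    docs = index[best]
    return sorted(docs, key=lambda d: qrss(query_concepts, docs[d]), reverse=True)


# Hypothetical classes and documents, described by their concepts.
CLASSES = {
    "search": {"crawler", "index", "search", "web"},
    "semantic_web": {"ontology", "semantic", "concept"},
}
DOCS = {"d1": {"crawler", "web"}, "d2": {"index", "search"}, "d3": {"ontology"}}
INDEX = build_index(DOCS, CLASSES)
print(retrieve({"crawler"}, CLASSES, INDEX))
```

With this toy data, `d1` and `d2` are indexed under `search`, `d3` under `semantic_web`, and the query `{"crawler"}` returns `['d1', 'd2']`. A production version would probably need to normalize the scores so that classes with many concepts do not dominate.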

Index Ontology Repository for Video Contents (비디오 콘텐츠를 위한 색인 온톨로지 저장소)

  • Hwang, Woo-Yeon; Yang, Jung-Jin
    • Journal of Korea Multimedia Society / v.12 no.10 / pp.1499-1507 / 2009
  • With the abundance of digital content, precise indexing technology is consistently required. To meet this requirement, intelligent software entities need to become subjects of information retrieval, and interoperability among intelligent entities, including humans, must be supported. In this paper, we analyze the unifying framework for multimodal indexing proposed by Snoek and Worring. Our work investigates ways to improve the authenticity of indexing information in content-based automated indexing techniques. It supports the creation and control of abstracted, high-level indexing information through the ontological concepts of Semantic Web technologies. Moreover, it attempts to present a fundamental model that allows interoperability between humans and machines and between machines. A memory-resident model of ontology processing is inappropriate for handling an enormous amount of indexing information; an ontology repository and an inference engine are required for consistent retrieval and reasoning over logically expressed knowledge. We present an experiment that stores and retrieves the designed knowledge using the Minerva ontology repository, which shows that the approach satisfies the efficiency requirements. Finally, the potential for efficient indexing is discussed in relation to other research.
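
The abstract argues for a persistent ontology repository with inference instead of a memory-resident model. Minerva itself is not shown here; as a swapped-in stand-in, the sketch below uses Python's built-in `sqlite3` to keep (subject, predicate, object) triples in a table and answers class queries with a recursive SQL query that follows `subClassOf` links, a very small slice of RDFS-style reasoning. The predicate names, event classes, and shot IDs are hypothetical.

```python
import sqlite3


def open_repository(path=":memory:"):
    """Open a triple table; pass a file path instead of ":memory:" for an on-disk store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        "s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL, UNIQUE (s, p, o))"
    )
    return conn


def add(conn, s, p, o):
    """Store one (subject, predicate, object) statement, ignoring duplicates."""
    conn.execute("INSERT OR IGNORE INTO triples (s, p, o) VALUES (?, ?, ?)", (s, p, o))


def instances_of(conn, cls):
    """Subjects typed as `cls` or any transitive subclass of it.

    The recursive CTE performs a simple subClassOf inference inside the store,
    so the class hierarchy never has to be loaded into application memory.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE sub(c) AS (
            SELECT ?
            UNION
            SELECT t.s FROM triples AS t JOIN sub ON t.o = sub.c
            WHERE t.p = 'subClassOf'
        )
        SELECT DISTINCT t.s FROM triples AS t JOIN sub ON t.o = sub.c
        WHERE t.p = 'type'
        ORDER BY t.s
        """,
        (cls,),
    )
    return [row[0] for row in rows]


# Hypothetical video-indexing knowledge: shot IDs and event classes are illustrative.
conn = open_repository()
add(conn, "Goal", "subClassOf", "SoccerEvent")
add(conn, "PenaltyGoal", "subClassOf", "Goal")
add(conn, "shot_012", "type", "PenaltyGoal")
add(conn, "shot_044", "type", "Goal")
add(conn, "shot_050", "type", "Foul")
print(instances_of(conn, "SoccerEvent"))
```

The query for `SoccerEvent` returns `['shot_012', 'shot_044']` because both shots are typed with subclasses of it, while `shot_050` (a `Foul`) is excluded.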

An Implementation and Performance Evaluation of Fast Web Crawler with Python

  • Kim, Cheong Ghil
    • Journal of the Semiconductor & Display Technology / v.18 no.3 / pp.140-143 / 2019
  • The Internet has expanded constantly and greatly, so that there is now a vast number of web pages that change dynamically. In particular, the rapid development of wireless communication technology and the wide spread of smart devices allow information to be created and changed quickly, anywhere and at any time. In this situation, web crawling (sometimes called web scraping), an organized, automated process that systematically navigates web pages and automatically searches and indexes information, has inevitably come into broad use in many fields. This paper implements a prototype web crawler in Python and improves its execution speed by using threads on a multicore CPU. The implementation was confirmed to work by crawling reference websites, and the performance improvement was verified by evaluating execution speed under different thread configurations on a multicore CPU.
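
The paper's crawler code is not given in the abstract. As an illustrative sketch only, the Python below implements a breadth-first crawler whose page downloads run on a `concurrent.futures.ThreadPoolExecutor`; `workers` plays the role of the thread configuration the paper varies. The regex link extractor, the `fetch` hook, and the page limits are assumptions made to keep the example short and testable without a network.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin
from urllib.request import urlopen

# Deliberately simple link extractor; a real crawler would use an HTML parser.
LINK_RE = re.compile(r'href="([^"#]+)"')


def http_fetch(url, timeout=5.0):
    """Download a page as text (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed, fetch=http_fetch, max_pages=50, workers=8):
    """Breadth-first crawl that fetches each frontier level with a thread pool."""

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:  # a failed page should not stop the whole crawl
            return ""

    seen, pages, frontier = {seed}, {}, [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(pages) < max_pages:
            batch, frontier = frontier[: max_pages - len(pages)], []
            # Fetching is I/O-bound, so threads overlap network waits despite the GIL.
            for url, html in zip(batch, pool.map(safe_fetch, batch)):
                pages[url] = html
                for href in LINK_RE.findall(html):
                    link = urljoin(url, href)
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return pages
```

Timing `crawl(seed, workers=1)` against `crawl(seed, workers=8)` on the same seed is one way to reproduce the kind of thread-count comparison the abstract describes. Because of CPython's global interpreter lock, the speedup comes from overlapping network waits rather than from parallel computation on several cores.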

Implications of Social Tagging for Digital Libraries: Benefiting from User Collaboration in the Creation of Digital Knowledge (디지털 도서관을 위한 소셜 태깅의 의미: 이용자 협력을 활용한 디지털 지식 생성)

  • Choi, Yun-Seon
    • Journal of the Korean Society for Information Management / v.27 no.2 / pp.225-239 / 2010
  • This study asks whether social tagging through user collaboration can be used to create digital knowledge on the web, and whether the quality and efficacy of social tagging can be verified so that its benefits can be obtained. In particular, it examines the inter-indexer consistency of social tagging in comparison with professional indexing, using two different similarity measures, both based on the Vector Space Model, to deal with numerous indexers. The study contributes to the use of social tagging in organizing the web and encourages adopting social knowledge when developing suitable vocabularies for resources newly generated in the digital library environment. Furthermore, the comparative analysis with the two measures produced more credible results, as both measures showed a similar pattern of indexing tendency.
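
The abstract does not name its two Vector Space Model measures, so this sketch assumes cosine similarity over term-frequency vectors. One function averages pairwise consistency among many indexers for a single resource; another compares the pooled social tags against the professional indexer's terms. Function names and sample tags are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mean_pairwise_consistency(indexers):
    """Average cosine over every pair of indexers' term lists for one resource."""
    if len(indexers) < 2:
        raise ValueError("need at least two indexers")
    vectors = [Counter(terms) for terms in indexers]
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def crowd_vs_professional(tags_by_user, professional_terms):
    """Cosine between the pooled social-tag vector and the professional index terms."""
    crowd = Counter(tag for tags in tags_by_user for tag in tags)
    return cosine(crowd, Counter(professional_terms))


# Hypothetical tags from three users and one professional indexer.
users = [["web", "indexing"], ["web", "search"], ["indexing", "web"]]
print(mean_pairwise_consistency(users), crowd_vs_professional(users, ["indexing", "web"]))
```

Pooling tags before comparing lets many casual taggers be treated as one "crowd indexer", which is one simple way to keep the comparison manageable when the number of indexers is large.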

Effective Indexing for Evolving Data Collection by Using Ontology (온톨로지를 이용한 변화하는 데이터의 효과적인 인덱싱 방법)

  • Kim, Jong Wook; Bae, Myung Soo
    • Journal of Korea Multimedia Society / v.17 no.2 / pp.240-247 / 2014
  • Data created and shared on the Web is characterized by a massive amount of user-generated content across various applications and by content that evolves dynamically with user interests. Thus, to benefit from Web data, it is essential to provide (a) mechanisms that enable scalable processing of large data collections and (b) organization schemes that reduce navigational overhead within complex and dynamically growing content. Of these two pressing needs, this paper addresses the second: it develops an indexing scheme that leverages ontologies to reduce the time and effort needed to access relevant information. In particular, considering the evolving nature of Web content, the proposed technique computes, from an existing large ontology, the sub-ontology that best matches a given data collection. Case studies show that the proposed indexing scheme indeed helps organize dynamically evolving content.
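
The abstract does not specify how the best-matching sub-ontology is computed. As one hypothetical approach, not the authors' method, the sketch below prunes a tree-shaped ontology to the concepts whose support in the collection (items tagged with the concept or any of its descendants) reaches a threshold. Because support can only grow toward the root, the kept concepts stay closed under ancestors, so they still form a tree that can be recomputed as the collection evolves. The taxonomy, items, and `min_support` value are illustrative.

```python
from collections import Counter


def ancestors(concept, parent):
    """Yield a concept followed by all of its ancestors (tree-shaped ontology)."""
    while concept is not None:
        yield concept
        concept = parent.get(concept)


def sub_ontology(parent, annotations, min_support=2):
    """Prune a child -> parent map to the concepts the collection supports.

    Support of a concept = number of items tagged with it or any descendant.
    Support never decreases toward the root, so the kept concepts stay closed
    under ancestors and still form a tree.
    """
    support = Counter()
    for concepts in annotations.values():
        # Count each item at most once per concept, even if tags share ancestors.
        support.update({a for c in concepts for a in ancestors(c, parent)})
    keep = {c for c, n in support.items() if n >= min_support}
    return {c: parent.get(c) for c in keep}


# Hypothetical taxonomy and tagged items.
PARENT = {
    "Computing": None,
    "Programming": "Computing",
    "Networks": "Computing",
    "Python": "Programming",
    "Java": "Programming",
}
ITEMS = {"post1": ["Python"], "post2": ["Python", "Java"], "post3": ["Networks"]}
print(sub_ontology(PARENT, ITEMS))
```

With `min_support=2`, the sparsely used `Java` and `Networks` branches are pruned, leaving the `Computing > Programming > Python` path as the index structure for this collection.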

A Study of Retrieval Model Providing Relevant Sentences in Storytelling on Semantic Web (시맨틱 웹 환경에서 적합한 문장을 제공하는 이야기 쓰기 도우미에 관한 연구)

  • Lee, Tae-Young
    • Journal of the Korean Society for Information Management / v.26 no.4 / pp.7-34 / 2009
  • To construct a full-text and sentence retrieval system for storytelling, this study examined the structures of stories, paragraphs, and sentences, as well as the inferences applied to indexing and searching. The designed system consists of a database of stories, paragraphs, and sentences and a knowledge base of inference rules that helps users write stories. The knowledge base comprises files of story frames, paragraph scripts, and sentence logic written in markup languages, such as SWRL, that can operate on the Semantic Web. A more precise indexing language for representing sentences still needs to be established, along with markup languages capable of expressing more accurate inference rules.
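
The knowledge base expresses inference rules in markup such as SWRL. To show what such a Horn-style rule does, here is a minimal Python forward-chaining sketch; it is not a SWRL engine, and the `partOf` and `inStory` predicates and the sentence and paragraph IDs are hypothetical. Variables start with `?`, and rules fire repeatedly until no new facts are derived.

```python
def unify(atom, fact, binding):
    """Extend `binding` so `atom` matches `fact`; return None on mismatch."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):  # variable: bind it, or check an earlier binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return extended


def match_body(body, facts, binding):
    """Yield every binding that satisfies all atoms of a rule body."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            yield from match_body(body[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply (body, head) rules until no new facts can be derived."""
    facts = set(facts)
    while True:
        derived = {
            tuple(binding.get(term, term) for term in head)
            for body, head in rules
            for binding in match_body(body, facts, {})
        }
        if derived <= facts:
            return facts
        facts |= derived


# SWRL-like rule: partOf(?s, ?p) ^ partOf(?p, ?story) -> inStory(?s, ?story)
RULES = [
    ([("partOf", "?s", "?p"), ("partOf", "?p", "?story")], ("inStory", "?s", "?story")),
]
FACTS = {("partOf", "s1", "p1"), ("partOf", "s2", "p1"), ("partOf", "p1", "story1")}
print(sorted(forward_chain(FACTS, RULES) - FACTS))
```

The rule derives that sentences `s1` and `s2` belong to `story1` through their paragraph `p1`, the kind of inference a writing assistant could use to retrieve sentences by story.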

A Study on Analysis of Requirements and Design of IR System for Semantic-based Information Retrieval (시멘틱 검색시스템 구축을 위한 요구사항 분석 및 설계에 관한 연구)

  • Kim, Yong
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.23 no.1 / pp.91-111 / 2012
  • With the rapid expansion of web information, conventional information retrieval techniques are becoming inadequate and often disappoint users, because a couple of simple keywords can easily retrieve too much information. This study aims to develop semantics-based Web information retrieval techniques that improve the understanding of information. To achieve this goal, it analyzes the technologies and current status of research on semantic information retrieval. Based on the resulting requirements, system architecture, and indexing method, the study proposes a system architecture for semantic-based information retrieval.

An Efficient Information Retrieval System for Unstructured Data Using Inverted Index

  • Abdullah Iftikhar; Muhammad Irfan Khan; Kulsoom Iftikhar
    • International Journal of Computer Science & Network Security / v.24 no.7 / pp.31-44 / 2024
  • An inverted index combines keywords with their associated posting lists to index documents. In the modern age, heavy use of technology has increased data volume at a very high rate, and big data is a major concern for researchers; efficient document indexing over big data has become a major challenge. All organizations and web search engines have limited resources, such as space and storage, which are crucial for data management in an information retrieval system, so information retrieval systems need to be very efficient. This research introduces an inverted indexing technique to minimize the delay in retrieving data in an information retrieval system. The inverted index is illustrated, its issues are discussed and resolved by implementing a scalable inverted index, and the existing inverted index algorithm is then compared with the naïve inverted index. The interval lists of the inverted index are stored in primary storage instead of auxiliary memory. Finally, this research proposes an efficient information retrieval system architecture, particularly for large volumes of unstructured data, which have no predefined structure or format.
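
The abstract describes an inverted index as keywords plus posting lists. A minimal Python sketch of that structure, with sorted posting lists and a merge-based AND query, follows. The tokenizer and sample documents are assumptions, and the full-scan function is only a baseline for checking results, not the paper's naïve inverted index or its scalable variant.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a sorted posting list of the document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Linear-time merge of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, query):
    """Ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect shortest lists first so intermediate results stay small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


def full_scan(docs, query):
    """Index-free baseline: check every document's text."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))


DOCS = {1: "Inverted index for big data", 2: "Big data retrieval", 3: "Index compression"}
INDEX = build_inverted_index(DOCS)
print(and_query(INDEX, "big data"))
```

The index answers a query by touching only the posting lists of its terms, whereas the full scan rereads every document; that difference is what lets an inverted index cut retrieval delay as the collection grows.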

An Exploratory Study on the Expansion of Academic Information Services Based on Automatic Semantic Linking Between Academic Web Resources and Information Services (웹 정보의 자동 의미연계를 통한 학술정보서비스의 확대 방안 연구)

  • Jeong, Do-Heon; Yu, So-Young; Kim, Hwan-Min; Kim, Hye-Sun; Kim, Yong-Kwang; Han, Hee-Jun
    • Journal of Information Management / v.40 no.1 / pp.133-156 / 2009
  • In this study, we link informal Web resources to KISTI NDSL's collections through automatic semantic indexing and tagging, to examine the feasibility of a service that recommends related documents based on the similarity between KISTI's formal information resources and informal Web resources. We collect and index Web resources and automatically create semantic links to KISTI's collections through STEAK for NDSL retrieval. The macro precision, which measures retrieval precision per subject category, is 62.6%, and the micro precision, which measures retrieval precision per query, is 66.9%. The experts' evaluation score is 76.7. This study demonstrates the feasibility of semantically linking NDSL retrieval results with Web information resources and of expanding the coverage of information services to informal information resources.
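
The two precision figures average differently: per subject category versus per query. Under one plausible reading of the abstract (an assumption, since the exact formulas are not given), micro precision is the mean of per-query precision, and macro precision first averages within each category and then across categories. The toy results below are hypothetical and do not reproduce the reported 62.6% and 66.9%.

```python
from collections import defaultdict
from statistics import mean


def precision(relevant_retrieved, retrieved):
    """Fraction of retrieved items that are relevant (0 if nothing was retrieved)."""
    return relevant_retrieved / retrieved if retrieved else 0.0


def micro_precision(results):
    """Mean of per-query precision: every query weighs the same."""
    return mean(precision(r, n) for _, r, n in results)


def macro_precision(results):
    """Mean over subject categories of each category's mean query precision."""
    by_category = defaultdict(list)
    for category, r, n in results:
        by_category[category].append(precision(r, n))
    return mean(mean(ps) for ps in by_category.values())


# Hypothetical (category, relevant results, results judged) per query.
RESULTS = [("physics", 8, 10), ("physics", 6, 10), ("biology", 2, 10)]
print(round(micro_precision(RESULTS), 3), round(macro_precision(RESULTS), 3))
```

Here micro precision is about 0.53 and macro precision 0.45: the weak biology category has only one query, yet it counts as much as physics in the macro figure, which is why the two averages can diverge.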

A Study on Christian Website Indexing (기독교 관련 웹 사이트 내 색인에 관한 연구)

  • Yoo, Yeong-Jun
    • Journal of Korean Library and Information Science Society / v.38 no.4 / pp.257-276 / 2007
  • Back-of-book-style indexes serve a function similar to that of back-of-book indexes. Their greatest advantage for information access on the web is that they give direct access to specific subjects of interest. Although back-of-book-style indexes are arranged alphabetically like back-of-book indexes, their entries are linked to content on the site with HTML anchor tags. In this research, I created back-of-book-style indexes in two separate ways: by hand and by semi-automatic indexing. I applied back-of-book-style indexes, which resemble the back-of-book index used as a traditional information organization method in library and information science, in a library setting.
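
The site indexes described here are alphabetical lists whose entries link to content through HTML anchors. A small Python sketch that renders such an index from a term-to-locator mapping follows; the entries, page names, and fragment IDs are hypothetical, and the semi-automatic step (proposing terms from the site's text) is not shown.

```python
from html import escape
from itertools import groupby


def build_site_index(entries):
    """Render an alphabetical, back-of-book-style HTML index.

    `entries` maps an index term to a list of (label, url) locators, where
    url may carry a fragment such as "page.html#section" pointing at an anchor.
    """
    terms = sorted(entries, key=str.casefold)
    lines = ["<dl>"]
    # One letter heading per group, like the letter sections of a printed index.
    for letter, group in groupby(terms, key=lambda t: t[0].upper()):
        lines.append(f"  <dt>{escape(letter)}</dt>")
        for term in group:
            links = ", ".join(
                f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
                for label, url in entries[term]
            )
            lines.append(f"  <dd>{escape(term)}: {links}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


# Hypothetical index entries for a church website.
ENTRIES = {
    "Baptism": [("Sacraments", "sacraments.html#baptism")],
    "atonement": [("Doctrine", "doctrine.html#atonement")],
    "Bible": [("Scripture", "scripture.html"), ("Reading plan", "plans.html#bible")],
}
print(build_site_index(ENTRIES))
```

Sorting with `str.casefold` keeps "atonement" before "Baptism" regardless of capitalization, mirroring the case-insensitive filing order of a printed index.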
