• Title/Summary/Keyword: web structure


Spamming Page Filtering Algorithm Using Web Structure Management (Web Structure Management기법을 이용한 Spamming page filtering algorithm)

  • 신광섭;이우기;강석호
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.238-240 / 2004
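  • With advances in information and communication technology, an enormous amount of information is stored and shared through the World Wide Web. In particular, when users want to obtain information through the WWW, the tool they use most often is a Web search engine. However, because of inaccuracies in the search engines' algorithms themselves and because of maliciously crafted Web pages, search engine results often fail to match what users are looking for. In this paper, we analyze these problems with a focus on Web structure management techniques among the various Web search algorithms, and we present a modified algorithm that can resolve them. Finally, we demonstrate the approach by walking through the process by which the proposed algorithm filters spamming pages.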


Optimization Model on the World Wide Web Organization with respect to Content Centric Measures (월드와이드웹의 내용기반 구조최적화)

  • Lee Wookey;Kim Seung;Kim Hando;Kang Sukho
    • Journal of the Korean Operations Research and Management Science Society / v.30 no.1 / pp.187-198 / 2005
  • A well-organized Web site structure can keep search robots and crawling agents from getting lost in the huge forest of Web pages. We formalize a view of the World Wide Web and generalize it as a hierarchy of Web objects: the Web as a set of Web sites, and a Web site as a directed graph of Web nodes and Web edges. Our approach yields the optimal hierarchical structure that maximizes the tf-idf (term frequency and inverse document frequency) weight, one of the most widely accepted content-centric measures in the information retrieval community, so that the measure can be used to embody the semantics of a search query. The experimental results show that the optimization model is an effective alternative to conventional heuristic approaches in the dynamically changing Web environment.
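
As a rough illustration of the tf-idf weight named in this abstract, the sketch below computes one common variant (length-normalized term frequency times the log of inverse document frequency) over a few hypothetical page texts. The paper's exact weighting and its optimization model are not described here, so the function name `tf_idf` and the sample pages are assumptions made for illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df),
    which is one common variant; the paper may use a different one.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical page texts, for illustration only.
pages = ["web structure mining", "web usage mining", "fiber web image"]
scores = tf_idf(pages)
```

A term that appears in every page (here "web") gets weight zero, while a term unique to one page ("structure") outweighs one shared by two pages ("mining").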

Web Navigation Mining by Integrating Web Usage Data and Hyperlink Structures (웹 사용 데이타와 하이퍼링크 구조를 통합한 웹 네비게이션 마이닝)

  • Gu Heummo;Choi Joongmin
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.416-427 / 2005
  • Web navigation mining is a method of discovering Web navigation patterns by analyzing Web access log data. However, log data is known to contain noisy information that leads to incorrect recognition of user navigation paths on the Web's hyperlink structure. As a result, previous Web navigation mining systems that relied solely on log data have not performed well in discovering correct Web navigation patterns efficiently, mainly because of the complex pre-processing procedure. To resolve this problem, this paper proposes a technique that combines the Web's hyperlink structure information with Web access log data to discover navigation patterns correctly and efficiently. Our implemented Web navigation mining system, called SPMiner, produces a WebTree from the hyperlink structure of a Web site, which is used to eliminate possible noise in the Web log data caused by users' abnormal navigational activities. By using the structure of the Web, SPMiner substantially reduces the pre-processing overhead and, as a result, can analyze users' search patterns efficiently.
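
The general idea of checking logged paths against a site's hyperlink structure can be sketched as follows. This is not SPMiner's WebTree algorithm, whose details the abstract does not give; it is a simplified assumption in which any logged transition that does not correspond to a known hyperlink is treated as noise and starts a new path segment. The function name, the `links` mapping, and the sample log are hypothetical.

```python
def split_on_invalid_links(path, links):
    """Split a logged page sequence wherever a transition has no hyperlink.

    `links` maps each page to the set of pages it links to.
    """
    if not path:
        return []
    segments = [[path[0]]]
    for prev, cur in zip(path, path[1:]):
        if cur in links.get(prev, set()):
            # The transition follows a real hyperlink: extend the segment.
            segments[-1].append(cur)
        else:
            # No such hyperlink: treat as a jump and start a new segment.
            segments.append([cur])
    return segments

# Hypothetical site structure and logged session.
links = {"/": {"/a", "/b"}, "/a": {"/a1"}, "/b": {"/b1"}}
logged = ["/", "/a", "/a1", "/b", "/b1"]
segments = split_on_invalid_links(logged, links)
```

Here the jump from `/a1` to `/b` has no matching hyperlink, so the session is split into two navigation paths.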

An Efficient Candidate Pattern Storage Tree Structure and Algorithm for Incremental Web Mining (점진적인 웹 마이닝을 위한 효율적인 후보패턴 저장 트리구조 및 알고리즘)

  • Kang, Hee-Seong;Park, Byung-Jun
    • Proceedings of the KIEE Conference / 2006.04a / pp.3-5 / 2006
  • Recent advances in Internet infrastructure have resulted in a large number of huge Web sites and portals worldwide. These Web sites are visited by various types of users in many different ways. Among all the Web page access sequences from different users, some occur so frequently that they may deserve attention from interested parties. We call these frequent access patterns, and we call access sequences that may become frequent candidate patterns. Since candidate patterns play an important role in incremental Web mining, it is important to generate, add, delete, and search for them efficiently. This thesis presents a novel tree structure that can efficiently store the candidate patterns, together with a related set of algorithms for generating the tree structure, adding new patterns, deleting unnecessary patterns, and searching for needed ones. The proposed tree structure has a kind of three-dimensional link structure, and its nodes are layered.


Characterization of nano-fiber web structures using a morphological image processing

  • Kim, Jooyong;Lee, Jung-Hae
    • Proceedings of the Korean Fiber Society Conference / 2003.10a / pp.100-100 / 2003
  • An image processing algorithm has been developed to analyze nanofiber web images obtained from a high-magnification microscope. Precise pore detection on thick webs is known to be extremely difficult, mainly because of the lack of light uniformity, the difficulty of fine focusing, and the translucency of the nanofiber web. The pore detection algorithm developed here showed excellent performance in characterizing the porous structure, making it a promising tool for on-line quality control systems in mass production. Since images obtained from an optical microscope represent only the web surface, a scale factor has been introduced to estimate the web structure as a whole. The resulting web structures have been compared with those obtained by mercury porosimetry, especially in terms of pore size distribution. The two structures show a strong correlation, indicating that scaling a single-layer web structure can be an effective way of estimating the structure of thick fiber webs.


Web-Based Simulation under Distributed Environment (분산 환경하에서의 웹기반 시뮬레이션에 관한 연구)

  • 이영해
    • Journal of the Korea Society for Simulation / v.7 no.2 / pp.79-90 / 1998
  • This paper introduces the concept of web-based simulation and suggests a structure for web-based simulation that reduces simulation run time and performs simulations efficiently in distributed environments. Since its introduction in 1996, web-based simulation has been studied only with applets as the tool, but this paper uses server applications that serve client applets. In the server application, the server transfers objects requested by clients, such as simulation engines, reports, and files. After each client connects to the web server, the server allocates simulation modules to the connected clients. This work extends the transfer of applets from the server and of simulation models made by clients. This paper also presents a structure for efficiently managing web-based simulation in a distributed environment, the steps by which clients connect, model, and simulate with the distributed structure, and programs for the proposed structure.


The Structure of a Web site and Navigability (웹 사이트의 구조와 항해가능성)

  • Min, Kyung-Sil;Chun, Sung-Kyu;Jang, Gi-Ho;Jung, Hyo-Sook;Park, Seong-Bin
    • The Journal of Korean Association of Computer Education / v.14 no.3 / pp.51-62 / 2011
  • Navigability refers to how easily a user can find desired information in a Web site, and it is influenced by the structure of the Web site. In this paper, we created three types of Web sites: a Web site whose structure forms a small world, a Web site whose structure forms a semi-matroid, and a Web site based on an ontology. We measured the navigability of each Web site using two criteria: the number of hyperlinks clicked by users to find the desired information and the elapsed time to find it. We selected these three structures because, in each of them, hyperlinks can be created in a way that helps a user find desired information. From the experiments, we found that the average number of hyperlinks a user clicked to find the desired information was ordered as follows: a Web site that had the semi-matroid property (100.37 hyperlinks) < a Web site that was created based on an ontology (117.63 hyperlinks) < a Web site that had the small-world property (236.17 hyperlinks). In addition, the average elapsed time for a user to find the desired information was ordered as follows: a Web site that was created based on an ontology (20 min 26 sec) < a Web site that had the semi-matroid property (23 min 6 sec) < a Web site that had the small-world property (30 min 47 sec). Therefore, a Web site created based on a semi-matroid or an ontology can be considered relatively more navigable than a Web site that has the small-world property. We also propose a way to reflect our experimental results in the design of an educational Web site.


Web Structure Mining by Extracting Hyperlinks from Web Documents and Access Logs (웹 문서와 접근로그의 하이퍼링크 추출을 통한 웹 구조 마이닝)

  • Lee, Seong-Dae;Park, Hyu-Chan
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.11 / pp.2059-2071 / 2007
  • If the correct structure of a Web site is known, the information provider can discover users' behavior patterns and characteristics to offer better services, and users can find useful information easily and accurately. Extracting the exact structure of a Web site can be difficult, however, because documents on the Web tend to change frequently. This paper proposes a new method for extracting such Web structure automatically. The method consists of two phases. The first phase extracts the hyperlinks among Web documents and then constructs a directed graph to represent the structure of the Web site. This phase is limited, however, in that it cannot discover hyperlinks embedded in Flash and Java applets. The second phase finds such hidden hyperlinks by using the Web access log. It first extracts the click streams from the access log and then extracts the hidden hyperlinks by comparing the click streams with the directed graph. Several experiments have been conducted to evaluate the proposed method.
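
The second phase described in this abstract, comparing click streams with the directed graph, can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' method: the graph is represented as a dict of page-to-linked-pages sets, every consecutive pair in a click stream is assumed to be a genuine navigation step, and the names `hidden_links`, `graph`, and `streams` are hypothetical.

```python
def hidden_links(click_streams, graph):
    """Return transitions seen in click streams but absent from the graph.

    `graph` maps each page to the set of pages reached by hyperlinks
    extracted from the documents (phase one). Transitions missing from it
    are candidates for hidden hyperlinks, e.g. ones embedded in applets.
    """
    found = set()
    for stream in click_streams:
        for src, dst in zip(stream, stream[1:]):
            if dst not in graph.get(src, set()):
                found.add((src, dst))
    return found

# Hypothetical extracted graph and click streams.
graph = {"index": {"news", "about"}, "news": {"index"}}
streams = [["index", "news", "index", "game"], ["index", "about"]]
candidates = hidden_links(streams, graph)
```

In this sample only the transition from `index` to `game` is missing from the extracted graph, so it is reported as a candidate hidden hyperlink. A real system would also need to filter out transitions caused by bookmarks or typed URLs.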

Extracting Logical Structure from Web Documents (웹 문서로부터 논리적 구조 추출)

  • Lee Min-Hyung;Lee Kyong-Ho
    • Journal of Korea Multimedia Society / v.7 no.10 / pp.1354-1369 / 2004
  • This paper presents a logical structure analysis method that transforms Web documents into XML documents. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a logical structure more accurately, the method defines a document model that can describe the logical structure information of a topic-specific document class. Because the method is based on the visual structure produced by the visual grouping phase as well as on a document model that describes the logical structure information of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method performs logical structure analysis successfully compared with previous work. In particular, the method generates XML documents as the result of structure analysis, which enhances the reusability of documents.


Design and Implementation of a WebEditor Specialized for Web-Site Maintenance (유지보수에 특화된 웹 문서 작성기의 설계 및 구현)

  • Cho, Young-Suk;Kwon, Yong-Ho;Do, Jae-Su
    • Convergence Security Journal / v.7 no.4 / pp.73-81 / 2007
  • Users of the World Wide Web (Web) have difficulty retrieving pertinent information because of the growing amount of information provided by Web sites and the complex structure of Web documents, which are continuously created, deleted, restructured, and updated. Web providers tend to put less effort into maintaining their sites than into creating them because of the expense maintenance requires. If information about the relationships among Web documents and their validity is provided to Web managers as well as Web developers, they can better serve users. To grasp the whole structure of a Web site and to verify the validity of hyperlinks, the hyperlinks in Web documents must be traversed and analyzed, which provides information for effective and efficient creation and maintenance of the Web. In this paper, we introduce a Web Editor specialized for Web maintenance. We emphasize two aspects: first, the analysis of HTML tags to extract hyperlink information, and second, the establishment of relationships among hyperlinked documents and the verification of their validity.

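
The two aspects emphasized in the last abstract, extracting hyperlinks from HTML tags and checking their validity, can be sketched with Python's standard `html.parser` and `urllib.parse` modules. This is only an illustrative assumption about how such a check might look, not the Web Editor's implementation: validity is checked against a hypothetical in-memory set of known pages rather than by fetching URLs, and the names `LinkExtractor` and `broken_links` are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the document's URL.
                    self.links.append(urljoin(self.base_url, value))

def broken_links(html, base_url, existing_pages):
    """Return extracted links that are not in the set of known pages."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return [link for link in parser.links if link not in existing_pages]

# Hypothetical document and site inventory.
html = '<p><a href="about.html">About</a> <a href="/old.html">Old</a></p>'
existing = {"http://example.com/about.html"}
missing = broken_links(html, "http://example.com/index.html", existing)
```

In this sample the link to `/old.html` resolves to a page that is not in the inventory, so it is reported as invalid.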