OLAP System and Performance Evaluation for Analyzing Web Log Data (웹 로그 분석을 위한 OLAP 시스템 및 성능 평가)

  • 김지현;용환승
    • Journal of Korea Multimedia Society / v.6 no.5 / pp.909-920 / 2003
  • Information technology for CRM has been developing rapidly. Typical techniques include statistical analysis tools, on-line multidimensional analytical processing (OLAP) tools, and data mining algorithms (such as neural networks, decision trees, and association rules). Among customer data, web log data is especially important, and OLAP technology can be applied to analyze it multidimensionally. To build an OLAP cube, multidimensional summary results must be precalculated to obtain fast responses. However, as the number of dimensions and sparse cells increases, data explosion becomes severe and OLAP performance degrades. In this paper, we explain why sparsity arises in web log data and which sparsity patterns occur in two and three dimensions for OLAP. Based on this analysis, we set up multidimensional data models and query models for a benchmark covering each sparsity pattern. Finally, we evaluate the performance of three OLAP systems (MS SQL 2000 Analysis Service, Oracle Express, and C-MOLAP).

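The sparsity problem described in the abstract above can be illustrated with a minimal Python sketch that counts how many cells of a base cuboid are actually occupied. The `cube_density` function and the sample `records` are illustrative assumptions; they are not taken from the paper or from any of the benchmarked OLAP systems.

```python
def cube_density(facts, dims):
    """Return (occupied_cells, possible_cells, density) for the base cuboid.

    facts: list of dicts, one per log record (hypothetical layout)
    dims:  names of the dimensions that span the cube
    """
    # Possible cells = product of the number of distinct values per dimension.
    possible = 1
    for d in dims:
        possible *= len({row[d] for row in facts})
    # Occupied cells = distinct coordinate tuples that appear in the data.
    occupied = len({tuple(row[d] for d in dims) for row in facts})
    return occupied, possible, occupied / possible


# Hypothetical web log records.
records = [
    {"page": "/home", "hour": 9, "referrer": "search"},
    {"page": "/home", "hour": 10, "referrer": "direct"},
    {"page": "/cart", "hour": 9, "referrer": "search"},
    {"page": "/item", "hour": 11, "referrer": "ad"},
]

two_d = cube_density(records, ["page", "hour"])
three_d = cube_density(records, ["page", "hour", "referrer"])
```

With these sample records, adding the third dimension raises the number of possible cells from 9 to 27 while the number of occupied cells stays at 4, which is the kind of growth in empty cells that the abstract refers to.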

Web Log Mining for Adaptive Web Sites (적응형 웹 사이트를 위한 웹 로그 마이닝)

  • Ko, Kyong-Ja;Kim, In-Cheol
    • Proceedings of the Korea Information Processing Society Conference / 2001.04a / pp.325-328 / 2001
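  • This paper presents an approach to implementing an adaptive web site that analyzes the access patterns of its visitors and automatically improves its structure so that information is easier to reach. In particular, to change the web site while preserving its existing structure as far as possible, we extract documents that are strongly associated in users' access patterns but are reached only through long access paths, and generate additional index pages for them. To this end, we first build data sequences from a large volume of web server log data, using only the last forward documents filtered according to the hyperlink structure. Applying TPA, a new algorithm for discovering sequential access patterns, to these data sequences yields sequences of related web documents with sufficient support. A service that generates additional index pages for these frequent sequences makes it possible to transform the site into one that supports effective information access for its users.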

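The filtering step in the abstract above can be sketched roughly as follows, assuming that a "last forward document" is the page at which a forward traversal ends just before the user moves back to a page already visited (the idea behind maximal forward references). This interpretation and the function name `last_forward_documents` are assumptions; the TPA algorithm itself is not described in the abstract and is not shown here.

```python
def last_forward_documents(session):
    """Return the pages where each forward traversal in a session ends.

    session: ordered list of page identifiers requested by one user.
    """
    path = []              # current forward path from the entry page
    result = []
    moving_forward = False
    for page in session:
        if page in path:
            # Backward move: the page that ended the forward run is kept.
            if moving_forward:
                result.append(path[-1])
            # Truncate the path back to the revisited page.
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    # A session that ends while moving forward contributes its last page.
    if moving_forward and path:
        result.append(path[-1])
    return result


sequence = last_forward_documents(["A", "B", "C", "B", "D", "A", "E"])
```

For the sample session, the forward runs end at `C`, `D`, and `E`, so those three pages would form the data sequence passed on to a sequential pattern miner.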

Design of a Product Recommender based on Web Log Analysis (웹 로그 분석에 기반한 상품 추천기의 설계)

  • 김건량;이도헌
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2000.10a / pp.349-352 / 2000
  • As more people use electronic commerce, many shopping malls have appeared on the Internet and the volume of shopping information in them has become enormous. Consequently, there is a growing need for a system that recommends products to customers to reduce the time and effort spent shopping. In this paper, we propose a Product Recommender System built by applying data mining techniques to web log files and analyzing customers' action patterns, customer profiles, and product purchase data. The system makes it easy for customers to obtain the information they want by sending e-mail or postal mail and by recommending web pages when they visit a shopping mall.

Directed Graph by Integrating Web Document Hyperlink and Web Access Log for Web Mining (웹 마이닝을 위한 웹 문서 하이퍼링크와 웹 접근로그를 통합한 방향그래프)

  • Park, Chul-Hyun;Lee, Seong-Dae;Kwak, Yong-Won;Jeon, Sung-Hwan;Park, Hyu-Chan
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.16-18 / 2005
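  • To let users find the information they want easily and accurately, the web should be personalized by organizing web documents into a data structure, extracting more reliable patterns, and applying users' characteristics and behavior patterns. This paper proposes a method for structuring web documents as a preprocessing step for personalization. The basic idea is to build a directed graph from the hyperlinks in web document tags using a depth-first search algorithm. The method also addresses two problems: hyperlinks that are hard to find when scanning document tags, such as those in Flash or scripts, and the fact that use of the 'Back' button is not recorded in the web access log. To do so, the click stream is stored in a stack and compared with the directed graph already built, and newly found vertices and edges are added, yielding a more reliable directed graph.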

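One possible reading of the two-phase procedure in the abstract above is sketched below in Python. The input format (a `links` dictionary of hyperlinks already extracted from document tags) and the rule for handling unlogged 'Back' presses (unwind the click stack to the most recent page that links to the requested page, otherwise record a newly found edge) are assumptions made for illustration, not the authors' published algorithm.

```python
def build_graph(links, root):
    """Depth-first traversal over extracted hyperlinks, starting at root."""
    graph = {}
    stack = [root]
    while stack:
        page = stack.pop()
        if page in graph:
            continue                      # already visited
        graph[page] = set(links.get(page, []))
        stack.extend(graph[page])
    return graph


def merge_clickstream(graph, clicks):
    """Add vertices and edges implied by a click stream to the graph."""
    stack = []                            # pages the user is assumed to have open
    for page in clicks:
        graph.setdefault(page, set())     # newly found vertex, if any
        # Search from the top of the stack for a page that links to this one.
        for depth in range(len(stack) - 1, -1, -1):
            if page in graph[stack[depth]]:
                del stack[depth + 1:]     # unlogged 'Back' presses are unwound
                break
        else:
            # No known link explains the click: record a newly found edge.
            if stack:
                graph[stack[-1]].add(page)
        stack.append(page)
    return graph


site = build_graph({"A": ["B", "C"], "B": [], "C": []}, "A")
site = merge_clickstream(site, ["A", "B", "C", "D"])
```

In the sample click stream, the request for `C` after `B` is explained by an unlogged return to `A`, so no edge is added, while `D` is unknown to the tag-based graph and is added as a new vertex with the edge `C` to `D`.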

Algorithm for Extracting the General Web Search Path Pattern (일반적인 웹 검색 경로패턴 추출 알고리즘)

  • Jang, Min-Seok;Ha, Eun-Mi
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.1 / pp.771-773 / 2005
  • Researchers have analyzed information retrieval patterns in log files to efficiently obtain users' information search patterns in the web environment. The methods most often used in this research propose algorithms that efficiently derive frequent patterns from path traversal patterns. A common problem, however, is that they do not provide a proper solution for complex, that is, general topological patterns. This paper therefore defines the general information retrieval pattern and proposes an efficient algorithm for it.

Discovering Temporal Relation Rules from Temporal Interval Data (시간간격을 고려한 시간관계 규칙 탐사 기법)

  • Lee, Yong-Joon;Seo, Sung-Bo;Ryu, Keun-Ho;Kim, Hye-Kyu
    • Journal of KIISE:Databases / v.28 no.3 / pp.301-314 / 2001
  • Data mining refers to a set of techniques for discovering implicit and useful knowledge from large databases. Many studies on data mining have been pursued, and some have addressed temporal data mining for discovering knowledge from temporal databases, such as sequential patterns, similar time sequences, and cyclic and temporal association rules. However, all of these works deal with discovering temporal patterns from data stamped with time points and do not consider discovering knowledge from temporal interval data. There are many examples of temporal interval data from which useful knowledge can be discovered, including patient histories, purchaser histories, and web logs. Allen introduced relationships between intervals and operators for reasoning about those relationships. We present a new data mining technique that discovers temporal relation rules in temporal interval data by using Allen's theory. In this paper, we present two new algorithms for discovering temporal relation rules, including an algorithm for generating the rules from temporal interval data. This technique can discover more useful knowledge than conventional data mining techniques.

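The interval relationships from Allen's theory that the abstract above builds on can be sketched in Python as a classifier over pairs of intervals. The function name `allen_relation` and the tuple representation `(start, end)` with `start < end` are assumptions for illustration; the rule discovery algorithms of the paper are not shown.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining cases are proper partial overlaps.
    return "overlaps" if s1 < s2 else "overlapped-by"


# Hypothetical example: two web sessions given as (start, end) in minutes.
relation = allen_relation((1, 5), (3, 8))
```

A rule miner along the lines of the abstract would presumably count how often such relations hold between pairs of interval events, but that step depends on details the abstract does not give.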

Temporal Data Mining Framework (시간 데이타마이닝 프레임워크)

  • Lee, Jun-Uk;Lee, Yong-Jun;Ryu, Geun-Ho
    • The KIPS Transactions:PartD / v.9D no.3 / pp.365-380 / 2002
  • Temporal data mining, the incorporation of temporal semantics into existing data mining techniques, refers to a set of techniques for discovering implicit and useful temporal knowledge from large quantities of temporal data. Temporal knowledge, expressible in the form of rules, is knowledge with temporal semantics and relationships, such as cyclic patterns, calendric patterns, and trends. There are many examples of temporal data from which useful temporal knowledge can be discovered, including patient histories, purchaser histories, and web logs. Many studies on data mining have been pursued, and some have addressed temporal data mining for discovering temporal knowledge from temporal data, such as sequential patterns, similar time sequences, and cyclic and temporal association rules. However, all of these works treated the data in a database at best as a data series in chronological order and did not consider the temporal semantics and temporal relationships contained in the data. To solve this problem, we propose a theoretical framework for temporal data mining. This paper surveys the work to date and explores the issues involved in temporal data mining. We then define a model for temporal data mining, suggest an SQL-like mining language able to express temporal mining tasks, and show the architecture of a temporal mining system.

The Evaluation for Web Mining and Analytics Service from the View of Personal Information Protection and Privacy (개인정보보호 관점에서의 웹 트래픽 수집 및 분석 서비스에 대한 타당성 연구)

  • Kang, Daniel;Shim, Mi-Na;Bang, Je-Wan;Lee, Sang-Jin;Lim, Jong-In
    • Journal of the Korea Institute of Information Security & Cryptology / v.19 no.6 / pp.121-134 / 2009
  • Consumer-centric marketing is one of the most successful emerging businesses, but it poses a threat to personal privacy. Service providers and users hold opposing positions on many issues. Enterprises assert that there is no problem in using private data that is anonymous, whereas individuals cannot readily accept the latent risks. Web traffic analysis technology does not create issues by itself, but it may cause concern when applied to data of a personal nature. The most criticized ethical issue involving web traffic analysis is the invasion of privacy. We therefore need to inspect how much and what kind of personal information is being used and whether any personal information is handled illegally. In this paper, we inspect the operation of consumer-centric marketing tools such as web log analysis solutions and data gathering services that use web browser toolbars. We also use reverse engineering to inspect a Microsoft Explorer-based toolbar application that records and analyzes personal web browsing patterns. Finally, this study identifies and explores security and privacy requirements for developing more reliable solutions. The study is important for the balanced development of personal privacy protection and the web traffic analysis industry.

A Study on the Usage Patterns of Electronic Commerce Web System (수용도 향상을 위한 소비자의 쇼핑몰 사용패턴특성 분류 및 분석)

  • 곽효연;손일문
    • Journal of the Korea Society of Computer and Information / v.7 no.3 / pp.149-157 / 2002
  • Today, electronic commerce (EC) is bringing about a revolution and a new paradigm in business, and more and more Web-based EC applications have emerged. However, EC web systems must satisfy customers and must make it easy to purchase goods in virtual stores. The usability and acceptance of an EC web system is one of the key factors in successfully building an EC system. In this paper, we consider the characteristics of information search and the decision-making process in designing EC web systems that are easy to use and more acceptable to customers. On the basis of these characteristics, we classify the activities of the buying process in domestic web systems. The log files of experimental tasks were then analyzed with statistical data mining methods. As a result, the important factors of the buying process are summarized, five user groups are identified among EC customers, and the usage patterns of these groups are described. These results could be very useful for designing user-oriented EC web systems.

A Control Path Analysis Mechanism for Workflow Mining (워크플로우 마이닝을 위한 제어 경로 분석 메커니즘)

  • Min Jun-Ki;Kim Kwang-Hoon;Chung Jung-Su
    • Journal of Internet Computing and Services / v.7 no.1 / pp.91-99 / 2006
  • This paper proposes a control path analysis mechanism for use in a workflow mining framework that maximizes workflow traceability and rediscoverability by analyzing the total sequences of the control path perspective of a workflow model and by rediscovering their runtime enactment history from the workflow log information. The mechanism has two components. One generates the total sequences of the control paths from a workflow model by transforming it into a control path decision tree, and the other rediscovers the runtime enactment history of each control path in the total sequences from the corresponding workflow's execution logs. Together, the rediscovered knowledge and execution history of a workflow model make up a control-path-oriented intelligence of the workflow model, which ought to be an essential ingredient for maintaining and reengineering the quality of the workflow model. Based on this workflow intelligence, the workflow model can be gradually refined and its quality finally maximized through repeated redesign and reengineering over its whole lifetime.

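The two components named in the abstract above can be approximated with a short Python sketch, assuming a simplified workflow model: an acyclic graph given as a dictionary of successor lists, in which every branch is an exclusive choice. The functions `control_paths` and `count_enactments` and the sample model are hypothetical; the paper's control path decision tree is not reproduced here.

```python
def control_paths(model, start, end):
    """Enumerate every activity sequence from start to end (acyclic model)."""
    paths = []

    def walk(node, path):
        path = path + [node]
        if node == end:
            paths.append(path)
            return
        for successor in model.get(node, []):
            walk(successor, path)

    walk(start, [])
    return paths


def count_enactments(paths, traces):
    """Count how often each control path occurs among logged traces."""
    counts = {tuple(p): 0 for p in paths}
    for trace in traces:
        key = tuple(trace)
        if key in counts:
            counts[key] += 1
    return counts


# Hypothetical workflow model with one exclusive branch after activity A.
model = {"start": ["A"], "A": ["B", "C"], "B": ["end"], "C": ["end"]}
paths = control_paths(model, "start", "end")
history = count_enactments(paths, [
    ["start", "A", "B", "end"],
    ["start", "A", "B", "end"],
    ["start", "A", "C", "end"],
])
```

The resulting counts show which control paths are actually enacted at runtime, which is the kind of evidence the abstract suggests using when redesigning a workflow model.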