• Title/Summary/Keyword: Web Logs


Pre-Processing of Query Logs in Web Usage Mining

  • Abdullah, Norhaiza Ya;Husin, Husna Sarirah;Ramadhani, Herny;Nadarajan, Shanmuga Vivekanada
    • Industrial Engineering and Management Systems / v.11 no.1 / pp.82-86 / 2012
  • For the past few years, query log data has been collected to study how users behave on websites. Many studies have used query logs to extract user preferences, support personalized recommendations, improve caching and pre-fetching of Web objects, build better adaptive user interfaces, and improve Web search in search engine applications. A query log contains data such as the client's IP address, the date and time of the request, the resource or page requested, the status of the request, the HTTP method used, and the type of browser and operating system. A query log can offer valuable insight into website usage: properly compiled and interpreted, it provides a baseline of statistics that indicate a website's usage levels and can be used as a tool to support decision making in management activities. In this paper, we discuss the tasks performed on query logs in the pre-processing stage of web usage mining, using query logs from an online newspaper company. In this stage, the clickstream data is cleaned and partitioned into a set of user interactions that represent the activities of each user during their visits to the site. The two essential pre-processing tasks applied to the query logs are data cleaning and user identification.
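The two pre-processing tasks named in this abstract, data cleaning and user identification, can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it assumes the logs are in the Apache Combined Log Format, treats requests for images, stylesheets, and scripts as noise, keeps only successful GET requests, and identifies users by the (IP address, user agent) pair, which is a common heuristic. The suffix list and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Combined Log Format:
# host ident authuser [date] "method path protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical cleaning rule: embedded resources are not page views.
IGNORED_SUFFIXES = (".jpg", ".jpeg", ".gif", ".png", ".css", ".js", ".ico")


def parse(line):
    """Return the log fields as a dict, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None


def clean(records):
    """Data cleaning: keep successful GET requests for pages only."""
    for r in records:
        if r is None:
            continue
        if r["method"] != "GET" or not r["status"].startswith("2"):
            continue
        if r["path"].lower().endswith(IGNORED_SUFFIXES):
            continue
        yield r


def identify_users(records):
    """User identification: group page requests by (IP, user agent)."""
    users = defaultdict(list)
    for r in records:
        users[(r["ip"], r["agent"])].append(r["path"])
    return dict(users)


sample = [
    '10.0.0.1 - - [10/Oct/2012:13:55:36 +0900] "GET /news/1.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2012:13:55:37 +0900] "GET /img/logo.png HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2012:13:56:01 +0900] "GET /news/2.html HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2012:13:57:10 +0900] "GET /news/3.html HTTP/1.1" 200 1800 "-" "Mozilla/5.0"',
]
users = identify_users(clean(parse(line) for line in sample))
print(users)  # {('10.0.0.1', 'Mozilla/5.0'): ['/news/1.html', '/news/3.html']}
```

Grouping by IP plus user agent separates different browsers behind the same proxy address, but it still cannot tell apart two users with identical software on a shared IP; studies often add session timeouts or referrer checks on top of it.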

Implementation of big web logs analyzer in estimating preferences for web contents (웹 컨텐츠 선호도 측정을 위한 대용량 웹로그 분석기 구현)

  • Choi, Eun Jung;Kim, Myuhng Joo
    • Journal of Korea Society of Digital Industry and Information Management / v.8 no.4 / pp.83-90 / 2012
  • With the rapid growth of Internet infrastructure, the World Wide Web has evolved beyond simple information sharing. It began providing services such as e-business, remote control and management, and virtual services, and it is now evolving into services such as cloud computing and social network services. Communication over the Web has accordingly shifted toward user-centric, customized services rather than provider-centric information. In this environment, it is very important to check and analyze user requests to a website, and estimating user preferences is especially important. Web logs are analyzed for this purpose, but most existing analyses are limited to page-level statistics. Statistics for individual pages are not enough to evaluate user preferences, because the main content of recent web pages consists of media files such as images, together with dynamic pages built with techniques such as CSS, Div, and iFrame. In this paper, a large-scale log analyzer was designed and implemented to analyze web server logs and estimate users' preferences for web contents. Using Hadoop-based MapReduce, large logs were analyzed and preferences for media files such as images, sounds, and videos were estimated.
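The MapReduce approach described here can be illustrated with a single-process sketch of the map, shuffle, and reduce phases. It does not use Hadoop; in a real cluster the mapper and reducer would run as distributed tasks. The sketch assumes Common Log Format lines (so the request path is the seventh whitespace-separated field), uses a hypothetical extension-to-media-type table, and takes the request count per media file as a simple preference score.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical mapping from file extension to media category.
MEDIA_TYPES = {
    "jpg": "image", "png": "image", "gif": "image",
    "mp3": "sound", "wav": "sound",
    "mp4": "video", "avi": "video",
}


def map_phase(line):
    """Mapper: emit ((media_type, path), 1) for each media file request."""
    fields = line.split()
    if len(fields) < 7:
        return []
    path = fields[6].split("?", 1)[0]  # assumed position of the request path
    ext = path.rsplit(".", 1)[-1].lower()
    if ext in MEDIA_TYPES:
        return [((MEDIA_TYPES[ext], path), 1)]
    return []


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: total request count for one media file."""
    return key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


lines = [
    '1.2.3.4 - - [01/Jan/2012:00:00:01 +0900] "GET /media/intro.mp4 HTTP/1.1" 200 100',
    '1.2.3.5 - - [01/Jan/2012:00:00:02 +0900] "GET /media/intro.mp4 HTTP/1.1" 200 100',
    '1.2.3.5 - - [01/Jan/2012:00:00:03 +0900] "GET /img/photo.JPG?size=l HTTP/1.1" 200 50',
    '1.2.3.6 - - [01/Jan/2012:00:00:04 +0900] "GET /index.html HTTP/1.1" 200 10',
]
print(run_job(lines))  # {('video', '/media/intro.mp4'): 2, ('image', '/img/photo.JPG'): 1}
```

Keying on the individual file rather than on the page is what lets media preferences be measured separately from page-level statistics; a fuller analyzer would likely also filter by status code and weight repeat requests.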

Web Service Performance Improvement with the Redis (Redis를 활용한 Web Service 성능 향상)

  • Kim, Chul-Ho;Park, Kyeong-Won;Choi, Yong-Lak
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.9 / pp.2064-2072 / 2015
  • To improve performance, most Web Services produce and manage User Access Logs. Access logs record when traffic peaks and which resources are used most, and they can then be used for analysis. However, when traffic to a Web Service surges at a specific time, its performance deteriorates because the number of User Access Logs to process increases rapidly. Upgrading or tuning the system can solve this problem, but at considerable cost, and once the peak has passed, such an expensively built system is no longer needed. Therefore, this paper describes how improving User Access Log processing can improve Web Service performance. To process the newest data in bulk, it also presents a method that applies parts of a NoSQL approach using Redis.

Applications of Transaction Log Analysis for the Web Searching Field (웹 검색 분야에서의 로그 분석 방법론의 활용도)

  • Park, So-Yeon;Lee, Joon-Ho
    • Journal of the Korean Society for Library and Information Science / v.41 no.1 / pp.231-242 / 2007
  • Transaction logs capture the interactions between online information retrieval systems and their users. Given the nature of the Web and its users, transaction logs appear to be a reasonable and relevant method for collecting and investigating the information searching behavior of a large number of Web users. Based on a series of research studies that analyzed Naver transaction logs, this study examines how transaction log analysis can be applied to, and contribute to, the field of web searching, and suggests implications for future work in the field. This study is expected to contribute to the development and implementation of more effective Web search systems and services.

Analysis of Behavior Patterns from Human and Web Crawler Events Log on ScienceON (ScienceON 웹 로그에 대한 인간 및 웹 크롤러 행위 패턴 분석)

  • Poositaporn, Athiruj;Jung, Hanmin;Park, Jung Hoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.6-8 / 2022
  • Web log analysis is one of the essential procedures for service improvement. ScienceON is a representative information service that provides various S&T literature and information, and we analyze its logs for continuous improvement. This study analyzes ScienceON web logs recorded in May 2020 and May 2021, dividing them into humans and web crawlers and performing an in-depth analysis. First, only web logs of the S (search), V (detail view), and D (download) types are extracted and normalized, yielding 658,407 and 8,727,042 records for the two periods, respectively. Second, the logs are classified into humans and web crawlers using the Python 'user_agents' library. Third, the session size is set to 60 seconds and each session is analyzed. We found that, unlike humans, web crawlers show relatively long average behavior patterns per session, consisting mainly of V patterns. In the future, the service will be improved to quickly detect and respond to web crawlers and to respond to the behavioral patterns of human users.
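The second and third steps (separating crawlers from humans, then analyzing 60-second sessions) might look roughly like the sketch below. Two assumptions are worth flagging: the keyword test in `is_crawler` is a hypothetical stand-in for the 'user_agents' library the study used, and the "60-second session size" is interpreted here as an inactivity gap, so a new session starts when more than 60 seconds separate two events from the same client. The client IDs, timestamps, and user-agent strings are invented.

```python
from collections import Counter, defaultdict

SESSION_GAP = 60  # seconds; assumed reading of the study's 60-second session size
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")  # hypothetical stand-in heuristic


def is_crawler(user_agent):
    """Crude crawler test on the user-agent string (not the user_agents library)."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def sessionize(events):
    """events: (client_id, timestamp_seconds, action) with action in 'S', 'V', 'D'.
    Returns (client_id, pattern) pairs such as ('a', 'SV')."""
    by_client = defaultdict(list)
    for client, ts, action in events:
        by_client[client].append((ts, action))
    sessions = []
    for client, items in by_client.items():
        items.sort()
        current, last_ts = [items[0][1]], items[0][0]
        for ts, action in items[1:]:
            if ts - last_ts > SESSION_GAP:  # gap too long: close the session
                sessions.append((client, "".join(current)))
                current = []
            current.append(action)
            last_ts = ts
        sessions.append((client, "".join(current)))
    return sessions


def summarize(events, agents):
    """Per group: session count, average actions per session, and action mix."""
    groups = {"human": [], "crawler": []}
    for client, pattern in sessionize(events):
        group = "crawler" if is_crawler(agents.get(client, "")) else "human"
        groups[group].append(pattern)
    return {
        group: {
            "sessions": len(patterns),
            "avg_length": sum(map(len, patterns)) / len(patterns) if patterns else 0.0,
            "actions": Counter("".join(patterns)),
        }
        for group, patterns in groups.items()
    }


agents = {"a": "Mozilla/5.0 (Windows NT 10.0)", "b": "Googlebot/2.1"}
events = [("a", 0, "S"), ("a", 20, "V"), ("a", 200, "S"),
          ("b", 0, "V"), ("b", 5, "V"), ("b", 10, "V"), ("b", 15, "D")]
stats = summarize(events, agents)
print({g: s["avg_length"] for g, s in stats.items()})  # {'human': 1.5, 'crawler': 4.0}
```

In this toy data the crawler produces one long, V-heavy session while the human produces two short ones, which is the kind of contrast the study reports; the real classification would depend on the library's bot detection rather than on keywords.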


Analyzing Patterns in News Reporters' Information Seeking Behavior on the Web (기자직의 웹 정보탐색행위 패턴 분석)

  • Kwon, Hye-Jin;Jeong, Dong-Youl
    • Journal of the Korean Society for Information Management / v.27 no.4 / pp.109-130 / 2010
  • The purpose of this study is to identify the patterns in news reporters' information seeking behavior by observing their web activities. For this purpose, transaction logs collected from 23 news reporters were analyzed. Web tracking software was installed on their PCs, and a total of 39,860 web logs were collected over two weeks. Session start and end patterns, step-by-step transition patterns, and a sequence rule model were analyzed, and the reporters' pattern of Internet use was compared with that of the general public. The pattern analysis yielded a model of web information seeking behavior that consists of four types of behavior: fact-checking browsing, fact-checking search, investigative browsing, and investigative search.
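Session start and end patterns and step-by-step transition patterns can both be read off a first-order transition table. The sketch below is an illustration under assumptions, not the study's method: each session is a list of hypothetical action labels, and artificial START and END markers are added so that the START row gives how sessions begin and the transitions into END give how they finish. The sequence rule model mentioned in the abstract is not reproduced.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next action | current action) from sessions of action labels,
    padding each session with START and END markers."""
    counts = defaultdict(Counter)
    for actions in sessions:
        path = ["START", *actions, "END"]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical sessions labeled with two of the behavior kinds named above.
sessions = [
    ["search", "browse", "browse"],
    ["browse", "search"],
    ["search", "browse"],
]
probs = transition_probabilities(sessions)
print(probs["browse"])  # {'browse': 0.25, 'END': 0.5, 'search': 0.25}
```

Here two of the three sessions start with a search, and browsing is the most common final step; with real tracking data, the same table would be built per reporter or per group for comparison with the general public.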

Web Server Log Visualization

  • Kim, Jungkee
    • International Journal of Advanced Smart Convergence / v.7 no.4 / pp.101-107 / 2018
  • Visitors to a Web site leave access logs documenting their activity on the site. These access logs are a valuable source of information about visitors' access patterns. In addition to the pages a user visited, it is generally possible to discover the visitors' geographical locations. Web servers also record other information, such as the entry point into the site, the URL, and the operating system and browser used. Several Web mining techniques exist for extracting useful information from such data, and visualization of a Web log is one of them. This paper presents a technique, as well as a case study, for visualizing a Web log.

An Analysis of Query Types and Topics Submitted to Naver (클릭 로그에 근거한 네이버 검색 질의의 형태 및 주제 분석)

  • Park, Soyeon;Lee, Joon-Ho;Kim, Ji Seoung
    • Journal of the Korean Society for Library and Information Science / v.39 no.1 / pp.265-278 / 2005
  • This study examines the types and topics of web queries submitted to Naver over a one-year period by analyzing query logs and click logs. Query logs capture the queries users submitted to the system, and click logs record the documents users clicked and viewed. This study presents a methodology for classifying query types and topics, and also suggests a method for click log analysis. When classified by query type, site search queries outnumber content search queries. Queries about computers/Internet, entertainment, shopping, games, and education rank highest. Implications for system designers and web content providers are discussed.
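As a rough illustration of combining the two log types, the sketch below joins submitted queries with click records and reports, per normalized query, how often it was submitted, how many clicks it received, and which document was clicked most. The record layout (plain query strings and `(query, url)` pairs) and the lowercase normalization are assumptions, and the study's own scheme for classifying query types and topics is not reproduced here.

```python
from collections import Counter, defaultdict


def normalize(query):
    """Assumed normalization: trim whitespace and lowercase."""
    return query.strip().lower()


def click_summary(query_log, click_log):
    """query_log: submitted query strings.
    click_log: (query, clicked_url) pairs.
    Returns per-query submission count, click count, and most-clicked URL."""
    submissions = Counter(normalize(q) for q in query_log)
    clicks = defaultdict(Counter)
    for q, url in click_log:
        clicks[normalize(q)][url] += 1
    summary = {}
    for q, n in submissions.most_common():
        urls = clicks.get(q, Counter())
        top = urls.most_common(1)
        summary[q] = {
            "submissions": n,
            "clicks": sum(urls.values()),
            "top_url": top[0][0] if top else None,
        }
    return summary


# Hypothetical log records.
query_log = ["Naver Webtoon", "naver webtoon", "weather", "python tutorial"]
click_log = [
    ("naver webtoon", "comic.naver.com"),
    ("Naver Webtoon", "comic.naver.com"),
    ("python tutorial", "docs.python.org"),
    ("python tutorial", "wikidocs.net"),
    ("python tutorial", "docs.python.org"),
]
summary = click_summary(query_log, click_log)
print(summary["python tutorial"])  # {'submissions': 1, 'clicks': 3, 'top_url': 'docs.python.org'}
```

A query whose clicks concentrate on one site's front page is the kind of signal that might suggest a site search query, while clicks spread over many documents point toward content search, but that reading is an interpretation, not the paper's stated rule.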

Information Seeking Behavior of the NAVER Users via Query Log Analysis (질의 로그 분석을 통한 네이버 이용자의 검색 형태 연구)

  • Lee, Joon-Ho;Park, So-Yeon;Kwon, Hyuk-Sung
    • Journal of the Korean Society for Information Management / v.20 no.2 / pp.27-41 / 2003
  • Query logs are online records that capture user interactions with information retrieval systems and the entire search process. Query log analysis offers the advantage of providing a reasonable and unobtrusive means of collecting search information from a large number of users. In this paper, query logs of NAVER, a major Korean Internet search service, were analyzed to investigate the information seeking behavior of NAVER users. The query logs were collected over one week from various collections, such as comprehensive search, directory search, and web document search. This study is expected to contribute to the development and implementation of more effective web search systems and services.

A Web-based System for Business Process Discovery: Leveraging the SICN-Oriented Process Mining Algorithm with Django, Cytoscape, and Graphviz

  • Thanh-Hai Nguyen;Kyoung-Sook Kim;Dinh-Lam Pham;Kwanghoon Pio Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.8 / pp.2316-2332 / 2024
  • In this paper, we introduce a web-based system that leverages the capabilities of the ρ(rho)-algorithm, a Structured Information Control Net (SICN)-oriented process mining algorithm, together with open-source platforms including Django, Graphviz, and Cytoscape, to facilitate the rediscovery and visualization of business process models. Our approach discovers SICN-oriented process models from the process instances in an IEEE XES-formatted dataset of process enactment event logs. The discovery is performed by the ρ-algorithm, and the visualization output is converted into either a JSON or a DOT formatted file, matching the input requirements of Cytoscape or Graphviz, respectively. The proposed system uses the robust Django platform to provide a user-friendly web interface. This interface offers a clear, concise, modern, and interactive visualization of the rediscovered business processes, fostering intuitive exploration. The experiment conducted on our proposed web-based process discovery system demonstrates its capability and efficiency, showing that the system is a valuable tool for discovering business process models from process event logs. Its development not only contributes to the advancement of process mining but also serves as an educational resource. Readers, students, and practitioners interested in process mining can use this system as a completely free process miner to gain hands-on experience in rediscovering and visualizing process models from event logs.
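The pipeline described here (read an XES event log, discover a model, emit DOT for Graphviz) can be sketched end to end with the standard library. The discovery step below is deliberately simpler than the paper's: it builds a directly-follows graph, a basic process mining construct, and is not the ρ-algorithm or an SICN model. The sketch assumes each event carries its activity name in a `<string key="concept:name" .../>` attribute, as in standard XES, and does not escape quotes inside activity names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Drop an XML namespace prefix such as '{http://www.xes-standard.org/}'."""
    return tag.rsplit("}", 1)[-1]


def read_traces(xes_text):
    """Return one list of activity names (concept:name of each event) per trace."""
    root = ET.fromstring(xes_text)
    traces = []
    for trace in root:
        if local(trace.tag) != "trace":
            continue
        activities = []
        for event in trace:
            if local(event.tag) != "event":
                continue
            for attr in event:
                if local(attr.tag) == "string" and attr.get("key") == "concept:name":
                    activities.append(attr.get("value"))
        traces.append(activities)
    return traces


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    edges = Counter()
    for acts in traces:
        for a, b in zip(acts, acts[1:]):
            edges[(a, b)] += 1
    return edges


def to_dot(edges):
    """Render the edge counts as a Graphviz DOT digraph."""
    lines = ["digraph process {", "  rankdir=LR;"]
    for (a, b), n in sorted(edges.items()):
        lines.append(f'  "{a}" -> "{b}" [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-trace event log.
SAMPLE = """<log xmlns="http://www.xes-standard.org/">
  <trace><event><string key="concept:name" value="register"/></event>
         <event><string key="concept:name" value="review"/></event>
         <event><string key="concept:name" value="approve"/></event></trace>
  <trace><event><string key="concept:name" value="register"/></event>
         <event><string key="concept:name" value="review"/></event>
         <event><string key="concept:name" value="reject"/></event></trace>
</log>"""

dot = to_dot(directly_follows(read_traces(SAMPLE)))
print(dot)
```

The resulting text can be passed to the Graphviz `dot` tool to render an image; producing Cytoscape's JSON instead would only change the final serialization step.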