• Title/Summary/Keyword: Query Log


Information Seeking Behavior of Shopping Site Users: A Log Analysis of Popshoes, a Korean Shopping Search Engine (이용자들의 쇼핑 검색 행태 분석: 팝슈즈 로그 분석을 중심으로)

  • Park, Soyeon;Cho, Kihun;Choi, Kirin
    • Journal of the Korean Society for Information Management / v.32 no.4 / pp.289-305 / 2015
  • This study investigates the information-seeking behavior of Popshoes users by analyzing transaction logs of Popshoes, a major Korean shopping search engine. The logs were collected over a three-month period, from January 1 to March 31, 2015. The results show that Popshoes users behave in a simple and passive way. Across all sessions, more users chose to browse a directory than to type and submit a query. However, queries played a more crucial role than directory browsing in important decisions such as clicks on search results and product purchases. The results of this study can be applied to the development of more effective shopping search engines.

Finding Shortest Paths in L$^1$ Plane with Parallel Roads (평행한 도로들을 포함하는 L$^1$ 평면상에서의 최단경로 탐색)

  • Kim, Jae-Hoon;Kim, Soo-Hwan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.1 / pp.716-719 / 2005
  • We present an algorithm for finding shortest paths in the L$_1$ plane with a transportation network. The transportation network consists of parallel line segments, called highways, along which movement is faster. Given a source point s, our algorithm constructs a Shortest Path Map (SPM) such that, for any query point t, the length of a shortest path from s to t can be found in O(log n) time. We design a plane-sweep-like algorithm that computes the SPM in O(n log n) time.
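
To make the setting concrete, here is a minimal sketch of the travel-time computation for the simplest case: a single horizontal highway. It is not the SPM construction from the abstract. The function names, the `speed` parameter (assumed greater than 1), and the single-highway restriction are all assumptions made for illustration.

```python
def l1(p, q):
    """L1 (Manhattan) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def clamp(x, lo, hi):
    """Restrict x to the interval [lo, hi]."""
    return max(lo, min(x, hi))


def travel_time(s, t, highway_y, x_lo, x_hi, speed):
    """Shortest travel time from s to t with one horizontal highway.

    The highway is the segment y = highway_y, x_lo <= x <= x_hi.
    Off the highway, movement has unit speed in the L1 metric; on it,
    movement has the given speed (assumed > 1).
    """
    direct = l1(s, t)
    # Because walking costs more per unit length than riding, the best
    # entry and exit points are the highway points closest in x to s and t.
    entry = (clamp(s[0], x_lo, x_hi), highway_y)
    exit_ = (clamp(t[0], x_lo, x_hi), highway_y)
    via = l1(s, entry) + abs(entry[0] - exit_[0]) / speed + l1(exit_, t)
    return min(direct, via)


if __name__ == "__main__":
    print(travel_time((0, 0), (10, 0), highway_y=1, x_lo=0, x_hi=10, speed=5))
```

With several highways, a shortest path may chain more than one of them, which is why a map such as the SPM is precomputed rather than checking each highway in isolation.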

OLAP System and Performance Evaluation for Analyzing Web Log Data (웹 로그 분석을 위한 OLAP 시스템 및 성능 평가)

  • 김지현;용환승
    • Journal of Korea Multimedia Society / v.6 no.5 / pp.909-920 / 2003
  • IT for CRM has been growing and developing rapidly. Typical techniques include statistical analysis tools, on-line multidimensional analytical processing (OLAP) tools, and data mining algorithms (such as neural networks, decision trees, and association rules). Among customer data, web log data are very important, and OLAP technology can be applied to analyze them multidimensionally and use them efficiently. To build an OLAP cube, multidimensional summary results must be precalculated in order to obtain fast responses. However, as the number of dimensions and sparse cells increases, serious data explosion occurs and OLAP performance degrades. In this paper, we explain why sparsity occurs in web log data and which sparsity patterns arise in two and three dimensions for OLAP. Based on this analysis, we set up multidimensional data models and query models for a benchmark covering each sparsity pattern. Finally, we evaluate the performance of three OLAP systems (MS SQL 2000 Analysis Service, Oracle Express, and C-MOLAP).
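
The link between dimensionality and sparsity can be illustrated with a small sketch that measures the fraction of empty cells in a cube built from log records. The record fields (`page`, `hour`, `user`) and the function name are hypothetical. The abstract does not describe the paper's data model in this detail.

```python
def cube_sparsity(records, dims):
    """Return the fraction of empty cells in the cube spanned by dims.

    Each dimension's domain is taken to be the set of values that
    actually occur in the records (an assumption of this sketch).
    """
    total_cells = 1
    for d in dims:
        total_cells *= len({r[d] for r in records})
    filled_cells = {tuple(r[d] for d in dims) for r in records}
    return 1 - len(filled_cells) / total_cells


if __name__ == "__main__":
    # Hypothetical web log records.
    logs = [
        {"page": "/a", "hour": 1, "user": "u1"},
        {"page": "/b", "hour": 2, "user": "u1"},
        {"page": "/a", "hour": 1, "user": "u2"},
    ]
    print(cube_sparsity(logs, ["page", "hour"]))          # 0.5
    print(cube_sparsity(logs, ["page", "hour", "user"]))  # 0.625
```

Adding the third dimension doubles the number of cells while adding only one filled cell, so the share of empty cells grows.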

Distributed database replicator without locking base relations

  • Lee, Wookey;Kang, Sukho;Park, Jooseok
    • Proceedings of the Korean Operations and Management Science Society Conference / 1996.10a / pp.93-95 / 1996
  • A replication server is considered one of the most effective tools for coping with the problems caused by complex data replication in distributed database systems. In a distributed environment, locking a table is usually unavoidable, and it is a main factor that constrains the system in practice. This paper presents an Asynchronous Replicator Scheme (ARS) that uses the system log, in the form of files called differential files, to refresh distributed data files defined by complicated queries, and that prevents the (normally huge) base tables from being locked. We take join operations as the complicated queries, not only because the join covers almost all operations, but also because it is one of the most time-consuming and data-intensive operations in query processing.
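
The general idea of refreshing a replicated join from logged changes, rather than from the base table, can be sketched as follows. This is an illustration under stated assumptions, not the ARS algorithm itself: the record layout, the `"insert"`/`"delete"` operation names, and the use of a previously replicated copy of S (`s_rows`) are all hypothetical.

```python
def refresh_join_replica(replica, diff_r, s_rows):
    """Refresh a replicated join R JOIN S (on field 'k') from a differential file.

    replica : set of (r_id, s_id) pairs currently in the replicated join
    diff_r  : list of (op, row) entries logged for table R; op is
              "insert" or "delete" (hypothetical format)
    s_rows  : rows of S as seen by the replica site
    Only the logged changes are read, so the base table R is never scanned.
    """
    # Index S by join key so each logged change is matched directly.
    s_by_key = {}
    for s in s_rows:
        s_by_key.setdefault(s["k"], []).append(s)

    for op, r in diff_r:
        matches = [(r["id"], s["id"]) for s in s_by_key.get(r["k"], [])]
        if op == "insert":
            replica.update(matches)
        elif op == "delete":
            replica.difference_update(matches)
    return replica


if __name__ == "__main__":
    replica = {(1, 10)}
    diff = [("insert", {"id": 2, "k": "x"}), ("delete", {"id": 1, "k": "x"})]
    s_rows = [{"id": 10, "k": "x"}, {"id": 11, "k": "y"}]
    print(refresh_join_replica(replica, diff, s_rows))
```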

A Generic Framework for Reliable Mobile Client-Server System in Mobile Database Environments (모바일 데이터베이스 환경의 신뢰성 보장 모바일 클라이언트-서버 프레임워크)

  • Joo, Hae-Jong;Hong, Suk-Joo;Park, Young-Bae
    • Proceedings of the Korea Information Processing Society Conference / 2005.11a / pp.35-38 / 2005
  • This paper proposes a new mobile client-server system that includes a Mobile Query Processing System (MQPS). The system is intended to solve problems that arise from the characteristics of mobile database systems in a mobile client-server environment: problems related to database hoarding caused by the weak connectivity and disconnections of wireless networks, the problem of maintaining the consistency of shared data, and the problem of log optimization.

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. 
Furthermore, because the HDFS (Hadoop Distributed File System) stores data by replicating the blocks of the aggregated log data, the proposed system offers automatic restore functions so that it can continue operating after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, databases with strict schemas, such as relational databases, cannot easily expand across nodes when the stored data must be distributed to multiple nodes as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide but can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a schema-free structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module.
When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance is carried out against a log data processing system that uses only MySQL; this evaluation demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.
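
The routing role of the log collector module described above can be sketched roughly as follows. Plain Python lists stand in for the MySQL and MongoDB modules, and the `realtime` flag on each record is a hypothetical field. The abstract does not say how the real system classifies log types.

```python
def route_log(record):
    """Pick the target store for one log record.

    Assumption of this sketch: a boolean 'realtime' field marks records
    that need real-time analysis; everything else counts as aggregated.
    """
    return "mysql" if record.get("realtime") else "mongodb"


def collect(records):
    """Classify incoming log records and distribute them to the two stores."""
    stores = {"mysql": [], "mongodb": []}  # stand-ins for the real modules
    for record in records:
        stores[route_log(record)].append(record)
    return stores


if __name__ == "__main__":
    incoming = [
        {"type": "login", "realtime": True},
        {"type": "batch_summary"},
    ]
    for name, items in collect(incoming).items():
        print(name, items)
```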

Analysis and Evaluation of Most Clicked Documents of Korean Search Portal (검색 포털의 클릭 집중 문서 분석 평가)

  • Park, So-Yeon
    • Journal of Korean Library and Information Science Society / v.42 no.1 / pp.325-338 / 2011
  • This study investigates the characteristics of the most clicked documents in Naver's universal search service. In particular, it analyzes characteristics such as click ratio, collection distribution, and yearly distribution. The clicked documents were also evaluated in terms of relevance, credibility, and currency. For this purpose, query logs and click logs of the unified search service were analyzed. The results show that most clicks occurred in the blog collection and that the average click concentration rate reached almost 50%. The relevance and currency of the most clicked documents were quite high, but their credibility was only average. The results of this study can be applied to the portal's development of more effective search algorithms and interfaces.

An Efficient Refresh Method in Multi-Level Storage System with Snapshot Data (스냅샷 데이터를 갖는 다중 레벨 저장 시스템에서의 효율적인 리프레시 기법)

  • Zhou, Peng;Eo, Sang-Hun;Kim, Myoung-Keum;Cho, Sook-Kyoung;Bae, Hae-Young
    • Proceedings of the Korea Information Processing Society Conference / 2005.05a / pp.55-58 / 2005
  • In a multi-level storage system with snapshot data, some snapshots, which are selections over portions of the base tables, are kept in main memory. Consistency requires that each snapshot reflect the current state of the base tables referenced by the snapshot query, so how to efficiently refresh snapshots in response to changes to their base tables is a very important research issue. In this paper, a method for efficiently refreshing snapshots is proposed. The method uses a data structure that stores metadata containing the necessary information about every snapshot, and an update log that records the history of changes to its base tables. A synchronization process scans the metadata, and when it finds that any snapshot needs to be refreshed, the refresh process is executed using the appropriate logs.
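
A minimal sketch of this metadata-plus-update-log scheme is given below. The field names (`predicate`, `rows`, `last_lsn`), the log entry format, and the use of a list index as a log sequence number are assumptions made for illustration, not details taken from the paper.

```python
def refresh_snapshots(snapshots, update_log):
    """Bring every stale snapshot up to date from the update log.

    snapshots  : list of metadata dicts with
                 'predicate' (selection condition on a row),
                 'rows'      (dict: key -> row held in main memory),
                 'last_lsn'  (number of log entries already applied)
    update_log : list of (op, key, row) entries; op is "upsert" or
                 "delete", and row is None for deletes
    """
    latest = len(update_log)
    for snap in snapshots:              # synchronization: scan the metadata
        if snap["last_lsn"] >= latest:
            continue                    # snapshot is already current
        # Refresh using only the log entries this snapshot has not seen.
        for op, key, row in update_log[snap["last_lsn"]:]:
            if op == "delete" or not snap["predicate"](row):
                snap["rows"].pop(key, None)
            else:
                snap["rows"][key] = row
        snap["last_lsn"] = latest
    return snapshots


if __name__ == "__main__":
    snap = {"predicate": lambda r: r["qty"] > 5,
            "rows": {1: {"qty": 9}},
            "last_lsn": 0}
    log = [("upsert", 2, {"qty": 7}), ("upsert", 1, {"qty": 3}),
           ("delete", 2, None), ("upsert", 3, {"qty": 8})]
    refresh_snapshots([snap], log)
    print(snap["rows"], snap["last_lsn"])
```

A row that is updated so that it no longer satisfies the selection predicate is dropped from the snapshot, which keeps the snapshot equal to the result of re-running its query.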

SOSiM: Shape-based Object Similarity Matching using Shape Feature Descriptors (SOSiM: 형태 특징 기술자를 사용한 형태 기반 객체 유사성 매칭)

  • Noh, Chung-Ho;Lee, Seok-Lyong;Chung, Chin-Wan;Kim, Sang-Hee;Kim, Deok-Hwan
    • Journal of KIISE:Databases / v.36 no.2 / pp.73-83 / 2009
  • In this paper we propose an object similarity matching method based on the shape characteristics of an object in an image. The proposed method extracts edge points from the edges of objects and generates a log-polar histogram for each edge point to represent the relative placement of the extracted points. It performs matching by comparing the histograms of two edge points sequentially along the edges of the objects, and uses the well-known k-NN (k-nearest neighbor) approach to retrieve similar objects from a database. To verify the proposed method, we compared it with the existing Shape-Context method. Experimental results show that our method is more accurate in object matching than the existing method: when k=5, the precision of our method is 0.75-0.90 while that of the existing one is 0.37, and when k=10, the precision of our method is 0.61-0.80 while that of the existing one is 0.31. In the rotational transformation experiment, our method is also more robust than the existing one, with a precision of 0.69 versus 0.30.
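
As a rough illustration of the descriptor mentioned above, the sketch below builds a log-polar histogram of the points surrounding one edge point. The bin counts and radius limits are arbitrary assumptions, and the code shows only the generic histogram construction, not the SOSiM matching procedure.

```python
import math


def log_polar_histogram(center, points, r_bins=3, theta_bins=8,
                        r_min=1.0, r_max=100.0):
    """Count points around `center` in log-spaced radial and uniform angular bins.

    Points closer than r_min or at distance r_max or more are ignored.
    All parameter defaults are illustrative assumptions.
    """
    hist = [[0] * theta_bins for _ in range(r_bins)]
    log_span = math.log(r_max / r_min)
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        r = math.hypot(dx, dy)
        if r < r_min or r >= r_max:
            continue
        # Radial bin index grows with log(r), so nearby points get finer bins.
        r_idx = min(int(r_bins * math.log(r / r_min) / log_span), r_bins - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_idx = int(theta_bins * theta / (2 * math.pi)) % theta_bins
        hist[r_idx][t_idx] += 1
    return hist


if __name__ == "__main__":
    edge_points = [(2, 0), (0, 20), (-50, 0), (500, 0)]
    for row in log_polar_histogram((0, 0), edge_points):
        print(row)
```

Two such histograms can then be compared with any histogram distance. Which distance the paper uses is not stated in the abstract.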

The Multimedia Searching Behavior of Korean Portal Users (국내 포털 이용자들의 멀티미디어 검색 행태 분석)

  • Park, So-Yeon
    • Journal of the Korean Society for Library and Information Science / v.44 no.1 / pp.101-115 / 2010
  • The main difference between web searching and traditional searching is that the web provides and supports multimedia searching. This study investigates the multimedia searching behavior of users of NAVER, a major Korean search portal, by analyzing the query logs and click logs of its unified search service. The results show that among the multimedia queries submitted by users, audio searches are the dominant media type, followed by video and image searches at similar levels. Among the multimedia documents clicked on, however, video is the most popular collection type, followed by image and audio collections. Entertainment is the most popular topic in both multimedia queries and clicks. The results of this study can be applied to the portal's development of multimedia content and search algorithms.