• Title/Summary/Keyword: Query Log

A Query Randomizing Technique for breaking 'Filter Bubble'

  • Joo, Sangdon;Seo, Sukyung;Yoon, Youngmi
    • Journal of the Korea Society of Computer and Information / v.22 no.12 / pp.117-123 / 2017
  • A personalized search algorithm analyzes the user's IP address, cookies, log data, and search history to recommend the information the user is likely to want. As a result, users become isolated within the frame of information the algorithm recommends. This is called the 'filter bubble' phenomenon. Most personalization data can be deleted or changed by the user, but data stored on the service provider's server is difficult to access. This study proposes neutralizing personalization by continually sending random query words, so that search activity with words unrelated to the user confuses the data accumulated on the server. We analyzed the change in URL rank while running the search activity once with 500 random query words on a personalized account, which served as the experimental group. To verify the effect, we set up a new account as a control. We then searched the same set of queries with both accounts, stored the URL data, and scored the rank variation. URLs ranked near the top of the page are weighted more heavily than lower-ranked URLs. At the beginning of the experiment, the difference between the scores of the two accounts was insignificant. As the experiment continued and the number of random query words accumulated on the server increased, the results showed a meaningful difference.
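The rank-variation scoring described above can be sketched as follows. This is a minimal illustration, not the authors' formula: the abstract only says that higher-ranked URLs carry more weight, so the reciprocal-rank weight `1 / (rank + 1)`, the missing-URL penalty, and the function name `rank_variation_score` are all assumptions made here.

```python
def rank_variation_score(control_urls, test_urls):
    """Weighted sum of rank displacements between two ranked result lists.

    Assumption: a URL at 0-based rank i in the control list gets weight
    1 / (i + 1), so changes near the top of the page count more.
    """
    test_rank = {url: i for i, url in enumerate(test_urls)}
    # Assumed penalty for a control URL that is absent from the test list.
    missing_penalty = max(len(control_urls), len(test_urls))
    score = 0.0
    for i, url in enumerate(control_urls):
        weight = 1.0 / (i + 1)
        if url in test_rank:
            displacement = abs(test_rank[url] - i)
        else:
            displacement = missing_penalty
        score += weight * displacement
    return score


control = ["a.example", "b.example", "c.example"]
top_swap = rank_variation_score(control, ["b.example", "a.example", "c.example"])
bottom_swap = rank_variation_score(control, ["a.example", "c.example", "b.example"])
print(top_swap, bottom_swap)
```

With this weighting, swapping the top two results scores higher than swapping the bottom two, which matches the stated intent that changes near the top matter more.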

Efficient Deferred Incremental Refresh of XML Query Cache Using ORDBMS (ORDBMS를 사용한 XML 질의 캐쉬의 효율적인 지연 갱신)

  • Hwang Dae-Hyun;Kang Hyun-Chul
    • The KIPS Transactions:PartD / v.13D no.1 s.104 / pp.11-22 / 2006
  • As more and more XML documents must be handled, research on storing and managing XML documents in databases is actively conducted. Employing an RDBMS or ORDBMS as a repository of XML documents is currently regarded as the most practical approach. The results of queries over XML documents stored in databases can be cached to improve query performance, though this incurs the cost of keeping the cache consistent with updates to the underlying data. In this paper, we assume that an ORDBMS is used as the repository for the XML query cache as well as for its underlying XML documents, and that the XML query cache is refreshed in a deferred way using the update log. When the same XML document has been updated multiple times, the deferred refresh of the XML query cache may become inefficient. We propose an algorithm that removes or filters such duplicate updates. Based on that, the optimal SQL statements to be executed for XML query cache consistency are generated. Through experiments, we show the efficiency of the proposed deferred refresh of the XML query cache.
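The idea of filtering duplicate updates from the log before a deferred refresh can be sketched as below. This is an illustrative simplification, not the paper's algorithm: the log format `(doc_id, op)`, the three operation names, and the merge rules in `_MERGE` are assumptions chosen for the example.

```python
# Assumed merge rules: (earlier op, later op) -> net op; None means the
# two operations cancel out and nothing needs to be refreshed.
_MERGE = {
    ("insert", "update"): "insert",
    ("insert", "delete"): None,
    ("update", "update"): "update",
    ("update", "delete"): "delete",
    ("delete", "insert"): "update",
}


def compact_update_log(log):
    """Collapse a sequence of (doc_id, op) entries into one net op per document."""
    net = {}
    for doc_id, op in log:
        if doc_id not in net:
            net[doc_id] = op
            continue
        key = (net[doc_id], op)
        if key not in _MERGE:
            raise ValueError(f"inconsistent log for {doc_id}: {key}")
        merged = _MERGE[key]
        if merged is None:
            del net[doc_id]  # e.g. insert followed by delete: no refresh needed
        else:
            net[doc_id] = merged
    return net


log = [
    ("d1", "update"), ("d1", "update"),
    ("d2", "insert"), ("d2", "delete"),
    ("d3", "update"), ("d3", "delete"),
    ("d1", "update"),
]
print(compact_update_log(log))
```

In this sketch, seven log entries reduce to two net operations, so the refresh step would issue far fewer statements against the cache.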

Examining Categorical Transition and Query Reformulation Patterns in Image Search Process (이미지 검색 과정에 나타난 질의 전환 및 재구성 패턴에 관한 연구)

  • Chung, Eun-Kyung;Yoon, Jung-Won
    • Journal of the Korean Society for Information Management / v.27 no.2 / pp.37-60 / 2010
  • The purpose of this study is to investigate image search query reformulation patterns in relation to image attribute categories. A total of 592 sessions and 2,445 queries from the Excite Web search engine log data were analyzed using Batley's visual information types and two facets and seven sub-facets of query reformulation patterns. The results are organized in two parts: query reformulation and categorical transition. The most dominant query categories are specific and general/nameable, and this tendency persists across search stages. From the perspective of reformulation patterns, the Parallel movement is the most dominant, although there are slight differences depending on the initial or preceding query category. In examining categorical transitions, it was found that 60-80% of search queries were reformulated within the same category of image attributes. These findings may be applied to the practice and implementation of image retrieval systems, in terms of assisting users' query term selection and developing effective thesauri.
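A categorical-transition analysis of this kind can be sketched as counting consecutive category pairs within each session. The code below is a minimal illustration under assumptions: each session is taken to be an ordered list of category labels already assigned to its queries, and the label strings in the usage lines are placeholders, not the study's coding scheme.

```python
from collections import Counter


def transition_counts(sessions):
    """Count (previous category, current category) pairs within each session."""
    counts = Counter()
    for categories in sessions:
        for prev, curr in zip(categories, categories[1:]):
            counts[(prev, curr)] += 1
    return counts


def same_category_share(counts):
    """Fraction of transitions that stay within the same category."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    same = sum(n for (prev, curr), n in counts.items() if prev == curr)
    return same / total


sessions = [
    ["specific", "specific", "general/nameable"],
    ["general/nameable", "general/nameable"],
    ["specific"],  # single-query session: contributes no transitions
]
counts = transition_counts(sessions)
print(counts, same_category_share(counts))
```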

A Study on Search Query Topics and Types using Topic Modeling and Principal Components Analysis (토픽모델링 및 주성분 분석 기반 검색 질의 유형 분류 연구)

  • Kang, Hyun-Ah;Lim, Heui-Seok
    • KIPS Transactions on Software and Data Engineering / v.10 no.6 / pp.223-234 / 2021
  • Recent advances in the 4th Industrial Revolution have accelerated the shift in shopping behavior from offline to online. In online shopping, search queries express customers' information needs most directly. However, research on search queries is scarce, and most prior work has been limited in topic and data and has relied on researchers' qualitative judgment. This study therefore defines search query types with a data-driven, quantitative methodology by applying machine learning to search query research. We define 15 search query topics by conducting topic modeling on search queries and clicked document information. Furthermore, we present a new classification system of search query types that represents the characteristics of searching behavior, obtained by extracting and analyzing key variables through principal component analysis. The results of this study are expected to contribute to the establishment of effective search services and the development of search systems.

A Convex Layer Tree for the Ray-Shooting Problem (광선 슈팅 문제를 위한 볼록 레이어 트리)

  • Kim, Soo-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.4 / pp.753-758 / 2017
  • The ray-shooting problem is to find the first point on the surface of given geometric objects that is hit by a ray moving along a straight line. Since rays are usually given in the form of queries, this problem is typically solved as follows. First, a data structure for the collection of objects is constructed as preprocessing. Then, the answer for each query ray is quickly computed using the data structure. In this paper, we consider the ray-shooting problem for a set of vertical line segments on the x-axis. We present a new data structure called a convex layer tree for n vertical line segments given as input. This is a tree structure consisting of layers of convex hulls of vertical line segments. It can be constructed in O(n log n) time and O(n) space and is easy to implement. We also present an algorithm that answers each query in O(log n) time using this data structure.
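To make the query concrete, here is a naive O(n)-per-query baseline for ray shooting against vertical segments. It is not the convex layer tree from the paper; it only illustrates what a query returns. The segment representation `(x, y_lo, y_hi)` and the function name `shoot_ray` are assumptions, and rays parallel to the segments are simply reported as misses.

```python
def shoot_ray(segments, origin, direction):
    """Return (index, (x, y)) of the first segment hit by the ray, or None.

    segments:  list of (x, y_lo, y_hi) vertical segments (assumed format)
    origin:    (ox, oy) start point of the ray
    direction: (dx, dy) direction vector of the ray
    """
    ox, oy = origin
    dx, dy = direction
    if dx == 0:
        return None  # ray parallel to the segments: degenerate case not handled
    best = None
    best_t = float("inf")
    for index, (sx, y_lo, y_hi) in enumerate(segments):
        t = (sx - ox) / dx  # ray parameter at which the ray reaches x = sx
        if t < 0 or t >= best_t:
            continue  # behind the origin, or farther than the current best hit
        y = oy + t * dy
        if y_lo <= y <= y_hi:
            best_t = t
            best = (index, (sx, y))
    return best


segments = [(1.0, 0.0, 2.0), (3.0, 0.0, 5.0), (2.0, 0.0, 0.5)]
print(shoot_ray(segments, (0.0, 1.0), (1.0, 0.0)))
print(shoot_ray(segments, (0.0, 3.0), (1.0, 0.0)))
```

A linear scan like this is the baseline that a preprocessed structure with O(log n) queries is meant to beat when many rays are shot at the same set of segments.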

A Security Log Analysis System using Logstash based on Apache Elasticsearch (아파치 엘라스틱서치 기반 로그스태시를 이용한 보안로그 분석시스템)

  • Lee, Bong-Hwan;Yang, Dong-Min
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.2 / pp.382-389 / 2018
  • Cyber attacks can now cause serious damage to various information systems, and log data analysis can help address this problem. A security log analysis system makes it possible to cope with security risks properly by collecting, storing, and analyzing log data. In this paper, a security log analysis system is designed and implemented to analyze security log data using Logstash with Elasticsearch, a distributed search engine that makes it possible to collect and process various types of log data. Kibana, an open source data visualization plugin for Elasticsearch, is used to generate log statistics and search reports and to visualize the results. The performance of the Elasticsearch-based security log analysis system is compared with that of an existing log analysis system that uses the Flume log collector, the Flume HDFS sink, and HBase. The experimental results show that the proposed system greatly reduces both database query processing time and log data analysis time compared with the existing Hadoop-based log analysis system.

Survey of Automatic Query Expansion for Arabic Text Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah
    • Journal of Information Science Theory and Practice / v.8 no.4 / pp.67-86 / 2020
  • Information need has been one of the main motivations for a person using a search engine. Queries can represent very different information needs. Ironically, a query can be a poor representation of the information need because the user can find it difficult to express that need. Query Expansion (QE) is widely used to address this limitation. While QE can be considered a language-independent technique, recent findings have shown that in certain cases language plays an important role. Arabic is a language with a particularly large vocabulary, rich in words with synonymous shades of meaning, and with high morphological complexity. This paper therefore provides a review of QE for Arabic information retrieval, with the intention of identifying the current state of the art in this burgeoning area. In this review, we primarily discuss statistical QE approaches, which include document analysis, search and browse log analyses, and web knowledge analyses, in addition to semantic QE approaches, which use semantic knowledge structures to extract meaningful word relationships. Finally, we conclude that QE for the Arabic language calls for additional investigation and research because of the intricate nature of the language.

Replica Update Propagation Method for Cost Optimization of Request Forwarding in the Grid Database (그리드 데이터베이스에서 전송비용 최적화를 위한 복제본 갱신 전파 기법)

  • Jang, Yong-Il;Baek, Sung-Ha;Bae, Hae-Young
    • Journal of Korea Multimedia Society / v.9 no.11 / pp.1410-1420 / 2006
  • In this paper, a replica update propagation method for cost optimization of request forwarding in the Grid database is proposed. In the Grid database, data is replicated for performance and availability. When data is updated, the update information is forwarded to neighbor nodes to synchronize the other replicas. There are two kinds of update propagation methods, the query-based scheme and the log-based scheme, and usually only one of them is used. However, because the environment changes dynamically with the properties of update queries and processing conditions, strategies that use a single propagation method increase transmission cost in a dynamic environment. In the proposed method, three classes are defined from the two cost models of the query-based and log-based schemes. Cost functions and an update propagation method are then designed to select the optimal update propagation scheme from these three classes. This paper shows that the proposed method achieves optimized performance through minimum transmission cost in a dynamic processing environment.

Study on Windows Event Log-Based Corporate Security Audit and Malware Detection (윈도우 이벤트 로그 기반 기업 보안 감사 및 악성코드 행위 탐지 연구)

  • Kang, Serim;Kim, Soram;Park, Myungseo;Kim, Jongsung
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.3 / pp.591-603 / 2018
  • Windows Event Log is a format that records system logs in the Windows operating system and methodically manages information about system operation. An event can be caused by the system itself or by a user's specific actions, and some event logs can be used for corporate security audits, malware detection, and so on. In this paper, we select actions related to corporate security audit and malware detection that can be detected through event log analysis (External storage connection, Application install, Shared folder usage, Printer usage, Remote connection/disconnection, File/Registry manipulation, Process creation, DNS query, Windows service, PC startup/shutdown, Log on/off, Power saving mode, Network connection/disconnection, Event log deletion and System time change) and classify the event IDs that occur in each situation. In addition, existing event log tools only provide functions for parsing EVTX files, which makes it difficult to track a user's behavior in a forensic investigation. We therefore implemented a new analysis tool that parses EVTX files and user behaviors.
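Classifying event IDs by action can be sketched as a lookup table applied to already-parsed events. The mapping below is a small illustrative subset built from commonly documented Windows event IDs, not the classification from the paper; the names `EVENT_ACTIONS` and `classify_events` are assumptions, and parsing the EVTX file itself is out of scope here.

```python
from collections import Counter

# Illustrative subset only (assumption): commonly documented Windows event IDs
# mapped to action names taken from the abstract's list.
EVENT_ACTIONS = {
    4624: "Log on",              # Security log: an account was successfully logged on
    4634: "Log off",             # Security log: an account was logged off
    4688: "Process creation",    # Security log: a new process has been created
    4616: "System time change",  # Security log: the system time was changed
    1102: "Event log deletion",  # Security log: the audit log was cleared
    7045: "Windows service",     # System log: a service was installed
}


def classify_events(event_ids):
    """Count how many events fall under each action; unknown IDs are grouped."""
    return Counter(EVENT_ACTIONS.get(eid, "Unclassified") for eid in event_ids)


print(classify_events([4624, 4688, 4688, 1102, 9999]))
```

A real tool would also need the log channel and provider, since the same numeric ID can mean different things in different logs.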

Vocabulary Expansion Technique for Advertisement Classification

  • Jung, Jin-Yong;Lee, Jung-Hyun;Ha, Jong-Woo;Lee, Sang-Keun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.5 / pp.1373-1387 / 2012
  • Contextual advertising is an important revenue source for major service providers on the Web. Ads classification is one of the main tasks in contextual advertising, and it is used to retrieve ads that are semantically relevant to the content of web pages. However, it is difficult for traditional text classification methods to achieve satisfactory performance in ads classification because term features in ads are scarce. In this paper, we propose a novel ads classification method that handles the lack of term features when classifying ads with short text. The proposed method uses a vocabulary expansion technique based on semantic associations among terms learned from large-scale search query logs. The evaluation results show that our methodology achieves improvements of 4.0% to 9.7% in terms of the hierarchical f-measure over the baseline classifiers without vocabulary expansion.
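For readers unfamiliar with the hierarchical f-measure, the sketch below shows one common formulation, in which each true and predicted label is extended with its ancestors in the taxonomy before precision and recall are computed. Whether the paper uses exactly this variant is not stated in the abstract, and the toy taxonomy and function names are assumptions.

```python
def with_ancestors(label, parent):
    """Return the label together with all of its ancestors in the taxonomy."""
    nodes = set()
    while label is not None:
        nodes.add(label)
        label = parent.get(label)
    return nodes


def hierarchical_f1(true_labels, predicted_labels, parent):
    """Hierarchical F-measure over ancestor-augmented label sets."""
    overlap = true_total = pred_total = 0
    for true, pred in zip(true_labels, predicted_labels):
        true_set = with_ancestors(true, parent)
        pred_set = with_ancestors(pred, parent)
        overlap += len(true_set & pred_set)
        true_total += len(true_set)
        pred_total += len(pred_set)
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / true_total
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-level taxonomy: child -> parent (None marks a top-level node).
parent = {"sports": None, "soccer": "sports", "tennis": "sports",
          "finance": None, "loans": "finance"}
print(hierarchical_f1(["soccer", "loans"], ["tennis", "loans"], parent))
```

Unlike a flat F-measure, this gives partial credit when a prediction lands on a sibling of the true category, because the shared parent still counts as a match.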