• Title/Summary/Keyword: Top-k search


Query Expansion based on Word Sense Community (유사 단어 커뮤니티 기반의 질의 확장)

  • Kwak, Chang-Uk;Yoon, Hee-Geun;Park, Seong-Bae
    • Journal of KIISE
    • /
    • v.41 no.12
    • /
    • pp.1058-1065
    • /
    • 2014
  • To assist users who are executing a search, a query expansion method suggests keywords related to the input query. Several recent studies suggest keywords by identifying domains with a clustering method applied to the retrieved documents. However, clustering is ill-suited to presenting varied domains because the number of clusters must be fixed in advance. This paper proposes a method that suggests keywords by finding the various domains related to an input query with a community detection algorithm. The proposed method extracts words from the top 30 retrieved documents and builds communities from the resulting word graph. Keywords representing each community are then derived, and these representative keywords are used for query expansion. To evaluate the proposed method, we compared its results with two baselines: searches performed by the Google search engine, and keyword recommendation using TF-IDF over the search results. The evaluation indicates that the proposed method outperforms the baselines with respect to diversity.

Predicting changes of realtime search words using time series analysis and artificial neural networks (시계열분석과 인공신경망을 이용한 실시간검색어 변화 예측)

  • Chong, Min-Yeong
    • Journal of Digital Convergence
    • /
    • v.15 no.12
    • /
    • pp.333-340
    • /
    • 2017
  • Because realtime search words are driven by issues whose search volume grows rapidly over a short period, they cannot represent an issue that holds interest over a longer period. To overcome this limitation, this paper evaluates the daily and hourly persistence of the realtime search words that stay in the top 10 for a certain period and extracts the search words that attract sustained interest. We then present a method that uses time series analysis and a neural network to track how interest in the top search words changes, and show the result of forecasting near-future changes with an actual example derived through the method. Forecasting with time series analysis by date and with artificial neural networks trained by hour both show good results.

A Survey on Layer-based Methods using Skylines (Skyline을 사용하는 Layer 기반 방법에 관한 조사)

  • Lee, Ji-Hyeon;Park, Young-Ho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.04a
    • /
    • pp.1075-1076
    • /
    • 2012
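  • As the growth of the Internet makes data increasingly heterogeneous and voluminous, it is becoming harder to find information that matches a user's intent and purpose quickly and accurately, and efficient top-k query processing that can search large volumes of data quickly is becoming important. Top-k query processing returns the k tuples with the highest (or lowest) values in a relation. Among such methods, layer-based methods map the d attribute values of each object to a point in d-dimensional space and build a list of layers. This paper surveys existing work on layer-based methods that use skylines to build the layers and construct an index. It then describes representative methods: BNL, which compares all objects sequentially; SFS, which reduces the number of comparisons BNL makes; NN, which uses an R-tree; and BBS, which reduces the computation cost of NN.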
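
The BNL idea and the layer list described in this abstract can be sketched as follows. This is a minimal in-memory illustration, not the surveyed papers' implementations: it assumes smaller values are better in every dimension, keeps the whole window in memory (disk-based BNL spills to a temporary file when the window fills), and the names `dominates`, `bnl_skyline`, and `skyline_layers` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def bnl_skyline(points: List[Point]) -> List[Point]:
    """Block-nested-loops style skyline with an unbounded in-memory window."""
    window: List[Point] = []
    for p in points:
        # Discard p if some point already in the window dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise evict every window point that p dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def skyline_layers(points: List[Point]) -> List[List[Point]]:
    """Peel skylines repeatedly: layer 1 is the skyline of all points,
    layer 2 is the skyline of what remains, and so on."""
    layers: List[List[Point]] = []
    remaining = list(points)
    while remaining:
        layer = bnl_skyline(remaining)
        layers.append(layer)
        in_layer = set(layer)
        remaining = [p for p in remaining if p not in in_layer]
    return layers


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (6, 6)]
    for depth, layer in enumerate(skyline_layers(data), start=1):
        print(depth, layer)
```

With layers built this way, no point is dominated by a point in the same or a deeper layer, so a query can stop descending once it has enough candidates, which is what makes a layer list usable as a top-k index.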

An Efficient Processing of Top-k(g) skyline group queries for Incomplete Data (불완전 데이터를 위한 효율적 top-k(g) 스카이라인 그룹 질의 처리 기법)

  • Park, Mi-Ra;Min, Jun-Ki
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.04a
    • /
    • pp.282-285
    • /
    • 2009
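  • Most research on skyline queries deals with complete data. However, when we enter data into a database through the Web or other tools, some fields allow null values. Much research is now under way on processing such incomplete data. To address this problem, this paper considers previously proposed techniques for processing incomplete data and techniques for overcoming the curse of dimensionality. On that basis, it introduces a skyline group query that retrieves, in order of priority, k(g) data items that are equal to or possibly better than the complete data, and proposes a method for processing it.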

A Study on Efficient Decoding of Huffman Codes (허프만 코드의 효율적인 복호화에 관한 연구)

  • Park, Sangho
    • Journal of IKEEE
    • /
    • v.22 no.3
    • /
    • pp.850-853
    • /
    • 2018
  • In this paper, we propose a decoding method that uses a balanced binary tree and a canonical Huffman tree for efficient decoding of Huffman codes. The balanced binary tree scheme reduces the number of searches by lowering the height of the tree and using binary search. Its drawback is that the tree is constructed from the values of the codes rather than from the frequencies of the symbols. To overcome this drawback, the balanced binary tree is reconstructed according to the occurrence probability of the symbols at each level of the tree, and binary search is performed within each level. We minimize the number of searches by using a canonical Huffman tree to find the level of a code, which avoids searching sequentially from the top level to the bottom level.
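
For readers unfamiliar with canonical Huffman codes, the sketch below shows the standard textbook decoding that works from code lengths alone. It is not the paper's balanced-tree scheme; it only illustrates why a canonical code lets the decoder identify a codeword's level (length) by comparing against one "first code" per level. The function names and the sample code lengths are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def canonical_codes(lengths: Dict[str, int]) -> Dict[str, str]:
    """Assign canonical codes: symbols sorted by (length, symbol) receive
    consecutive code values, shifted left whenever the length grows."""
    codes: Dict[str, str] = {}
    code, prev_len = 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def build_tables(lengths: Dict[str, int]) -> Tuple[List[int], List[int], List[int], List[str]]:
    """Per-level tables: number of codes, first code value, and index of
    the first symbol at each code length."""
    max_len = max(lengths.values())
    count = [0] * (max_len + 1)
    for length in lengths.values():
        count[length] += 1
    symbols = [s for s, _ in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))]
    first_code = [0] * (max_len + 1)
    first_index = [0] * (max_len + 1)
    code, index = 0, 0
    for length in range(1, max_len + 1):
        first_code[length] = code
        first_index[length] = index
        code = (code + count[length]) << 1
        index += count[length]
    return count, first_code, first_index, symbols


def decode(bits: str, lengths: Dict[str, int]) -> List[str]:
    """Decode a bit string; each level needs one subtraction and one range check."""
    count, first_code, first_index, symbols = build_tables(lengths)
    out: List[str] = []
    code, length = 0, 0
    for bit in bits:
        code = (code << 1) | int(bit)
        length += 1
        if length >= len(count):
            raise ValueError("bit sequence is not a valid codeword")
        offset = code - first_code[length]
        if 0 <= offset < count[length]:
            out.append(symbols[first_index[length] + offset])
            code, length = 0, 0
    if length:
        raise ValueError("trailing bits do not form a codeword")
    return out


if __name__ == "__main__":
    sample_lengths = {"a": 1, "b": 2, "c": 3, "d": 3}  # hypothetical code lengths
    print(canonical_codes(sample_lengths))
    print(decode("0101101110", sample_lengths))
```

Because the codes at each length are consecutive integers, the decoder never walks a tree: the per-level tables are enough to map a code value directly to its symbol.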

An Efficient Processing Method of Top-k(g) Skyline Group Queries for Incomplete Data (불완전 데이터를 위한 효율적 Top-k(g) 스카이라인 그룹 질의 처리 기법)

  • Park, Mi-Ra;Min, Jun-Ki
    • The KIPS Transactions:PartD
    • /
    • v.17D no.1
    • /
    • pp.17-24
    • /
    • 2010
  • Recently, there has been growing interest in skyline queries. Most work on skyline queries assumes that the data contain no null values. However, when data are entered through the Web or other tools, incomplete data with null values arise. As a result, several skyline processing techniques for incomplete data have been proposed. These techniques, however, deal with incomplete data only and do not consider environments in which complete and incomplete data coexist. In this paper, we propose a novel skyline group processing technique that evaluates skyline queries in environments where complete and incomplete data coexist. To do this, we introduce the top-k(g) skyline group query, which searches for g skyline groups with respect to the user's dimensional preference. Our experimental study shows the efficiency of the proposed technique.
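
To illustrate why null values complicate skyline processing, the sketch below uses a dominance test commonly adopted for incomplete data: two points are compared only on the dimensions where both have values. This is an assumption made for illustration and is not taken from the paper, whose skyline group technique is not reproduced here. `None` stands for a null value, smaller values are assumed better, and the name `dominates_incomplete` is hypothetical.

```python
from typing import Optional, Sequence

Value = Optional[float]


def dominates_incomplete(a: Sequence[Value], b: Sequence[Value]) -> bool:
    """Compare a and b only on dimensions where neither value is null.
    a dominates b if it is no worse on all shared dimensions and strictly
    better on at least one (assumption: smaller is better)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return False  # nothing to compare, so neither point dominates
    return all(x <= y for x, y in shared) and any(x < y for x, y in shared)


if __name__ == "__main__":
    # Three points, each missing a different dimension.
    p = (1, None, 2)
    q = (2, 1, None)
    r = (None, 2, 1)
    # Dominance becomes cyclic: p beats q, q beats r, and r beats p.
    print(dominates_incomplete(p, q), dominates_incomplete(q, r), dominates_incomplete(r, p))
```

Under this definition dominance is no longer transitive, so a point cannot simply be discarded the moment something dominates it. That is one reason techniques designed for complete data do not carry over directly.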

Optimized Structures with Hop Constraints for Web Information Retrieval (Hop 제약조건이 고려된 최적화 웹정보검색)

  • Lee, Woo-Key;Kim, Ki-Baek;Lee, Hwa-Ki
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.33 no.4
    • /
    • pp.63-82
    • /
    • 2008
  • The explosive growth of the Web is creating significant demand for structural analysis of various web objects. The more web objects become available, the more difficult it is for clients (i.e., common web users and web robots) and servers (i.e., Web search engines) to retrieve what they really want. We focus on the structure of web objects by introducing optimization models for more convenient and effective information retrieval. For this purpose, we represent web objects and hyperlinks as a directed graph, from which optimal structures are derived in terms of rooted directed spanning trees and Top-k trees. Computational experiments are run on synthetic data as well as on real web site domains, applying Lagrangian relaxation approaches to the Top-k trees and the hop constraints. In the experiments, our methods outperformed the conventional approaches, converting complex web graphs into optimally structured ones within a reasonable amount of computation time.

Similar Question Search System for online Q&A for the Korean Language Based on Topic Classification (온라인가나다를 위한 주제 분류 기반 유사 질문 검색 시스템)

  • Mun, Jung-Min;Song, Yeong-Ho;Jin, Ji-Hwan;Lee, Hyun-Seob;Lee, Hyun Ah
    • Korean Journal of Cognitive Science
    • /
    • v.26 no.3
    • /
    • pp.263-278
    • /
    • 2015
  • The online Q&A service of the National Institute of the Korean Language provides expert answers to questions about the Korean language. As on other Q&A boards, many similar questions are posted repeatedly. If a system automatically finds questions similar to a user's question, it can immediately offer recommended answers and spare experts from answering similar questions again and again. In this paper, we define five classes of frequently asked questions based on topic and propose classifying questions into those classes. Our system searches for similar questions by combining topic similarity, vector similarity, and sequence similarity. Experiments show that topic classification improves search correctness. In the experiments, the Mean Reciprocal Rank (MRR) of our system is 0.756, precision for the first result is 68.31%, and precision for the top five results is 87.32%.
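
The evaluation measures named in this abstract can be computed as sketched below. The abstract does not define its "precision for top five results"; since that figure is higher than the first-result precision, this sketch assumes it means the fraction of queries with at least one correct question among the top k results. The function names and the toy data are hypothetical and unrelated to the paper's results.

```python
from typing import List, Sequence, Set, Tuple

# One run = (ranked result IDs for a query, set of IDs judged correct).
Run = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first correct result, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Run]) -> float:
    """Average reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def hit_rate_at_k(runs: List[Run], k: int) -> float:
    """Fraction of queries with at least one correct result in the top k
    (assumed reading of the abstract's top-k precision)."""
    hits = sum(1 for r, rel in runs if any(item in rel for item in r[:k]))
    return hits / len(runs)


if __name__ == "__main__":
    toy_runs: List[Run] = [
        (["q1", "q2", "q3"], {"q1"}),  # correct at rank 1
        (["q4", "q5", "q6"], {"q6"}),  # correct at rank 3
        (["q7", "q8"], {"q9"}),        # correct question not retrieved
    ]
    print(mean_reciprocal_rank(toy_runs))
    print(hit_rate_at_k(toy_runs, 1), hit_rate_at_k(toy_runs, 3))
```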

FRIP System for Region-based Image Retrieval (영역기반 영상 검색을 위한 FRIP 시스템)

  • Ko, Byoung-Chul;Lee, Hae-Sung;Byun, Hye-Ran
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.7 no.3
    • /
    • pp.260-272
    • /
    • 2001
  • In this paper, we have designed a region-based image retrieval system, FRIP (Finding Region In the Pictures). The system includes a robust image segmentation scheme that uses color and texture direction, and a retrieval scheme based on the features of each region. For image segmentation, a circular filter protects the boundaries of round objects and merges stripes or spots of an object into its body region. The segmentation also combines scaled and shifted color coordinates with texture direction. After segmentation, to manage storage effectively and reduce computation time, we extract compact features from each region and store them as an index. Through the user interface, the user specifies constraints such as color care / don't care, scale care / don't care, shape care / don't care, and location care / don't care; the overall matching score is then estimated, and the top k nearest images are reported in ascending order of the final score.


Top-down Hierarchical Clustering using Multidimensional Indexes (다차원 색인을 이용한 하향식 계층 클러스터링)

  • Hwang, Jae-Jun;Mun, Yang-Se;Hwang, Gyu-Yeong
    • Journal of KIISE:Databases
    • /
    • v.29 no.5
    • /
    • pp.367-380
    • /
    • 2002
  • Owing to the recent increase in applications that require huge amounts of data, such as spatial data analysis and image analysis, clustering on large databases has been actively studied. In a hierarchical clustering method, a tree representing a hierarchical decomposition of the database is first created and then used for efficient clustering. Existing hierarchical clustering methods mainly adopt the bottom-up approach, which creates the tree from the bottom to the topmost level of the hierarchy. These bottom-up methods require at least one scan over the entire database to build the tree, and they need to search most nodes of the tree because the clustering algorithm starts from the leaf level. In this paper, we propose a novel top-down hierarchical clustering method that uses the multidimensional indexes already maintained in most database applications. Multidimensional indexes generally have the clustering property of storing similar objects in the same (or adjacent) data pages. Using this property, we can find adjacent objects without calculating the distances among them. We first formally define a cluster based on the density of objects. For this definition, we propose the concept of the region contrast partition, which is based on the density of a region. To speed up the clustering algorithm, we use a branch-and-bound algorithm; we propose the bounds and formally prove their correctness. Experimental results show that the proposed method is at least as effective in clustering quality as BIRCH, a bottom-up hierarchical clustering method, while reducing the number of page accesses by a factor of 26 to 187 depending on the size of the database. We therefore believe that the proposed method significantly improves clustering performance in large databases and is practically usable in various database applications.