• Title/Summary/Keyword: query processing algorithms

Search Result 112, Processing Time 0.022 seconds

MetaSearch for Entry Page Finding Task (엔트리 페이지 검색을 위한 메타 검색)

  • Kang In-Ho
    • The KIPS Transactions:PartB
    • /
    • v.12B no.2 s.98
    • /
    • pp.215-222
    • /
    • 2005
  • In this paper, a MetaSearch algorithm for navigational queries is presented. Previous MetaSearch algorithms focused on informational queries. They Eave a high score to an overlapped document. However, the overemphasis of overlapped documents may degrade the performance of a MetaSearch algerian for a navigational query. However, if a lot of result documents are from a certain domain or a directory, then we can assume the importance of the domain or directory. Various experiments are conducted to show the effectiveness of overlap of a domain and directory names. System results from TREC and commercial search engines are used for experiments. From the results of experiments, the overlap of documents showed the better performance for informational queries. However, the overlap of domain names and directory names showed the $10\%$ higher performance for navigational queries.

Spatial Join based on the Transform-Space View (변환공간 뷰를 기반으로한 공간 조인)

  • 이민재;한욱신;황규영
    • Journal of KIISE:Databases
    • /
    • v.30 no.5
    • /
    • pp.438-450
    • /
    • 2003
  • Spatial joins find pairs of objects that overlap with each other. In spatial joins using indexes, original-space indexes such as the R-tree are widely used. An original-space index is the one that indexes objects as represented in the original space. Since original-space indexes deal with sizes of objects, it is difficult to develop a formal algorithm without relying on heuristics. On the other hand, transform-space indexes, which transform objects in the original space into points in the transform space and index them, deal only with points but no sites. Thus, spatial join algorithms using these indexes are relatively simple and can be formally developed. However, the disadvantage of transform-space join algorithms is that they cannot be applied to original-space indexes such as the R-tree containing original-space objects. In this paper, we present a novel mechanism for achieving the best of these two types of algorithms. Specifically, we propose a new notion of the transform-space view and present the transform-space view join algorithm(TSVJ). A transform-space view is a virtual transform-space index based on an original-space index. It allows us to interpret on-the-fly a pre-built original-space index as a transform-space index without incurring any overhead and without actually modifying the structure of the original-space index or changing object representation. The experimental result shows that, compared to existing spatial join algorithms that use R-trees in the original space, the TSVJ improves the number of disk accesses by up to 43.1% The most important contribution of this paper is to show that we can use original-space indexes, such as the R-tree, in the transform space by interpreting them through the notion of the transform-space view. We believe that this new notion provides a framework for developing various new spatial query processing algorithms in the transform space.

A Study on the Tree based Memoryless Anti-Collision Algorithm for RFID Systems (RFID 시스템에서의 트리 기반 메모리래스 충돌방지 알고리즘에 관한 연구)

  • Quan Chenghao;Hong Wonkee;Lee Yongdoo;Kim Hiecheol
    • The KIPS Transactions:PartC
    • /
    • v.11C no.6 s.95
    • /
    • pp.851-862
    • /
    • 2004
  • RFID(Radio frequency IDentification) is a technology that automatically identifies objects containing the electronic tags by using radio wave. The multi-tag identification problem is the core issue in the RFID and could be resolved by the anti-collision algorithm. However, most of the existing anti-collision algorithms have a problem of heavy implementation cost and low performance. In this paper. we propose a new tree based memoryless anti-collision algorithm called a collision tracking tree algorithm and presents its performance evaluation results obtained by simulation. The Collision Tracking Tree algorithm proves itself the capability of an identification rate of 749 tags per second and the performance evaluation results also show that the proposed algorithm outperforms the other two existing tree-based memoryless algorithms, i.e., the tree-walking algorithm and the query tree algorithm about 49 and 2.4 times respectively.

Image Retrieval Method Based on IPDSH and SRIP

  • Zhang, Xu;Guo, Baolong;Yan, Yunyi;Sun, Wei;Yi, Meng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.5
    • /
    • pp.1676-1689
    • /
    • 2014
  • At present, the Content-Based Image Retrieval (CBIR) system has become a hot research topic in the computer vision field. In the CBIR system, the accurate extractions of low-level features can reduce the gaps between high-level semantics and improve retrieval precision. This paper puts forward a new retrieval method aiming at the problems of high computational complexities and low precision of global feature extraction algorithms. The establishment of the new retrieval method is on the basis of the SIFT and Harris (APISH) algorithm, and the salient region of interest points (SRIP) algorithm to satisfy users' interests in the specific targets of images. In the first place, by using the IPDSH and SRIP algorithms, we tested stable interest points and found salient regions. The interest points in the salient region were named as salient interest points. Secondary, we extracted the pseudo-Zernike moments of the salient interest points' neighborhood as the feature vectors. Finally, we calculated the similarities between query and database images. Finally, We conducted this experiment based on the Caltech-101 database. By studying the experiment, the results have shown that this new retrieval method can decrease the interference of unstable interest points in the regions of non-interests and improve the ratios of accuracy and recall.

Designing a Repository Independent Model for Mining and Analyzing Heterogeneous Bug Tracking Systems (다형의 버그 추적 시스템 마이닝 및 분석을 위한 저장소 독립 모델 설계)

  • Lee, Jae-Kwon;Jung, Woo-Sung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.9
    • /
    • pp.103-115
    • /
    • 2014
  • In this paper, we propose UniBAS(Unified Bug Analysis System) to provide a unified repository model by integrating the extracted data from the heterogeneous bug tracking systems. The UniBAS reduces the cost and complexity of the MSR(Mining Software Repositories) research process and enables the researchers to focus on their logics rather than the tedious and repeated works such as extracting repositories, processing data and building analysis models. Additionally, the system not only extracts the data but also automatically generates database tables, views and stored procedures which are required for the researchers to perform query-based analysis easily. It can also generate various types of exported files for utilizing external analysis tools or managing research data. A case study of detecting duplicate bug reports from the Firfox project of the Mozilla site has been performed based on the UniBAS in order to evaluate the usefulness of the system. The results of the experiments with various algorithms of natural language processing and flexible querying to the automatically extracted data also showed the effectiveness of the proposed system.

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases
    • /
    • v.37 no.5
    • /
    • pp.275-284
    • /
    • 2010
  • GPUs are stream processors based on multi-cores, which can process large data with a high speed and a large memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, usage of GPUs in general purpose computing has been wide spread. The CUDA architecture from Nvidia is one of efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm which uses a simple nested loop structure. In order to employ the CUDA programming model, we apply our optimization techniques to make our skyline algorithm fit into the performance restrictions of the CUDA architecture. According to our experimental results, we improve the original skyline algorithm by 80% with our optimization techniques.

OLAP System and Performance Evaluation for Analyzing Web Log Data (웹 로그 분석을 위한 OLAP 시스템 및 성능 평가)

  • 김지현;용환승
    • Journal of Korea Multimedia Society
    • /
    • v.6 no.5
    • /
    • pp.909-920
    • /
    • 2003
  • Nowadays, IT for CRM has been growing and developed rapidly. Typical techniques are statistical analysis tools, on-line multidimensional analytical processing (OLAP) tools, and data mining algorithms (such neural networks, decision trees, and association rules). Among customer data, web log data is very important and to use these data efficiently, applying OLAP technology to analyze multi-dimensionally. To make OLAP cube, we have to precalculate multidimensional summary results in order to get fast response. But as the number of dimensions and sparse cells increases, data explosion occurs seriously and the performance of OLAP decreases. In this paper, we presented why the web log data sparsity occurs and then what kinds of sparsity patterns generate in the two and t.he three dimensions for OLAP. Based on this research, we set up the multidimensional data models and query models for benchmark with each sparsity patterns. Finally, we evaluated the performance of three OLAP systems (MS SQL 2000 Analysis Service, Oracle Express and C-MOLAP).

  • PDF

Design and Implementation of Index Structure for Tracing of RFID Tag Objects (RFID 태그 객체의 위치 추적을 위한 색인 구조의 설계 및 구현)

  • Kim, Dong-Hyun;Lee, Gi-Hyoung;Hong, Bong-Hee;Ban, Chae-Hoon
    • Journal of Korea Spatial Information System Society
    • /
    • v.7 no.2 s.14
    • /
    • pp.67-79
    • /
    • 2005
  • For tracing tag locations, the trajectories should be modeled and indexed in a radio frequency identification (RFID) system. The trajectory of a tag is represented as a line that connects two spatiotemporal locations captured when the tag enters and leaves the vicinity of a reader. If a tag enters but does not leave a reader, its trajectory is represented only as a point captured at entry. Because the information that a tag stays in a reader is missing from the trajectory represented only as a point, it is impossible to find the tag that remains in a reader. To solve this problem we propose the data model in which trajectories are defined as intervals and new index scheme called the Interval R-tree. We also propose new insert and split algorithms to enable efficient query processing. We evaluate the performance of the proposed index scheme and compare it with the R-tree and the R*-tree. Our experiments show that the new index scheme outperforms the other two in processing queries of tags on various datasets.

  • PDF

Development of Workbench for Analysis and Visualization of Whole Genome Sequence (전유전체(Whole gerlome) 서열 분석과 가시화를 위한 워크벤치 개발)

  • Choe, Jeong-Hyeon;Jin, Hui-Jeong;Kim, Cheol-Min;Jang, Cheol-Hun;Jo, Hwan-Gyu
    • The KIPS Transactions:PartA
    • /
    • v.9A no.3
    • /
    • pp.387-398
    • /
    • 2002
  • As whole genome sequences of many organisms have been revealed by small-scale genome projects, the intensive research on individual genes and their functions has been performed. However on-memory algorithms are inefficient to analysis of whole genome sequences, since the size of individual whole genome is from several million base pairs to hundreds billion base pairs. In order to effectively manipulate the huge sequence data, it is necessary to use the indexed data structure for external memory. In this paper, we introduce a workbench system for analysis and visualization of whole genome sequence using string B-tree that is suitable for analysis of huge data. This system consists of two parts : analysis query part and visualization part. Query system supports various transactions such as sequence search, k-occurrence, and k-mer analysis. Visualization system helps biological scientist to easily understand whole structure and specificity by many kinds of visualization such as whole genome sequence, annotation, CGR (Chaos Game Representation), k-mer, and RWP (Random Walk Plot). One can find the relations among organisms, predict the genes in a genome, and research on the function of junk DNA using our workbench.

A Single Index Approach for Subsequence Matching that Supports Normalization Transform in Time-Series Databases (시계열 데이터베이스에서 단일 색인을 사용한 정규화 변환 지원 서브시퀀스 매칭)

  • Moon Yang-Sae;Kim Jin-Ho;Loh Woong-Kee
    • The KIPS Transactions:PartD
    • /
    • v.13D no.4 s.107
    • /
    • pp.513-524
    • /
    • 2006
  • Normalization transform is very useful for finding the overall trend of the time-series data since it enables finding sequences with similar fluctuation patterns. The previous subsequence matching method with normalization transform, however, would incur index overhead both in storage space and in update maintenance since it should build multiple indexes for supporting arbitrary length of query sequences. To solve this problem, we propose a single index approach for the normalization transformed subsequence matching that supports arbitrary length of query sequences. For the single index approach, we first provide the notion of inclusion-normalization transform by generalizing the original definition of normalization transform. The inclusion-normalization transform normalizes a window by using the mean and the standard deviation of a subsequence that includes the window. Next, we formally prove correctness of the proposed method that uses the inclusion-normalization transform for the normalization transformed subsequence matching. We then propose subsequence matching and index building algorithms to implement the proposed method. Experimental results for real stock data show that our method improves performance by up to $2.5{\sim}2.8$ times over the previous method. Our approach has an additional advantage of being generalized to support many sorts of other transforms as well as normalization transform. Therefore, we believe our work will be widely used in many sorts of transform-based subsequence matching methods.