• Title/Summary/Keyword: Tree-based algorithms

Analysis of Indexing Schemes for Structure-Based Retrieval (구조 기반 검색을 위한 색인 구조에 대한 분석)

  • 김영자;김현주;배종민
    • Journal of Korea Multimedia Society / v.7 no.5 / pp.601-616 / 2004
  • Information retrieval systems for structured documents provide multiple levels of retrieval capability by supporting structure-based queries. To process structure-based queries on structured documents, information about the structural nesting relationships between elements and about element order must be maintained. This paper presents four index structures that can process various types of structural queries, such as queries on structural relationships between elements or on element occurrence order. The proposed algorithms are based on the concept of the Global Document Instance Tree.

Bulk Insertion Method for R-tree using Seeded Clustering (R-tree에서 Seeded 클러스터링을 이용한 다량 삽입)

  • 이태원;문봉기;이석호
    • Journal of KIISE:Databases / v.31 no.1 / pp.30-38 / 2004
  • In many scientific and commercial applications, such as Earth Observation System (EOSDIS) and mobile phone services tracking a large number of clients, it is a daunting task to archive and index the ever-increasing volume of complex data that is continuously added to databases. To efficiently manage multidimensional data in scientific and data warehousing environments, R-tree-based index structures have been widely used. In this paper, we propose a scalable technique called seeded clustering that allows us to maintain R-tree indexes by bulk insertion while keeping pace with high data arrival rates. Our approach uses a seed tree, which is copied from the top k levels of a target R-tree, to classify input data objects into clusters. We then build an R-tree for each of the clusters and insert these input R-trees into the target R-tree in bulk, one at a time. We present detailed algorithms for seeded clustering and bulk insertion as well as the results of our extensive experimental study. The experimental results show that bulk insertion by seeded clustering outperforms previously known methods in terms of insertion cost and the quality of the target R-trees as measured by their query performance.
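
The classification step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the seed tree is flattened to a plain list of bounding rectangles, and each input object is assigned to the seed that needs the least area enlargement (the usual R-tree insertion heuristic, assumed here because the abstract does not name the criterion). The names `Rect`, `enlargement`, and `classify` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rectangle type: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(seed: Rect, obj: Rect) -> float:
    """Extra area the seed rectangle would need in order to cover obj."""
    return area(union(seed, obj)) - area(seed)


def classify(seeds: List[Rect], objects: List[Rect]) -> Dict[int, List[Rect]]:
    """Group input objects by the seed rectangle needing least enlargement.

    Ties are broken in favour of the smaller seed (an assumption).
    """
    clusters: Dict[int, List[Rect]] = defaultdict(list)
    for obj in objects:
        best = min(
            range(len(seeds)),
            key=lambda i: (enlargement(seeds[i], obj), area(seeds[i])),
        )
        clusters[best].append(obj)
    return dict(clusters)
```

Each resulting cluster would then be built into its own small R-tree and inserted into the target tree in bulk; that step is omitted from this sketch.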

Spatial Join based on the Transform-Space View (변환공간 뷰를 기반으로한 공간 조인)

  • 이민재;한욱신;황규영
    • Journal of KIISE:Databases / v.30 no.5 / pp.438-450 / 2003
  • Spatial joins find pairs of objects that overlap with each other. In spatial joins using indexes, original-space indexes such as the R-tree are widely used. An original-space index is one that indexes objects as represented in the original space. Since original-space indexes deal with the sizes of objects, it is difficult to develop a formal algorithm without relying on heuristics. On the other hand, transform-space indexes, which transform objects in the original space into points in the transform space and index them, deal only with points and not with sizes. Thus, spatial join algorithms using these indexes are relatively simple and can be formally developed. However, the disadvantage of transform-space join algorithms is that they cannot be applied to original-space indexes such as the R-tree, which contain original-space objects. In this paper, we present a novel mechanism for achieving the best of these two types of algorithms. Specifically, we propose a new notion of the transform-space view and present the transform-space view join algorithm (TSVJ). A transform-space view is a virtual transform-space index based on an original-space index. It allows us to interpret a pre-built original-space index on the fly as a transform-space index without incurring any overhead and without actually modifying the structure of the original-space index or changing the object representation. The experimental results show that, compared to existing spatial join algorithms that use R-trees in the original space, the TSVJ reduces the number of disk accesses by up to 43.1%. The most important contribution of this paper is to show that we can use original-space indexes, such as the R-tree, in the transform space by interpreting them through the notion of the transform-space view. We believe that this new notion provides a framework for developing various new spatial query processing algorithms in the transform space.
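
The idea of treating extended objects as points in a transform space can be illustrated with a small sketch. The abstract does not say which transformation is used; the code below assumes the corner transformation, which maps a 1-D interval [lo, hi] to the 2-D point (lo, hi). The join itself is a plain nested loop for illustration only and is not the TSVJ algorithm; all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lo, hi) in the original space
Point = Tuple[float, float]     # (x, y) = (lo, hi) in the transform space


def to_transform_space(iv: Interval) -> Point:
    """Corner transformation (assumed): an interval becomes a single point."""
    lo, hi = iv
    if lo > hi:
        raise ValueError("interval must satisfy lo <= hi")
    return (lo, hi)


def overlaps_in_transform_space(p: Point, query: Interval) -> bool:
    """Overlap with query [qlo, qhi] becomes a region test on the point:
    the point must lie in the region x <= qhi and y >= qlo."""
    x, y = p
    qlo, qhi = query
    return x <= qhi and y >= qlo


def spatial_join(r: List[Interval], s: List[Interval]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) such that r[i] overlaps s[j]."""
    points = [to_transform_space(iv) for iv in s]
    return [
        (i, j)
        for i, query in enumerate(r)
        for j, p in enumerate(points)
        if overlaps_in_transform_space(p, query)
    ]
```

Because the overlap test only compares point coordinates against a query region, no object size enters the comparison, which is the property the abstract attributes to transform-space indexes.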

Prediction of Water Usage in Pig Farm based on Machine Learning (기계학습을 이용한 돈사 급수량 예측방안 개발)

  • Lee, Woongsup;Ryu, Jongyeol;Ban, Tae-Won;Kim, Seong Hwan;Choi, Heechul
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.8 / pp.1560-1566 / 2017
  • Recently, the spread of smart pig farms equipped with Internet-of-Things sensors has made it possible to accumulate farm data, and various machine learning algorithms are being applied to that data to improve pig farm productivity. Herein, multiple machine learning schemes are used to predict water usage in a pig farm, which is known to be one of the most important elements of pig farm management. Specifically, regression algorithms (linear regression, regression tree, and AdaBoost regression) and classification algorithms (logistic classification, decision tree, and support vector machine) are applied to derive a prediction scheme that forecasts water usage from the temperature and humidity of the pig farm. Through performance evaluation, we find that water usage can be predicted with high accuracy. The proposed scheme can be used to detect malfunctions of the water system, which prevents the death of pigs and reduces losses for the pig farm.

An Extended Frequent Pattern Tree for Hiding Sensitive Frequent Itemsets (민감한 빈발 항목집합 숨기기 위한 확장 빈발 패턴 트리)

  • Lee, Dan-Young;An, Hyoung-Geun;Koh, Jae-Jin
    • The KIPS Transactions:PartD / v.18D no.3 / pp.169-178 / 2011
  • Recently, data sharing between enterprises or organizations has become necessary for cooperative work. When an enterprise opens its database to affiliates in this process, sensitive information may be leaked. To resolve this problem, sensitive information must be hidden from the database. Previous research on hiding sensitive information applied various heuristic algorithms to maintain the quality of the database, but few studies have analyzed the effects on the items modified during the hiding process or tried to minimize the number of hidden items. This paper proposes the eFP-Tree (Extended Frequent Pattern Tree), based on the FP-Tree (Frequent Pattern Tree), to hide sensitive frequent itemsets. Node formation in the eFP-Tree uses the border to minimize the impact on non-sensitive frequent itemsets during the hiding process, organizing all transaction, sensitive, and border information differently from previous approaches. When the eFP-Tree was applied to an example transaction database, fewer than 10% of the items were lost, which shows that it is more effective than the existing algorithm and keeps the quality of the database as high as possible.

Customer Churning Forecasting and Strategic Implication in Online Auto Insurance using Decision Tree Algorithms (의사결정나무를 이용한 온라인 자동차 보험 고객 이탈 예측과 전략적 시사점)

  • Lim, Se-Hun;Hur, Yeon
    • Information Systems Review / v.8 no.3 / pp.125-134 / 2006
  • This article adopts a decision tree algorithm (C5.0) to predict customer churn in the online auto insurance environment. Using a sample of online auto insurance customer contracts sold between 2003 and 2004, we test how well a decision tree-based model (C5.0) predicts customer churn. We compare the results of C5.0 with those of a logistic regression model (LRM) and a multivariate discriminant analysis (MDA) model. The results show that C5.0 outperforms the other models in predictive performance. Based on these results, this study suggests a way of setting marketing strategy and of developing the online auto insurance business.

An Indexing Technique for Range Sum Queries in Spatio-Temporal Databases (시공간 데이타베이스에서 영역 합 질의를 위한 색인 기법)

  • Cho Hyung-Ju;Choi Yong-Jin;Min Jun-Ki;Chung Chin-Wan
    • Journal of KIISE:Databases / v.32 no.2 / pp.129-141 / 2005
  • Although spatio-temporal databases have received considerable attention recently, there has been little work on processing range sum queries on the historical records of moving objects, despite their importance. Because answering range sum queries by directly accessing a huge amount of data incurs prohibitive computation cost, materialization techniques based on existing index structures have recently been suggested. A simple but effective solution is to apply the materialization technique to the MVR-tree, known as the most efficient structure for window queries with spatio-temporal conditions. However, the MVR-tree has difficulty maintaining pre-aggregated results inside its internal nodes due to cyclic paths between nodes. Aggregate structures based on other index structures, such as the HR-tree and the 3DR-tree, do not provide satisfactory query performance. In this paper, we propose a new indexing technique called the Adaptive Partitioned Aggregate R-Tree (APART), along with query processing algorithms, to efficiently process range sum queries in many situations. Experimental results show that the performance of the APART is typically more than twice that of existing aggregate structures in a wide range of scenarios.
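
The materialization idea mentioned in this abstract, keeping pre-aggregated results inside index nodes, can be sketched with a generic aggregate tree. This is not the APART structure, whose details the abstract does not give. The sketch assumes 2-D boxes rather than spatio-temporal ones for brevity, and the names `AggNode` and `range_sum` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class AggNode:
    box: Rect
    total: float  # pre-aggregated sum of every value stored below this node
    children: List["AggNode"] = field(default_factory=list)
    # Leaf payload: (x, y, value) records.
    points: List[Tuple[float, float, float]] = field(default_factory=list)


def contains(q: Rect, b: Rect) -> bool:
    return q[0] <= b[0] and q[1] <= b[1] and b[2] <= q[2] and b[3] <= q[3]


def disjoint(q: Rect, b: Rect) -> bool:
    return b[2] < q[0] or q[2] < b[0] or b[3] < q[1] or q[3] < b[1]


def range_sum(node: AggNode, q: Rect) -> float:
    """Sum the values inside q, descending only into partially covered nodes."""
    if disjoint(q, node.box):
        return 0.0
    if contains(q, node.box):
        return node.total  # stored aggregate answers this whole subtree
    if node.children:
        return sum(range_sum(c, q) for c in node.children)
    return sum(
        v for x, y, v in node.points
        if q[0] <= x <= q[2] and q[1] <= y <= q[3]
    )
```

A query that fully covers a node never reads the records beneath it, which is why direct access to the underlying data can be avoided for large ranges.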

Xp-tree:A new spatial-based indexing method to accelerate Xpath location steps (Xp-tree:Xpath 로케이션 스텝의 효율화를 위한 새로운 공간기반의 인덱싱 기법)

  • Trang, Nguyen-Van;Hwang, Jeong-Hee;Ryu, Keun-Ho
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.10-12 / 2004
  • The rapid emergence of XML as a standard for data exchange over the Internet has led to considerable interest in data management requirements such as the need to store and query XML documents. The location path language XPath is of particular importance for XML applications, since it is a core component of many XML processing standards such as XSLT and XQuery. This paper gives a brief overview of the method and design of a new spatial-based indexing method, the Xp-tree, for supporting XPath. Spatial indexing techniques have proven their capacity for searching large databases. By accelerating node access on a plane in combination with a numbering scheme, we derive efficient algorithms that are simple but useful. The method also allows all nodes related to a context node to be traced in a manner that naturally supports these query types, especially XPath queries with predicates.

CC-GiST: A Generalized Framework for Efficiently Implementing Arbitrary Cache-Conscious Search Trees (CC-GiST: 임의의 캐시 인식 검색 트리를 효율적으로 구현하기 위한 일반화된 프레임워크)

  • Loh, Woong-Kee;Kim, Won-Sik;Han, Wook-Shin
    • The KIPS Transactions:PartD / v.14D no.1 s.111 / pp.21-34 / 2007
  • With the recent rapid price drop and capacity growth of main memory, the number of applications built on main memory databases is increasing dramatically. A cache miss, in which the data required by the CPU is not resident in the cache and must be fetched from main memory, is one of the major causes of performance degradation in main memory databases. Several cache-conscious trees have been proposed to reduce cache misses and make the most of the cache in main memory databases. Since each cache-conscious tree has its own unique features, more than one cache-conscious tree can be used in a single application, depending on the application's requirements. Moreover, if no existing cache-conscious tree satisfies the application's requirements, a new cache-conscious tree must be implemented solely for that application. In this paper, we propose the cache-conscious generalized search tree (CC-GiST). The CC-GiST extends the disk-based generalized search tree (GiST) [HNP95] to be cache-conscious, and provides all the common features and algorithms of existing cache-conscious trees, including pointer compression and key compression techniques. To implement a cache-conscious tree based on the CC-GiST, one needs to implement only a few functions specific to that tree. We show how to implement the most representative cache-conscious trees, such as the CSB+-tree, the pkB-tree, and the CR-tree, based on the CC-GiST. The CC-GiST eliminates the burden of managing more than one cache-conscious tree in an application, and provides a framework for efficiently implementing arbitrary cache-conscious trees with new features.

MOVING OBJECT JOIN ALGORITHMS USING TB-TREE

  • Lee Jai-Ho;Lee Seong-Ho;Kim Ju-Wan
    • Proceedings of the KSRS Conference / 2005.10a / pp.309-312 / 2005
  • The need for LBS (Location Based Services) is increasing due to the widespread use of mobile computing devices and positioning technologies. In LBS, there are many applications that need to manage moving objects (e.g. taxis, persons). The moving object join operation pairs objects from two sets in a moving object database system according to their spatio-temporal attributes. It is an important and complicated operation, and its processing time increases geometrically with the number of moving objects. Therefore, efficient spatio-temporal join methods are essential to moving object database systems. In this paper, we apply spatial join methods to the moving object join. We propose two kinds of join methods using the TB-Tree, which preserves the trajectories of moving objects. One is a depth-first traversal spatio-temporal join and the other is a breadth-first traversal spatio-temporal join. We show the results of performance tests with sample data sets created by a moving object generator tool.
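
The two traversal orders named in this abstract can be sketched on a generic bounding-box tree. This is only an illustration of synchronized depth-first and breadth-first tree joins: it does not model the TB-Tree's trajectory preservation, and the `Node` layout, the (x, y, t) box format, and all function names are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical spatio-temporal box: (xmin, ymin, tmin, xmax, ymax, tmax).
Box = Tuple[float, float, float, float, float, float]


@dataclass
class Node:
    box: Box
    children: List["Node"] = field(default_factory=list)
    label: str = ""  # set only on leaf entries


def intersects(a: Box, b: Box) -> bool:
    """True when the boxes overlap in x, y, and t."""
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


def expand(a: Node, b: Node) -> List[Tuple[Node, Node]]:
    """Intersecting child pairs; a leaf entry stands in for its own child."""
    left = a.children or [a]
    right = b.children or [b]
    return [(x, y) for x in left for y in right if intersects(x.box, y.box)]


def depth_first_join(a: Node, b: Node) -> List[Tuple[str, str]]:
    """Follow each intersecting pair down to the leaves before the next."""
    if not intersects(a.box, b.box):
        return []
    if not a.children and not b.children:
        return [(a.label, b.label)]
    out: List[Tuple[str, str]] = []
    for x, y in expand(a, b):
        out.extend(depth_first_join(x, y))
    return out


def breadth_first_join(a: Node, b: Node) -> List[Tuple[str, str]]:
    """Process candidate pairs level by level using a FIFO queue."""
    out: List[Tuple[str, str]] = []
    queue = deque([(a, b)] if intersects(a.box, b.box) else [])
    while queue:
        x, y = queue.popleft()
        if not x.children and not y.children:
            out.append((x.label, y.label))
        else:
            queue.extend(expand(x, y))
    return out
```

Both functions report the same set of pairs; they differ in the order in which node pairs are visited and in how many pending pairs are held at once, which is the kind of trade-off a performance comparison between the two would examine.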
