• Title/Summary/Keyword: query clustering

Bulk Insertion Method for R-tree using Seeded Clustering (R-tree에서 Seeded 클러스터링을 이용한 다량 삽입)

  • 이태원;문봉기;이석호
    • Journal of KIISE:Databases / v.31 no.1 / pp.30-38 / 2004
  • In many scientific and commercial applications, such as Earth Observation System (EOSDIS) and mobile phone services that track a large number of clients, it is a daunting task to archive and index the ever-increasing volume of complex data that is continuously added to databases. To efficiently manage multidimensional data in scientific and data warehousing environments, R-tree based index structures have been widely used. In this paper, we propose a scalable technique called seeded clustering that allows us to maintain R-tree indexes by bulk insertion while keeping pace with high data arrival rates. Our approach uses a seed tree, which is copied from the top k levels of a target R-tree, to classify input data objects into clusters. We then build an R-tree for each of the clusters and insert the input R-trees into the target R-tree in bulk, one at a time. We present detailed algorithms for seeded clustering and bulk insertion as well as the results of our extensive experimental study. The experimental results show that bulk insertion by seeded clustering outperforms previously known methods in terms of insertion cost and the quality of the target R-trees as measured by their query performance.
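The classification step described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the seed tree's lowest level is available as a flat list of bounding rectangles, and it assigns each input point to the rectangle needing the least area enlargement, a criterion borrowed from ordinary R-tree insertion that the abstract does not state. The names `enlargement`, `seeded_clusters`, and `seed_rects` are hypothetical.

```python
from collections import defaultdict


def enlargement(rect, point):
    """Extra area rect = (xmin, ymin, xmax, ymax) needs in order to cover point = (x, y)."""
    xmin, ymin, xmax, ymax = rect
    x, y = point
    new_width = max(xmax, x) - min(xmin, x)
    new_height = max(ymax, y) - min(ymin, y)
    return new_width * new_height - (xmax - xmin) * (ymax - ymin)


def seeded_clusters(seed_rects, points):
    """Group points by the seed rectangle that needs the least enlargement.

    Assumption: least enlargement is the classification rule; the abstract
    only says that the seed tree classifies input objects into clusters.
    """
    clusters = defaultdict(list)
    for p in points:
        best = min(range(len(seed_rects)),
                   key=lambda i: enlargement(seed_rects[i], p))
        clusters[best].append(p)
    return dict(clusters)


# Hypothetical seed rectangles and input points.
seed_rects = [(0, 0, 10, 10), (20, 20, 30, 30)]
points = [(1, 1), (25, 22), (9, 12), (31, 29)]
clusters = seeded_clusters(seed_rects, points)
print(clusters)
```

Each resulting cluster would then be built into its own small R-tree and inserted into the target tree as a unit, as the abstract describes.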

Efficient Disk Access Method Using Region Storage Structure in Spatial Continuous Query Processing (공간 연속질의 처리에서 영역 기반의 저장 구조를 이용한 효율적인 디스크 접근 방법)

  • Chung, Weon-Il
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.5 / pp.2383-2389 / 2011
  • Ubiquitous applications require hybrid continuous query processing, which handles both online data streams and spatial data on disk. In hybrid continuous spatial query processing, disk access costs for high-volume spatial data should be minimized. However, previous indexing methods cannot reduce disk seek time, because it is difficult to store related data contiguously. In addition, existing space-filling-curve methods that consider data clusters do not cluster the data that queries actually use. Therefore, we propose a region storage structure for efficient data access in hybrid continuous spatial query processing. This paper shows a clear reduction in query processing costs through contiguous data storage and group processing of user queries based on the region storage structure.

Object-Based Image Retrieval Using Color Adjacency and Clustering Method (컬러 인접성과 클러스터링 기법을 이용한 객체 기반 영상 검색)

  • Lee Hyung-Jin;Park Ki-Tae;Moon Young-Shik
    • The KIPS Transactions:PartB / v.12B no.1 s.97 / pp.31-38 / 2005
  • This paper proposes an object-based image retrieval scheme using color adjacency and a clustering method. Color adjacency features in boundary regions are used to extract candidate blocks of interest from the image database, and a clustering method is used to extract the regions of interest (ROI) from the candidate blocks. To measure the similarity between the query and database images, the histogram intersection technique is used. The color pair information used in the proposed method is robust against translation, rotation, and scaling. Experimental results show that the proposed scheme is superior to existing methods in terms of ANMRR.
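For readers unfamiliar with histogram intersection, a minimal sketch follows. It assumes both histograms use the same bins and normalizes each to unit mass before summing the bin-wise minima; the abstract does not say which normalization the authors use, so that choice is an assumption. The function name `histogram_intersection` is hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of two normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total1, total2 = sum(h1), sum(h2)
    if total1 == 0 or total2 == 0:
        return 0.0  # an empty histogram shares nothing with any other
    return sum(min(a / total1, b / total2) for a, b in zip(h1, h2))


# Identical color distributions score 1.0, disjoint ones score 0.0.
print(histogram_intersection([2, 2], [1, 1]))  # 1.0
print(histogram_intersection([4, 0], [0, 4]))  # 0.0
print(histogram_intersection([1, 3], [3, 1]))  # 0.5
```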

Moving Object Tracking using Query Relaying in Wireless Sensor Networks (무선 센서 네트워크에서 질의 중계를 이용한 이동 객체의 위치 추적 방안)

  • Kim, Sangdae;Kim, Cheonyong;Cho, Hyunchong;Yim, Yongbin;Kim, Sang-Ha
    • KIISE Transactions on Computing Practices / v.20 no.11 / pp.598-603 / 2014
  • In wireless sensor networks, two methods are generally used to track a continuously moving object: a user query-based method and a periodic report-based method. Although the former generates more overhead because of the user queries, it is also energy-efficient because it does not transfer unnecessary information. In the user query-based method, a virtual tree that consists of sensor nodes is used to perform the user query and the sensor reporting. The tree stores information about the mobile objects, and the stored information triggers a report by the user query. However, for a fast-moving object, tracking accuracy decreases because of the time delay of the repeated end-to-end query. To solve this problem, we propose a query-relay method that reduces the time delay for mobile object tracking. In the proposed method, the nodes in the tree relay the query to adjacent nodes according to the movement of the tracked mobile object. When the query messages are relayed, the end-to-end querying time delay is reduced, and a simulation shows that our method is superior to existing ones in terms of tracking accuracy.

Cluster-Based Selection of Diverse Query Examples for Active Learning (능동적 학습을 위한 군집화 기반의 다양한 복수 문의 예제 선정 방법)

  • Kang, Jae-Ho;Ryu, Kwang-Ryel;Kwon, Hyuk-Chul
    • Journal of Intelligence and Information Systems / v.11 no.1 / pp.169-189 / 2005
  • In order to derive a better classifier with a limited number of training examples, active learning alternately repeats the querying stage for category labeling and the subsequent learning stage for rebuilding the classifier with the newly expanded training set. To relieve the user of the burden of labeling, especially in an online environment, it is important to minimize the number of querying steps as well as the total number of query examples. We can derive a good classifier in a small number of querying steps, using only a small number of examples, if we can select multiple diverse, representative, and ambiguous examples to present to the user at each querying step. In this paper, we propose a cluster-based batch query selection method that can select diverse, representative, and highly ambiguous examples for efficient active learning. Experiments with various text data sets have shown that our method can derive a better classifier than other methods that take only ambiguity into account as the criterion for selecting multiple query examples.


A Cluster-Based Top-k Query Processing Algorithm in Wireless Sensor Networks (무선 센서 네트워크에서 클러스터 기반의 Top-k 질의 처리)

  • Yeo, Myung-Ho;Seong, Dong-Ook;Yoo, Jae-Soo
    • Journal of KIISE:Databases / v.36 no.4 / pp.306-313 / 2009
  • Top-k queries are issued to find the highest (or lowest) readings in many sensor applications. Many top-k query processing algorithms have been proposed to reduce energy consumption: FILA installs a filter at each sensor node and suppresses unnecessary sensor updates; PRIM allots priorities to sensor nodes and collects the minimal number of sensor readings according to the priorities. However, if many sensor readings converge into the same range of sensor values, many false positives occur. In this paper, we propose a cluster-based approach to reduce them effectively. Our proposed algorithm operates in two phases: top-k query processing at the cluster level and top-k query processing at the tree level. False positives are effectively filtered out at each level. Performance evaluations show that our proposed algorithm reduces false positives by about 70% and achieves about 105% better performance than the existing top-k algorithms in terms of network lifetime.
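The two-phase structure mentioned in this abstract can be sketched generically. This is only an illustration of cluster-level filtering followed by a higher-level merge, not the authors' algorithm, and it does not model FILA, PRIM, or the false-positive filtering they evaluate. The function names and the sample readings are hypothetical.

```python
import heapq


def cluster_top_k(readings, k):
    """Phase 1 (cluster level): a cluster head keeps only its k highest readings."""
    return heapq.nlargest(k, readings)


def two_phase_top_k(clusters, k):
    """Phase 2 (tree level): merge the per-cluster candidates into a global top-k.

    Any reading outside its own cluster's top-k cannot be in the global
    top-k, so forwarding only k values per cluster loses nothing.
    """
    candidates = []
    for readings in clusters.values():
        candidates.extend(cluster_top_k(readings, k))
    return heapq.nlargest(k, candidates)


# Hypothetical temperature readings grouped by cluster.
clusters = {
    "c1": [21.5, 19.0, 30.2],
    "c2": [28.1, 18.4],
    "c3": [25.0, 29.9, 17.3],
}
print(two_phase_top_k(clusters, 3))  # [30.2, 29.9, 28.1]
```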

SPEC: Space Efficient Cubes for Data Warehouses (SPEC : 데이타 웨어하우스를 위한 저장 공간 효율적인 큐브)

  • Chun Seok-Ju;Lee Seok-Lyong;Kang Heum-Geun;Chung Chin-Wan
    • Journal of KIISE:Databases / v.32 no.1 / pp.1-11 / 2005
  • An aggregation query computes aggregate information over a data cube in the query range specified by a user. Existing methods based on the prefix-sum approach use an additional cube, called the prefix-sum cube (PC), to store the cumulative sums of data, causing a high space overhead. This space overhead not only leads to extra costs for storage devices, but also causes additional propagation of updates and longer access times on physical devices. In this paper, we propose a new prefix-sum cube called 'SPEC', which drastically reduces the space of the PC in a large data warehouse. The SPEC decreases the update propagation caused by the dependency between values in cells of the PC. We develop an effective algorithm that finds dense sub-cubes in a large data cube. We perform extensive experiments with respect to various dimensions of the data cube and query sizes, and examine the effectiveness and performance of our proposed method. Experimental results show that the SPEC significantly reduces the space of the PC while maintaining reasonable query performance.
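As background for this abstract, the baseline prefix-sum approach that SPEC aims to improve can be sketched for a two-dimensional cube. This shows only the standard prefix-sum cube (PC) and its range-sum query, not SPEC itself; the function names and the sample cube are hypothetical.

```python
def build_prefix_sum(cube):
    """Build a PC in which pc[i][j] holds the sum of cube[0..i-1][0..j-1].

    The extra zero row and column avoid boundary checks at query time.
    """
    rows, cols = len(cube), len(cube[0])
    pc = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            pc[i + 1][j + 1] = (cube[i][j] + pc[i][j + 1]
                                + pc[i + 1][j] - pc[i][j])
    return pc


def range_sum(pc, r1, c1, r2, c2):
    """Sum of cube cells in rows r1..r2 and columns c1..c2 (inclusive)."""
    return pc[r2 + 1][c2 + 1] - pc[r1][c2 + 1] - pc[r2 + 1][c1] + pc[r1][c1]


cube = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
pc = build_prefix_sum(cube)
print(range_sum(pc, 1, 1, 2, 2))  # 28  (5 + 6 + 8 + 9)
print(range_sum(pc, 0, 0, 2, 2))  # 45  (whole cube)
```

A range query reads a fixed number of PC cells regardless of range size, but the PC is as large as the cube, and changing one cube cell alters every PC cell at or beyond its position. These are the space overhead and update propagation the abstract refers to.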

MLPPI Wizard: An Automated Multi-level Partitioning Tool on Analytical Workloads

  • Suh, Young-Kyoon;Crolotte, Alain;Kostamaa, Pekka
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.4 / pp.1693-1713 / 2018
  • An important technique used by database administrators (DBAs) to improve performance in decision-support workloads associated with a star schema is multi-level partitioning. Queries then benefit from performance improvements via partition elimination, due to constraints on queries expressed on the dimension tables. As the task of multi-level partitioning can be overwhelming for a DBA, we propose a wizard that facilitates the task by calculating a partitioning scheme for a particular workload. The system resides completely on a client and interacts with the cost estimation subsystem of the query optimizer via an API over the network, thereby eliminating any need to make changes to the optimizer. In addition, since only cost estimates are needed, the wizard overhead is very low. By using a greedy algorithm for search space enumeration over the query predicates in the workload, the wizard is efficient, with worst-case polynomial complexity. The proposed technology can be applied to any clustering or partitioning scheme in any database management system that provides an interface to the query optimizer. Applied to the Teradata database, the technology provides recommendations that outperform a human expert's solution as measured by the total execution time of the workload. We also demonstrate the scalability of our approach as the fact table (and workload) size increases.

Applying Hebbian Theory to Enhance Search Performance in Unstructured Social-Like Peer-to-Peer Networks

  • Huang, Chester S.J.;Yang, Stephen J.H.;Su, Addison Y.S.
    • ETRI Journal / v.34 no.4 / pp.591-601 / 2012
  • Unstructured peer-to-peer (p2p) networks usually employ flooding search algorithms to locate resources. However, these algorithms often require a large storage overhead or generate massive network traffic. To address this issue, previous researchers explored the possibility of building efficient p2p networks by clustering peers into communities based on their social relationships, creating social-like p2p networks. This study proposes a social relationship p2p network that uses a measure based on Hebbian theory to create a social relation weight. The contribution of the study is twofold. First, using the social relation weight, the query peer stores and searches for the appropriate response peers in social-like p2p networks. Second, this study designs a novel knowledge index mechanism that dynamically adapts social relationship p2p networks. The results show that the proposed social relationship p2p network improves search performance significantly, compared with existing approaches.

An Hybrid Peer to Peer System Development in Technique of Query Reservation (질의 예약 기능을 갖는 Hybrid P2P 시스템 개발)

  • Yang Seung-Won;Lim Jong-Tae
    • Proceedings of the Korea Contents Association Conference / 2005.11a / pp.371-375 / 2005
  • Providing a service in a web environment by connecting to a database server and receiving data through queries causes a problem: it reduces the efficiency of downloading and of the service offered by the server. Recent research has therefore obtained data through database caching or database clustering.
