• Title/Summary/Keyword: Indexing Databases

Hilbert-curve based Multi-dimensional Indexing Key Generation Scheme and Query Processing Algorithm for Encrypted Databases (암호화 데이터를 위한 힐버트 커브 기반 다차원 색인 키 생성 및 질의처리 알고리즘)

  • Kim, Taehoon; Jang, Miyoung; Chang, Jae-Woo
    • Journal of Korea Multimedia Society, v.17 no.10, pp.1182-1188, 2014
  • Research on database outsourcing has grown with the popularity of cloud computing. Because users' data may contain sensitive personal information, such as health, financial, and location information, data encryption methods have attracted much interest. Existing data encryption schemes process a query without decrypting the encrypted database in order to protect user privacy. Meanwhile, handling the large amount of data in cloud computing efficiently requires a distributed index structure. However, existing index structures and query processing algorithms consider only single-column query processing. In this paper, we propose a grid-based multi-column indexing scheme and an encrypted query processing algorithm. To support multi-column query processing, multi-dimensional index keys are generated with a space decomposition method, i.e., a grid index. To support query processing over encrypted data, we adopt the Hilbert curve when generating an index key. Finally, we show that the proposed scheme is more efficient than existing schemes for processing exact-match and range queries.
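
As an illustration of the key-generation idea in the abstract above, the following minimal Python sketch maps a two-column value to a grid cell and then to a one-dimensional Hilbert-curve key. It is not the authors' scheme: the function names `grid_cell` and `hilbert_key`, the column ranges, and the grid order are assumptions made for this example, and the encryption step is left out entirely.

```python
def grid_cell(value: float, lo: float, hi: float, order: int) -> int:
    """Map a column value in [lo, hi] to one of 2**order grid cells (hypothetical helper)."""
    n = 1 << order
    cell = int((value - lo) / (hi - lo) * n)
    return min(max(cell, 0), n - 1)  # clamp values on or outside the range boundary


def hilbert_key(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve over a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is traversed in curve order.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Assumed example: index (age, salary) pairs on an 8 x 8 grid (order 3).
ORDER = 3
key = hilbert_key(ORDER, grid_cell(35, 0, 100, ORDER), grid_cell(52_000, 0, 100_000, ORDER))
print(key)
```

Because consecutive Hilbert keys always belong to adjacent grid cells, a multi-column range query can be translated into a small number of one-dimensional key ranges, which is the property that makes the curve attractive for this kind of index key.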

Content-Based Indexing and Retrieval in Large Image Databases

  • Cha, Guang-Ho; Chung, Chin-Wan
    • Journal of Electrical Engineering and Information Science, v.1 no.2, pp.134-144, 1996
  • In this paper, we propose a new access method, called the HG-tree, to support indexing and retrieval by image content in large image databases. Image content is represented by a point in a multidimensional feature space. The types of queries considered are the range query and the nearest-neighbor query, both in a multidimensional space. Our goals are twofold: increasing the storage utilization and decreasing the area covered by the directory regions of the index tree. High storage utilization and a small directory area reduce the number of nodes that have to be visited during query processing. The first goal is achieved by absorbing splits where possible and, when a split is necessary, converting two nodes into three. The second goal is achieved by keeping the area occupied by the directory regions on the directory nodes minimal. There is a trade-off between the two design goals, but the HG-tree is flexible enough to control it. We present the design of our access method and the associated algorithms. In addition, we report the results of a series of tests comparing the proposed access method with the buddy-tree, one of the most successful point access methods for a multidimensional space. The results show the superiority of our method.

A Study of automatic indexing based on the linguistic analysis for newspaper articles (언어학적 분석기법에 의한 신문기사 자동색인시스팀 설계에 관한 연구)

  • Seo, Gyeong-Ju; SaGong, Cheol
    • Journal of the Korean Society for Information Management, v.8 no.1, pp.78-99, 1991
  • So far, most newspaper indexing in Korea has been done manually using a thesaurus. In recent years, however, the need for an automatic indexing system has grown, so that indexers can save time, effort, and money. Some newspapers have also started building their own databases as they introduce electronic newspapers and CTS. This thesis concerns building an automatic indexing system for the full text of the Korea Economic Daily's articles, which have been accumulated in its database, KETEL. I suggest methods to create a keyword file, a stopword list, an auxiliary word list, and an inflected word list by applying linguistic analysis methods to Hangul, taking advantage of the language's morphological peculiarities. These studies led to four conclusions. First, satisfactory keywords can be obtained by automatic indexing methods based on morphological analysis. Second, an indexer can improve the efficiency of indexing work by controlling the extracted vocabulary, since syntactic and semantic analysis is not yet complete for Hangul. Third, the keyword file in this system, which consists of about 20,000 of the most frequently used newspaper terms, can be used in the future to compile a thesaurus. Finally, the suggested methods for preparing an auxiliary word list and an inflected word list are applicable to the design of other automatic systems.

High-Dimensional Image Indexing based on Adaptive Partitioning and Vector Approximation (적응 분할과 벡터 근사에 기반한 고차원 이미지 색인 기법)

  • Cha, Gwang-Ho; Jeong, Jin-Wan
    • Journal of KIISE:Databases, v.29 no.2, pp.128-137, 2002
  • In this paper, we propose the LPC+-file for efficient indexing of high-dimensional image data. With the proliferation of multimedia data, there is an increasing need to support the indexing and retrieval of high-dimensional image data. Recently, the LPC-file (5), which is based on vector approximation, has been developed for indexing high-dimensional data. The LPC-file performs well, especially when the dataset is uniformly distributed. However, its performance degrades when the dataset is clustered. We improve the performance of the LPC-file for strongly clustered image datasets. The basic idea is to adaptively partition the data space to find subspaces with high-density clusters and to assign more bits to them than to others, which increases the discriminatory power of the vector approximations. The total number of bits used to represent vector approximations is somewhat smaller than that of the LPC-file, since the partitioned cells in the LPC+-file share bits. An empirical evaluation shows that the LPC+-file yields significant performance improvements for real image datasets that are strongly clustered.
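
To make the role of the bit budget concrete, here is a minimal Python sketch of generic vector approximation: each coordinate is quantized to a fixed number of bits, and a query is compared against the resulting cell rather than the exact vector. This is only the basic idea behind approximation-based indexes, not the LPC-file or the LPC+-file; the names `approximate` and `lower_bound`, the uniform bit allocation, and the assumption that coordinates lie in [0, 1) are all choices made for this example.

```python
import math


def approximate(vec, bits):
    """Quantize each coordinate in [0, 1) to a `bits`-bit cell number."""
    cells = 1 << bits
    return [min(int(v * cells), cells - 1) for v in vec]


def lower_bound(query, approx, bits):
    """Smallest possible Euclidean distance from `query` to any point in the cell."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for q, c in zip(query, approx):
        lo, hi = c * width, (c + 1) * width
        gap = max(lo - q, 0.0, q - hi)  # 0 when the query coordinate lies inside the cell
        total += gap * gap
    return math.sqrt(total)


query = [0.1, 0.9]
vector = [0.8, 0.2]
for bits in (1, 2, 4):
    print(bits, lower_bound(query, approximate(vector, bits), bits))
```

The bound never exceeds the true distance, so vectors whose bound is already too large can be skipped without reading them. Finer cells tighten the bound, which is the effect the abstract attributes to spending more bits on densely populated subspaces.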

Shape-Based Subsequence Retrieval Supporting Multiple Models in Time-Series Databases (시계열 데이터베이스에서 복수의 모델을 지원하는 모양 기반 서브시퀀스 검색)

  • Won, Jung-Im; Yoon, Jee-Hee; Kim, Sang-Wook; Park, Sang-Hyun
    • The KIPS Transactions:PartD, v.10D no.4, pp.577-590, 2003
  • Shape-based retrieval is defined as the operation that searches for (sub)sequences whose shapes are similar to that of a query sequence regardless of their actual element values. In this paper, we propose a similarity model suitable for shape-based retrieval and present an indexing method that supports it. The proposed similarity model retrieves similar shapes accurately by combining various shape-preserving transformations such as normalization, moving average, and time warping. Our indexing method stores every distinct subsequence concisely in a disk-based suffix tree for efficient and adaptive query processing. We allow the user to dynamically choose a similarity model suitable for a given application. More specifically, the user determines the parameter p of the distance function $L_p$ when submitting a query. Extensive experiments show that our approach not only finds the subsequences whose shapes are similar to a query shape but also significantly outperforms sequential search.
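
The following minimal Python sketch shows how two of the transformations named in the abstract above, normalization and moving average, combine with a user-chosen $L_p$ distance. It is an illustration under assumptions, not the paper's similarity model: the function names are hypothetical, time warping and the suffix-tree index are omitted, and normalization here means subtracting the mean and dividing by the standard deviation.

```python
def normalize(seq):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = sum(seq) / len(seq)
    std = (sum((v - mean) ** 2 for v in seq) / len(seq)) ** 0.5
    if std == 0:
        return [0.0] * len(seq)  # a constant sequence has no shape to preserve
    return [(v - mean) / std for v in seq]


def moving_average(seq, k):
    """Smooth a sequence with a window of k consecutive elements."""
    return [sum(seq[i:i + k]) / k for i in range(len(seq) - k + 1)]


def lp_distance(a, b, p):
    """L_p distance between two equal-length sequences; p may be float('inf')."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if p == float("inf"):
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Two sequences with the same shape but different scales.
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(lp_distance(normalize(a), normalize(b), p=2))
```

After normalization the two sequences are nearly identical, so their distance is close to zero even though their raw element values differ by a factor of ten. Changing `p` changes how strongly a single large deviation counts, which is the choice the abstract leaves to the user at query time.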

A Study on Indexing Moving Objects using the 3D R-tree (3차원 R-트리를 이용한 이동체 색인에 관한 연구)

  • Jon, Bong-Gi
    • Journal of the Korea Society of Computer and Information, v.10 no.4 s.36, pp.65-75, 2005
  • Moving-objects databases should efficiently support database queries that refer to the trajectories and positions of continuously moving objects. To improve the performance of these queries, an efficient indexing scheme for continuously moving objects is required. To our knowledge, range queries on current positions cannot be handled by the 3D R-tree or the TB-tree. To handle range queries on both current and past positions, we modified the original 3D R-tree to keep now tags. Most spatio-temporal index structures also cannot efficiently process range queries on the past positions of moving objects. To address this issue, we propose an access method called the Tagged Adaptive 3DR-tree (or just TA3DR-tree), which is based on the original 3D R-tree. The results of our extensive experiments show that the Tagged Adaptive 3DR-tree outperforms the original 3D R-tree and the TB-tree, typically by a large margin.

MPI: A Practical Index Scheme for XML Data in Object Databases

  • Song Ha-Joo
    • Journal of Korea Multimedia Society, v.8 no.6, pp.729-734, 2005
  • An efficient index scheme is essential for accessing XML data stored in object databases. Several index schemes can be used to efficiently retrieve XML data stored in object databases, but they are all single-path indexes that support indexing along a single schema path. Hence, if a query contains an extended path, denoted by the wildcard character ('*'), the query processor has to examine multiple index objects, resulting in poor performance and inconsistent index management. In this paper, we propose the MPI (Multi-Path Index) scheme, a new index scheme that provides the functionality of multiple path indexes more efficiently while using only one index structure. The proposed scheme is easy to manage since it treats the extended path as a logically single schema path. It is also practical since it can be implemented with little modification of the B -tree index structure.

Future and Directions for Research in Full Text Databases (본문 데이타베이스 연구에 관한 고찰과 그 전망)

  • Ro Jung Soon
    • Journal of the Korean Society for Library and Information Science, v.17, pp.49-83, 1989
  • A full text retrieval system is a natural language document retrieval system in which the full text of all documents in a collection is stored on a computer so that every word in every sentence of every document can be located by the machine. This kind of IR system has recently become widely available online in the fields of legal, newspaper, journal, and reference book indexing, and research interest in the field has increased. In this paper, research on full text databases and retrieval systems is reviewed, directions for research in this field are considered, open questions in the field are discussed, and the variables affecting online full text retrieval and the various roles that variables play in a research study are described. Two obvious research questions in full text retrieval have been how well full text retrieval performs and how to improve the retrieval performance of full text databases. Research to improve retrieval performance has involved ranking or weighting algorithms based on word occurrences, combined menu-driven and query-driven systems, and improvements in computer architectures and record structures for databases. The recent increase in the number of full text databases of various sizes, forms, and subject matters, and recent developments in computer architecture, artificial intelligence, and videodisc technology promise new directions for research and scholarly growth. Studies of the interrelationships among the elements of the full text retrieval situation and of the relationship between each element and retrieval performance may offer a professional view of the theory and practice of full text retrieval.

A DNA Index Structure using Frequency and Position Information of Genetic Alphabet (염기문자의 빈도와 위치정보를 이용한 DNA 인덱스구조)

  • Kim Woo-Cheol; Park Sang-Hyun; Won Jung-Im; Kim Sang-Wook; Yoon Jee-Hee
    • Journal of KIISE:Databases, v.32 no.3, pp.263-275, 2005
  • In large DNA databases, indexing techniques are widely used for rapid approximate sequence searching. However, most indexing techniques require more space than the original databases and are difficult to integrate seamlessly with a DBMS. In this paper, we suggest a space-efficient, disk-based indexing and query processing algorithm for approximate DNA sequence searching, specifically for exact match queries, wildcard match queries, and k-mismatch queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts its signature by considering the occurrence frequency of each nucleotide. It then stores the set of signatures in a multi-dimensional index such as the R*-tree. In particular, by assigning a weight to each position of a window, it prevents signatures from being concentrated around a few spots in the index space. Our query processing algorithm converts a query sequence into a multi-dimensional rectangle and searches the index for the signatures that overlap the rectangle. Experiments with real biological data sets show that the proposed method is at least three times, twice, and several orders of magnitude faster than the suffix-tree-based method for exact match, wildcard match, and k-mismatch queries, respectively.
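
The signature extraction described in the abstract above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's algorithm: a plain dictionary stands in for the R*-tree, only exact match queries whose length equals the window length are handled, and the names `signature`, `build_index`, and `exact_match` as well as the example weights are assumptions made here.

```python
ALPHABET = "ACGT"


def signature(window, weights=None):
    """Four-dimensional signature: (weighted) occurrence count of each nucleotide."""
    if weights is None:
        weights = [1] * len(window)
    sig = [0] * len(ALPHABET)
    for base, weight in zip(window, weights):
        sig[ALPHABET.index(base)] += weight
    return tuple(sig)


def build_index(seq, w, weights=None):
    """Slide a window of length w over seq and group window offsets by signature."""
    index = {}
    for i in range(len(seq) - w + 1):
        index.setdefault(signature(seq[i:i + w], weights), []).append(i)
    return index


def exact_match(seq, index, query, weights=None):
    """Look up candidate offsets by signature, then verify them against the sequence."""
    candidates = index.get(signature(query, weights), [])
    return [i for i in candidates if seq[i:i + len(query)] == query]


seq = "ACGTACGA"
plain = build_index(seq, 4)
weighted = build_index(seq, 4, weights=[1, 2, 3, 4])
print(len(plain), len(weighted))
print(exact_match(seq, weighted, "TACG", weights=[1, 2, 3, 4]))
```

With unweighted counts, every rotation of `ACGT` collapses onto the same signature, so many windows must be verified. Position weights spread those windows over distinct signatures, which mirrors the abstract's point that weighting keeps signatures from concentrating in a few spots of the index space.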

A Data Type for Concept-Based Retrieval against Image Databases Indefinitely Indexed (불확정적으로 색인된 이미지 데이터베이스를 개념 기반으로 검색하기 위한 자료형)

  • Yang, Jae-Dong
    • Journal of KIISE:Databases, v.29 no.1, pp.27-33, 2002
  • Triple image indexing has two significant drawbacks: it cannot support concept-based image retrieval, and it does not allow disjunctive labeling of images. To remedy these drawbacks, we propose a new technique that supports concept-based retrieval against images indexed by indefinite fuzzy triples (I-fuzzy triples). I-fuzzy triples allow not only disjunctive image labeling but also concept-based matching against images labeled disjunctively. The disjunctive labeling is based on the extended closed world assumption, and the concept-based image retrieval is based on fuzzy matching. In this paper, we also propose a concept-based query evaluation against the image database to extract desired answers with a degree of certainty $\alpha \in [0,1]$.