• Title/Summary/Keyword: Inverted Index Method

Odysseus/Parallel-OOSQL: A Parallel Search Engine using the Odysseus DBMS Tightly-Coupled with IR Capability (오디세우스/Parallel-OOSQL: 오디세우스 정보검색용 밀결합 DBMS를 사용한 병렬 정보 검색 엔진)

  • Ryu, Jae-Joon;Whang, Kyu-Young;Lee, Jae-Gil;Kwon, Hyuk-Yoon;Kim, Yi-Reun;Heo, Jun-Suk;Lee, Ki-Hoon
    • Journal of KIISE:Computing Practices and Letters / v.14 no.4 / pp.412-429 / 2008
  • As the number of electronic documents increases rapidly with the growth of the Internet, parallel search engines capable of handling a large number of documents are becoming ever more important. To implement a parallel search engine, we need to partition the inverted index and search through the partitioned index in parallel. There are two methods of partitioning the inverted index: 1) document-identifier based partitioning and 2) keyword-identifier based partitioning. However, each method alone has drawbacks. The former is convenient for inserting documents and has high throughput, but has poor performance for top-k query processing. The latter has good performance for top-k query processing, but is inconvenient for inserting documents and has low throughput. In this paper, we propose a hybrid partitioning method to compensate for the drawbacks of each method. We design and implement a parallel search engine that supports the hybrid partitioning method using the Odysseus DBMS tightly coupled with information retrieval capability. We first introduce the architecture of the parallel search engine, Odysseus/Parallel-OOSQL. We then show the effectiveness of the proposed system through systematic experiments. The experimental results show that the query processing time of the document-identifier based partitioning method is approximately inversely proportional to the number of blocks in the partition of the inverted index. The results also show that the keyword-identifier based partitioning method has good performance in top-k query processing. The proposed parallel search engine can be optimized for performance by customizing the methods of partitioning the inverted index according to the application environment. The Odysseus/Parallel-OOSQL parallel search engine is capable of indexing, storing, and querying 100 million web documents per node or tens of billions of web documents for the entire system.
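
The two partitioning schemes named in this abstract can be illustrated with a minimal sketch. It is not the Odysseus/Parallel-OOSQL implementation: the function names, the modulo assignment of documents, and the CRC32 assignment of terms are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(p.items()) for term, p in index.items()}

def term_bucket(term, n_parts):
    """Hypothetical, deterministic term-to-partition assignment."""
    return zlib.crc32(term.encode("utf-8")) % n_parts

def partition_by_document(docs, n_parts):
    """Document-identifier partitioning: each partition indexes its own documents."""
    shards = [dict() for _ in range(n_parts)]
    for doc_id, text in docs.items():
        shards[doc_id % n_parts][doc_id] = text  # assumes integer doc ids
    return [build_inverted_index(shard) for shard in shards]

def partition_by_keyword(docs, n_parts):
    """Keyword-identifier partitioning: each partition owns whole posting lists."""
    parts = [dict() for _ in range(n_parts)]
    for term, postings in build_inverted_index(docs).items():
        parts[term_bucket(term, n_parts)][term] = postings
    return parts

def search_document_partitioned(parts, term):
    """Every partition is consulted and the partial posting lists are merged."""
    hits = []
    for part in parts:
        hits.extend(part.get(term, []))
    return sorted(hits)

def search_keyword_partitioned(parts, term):
    """Only the partition that owns the term is consulted."""
    return parts[term_bucket(term, len(parts))].get(term, [])
```

In this sketch, inserting a document touches one partition under document-identifier partitioning but potentially every partition under keyword-identifier partitioning, while a single-term query shows the opposite pattern, which mirrors the trade-off the abstract describes.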

UIL: A Novel Indexing Method for Spatial Objects and Moving Objects

  • Huang, Xuguang;Baek, Sung-Ha;Lee, Dong-Wook;Chung, Weon-Il;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society / v.11 no.2 / pp.19-26 / 2009
  • Ubiquitous services based on spatio-temporal dataspaces require not only moving-object data but also spatial-object data. However, existing methods cannot handle moving objects and spatial objects together. To overcome this limitation, we propose a new index structure called UIL (Union Indexing Lists), which contains two parts, MOL (Moving Object List) and SOL (Spatial Object List), to index moving objects and spatial objects together. In addition, it can support flexible queries on these data. We present the results of a series of tests which indicate that the structure performs well.

Application of the L-index to the Delineation of Market Areas of Retail Businesses

  • Lee, Sang-Kyeong;Lee, Byoungkil
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.32 no.3 / pp.245-251 / 2014
  • As delineating the market areas of retail businesses has become a topic of interest in the marketing field, Lee and Lee recently suggested a noteworthy method that applies the hydrological analysis of a geographical information system (GIS), based on Christaller's central place theory. They used a digital elevation model (DEM) obtained by inverting the kernel density of retail businesses, which was measured using pre-determined bandwidths of 500, 1000 and 5000 m. Their method is not a fully data-based approach, because the kernel bandwidths are pre-determined. This paper aims to improve Lee and Lee's method with a data-based approach using the L-index, which describes the clustering level of a point feature distribution. The case study is applied to automobile-related retail businesses in Seoul, Korea, with kernel bandwidths of 1211.5, 2120.2 and 7067.2 m selected from the L-index analysis. Subsequently, the kernel density is measured, the density DEM is created by inverting it, and the boundaries of market areas are extracted. The results of the analysis are summarized as follows. First, the L-index can be a useful tool to complement Lee and Lee's market area analysis method. Second, the kernel bandwidths pre-determined by Lee and Lee cannot be uniformly applied to all kinds of retail businesses. Lastly, the L-index method can be useful for analyzing the spatial structure of the market areas of retail businesses, based on Christaller's central place theory.

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.129-139 / 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data to the local hard disks of the PCs in such a way that the disk I/O and the subsequent computation are spread as evenly as possible across all the PCs. If the terms in the inverted index file can be classified into closely related clusters, the parallelism can be maximized by distributing them to the PCs in an interleaved manner. One of the goals of this research is the development of methods for automatically clustering the terms based on the likelihood of the terms' co-occurrence in the same query. In this paper, we also propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus revealed the efficiency and effectiveness of our method.
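
One way to read the interleaved and duplicate distribution described in this abstract is sketched below. The round-robin placement, the `replicas` parameter, and the function name are assumptions made for illustration and are not taken from the paper.

```python
def distribute_interleaved(clusters, n_nodes, replicas=2):
    """Place terms on nodes round-robin, cluster by cluster.

    Terms in the same cluster (likely to co-occur in one query) land on
    consecutive nodes, so their posting lists can be read in parallel.
    Each term is also copied to the following node(s) so that another
    node can serve it if the primary is down or overloaded.
    """
    placement = {}
    slot = 0
    for cluster in clusters:
        for term in cluster:
            # First entry is the primary node, the rest are duplicates.
            placement[term] = [(slot + r) % n_nodes for r in range(replicas)]
            slot += 1
    return placement

# Hypothetical term clusters for a three-PC cluster.
clusters = [["search", "engine", "index"], ["disk", "cache"]]
placement = distribute_interleaved(clusters, n_nodes=3)
```

With three nodes, the three terms of the first cluster receive three distinct primary nodes, so a query containing all of them would spread its disk I/O over every PC.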

Some Characteristics of the Performance in Comparison with Indexing techniques for File Organization (화일조직을 위한 인덱싱 기법의 성능 특성 비교)

  • Lee, Gu-Nam
    • Journal of The Korean Association of Information Education / v.1 no.1 / pp.49-59 / 1997
  • In this thesis, to provide a basis for effective data access methods, the performance of some generally used indexing techniques is compared. They are classified as primary-key and multikey methods. For the primary-key methods, a comparative analysis is made of static index, dynamic index, and hashing. For the multikey indexing methods, the characteristics of the K-d tree, K-d-B tree, inverted file, and grid file are compared. In many applications, multikey indexing is in greater demand but is not sufficiently supported. Therefore, to satisfy users' requests for faster and more exact access, and to keep pace with the trend toward very large database systems, further study of multikey data access methods is required.

Query Optimization on Large Scale Nested Data with Service Tree and Frequent Trajectory

  • Wang, Li;Wang, Guodong
    • Journal of Information Processing Systems / v.17 no.1 / pp.37-50 / 2021
  • Query applications based on nested data, the most commonly used form of data representation on the web, are becoming more extensively used, especially for precise queries. MapReduce, a distributed architecture with parallel computing power, provides a good solution for big data processing. However, in practical applications, query requests are usually concurrent, which causes bottlenecks in server processing. To solve this problem, this paper first combines a column storage structure and an inverted index to build an index for nested data on MapReduce. On this basis, this paper puts forward an optimization strategy which combines a query execution service tree and frequent sub-query trajectories to reduce the response time of frequent queries and further improve the efficiency of multi-user concurrent queries on large-scale nested data. Experiments show that this method greatly improves the efficiency of nested data queries.

Experimental Study on Heat Release in a Lean Premixed Dump Combustor using OH Chemiluminescence Images (희박 예혼합 덤프 연소기에서 OH 자발광을 이용한 열 방출에 관한 실험적 연구)

  • Moon, Gun-Feel;Lee, Jong-Ho;Jeon, Chung-Hwan;Chang, Young-June
    • Proceedings of the KSME Conference / 2004.11a / pp.1146-1151 / 2004
  • Measurements of OH chemiluminescence in an atmospheric-pressure, laboratory-scale dump combustor at equivalence ratios ranging from 0.63 to 0.89 are reported. The signal from the first electronically excited state of OH to the ground state was detected through a band-pass filter with an ICCD. This study has two objectives. One is to examine the effects of equivalence ratio on the global heat release rate and the local Rayleigh index distribution. To obtain the local Rayleigh index distribution, the line-of-sight images were inverted by a tomographic method, such as Abel de-convolution. The other is to investigate the validity of using OH chemiluminescence acquired with an ICCD as a qualitative measure of local heat release. For constant inlet velocity and temperature, the overall intensity of OH emission acquired at different equivalence ratios varied periodically and took higher values at high equivalence ratios. OH intensity averaged over one period of pressure increased exponentially with equivalence ratio. The local Rayleigh index distribution clearly showed the regions amplifying or damping the combustion instability as the equivalence ratio increased. This could provide insights into active control such as secondary fuel injection. Finally, local heat release rates derived from reconstructed OH images are presented for typical locations.

Enhanced VLAD

  • Wei, Benchang;Guan, Tao;Luo, Yawei;Duan, Liya;Yu, Junqing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.7 / pp.3272-3285 / 2016
  • Recently, the Vector of Locally Aggregated Descriptors (VLAD) has been proposed to index images by compact representations. It encodes powerful local descriptors and significantly improves search performance with less memory compared with the state of the art. However, its performance relies heavily on the size of the codebook used to generate the VLAD representation: better accuracy requires a higher-dimensional representation, and thus more memory overhead. In this paper, we enhance the VLAD image representation by using two-level hierarchical codebooks. This provides more accurate search performance while keeping the VLAD size unchanged. In addition, the hierarchical codebooks are used to construct multiple inverted files for more accurate non-exhaustive search. Experimental results show that our method significantly improves both the VLAD image representation and non-exhaustive search.
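
For orientation, the baseline VLAD encoding that this abstract builds on can be sketched as follows. This is the standard single-codebook formulation (residuals to the nearest codeword, concatenated and L2-normalized), not the two-level hierarchical scheme the paper proposes; the function name and the toy data are assumptions.

```python
import math

def vlad(descriptors, codebook):
    """Encode local descriptors as one VLAD vector of length k * d.

    Each descriptor is assigned to its nearest codeword, the residuals
    (descriptor minus codeword) are summed per codeword, and the
    concatenated sums are L2-normalized.
    """
    k, d = len(codebook), len(codebook[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Index of the nearest codeword by squared Euclidean distance.
        j = min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))
        for t in range(d):
            acc[j][t] += x[t] - codebook[j][t]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy example: a codebook of 2 codewords in 2 dimensions.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
vector = vlad(descriptors, codebook)
```

The output length is the codebook size times the descriptor dimension, which is why a larger codebook means a larger representation, the cost the abstract sets out to avoid.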

Efficient Linear Path Query Processing using Information Retrieval Techniques for Large-Scale Heterogeneous XML Documents (정보 검색 기술을 이용한 대규모 이질적인 XML 문서에 대한 효율적인 선형 경로 질의 처리)

  • 박영호;한욱신;황규영
    • Journal of KIISE:Databases / v.31 no.5 / pp.540-552 / 2004
  • We propose XIR-Linear, a novel method for processing partial match queries on large-scale heterogeneous XML documents using information retrieval (IR) techniques. XPath queries are written as path expressions on a tree structure representing an XML document. An XPath query in its major form is a partial match query. The objective of XIR-Linear is to efficiently support this type of query for large-scale documents with heterogeneous schemas. XIR-Linear is based on the schema-level methods using relational tables and drastically improves their efficiency and scalability using an inverted index technique. The method indexes the labels in label paths like keywords in texts, and finds the label paths that match the queries far more efficiently than the string matching used in conventional methods. We demonstrate the efficiency and scalability of XIR-Linear by comparing it with XRel and XParent using XML documents crawled from the Internet. The results show that XIR-Linear is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions as the number of XML documents increases.
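
The idea of indexing labels in label paths like keywords in texts can be illustrated with a small sketch. It is a simplified reading of the abstract, not the XIR-Linear algorithm itself: the function names, the position-based matching, and the rule that a query must end at the last label of a path are assumptions.

```python
import re
from collections import defaultdict

def build_label_index(label_paths):
    """Inverted index: label -> {path_id: [positions of the label in that path]}."""
    index = defaultdict(lambda: defaultdict(list))
    lengths = []
    for pid, path in enumerate(label_paths):
        labels = path.strip("/").split("/")
        lengths.append(len(labels))
        for pos, label in enumerate(labels):
            index[label][pid].append(pos)
    return index, lengths

def parse_query(query):
    """'//chapter/title' -> [('//', 'chapter'), ('/', 'title')]."""
    return re.findall(r"(//|/)([^/]+)", query)

def match_paths(index, lengths, query):
    """Return ids of label paths matched by a linear path expression."""
    steps = parse_query(query)
    # Keyword-style filter: a matching path must contain every query label.
    candidates = None
    for _, label in steps:
        pids = set(index.get(label, {}))
        candidates = pids if candidates is None else candidates & pids
    result = []
    for pid in sorted(candidates or ()):
        current = {-1}  # position before the root
        for axis, label in steps:
            positions = index[label][pid]
            if axis == "/":   # child axis: label directly follows the previous match
                current = {p for p in positions if p - 1 in current}
            else:             # descendant axis: label occurs anywhere later
                current = {p for p in positions if any(p > c for c in current)}
            if not current:
                break
        # The last query label must be the last label of the path.
        if lengths[pid] - 1 in current:
            result.append(pid)
    return result

paths = ["/book/title", "/book/chapter/title",
         "/book/chapter/section/title", "/book/chapter/section"]
index, lengths = build_label_index(paths)
print(match_paths(index, lengths, "//chapter//title"))  # prints [1, 2]
```

The intersection of posting lists narrows the candidate label paths before any position checks run, which is where an inverted index would be expected to save work over scanning every path string.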

Types of perception on the body shape of old-old aged women

  • Cha, Su-Joung
    • Journal of the Korea Society of Computer and Information / v.23 no.4 / pp.121-129 / 2018
  • The purpose of this study is to provide basic data for clothing development that can improve satisfaction with body shape, by examining the subjective evaluations and type characteristics of old-old women themselves. Q methodology was used to study subjectivity. The body shapes of the old-old women were classified into five types: bent body with protruding abdomen, backward bent body with slender legs, inverted triangle, swollen cylinder, and triangle. The bent body with protruding abdomen had a bent back and waist; these women perceived their bust and shoulders as sagging and their abdomen as protruding. The backward bent body with slender legs had the lowest BMI of the five types, with sagging shoulders and bust, bent knees and waist, and thin legs. The inverted triangle shape showed the highest BMI among the five types, indicating obesity; these women thought that their upper body was developed and their lower body and legs were slender. The swollen cylinder shape was analyzed to be the smallest and fattest body. The triangle shape had a developed lower body and a bent back and waist. When making clothes, design consideration is needed to cover the disadvantages of the body shape, taking into account not only wearing comfort but also aesthetics. Ergonomic garments that consider the characteristics of body shape can be expected to have a positive effect on the body-shape changes caused by wearing unsuitable clothing and on physical health.