• Title/Summary/Keyword: inverted indexes

An Efficient Information Retrieval System for Unstructured Data Using Inverted Index

  • Abdullah Iftikhar;Muhammad Irfan Khan;Kulsoom Iftikhar
    • International Journal of Computer Science & Network Security / v.24 no.7 / pp.31-44 / 2024
  • An inverted index combines keywords with their associated posting lists to index documents. In the modern age, heavy use of technology has increased data volume at a very high rate, and big data is a major concern for researchers. Efficient document indexing for big data has become a major challenge. All organizations and web search engines have limited resources, such as space and storage, which are crucial to data management in an information retrieval system, so such a system needs to be very efficient. This research introduces an inverted indexing technique to minimize the delay in retrieving data in an information retrieval system. The inverted index is illustrated, and its issues are then discussed and resolved by implementing a scalable inverted index. The existing inverted index algorithm is then compared with the naïve inverted index. The interval list of the inverted index is stored in primary storage instead of auxiliary memory. The research proposes an efficient information retrieval architecture, particularly for unstructured data, which has no predefined structure format or data volume.
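
The keyword-plus-posting-list structure described in this abstract can be illustrated with a minimal sketch. It is not the paper's scalable index: the tokenizer, the function names `build_inverted_index` and `search_all`, and the three sample documents are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to a sorted posting list of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed tokenizer: lowercase runs of letters and digits.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """Return IDs of documents that contain every query term (AND query)."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, not taken from the paper.
docs = {
    1: "Big data needs efficient document indexing",
    2: "An inverted index maps keywords to posting lists",
    3: "Posting lists make keyword retrieval efficient",
}
index = build_inverted_index(docs)
print(search_all(index, "efficient posting"))
```

A query touches only the posting lists of its own terms, which is why lookup cost depends on list length rather than on the size of the whole collection.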

Inverted Indexes for XML Updates and Full-Text Retrievals in Relational Model (관계형 모델에서 XML 변경과 전문 검색을 지원하기 위한 역 인덱스 구축 기법)

  • Cheon, Yun-Woo;Hong, Dong-Kweon
    • The KIPS Transactions:PartD / v.11D no.3 / pp.509-518 / 2004
  • Recently there have been efforts to add XML full-text retrieval and XML updates to the new standardization of XML queries. XML full-text retrieval plays an important role in XML query languages. Unlike tables in the relational model, an XML document has a complex and unstructured nature. We believe that when retrieving information from unstructured XML documents, a full-text retrieval query is a much more convenient approach than a regular structured query. XML update is another core function that an XML query language must have. In this paper we propose an inverted index to support XML updates and XML full-text queries in a relational environment. Performance comparisons show that our approach maintains a comparable inverted index size and supports many full-text retrieval functions very well. It also shows very stable retrieval performance, especially for large XML documents. Above all, our approach handles XML updates efficiently by removing cascading effects.

The Shape and the Location of Forehead Hairline of Korean Males in Their 20s & 30s (20, 30대 한국 남성의 전두부 모발선의 모양과 위치)

  • Yoon, Sung-Won;Kim, Chung-Hun
    • Archives of Plastic Surgery / v.38 no.3 / pp.295-299 / 2011
  • Purpose: It is generally believed that alopecia is caused by various factors such as scars, stress, genetic factors, and androgens. Androgenic alopecia is one of the most common causes of alopecia and is found mainly in males. Propecia (Merck & Co., USA) and Minoxidil (McNEIL-PPC, Inc, USA) were the drugs approved by the FDA for treatment of androgenic alopecia. Surgical treatments such as flap, tissue expansion, scalp reduction, and hair transplantation can be considered if necessary. Hair micrograft techniques were developed to achieve natural hair shapes with minimal adverse effects. There have been attempts to measure the length of the forehead of Korean young adults, but attempts to classify the shape and location of the forehead hairline have been rare. This study attempted to identify standard hairlines of young adults in their 20s and 30s so that the results could serve as a guideline for the hairline in hair replacement surgery for male patients in their 40s and 50s. Methods: 200 male adults in their 20s and 30s were photographed, and the lengths of 11 vertical index lines were measured to determine the hairline. The indexes are the distances from the hairline to the intercanthal midpoint (A), to the medial canthus (B), to the upper eyelid fissure (C), to the lower eyelid fissure (D), and to the lateral canthus (E), and, if the hairline is M-shaped, the distance from the lateral highest point to the medial lowest point (F). Additionally, we classified the hairlines into 4 groups: M, horizontal, inverted U, and irregular shapes. Results: The most common hairline of male adults in their 20s is the inverted U-shape (53.3%), followed by the horizontal shape, M-shape, and irregular shape. In their 30s, the inverted U-shape (59%) is followed by the irregular shape, M-shape, and horizontal shape. The M-shape is found more frequently in males in their 30s than in those in their 20s. The mean values of the indexes in their 20s are as follows: A (76.14 mm), B (Rt: 75.78 mm, Lt: 76.41 mm), C (Rt: 69.43 mm, Lt: 69.92 mm), D (Rt: 76.92 mm, Lt: 77.46 mm), E (Rt: 64.16 mm, Lt: 64.73 mm), F (4.09 mm). Those in their 30s are as follows: A (76.13 mm), B (Rt: 76.114 mm, Lt: 76.02 mm), C (Rt: 69.87 mm, Lt: 70.37 mm), D (Rt: 77.37 mm, Lt: 77.58 mm), E (Rt: 69.63 mm, Lt: 69.85 mm), F (6.14 mm). Conclusion: Knowledge of human body measurements is indispensable to plastic surgeons. In this study, the inverted U-shape is the most common type of hairline in the 30s, and a similar distribution is found in the 20s. The percentage of the M-shape in the 30s is more than 10% higher than in the 20s. The study of hairline shapes and the 11 hairline indexes can be useful for planning hair transplantation and for postoperative evaluation. Because this study is based on photogrammetry, there may be differences between the actual distance on the curved face and the projected distance on flat photographs.

Update conscious and depth insensitive inverted indexes for XML full-text queries (XML 문서의 변경을 고려한 XML 전문 검색 역인덱스)

  • Kwon, Guk-Bong;Hong, Dong-Kweon;Kim, Kweon-Yang
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.81-84 / 2004
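  • XML documents, unlike relational tables, have very complex and irregular structures, so full-text retrieval, which makes the most of partial information, plays a more important role than ordinary structured retrieval. Because XML documents are hierarchical, full-text retrieval operations that use the hierarchy can narrow the search space by specifying levels, greatly improving the accuracy and efficiency of retrieval. The most common way to support full-text retrieval operations effectively is to use an inverted index. Until now, methods for representing and storing the structural information of XML documents for full-text retrieval have considered only static documents, whose content does not change. When documents change dynamically, these methods are inefficient because much of the stored structural information has to be re-expressed. This paper proposes an efficient technique for building an inverted index that uses path strings to support dynamic changes to XML documents together with complex XML full-text retrieval, and shows that the proposed method can efficiently handle both retrieval over complex documents and dynamic document changes.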
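
The idea of keying postings by path strings can be sketched as follows. The abstract does not say how the paper encodes paths, so the `(doc_id, path)` posting format, the function names `index_xml` and `search`, and the sample document are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_xml(doc_id, xml_text, index):
    """Add a (doc_id, path string) posting for every word of element text."""
    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        # Only elem.text is indexed; tail text after child elements is ignored.
        for term in re.findall(r"\w+", (elem.text or "").lower()):
            index[term].add((doc_id, path))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")


def search(index, term, under=""):
    """Postings for term, optionally restricted to the subtree at path `under`."""
    return sorted(
        (doc_id, path)
        for doc_id, path in index.get(term.lower(), ())
        if not under or path == under or path.startswith(under + "/")
    )


# Hypothetical sample document.
index = defaultdict(set)
index_xml("d1", "<book><title>Inverted index</title>"
                "<chapter><title>Index updates</title></chapter></book>", index)
print(search(index, "index"))
print(search(index, "index", under="/book/chapter"))
```

In this sketch a posting carries a path string rather than a positional number, so indexing a newly added element only adds postings and leaves existing ones unchanged. Restricting a search to a subtree is a prefix test on the path.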

Development of an Automatic Hypertext Indexer for Dynamic Information Storage (동적 정보 저장을 위한 자동 하이퍼텍스트 색인 기법의 개발)

  • Yi, Dong-Ae;Jang, Duk-Sung
    • The Transactions of the Korea Information Processing Society / v.4 no.9 / pp.2333-2341 / 1997
  • The hyperlinks to related nodes should be changed when we insert or modify information in a hypertext database. We can find more information by means of hyperlinks that are based on hypertext indexes. Therefore, the management of the hypertext indexes is an important component of dynamic information storage. In this paper, we suggest a method to manage the hypertext indexes and to determine hyperlinks automatically by using a dynamic indexer. We also construct index, stopword, and postposition dictionaries, an inverted index file, and a thesaurus to help the dynamic indexer.

Efficient Linear Path Query Processing using Information Retrieval Techniques for Large-Scale Heterogeneous XML Documents (정보 검색 기술을 이용한 대규모 이질적인 XML 문서에 대한 효율적인 선형 경로 질의 처리)

  • 박영호;한욱신;황규영
    • Journal of KIISE:Databases / v.31 no.5 / pp.540-552 / 2004
  • We propose XIR-Linear, a novel method for processing partial match queries on large-scale heterogeneous XML documents using information retrieval (IR) techniques. XPath queries are written as path expressions on a tree structure representing an XML document. An XPath query in its major form is a partial match query. The objective of XIR-Linear is to efficiently support this type of query for large-scale documents of heterogeneous schemas. XIR-Linear is based on the schema-level methods using relational tables and drastically improves their efficiency and scalability using an inverted index technique. The method indexes the labels in label paths as keywords in texts, and allows for finding the label paths that match the queries far more efficiently than the string matching used in conventional methods. We demonstrate the efficiency and scalability of XIR-Linear by comparing it with XRel and XParent using XML documents crawled from the Internet. The results show that XIR-Linear is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions as the number of XML documents increases.
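
Treating the labels in label paths as keywords can be sketched as below. This is not the XIR-Linear algorithm itself. It only shows how an inverted index over labels narrows the candidate label paths before a simple suffix check, and the function names, the supported query form (`//a/b`), and the sample paths are assumptions.

```python
from collections import defaultdict


def build_label_index(label_paths):
    """Map each label to the set of IDs of label paths that contain it."""
    index = defaultdict(set)
    for path_id, path in enumerate(label_paths):
        for label in path.strip("/").split("/"):
            index[label].add(path_id)
    return index


def match_partial(index, label_paths, query):
    """Answer a query such as '//person/name': label paths ending in person/name."""
    labels = query.lstrip("/").split("/")
    # Candidates must contain every query label (posting-list intersection).
    candidates = set.intersection(*(index.get(label, set()) for label in labels))
    hits = []
    for path_id in sorted(candidates):
        path_labels = label_paths[path_id].strip("/").split("/")
        # Verify that the query labels form the tail of the candidate path.
        if path_labels[-len(labels):] == labels:
            hits.append(label_paths[path_id])
    return hits


# Hypothetical label paths extracted from heterogeneous documents.
label_paths = [
    "/site/people/person/name",
    "/site/regions/item/name",
    "/site/people/person/address/city",
]
index = build_label_index(label_paths)
print(match_partial(index, label_paths, "//person/name"))
```

Only paths that share all query labels are examined, so the check avoids matching the query string against every stored label path.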

Index Graph : An IR Index Structure for Dynamic Document Database (인덱스 그래프 : 동적 문서 데이터베이스를 위한 IR 인덱스 구조)

  • 박병권
    • The Journal of Information Systems / v.10 no.1 / pp.257-278 / 2001
  • An IR (information retrieval) index for dynamic document databases, where documents are frequently inserted, deleted, and updated, must itself be updated frequently. However, because the conventional IR index structure is designed for retrieval, it handles dynamic updates inefficiently. In this paper, we propose a new IR index structure, called Index Graph, which is organized by connecting multiple indexes into a graph. Through analysis and experiment, we show that Index Graph is superior to the conventional IR index structure in the performance of document insertion, deletion, and update as well as in information retrieval performance.

Control of Unstable Systems Concerned with the Performance Indexes and Constraints (성능지수와 제약조건을 고려한 불안정 시스템의 제어)

  • Ahn, Jong-Kap;Lee, Yun-Hung;So, Myung-Ok
    • Journal of Advanced Marine Engineering and Technology / v.32 no.5 / pp.785-790 / 2008
  • A technique for determining the feedback gain of a state feedback controller using a real-coded genetic algorithm (RCGA) is presented. The state errors are used in the performance index of the RCGA. To assess the performance of the controller, three performance criteria (ISE, IAE, and ITAE) are adopted. Designing the controller involves a constrained optimization problem, so a real-coded genetic algorithm incorporating a penalty strategy is used. The performance of the proposed method is demonstrated through a set of simulations on an inverted pendulum system.
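
The three criteria named in this abstract are commonly defined as ISE = ∫e(t)² dt, IAE = ∫|e(t)| dt, and ITAE = ∫t·|e(t)| dt. The sketch below approximates them from sampled data with the trapezoidal rule. The paper's exact formulation is not given here, so the function name `performance_indexes` and the decaying test signal are assumptions.

```python
import math


def performance_indexes(times, errors):
    """Approximate ISE, IAE and ITAE of a sampled error signal (trapezoidal rule)."""
    ise = iae = itae = 0.0
    for k in range(1, len(times)):
        t0, t1 = times[k - 1], times[k]
        e0, e1 = errors[k - 1], errors[k]
        dt = t1 - t0
        ise += 0.5 * dt * (e0 ** 2 + e1 ** 2)               # integral of e^2
        iae += 0.5 * dt * (abs(e0) + abs(e1))               # integral of |e|
        itae += 0.5 * dt * (t0 * abs(e0) + t1 * abs(e1))    # integral of t*|e|
    return ise, iae, itae


# Hypothetical error signal e(t) = exp(-t), sampled every 1 ms for 10 s.
times = [k * 0.001 for k in range(10001)]
errors = [math.exp(-t) for t in times]
ise, iae, itae = performance_indexes(times, errors)
print(round(ise, 3), round(iae, 3), round(itae, 3))
```

ITAE weights late errors more heavily than the other two, so it penalizes slow settling. A genetic algorithm could use any of these values as the cost of a candidate gain.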

Retrieval of Large scaled XML Documents based on Path Query using Inverted indexes (역 색인을 이용한 경로 질의 기반 대용량 XML문서 검색)

  • Moon, Kyung-Won;Hwang, Byung-Yeon
    • Proceedings of the Korea Information Processing Society Conference / 2005.05a / pp.35-38 / 2005
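  • Since the XML document standard was proposed in 1998, XML has been establishing itself as the standard for representing data in a variety of application areas. In particular, as much of the data on the Internet is written in or converted to XML, large amounts of XML data are being generated. Accordingly, research on storage and query processing techniques for XML documents is actively under way. However, existing work falls short when handling large-scale XML documents. This paper proposes a retrieval technique that can quickly process path-based queries over the vast, variously structured XML documents spread across the Internet. The proposed technique focuses on efficiently storing XML documents scattered across the Internet in a relational database and quickly retrieving elements of those documents through queries. First, we propose a hierarchical XML storage technique that stores XML documents efficiently in a relational database; we then use an inverted index, widely used in information retrieval systems, to improve retrieval performance over the stored XML documents.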

The Impact of Network with Central City on Urban Growth (중심도시와의 네트워크가 도시성장에 미치는 영향)

  • Eom, Hyuntae;Woo, Myungje
    • Journal of Korea Planning Association / v.54 no.3 / pp.15-26 / 2019
  • The development of science and transportation technology leads to an increase in inter-city networks, which play an important role in urban growth. Overall, numerous studies based on network theory pay attention to the positive effects of urban networks on urban growth. However, some studies have pointed out negative effects of inter-city interactions, such as straw effects. This implies that the network between cities may not be positively correlated with urban growth, and that the direction of the influence may change beyond a certain threshold, as with a marginal utility curve. In this context, the purpose of this study is to measure the impact of networks with the central city on urban growth in the capital region and to examine the relationship between urban networks and growth. Two multiple regression models are employed, with changes in population and employment as dependent variables. The urban network index and other control variables are used as independent variables. In particular, the urban network indexes are used in quadratic form to examine nonlinear relations with urban growth, such as a U-shape or an inverted U-shape. The results show that the relationships between networks with the central city and urban growth are not simply linear, and that the influence can change at a critical point.
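
When an index enters a regression in quadratic form, y = b0 + b1·x + b2·x², the sign of b2 gives the shape and the critical point lies where the slope b1 + 2·b2·x is zero. The sketch below shows that arithmetic only. The function name `quadratic_shape` and the coefficient values are hypothetical and are not results from the study.

```python
def quadratic_shape(b1, b2):
    """Classify y = b0 + b1*x + b2*x**2 and locate its critical point."""
    if b2 == 0:
        return "linear", None
    critical_x = -b1 / (2 * b2)  # where dy/dx = b1 + 2*b2*x equals zero
    shape = "inverted U-shape" if b2 < 0 else "U-shape"
    return shape, critical_x


# Hypothetical coefficients: growth rises with the index up to x = 4, then falls.
print(quadratic_shape(4.0, -0.5))
```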