• Title/Abstract/Keywords: distributed indexing

Improving Elasticsearch for Chinese, Japanese, and Korean Text Search through Language Detector

  • Kim, Ki-Ju;Cho, Young-Bok
    • Journal of information and communication convergence engineering / v.18 no.1 / pp.33-38 / 2020
  • Elasticsearch is an open source search and analytics engine that can search petabytes of data in near real time. It is designed as a horizontally scalable, highly available distributed system. It provides RESTful APIs, which makes it programming-language agnostic. Full-text search of multilingual text requires language-specific analyzers and field mappings appropriate for indexing and searching that text. Additionally, a language detector can be used in conjunction with the analyzers to improve multilingual text search. Elasticsearch provides more than 40 language analysis plugins that can process text and extract language-specific tokens, as well as language detector plugins that can determine the language of a given text. This study investigates three approaches to indexing and searching Chinese, Japanese, and Korean (CJK) text (single analyzer, multi-fields, and language detector-based) and identifies the advantages of the language detector-based approach over the other two.
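
The multi-fields and language detector-based approaches named in this abstract can be sketched roughly as follows. This is only an illustration: the analyzer names (`smartcn`, `kuromoji`, `nori`) are assumed to come from the corresponding Elasticsearch analysis plugins, the field name `content` and all function names are hypothetical, and `detect_language` is a toy Unicode-range heuristic standing in for a real language detector plugin. The paper's actual configuration may differ.

```python
from typing import Any, Dict

# Assumed analyzer names, provided by the analysis-smartcn, analysis-kuromoji
# and analysis-nori plugins; the paper's exact configuration may differ.
ANALYZERS = {"zh": "smartcn", "ja": "kuromoji", "ko": "nori"}


def multi_field_mapping(field: str = "content") -> Dict[str, Any]:
    """Mapping that indexes one text field once per language via sub-fields."""
    sub_fields = {
        lang: {"type": "text", "analyzer": name}
        for lang, name in ANALYZERS.items()
    }
    return {"mappings": {"properties": {field: {"type": "text", "fields": sub_fields}}}}


def detect_language(text: str) -> str:
    """Toy stand-in for a language detector, based on Unicode script ranges."""
    has_han = False
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:  # Hangul syllables
            return "ko"
        if 0x3040 <= code <= 0x30FF:  # Hiragana and Katakana
            return "ja"
        if 0x4E00 <= code <= 0x9FFF:  # CJK unified ideographs
            has_han = True
    return "zh" if has_han else "unknown"


def query_for(text: str, field: str = "content") -> Dict[str, Any]:
    """Build a match query against the sub-field for the detected language."""
    lang = detect_language(text)
    target = f"{field}.{lang}" if lang in ANALYZERS else field
    return {"query": {"match": {target: text}}}
```

The same routing could be applied at index time to choose which language-specific field a document is written to. Note that the heuristic misclassifies Japanese text written only in kanji as Chinese, which is one reason a real detector is preferable.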

Music Source Signature Indexing Method for Quick Search (빠른 검색을 위한 음원 시그니처 인덱싱 방법)

  • Kim, Sang-Kyun;Lee, Kyoung-Sik
    • Journal of Broadcast Engineering / v.26 no.3 / pp.321-326 / 2021
  • Blockchain is increasing in value as a platform for the safe transmission of capital transactions and secure data. In addition, blockchain has potential as a new platform that can safely store large amounts of data such as videos, music, and photos, and safely manage transaction details and service usage specifications. Because large-capacity media data cannot be stored in a block, research has been conducted on the performance of storing sound source information in a block and retrieving the stored sound source data by using a distributed storage system (IPFS) and the hash of the sound source signature data. In this paper, we propose a sound source signature indexing method using a Bloom filter that improves on the search speed reported in previous studies. The experiments confirmed that search performance improves from O(n) for the existing method to O(1).
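
As a rough illustration of why a Bloom filter gives lookups whose cost does not grow with the number of stored signatures, here is a minimal sketch. The class name, the default sizes, and the SHA-256 double-hashing scheme are assumptions made for the example and are not taken from the paper.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

Each lookup checks a fixed number of bits regardless of how many items were added. A positive answer can be a false positive, so it would still need to be confirmed against the stored data.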

Effective Streaming of XML Data for Wireless Broadcasting (무선 방송을 위한 효과적인 XML 스트리밍)

  • Park, Jun-Pyo;Park, Chang-Sup;Chung, Yon-Dohn
    • Journal of KIISE:Databases / v.36 no.1 / pp.50-62 / 2009
  • In wireless and mobile environments, data broadcasting is recognized as an effective way to disseminate data because of its bandwidth efficiency, energy efficiency, and scalability. In this paper, we address the problem of delayed query processing caused by tree-based index structures in wireless broadcast environments, which increases the access time of mobile clients. We propose a novel distributed index structure and a clustering strategy for streaming XML data that enable energy- and latency-efficient broadcast of XML data. We first define the DIX node structure to implement a fully distributed index structure, which contains the tag name, attributes, and text content of an element as well as its corresponding indices. By exploiting the index information in the DIX node stream, a mobile client can access the wireless stream with lower latency. We also suggest a method of clustering DIX nodes in the stream, which can further improve query processing performance over the stream on mobile clients. Through extensive performance experiments, we demonstrate that our approach is effective for wireless broadcasting of XML data and outperforms previous methods.

Video Retrieval System supporting Adaptive Streaming Service (적응형 스트리밍 서비스를 지원하는 비디오 검색 시스템)

  • 이윤채;전형수;장옥배
    • Journal of KIISE:Computing Practices and Letters / v.9 no.1 / pp.1-12 / 2003
  • Recently, much research has been conducted on distributed processing on the Internet and on multimedia data processing. Fast, convenient, high-quality multimedia services are needed. In this paper, we design and implement a clip-based video retrieval system that operates in real time in the Web environment. Our system consists of a content-based indexing system that supports convenient services for video content providers and a Web-based retrieval system that gives users easy and varied ways to retrieve information on the Web. The content-based indexing system uses three main methods: key frame extraction by dividing video data, clip file creation by clustering related information, and video database construction in clip units. The Web-based retrieval system uses keyword retrieval, two-dimensional browsing of key frames, and real-time display of clips. The resulting system supports real-time retrieval of video clips in the Web environment and provides a stable multimedia service. The proposed methods are useful for providing video content and give users an easy way to search for the video content they intend to find.

A Study on Cost Estimation of Spatial Query Processing for Multiple Spatial Query Optimization in GeoSensor Networks (지오센서 네트워크의 다중 공간질의 최적화를 위한 공간질의처리비용 예측 알고리즘 연구)

  • Kim, Min Soo;Jang, In Sung;Li, Ki Joune
    • Spatial Information Research / v.21 no.2 / pp.23-33 / 2013
  • With the recent advancement of IoT (Internet of Things) technology, there has been much interest in spatial query processing that energy-efficiently acquires sensor readings from sensor nodes inside a specified geographical area of interest. Accordingly, various spatial query processing algorithms and distributed spatial indexing methods have been proposed. They can minimize the energy consumption of sensor nodes by reducing wireless communication among them using in-network spatial filtering. However, they cannot optimize the multiple spatial queries that will be widely used in IoT, because most of them focus on optimizing a single spatial query. Therefore, we propose a new multiple spatial query optimization algorithm that can energy-efficiently process multiple spatial queries in a sensor network. The algorithm uses the concept of 'query merging', which merges multiple spatial queries located in adjacent areas and executes the merged set. Our algorithm decides whether merged or separate execution of the queries is better. To support this decision, we additionally propose a cost estimation method for spatial query execution. Finally, we analyze and clarify the distinguishing features of our algorithm using the spatial indexing methods GR-tree, SPIX, and CPS.

A Data Driven Index for Convergence Sensor Networks (융합 센서 네트워크를 위한 데이터 기반 색인)

  • Park, Jeong-Seok
    • Journal of the Korea Convergence Society / v.7 no.6 / pp.43-48 / 2016
  • Wireless sensor networks (WSN) can be more reliable and easier to program and use with the help of sensor database management systems (SDMS). SDMS establish a user-friendly SQL-based interface to process declarative user-defined queries over sensor readings from WSN. Typical queries in SDMS are ad-hoc snapshot queries and long-running, continuous queries. In SDMS, queries are flooded to all nodes in the sensor network, and query results are sent back to a base station from the nodes that have qualifying results. Flooding queries to all nodes and flooding results to the base station consumes a large amount of communication energy. This paper proposes an efficient in-network index solution, named Distributed Information Gathering (DIG), for processing range queries in a sensor network environment; it saves energy by reducing query and result flooding.

Power-Aware Query Processing Using Optimized Distributed R-tree in Sensor Networks (센서 네트워크 환경에서 최적화된 분산 R-tree를 이용한 에너지 인식 질의 처리 방법)

  • Pandey Suraj;Eo Sang-Hun;Kim Ho-Seok;Bae Hae-Young
    • The KIPS Transactions:PartD / v.13D no.1 s.104 / pp.23-28 / 2006
  • In this paper, a power-aware query processing technique using an optimized distributed R-tree in a sensor network is proposed. The proposed technique is a new approach to processing range queries that uses spatial indexing. Range queries are most often encountered in sensor networks when computing aggregate values. Previous work noted their importance but did not provide an efficient technique for processing them. A query processing scheme is therefore designed to process them efficiently. Each node in the sensor network holds the MBR (minimum bounding rectangle) of the region where the node itself and its child nodes are located. A range query is evaluated over the region that intersects the geographic locations of the sensors. The scheme maximizes power savings by avoiding communication with nodes that do not participate in the evaluation of the query.
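
The pruning idea described in this abstract, where each node knows the MBR covering itself and its children, can be sketched as below. This is a simplified illustration under assumptions: the `SensorNode` type, the routing tree shape, and the `visited` list used to count contacted nodes are hypothetical, and the paper's optimized distributed R-tree is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class SensorNode:
    name: str
    x: float
    y: float
    children: List["SensorNode"] = field(default_factory=list)

    def mbr(self) -> Rect:
        # MBR of this node's own location plus the MBRs of its children.
        rects = [(self.x, self.y, self.x, self.y)] + [c.mbr() for c in self.children]
        return (
            min(r[0] for r in rects),
            min(r[1] for r in rects),
            max(r[2] for r in rects),
            max(r[3] for r in rects),
        )


def range_query(node: SensorNode, query: Rect, visited: List[str]) -> List[str]:
    """Return names of nodes inside the query, recording every node contacted."""
    if not intersects(node.mbr(), query):
        return []  # prune: nothing is forwarded into this subtree
    visited.append(node.name)
    hits = []
    if query[0] <= node.x <= query[2] and query[1] <= node.y <= query[3]:
        hits.append(node.name)
    for child in node.children:
        hits.extend(range_query(child, query, visited))
    return hits
```

In a real deployment each node would cache its MBR instead of recomputing it on every query; the recursion here only keeps the example short.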

BU-Chord Mechanism for Reliable P2P File Sharing over MANET (모바일 에드 혹 네트워크 상의 신뢰성 있는 P2P 파일 공유를 위한 BU-Chord 메커니즘)

  • Jeong Hong-Jong;Song Jeom-Ki;Kim Dong-Kyun
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.4B / pp.333-343 / 2006
  • MANETs and P2P applications share the property that neither has a fixed infrastructure to maintain the network topology. Given this shared characteristic, a P2P application can be a killer application over MANET. Because a MANET has no reliable node that can provide indexing services, fully distributed P2P applications are more suitable for it. By using a DHT such as Chord, we can save network bandwidth and avoid the single point of failure of a directory server. However, since MANET allows nodes to leave the network freely, P2P file sharing applications that use the Chord lookup protocol must address how to recover the keys stored at a departed node. In this paper, we propose BU-Chord, which detects node departures and recovers from them by creating and storing backup file information in a distributed manner. BU-Chord performs better than the existing Chord, especially when the node departure rate is high.
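
To make the key-recovery problem concrete, here is a small sketch of Chord-style successor lookup on an identifier ring, plus one possible backup placement. The backup rule used here (copy the key to the next node clockwise after the owner) is an assumption chosen for illustration and is not claimed to be the BU-Chord scheme; the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def successor(node_ids: List[int], key_id: int) -> int:
    """Return the first node id at or clockwise after key_id on the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]  # wrap around past the largest id


def place_with_backup(node_ids: List[int], key_id: int) -> Tuple[int, int]:
    """Pick the key's owner and, as an assumption, back it up on the next node."""
    owner = successor(node_ids, key_id)
    backup = successor(node_ids, owner + 1)
    return owner, backup
```

With this placement, if the owner leaves, a fresh lookup for the same key lands on the node that already holds the backup copy, so the key can be recovered without the departed node.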

A Distributed Spatial Indexing Technique based on Hilbert Curve and MBR for k-NN Query Processing in a Single Broadcast Channel Environment (단일방송채널환경에서 k-최근접질의 처리를 위한 힐버트 곡선과 최소영역 사각형 기반의 분산 공간 인덱싱 기법)

  • Yi, Jung-Hyung;Jung, Sung-Won
    • Journal of KIISE:Databases / v.37 no.4 / pp.203-208 / 2010
  • This paper presents an efficient index scheduling technique based on the Hilbert curve and MBRs for k-NN queries in a single wireless broadcast channel environment. Previous works have two major problems. First, they take a long time to process queries because of the back-tracking problem. Second, they must download too much spatial data because they cannot reduce the search space quickly. Our proposed method broadcasts spatial data in Hilbert curve order, and a distributed index table is broadcast with each spatial data item. Each entry of the index table represents an MBR that groups spatial data. By predicting the unknown locations of spatial data, our proposed index scheme allows mobile clients to discard unnecessary data and reduce the search space quickly. As a result, our method reduces tuning time and access latency.
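
Broadcasting in Hilbert curve order relies on mapping each 2D grid cell to a 1D position such that consecutive positions are spatially adjacent. The sketch below uses the standard iterative conversion for a grid whose side is a power of two. The function names and the grid discretization are assumptions for the example; the paper's index table construction is not shown.

```python
from typing import List, Tuple


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def broadcast_order(n: int, cells: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort grid cells into Hilbert curve order (hypothetical helper)."""
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))
```

Because neighbouring positions on the curve are neighbouring cells in the plane, a client that knows the curve position of its query point can estimate where in the broadcast cycle nearby data will appear.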

A Virtual Microscope System for Educational Applications (교육 분야 응용을 위한 가상 현미경 시스템)

  • Cho, Seung-Ho;Beynon, Mike;Saltz, Joel
    • The KIPS Transactions:PartD / v.10D no.1 / pp.117-124 / 2003
  • The system implemented in this paper partitions specimen data captured by a light microscope and stores it on distributed or parallel systems. Users can observe images on computers as if using a physical microscope. Based on the client-server computing model, the system consists of a client, a coordinator, and a data manager, which communicate by exchanging messages. For image retrieval, we implemented the client program with the functions needed for educational applications, such as image marking and text annotation, and defined the communication protocol. We also performed an experiment that introduced tape storage for storing a large volume of data. The experimental results showed that data partitioning and the indexing technique improved performance.