• Title/Summary/Keyword: search similarity

PdR-Tree : An Efficient Indexing Technique for the improvement of search performance in High-Dimensional Data (PdR-트리 : 고차원 데이터의 검색 성능 향상을 위한 효율적인 인덱스 기법)

  • Joh, Beom-Seok;Park, Young-Bae
    • The KIPS Transactions: Part D / v.8D no.2 / pp.145-153 / 2001
  • The Pyramid-Technique maps n-dimensional data into one-dimensional values and indexes them with a B-tree. By reducing search time complexity, it also mitigates the "curse of dimensionality" that arises when hypercube range queries are processed in an n-dimensional data space. The Spherical Pyramid-Technique adopts the Pyramid-Technique's space-partitioning strategy but uses spherical range queries, improving search performance so that it is better suited to similarity search. However, search performance of both techniques degrades significantly as data size and dimensionality grow, particularly for data sizes above one million and more than sixteen dimensions. In this paper, we propose PdR-Tree, a new index structure that improves search performance for high-dimensional data such as multimedia data. Experiments on both synthetic and real data show that PdR-Tree outperforms both the Pyramid-Technique and the Spherical Pyramid-Technique in search performance.
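
The mapping from n-dimensional data to one-dimensional keys that this abstract builds on can be sketched as follows. This is a minimal sketch of the usual Pyramid-Technique definition for points in the unit hypercube [0,1]^d, not the PdR-Tree: a point is assigned to one of 2d pyramids by the coordinate that deviates most from the centre 0.5, and its key is the pyramid number plus that deviation (the "height"). The names `pyramid_value` and `PyramidIndex` are hypothetical, and a sorted Python list stands in for the B-tree.

```python
import bisect

def pyramid_value(v):
    """Map a point v in [0,1]^d to a 1-D key: pyramid number + height."""
    d = len(v)
    # Dimension along which v deviates most from the centre 0.5.
    jmax = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie below the centre, d..2d-1 above it.
    i = jmax if v[jmax] < 0.5 else jmax + d
    # Height is in [0, 0.5], so keys of different pyramids never overlap.
    height = abs(0.5 - v[jmax])
    return i + height

class PyramidIndex:
    """Hypothetical toy index: a sorted key list stands in for the B-tree."""

    def __init__(self):
        self.keys, self.points = [], []

    def insert(self, v):
        k = pyramid_value(v)
        pos = bisect.bisect_right(self.keys, k)
        self.keys.insert(pos, k)
        self.points.insert(pos, tuple(v))

    def scan(self, lo, hi):
        """Return all stored points whose 1-D key lies in [lo, hi]."""
        a = bisect.bisect_left(self.keys, lo)
        b = bisect.bisect_right(self.keys, hi)
        return self.points[a:b]

idx = PyramidIndex()
for v in ([0.1, 0.6], [0.5, 0.9], [0.7, 0.5]):
    idx.insert(v)
print(idx.scan(3.0, 3.5))  # every stored point in pyramid 3
```

A range query on the original space becomes one or more one-dimensional interval scans like `scan`, which is why an ordinary B-tree suffices once the mapping is fixed.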

Dynamic Classification of Categories in Web Search Environment (웹 검색 환경에서 범주의 동적인 분류)

  • Choi Bum-Ghi;Lee Ju-Hong;Park Sun
    • Journal of KIISE: Software and Applications / v.33 no.7 / pp.646-654 / 2006
  • Directory searching and index searching are the two main methods used by web search engines. Most well-known Internet search engines offer both, letting users switch to the other method when they are not satisfied with the results of one. However, each has drawbacks: index searching tends to return too many results, while directory searching makes it difficult to select the proper category and frequently misleads users into wrong ones. In this paper, we propose a novel method in which the category hierarchy is constructed dynamically. To do this, a category is regarded as a fuzzy set of keywords. Similarly, extensible subcategories of a category can be found using fuzzy relational products. The advantage of this method is that it improves the recall of directory search by expanding subcategories on the basis of similarity.
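
One way to read "fuzzy relational products" in this setting is as a degree of fuzzy subsethood between keyword sets. The sketch below is an assumption, not the authors' formulation: it uses the Łukasiewicz implication, averages over the candidate subcategory's keywords, and treats B as a subcategory of A when B's keyword set is contained in A's to at least a hypothetical threshold of 0.8. The category names and memberships are invented for illustration.

```python
def implies(a, b):
    """Lukasiewicz implication, one common choice for fuzzy relational products."""
    return min(1.0, 1.0 - a + b)

def subsethood(a, b):
    """Degree to which fuzzy keyword set a is contained in fuzzy keyword set b.

    Averages the implication a(k) -> b(k) over a's keywords (an assumption).
    """
    if not a:
        return 1.0
    return sum(implies(m, b.get(k, 0.0)) for k, m in a.items()) / len(a)

def subcategories(name, categories, threshold=0.8):
    """Names of categories whose keyword sets are (mostly) contained in name's."""
    parent = categories[name]
    return [other for other, kws in categories.items()
            if other != name and subsethood(kws, parent) >= threshold]

# Hypothetical categories: keyword -> membership degree in [0, 1].
categories = {
    "vehicles": {"car": 1.0, "engine": 0.8, "truck": 0.7},
    "cars": {"car": 0.9, "engine": 0.5},
    "cooking": {"recipe": 1.0},
}
print(subcategories("vehicles", categories))  # ['cars']
```

Lowering the threshold admits more loosely related categories as subcategories, which is the recall-versus-precision trade-off the abstract alludes to.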

Phytosociological Study and Spatial autocorrelation on the Forest Vegetation of Mt. Yeonae at Gijang-gun

  • Choi, Byoung-Ki;Huh, Man Kyu
    • Journal of Environmental Science International / v.22 no.11 / pp.1373-1381 / 2013
  • Mt. Yeonae is located in Gijang-gun, Busan, and is surrounded by farmland on three sides. The species composition and dynamics of local communities at Mt. Yeonae were studied to determine how spatial similarity decays with geographic distance. Index values from the Zürich-Montpellier School's phytosociological method at 12 plots were compared with the distribution of similarity using 20 m quadrats at 12 sites. Five communities were identified, including the Pinus densiflora - Quercus variabilis community. Using the spatial autocorrelation coefficient Moran's I, six species showed significant similarity between neighboring sites. If Mt. Yeonae were destroyed by human activity, some spatially correlated species, such as P. densiflora and Q. variabilis, would collapse because they could no longer maintain effective population sizes.
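
Moran's I, the statistic used above, compares each site's deviation from the mean with its neighbours' deviations. The sketch below assumes a binary adjacency weight matrix; the actual neighbourhood definition for the 12 plots is not given in the abstract, so the four-plot transect is purely hypothetical.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z = x - mean(x) and S0 is the sum of all weights."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    n, s0 = x.size, w.sum()
    return float((n / s0) * (z @ w @ z) / (z @ z))

# Hypothetical example: 4 plots along a transect; adjacent plots are neighbours.
w = np.zeros((4, 4))
for i in range(3):
    w[i, i + 1] = w[i + 1, i] = 1.0

print(morans_i([1, 2, 3, 4], w))  # positive: similar values cluster
print(morans_i([1, 0, 1, 0], w))  # negative (about -1): neighbours alternate
```

Under no spatial autocorrelation the expected value of I is -1/(n-1), so significance is judged relative to that baseline, for example with a permutation test.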

A Study on the Performance of Structured Document Retrieval Using Node Information (노드정보를 이용한 문서검색의 성능에 관한 연구)

  • Yoon, So-Young
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.103-120 / 2007
  • A node is a semantic unit and a part of a structured document. Information retrieval from structured documents makes it possible to search below the document level for relevant information, treating any element of a structured document as a retrievable unit. Node-based document retrieval comprises several methods for calculating similarity, as well as an extended node retrieval method that uses structural information. Retrieval performance was hardly influenced by the method used to determine document similarity. Overall, the extended node method outperformed the others.
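
To make "determining document similarity" from node-level matches concrete, the sketch below scores each node against the query with cosine similarity over term frequencies and then aggregates node scores into a document score. The two aggregation choices (`max` and `mean`) and all names are hypothetical illustrations; the abstract does not specify which similarity methods were compared, and the extended node method is not shown.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def document_score(query, nodes, method="max"):
    """Score a document from its node scores (hypothetical aggregation choices)."""
    q = Counter(query.lower().split())
    scores = [cosine(q, Counter(n.lower().split())) for n in nodes]
    if not scores:
        return 0.0
    if method == "max":   # the best-matching node represents the document
        return max(scores)
    if method == "mean":  # every node contributes equally
        return sum(scores) / len(scores)
    raise ValueError(f"unknown method: {method}")

doc = ["XML retrieval models", "Cooking recipes"]
print(document_score("xml retrieval", doc, "max"))
print(document_score("xml retrieval", doc, "mean"))
```

Because each node is already a retrievable unit, the same per-node scores can also be returned directly instead of being collapsed into one document score.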

Functional Analysis of ESTs from the Flower Bud of Korean Ginseng

  • Yang, Deok-Chun;In, Jun-Gyo;Kim, Moo-Sung;Jeon, Jong-Seong
    • Proceedings of the Plant Resources Society of Korea Conference / 2003.04a / p.124 / 2003
  • To study gene expression in a reproductive organ, we constructed a cDNA library from immature flower buds of Korean ginseng and generated expressed sequence tags (ESTs) from 3,360 randomly selected clones. The ESTs could be clustered into 1,844 non-redundant groups. A similarity search of the non-redundant ESTs against public non-redundant protein and DNA databases indicated that 1,254 groups show similarity to genes of known function. These EST clones were classified into sixteen categories according to gene function. The most abundant transcripts were unknown protein (72), chlorophyll a/b-binding protein (48), and stylar glycoprotein. No useful information has been available on gene expression during flower bud development in Korean ginseng, and these results could help in understanding that development.

Integrating Spatial Proximity with Manifold Learning for Hyperspectral Data

  • Kim, Won-Kook;Crawford, Melba M.;Lee, Sang-Hoon
    • Korean Journal of Remote Sensing / v.26 no.6 / pp.693-703 / 2010
  • The high spectral resolution of hyperspectral data enables analysis of complex natural phenomena that are reflected nonlinearly in the data. Although many manifold learning methods have been developed for such problems, most do not consider the spatial correlation between samples that is inherent in, and useful for, remote sensing data. We propose a manifold learning method that directly combines spatial proximity and spectral similarity within a kernel PCA framework. A gain factor due to spatial proximity is first modelled with a heat kernel and is then added to the original similarity computed from the spectral values of a pair of samples. Parameters are tuned with an intelligent grid search (IGS) method so that the derived manifold coordinates achieve optimal classification accuracy. Of particular interest is the method's performance with small training sets, because labelled samples are usually scarce due to their high acquisition cost. The proposed spatial kernel PCA (KPCA) is compared with PCA in terms of classification accuracy using nearest-neighbour classification.
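
The kernel combination described above can be sketched with NumPy. Assumptions, all hypothetical: the "original similarity" is a Gaussian (RBF) kernel on spectra with width `sigma`, the heat-kernel gain on pixel coordinates uses scale `t`, and the gain is weighted by `beta` before being added. The IGS parameter tuning is not shown. Both kernels are positive semidefinite, so their sum is a valid kernel for standard kernel PCA.

```python
import numpy as np

def spatial_kernel(X, P, sigma=1.0, t=1.0, beta=0.5):
    """X: (n, d) spectra; P: (n, 2) pixel coordinates. Returns an (n, n) kernel."""
    sq_spec = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sq_pos = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
    k_spec = np.exp(-sq_spec / (2.0 * sigma ** 2))  # spectral similarity (assumed RBF)
    gain = beta * np.exp(-sq_pos / t)               # heat-kernel gain from spatial proximity
    return k_spec + gain

def kernel_pca(K, n_components=2):
    """Project the training samples onto the top kernel principal components."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one      # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)                 # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[top], vecs[:, top]
    # Projection of training sample i on component k is sqrt(lambda_k) * v_ik.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                                          # toy "spectra"
P = np.array([[r, c] for r in range(4) for c in range(5)], dtype=float)  # 4x5 pixel grid
K = spatial_kernel(X, P)
Z = kernel_pca(K, 2)
print(Z.shape)  # (20, 2)
```

The embedding `Z` would then be fed to a nearest-neighbour classifier; setting `beta=0` recovers plain spectral kernel PCA, which makes the effect of the spatial term easy to isolate.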

Deep Learning and Color Histogram based Fire and Smoke Detection Research

  • Lee, Yeunghak;Shim, Jaechang
    • International Journal of Advanced Smart Convergence / v.8 no.2 / pp.116-125 / 2019
  • Fires should be extinguished as soon as possible because they cause economic loss and the loss of precious lives. In this study, we propose a new fire and smoke detection algorithm for atypical fire and smoke, using deep learning and color histograms of fire and smoke. First, motion candidate frames are selected from input frames captured by an ONVIF surveillance camera mounted in a factory, using a motion detection algorithm and the mean square error (MSE). Second, deep learning (Faster R-CNN) extracts fire and smoke candidate areas from the motion frames. Third, a novel algorithm detects fire and smoke using color histograms together with local-area motion, similarity, and MSE. The resulting detection algorithm thus combines local motion with the color histogram method. Experimental results show that a surveillance camera running the proposed algorithm achieves good fire and smoke detection with very few false positives.
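
The first and third steps above (MSE-based motion screening and color-histogram comparison) can be sketched with NumPy. This is an illustrative sketch, not the authors' pipeline: the MSE threshold of 100 is hypothetical, histogram intersection is an assumed similarity measure, and the Faster R-CNN candidate-extraction step is omitted.

```python
import numpy as np

def frame_mse(prev, curr):
    """Mean square error between two frames of equal shape."""
    diff = prev.astype(np.float64) - curr.astype(np.float64)
    return float(np.mean(diff ** 2))

def is_motion_frame(prev, curr, threshold=100.0):
    """Flag a frame as a motion candidate (threshold is hypothetical)."""
    return frame_mse(prev, curr) > threshold

def color_histogram(region, bins=8):
    """Concatenated per-channel histograms of an (h, w, 3) uint8 region, summing to 1."""
    hists = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(hists).astype(np.float64)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())

black = np.zeros((4, 4, 3), np.uint8)
white = np.full((4, 4, 3), 255, np.uint8)
print(frame_mse(black, white))  # 65025.0
```

In practice, a candidate box's histogram would be compared against reference fire or smoke histograms, and only boxes that also show local motion would be kept.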

Implementation of a Spam Message Filtering System using Sentence Similarity Measurements (문장유사도 측정 기법을 통한 스팸 필터링 시스템 구현)

  • Ou, SooBin;Lee, Jongwoo
    • KIISE Transactions on Computing Practices / v.23 no.1 / pp.57-64 / 2017
  • Short message service (SMS) is one of the most important communication methods for mobile phone users. However, illegal advertising spam exploits users because SMS messages can be sent without any need for friend registration. Spam filtering systems based on machine learning have recently been developed, but they have disadvantages such as high computational cost. In this paper, we implement a serverless spam message filtering system using the set-based POI search algorithm and sentence similarity. The algorithm judges whether an input message is spam using only its letter composition, without any server-side computation, so it can filter spam even when the message text has been intentionally modified. We also added a preprocessing option designed specifically for spam filtering. Experimental results show that our system performs better than the original set-based POI search algorithm. We evaluated the proposed system through extensive simulation; the results show that it accurately filters text messages that the three major telecom companies fail to filter.
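
The idea of judging spam "using only letter composition" can be illustrated with a multiset comparison of letters. This is a minimal sketch under stated assumptions, not the set-based POI search algorithm itself: Unicode NFD normalization splits Hangul syllables into their component jamo, punctuation and spaces are dropped, similarity is a multiset Jaccard score, and the 0.8 threshold and example messages are hypothetical.

```python
import unicodedata
from collections import Counter

def letters(text):
    """Multiset of letters; NFD splits Hangul syllables into conjoining jamo."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return Counter(ch for ch in decomposed if ch.isalnum())

def letter_similarity(a, b):
    """Multiset Jaccard: shared letters / all letters (ignores order and spacing)."""
    ca, cb = letters(a), letters(b)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 0.0

def is_spam(message, known_spam, threshold=0.8):
    """Hypothetical rule: spam if close enough to any known spam message."""
    return any(letter_similarity(message, s) >= threshold for s in known_spam)

print(letter_similarity("F.R.E.E loan now", "free loan now"))  # 1.0
print(letter_similarity("무료 대출", "무.료대출"))              # 1.0
```

Because the comparison ignores order, spacing and punctuation, inserted dots or shuffled spacing, a common evasion trick, do not change the score; the cost is that unrelated messages with similar letters could also match.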

Spherical Pyramid-Technique : An Efficient Indexing Technique for Similarity Search in High-Dimensional Data (구형 피라미드 기법 : 고차원 데이터의 유사성 검색을 위한 효율적인 색인 기법)

  • Lee, Dong-Ho;Jeong, Jin-Wan;Kim, Hyeong-Ju
    • Journal of KIISE: Software and Applications / v.26 no.11 / pp.1270-1281 / 1999
  • The Pyramid-Technique [1] was proposed as a new indexing method for high-dimensional data spaces, using a special partitioning strategy that divides d-dimensional space into 2d pyramids. It is efficient for hypercube range queries but not for hypersphere range queries, which are frequently used in similarity search. In this paper, we propose the Spherical Pyramid-Technique, an efficient indexing method for similarity search in high-dimensional space. The Spherical Pyramid-Technique is based on a special partitioning strategy that first divides the d-dimensional data space into 2d spherical pyramids and then cuts each spherical pyramid into several spherical slices. Like the Pyramid-Technique, this partitioning transforms d-dimensional space into 1-dimensional space, so a B+-tree can be used to manage the transformed 1-dimensional data. We also propose an algorithm for processing hypersphere range queries on the space partitioned by this strategy. Finally, experiments with synthetic and real data show that the Spherical Pyramid-Technique clearly outperforms the Pyramid-Technique in processing hypersphere range queries.
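
Why spherical slices suit hypersphere queries can be shown with a simplified sketch. Assumptions, all hypothetical: points lie in [0,1]^d, the pyramid is chosen as in the Pyramid-Technique, and the slice coordinate is the Euclidean distance from the centre scaled into [0, 0.5]. The paper's exact slice boundaries and query algorithm may differ. By the triangle inequality, any point within `radius` of query q has a centre distance in [dq - radius, dq + radius], so each pyramid needs only one key-interval scan, followed by an exact distance check. A sorted list stands in for the B+-tree.

```python
import bisect
import math

def spherical_key(v):
    """Pyramid number plus scaled distance from the centre (a simplified assumption)."""
    d = len(v)
    jmax = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = jmax if v[jmax] < 0.5 else jmax + d      # which of the 2d pyramids
    r = math.dist(v, [0.5] * d) / math.sqrt(d)   # centre distance scaled into [0, 0.5]
    return i + r

class SphericalPyramidIndex:
    def __init__(self, points):
        self.d = len(points[0])
        entries = sorted((spherical_key(p), tuple(p)) for p in points)
        self.keys = [k for k, _ in entries]
        self.points = [p for _, p in entries]

    def range_query(self, q, radius):
        """All indexed points within Euclidean distance `radius` of q."""
        d, s = self.d, math.sqrt(self.d)
        dq = math.dist(q, [0.5] * d)
        lo_r = max(0.0, dq - radius) / s
        hi_r = min(0.5, (dq + radius) / s)
        result = []
        for i in range(2 * d):  # one interval scan per pyramid
            lo = bisect.bisect_left(self.keys, i + lo_r)
            hi = bisect.bisect_right(self.keys, i + hi_r)
            # Refinement step: the interval is only a filter.
            result.extend(p for p in self.points[lo:hi] if math.dist(p, q) <= radius)
        return result

grid = [(a / 5, b / 5, c / 5) for a in range(6) for b in range(6) for c in range(6)]
idx = SphericalPyramidIndex(grid)
print(len(idx.range_query((0.4, 0.6, 0.2), 0.33)))
```

A fuller implementation would also skip pyramids that the query sphere cannot intersect; this sketch scans all 2d of them for simplicity.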

Layout Analysis for Calculation of Web Page Similarity as Image

  • Mitsuhashi, Noriaki;Yamaguchi, Toru;Takama, Yasufumi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.142-145 / 2003
  • When we search for information on the Web, search engines analyze only the text collected from the source files of Web pages. However, analyzing the layout of a Web page from its source file alone has limits, even though page design is the most important factor in how users evaluate a page. In particular, pages with similar designs often offer similar information on the Web. We propose a layout analysis method that compares page designs by treating each displayed page as an image.
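
Treating a rendered page as an image suggests comparing coarse visual signatures. The sketch below is a hypothetical illustration, not the authors' method: it averages a grayscale screenshot over a fixed grid of cells (8x8 by assumption) and compares two pages by the Pearson correlation of these layout signatures.

```python
import numpy as np

def layout_signature(gray, grid=(8, 8)):
    """Mean intensity per grid cell of a 2-D grayscale page image."""
    h, w = gray.shape
    gh, gw = grid
    # Crop so the image divides evenly into grid cells.
    g = gray[: h - h % gh, : w - w % gw].astype(np.float64)
    cells = g.reshape(gh, g.shape[0] // gh, gw, g.shape[1] // gw).mean(axis=(1, 3))
    return cells.ravel()

def layout_similarity(a, b, grid=(8, 8)):
    """Pearson correlation of two layout signatures, in [-1, 1]."""
    sa, sb = layout_signature(a, grid), layout_signature(b, grid)
    sa -= sa.mean()
    sb -= sb.mean()
    denom = np.linalg.norm(sa) * np.linalg.norm(sb)
    return float(sa @ sb / denom) if denom else 0.0

page = np.tile(np.arange(64, dtype=float) * 4, (48, 1))  # toy 48x64 "screenshot"
print(layout_similarity(page, page))  # close to 1.0
```

A coarse grid deliberately ignores text content and fine detail, so pages sharing the same block arrangement score high even when their wording differs.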