• Title/Summary/Keyword: search similarity

Search Result 535, Processing Time 0.025 seconds

Semantic Clustering Model for Analytical Classification of Documents in Cloud Environment (클라우드 환경에서 문서의 유형 분류를 위한 시맨틱 클러스터링 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.11
    • /
    • pp.389-397
    • /
    • 2017
  • Recently semantic web document is produced and added in repository in a cloud computing environment and requires an intelligent semantic agent for analytical classification of documents and information retrieval. The traditional methods of information retrieval uses keyword for query and delivers a document list returned by the search. Users carry a heavy workload for examination of contents because a former method of the information retrieval don't provide a lot of semantic similarity information. To solve these problems, we suggest a key word frequency and concept matching based semantic clustering model using hadoop and NoSQL to improve classification accuracy of the similarity. Implementation of our suggested technique in a cloud computing environment offers the ability to classify and discover similar document with improved accuracy of the classification. This suggested model is expected to be use in the semantic web retrieval system construction that can make it more flexible in retrieving proper document.

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment (빅 데이터 환경에서 계층적 문서 유형 분류를 위한 클러스터링 기반 다중 SVM 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.11
    • /
    • pp.600-608
    • /
    • 2017
  • Recently data growth rates are growing exponentially according to the rapid expansion of internet. Since users need some of all the information, they carry a heavy workload for examination and discovery of the necessary contents. Therefore information retrieval must provide hierarchical class information and the priority of examination through the evaluation of similarity on query and documents. In this paper we propose an Multi-class support vector machines model based clustering for hierarchical document categorization that make semantic search possible considering the word co-occurrence measures. A combination of hierarchical document categorization and SVM classifier gives high performance for analytical classification of web documents that increase exponentially according to extension of document hierarchy. More information retrieval systems are expected to use our proposed model in their developments and can perform a accurate and rapid information retrieval service.

App Recommendation Based on Characteristic Similarity (특성 유사도 기반 앱 추천)

  • Kim, Hyung-Il
    • Journal of Digital Contents Society
    • /
    • v.13 no.4
    • /
    • pp.559-565
    • /
    • 2012
  • The remarkable development of IT is contributed to popularization of smart phones, which in turn creates a new domain called app store. Smartphone apps have grown fast because they can be easily purchased through an app store. As the volume of apps traded in app stores is so huge that it is extremely hard for users to find the exact app they want. In general, an app store recommends an app to users based on the search words they entered. In terms of recommendation of app, this kind of content-based method is not effective. To increase accuracy in recommending app, this paper proposes a characteristic similarity-based app recommendation method. This method creates attributes on the app based on the related information such as genre, functionality and number of downloads and then compares them with the propensity to use the app. According to diverse simulations, the method proposed in this paper improved the performance of app recommendation by 33% in average, compared to the conventional method.

Digital Image Watermarking using Subband Correlation Wavelet Domain (웨이블릿 영역의 부대역 상관도를 이용한 디지털 영상 워터마킹)

  • 서영호;박진영;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.40 no.6
    • /
    • pp.51-60
    • /
    • 2003
  • The watermarking is the technique that embeds or extracts the certain data without the change of the original data for the copyright protection of the multimedia contents. Watermark-embedded contents must not be distinguished by human's eye and must be robust to the various image processing and the intentional distortions. In this paper, we propose a new watermarking technique applied in the wavelet domain which has both the spatial and frequency information of a image. For both the robustness and the invisibility, the positions for embedding the watermark is selected with the multi-threshold. We search the similarity between highly correlated coefficients in the each subband and decide the mark space after verifying the significance in the specified subband. The similarity is represented by the coefficient difference between the subbands and its distribution is used in the watermark embedding and extracting. The embedded watermark can be extracted without the original image using the relationship of the subbands. By these properties the proposed watermarking algorithm has the invisibility and the robustness to the attacks such as JPEG compression and the general image processing.

Determining on Model-based Clusters of Time Series Data (시계열데이터의 모델기반 클러스터 결정)

  • Jeon, Jin-Ho;Lee, Gye-Sung
    • The Journal of the Korea Contents Association
    • /
    • v.7 no.6
    • /
    • pp.22-30
    • /
    • 2007
  • Most real word systems such as world economy, stock market, and medical applications, contain a series of dynamic and complex phenomena. One of common methods to understand these systems is to build a model and analyze the behavior of the system. In this paper, we investigated methods for best clustering over time series data. As a first step for clustering, BIC (Bayesian Information Criterion) approximation is used to determine the number of clusters. A search technique to improve clustering efficiency is also suggested by analyzing the relationship between data size and BIC values. For clustering, two methods, model-based and similarity based methods, are analyzed and compared. A number of experiments have been performed to check its validity using real data(stock price). BIC approximation measure has been confirmed that it suggests best number of clusters through experiments provided that the number of data is relatively large. It is also confirmed that the model-based clustering produces more reliable clustering than similarity based ones.

Image Information Retrieval Using DTW(Dynamic Time Warping) (DTW(Dynamic Time Warping)를 이용한 영상 정보 검색)

  • Ha, Jeong-Yo;Lee, Na-Young;Kim, Gye-Young;Choi, Hyung-Il
    • Journal of Digital Contents Society
    • /
    • v.10 no.3
    • /
    • pp.423-431
    • /
    • 2009
  • There are various image retrieval methods using shape, color and texture features. One of the most active area is using shape and color information. A number of shape representations have been suggested to recognize shapes even under affine transformation. There are many kinds of method for shape recognition, the well-known method is Fourier descriptors and moment invariant. The other method is CSS(Curvature Scale Space). The maxima of curvature scale space image have already been used to represent 2-D shapes in different applications. Because preexistence CSS exists several problems, in this paper we use improved CSS method for retrieval image. There are two kinds of method, One is using RGB color information feature and the other is using HSI color information feature. In this paper we used HSI color model to represent color histogram before, then use it as comparison measure. The similarity is measured by using Euclidean distance and for reduce search time and accuracy, We use DTW for measure similarity. Compare with the result of using Euclidean distance, we can find efficiency elevated.

  • PDF

Isolation of Sesquiterpene Synthase Homolog from Panax ginseng C.A. Meyer

  • Khorolragchaa, Altanzul;Parvin, Shohana;Shim, Ju-Sun;Kim, Yu-Jin;Lee, Ok-Ran;In, Jun-Gyo;Kim, Yeon-Ju;Kim, Se-Young;Yang, Deok-Chun
    • Journal of Ginseng Research
    • /
    • v.34 no.1
    • /
    • pp.17-22
    • /
    • 2010
  • Sesquiterpenes are found naturally in plants and insects as defensive agents or pheromones. They are produced in the cytosolic acetate/mevalonate pathway for isoprenoid biosynthesis. The inducible sesquiterpene synthases (STS), which are responsible for the transformation of the precursor farnesyl diphosphate, appear to generate very few olefinic products that are converted to biologically active metabolites. In this study, we isolated the STS gene from Panax ginseng C.A. Meyer, designated PgSTS, and investigated the correlation between its expression and various abiotic stresses using real-time PCR. PgSTS cDNA was observed to be 1,883 nucleotides long with an open reading frame of 1,707 bp, encoding a protein of 568 amino acids. The molecular mass of the mature protein was determined to be 65.5 kDa, with a predicted isoelectric point of 5.98. A GenBank BlastX search revealed the deduced amino acid sequence of PgSTS to be homologous to STS from other plants, with the highest similarity to an STS from Lycopersicon hirsutum (55% identity, 51% similarity). Real-time PCR analysis showed that different abiotic stresses triggered significant induction of PgSTS expression at different time points.

Feature-Based Image Retrieval using SOM-Based R*-Tree

  • Shin, Min-Hwa;Kwon, Chang-Hee;Bae, Sang-Hyun
    • Proceedings of the KAIS Fall Conference
    • /
    • 2003.11a
    • /
    • pp.223-230
    • /
    • 2003
  • Feature-based similarity retrieval has become an important research issue in multimedia database systems. The features of multimedia data are useful for discriminating between multimedia objects (e 'g', documents, images, video, music score, etc.). For example, images are represented by their color histograms, texture vectors, and shape descriptors, and are usually high-dimensional data. The performance of conventional multidimensional data structures(e'g', R- Tree family, K-D-B tree, grid file, TV-tree) tends to deteriorate as the number of dimensions of feature vectors increases. The R*-tree is the most successful variant of the R-tree. In this paper, we propose a SOM-based R*-tree as a new indexing method for high-dimensional feature vectors.The SOM-based R*-tree combines SOM and R*-tree to achieve search performance more scalable to high dimensionalities. Self-Organizing Maps (SOMs) provide mapping from high-dimensional feature vectors onto a two dimensional space. The mapping preserves the topology of the feature vectors. The map is called a topological of the feature map, and preserves the mutual relationship (similarity) in the feature spaces of input data, clustering mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a codebook vector. A best-matching-image-list. (BMIL) holds similar images that are closest to each codebook vector. In a topological feature map, there are empty nodes in which no image is classified. When we build an R*-tree, we use codebook vectors of topological feature map which eliminates the empty nodes that cause unnecessary disk access and degrade retrieval performance. We experimentally compare the retrieval time cost of a SOM-based R*-tree with that of an SOM and an R*-tree using color feature vectors extracted from 40, 000 images. The result show that the SOM-based R*-tree outperforms both the SOM and R*-tree due to the reduction of the number of nodes required to build R*-tree and retrieval time cost.

  • PDF

Hierarchical Organization of Embryo Data for Supporting Efficient Search (배아 데이터의 효율적 검색을 위한 계층적 구조화 방법)

  • Won, Jung-Im;Oh, Hyun-Kyo;Jang, Min-Hee;Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.2
    • /
    • pp.16-27
    • /
    • 2011
  • Embryo is a very early stage of the development of multicellular organism such as animals and plants. It is an important research target for studying ontogeny because the fundamental body system of multicellular organism is determined during an embryo state. Researchers in the developmental biology have a large volume of embryo image databases for studying embryos and they frequently search for an embryo image efficiently from those databases. Thus, it is crucial to organize databases for their efficient search. Hierarchical clustering methods have been widely used for database organization. However, most of previous algorithms tend to produce a highly skewed tree as a result of clustering because they do not simultaneously consider both the size of a cluster and the number of objects within the cluster. The skewed tree requires much time to be traversed in users' search process. In this paper, we propose a method that effectively organizes a large volume of embryo image data in a balanced tree structure. We first represent embryo image data as a similarity-based graph. Next, we identify clusters by performing a graph partitioning algorithm repeatedly. We check constantly the size of a cluster and the number of objects, and partition clusters whose size is too large or whose number of objects is too high, which prevents clusters from growing too large or having too many objects. We show the superiority of the proposed method by extensive experiments. Moreover, we implement the visualization tool to help users quickly and easily navigate the embryo image database.

Semantic Search and Recommendation of e-Catalog Documents through Concept Network (개념 망을 통한 전자 카탈로그의 시맨틱 검색 및 추천)

  • Lee, Jae-Won;Park, Sung-Chan;Lee, Sang-Keun;Park, Jae-Hui;Kim, Han-Joon;Lee, Sang-Goo
    • The Journal of Society for e-Business Studies
    • /
    • v.15 no.3
    • /
    • pp.131-145
    • /
    • 2010
  • Until now, popular paradigms to provide e-catalog documents that are adapted to users' needs are keyword search or collaborative filtering based recommendation. Since users' queries are too short to represent what users want, it is hard to provide the users with e-catalog documents that are adapted to their needs(i.e., queries and preferences). Although various techniques have beenproposed to overcome this problem, they are based on index term matching. A conventional Bayesian belief network-based approach represents the users' needs and e-catalog documents with their corresponding concepts. However, since the concepts are the index terms that are extracted from the e-catalog documents, it is hard to represent relationships between concepts. In our work, we extend the conventional Bayesian belief network based approach to represent users' needs and e-catalog documents with a concept network which is derived from the Web directory. By exploiting the concept network, it is possible to search conceptually relevant e-catalog documents although they do not contain the index terms of queries. Furthermore, by computing the conceptual similarity between users, we can exploit a semantic collaborative filtering technique for recommending e-catalog documents.