• Title/Summary/Keyword: search similarity

Hierarchical Organization of Embryo Data for Supporting Efficient Search (배아 데이터의 효율적 검색을 위한 계층적 구조화 방법)

  • Won, Jung-Im;Oh, Hyun-Kyo;Jang, Min-Hee;Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.2 / pp.16-27 / 2011
  • An embryo is a very early stage in the development of multicellular organisms such as animals and plants. It is an important research target for studying ontogeny because the basic body system of a multicellular organism is determined during the embryonic stage. Developmental biologists maintain large embryo image databases for studying embryos and frequently need to search them for particular embryo images. Thus, it is crucial to organize these databases for efficient search. Hierarchical clustering methods have been widely used for database organization. However, most previous algorithms tend to produce a highly skewed tree because they do not simultaneously consider both the size of a cluster and the number of objects within it. Traversing a skewed tree takes a long time during a user's search. In this paper, we propose a method that effectively organizes a large volume of embryo image data into a balanced tree structure. We first represent the embryo image data as a similarity-based graph. Next, we identify clusters by repeatedly performing a graph partitioning algorithm. Throughout, we check the size of each cluster and the number of objects it contains, and we partition any cluster whose size is too large or whose number of objects is too high, which prevents clusters from growing too large or holding too many objects. Extensive experiments show the superiority of the proposed method. Moreover, we implement a visualization tool that helps users navigate the embryo image database quickly and easily.
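The balanced clustering loop described in this abstract can be sketched as recursive bisection of a similarity graph. This is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not name its graph partitioning algorithm, so a simple balanced bisection (seeded by the least similar pair and cut at the median) stands in for it, and "cluster size" is assumed to mean spread, measured as the lowest pairwise similarity inside the cluster. The names `build_tree`, `max_objects`, and `min_sim` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List


@dataclass
class Node:
    members: List[int]                              # object indices under this node
    children: List["Node"] = field(default_factory=list)


def similarity_graph(items, sim):
    # Dense weighted graph: w[i][j] is the similarity between items i and j.
    return [[sim(x, y) for y in items] for x in items]


def too_large(nodes, w, max_objects, min_sim):
    # Split when a cluster holds too many objects or is too spread out
    # (assumption: spread = similarity of its least similar pair).
    if len(nodes) < 2:
        return False
    if len(nodes) > max_objects:
        return True
    return min(w[i][j] for i, j in combinations(nodes, 2)) < min_sim


def balanced_bisect(nodes, w):
    # Stand-in partitioner: seed with the least similar pair (a, b), rank every
    # node by how much more similar it is to a than to b, and cut at the median.
    a, b = min(combinations(nodes, 2), key=lambda p: w[p[0]][p[1]])
    ranked = sorted(nodes, key=lambda v: w[v][b] - w[v][a])
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


def build_tree(nodes, w, max_objects, min_sim):
    node = Node(members=list(nodes))
    if too_large(nodes, w, max_objects, min_sim):
        for part in balanced_bisect(nodes, w):
            node.children.append(build_tree(part, w, max_objects, min_sim))
    return node


# Usage with toy 1-D "images": similarity decays with distance.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1]
w = similarity_graph(points, lambda x, y: 1.0 / (1.0 + abs(x - y)))
root = build_tree(list(range(len(points))), w, max_objects=3, min_sim=0.5)
```

Because every split is an exact median cut, tree depth stays logarithmic in the number of objects; the price is that a natural group may be divided between siblings when it straddles the median.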

Semantic Search and Recommendation of e-Catalog Documents through Concept Network (개념 망을 통한 전자 카탈로그의 시맨틱 검색 및 추천)

  • Lee, Jae-Won;Park, Sung-Chan;Lee, Sang-Keun;Park, Jae-Hui;Kim, Han-Joon;Lee, Sang-Goo
    • The Journal of Society for e-Business Studies / v.15 no.3 / pp.131-145 / 2010
  • Until now, the popular paradigms for providing e-catalog documents adapted to users' needs have been keyword search and collaborative filtering-based recommendation. Because users' queries are too short to express what they want, it is hard to provide e-catalog documents adapted to their needs (i.e., queries and preferences). Although various techniques have been proposed to overcome this problem, they are based on index term matching. A conventional Bayesian belief network-based approach represents users' needs and e-catalog documents by their corresponding concepts. However, since these concepts are index terms extracted from the e-catalog documents, it is hard to represent relationships between concepts. In our work, we extend the conventional Bayesian belief network-based approach to represent users' needs and e-catalog documents with a concept network derived from the Web directory. By exploiting the concept network, it is possible to retrieve conceptually relevant e-catalog documents even if they do not contain the query's index terms. Furthermore, by computing the conceptual similarity between users, we can apply a semantic collaborative filtering technique to recommend e-catalog documents.
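Why a concept network helps can be illustrated with a much simpler stand-in than the paper's Bayesian belief network: expand each query, document, or user profile upward through a concept hierarchy, then compare the expanded vectors with cosine similarity. This is a sketch only; the directory fragment, the `decay` factor, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical fragment of a Web directory: child concept -> parent concept.
DIRECTORY = {
    "laptop": "computer",
    "notebook": "computer",
    "computer": "electronics",
    "camera": "electronics",
}


def expand(concepts, parent, decay=0.5):
    # Propagate each concept's weight up to its ancestors, shrinking it by
    # `decay` per level (assumes the directory is a tree, i.e. has no cycles).
    expanded = defaultdict(float)
    for concept, weight in concepts.items():
        node = concept
        while node is not None:
            expanded[node] = max(expanded[node], weight)
            node = parent.get(node)
            weight *= decay
    return dict(expanded)


def cosine(u, v):
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def concept_similarity(a, b, parent=DIRECTORY):
    # Usable for query-vs-document search and for user-vs-user similarity alike.
    return cosine(expand(a, parent), expand(b, parent))
```

A query for "laptop" shares no index term with a document about a "notebook", so plain cosine gives 0, while the expanded vectors overlap at "computer" and "electronics" and yield a positive score.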

Clustering of Web Objects with Similar Popularity Trends (유사한 인기도 추세를 갖는 웹 객체들의 클러스터링)

  • Loh, Woong-Kee
    • The KIPS Transactions:PartD / v.15D no.4 / pp.485-494 / 2008
  • A huge variety of web items, such as keywords, images, and web pages, is widely available on the Web. The popularity of such items changes continuously over time, and mining temporal patterns in the popularity of web items is an important problem that is useful for several web applications. For example, temporal patterns in the popularity of search keywords help web search companies predict future popular keywords, enabling them to make pricing decisions when marketing search keywords to advertisers. However, the presence of millions of web items makes it difficult to scale up previous techniques for this problem. This paper proposes an efficient method for mining temporal patterns in the popularity of web items. We treat the popularity of each web item as a time series and propose the gap measure to quantify the similarity between the popularity of two web items. To reduce the computational overhead of this measure, we present an efficient method using the Fast Fourier Transform (FFT). We do not assume that the popularity of web items follows any probability distribution or is periodic. To find clusters of web items with similar popularity trends, we propose using a density-based clustering algorithm based on the gap measure. Our experiments using the popularity trends of search keywords obtained from the Google Trends web site demonstrate the scalability and usefulness of the proposed approach in real-world applications.
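Density-based clustering of popularity trends can be sketched with DBSCAN, one standard density-based algorithm (the abstract does not say which one the paper uses). The gap measure itself is not defined in this abstract, so Euclidean distance between z-normalized series stands in for it, and the FFT speed-up is not shown. `eps` and `min_pts` are the usual DBSCAN parameters; their values here are illustrative.

```python
import math


def z_normalize(series):
    # Remove level and scale so that only the shape of the trend remains.
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [0.0] * n if std == 0 else [(x - mean) / std for x in series]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(items, dist, eps, min_pts):
    # Returns one label per item: a cluster id >= 0, or -1 for noise.
    labels = [None] * len(items)

    def region(i):
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = 0
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1                      # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster             # border point: join, don't expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:      # core point: keep expanding
                queue.extend(neighbours)
        cluster += 1
    return labels


def cluster_trends(series_list, eps, min_pts):
    normalized = [z_normalize(s) for s in series_list]
    return dbscan(normalized, euclidean, eps, min_pts)
```

Normalizing first means a keyword searched ten times more often, but with the same rise and fall, lands in the same cluster; the naive `region` scan is quadratic, which is exactly the kind of cost the paper's FFT method targets.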

Impact of Diverse Document-evaluation Measure-based Searching Methods in Big Data Search Accuracy (빅데이터 검색 정확도에 미치는 다양한 측정 방법 기반 검색 기법의 효과)

  • Kim, Ji young;Han, DaHyeon;Kim, Jongkwon
    • Journal of KIISE / v.44 no.5 / pp.553-558 / 2017
  • With the rapid growth of Big Data, both academia and industry are pursuing research on extracting meaningful information. In particular, the data characteristics derived from analysis and the researcher's intention are key factors that enable search algorithms to produce accurate output. Therefore, properly reflecting both data characteristics and researcher intention is the ultimate goal of data analysis research. Properly analyzed data can increase users' loyalty to a company's service and help them use information more effectively and efficiently. In this paper, we explore various document-evaluation methods in order to improve the accuracy of article search, one of the most frequently used kinds of search in everyday life. We also analyze the experimental results and suggest how the various methods should be used.
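The abstract does not list which document-evaluation measures were compared. As an illustration only, the sketch below scores documents with two widely used measures, plain TF-IDF and Okapi BM25 (with the common defaults `k1=1.2`, `b=0.75`), and measures ranking accuracy with precision at k; none of these choices is confirmed by the source.

```python
import math
from collections import Counter


def idf_plain(term, docs):
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def tfidf(query, doc, docs):
    tf = Counter(doc)
    return sum(tf[t] * idf_plain(t, docs) for t in query)


def bm25(query, doc, docs, k1=1.2, b=0.75):
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in docs if t in d)
        if df == 0:
            continue
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        f = tf[t]
        # Term frequency saturates via k1; long documents are penalised via b.
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


def rank(query, docs, scorer):
    # Document indices, best first.
    return sorted(range(len(docs)), key=lambda i: scorer(query, docs[i], docs), reverse=True)


def precision_at_k(ranking, relevant, k):
    return sum(1 for i in ranking[:k] if i in relevant) / k
```

Running `rank` with each scorer on the same labelled query set and comparing `precision_at_k` is one simple way to carry out the kind of comparison the abstract describes.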

Matching Method between Heterogeneous Data for Semantic Search (시맨틱 검색을 위한 이기종 데이터간의 매칭방법)

  • Lee, Ki-Jung;WhangBo, Taeg-Keun
    • The Journal of the Korea Contents Association / v.6 no.10 / pp.25-33 / 2006
  • For semantic retrieval in a semantic web environment, managing and manipulating distributed resources is an important factor. An ontology is essential for efficient search over distributed resources, but it is almost impossible to construct a unified ontology for all distributed resources on the web. In this paper, we assume that most information in the web environment exists in the form of RDBMS tables, and we propose a method for matching a domain ontology against existing RDBMS tables for semantic retrieval. Most previous studies on matching RDBMS tables to a domain ontology first extracted a local ontology from the RDBMS tables and then matched the local ontology against the domain ontology. However, during the extraction of the local ontology, problems such as the loss of domain information can occur, because its correlation with the domain ontology is not considered at all. In this paper, we propose a method that prevents the loss of domain information by measuring the similarity between instances of RDBMS tables and instances of the ontology. In addition, using the relational information between RDBMS tables and between classes in the domain ontology makes more efficient instance-based matching possible.
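Instance-based matching between table columns and ontology classes can be sketched by comparing the set of values in each column with the set of instances of each class. The abstract does not give its similarity measure, so Jaccard similarity over normalized strings is an assumption here, as are the `threshold` value and the function names; the relational information the paper also uses is omitted.

```python
def normalize(value):
    return str(value).strip().lower()


def jaccard(a, b):
    return len(a & b) / len(a | b) if a or b else 0.0


def match_columns(table, ontology, threshold=0.3):
    # table:    column name -> list of cell values (the RDBMS instances)
    # ontology: class name  -> set of instance labels (the ontology instances)
    matches = {}
    for column, values in table.items():
        cells = {normalize(v) for v in values}
        scored = [(jaccard(cells, {normalize(i) for i in inst}), cls)
                  for cls, inst in ontology.items()]
        if not scored:
            continue
        score, cls = max(scored)                # best-matching class for this column
        if score >= threshold:
            matches[column] = (cls, score)
    return matches
```

Matching on instances rather than on column names means a column called `addr_city` can still be linked to a class `City`, because the two share values such as "Seoul" and "Busan".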

Genealogy grouping for services of message post-office box based on fuzzy-filtering (퍼지필터링 기반의 메시지 사서함 서비스를 위한 genealogy 그룹화)

  • Lee Chong-Deuk;Ahn Jeong-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.6 / pp.701-708 / 2005
  • A structuring mechanism, which is important for serving messages in a post-office box structure, constructs a hierarchy of classes according to the contents of message objects. This paper proposes an $\alpha$-cut-based genealogy grouping method to cluster many structured objects in an application domain. The proposed method first determines relationships using a semantic similarity relation and a fuzzy relation, and then performs grouping through the operations search(), insert(), and hierarchy(). The resulting hierarchy makes group-related tasks easy to process, such as answering queries, discriminating objects, and finding similarities among objects. The proposed post-office box structure can be used to serve and manage message objects efficiently through the creation of groups. The proposed method was tested on 5,500 message objects and compared with other methods: non-grouping, BGM, RGM, and OGM.
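The core of $\alpha$-cut grouping can be sketched as follows: keep only the pairs whose fuzzy similarity is at least $\alpha$, and take the connected components of the resulting crisp graph, which is equivalent to taking the $\alpha$-cut of the relation's max-min transitive closure. This is a generic sketch, not the paper's search()/insert()/hierarchy() procedure; the function names and the example relation are hypothetical.

```python
def alpha_cut_groups(relation, alpha):
    # relation: square matrix of fuzzy similarity degrees in [0, 1].
    # Objects i and j are linked when relation[i][j] >= alpha; groups are the
    # connected components of that crisp graph (union-find).
    n = len(relation)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if relation[i][j] >= alpha:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def genealogy(relation, alphas):
    # Lowering alpha only adds links, so groups merge and never split:
    # the partitions are nested and together form a hierarchy.
    return {a: alpha_cut_groups(relation, a) for a in sorted(alphas, reverse=True)}
```

Because the partitions are nested, the groups at a high $\alpha$ can serve as leaves and the groups at lower levels as their ancestors, which is the kind of genealogy the abstract describes.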

Graph Representation by Medial Axis Transform Image for 3D Retrieval (3차원 영상 검색을 위한 중심축 변환에 의한 그래프 표현 기법)

  • Kim, Deok-Hun;Yun, Il-Dong;Lee, Sang-Uk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.1 / pp.33-42 / 2001
  • Recently, interest in 3D images generated from range data and CAD has increased greatly, and accordingly various 3D image databases are being constructed. An efficient and fast scheme for accessing the desired image data is an important issue in Internet and digital library applications. However, 3D image databases are difficult to manage because of their huge size. Therefore, a proper descriptor is necessary to manage the data efficiently, including for content-based search. In this paper, the proposed shape descriptor is based on voxelization of the 3D image. The medial axis transform, which stems from mathematical morphology, is performed on the voxelized 3D image, and a graph composed of nodes and edges is generated from the resulting skeleton. The generated graph is well suited as a novel shape descriptor because it loses no geometric information and resembles human perception of shape. Therefore, the proposed shape descriptor should be useful for 3D object recognition, compression, and content-based search.
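The last step of this pipeline, turning a voxel skeleton into a graph of nodes and edges, can be sketched as follows: end points and junctions become nodes, and each chain of degree-2 voxels between them becomes a weighted edge. This sketch assumes the medial axis transform has already produced a one-voxel-thick skeleton and uses 6-connectivity (face neighbours) for simplicity; real 3D skeletons often use 26-connectivity, which needs extra handling at diagonal steps. All names are hypothetical.

```python
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbours(v, voxels):
    x, y, z = v
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in OFFSETS
            if (x + dx, y + dy, z + dz) in voxels]


def skeleton_graph(skeleton):
    # Nodes: end points (degree 1), junctions (degree >= 3) and isolated voxels.
    # Edges: chains of degree-2 voxels between two nodes, weighted by length.
    voxels = set(skeleton)
    nodes = {v for v in voxels if len(neighbours(v, voxels)) != 2}
    edges = set()
    for start in nodes:
        for step in neighbours(start, voxels):
            prev, cur, length = start, step, 1
            while cur not in nodes:             # degree 2: exactly one way forward
                nxt = [n for n in neighbours(cur, voxels) if n != prev][0]
                prev, cur, length = cur, nxt, length + 1
            # Each chain is traced from both ends; normalising the pair dedupes it.
            edges.add((min(start, cur), max(start, cur), length))
    return nodes, edges
```

Closed loops with no end point or junction are not reached by this tracing, and two distinct chains of equal length between the same pair of nodes collapse into one edge; a fuller descriptor would keep a multigraph.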

Shape Retrieval using Curvature-based Morphological Graphs (굴곡 기반 형태 그래프를 이용한 모양 검색)

  • Bang, Nan-Hyo;Um, Ky-Hyun
    • Journal of KIISE:Databases / v.32 no.5 / pp.498-508 / 2005
  • Shape is one of the most important features for image retrieval because it reflects the meaning of an image. In particular, structural shape features are widely studied because they represent the primitive properties of a shape and the relations between its basic units well. However, most structural shape features cannot guarantee efficient search time because they are expressed as graphs or trees. To solve this problem, we generate a curvature-based morphological graph and design a key for clustering shapes from this graph. The proposed graph captures both the contour features and the morphological features of a shape. Shape retrieval proceeds in stages: we reduce the search space through clustering, and then determine the total similarity value through pattern matching of external curvature. Various experiments show that our approach reduces computational complexity and retrieval cost.
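The two-stage retrieval idea (prune with a cluster key, then match curvature patterns) can be sketched on polygonal contours. This is a simplified stand-in, not the paper's curvature-based morphological graph: the signed turning angle at each vertex serves as a discrete curvature, the pair (vertex count, number of concave vertices) serves as the hypothetical clustering key, and contours are assumed to be counter-clockwise.

```python
import math


def turning_angles(polygon):
    # Discrete curvature: signed turning angle at each vertex of a closed
    # polygon, wrapped into [-pi, pi). Positive = left turn (convex for a
    # counter-clockwise contour), negative = right turn (concave).
    n = len(polygon)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        angles.append((a_out - a_in + math.pi) % (2 * math.pi) - math.pi)
    return angles


def shape_key(angles):
    # Coarse clustering key: vertex count and number of concave vertices.
    return len(angles), sum(1 for a in angles if a < -1e-9)


def curvature_distance(c1, c2):
    # L1 distance between curvature sequences, minimised over cyclic shifts so
    # that the choice of starting vertex does not matter.
    n = len(c1)
    return min(sum(abs(c1[i] - c2[(i + s) % n]) for i in range(n)) for s in range(n))


def retrieve(query, database):
    # Stage 1: keep only shapes in the query's key cluster.
    # Stage 2: rank the survivors by curvature distance.
    q = turning_angles(query)
    key = shape_key(q)
    candidates = {name: turning_angles(p) for name, p in database.items()}
    candidates = {name: c for name, c in candidates.items() if shape_key(c) == key}
    return sorted(candidates, key=lambda name: curvature_distance(q, candidates[name]))
```

Turning angles are invariant to translation, rotation, and uniform scaling, so a square queried with a different starting vertex still matches a larger square at distance zero, while shapes outside its key cluster are never compared at all.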

An Automated Technique for Illegal Site Detection using the Sequence of HTML Tags (HTML 태그 순서를 이용한 불법 사이트 탐지 자동화 기술)

  • Lee, Kiryong;Lee, Heejo
    • Journal of KIISE / v.43 no.10 / pp.1173-1178 / 2016
  • Since the introduction of the BitTorrent protocol in 2001, content of every kind, including music, movies, and software, can be downloaded through file sharing. As a result, copyright holders suffer from the illegal sharing of copyrighted content. To address this problem, countries have enacted laws against illegal sharing, and Internet service providers block pirate sites. However, illegal sites such as The Pirate Bay easily reopen by changing their domain names. We therefore propose a technique for easily detecting reopened pirate sites. This automated technique collects domain names using the Google search engine and measures similarity with the Longest Common Subsequence (LCS) algorithm by comparing the tag structure of the source web page with that of the reopened web page. For evaluation, we collected 2,383 domains from Google search. Applying the LCS algorithm detected a total of 44 pirate sites among the collected domains. In addition, when applied to foreign pirate sites, the technique detected 23 pirate sites among 805 domains. These experiments show that reopened pirate sites can be detected easily with an automated detection system.
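The comparison step can be sketched with Python's standard `html.parser`: extract the sequence of start and end tags from each page, ignoring text and attributes, and compute their LCS. Normalizing the LCS length by the longer sequence is an assumption, since the abstract does not give the exact score or any detection threshold; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    # Records the order of start and end tags, ignoring text and attributes.
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html):
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def lcs_length(a, b):
    # Classic dynamic programme, keeping only one previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def tag_similarity(html_a, html_b):
    a, b = tag_sequence(html_a), tag_sequence(html_b)
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0
```

Comparing tag order rather than page text is what lets the method recognize a reopened site: the new domain serves different links and titles but usually the same template.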

Ranked Web Service Retrieval by Keyword Search (키워드 질의를 이용한 순위화된 웹 서비스 검색 기법)

  • Lee, Kyong-Ha;Lee, Kyu-Chul;Kim, Kyong-Ok
    • The Journal of Society for e-Business Studies / v.13 no.2 / pp.213-223 / 2008
  • The efficient discovery of services from a large-scale collection of services has become an important issue [7, 24]. We studied a syntactic method for Web service discovery rather than a semantic one. We regarded service discovery as a retrieval problem over proprietary XML formats, namely the service descriptions stored in a registry DB. We modeled services and queries as probabilistic values and devised similarity-based retrieval techniques. Our approach has the following benefits. First, our system supports ranked service retrieval by keyword search. Second, it considers both the UDDI data and the WSDL definitions of services at query evaluation time. Last, our technique can be easily implemented on an off-the-shelf DBMS and can also take advantage of the DBMS's maintenance features.
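Ranked keyword search over service descriptions can be sketched with a query-likelihood language model. The abstract says only that services and queries are modeled as probabilistic values, so the unigram model with Jelinek-Mercer smoothing used here is an assumed stand-in, as are merging UDDI and WSDL text into one bag of words, the smoothing weight `lam`, and all names. The sketch is in-memory rather than DBMS-backed.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def tokens(text):
    # Splits camelCase WSDL names ("getWeatherForecast" -> get, weather, forecast).
    return [t.lower() for t in TOKEN.findall(text)]


def build_index(services):
    # services: name -> {"uddi": UDDI description text, "wsdl": WSDL operation names}
    docs = {name: Counter(tokens(f["uddi"]) + tokens(f["wsdl"])) for name, f in services.items()}
    collection = Counter()
    for tf in docs.values():
        collection.update(tf)
    return docs, collection


def rank_services(query, docs, collection, lam=0.2):
    # Query likelihood with Jelinek-Mercer smoothing:
    # log P(q | s) = sum_t log((1 - lam) * P(t | s) + lam * P(t | collection)).
    total = sum(collection.values())
    terms = [t for t in tokens(query) if collection[t] > 0]  # unseen terms affect all equally
    scores = []
    for name, tf in docs.items():
        length = sum(tf.values()) or 1
        score = sum(math.log((1 - lam) * tf[t] / length + lam * collection[t] / total)
                    for t in terms)
        scores.append((score, name))
    return [name for score, name in sorted(scores, reverse=True)]
```

Smoothing with collection statistics keeps every probability non-zero, so a service that lacks one query keyword is ranked lower rather than excluded, which is what makes the output a ranking instead of a Boolean match.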
