• Title/Summary/Keyword: search similarity

Search Result 530, Processing Time 0.028 seconds

Detecting Intentionally Biased Web Pages In terms of Hypertext Information (하이퍼텍스트 정보 관점에서 의도적으로 왜곡된 웹 페이지의 검출에 관한 연구)

  • Lee Woo Key
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.1 s.33
    • /
    • pp.59-66
    • /
    • 2005
  • The organization of the web is progressively more being used to improve search and analysis of information on the web as a large collection of heterogeneous documents. Most people begin at a Web search engine to find information. but the user's pertinent search results are often greatly diluted by irrelevant data or sometimes appear on target but still mislead the user in an unwanted direction. One of the intentional, sometimes vicious manipulations of Web databases is a intentionally biased web page like Google bombing that is based on the PageRank algorithm. one of many Web structuring techniques. In this thesis, we regard the World Wide Web as a directed labeled graph that Web pages represent nodes and link edges. In the Present work, we define the label of an edge as having a link context and a similarity measure between link context and target page. With this similarity, we can modify the transition matrix of the PageRank algorithm. By suggesting a motivating example, it is explained how our proposed algorithm can filter the Web intentionally biased web Pages effective about $60\%% rather than the conventional PageRank.

  • PDF

Development of a XML Web Services Retrieval Engine (XML 웹 서비스 검색 엔진의 개발)

  • Sohn, Seung-Beom;Oh, Il-Jin;Hwang, Yun-Young;Lee, Kyong-Ha;Lee, Kyu-Chul
    • Journal of Information Technology Applications and Management
    • /
    • v.13 no.4
    • /
    • pp.121-140
    • /
    • 2006
  • UDDI (Universal Discovery Description and Integration) Registry is used for Web Services registration and search. UDDI offers the search result to the keyword-based query. UDDI supports WSDL registration but it does not supports WSDL search. So it is required that contents based search and ranking using name and description in UDDI registration information and WSDL. This paper proposes a retrieval engine considering contents of services registered in the UDDI and WSDL. It uses Vector Space Model for similarity comparison between contents of those. UDDI registry information hierarchy and WSDL hierarchy are considered during searching process. This engine suppports two discovery methods. One is Keyword-based search and the other is template-based search supporting ranking for user's query. Template-based search offers how service interfaces correspond to the query for WSDL documents. Proposed retrieval engine can offer search result more accurately than one which UDDI offers and it can retrieve WSDL which is registered in UDDI in detail.

  • PDF

Query Space Exploration Using Genetic Algorithm

  • Lee, Jae-Hoon;Kim, Young-Cheon;Lee, Sung-Joo
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.683-689
    • /
    • 2003
  • Information retrieval must be able to search the most suitable document that user need from document set. If foretell document adaptedness by similarity degree about QL(Query Language) of document, documents that search person does not require are searched. In this paper, showed that can search the most suitable document on user's request searching document of the whole space using genetic algorithm and used knowledge-base operator to solve various model's problem.

  • PDF

Query Space Exploration Model Using Genetic Algorithm

  • Lee, Jae-Hoon;Lee, Sung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.3 no.2
    • /
    • pp.222-226
    • /
    • 2003
  • Information retrieval must be able to search the most suitable document that user need from document set. If foretell document adaptedness by similarity degree about QL(Query Language) of document, documents that search person does not require are searched. In this paper, showed that can search the most suitable document on user's request searching document of the whole space using genetic algorithm and used knowledge-base operator to solve various model's problem.

Content based data search using semantic annotation (시맨틱 주석을 이용한 내용 기반 데이터 검색)

  • Kim, Byung-Gon;Oh, Sung-Kyun
    • Journal of Digital Contents Society
    • /
    • v.12 no.4
    • /
    • pp.429-436
    • /
    • 2011
  • Various documents, images, videos and other materials on the web has been increasing rapidly. Efficient search of those things has become an important topic. From keyword-based search, internet search has been transformed to semantic search which finds the implications and the relations between data elements. Many annotation processing systems manipulating the metadata for semantic search have been proposed. However, annotation data generated by different methods and forms are difficult to process integrated search between those systems. In this study, in order to resolve this problem, we categorized levels of many annotation documents, and we proposed the method to measure the similarity between the annotation documents. Similarity measure between annotation documents can be used for searching similar or related documents, images, and videos regardless of the forms of the source data.

An Improved Combined Content-similarity Approach for Optimizing Web Query Disambiguation

  • Kamal, Shahid;Ibrahim, Roliana;Ghani, Imran
    • Journal of Internet Computing and Services
    • /
    • v.16 no.6
    • /
    • pp.79-88
    • /
    • 2015
  • The web search engines are exposed to the issue of uncertainty because of ambiguous queries, being input for retrieving the accurate results. Ambiguous queries constitute a significant fraction of such instances and pose real challenges to web search engines. Moreover, web search has created an interest for the researchers to deal with search by considering context in terms of location perspective. Our proposed disambiguation approach is designed to improve user experience by using context in terms of location relevance with the document relevance. The aim is that providing the user a comprehensive location perspective of a topic is informative than retrieving a result that only contains temporal or context information. The capacity to use this information in a location manner can be, from a user perspective, potentially useful for several tasks, including user query understanding or clustering based on location. In order to carry out the approach, we developed a Java based prototype to derive the contextual information from the web results based on the queries from the well-known datasets. Among those results, queries are further classified in order to perform search in a broad way. After the result provision to users and the selection made by them, feedback is recorded implicitly to improve the web search based on contextual information. The experiment results demonstrate the outstanding performance of our approach in terms of precision 75%, accuracy 73%; recall 81% and f-measure 78% when compared with generic temporal evaluation approach and furthermore achieved precision 86%, accuracy 71%; recall 67% and f-measure 75% when compared with web document clustering approach.

A Study on the Performance of Music Retrieval Based on the Emotion Recognition (감정 인식을 통한 음악 검색 성능 분석)

  • Seo, Jin Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.34 no.3
    • /
    • pp.247-255
    • /
    • 2015
  • This paper presents a study on the performance of the music search based on the automatically recognized music-emotion labels. As in the other media data, such as speech, image, and video, a song can evoke certain emotions to the listeners. When people look for songs to listen, the emotions, evoked by songs, could be important points to consider. However; very little study has been done on the performance of the music-emotion labels to the music search. In this paper, we utilize the three axes of human music perception (valence, activity, tension) and the five basic emotion labels (happiness, sadness, tenderness, anger, fear) in measuring music similarity for music search. Experiments were conducted on both genre and singer datasets. The search accuracy of the proposed emotion-based music search was up to 75 % of that of the conventional feature-based music search. By combining the proposed emotion-based method with the feature-based method, we achieved up to 14 % improvement of search accuracy.

Shape-Based Subsequence Retrieval Supporting Multiple Models in Time-Series Databases (시계열 데이터베이스에서 복수의 모델을 지원하는 모양 기반 서브시퀀스 검색)

  • Won, Jung-Im;Yoon, Jee-Hee;Kim, Sang-Wook;Park, Sang-Hyun
    • The KIPS Transactions:PartD
    • /
    • v.10D no.4
    • /
    • pp.577-590
    • /
    • 2003
  • The shape-based retrieval is defined as the operation that searches for the (sub) sequences whose shapes are similar to that of a query sequence regardless of their actual element values. In this paper, we propose a similarity model suitable for shape-based retrieval and present an indexing method for supporting the similarity model. The proposed similarity model enables to retrieve similar shapes accurately by providing the combination of various shape-preserving transformations such as normalization, moving average, and time warping. Our indexing method stores every distinct subsequence concisely into the disk-based suffix tree for efficient and adaptive query processing. We allow the user to dynamically choose a similarity model suitable for a given application. More specifically, we allow the user to determine the parameter p of the distance function $L_p$ when submitting a query. The result of extensive experiments revealed that our approach not only successfully finds the subsequences whose shapes are similar to a query shape but also significantly outperforms the sequence search.

Method of Related Document Recommendation with Similarity and Weight of Keyword (키워드의 유사도와 가중치를 적용한 연관 문서 추천 방법)

  • Lim, Myung Jin;Kim, Jae Hyun;Shin, Ju Hyun
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.11
    • /
    • pp.1313-1323
    • /
    • 2019
  • With the development of the Internet and the increase of smart phones, various services considering user convenience are increasing, so that users can check news in real time anytime and anywhere. However, online news is categorized by media and category, and it provides only a few related search terms, making it difficult to find related news related to keywords. In order to solve this problem, we propose a method to recommend related documents more accurately by applying Doc2Vec similarity to the specific keywords of news articles and weighting the title and contents of news articles. We collect news articles from Naver politics category by web crawling in Java environment, preprocess them, extract topics using LDA modeling, and find similarities using Doc2Vec. To supplement Doc2Vec, we apply TF-IDF to obtain TC(Title Contents) weights for the title and contents of news articles. Then we combine Doc2Vec similarity and TC weight to generate TC weight-similarity and evaluate the similarity between words using PMI technique to confirm the keyword association.

e-Cohesive Keyword based Arc Ranking Measure for Web Navigation (연관 웹 페이지 검색을 위한 e-아크 랭킹 메저)

  • Lee, Woo-Key;Lee, Byoung-Su
    • Journal of KIISE:Databases
    • /
    • v.36 no.1
    • /
    • pp.22-29
    • /
    • 2009
  • The World Wide Web has emerged as largest media which provides even a single user to market their products and publish desired information; on the other hand the user can access what kind of information abundantly enough as well. As a result web holds large amount of related information distributed over multiple web pages. The current search engines search for all the entered keywords in a single webpage and rank the resulting set of web pages as an answer to the user query. But this approach fails to retrieve the pair of web pages which contains more relevant information for users search. We introduce a new search paradigm which gives different weights to the query keywords according to their order of appearance. We propose a new arc weight measure that assigns more relevance to the pair of web pages with alternate keywords present so that the pair of web pages which contains related but distributed information can be presented to the user. Our measure proved to be effective on the similarity search in which the experimentation represented the e~arc ranking measure outperforming the conventional ones.