• Title/Summary/Keyword: Web search engines

Ontology Supported Information Systems: A Review

  • Padmavathi, T.;Krishnamurthy, M.
    • Journal of Information Science Theory and Practice / v.2 no.4 / pp.61-76 / 2014
  • The exponential growth of information on the web far exceeds the capacity of present-day information retrieval systems and search engines, making information integration on the web difficult. To overcome this, the World Wide Web Consortium (W3C) proposed semantic web technologies to achieve a higher degree of automation and precision in information retrieval systems. The semantic web, with its promise to deliver machine understanding to the traditional web, has attracted a significant amount of research from both academia and industry. The semantic web is an extension of the current web in which data can be shared and reused across the internet. RDF and ontology are two essential components of the semantic web architecture, supporting a common framework for data storage and the representation of data semantics, respectively. Because ontologies are the backbone of semantic web applications, it is particularly relevant to study the various approaches to their application, usage, and integration into web services. This article reviews the research being undertaken on the design and development of ontology supported information systems. It also briefly explains the emerging semantic web technologies and standards.

Mining Search Keywords for Improving the Accuracy of Entity Search (엔터티 검색의 정확성을 높이기 위한 검색 키워드 마이닝)

  • Lee, Sun Ku;On, Byung-Won;Jung, Soo-Mok
    • KIPS Transactions on Software and Data Engineering / v.5 no.9 / pp.451-464 / 2016
  • Entity search services such as Google Product Search and Yahoo Pipes have recently been in the spotlight. Entity search engines are used to retrieve web pages relevant to a particular entity. However, if an entity (e.g., Chinatown movie) has several meanings (e.g., Chinatown movies, Chinatown restaurants, and Incheon Chinatown), the accuracy of the search results decreases significantly. To address this problem, we propose a novel method that quantifies the importance of search queries and then offers the best query for the entity search. The method is based on the Frequent Pattern (FP)-Tree and considers the correlation between entity relevance and the frequency of web pages. According to the experimental results presented in this paper, the proposed method achieved an average precision of 59%, a fivefold improvement over traditional query terms (less than 10% average precision).
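The frequent-pattern step can be illustrated with a minimal sketch. The abstract does not give the authors' scoring formula, so this example only counts how often keyword combinations co-occur across pages. It uses brute-force enumeration rather than an FP-Tree (an FP-Tree finds the same frequent itemsets more efficiently), and the function name, parameters, and sample keywords are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_keyword_sets(pages, min_support=2, max_size=3):
    """Return keyword combinations that occur in at least min_support pages.

    pages: list of keyword lists, one per web page (hypothetical input format).
    This is brute-force counting, not an FP-Tree.
    """
    counts = Counter()
    for keywords in pages:
        unique = sorted(set(keywords))  # ignore duplicates within one page
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical keyword sets extracted from three pages about "Chinatown".
pages = [
    ["chinatown", "movie", "polanski"],
    ["chinatown", "movie", "1974"],
    ["chinatown", "restaurant"],
]
frequent = frequent_keyword_sets(pages)
```

In this toy data, the pair ("chinatown", "movie") is frequent while ("chinatown", "restaurant") is not, which hints at how a frequent pattern could suggest a more specific query for the movie sense of the entity.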

Study for Blog Clustering Method Based on Similarity of Titles (주제 유사성 기반 클러스터링을 이용한 블로그 검색기법 연구)

  • Lee, Ki-Jun;Lee, Myung-Jin;Kim, Woo-Ju
    • Journal of Intelligence and Information Systems / v.15 no.2 / pp.61-74 / 2009
  • With the exponential growth of blogs, a great deal of important data now appears on them. However, because the main topics covered in blog pages differ considerably from those of general web pages, some problems cannot be solved by general search engines. Many researchers have therefore studied search methods designed specifically for blogs, to help users who want to find useful information on blogs. We present a blog classification method based on the similarity of titles. First, we analyze blogs and blog search engines to identify the problems of current blog search and possible solutions. Second, we apply our similarity algorithm to blog titles and discuss how to develop a clustering method specifically for blogs. Finally, we build a prototype system of our algorithm, evaluate its effectiveness, and present conclusions and future work. We expect this algorithm to strengthen current search engines.
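Title-based clustering can be sketched as follows. The abstract does not specify the authors' similarity algorithm, so this example assumes Jaccard similarity over title word sets and a greedy single-pass grouping. The function names, the threshold value, and the sample titles are hypothetical.

```python
def tokenize(title):
    """Lowercase a title and split it into a set of words."""
    return set(title.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two word sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_titles(titles, threshold=0.3):
    """Greedily assign each title to the first cluster whose representative
    (the cluster's first title) is at least `threshold` similar to it."""
    clusters = []  # each cluster is a list of (title, tokens) pairs
    for title in titles:
        tokens = tokenize(title)
        for cluster in clusters:
            if jaccard(tokens, cluster[0][1]) >= threshold:
                cluster.append((title, tokens))
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([(title, tokens)])
    return [[t for t, _ in cluster] for cluster in clusters]


# Hypothetical blog titles.
titles = [
    "easy kimchi stew recipe",
    "kimchi stew recipe for beginners",
    "python web crawler tutorial",
]
clusters = cluster_titles(titles)
```

The first two titles share three of six distinct words (similarity 0.5), so they fall into one cluster, while the third starts its own. A real system would likely need language-aware tokenization, especially for Korean titles.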

Improvement Mechanism of Security Monitoring and Control Model Using Multiple Search Engines (다중 검색엔진을 활용한 보안관제 모델 개선방안)

  • Lee, Je-Kook;Jo, In-June
    • The Journal of the Korea Contents Association / v.21 no.1 / pp.284-291 / 2021
  • Because current security monitoring systems operate passively, responding only after an attack has taken place, intrusion incidents are typically handled after the fact. In particular, when new assets are added and put into actual service, vulnerability testing and advance defense from the point of view of an actual hacker are limited. This paper proposes a new security monitoring model that uses multiple hacking-related search engines to add proactive vulnerability response functions for protected assets. That is, multiple general-purpose and special-purpose search engines are used to check the assets to be protected for specific vulnerabilities in advance, and the vulnerabilities found by the check are removed in advance. The paper additionally presents a function for pre-checking the objective attack vulnerabilities of protected assets as seen from the point of view of an actual hacker, and a function for discovering and removing in advance a wide range of system-related vulnerabilities located in the IP band.

Design and Implemention of Real-time web Crawling distributed monitoring system (실시간 웹 크롤링 분산 모니터링 시스템 설계 및 구현)

  • Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
    • Journal of Convergence for Information Technology / v.9 no.1 / pp.45-53 / 2019
  • In this rapidly changing information era, we face problems caused by the excessive information served by websites. Little of this information is useful, much is useless, and a lot of time is spent selecting the information that is needed. Many websites, including search engines, use web crawling to keep their data up to date. Web crawling is usually used to generate copies of all the pages of visited sites, and search engines index those pages for faster searching. For collecting data such as wholesale and order information that changes in real time, keyword-oriented web data collection is not adequate, and no alternative for the selective collection of web information in real time has yet been suggested. In this paper, we propose a method that collects information from restricted web sites using a Web crawling distributed monitoring system (R-WCMS), estimates the collection time through detailed analysis of the data, and stores the data in a parallel system. Experimental results show that applying the proposed model to web site information retrieval reduces the time by 15-17%.

The Evaluation of the Child-Care Web Sites on the Internet (인터넷 육아정보 제공 사이트에 대한 평가)

  • Han, Kyung-Ja;Kim, Jeong-Soo;Kim, Sook-Young
    • Child Health Nursing Research / v.12 no.1 / pp.57-64 / 2006
  • Purpose: This study was conducted to analyze web sites that provide child-care information and to propose a suitable model for child-care web sites. Method: An evaluation tool with 23 items covering purpose, contents, timeliness and reliability, interaction, and function was developed and modified. Quantitative analyses were performed on 48 web sites selected using popular search engines. Result: 1) The aim of the web site was clearly shown on 24 sites (63.2%), and 17 sites (44.7%) provided information for judging whether the information provider was an expert. 2) Most web sites provided information on feeding, nutrition, and common health problems, and 11 sites provided information on the care of problem behavior, but only 6 sites provided information on mother-infant interaction. 3) Timely information was provided on 21 sites; however, none of the sites provided information sources. 4) Methods for contacting the authors were found on 31 sites (81.6%), and 19 sites (50%) had active bulletin boards for receiving opinions from users. 5) On 32 sites, information could be found with fewer than 3 clicks. Conclusion: We suggest that the evaluation criteria for child-care web sites used in this study form a tool that can be used to evaluate web sites consistently, but further study is needed to standardize the evaluation tool.

An analysis of user behaviors on the search engine results pages based on the demographic characteristics

  • Bitirim, Yiltan;Ertugrul, Duygu Celik
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.2840-2861 / 2020
  • The purpose of this survey-based study is to analyze search engine users' behaviors on Search Engine Results Pages (SERPs) based on three demographic characteristics: gender, age, and program of study. A questionnaire was designed with 12 closed-ended questions. The questions other than the demographic ones concerned "tab", "advertisement", "spelling suggestion", "related query suggestion", "instant search suggestion", "video result", "image result", "pagination", and the number of results clicked. The questionnaire was administered, and the collected data were analyzed with both descriptive and inferential statistics. 84.2% of the study population was reached. Some of the major results are as follows. In each demographic category (i.e., female, male, under-20, 20-24, above-24, English computer engineering, Turkish computer engineering, software engineering), most respondents click tabs, spelling suggestions, related query suggestions, instant search suggestions, video results, image results, and pagination at least rarely. More than 50.0% of the female category click advertisements rarely; in the other categories, however, 50.0% or more never click advertisements. In every demographic category, between 78.0% and 85.4% click 10 or fewer results. This study is presented as the first attempt with this complete content and design. Search engine providers and researchers can gain knowledge of how users use SERPs based on demographic characteristics.

User Perceptions of Uncertainty in the Selection of Information Retrieval System: Implications for System and Service Improvement

  • Kim, Yang-Woo
    • International Journal of Contents / v.5 no.3 / pp.40-49 / 2009
  • While numerous studies have suggested the significance of uncertainty during the process of information-seeking, less research has investigated user uncertainty in the actual search process using a real system. This study investigated user perceptions of uncertainty in the process of the selection of information retrieval system in the real information-seeking process. Considering the role of commercial Web search engines as supplementary tools for traditional bibliographic databases in academic research environments, this study analyzed the selection behavior of scholarly researchers, who use such search tools for their academic study. The researchers were limited to the discipline of science in order to understand user perceptions in this field. The findings revealed various dimensions, types, and incidents of uncertainty. Variations appeared in different incidents of uncertainty relating to the unique characteristics of the subjects' information-seeking context. The identification of three principal origins of uncertainty based on the different types of uncertainty generated implications to improve information systems and services.

Main Content Extraction from Web Pages Based on Node Characteristics

  • Liu, Qingtang;Shao, Mingbo;Wu, Linjing;Zhao, Gang;Fan, Guilin;Li, Jun
    • Journal of Computing Science and Engineering / v.11 no.2 / pp.39-48 / 2017
  • Main content extraction from web pages is widely used in search engines, web content aggregation, and mobile Internet browsing. However, web pages also contain a large amount of irrelevant information, such as advertisements, irrelevant navigation, and junk content. Such irrelevant information reduces the efficiency of web content processing in content-based applications. This paper proposes an automatic method for extracting the main content of web pages. The method uses two indicators to describe the characteristics of web pages: text density and hyperlink density. Because similar content tends to be distributed continuously on a page, we use an estimation algorithm to judge whether a node is a content node or a noisy node based on the characteristics of the node and its neighboring nodes. This algorithm enables us to filter out advertisement nodes and irrelevant navigation. Experimental results on 10 news websites showed that our algorithm achieved a 96.34% average acceptable rate.
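The two indicators can be sketched with Python's standard `html.parser`. The abstract does not give the paper's exact definitions, so this example assumes that text density is the number of text characters per tag in a node's subtree and that hyperlink density is the share of text characters that sit inside `<a>` elements. The thresholds and all function names are hypothetical, and the neighbor-based estimation step the authors describe is not reproduced here.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text_len = 0       # text characters in this subtree
        self.link_text_len = 0  # text characters inside <a> in this subtree
        self.tag_count = 1      # tags in this subtree, including this node


def ancestors(node):
    """Yield the node itself, then each parent up to the root."""
    while node is not None:
        yield node
        node = node.parent


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        for anc in ancestors(self.current):
            anc.tag_count += 1
        if tag not in VOID_TAGS:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closers.
        for node in ancestors(self.current):
            if node is not self.root and node.tag == tag:
                self.current = node.parent
                return

    def handle_data(self, data):
        n = len(data.strip())
        if n == 0 or self.current.tag in ("script", "style"):
            return
        in_link = any(anc.tag == "a" for anc in ancestors(self.current))
        for anc in ancestors(self.current):
            anc.text_len += n
            if in_link:
                anc.link_text_len += n


def parse_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def text_density(node):
    return node.text_len / node.tag_count


def link_density(node):
    return node.link_text_len / node.text_len if node.text_len else 0.0


def is_content_node(node, min_text_density=10.0, max_link_density=0.5):
    # Hypothetical thresholds; a real system would tune them on labeled pages.
    return (text_density(node) >= min_text_density
            and link_density(node) <= max_link_density)


page = ('<div><a href="/a">Home</a><a href="/b">News</a></div>'
        '<div><p>Main content extraction keeps the article body '
        'and drops navigation.</p></div>')
nav, main = parse_html(page).children
```

Here the navigation block consists entirely of link text and has few characters per tag, so it is classified as noise, while the paragraph block is classified as content.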

Analysis of Recipes for Korean Foods in Web Sites (레시피 관련 웹 사이트 중 한국음식 레시피의 자료 분석 및 검토)

  • Yun, Mi-Ok;Mun, Hyeon-Gyeong
    • Journal of the Korean Dietetic Association / v.10 no.4 / pp.390-400 / 2004
  • Food and nutrition sites make up a major portion of health information sites. From a public health standpoint, it is very important to ensure the validity and reliability of the information on those web sites. This study therefore analyzed and reviewed recipes on web sites to identify problems that arise when acquiring recipes online. To investigate the Korean food recipes provided on web sites, domestic search engines (Simmani, Naver, Hanmir, and Empas) and foreign search engines (Yahoo Korea, Lycos, and AltaVista Korea) were used. Searches were run using 'recipe' and 'Joribeob (cooking method)' from March 20, 2002 to June 20, 2002, and the information on each site was reviewed and analyzed. The results are as follows. When the 46 sites found with 'Joribeob' were classified by information provider, 24 were individual, 16 were corporate, and 6 were others. Searching for 'recipe' returned a total of 12,654 recipes, of which individuals provided 2,581 (20.4%), corporations 7,249 (57.3%), and others 2,824 (22.3%). Of the 12,654 recipes, 9,979 (78.9%) were judged appropriate as Korean food. By dish group, the recipes were, in order, vegetables 11.7%, soups and hot soups 9.7%, stews and casseroles 8.2%, pancakes 8.0%, stir-fried foods and skewers 7.8%, rice 7.2%, hard-boiled foods 7.1%, steamed dishes 6.4%, noodles and mandu 5.3%, Kimchi 4.5%, fried foods 4.1%, and porridge 3.7%. The remaining 21.1% of recipes were presented as Korean food but were not appropriate as such. The proportion of individuals among information providers was higher than that of enterprises. Recipes from enterprises were based on food and nutrient information and were more reliable. However, in some cases they listed the same amounts of ingredients with different calories, or the same calories with different ingredients. In addition, different sites gave different calories for the same recipe. In some cases the calories given on a site were too high or too low for the suggested amounts of ingredients and the serving size. Recipes that stated calories were evaluated with a nutrient analysis program, and the calculated calories were compared with those given on the web; the two values differed. These discrepancies may lead to misuse of recipes by people who need dietary accuracy, such as patients, or who use recipe information for academic purposes. The results could serve as basic material for improving the quantity and quality of recipes in the future. To improve the accuracy of Korean food recipes on web sites, a system is needed to monitor them and to inform internet users of the monitoring results.
