• Title/Summary/Keyword: RSS Crawling

System Design for Collecting Real-Time Product Information Using RSS (RSS를 이용한 실시간 상품정보 수집시스템의 설계)

  • Chuluun, Munkhzaya; Ko, Sun-Woo
    • Journal of Korean Society of Industrial and Systems Engineering / v.35 no.1 / pp.1-9 / 2012
  • It is well known that internet shoppers are very sensitive to sale prices. They visit various shopping malls and collect product information, including purchase conditions, to support their purchase decisions. The need for information support has recently grown because both the amount of information required and the complexity of the purchase decision-making process have increased. Comparison shopping agent systems provide price comparison information collected from various shopping malls to satisfy internet shoppers' demand for information. However, frequent price changes caused by keen price competition have become the primary reason for declining information quality among price comparison sites. RSS, a family of web feed formats used to publish frequently updated content, is now used even by online shopping malls. This paper develops an RSS product information collection system to obtain real-time product information. The proposed system consists of (1) a web crawler module that automatically searches for shopping malls offering RSS feeds, (2) an RSS reader module that parses product information from RSS feed files, (3) a product DB, and (4) a product search module. Measured by the volume of product information collected per unit time, the proposed system outperforms comparison shopping agent systems.
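
The RSS reader module described in this abstract can be illustrated with a minimal Python sketch. It assumes a plain RSS 2.0 feed whose `<item>` elements carry `title`, `link`, `description`, and `pubDate`; the sample feed, the `parse_products` name, and the record layout are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed of the kind a shopping mall might publish.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Mall</title>
    <item>
      <title>USB cable</title>
      <link>http://mall.example/items/1</link>
      <description>Price: 3000 KRW</description>
      <pubDate>Mon, 02 Jan 2012 09:00:00 +0900</pubDate>
    </item>
    <item>
      <title>Wireless mouse</title>
      <link>http://mall.example/items/2</link>
      <description>Price: 15000 KRW</description>
      <pubDate>Mon, 02 Jan 2012 09:05:00 +0900</pubDate>
    </item>
  </channel>
</rss>"""


def parse_products(feed_xml: str) -> list[dict]:
    """Return one record per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    products = []
    for item in root.iter("item"):
        products.append({
            # findtext returns the default when a child element is missing.
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "description": item.findtext("description", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return products


if __name__ == "__main__":
    for product in parse_products(SAMPLE_FEED):
        print(product["title"], product["link"])
```

In a full system these records would presumably be written to the product DB; how the paper maps feed fields to product attributes is not stated in the abstract.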

Information Retrieval System for R2SS (R2SS 기반의 정보검색 시스템)

  • Hong, Seok-Joo; Park, Young-Bae
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.39-51 / 2009
  • This study concerns the design and implementation of an intelligent information search engine based on $R^2SS$ (Reverse Really Simple Syndication). In the previous method, the user enters the address of a desired RSS feed and obtains only the limited information in that feed. In the Reverse RSS (Really Simple Syndication) method, by contrast, the user simply types in the information of interest and receives the matching RSS information of standard documents from among many RSS addresses, which are gathered in real time by an automated RSS address collection server. The proposed $R^2SS$-based intelligent information search engine can save significant time while delivering good-quality information; furthermore, it has the effect of providing a personal secretary.
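
The abstract does not say how the automated RSS address collection server finds feed addresses. One common approach, assumed here purely for illustration, is feed autodiscovery: scanning a page for `<link rel="alternate">` elements with a feed MIME type. The `FeedLinkParser` and `discover_feeds` names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise feeds in HTML pages.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values may be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower()
        href = attr.get("href", "")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative feed addresses against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html: str, base_url: str) -> list[str]:
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="alternate" type="application/rss+xml" href="/rss.xml">'
        '<link rel="stylesheet" href="/style.css">'
        '</head></html>'
    )
    print(discover_feeds(page, "http://blog.example/post/1"))
```

A collection server built this way would fetch pages, call `discover_feeds` on each, and store the resulting addresses for later keyword search; fetching and storage are left out of the sketch.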

State Information Based Recommendation Algorithm for Minimizing the Malicious User's Influence (상태 정보를 활용하여 악의적 사용자의 영향력을 최소화 하는 추천 알고리즘)

  • Noh, Taewan; Oh, Hayoung; Noh, Giseop; Kim, Chong-Kwon
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.6 / pp.1353-1360 / 2015
  • With the rapid development of the Internet, most users now refer to sites with Recommendation Systems (RSs) when they want to buy goods, movies, or music. However, Sybils, malicious users who intentionally increase or decrease rating values, may exist on these RS sites. As a result, the RSs cannot provide accurate recommendations to normal users. In this paper, we divide the given rating values into stable and unstable states and propose a state information based recommendation algorithm that minimizes the malicious user's influence. To evaluate the proposed scheme, we crawl real trace data directly from a well-known movie site and analyze the performance. The results show that the proposed scheme performs well compared to existing algorithms.

Intelligent Web Crawler for Supporting Big Data Analysis Services (빅데이터 분석 서비스 지원을 위한 지능형 웹 크롤러)

  • Seo, Dongmin; Jung, Hanmin
    • The Journal of the Korea Contents Association / v.13 no.12 / pp.575-584 / 2013
  • The data types used for big-data analysis vary widely and include news, blogs, SNS, papers, patents, and sensed data. In particular, the use of web documents that offer reliable data in real time is gradually increasing. Web crawlers that collect web documents automatically have grown in importance because big data is being used in many different fields and web data are growing exponentially every year. However, existing web crawlers cannot collect all the web documents in a web site because they follow only the URLs included in web documents they have already collected. Existing web crawlers may also re-collect web documents that other web crawlers have already collected, because information about collected web documents is not managed efficiently across crawlers. Therefore, this paper proposes a distributed web crawler. To resolve these problems, the proposed web crawler collects web documents through the RSS of each web site and the Google search API. The web crawler provides fast crawling performance through a client-server model based on RMI and NIO that minimizes network traffic. Furthermore, it extracts the core content from a web document by comparing keyword similarity across the tags in the document. Finally, to verify the superiority of our web crawler, we compare it with existing web crawlers in various experiments.
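
The duplicate-collection problem described in this abstract can be illustrated with a small in-process sketch: a coordinator records every URL handed out so that a second crawler is never given the same document. This is not the paper's RMI/NIO client-server implementation; the `Coordinator` class, its `claim` method, and the normalization rules are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Reduce trivially different URLs to one key (assumed rules)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Lowercase scheme and host, drop the fragment, keep the query string.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


class Coordinator:
    """Tracks which URLs have already been assigned to some crawler."""

    def __init__(self):
        self._seen: set[str] = set()

    def claim(self, urls: list[str]) -> list[str]:
        """Return only the URLs that no crawler has claimed before."""
        fresh = []
        for url in urls:
            key = normalize(url)
            if key not in self._seen:
                self._seen.add(key)
                fresh.append(url)
        return fresh


if __name__ == "__main__":
    coordinator = Coordinator()
    # Crawler A discovers two spellings of the same document.
    print(coordinator.claim(["http://Example.com/a#top", "http://example.com/a"]))
    # Crawler B later discovers one old and one new document.
    print(coordinator.claim(["http://example.com/a", "http://example.com/b"]))
```

In a distributed deployment the seen-set would live on the server side and crawlers would call `claim` over the network; the sketch keeps everything in one process to show only the bookkeeping.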