RSS Channel Recommendation System using Focused Crawler

  • Lee, Young-Seok (Dept. of Electronics & Computer Engineering, Hanyang University) ;
  • Cho, Jung-Woo (Dept. of Computer Education, Cheju National University) ;
  • Kim, Jun-Il (Research Institute, WISE iTech Co., Ltd.) ;
  • Choi, Byung-Uk (Dept. of Electronics & Computer Engineering, Hanyang University)
  • Published: 2006.11.25

Abstract

Recently, the Internet has grown tremendously and become rich in information, driven by increasingly specialized personal interests and the popularization of a private cyberspace called the blog. Many of today's blogs offer Internet users RSS, a syndication technology. RSS lets readers receive updates automatically by registering a channel's RSS address with an RSS aggregator, which saves them the time of repeatedly checking web sites for updates. This paper proposes an RSS channel searching crawler and a way to manage the collected RSS channels, so that Internet users can easily find the specific RSS channels they want. It also proposes an RSS channel ranking based on user popularity. The focus is on indexing information and web updates so that users receive appropriate information according to their user properties.
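The abstract states only that channels are ranked by user popularity and does not define the measure. A minimal sketch, assuming popularity is the number of users who registered a channel with an aggregator and that a channel is returned only if its title or description contains every query term, could look like this. The `Channel` type, the `rank_channels` function, and the example data are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Channel:
    url: str
    title: str
    description: str
    # Hypothetical popularity signal: number of users who registered
    # this channel address with an RSS aggregator.
    subscribers: int

def rank_channels(channels, query):
    """Return channels whose title or description contains every query
    term (case-insensitive substring match), most popular first.
    Ties are broken by URL so the order is deterministic."""
    terms = query.lower().split()
    matches = [
        c for c in channels
        if all(t in f"{c.title} {c.description}".lower() for t in terms)
    ]
    return sorted(matches, key=lambda c: (-c.subscribers, c.url))

# Usage with made-up channels (not data from the paper).
example = [
    Channel("http://a.example/rss", "Python news", "Weekly Python tips", 120),
    Channel("http://b.example/rss", "Python blog", "Personal notes", 300),
    Channel("http://c.example/rss", "Cooking", "Recipes", 999),
]
ranked = rank_channels(example, "python")
print([c.url for c in ranked])  # ['http://b.example/rss', 'http://a.example/rss']
```

Filtering before sorting keeps popularity from overriding relevance: the most-subscribed channel (`c.example`) is excluded because it does not match the query.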

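Because large amounts of new information are now generated in rapid cycles, a syndication technology called RSS is being provided to support increasingly specialized personal interests and the spread of blogs. By registering an RSS channel's address with an RSS aggregator, a user automatically receives newly updated content and no longer needs to keep visiting sites to find new information. This paper proposes a topic-focused crawler that collects RSS channel addresses, so that users can make effective use of the RSS documents on the web, together with a method for ranking RSS channels according to user queries. With the proposed RSS crawler, users can search effectively for the RSS channel addresses they want, improving the efficiency of information retrieval.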
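The abstract does not specify how the focused crawler chooses which pages to visit or how it recognizes an RSS channel address. As an illustration only, the sketch below assumes a best-first crawler that scores each page by keyword overlap with the topic, follows links from on-topic pages first, and detects feeds through the common `<link rel="alternate" type="application/rss+xml">` autodiscovery convention. `PageParser`, `relevance`, `focused_crawl`, the in-memory `PAGES` dictionary standing in for HTTP fetching, and the `hub.example` URLs are all hypothetical.

```python
import heapq
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types treated as feed links (assumption: RSS and Atom autodiscovery).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class PageParser(HTMLParser):
    """Collects hyperlinks, feed autodiscovery links, and text from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # absolute URLs from <a href="...">
        self.feeds = []  # absolute URLs from <link rel="alternate" type="...">
        self.text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        href = a.get("href")
        if not href:
            return
        rel = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if tag == "a":
            self.links.append(urljoin(self.base_url, href))
        elif tag == "link" and "alternate" in rel and mime in FEED_TYPES:
            self.feeds.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.text.append(data)

def relevance(text, topic_terms):
    """Fraction of (lowercase) topic terms that occur as words in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return sum(t in words for t in topic_terms) / len(topic_terms)

def focused_crawl(seeds, fetch, topic_terms, max_pages=100, threshold=0.0):
    """Best-first crawl; returns {feed_url: best score of a page advertising it}."""
    frontier = [(-1.0, url) for url in seeds]  # negated priority -> max-heap
    heapq.heapify(frontier)
    seen = set(seeds)
    feeds = {}
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        html = fetch(url)
        if html is None:
            continue  # unreachable page
        visited += 1
        parser = PageParser(url)
        parser.feed(html)
        parser.close()
        score = relevance(" ".join(parser.text), topic_terms)
        if score <= threshold:
            continue  # off-topic: neither record its feeds nor follow its links
        for feed in parser.feeds:
            feeds[feed] = max(score, feeds.get(feed, 0.0))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                # Children inherit the parent's score as their crawl priority.
                heapq.heappush(frontier, (-score, link))
    return feeds

# Usage with a tiny in-memory "web" (hypothetical pages).
PAGES = {
    "http://hub.example/": '<html><body>python blogs <a href="/py">Python</a> '
                           '<a href="/food">Food</a></body></html>',
    "http://hub.example/py": '<html><head><link rel="alternate" '
                             'type="application/rss+xml" href="/py/rss.xml"></head>'
                             '<body>python tips and python news</body></html>',
    "http://hub.example/food": '<html><head><link rel="alternate" '
                               'type="application/rss+xml" href="/food/rss.xml"></head>'
                               '<body>recipes and cooking</body></html>',
}
found = focused_crawl(["http://hub.example/"], PAGES.get, ["python", "news"])
print(found)  # {'http://hub.example/py/rss.xml': 1.0}
```

Letting child links inherit the parent page's score is one simple priority heuristic; a production crawler would also need real HTTP fetching, robots.txt handling, politeness delays, and validation that each discovered address actually serves an RSS document.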
