Fast URL Lookup Using URL Prefix Hash Tree

Improving URL List Lookup Speed Using a URL Prefix Hash Tree

  • Published : 2008.02.15

Abstract

In this paper, we propose an efficient URL lookup algorithm for web content filtering systems based on URL lists. The proposed algorithm converts the URL list into URL prefix form, represents the prefixes as a hash tree, and performs each URL lookup with a single tree search. This eliminates the redundant searches of the conventional single hash table method. Experimental results show that the proposed algorithm is $62\%{\sim}210\%$ faster than the single hash table method, depending on the number of segments.
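The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how URLs are segmented, so the sketch assumes the host is the first segment and each path component is a further segment. The names `segments`, `PrefixHashTree`, `contains_prefix_of`, and `hash_table_lookup` are hypothetical and chosen only for illustration. Each tree node holds a hash table (a `dict`) keyed by the next segment, so one walk down the tree checks every prefix of the URL. The baseline, by contrast, probes a flat table once per prefix and re-hashes the leading segments each time.

```python
from urllib.parse import urlsplit


def segments(url):
    """Split a URL into [host, path segment, ...] (assumed segmentation)."""
    parts = urlsplit(url if "://" in url else "//" + url)
    return [parts.netloc.lower()] + [s for s in parts.path.split("/") if s]


class PrefixHashTree:
    """Tree of hash tables: each node maps the next segment to a child node."""

    def __init__(self):
        self.root = {}  # segment -> {"end": bool, "children": dict}

    def add(self, url):
        """Insert a listed URL prefix into the tree."""
        children, node = self.root, None
        for seg in segments(url):
            node = children.setdefault(seg, {"end": False, "children": {}})
            children = node["children"]
        if node is not None:
            node["end"] = True  # a listed prefix ends at this node

    def contains_prefix_of(self, url):
        """Return True if any listed prefix matches the start of url."""
        children = self.root
        for seg in segments(url):  # one hash probe per segment
            node = children.get(seg)
            if node is None:
                return False
            if node["end"]:
                return True
            children = node["children"]
        return False


def hash_table_lookup(table, url):
    """Baseline: probe a flat set once per prefix, re-hashing each prefix."""
    segs = segments(url)
    return any("/".join(segs[:i]) in table for i in range(1, len(segs) + 1))


# Hypothetical block list used only for this example.
blocked = ["http://example.com/adult", "http://bad.example.org"]
tree = PrefixHashTree()
for entry in blocked:
    tree.add(entry)
table = {"/".join(segments(entry)) for entry in blocked}
```

With this data, `tree.contains_prefix_of("http://example.com/adult/page.html")` matches on the listed prefix `example.com/adult`, while `http://example.com/news` does not match. Both lookup functions return the same answers. They differ only in how much hashing work is repeated per lookup, and that gap grows with the number of segments.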
