An Efficient Approach for Single-Pass Mining of Web Traversal Sequences

Kim, Nak-Min; Jeong, Byeong-Soo; Ahmed, Chowdhury Farhan

Journal of KIISE: Databases

Vol. 37, No. 5 / pp. 221-227 / 2010 / pISSN 1229-7739

Korean Institute of Information Scientists and Engineers (KIISE)

Korean title: 단일 스캔을 통한 웹 방문 패턴의 탐색 기법 (A Technique for Mining Web Visit Patterns in a Single Scan)


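Nak-Min Kim (Department of Computer Engineering, Kyung Hee University);
Byeong-Soo Jeong (Department of Computer Engineering, Kyung Hee University);
Chowdhury Farhan Ahmed (Department of Computer Engineering, Kyung Hee University)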

Submitted: 2010.08.19
Reviewed: 2010.08.26
Published: 2010.10.15


Abstract

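With the rapid growth of Internet use, much research has sought to make Internet services more convenient. Techniques for mining frequently occurring web page traversal sequences from web log data have likewise been studied extensively as an aid to designing effective websites. However, existing methods all require multiple database scans, which makes it difficult to mine traversal sequences quickly and in real time from continuously generated web log data. Incremental and interactive mining are also necessary capabilities for processing continuously generated web log data. In this paper, we propose a method that mines frequent web page traversal sequences from continuously generated web log data in a single scan, in an incremental and interactive manner. The proposed method uses a WTS (web traversal sequence)-tree structure, and a variety of experiments demonstrate that it outperforms existing methods and is effective.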
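The abstract does not describe the internals of the WTS-tree, so the following is only a minimal sketch of the general idea of single-scan, incremental, and interactive mining. It uses a plain prefix tree over contiguous page sequences. The names `Node`, `TraversalTrie`, `add_session`, and `frequent` are hypothetical, and the structure is an illustrative stand-in rather than the authors' WTS-tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0            # number of sessions containing this path
    last_session: int = -1    # id of the last session that touched this node
    children: dict = field(default_factory=dict)


class TraversalTrie:
    """Hypothetical prefix tree over contiguous page sequences (not the WTS-tree)."""

    def __init__(self):
        self.root = Node()
        self.sessions = 0

    def add_session(self, pages):
        """Insert one session; each log record is read once (single scan)."""
        sid = self.sessions
        self.sessions += 1
        # Insert every suffix so that any contiguous subsequence is a root path.
        for start in range(len(pages)):
            node = self.root
            for page in pages[start:]:
                node = node.children.setdefault(page, Node())
                if node.last_session != sid:   # count each session at most once
                    node.last_session = sid
                    node.count += 1

    def frequent(self, min_support):
        """Return {sequence: support} for sequences with support >= min_support."""
        result = {}
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            for page, child in node.children.items():
                # An extension never has more support than its prefix,
                # so an infrequent child prunes its whole subtree.
                if child.count >= min_support:
                    seq = prefix + (page,)
                    result[seq] = child.count
                    stack.append((seq, child))
        return result


trie = TraversalTrie()
for session in (["A", "B", "C"], ["A", "B", "D"], ["B", "C"]):
    trie.add_session(session)

first = trie.frequent(2)           # interactive: any threshold, no rescan
trie.add_session(["A", "B", "C"])  # incremental: new log data is just inserted
second = trie.frequent(2)          # now also contains ("A", "B", "C")
```

Because the tree keeps counts for all observed sequences, a query with a different threshold or after new sessions arrive needs no rescan of the log. The cost of this naive version is memory that grows quadratically with session length, which a purpose-built structure would need to control.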

Web access sequence mining discovers the web pages that users access frequently. Utility-based web access sequence mining handles non-binary occurrences of web pages and extracts more useful knowledge from web logs. However, the existing utility-based approach considers web access sequences from the very beginning of the web logs, so it is not suitable for mining data streams, where the volume of data is huge and unbounded. Nor can it adaptively detect recent changes of knowledge in data streams. The existing approach has other limitations as well: it considers only forward references of web access sequences, it relies on level-wise candidate generation and testing, and it needs several database scans. In this paper, we propose a new approach for high utility web access sequence mining over data streams using a sliding window. Our approach not only handles large-scale data but also efficiently discovers recently generated information from data streams. Moreover, it addresses the other limitations of the existing algorithm over data streams. Extensive performance analyses show that our approach is efficient and outperforms the existing algorithm.
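To make the sliding-window idea concrete, here is a minimal sketch under several assumptions that the abstract does not confirm: each page visit carries a positive numeric utility (for example, dwell time), the utility of a sequence within a session is the sum of its page utilities at its best occurrence, and the window holds a fixed number of recent sessions. `session_utilities` and `SlidingWindowUtility` are hypothetical names, and the exhaustive enumeration up to `max_len` stands in for the paper's actual mining algorithm.

```python
from collections import Counter, deque


def session_utilities(session, max_len=3):
    """Map each contiguous page sequence (length <= max_len) to its utility.

    `session` is a list of (page, utility) pairs with positive utilities.
    Assumption: a sequence's utility is the sum of its page utilities,
    taking the highest-valued occurrence within the session.
    """
    best = {}
    for start in range(len(session)):
        total = 0
        pages = ()
        for page, utility in session[start:start + max_len]:
            pages += (page,)
            total += utility
            if total > best.get(pages, 0):
                best[pages] = total
    return best


class SlidingWindowUtility:
    """Keeps utility totals for only the most recent `window_size` sessions."""

    def __init__(self, window_size, max_len=3):
        self.window = deque()
        self.window_size = window_size
        self.max_len = max_len
        self.totals = Counter()

    def add_session(self, session):
        contrib = session_utilities(session, self.max_len)
        self.window.append(contrib)
        self.totals.update(contrib)            # add the new session
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.totals.subtract(expired)      # forget the oldest session
            for seq in expired:
                if self.totals[seq] <= 0:
                    del self.totals[seq]

    def high_utility(self, min_utility):
        """Return sequences whose windowed utility is at least min_utility."""
        return {s: u for s, u in self.totals.items() if u >= min_utility}


stream = SlidingWindowUtility(window_size=2)
stream.add_session([("A", 2), ("B", 3)])
stream.add_session([("A", 1), ("B", 1), ("C", 4)])
before = stream.high_utility(6)
stream.add_session([("B", 2), ("C", 2)])  # the first session slides out
after = stream.high_utility(6)
```

Subtracting the expired session's contribution keeps the totals tied to recent data without rescanning the stream. Since sequence utility is not anti-monotone in general, this sketch enumerates every sequence up to `max_len` instead of pruning by prefix.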


