Framework for Efficient Web Page Prediction using Deep Learning

  • Received : 2020.12.04
  • Accepted : 2020.12.23
  • Published : 2020.12.31

Abstract

With the exponential growth of information accessed on the web, predicting the next web page a user will visit has become increasingly important. Deep learning is one method that can be used for this prediction. The procedure has two stages: web logs are first analyzed through data preprocessing, and a deep learning algorithm then predicts the user's next web page from the preprocessed logs. In this paper, we propose a framework for web page prediction that combines efficient web log preprocessing with deep learning techniques for prediction. To speed up the preprocessing of large web logs, we use a Hadoop-based MapReduce programming model. In addition, we present a web prediction system that applies a deep learning technique to the output of web log preprocessing for training and prediction. Through experiments, we show that the proposed method improves performance over traditional methods, and we report the accuracy of its next-page predictions.
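The abstract does not describe the preprocessing steps in detail. As a rough illustration of the kind of map/reduce-style log preprocessing it refers to, the following is a minimal sketch, not the authors' implementation. It assumes log lines in Common Log Format, treats the host field as the user identifier, keeps only requests with status 200, and splits a user's visits into sessions with a hypothetical 30-minute inactivity timeout. Plain Python functions stand in for the Hadoop mapper, shuffle, and reducer; all names (`map_line`, `reduce_host`, `run`, `SESSION_TIMEOUT`) are illustrative.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input: Common Log Format, e.g.
# host - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1024
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST|HEAD) (\S+)[^"]*" (\d{3}) \S+$'
)

# Hypothetical inactivity threshold; the abstract does not state one.
SESSION_TIMEOUT = timedelta(minutes=30)


def map_line(line):
    """Map step: emit (host, (timestamp, path)) for each successful request."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return []  # drop malformed lines
    host, stamp, path, status = match.groups()
    if status != "200":
        return []  # drop failed or redirected requests
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return [(host, (when, path))]


def reduce_host(visits):
    """Reduce step: order one host's visits by time and split into sessions."""
    sessions, current, last = [], [], None
    for when, path in sorted(visits):
        if last is not None and when - last > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(path)
        last = when
    if current:
        sessions.append(current)
    return sessions


def run(lines):
    """Local stand-in for a MapReduce job: map, group by key, reduce."""
    grouped = defaultdict(list)  # plays the role of the shuffle phase
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)
    return {host: reduce_host(visits) for host, visits in grouped.items()}


SAMPLE_LINES = [
    'a.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1024',
    'a.example.com - - [01/Jul/1995:00:05:00 -0400] "GET /news.html HTTP/1.0" 200 2048',
    'a.example.com - - [01/Jul/1995:01:00:00 -0400] "GET /index.html HTTP/1.0" 200 1024',
    'b.example.com - - [01/Jul/1995:00:00:02 -0400] "GET /missing.html HTTP/1.0" 404 -',
]

if __name__ == "__main__":
    for host, sessions in run(SAMPLE_LINES).items():
        print(host, sessions)
```

Each resulting session is an ordered list of page paths, which is the kind of sequence a next-page prediction model could be trained on. In an actual Hadoop job the grouping by host would be done by the framework's shuffle phase rather than an in-memory dictionary.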
