Research on Training and Implementation of Deep Learning Models for Web Page Analysis

A Study on Training and Implementing a Deep Learning Model for Web Page Analysis

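  • 김정환 (Department of Digital Management, Korea University)
  • 조재원 (Department of Industrial Design, Myongji College)
  • 김진산 (Department of Big Data and Information Security, Seoul Cyber University)
  • 이한진 (Institute of Creative Convergence Education, Handong Global University)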
  • Received : 2024.01.02
  • Accepted : 2024.01.31
  • Published : 2024.03.31

Abstract

This study trains and implements a deep learning model that brings artificial intelligence to website creation, in what has been called the AI revolution following the launch of the ChatGPT service. The model was trained on 3,000 collected web page images, processed according to a classification system for components and layouts. The work proceeded in three stages. First, prior research on AI models was reviewed to select the algorithm best suited to the intended model. Second, suitable web page and paragraph images were collected, categorized, and processed. Third, the deep learning model was trained, and a serving interface was integrated to verify the model's actual output. The implemented model is intended to detect the multiple paragraphs that make up a web page, analyze the number of lines, elements, and features of each paragraph, and derive meaningful data based on the classification system. This process is expected to mature and enable more precise analysis of web pages. The development of precise analysis techniques is, in turn, expected to lay the groundwork for research into AI's ability to automatically generate perfect web pages.

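Against the backdrop of what has been called the AI revolution since the launch of the ChatGPT service, this study trains and implements a deep learning model for the convergence of website production and artificial intelligence. The model was trained on 3,000 collected web page images, processed according to a classification system for components and layouts, in the following three stages. First, prior research on AI models was surveyed to select the algorithm best suited to the intended model. Second, suitable web page and paragraph images were collected, classified, and processed. Third, the deep learning model was trained and connected to a serving interface to check the model's actual results. The resulting model will detect the multiple paragraphs that make up a real web page and analyze the size, elements, and features of each paragraph to derive meaningful data based on the classification system. This process is expected to mature gradually and allow web pages to be analyzed more precisely. Designing the precise analysis technique in reverse is also expected to serve as a cornerstone for research in which AI automatically generates perfect web pages.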
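
The abstract does not specify the model's output format. As a rough illustration of the analysis step it describes, the sketch below assumes that the trained detector returns labeled bounding boxes with confidence scores for each paragraph of a page. The names `Detection` and `summarize_page`, the class labels, and the 0.5 confidence threshold are all hypothetical and are not taken from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected paragraph region (hypothetical output format)."""
    label: str         # assumed class name from the classification system
    confidence: float  # detector score in [0, 1]
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def summarize_page(detections, min_confidence=0.5):
    """Keep confident detections, order them top to bottom, and count labels."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    # Sort by the top edge first, then by the left edge, to approximate page order.
    kept.sort(key=lambda d: (d.box[1], d.box[0]))
    return {
        "paragraph_count": len(kept),
        "reading_order": [d.label for d in kept],
        "label_counts": dict(Counter(d.label for d in kept)),
    }


# Illustrative input only; these values are invented for the example.
sample = [
    Detection("header", 0.93, (0, 0, 1280, 120)),
    Detection("card_grid", 0.81, (0, 620, 1280, 1100)),
    Detection("hero", 0.88, (0, 120, 1280, 620)),
    Detection("footer", 0.32, (0, 1100, 1280, 1300)),
]
print(summarize_page(sample))
```

Sorting by the top edge of each box is only one plausible way to recover the vertical order of paragraphs; multi-column layouts would need a more careful grouping step.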
