Crepe Search System Design using Web Crawling

  • Kim, Hyo-Jong (Department of Information Security, Tongmyong University)
  • Han, Kun-Hee (Division of Information & Communication Engineering, Baekseok University)
  • Shin, Seung-Soo (Department of Information Security, Tongmyong University)
  • Received : 2017.10.02
  • Accepted : 2017.11.20
  • Published : 2017.11.28

Abstract

The purpose of this paper is to design a search system that operates within a single network instead of relying on multiple bots connected over a wide area network. The system accesses the web in real time without a database server, so that the information it returns is always up to date. The research designs and analyzes a system that can search for people and keywords in the Crepe system quickly and accurately. When users register information on the Crepe server, each user applies different styles, such as font face, font size, and color. Because these styles are themselves part of the information, the server's body-tag matching conversion stores everything as-is, and the Crepe server therefore has no body-tag matching problem. When the Crepe search system runs, however, the users' styles and characteristics cannot be normalized into a standard form. This problem can be solved with the html_img_parser function and the Go language's html parser package. The crawler is a general-purpose web crawler that uses a queue and multiple threads, rather than one designed for a specific site. It can therefore explore many different websites and collect data from them quickly and efficiently, and the resulting big data can be used in a variety of applications.

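
The crawler design in the abstract combines a queue with multiple threads and keeps no database, so each search fetches pages live. The following is a minimal Go sketch of that pattern, written under several assumptions:

- goroutines stand in for threads;
- a FIFO slice serves as the queue;
- page fetching is injected as a function, so the example runs offline.

The names crawlSearch and fetchFunc and the toy link graph are hypothetical and are not the authors' implementation.

```go
// Hypothetical sketch: queue + worker goroutines for a real-time keyword crawl.
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// fetchFunc returns a page's plain text and outgoing links. A live system
// would do an HTTP GET here; injecting it is an assumption so the sketch runs offline.
type fetchFunc func(url string) (text string, links []string, err error)

type result struct {
	url   string
	match bool
	links []string
}

// crawlSearch crawls breadth-first from seed using a FIFO queue and `workers`
// goroutines, visiting at most maxPages URLs, and returns (sorted) the pages
// whose text contains keyword. Nothing is stored between searches.
func crawlSearch(seed, keyword string, fetch fetchFunc, workers, maxPages int) []string {
	jobs := make(chan string)
	results := make(chan result)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				text, links, err := fetch(u)
				results <- result{url: u, match: err == nil && strings.Contains(text, keyword), links: links}
			}
		}()
	}

	queue := []string{seed} // FIFO frontier, owned only by this loop
	seen := map[string]bool{seed: true}
	inFlight := 0
	var hits []string
	for len(queue) > 0 || inFlight > 0 {
		// A nil channel disables the send case when the queue is empty.
		var out chan string
		var next string
		if len(queue) > 0 {
			out, next = jobs, queue[0]
		}
		select {
		case out <- next:
			queue = queue[1:]
			inFlight++
		case r := <-results:
			inFlight--
			if r.match {
				hits = append(hits, r.url)
			}
			for _, l := range r.links {
				if !seen[l] && len(seen) < maxPages {
					seen[l] = true
					queue = append(queue, l)
				}
			}
		}
	}
	close(jobs) // all workers are idle here, so they exit their range loops
	wg.Wait()
	sort.Strings(hits)
	return hits
}

func main() {
	web := map[string]struct {
		text  string
		links []string
	}{
		"a": {"crepe home", []string{"b", "c"}},
		"b": {"Kim Hyo-Jong profile", []string{"c", "a"}},
		"c": {"Kim Hyo-Jong post", nil},
	}
	fetch := func(u string) (string, []string, error) {
		p, ok := web[u]
		if !ok {
			return "", nil, fmt.Errorf("not found: %s", u)
		}
		return p.text, p.links, nil
	}
	fmt.Println(crawlSearch("a", "Kim", fetch, 4, 100)) // [b c]
}
```

A single coordinating loop owns the queue and the visited set. Only channels are therefore shared between goroutines, and no mutex is needed. The maxPages parameter bounds how far a single query can spread across sites.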
