Issue Analysis on Gas Safety Based on a Distributed Web Crawler Using Amazon Web Services

  • Kim, Yong-Young (Division of International Business, Konkuk University) ;
  • Kim, Yong-Ki (Department of Computer Engineering, Chungbuk National University) ;
  • Kim, Dae-Sik (Department of Computer Engineering, Chungbuk National University) ;
  • Kim, Mi-Hye (Department of Computer Engineering, Chungbuk National University)
  • Received : 2018.11.05
  • Accepted : 2018.12.20
  • Published : 2018.12.28

Abstract

Governments and major private companies around the world continue to invest boldly in big data in order to create new economic value and strengthen national competitiveness. Collecting objective data such as news presupposes that data integrity and quality are secured. Researchers and practitioners who want to base decisions or trend analyses on large volumes of objective data, such as portal news, face a practical obstacle: with conventional crawlers, data collection itself gets blocked. In this study, we implemented a method of collecting web data that resolves this problem by using the cloud service platform provided by Amazon Web Services (AWS). Using it, we collected news articles on gas safety, a topic directly tied to public safety, and analyzed the issues they raise. The results confirm that strategies for ensuring gas safety should be established and systematically operated along five categories: accident/occurrence, prevention, maintenance/management, government/policy, and target.
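
The abstract describes spreading collection across cloud resources so that a single crawler is not blocked. The following is a minimal sketch of that general idea only, assuming a simple round-robin split of article URLs across several workers, each of which could run on its own cloud instance. The function name `partition_urls`, the parameter `num_workers`, and the `example.com` URLs are hypothetical and do not come from the paper, whose actual architecture is given in Fig. 2 and Fig. 3.

```python
from typing import List


def partition_urls(urls: List[str], num_workers: int) -> List[List[str]]:
    """Split URLs round-robin so each worker gets a roughly equal share.

    Assumption: each worker would run on a separate machine, so the
    request volume seen from any single address stays low.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    buckets: List[List[str]] = [[] for _ in range(num_workers)]
    for i, url in enumerate(urls):
        buckets[i % num_workers].append(url)
    return buckets


# Hypothetical article URLs; example.com is a placeholder domain.
urls = [f"https://example.com/news/{n}" for n in range(10)]
batches = partition_urls(urls, num_workers=3)
for worker_id, batch in enumerate(batches):
    print(worker_id, len(batch))
```

Round-robin assignment is the simplest possible choice here; a real deployment would likely also need per-site rate limits and retry handling, which this sketch leaves out.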

Fig. 1. Web Crawling Blocking Problems of Three Types of Web Crawler

Fig. 2. The Structure of Distributed Web Crawler Using AWS

Fig. 3. The Process of Distributed Web Crawler Using AWS

Fig. 4. The Speed of Each Crawler

Table 1. Rank of Keyword Frequency based on ‘Gas Safety’

Table 2. Rank of Keyword Frequency based on ‘Gas Accident’
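
Tables 1 and 2 rank keywords by frequency. As a rough illustration of how such a ranking can be produced, the sketch below counts tokens across documents with Python's `collections.Counter`. It assumes the articles have already been split into keyword tokens (Korean text would normally need a morphological analyzer first). The function `rank_keywords` and the sample tokens are hypothetical and are not the paper's data or results.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_keywords(
    docs: Iterable[List[str]], top_n: int = 5
) -> List[Tuple[int, str, int]]:
    """Return (rank, keyword, frequency) rows for the top_n keywords."""
    counts = Counter(token for doc in docs for token in doc)
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(top_n), start=1)
    ]


# Made-up token lists standing in for tokenized articles.
sample_docs = [
    ["gas", "accident", "prevention"],
    ["gas", "accident", "inspection"],
    ["gas", "policy"],
]
ranking = rank_keywords(sample_docs, top_n=2)
for rank, word, freq in ranking:
    print(rank, word, freq)
```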
