• Title/Summary/Keyword: 스크래핑 (scraping)


A Study on the Analysis of Accident Types in Public and Private Construction Using Web Scraping and Text Mining (웹 스크래핑과 텍스트마이닝을 이용한 공공 및 민간공사의 사고유형 분석)

  • Yoon, Younggeun;Oh, Taekeun
    • The Journal of the Convergence on Culture Technology / v.8 no.5 / pp.729-734 / 2022
  • Various studies use accident cases to identify the causes of accidents in the construction industry, but few have examined the differences between public and private construction. In this study, web scraping and text mining were applied to analyze the causes of accidents by order type. Statistical analysis and word cloud analysis of more than 10,000 collected structured and unstructured records confirmed that the types and causes of accidents differ between public and private construction. In addition, identifying the correlations between major accident causes can contribute to establishing safety management measures in the future.

Design and implement Web sites for greater user convenience through R based data analysis (R기반의 data분석을 통한 사용자 편의성 증진을 위한 웹사이트 설계 및 구현)

  • Yoon, Kyung Seob;Kim, Yeon Hong
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.307-310 / 2018
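  • As society increasingly evolves around data, statistical packages capable of data analysis are now in widespread use. In this paper, we apply one such package, R, within a Model 2 MVC architecture rather than a Model 1 architecture, aiming to improve website maintainability and code efficiency. On this basis, we design and implement a searchable website that collects data through web scraping and presents the resulting analysis so that users can understand it easily, improving user convenience.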


Analysis of accident types at small and medium-sized construction sites based on web scraping and text mining (웹 스크래핑 및 텍스트마이닝에 기반한 중소규모 건설현장 사고유형 분석)

  • Younggeun Yoon
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.609-615 / 2024
  • The construction industry's fatality count stands at 402, approximately 46% of all industrial accident fatalities. Notably, sites with construction costs under 5 billion won account for about 69%, so safety management at small and medium-sized construction sites needs to be strengthened. In this study, 19,511 accident investigation records were collected using web scraping. Through statistical analysis of the structured data and text mining of the unstructured data, accident types and causes were analyzed by construction cost for sites under 5 billion won. The results confirmed that accident types and causes differ depending on construction cost. It is hoped that these results will be used for customized safety management at small and medium-sized construction sites.

Improving Efficiency of Usage Statistics Collection and Analysis in E-Journal Consortia (컨소시엄 기반 전자저널 이용통계 수집 및 분석 개선 방안)

  • Jung, Young-Im;Kim, Jeong-Hwan
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.7-25 / 2012
  • The proliferating use of e-journals has led to increasing interest in collecting and analyzing usage statistics. However, the existing manual method and the simple journal usage reports provided by publishers hinder the effective collection of large-scale usage statistics and their comprehensive, in-depth analysis. We therefore propose a hybrid automatic method of collecting e-journal usage statistics based on screen scraping and the SUSHI protocol. In addition, this study suggests a method for generating summary statistics presented in graphs, charts, and tables. Using the suggested system and analysis data, librarians can compose various reports on library budgets or operations.

Topic Analysis of Papers of JKIICE Using Text Mining (텍스트 마이닝을 이용한 한국정보통신학회 논문지의 주제 분석)

  • Woo, Young Woon;Cho, Kyoung Won;Lee, KwangEui
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.74-75 / 2017
  • In this paper, we analyzed 3,668 papers published in JKIICE from 2007 to 2016 using text mining methods to understand its research fields. We collected the data with web scraping programs written in Python and applied topic modeling based on the LDA algorithm, implemented in R. The analysis showed that the representative research areas of JKIICE can be reduced to just 9, although there were 19 submission areas as of 2016.


The Integrated management system of Online marketplace for Intangible goods (무형상품 오픈마켓 통합관리 시스템)

  • Kim, Woochan;Kwak, Ho-Young;Kim, Sanghyuk
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.401-402 / 2018
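  • Recently, a variety of internet shopping services have emerged and become commonplace. As a Free International City, Jeju Island has a well-developed service sector tied to tourism, so a large number of businesses there offer intangible goods. Because many consumers purchase online, many of these businesses also sell online, and in the process they struggle to manage their presence on online marketplaces (open markets). To address this problem, we implemented an integrated open market management system for intangible goods.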


Header Text Generation based on Structural Information of Table (테이블 구조 정보를 활용한 헤더 텍스트 생성)

  • Haemin Jung;Myoseop Sim;Kyungkoo Min;Jooyoung Choi;Minjun Park;Stanley Jungkyu Choi
    • Annual Conference on Human and Language Technology / 2023.10a / pp.415-418 / 2023
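  • Table data generally consists of headers and data, and headers play an important role in understanding the structure and content of the data. However, header information can be missing in various situations, such as in data obtained through web scraping. Because generating headers manually is time-consuming and inefficient, this paper defines the task of automatically generating headers and proposes a model to solve it. The model, based on BART, generates header text by analyzing the text that makes up each column and the relationships between columns. This process captures the relationships among the components of table data, and the generated headers can be used in a variety of applications. Experimental evaluation confirmed that comprehensively using table structure information yields higher performance.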


'GANerate', A Mass Image Creation and Trading Platform based on User Input using GAN (GAN을 활용한 사용자 입력 기반의 대량 이미지 생성 및 거래 플랫폼 'GANerate')

  • Choi-Pil Hwa;Han-Jong Won;Choi-Yeon A;Park-Jeong Min;Sang-Oh Yoo
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.922-923 / 2023
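  • Although a great deal of image data exists on the internet, collecting images effectively at large scale remains difficult. This paper proposes a web platform that uses a GAN to generate as many of the desired images as the user specifies. It is expected to allow large volumes of image data to be collected more safely than existing methods such as downloading single images, crawling, and web scraping.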

A Fuzzy-AHP-based Movie Recommendation System using the GRU Language Model (GRU 언어 모델을 이용한 Fuzzy-AHP 기반 영화 추천 시스템)

  • Oh, Jae-Taek;Lee, Sang-Yong
    • Journal of Digital Convergence / v.19 no.8 / pp.319-325 / 2021
  • With the advancement of wireless technology and the rapid growth of mobile communication infrastructure, systems built on AI-based platforms are drawing attention from users. In particular, systems that understand users' tastes and interests and recommend preferred items are applied in advanced customized e-commerce services and smart homes. However, these recommendation systems have difficulty reflecting users' varied tastes and interests in real time. In this research, we propose a Fuzzy-AHP-based movie recommendation system using the Gated Recurrent Unit (GRU) language model to address this problem. In this system, we apply Fuzzy-AHP to reflect users' tastes or interests in real time. We also apply GRU language model-based models to analyze public interest and film content in order to recommend movies similar to the user's preferred factors. To validate the performance of this recommendation system, we measured the suitability of the learning model using the scraped data used in the learning module, and measured learning performance by comparing the learning time per epoch against the Long Short-Term Memory (LSTM) language model. The results show that the average cross-validation index of the learning model is 94.8%, indicating that it is suitable, and that its learning performance exceeds that of the LSTM language model.

Implementation of Standard Platform for Distributing Usage Statistics of Digital Scholarly Information (전자학술정보 이용통계 유통을 위한 표준 플랫폼 구축)

  • Jung, Youngim;Kim, Jayhoon;Kim, Kwangyoung;Kim, Hwanmin
    • The Journal of Society for e-Business Studies / v.19 no.4 / pp.61-72 / 2014
  • Recently, the use of digital scholarly information has been analyzed from various perspectives by various parties, driven by the rapid expansion of its use and the increasing availability of large-scale log data. Nevertheless, no standard platform for distributing usage statistics of scholarly information at the national scale has been proposed so far. Therefore, this paper proposes a generalized SUSHI (Standardized Usage Statistics Harvesting Initiative) platform for distributing usage statistics of digital scholarly information at the national scale.