• Title/Summary/Keyword: Web Text Analysis

Analysis of accident types at small and medium-sized construction sites based on web scraping and text mining (웹 스크래핑 및 텍스트마이닝에 기반한 중소규모 건설현장 사고유형 분석)

  • Younggeun Yoon
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.609-615 / 2024
  • The construction industry recorded 402 fatalities, approximately 46% of all industrial accident fatalities. Notably, sites with construction costs below 5 billion won account for about 69% of these, so safety management at small and medium-sized construction sites needs to be strengthened. In this study, 19,511 accident investigation records were collected by web scraping. Statistical analysis of the structured data and text mining of the unstructured data were used to analyze accident types and causes by construction cost at sites below 5 billion won. The results confirmed that accident types and causes differ with construction cost. The authors hope these results will be used for customized safety management at small and medium-sized construction sites.

Construction of Korean FrameNet through Manual Translation of English FrameNet (영어 FrameNet의 수동번역을 통한 한국어 FrameNet 구축 개발)

  • Nam, Sejin;Kim, Youngsik;Park, Jungyeul;Hahm, Younggyun;Hwang, Dosam;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 2014.10a / pp.38-43 / 2014
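  • This paper presents a process for manually building a Korean FrameNet from the existing English FrameNet data, one that can be carried out by translators with no expert knowledge of FrameNet. Of the 5,945 sentences that make up the Full Text of English FrameNet version 1.5 as provided by NLTK, our team extracted the 4,025 sentences that carry Frame data and had translators manually translate them into Korean, laying a meaningful foundation for building a Korean FrameNet. We have also published on the web the research results that demonstrate the effectiveness of the proposed method.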

Comparison and Analysis of Web Accessibility for the Korea, USA, and Japan's Broadcast Web Sites (한·미·일 지상파 방송사의 웹 접근성 비교·분석)

  • Park, Seong-Je;Kim, Yung-Keun;Kim, Jong-Weon
    • Journal of Korea Society of Industrial Information Systems / v.19 no.4 / pp.105-117 / 2014
  • Acquiring information through broadcast media is essential to modern life, and broadcasters have expanded their services over the internet as digital technology has developed. Against this background, this study compares and analyzes web accessibility evaluations of the leading broadcasters' web sites in Korea, the USA, and Japan. The results show no significant difference in the overall level of accessibility across the three countries' web sites, but the Korean web sites had somewhat insufficient compliance rates for items such as alternative text, skip navigation for repeated regions, and titles. In addition, accessibility errors were found in the brightness contrast of text content, the execution of functions the user did not intend, the declaration of the default language, and the provision of labels. Korean broadcasters should therefore urgently correct these errors to achieve effective web accessibility.

Analysis on Topic Trends and Topic Modeling of KSHSM Journal Papers using Text Mining (텍스트마이닝을 활용한 보건의료산업학회지의 토픽 모델링 및 토픽트렌드 분석)

  • Cho, Kyoung-Won;Bae, Sung-Kwon;Woo, Young-Woon
    • The Korean Journal of Health Service Management / v.11 no.4 / pp.213-224 / 2017
  • Objectives: The purpose of this study was to analyze representative topics and topic trends of papers in the Korean Society and Health Service Management (KSHSM) Journal. Methods: We collected English abstracts and keywords of 516 papers published in the KSHSM Journal from 2007 to 2017. We used Python web scraping programs to collect the papers from the Korea Citation Index web site, and RStudio for topic analysis based on the latent Dirichlet allocation algorithm. Results: Perplexity analysis indicated nine as the best number of topics, and the nine topics were extracted for all papers using the Gibbs sampling method. We refined the nine topics to five by closely considering the meaning of each topic and analyzing the intertopic distance map. In the topic trend analysis from 2007 to 2017, 'Health Management' and 'Hospital Service' were the two representative topics; 'Hospital Service' was the prevalent topic until 2011, but the ratios of the two topics became similar from 2012. Conclusions: We found that five was the best number of topics and that the topic trends reflected the main issues of the KSHSM Journal, such as the revision of the society's name in 2012.

Layout Analysis for Calculation of Web Page Similarity as Image

  • Mitsuhashi, Noriaki;Yamaguchi, Toru;Takama, Yasufumi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.142-145 / 2003
  • When we search for information on the Web using search engines, they analyze only the text collected from the source files of Web pages. However, there is a limit to how well the layout of a Web page can be analyzed from its source file alone, although Web page design is the most important factor when a user assesses a page. In particular, it often happens on the Web that pages of similar design offer similar information. We propose a method that analyzes layout in order to compare the design of pages by treating the displayed page as an image.

A Study for Research Area of Library and Information Science by Network Text Analysis (네트워크 텍스트 분석을 통한 문헌정보학 최근 연구 경향 분석)

  • Cho, Jane
    • Journal of the Korean Society for Information Management / v.28 no.4 / pp.65-83 / 2011
  • In this study, Network Text Analysis was performed on 1,752 articles published in the most recent 7 years to map the distribution of subject concepts and their relations in Library and Information Science research. To analyze more recent trends and changes, a secondary analysis was performed on 482 articles published in the most recent 2 years. The results show that the "public library" and "academic library" concepts were the most frequently studied in the field, and that the "evaluation", "education", and "web" concepts had the highest degree centrality over the 7 years. In the analysis of the most recent 2 years, the "web" and "classification" concepts showed high frequency, and "user" and "public library" showed an increase in degree centrality.
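
The degree centrality mentioned in this abstract can be illustrated with a minimal sketch. It assumes that two concepts are linked whenever they appear in the same article and that centrality is normalized by the number of other nodes; the abstract does not state how the authors built their network, and the function name `degree_centrality` and the toy keyword lists are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(keyword_sets):
    """Return normalized degree centrality for a keyword co-occurrence network.

    Assumption: two keywords are neighbors if they share at least one article.
    """
    neighbors = defaultdict(set)
    for keywords in keyword_sets:
        unique = set(keywords)
        for kw in unique:
            neighbors[kw]  # register the keyword as a node even if it has no links
        for a, b in combinations(sorted(unique), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {kw: 0.0 for kw in neighbors}
    # Degree divided by the maximum possible degree (n - 1).
    return {kw: len(adj) / (n - 1) for kw, adj in neighbors.items()}


# Hypothetical toy data: one keyword list per article.
articles = [
    ["public library", "evaluation", "web"],
    ["academic library", "education", "web"],
    ["public library", "user", "web"],
]
centrality = degree_centrality(articles)
for keyword, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{keyword}: {score:.2f}")
```

In this toy network "web" co-occurs with every other concept, so it has the highest centrality even though each concept appears in only a few articles, which is why frequency and centrality can rank concepts differently.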

Web Accessibility Evaluation of Professional Sports Clubs in Korea (프로스포츠 웹 사이트의 접근성 평가)

  • Choi, Kyoung-Ho;You, Kang-Soo
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.3 / pp.399-406 / 2012
  • In Korea, the government has enacted the Welfare Law for People with Disability (Article 25) so that people with disabilities are not disadvantaged in sports and cultural activities. Moreover, the web sites of workplaces with more than 300 employees, as well as those of public organizations, are required to observe web accessibility for people with disabilities. This study statistically analyzed the degree to which the web sites of professional sports clubs observe web accessibility. The results showed that compliance was low for providing text alternatives for non-text content (44.92%) and for keyboard accessibility (46.79%). Overall, however, the compliance of the web sites shows an increasing trend over time.

Text Extraction Algorithm using the HTML Logical Structure Analysis (HTML 논리적 구조분석을 통한 본문추출 알고리즘)

  • Jeon, Hyun-Gee;KOH, Chan
    • Journal of Digital Contents Society / v.16 no.3 / pp.445-455 / 2015
  • As internet and computer technology develops, the amount of information has increased exponentially. With the appearance of a variety of web authoring tools and new web standards, web content has become more convenient to access and a wide variety of it is produced very quickly. However, web documents are divided into blocks that deal with topics unrelated to one another, and they contain content the reader does not need, such as navigation, simple decorations, advertisements, and copyright notices. To solve this problem and meet user requirements, this study extracts only the exact body area of a web document so that information can be used effectively. As a reconstruction method, we then propose a web search system that can be optimized to manage documents systematically.
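
The idea of keeping only the body area of a web document can be sketched as follows. This is not the algorithm from the paper, whose details the abstract does not give; it is a simple tag-based filter that assumes the unwanted blocks are marked up with `nav`, `header`, `footer`, `aside`, `script`, or `style` elements. The class and function names are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Assumption: blocks inside these tags are navigation, decoration, or notices.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class BodyTextExtractor(HTMLParser):
    """Collect text that is not nested inside any of the SKIP_TAGS elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped elements are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_body_text(html):
    """Return the remaining text chunks joined by single spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page.
sample = """
<html><body>
<nav><a href="/">Home</a> <a href="/news">News</a></nav>
<div><h1>Storm update</h1><p>Heavy rain is expected tonight.</p></div>
<footer>Copyright 2015 Example Corp.</footer>
</body></html>
"""
body_text = extract_body_text(sample)
print(body_text)
```

Real pages often place navigation and advertisements in generic `div` elements, so a tag list like this is only a starting point; a structural analysis such as the one the paper describes would be needed for those cases.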

Improving spaCy dependency annotation and PoS tagging web service using independent NER services

  • Colic, Nico;Rinaldi, Fabio
    • Genomics & Informatics / v.17 no.2 / pp.21.1-21.6 / 2019
  • Dependency parsing is often used as a component in many text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service delivering improved dependency parses by taking into account named entity annotations obtained by third party services. Our evaluation shows improved results and better speed.

Comparision and Analysis of Algorithm for web Sites Researching (웹 사이트 탐색 알고리즘 비교분석)

  • 김덕수;권영직
    • Journal of Korea Society of Industrial Information Systems / v.8 no.3 / pp.91-98 / 2003
  • Visitors who browse the web from wireless PDAs and cell phones are frequently frustrated by the interfaces. Simply replacing graphics with text and reformatting tables does not solve this problem, because deep link structures can still take a long time to navigate. To solve this problem, this paper proposes the Minimal Path Algorithm, which automatically improves wireless web navigation by suggesting useful shortcut links in real time. The results show that the Minimal Path Algorithm offers web users shortcut links and the number of links on the shortest path.
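
How a shortcut through a deep link structure might be found can be sketched with a breadth-first search over the site's link graph. This is an illustration only: the abstract does not describe how the Minimal Path Algorithm works internally, and the function name `shortest_link_path` and the example site map are hypothetical.

```python
from collections import deque


def shortest_link_path(links, start, target):
    """Return the page sequence with the fewest links from start to target.

    `links` maps each page to the pages it links to. Returns None when the
    target cannot be reached.
    """
    if start == target:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in parent:
                parent[nxt] = page
                if nxt == target:
                    # Walk back through the parents to rebuild the path.
                    path = [nxt]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(nxt)
    return None


# Hypothetical site map: page -> pages it links to.
site = {
    "home": ["news", "sports"],
    "news": ["world"],
    "sports": ["scores"],
    "scores": ["live"],
    "world": [],
}
path = shortest_link_path(site, "home", "live")
clicks = len(path) - 1
print(path, clicks)
```

A page that users often reach only after several clicks, such as "live" in this example, is a natural candidate for a shortcut link placed directly on the starting page.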
